Relaxing vs Stimulating Acupressure for Fatigue Among Breast Cancer Patients: Lessons to be Learned

  • A chance to test your rules of thumb for quickly evaluating clinical trials of alternative or integrative medicine in prestigious journals.
  • A chance to increase your understanding of the importance of well-defined control groups and blinding in evaluating the risk of bias of clinical trials.
  • A chance to understand the difference between merely evidence-based treatments versus science-based treatments.
  • Lessons learned can be readily applied to many wasteful evaluations of psychotherapy that share these characteristics.

A press release from the University of Michigan about a study of acupressure for fatigue in cancer patients was churnaled – echoed – throughout the media. It was reproduced dozens of times, with little more than an editor’s title change from one report to the next.

Fortunately, the article that inspired all the fuss was freely available from the prestigious JAMA: Oncology. But when I gained access, I quickly saw that it was not worth my attention, based on what I already knew – or, as I often say, my prior probabilities. Rules of thumb is a good enough term.

So the article became another occasion for us to practice our critical appraisal skills, including, importantly, being able to make reliable and valid judgments that some media coverage deserves to be dismissed out of hand, even when tied to an article in a prestigious medical journal.

The press release is here: Acupressure reduced fatigue in breast cancer survivors: Relaxing acupressure improved sleep, quality of life.

A sampling of the coverage:

[Image: sampling of media coverage]

As we’ve come to expect, the UK Daily Mail editor added its own bit of spin:

[Image: Daily Mail headline]

Here is the article:

Zick SM, Sen A, Wyatt GK, Murphy SL, Arnedt J, Harris RE. Investigation of 2 Types of Self-administered Acupressure for Persistent Cancer-Related Fatigue in Breast Cancer Survivors: A Randomized Clinical Trial. JAMA Oncol. Published online July 07, 2016. doi:10.1001/jamaoncol.2016.1867.

Here is the Trial registration:

All I needed to know was contained in a succinct summary at the Journal website:

[Image: Key Points summary from the journal website]

This is a randomized clinical trial (RCT) in which

  • Two active treatments lacked credible scientific mechanisms.
  • They were predictably shown to be better than routine care that lacked the positive expectations and support of the active treatments.
  • A primary outcome assessed by subjective self-report amplified the illusory effectiveness of the treatments.

But wait!

The original research appeared in a prestigious peer-reviewed journal published by the American Medical Association, not a disreputable journal on Beall’s List of Predatory Publishers.

Maybe this means that publication in a prestigious peer-reviewed journal is insufficient to erase our doubts about the validity of claims.

The original research was performed with a $2.65 million peer-reviewed grant from the National Cancer Institute.

Maybe NIH is wasting scarce money on useless research.

What is acupressure?

According to the article:

Acupressure, a method derived from traditional Chinese medicine (TCM), is a treatment in which pressure is applied with fingers, thumbs, or a device to acupoints on the body. Acupressure has shown promise for treating fatigue in patients with cancer,23 and in a study24 of 43 cancer survivors with persistent fatigue, our group found that acupressure decreased fatigue by approximately 45% to 70%. Furthermore, acupressure points termed relaxing (for their use in TCM to treat insomnia) were significantly better at improving fatigue than another distinct set of acupressure points termed stimulating (used in TCM to increase energy).24 Despite such promise, only 5 small studies24– 28 have examined the effect of acupressure for cancer fatigue.

[Image: Acupuncture point Hegu (LI 4)]

You can learn more about acupressure here. It is a derivative of acupuncture that does not involve needles, but it uses the same acupuncture pressure points, or acupoints, as acupuncture.

Don’t be fooled by references to traditional Chinese medicine (TCM) as a basis for claiming a scientific mechanism.

See Chairman Mao Invented Traditional Chinese Medicine.

Chairman Mao is quoted as saying “Even though I believe we should promote Chinese medicine, I personally do not believe in it. I don’t take Chinese medicine.”

 

Alan Levinovitz, author of the Slate article, further argues:

 

In truth, skepticism, empiricism, and logic are not uniquely Western, and we should feel free to apply them to Chinese medicine.

After all, that’s what Wang Qingren did during the Qing Dynasty when he wrote Correcting the Errors of Medical Literature. Wang’s work on the book began in 1797, when an epidemic broke out in his town and killed hundreds of children. The children were buried in shallow graves in a public cemetery, allowing stray dogs to dig them up and devour them, a custom thought to protect the next child in the family from premature death. On daily walks past the graveyard, Wang systematically studied the anatomy of the children’s corpses, discovering significant differences between what he saw and the content of Chinese classics.

And nearly 2,000 years ago, the philosopher Wang Chong mounted a devastating (and hilarious) critique of yin-yang five phases theory: “The horse is connected with wu (fire), the rat with zi (water). If water really conquers fire, [it would be much more convincing if] rats normally attacked horses and drove them away. Then the cock is connected with ya (metal) and the hare with mao (wood). If metal really conquers wood, why do cocks not devour hares?” (The translation of Wang Chong and the account of Wang Qingren come from Paul Unschuld’s Medicine in China: A History of Ideas.)

Trial design

A 10-week randomized, single-blind trial comparing self-administered relaxing acupressure with stimulating acupressure once daily for 6 weeks vs usual care with a 4-week follow-up was conducted. There were 5 research visits: at screening, baseline, 3 weeks, 6 weeks (end of treatment), and 10 weeks (end of washout phase). The Pittsburgh Sleep Quality Index (PSQI) and Long-Term Quality of Life Instrument (LTQL) were administered at baseline and weeks 6 and 10. The Brief Fatigue Inventory (BFI) score was collected at baseline and weeks 1 through 10.

Note that the trial was “single-blind.” It compared two forms of acupressure, relaxing versus stimulating. Only the patients were blinded, and only to which of these two treatments was being provided; patients clearly knew whether or not they had been randomized to usual care. The providers were not blinded; they were carefully supervised by the investigators and given feedback on their performance.

The combination of providers not being blinded, patients knowing whether they were randomized to routine care, and subjective self-report outcomes together are the makings of a highly biased trial.

Interventions

Usual care was defined as any treatment women were receiving from health care professionals for fatigue. At baseline, women were taught to self-administer acupressure by a trained acupressure educator.29 The 13 acupressure educators were taught by one of the study’s principal investigators (R.E.H.), an acupuncturist with National Certification Commission for Acupuncture and Oriental Medicine training. This training included a 30-minute session in which educators were taught point location, stimulation techniques, and pressure intensity.

Relaxing acupressure points consisted of yin tang, anmian, heart 7, spleen 6, and liver 3. Four acupoints were performed bilaterally, with yin tang done centrally. Stimulating acupressure points consisted of du 20, conception vessel 6, large intestine 4, stomach 36, spleen 6, and kidney 3. Points were administered bilaterally except for du 20 and conception vessel 6, which were done centrally (eFigure in Supplement 2). Women were told to perform acupressure once per day and to stimulate each point in a circular motion for 3 minutes.

Note that the control/comparison condition was an ill-defined usual care in which it is not clear that patients received any attention and support for their fatigue. As I have discussed before, we need to ask just what was being controlled by this condition. There is no evidence presented that patients had similar positive expectations and felt similar support in this condition to what was provided in the two active treatment conditions. There is no evidence of equivalence of time with a provider devoted exclusively to the patients’ fatigue. Unlike patients assigned to usual care, patients assigned to one of the acupressure conditions received a ritual delivered with enthusiasm by a supervised educator.

Note the absurdity of the naming of the acupressure points, for which the authority of traditional Chinese medicine is invoked, not evidence. This absurdity is reinforced by a look at a diagram of acupressure points provided as a supplement to the article.

[Images: relaxing and stimulating acupressure points]

 

One of the many problems with “acupuncture pressure points” is that sham stimulation generally works as well as actual stimulation, especially when the sham is delivered with appropriate blinding of both providers and patients. Another is that targeting places on the body that are not defined as acupuncture pressure points can produce the same results. For more elaborate discussion see Can we finally just say that acupuncture is nothing more than an elaborate placebo?

Worth looking back: a credible placebo versus a weak control condition

In a recent blog post I discussed an unusual study in the New England Journal of Medicine that compared an established active treatment for asthma to two credible control conditions: one, an inert spray that was indistinguishable from the active treatment, and the other, acupuncture. Additionally, the study involved a no-treatment control. For subjective self-report outcomes, the active treatment, the inert spray, and acupuncture were indistinguishable, but all were superior to the no-treatment control condition. However, for the objective outcome measure, the active treatment was more effective than all three of the comparison conditions. The message is that credible placebo control conditions are superior to control conditions lacking in positive expectations, including no treatment and, I would argue, ill-defined usual care that lacks positive expectations. A further message is ‘beware of relying on subjective self-report measures to distinguish between active treatments and placebo control conditions’.

Results

At week 6, the change in BFI score from baseline was significantly greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.6 [1.5] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.1 [1.6] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P  = .29). At week 10, the change in BFI score from baseline was greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.3 [1.4] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.0 [1.5] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P > .99) (Figure 2). The mean percentage fatigue reductions at 6 weeks were 34%, 27%, and −1% in relaxing acupressure, stimulating acupressure, and usual care, respectively.

These are entirely expectable results. Nothing new was learned in this study.
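
For what it is worth, the means and standard deviations quoted above can be turned into a conventional standardized difference. This is my own back-of-the-envelope calculation, not an analysis from the article, and it ignores group sizes, which are not given in the excerpt:

```python
# Back-of-the-envelope Cohen's d for relaxing acupressure versus usual care at
# week 6, using only the means and SDs quoted above. Group sizes are not given
# in the excerpt, so a simple average of the two variances stands in for a
# properly pooled variance.
import math

mean_relaxing, sd_relaxing = -2.6, 1.5
mean_usual, sd_usual = -1.1, 1.6
pooled_sd = math.sqrt((sd_relaxing**2 + sd_usual**2) / 2)
d = (mean_relaxing - mean_usual) / pooled_sd
print(f"Approximate between-group d: {d:.2f}")  # roughly -1.0, i.e. about 1 SD
```

A difference of roughly one standard deviation on a subjective self-report measure, obtained against an ill-defined usual care comparison, tells us nothing about a specific effect of acupressure.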

The bottom line for this study is that there was absolutely nothing to be gained by comparing one inert placebo condition to another inert placebo condition and to an uninformative condition without clear evidence that the control condition offered control of nonspecific factors – positive expectations, support, and attention. This was a waste of patient time and effort, as well as government funds, and produced results that were potentially misleading to patients. Namely, the results are likely to be misinterpreted as showing that acupressure is an effective, evidence-based treatment for cancer-related fatigue.

How the authors explained their results

Why might both acupressure arms significantly improve fatigue? In our group’s previous work, we had seen that cancer fatigue may arise through multiple distinct mechanisms.15 Similarly, it is also known in the acupuncture literature that true and sham acupuncture can improve symptoms equally, but they appear to work via different mechanisms.40 Therefore, relaxing acupressure and stimulating acupressure could elicit improvements in symptoms through distinct mechanisms, including both specific and nonspecific effects. These results are also consistent with TCM theory for these 2 acupoint formulas, whereby the relaxing acupressure acupoints were selected to treat insomnia by providing more restorative sleep and improving fatigue and the stimulating acupressure acupoints were chosen to improve daytime activity levels by targeting alertness.

How could acupressure lead to improvements in fatigue? The etiology of persistent fatigue in cancer survivors is related to elevations in brain glutamate levels, as well as total creatine levels in the insula.15 Studies in acupuncture research have demonstrated that brain physiology,41 chemistry,42 and function43 can also be altered with acupoint stimulation. We posit that self-administered acupressure may have similar effects.

Among the fallacies of the authors’ explanation is the key assumption that they are dealing with a specific, active treatment effect rather than a nonspecific placebo intervention. Supposed differences between relaxing versus stimulating acupressure arise in trials with a high risk of bias due to unblinded providers of treatment and inadequate control/comparison conditions. ‘There is no there there’ to be explained, to paraphrase a quote attributed to Gertrude Stein.

How much did this project cost?

According to the NIH Research Portfolios Online Reporting Tools website, this five-year project involved support by the federal government of $2,265,212 in direct and indirect costs. The NCI program officer for investigator-initiated R01CA151445 is Ann O’Mara, who serves in a similar role for a number of integrative medicine projects.

How can expenditure of this money be justified for determining whether so-called stimulating acupressure is better than relaxing acupressure for cancer-related fatigue?

 Consider what could otherwise have been done with these monies.

Evidence-based versus science-based medicine

Proponents of unproven “integrative cancer treatments” can claim on the basis of this study that acupressure is an evidence-based treatment. Future Cochrane Collaboration Reviews may even cite this study as evidence for this conclusion.

I normally label myself as an evidence-based skeptic. I require evidence for claims of the efficacy of treatments and am skeptical of the quality of the evidence that is typically provided, especially when it comes from enthusiasts of particular treatments. However, in other contexts, I describe myself as a science-based medicine skeptic. The stricter criterion for this term is that not only do I require evidence of efficacy for treatments, I also require evidence for the scientific plausibility of the claimed mechanism. Acupressure might be defined by some as an evidence-based treatment, but it is certainly not a science-based treatment.

For further discussion of this important distinction, see Why “Science”-Based Instead of “Evidence”-Based?

Broader relevance to psychotherapy research

The efficacy of psychotherapy is often overestimated because of overreliance on RCTs that involve inadequate comparison/control groups. Adequately powered studies of the comparative efficacy of psychotherapy that include active comparison/control groups are infrequent and uniformly provide lower estimates of just how efficacious psychotherapy is. Most psychotherapy research includes subjective patient self-report measures as the primary outcomes, although some RCTs provide independent, blinded interview measures. A dependence on subjective patient self-report measures amplifies the bias associated with inadequate comparison/control groups.

I have raised these issues with respect to mindfulness-based stress reduction (MBSR) for physical health problems and for prevention of relapse and recurrence in patients being tapered from antidepressants.

However, there is a broader relevance to trials of psychotherapy provided to medically ill patients with a comparison/control condition that is inadequate in terms of positive expectations and support, along with a reliance on subjective patient self-report outcomes. The relevance is particularly important to note for conditions in which objective measures are appropriate, but not obtained, or obtained but suppressed in reports of the trial in the literature.

Study protocol violations, outcomes switching, adverse events misreporting: A peek under the hood

An extraordinary, must-read article is now available open access:

Jureidini, JN, Amsterdam, JD, McHenry, LB. The citalopram CIT-MD-18 pediatric depression trial: Deconstruction of medical ghostwriting, data mischaracterisation and academic malfeasance. International Journal of Risk & Safety in Medicine, vol. 28, no. 1, pp. 33-43, 2016

The authors had access to internal documents written with the belief that they would be left buried in corporate files. However, these documents became publicly available in a class-action product liability suit concerning the marketing of the antidepressant citalopram for treating children and adolescents.

Detailed evidence of ghostwriting by industry sponsors has considerable shock value. But there is a broader usefulness to this article: it allows us to peek in on the usually hidden processes by which null findings and substantial adverse events are spun into a positive report of the efficacy and safety of a treatment.

We are able to see behind the scenes how an already underspecified protocol was violated, primary and secondary outcomes were switched or dropped, and adverse events were suppressed in order to obtain the kind of results needed for a planned promotional effort and FDA approval for use of the drug in these populations.

We can see how subtle changes in analyses that would otherwise go unnoticed can have a profound impact on clinical and public policy.

In so many other situations, we are left only with our skepticism about results being too good to be true. We are usually unable to evaluate investigators’ claims independently because protocols are unavailable, deviations are not noted, and analyses are conducted and reported without transparency. Importantly, there usually is no access to the data that would be necessary for reanalysis.

The authors whose work is being criticized are among the most prestigious child psychiatrists in the world. The first author is currently President-elect of the American Academy of Child and Adolescent Psychiatry. The journal is among the top psychiatry journals in the world. A subscription is provided as part of membership in the American Psychiatric Association. Appearing in this journal is thus strategic because its readership includes many practitioners and clinicians who will simply defer to academics publishing in a journal they respect, without inclination to look carefully.

Indeed, I encourage readers to go to the original article and read it before proceeding further in the blog. Witness the unmasking of how null findings were turned positive. Unless you had been alerted, would you have detected that something was amiss?

Some readers have participated in multisite trials other than as a lead investigator. I ask them to imagine that they had received the manuscript for review and approval and assumed it was vetted by the senior investigators – and only the senior investigators. Would they have subjected it to the scrutiny needed to detect data manipulation?

I similarly ask reviewers for scientific journals if they would have detected something amiss. Would they have compared the manuscript to the study protocol? Note that when this article was published, they probably would have had to contact the authors or the pharmaceutical company to obtain the protocol.

Welcome to a rich treasure trove

Separate from the civil action that led to these documents and data being released, the federal government later filed criminal charges and False Claims Act allegations against Forest Laboratories. The pharmaceutical company pleaded guilty and accepted a $313 million fine.

Links to the filing and the announcement from the federal government of a settlement are available in a supplementary blog post at Quick Thoughts. That blog post also has rich links to the actual emails accessed by the authors, as well as blog posts by John M. Nardo, MD, that detail the difficulties these authors had publishing the paper we are discussing.

Aside from his popular blog, Dr. Nardo is one of the authors of a reanalysis that was published in The BMJ of a related trial:

Le Noury J, Nardo JM, Healy D, Jureidini J, Raven M, Tufanaru C, Abi-Jaoude E. Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence. BMJ 2015; 351: h4320

My supplementary blog post contains links to discussions of that reanalysis of data obtained from GlaxoSmithKline, the original publication based on these data, 30 Rapid Responses to the reanalysis in The BMJ, as well as federal criminal complaints and the guilty plea of GlaxoSmithKline.

With Dr. Nardo’s assistance, I’ve assembled a full set of materials that should be valuable in stimulating discussion among senior and junior investigators, as well as in student seminars. I agree with Dr. Nardo’s assessment:

I think it’s now our job to insure that all this dedicated work is rewarded with a wide readership, one that helps us move closer to putting this tawdry era behind us… – John Mickey Nardo

The citalopram CIT-MD-18 pediatric depression trial

The original article that we will be discussing is:

Wagner KD, Robb AS, Findling RL, Jin J, Gutierrez MM, Heydorn WE. A randomized, placebo-controlled trial of citalopram for the treatment of major depression in children and adolescents. American Journal of Psychiatry. 2004 Jun 1;161(6):1079-83.

It reports:

An 8-week, randomized, double-blind, placebo-controlled study compared the safety and efficacy of citalopram with placebo in the treatment of children (ages 7–11) and adolescents (ages 12–17) with major depressive disorder.

The results and conclusion:

Results: The overall mean citalopram dose was approximately 24 mg/day. Mean Children’s Depression Rating Scale—Revised scores decreased significantly more from baseline in the citalopram treatment group than in the placebo treatment group, beginning at week 1 and continuing at every observation point to the end of the study (effect size=2.9). The difference in response rate at week 8 between placebo (24%) and citalopram (36%) also was statistically significant. Citalopram treatment was well tolerated. Rates of discontinuation due to adverse events were comparable in the placebo and citalopram groups (5.9% versus 5.6%, respectively). Rhinitis, nausea, and abdominal pain were the only adverse events to occur with a frequency exceeding 10% in either treatment group.

Conclusions: In this population of children and adolescents, treatment with citalopram reduced depressive symptoms to a significantly greater extent than placebo treatment and was well tolerated.

The article ends with an elaboration of what is said in the abstract:

In conclusion, citalopram treatment significantly improved depressive symptoms compared with placebo within 1 week in this population of children and adolescents. No serious adverse events were reported, and the rate of discontinuation due to adverse events among the citalopram-treated patients was comparable to that of placebo. These findings further support the use of citalopram in children and adolescents suffering from major depression.

The study protocol

The protocol for CIT-MD-18, IND Number 22,368 was obtained from Forest Laboratories. It was dated September 1, 1999 and amended April 8, 2002.

The primary outcome measure was the change from baseline to week 8 on the Children’s Depression Rating Scale-Revised (CDRS-R) total score.

Comparison between citalopram and placebo will be performed using three-way analysis of covariance (ANCOVA) with age group, treatment group and center as the three factors, and the baseline CDRS-R score as covariate.

The secondary outcome measures were the Clinical Global Impression severity and improvement subscales, Kiddie Schedule for Affective Disorders and Schizophrenia – depression module, and Children’s Global Assessment Scale.

Comparison between citalopram and placebo will be performed using the same approach as for the primary efficacy parameter. Two-way ANOVA will be used for CGI-I, since improvement relative to Baseline is inherent in the score.
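
For readers unfamiliar with this kind of specification, here is a minimal sketch of what the protocol-specified primary analysis would look like. This is my own illustration, not the authors’ code; the variable names and the simulated data are hypothetical.

```python
# A minimal sketch (not the authors' code) of the protocol-specified primary
# analysis: change in CDRS-R modelled with treatment, age group, and center as
# factors and baseline CDRS-R as covariate. All names and data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 160  # the protocol-planned sample: 80 per arm
df = pd.DataFrame({
    "treatment": rng.choice(["citalopram", "placebo"], n),
    "age_group": rng.choice(["child", "adolescent"], n),
    "center": rng.choice([f"site{i}" for i in range(1, 6)], n),
    "cdrsr_baseline": rng.normal(60, 10, n),
})
df["cdrsr_change"] = rng.normal(-20, 12, n)  # simulated outcome, no true effect

ancova = smf.ols(
    "cdrsr_change ~ C(treatment) + C(age_group) + C(center) + cdrsr_baseline",
    data=df,
).fit()
print(ancova.summary())
```

The deconstruction discussed below turns on how departures from exactly this kind of prespecified model changed the apparent results.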

There was no formal power analysis beyond the following:

The primary efficacy variable is the change from baseline in CDRS-R score at Week 8.

Assuming an effect size (treatment group difference relative to pooled standard deviation) of 0.5, a sample size of 80 patients in each treatment group will provide at least 85% power at an alpha level of 0.05 (two-sided).
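
As a rough sanity check – my own, not part of the protocol or the deconstruction – the stated assumptions can be plugged into a standard two-sample power calculation, and they are roughly consistent with the claim:

```python
# A minimal check of the protocol's power claim: effect size d = 0.5,
# 80 patients per arm, two-sided alpha = 0.05.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().solve_power(effect_size=0.5, nobs1=80, ratio=1.0,
                                     alpha=0.05, alternative='two-sided')
print(f"Approximate power: {power:.2f}")  # about 0.88, i.e. "at least 85%"
```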

The deconstruction

 Selective reporting of subtle departures from the protocol could easily have been missed or simply excused as accidental and inconsequential, except that there was unrestricted access to communication within Forest Laboratories and to the data for reanalysis.

3.2 Data

The fact that Forest controlled the CIT-MD-18 manuscript production allowed for selection of efficacy results to create a favourable impression. The published Wagner et al. article concluded that citalopram produced a significantly greater reduction in depressive symptoms than placebo in this population of children and adolescents [10]. This conclusion was supported by claims that citalopram reduced the mean CDRS-R scores significantly more than placebo beginning at week 1 and at every week thereafter (effect size = 2.9); and that response rates at week 8 were significantly greater for citalopram (36%) versus placebo (24%). It was also claimed that there were comparable rates of tolerability and treatment discontinuation for adverse events (citalopram = 5.6%; placebo = 5.9%). Our analysis of these data and documents has led us to conclude that these claims were based on a combination of: misleading analysis of the primary outcome and implausible calculation of effect size; introduction of post hoc measures and failure to report negative secondary outcomes; and misleading analysis and reporting of adverse events.

3.2.1 Mischaracterisation of primary outcome

Contrary to the protocol, Forest’s final study report synopsis increased the study sample size by adding eight of nine subjects who, per protocol, should have been excluded because they were inadvertently dispensed unblinded study drug due to a packaging error [23]. The protocol stipulated: “Any patient for whom the blind has been broken will immediately be discontinued from the study and no further efficacy evaluations will be performed” [10]. Appendix Table 6 of the CIT-MD-18 Study Report [24] showed that Forest had performed a primary outcome calculation excluding these subjects (see our Fig. 2). This per protocol exclusion resulted in a ‘negative’ primary efficacy outcome.

Ultimately however, eight of the excluded subjects were added back into the analysis, turning the (albeit marginally) statistically insignificant outcome (p < 0.052) into a statistically significant outcome (p < 0.038). Despite this change, there was still no clinically meaningful difference in symptom reduction between citalopram and placebo on the mean CDRS-R scores (Fig. 3).

The unblinding error was not reported in the published article.

Forest also failed to follow their protocol stipulated plan for analysis of age-by-treatment interaction. The primary outcome variable was the change in total CDRS-R score at week 8 for the entire citalopram versus placebo group, using a 3-way ANCOVA test of efficacy [24]. Although a significant efficacy value favouring citalopram was produced after including the unblinded subjects in the ANCOVA, this analysis resulted in an age-by-treatment interaction with no significant efficacy demonstrated in children. This important efficacy information was withheld from public scrutiny and was not presented in the published article. Nor did the published article report the power analysis used to determine the sample size, and no adequate description of this analysis was available in either the study protocol or the study report. Moreover, no indication was made in these study documents as to whether Forest originally intended to examine citalopram efficacy in children and adolescent subgroups separately or whether the study was powered to show citalopram efficacy in these subgroups. If so, then it would appear that Forest could not make a claim for efficacy in children (and possibly not even in adolescents). However, if Forest powered the study to make a claim for efficacy in the combined child plus adolescent group, this may have been invalidated as a result of the ANCOVA age-by-treatment interaction and would have shown that citalopram was not effective in children.

A further exaggeration of the effect of citalopram was to report “effect size on the primary outcome measure” of 2.9, which was extraordinary and not consistent with the primary data. This claim was questioned by Martin et al. who criticized the article for miscalculating effect size or using an unconventional calculation, which clouded “communication among investigators and across measures” [25]. The origin of the effect size calculation remained unclear even after Wagner et al. publicly acknowledged an error and stated that “With Cohen’s method, the effect size was 0.32,” [20] which is more typical of antidepressant trials. Moreover, we note that there was no reference to the calculation of effect size in the study protocol.
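
To see why 2.9 is an implausible figure for a between-group effect size, it helps to look at the conventional Cohen’s d calculation. The sketch below is generic, and the example numbers are hypothetical – chosen only to show the order of magnitude typical of antidepressant trials – not the CIT-MD-18 data:

```python
# Generic Cohen's d from summary statistics, using the pooled standard
# deviation. The example values are hypothetical, not the CIT-MD-18 data.
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference between two groups (pooled-SD version)."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# hypothetical: a 4-point CDRS-R advantage against a pooled SD of about 12.5
print(round(cohens_d(-22.0, 12.0, 89, -18.0, 13.0, 85), 2))  # about -0.32
```

An effect size of 2.9 would require the between-group difference to be nearly three times the pooled standard deviation – a magnitude essentially unheard of in antidepressant trials.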

3.2.2 Failure to publish negative secondary outcomes, and undeclared inclusion of Post Hoc Outcomes

Wagner et al. failed to publish two of the protocol-specified secondary outcomes, both of which were unfavourable to citalopram. While CGI-S and CGI-I were correctly reported in the published article as negative [10], (see p1081), the Kiddie Schedule for Affective Disorders and Schizophrenia-Present (depression module) and the Children’s Global Assessment Scale (CGAS) were not reported in either the methods or results sections of the published article.

In our view, the omission of secondary outcomes was no accident. On October 15, 2001, Ms. Prescott wrote: “Ive heard through the grapevine that not all the data look as great as the primary outcome data. For these reasons (speed and greater control) I think it makes sense to prepare a draft in-house that can then be provided to Karen Wagner (or whomever) for review and comments” (see Fig. 1). Subsequently, Forest’s Dr. Heydorn wrote on April 17, 2002: “The publications committee discussed target journals, and recommended that the paper be submitted to the American Journal of Psychiatry as a Brief Report. The rationale for this was the following: … As a Brief Report, we feel we can avoid mentioning the lack of statistically significant positive effects at week 8 or study termination for secondary endpoints” [13].

Instead the writers presented post hoc statistically positive results that were not part of the original study protocol or its amendment (visit-by-visit comparison of CDRS-R scores, and ‘Response’, defined as a score of ≤28 on the CDRS-R) as though they were protocol-specified outcomes. For example, ‘Response’ was reported in the results section of the Wagner et al. article between the primary and secondary outcomes, likely predisposing a reader to regard it as more important than the selected secondary measures reported, or even to mistake it for a primary measure.

It is difficult to reconcile what the authors of the original article reported in terms of adverse events and what our “deconstructionists” found in the unpublished final study report. The deconstruction article also notes that a letter to the editor appearing at the time of publication of the original paper called attention to another citalopram study that remained unpublished, but that was known to be a null study with substantial adverse events.

3.2.3 Mischaracterisation of adverse events

Although Wagner et al. correctly reported that “the rate of discontinuation due to adverse events among citalopram-treated patients was comparable to that of placebo”, the authors failed to mention that the five citalopram-treated subjects discontinuing treatment did so due to one case of hypomania, two of agitation, and one of akathisia. None of these potentially dangerous states of over-arousal occurred with placebo [23]. Furthermore, anxiety occurred in one citalopram patient (and none on placebo) of sufficient severity to temporarily stop the drug and irritability occurred in three citalopram (compared to one placebo). Taken together, these adverse events raise concerns about dangers from the activating effects of citalopram that should have been reported and discussed. Instead Wagner et al. reported “adverse events associated with behavioral activation (such as insomnia or agitation) were not prevalent in this trial” [10] and claimed that “there were no reports of mania”, without acknowledging the case of hypomania [10].

Furthermore, examination of the final study report revealed that there were many more gastrointestinal adverse events for citalopram than placebo patients. However, Wagner et al. grouped the adverse event data in a way that in effect masked this possibly clinically significant gastrointestinal intolerance. Finally, the published article also failed to report that one patient on citalopram developed abnormal liver function tests [24].

In a letter to the editor of the American Journal of Psychiatry, Mathews et al. also criticized the manner in which Wagner et al. dealt with adverse outcomes in the CIT-MD-18 data, stating that: “given the recent concerns about the risk of suicidal thoughts and behaviors in children treated with SSRIs, this study could have attempted to shed additional light on the subject” [26] Wagner et al. responded: “At the time the [CIT-MD-18] manuscript was developed, reviewed, and revised, it was not considered necessary to comment further on this topic” [20]. However, concerns about suicidal risk were prevalent before the Wagner et al. article was written and published [27]. In fact, undisclosed in both the published article and Wagner’s letter-to-the-editor, the 2001 negative Lundbeck study had raised concern over heightened suicide risk [10, 20, 21].

A later blog post will discuss the letters to the editor that appeared shortly after the original study in American Journal of Psychiatry. But for now, it would be useful to clarify the status of the negative Lundbeck study at that time.

The letter by Barbe published in AJP remarked:

It is somewhat surprising that the authors do not compare their results with those of another trial, involving 244 adolescents (13–18-year-olds), that showed no evidence of efficacy of citalopram compared to placebo and a higher level of self-harm (16 [12.9%] of 124 versus nine [7.5%] of 120) in the citalopram group compared to the placebo group (5). Although these data were not available to the public until December 2003, one would expect that the authors, some of whom are employed by the company that produces citalopram in the United States and financed the study, had access to this information.

The study authors replied:

It may be considered premature to compare the results of this trial with unpublished data from the results of a study that has not undergone the peer-review process. Once the investigators involved in the European citalopram adolescent depression study publish the results in a peer-reviewed journal, it will be possible to compare their study population, methods, and results with our study with appropriate scientific rigor.

Conflict of interest

The authors of the deconstruction study indicate they do not have any conventional industry or speaker’s bureau support to declare, but they have had relevant involvement in litigation. Their disclosure includes:

The authors are not members of any industry-sponsored advisory board or speaker’s bureau, and have no financial interest in any pharmaceutical or medical device company.

Drs. Amsterdam and Jureidini were engaged by Baum, Hedlund, Aristei & Goldman as experts in the Celexa and Lexapro Marketing and Sales Practices Litigation. Dr. McHenry was also engaged as a research consultant in the case. Dr. McHenry is a research consultant for Baum, Hedlund, Aristei & Goldman.

Concluding remarks

I don’t have many illusions about the trustworthiness of the literature reporting clinical trials, whether pharmaceutical or psychotherapy. But I found this deconstruction article quite troubling. Among the authors’ closing observations are:

The research literature on the effectiveness and safety of antidepressants for children and adolescents is relatively small, and therefore vulnerable to distortion by just one or two badly conducted and/or reported studies. Prescribing rates are high and increasing, so that prescribers who are misinformed by misleading publications risk doing real harm to many children, and wasting valuable health resources.

I recommend that readers go to my supplementary blog post and review a very similar case concerning the efficacy and harms of paroxetine and imipramine in the treatment of major depression in adolescence. I also recommend another of my blog posts that summarizes action taken by the US government against both Forest Laboratories and GlaxoSmithKline for promotion of misleading claims about the efficacy and safety of antidepressants for children and adolescents.

We should scrutinize studies of the efficacy and safety of antidepressants for children and adolescents, because of the weakness of data from relatively small studies with serious difficulties in their methodology and reporting. But we should certainly not stop there. We should critically examine other studies of psychotherapy and psychosocial interventions.

I previously documented [1, 2] interference by promoters of the lucrative Triple P Parenting program in the implementation of a supposedly independent evaluation of it, including tampering with plans for data analysis. The promoters then followed up by attempting to block publication of a meta-analysis casting doubt on their claims.

But suppose we are not dealing with the threat of conflict of interest associated with high financial stakes, as with pharmaceutical companies or a globally promoted psychosocial program. There are still the less obvious conflicts associated with investigator egos and the pressures to produce positive results in order to get re-funded. We should require scrutiny of protocols, of whether they were faithfully implemented, and of whether the resulting data were analyzed according to a priori plans. To do that, we need unrestricted access to data and the opportunity to reanalyze it from multiple perspectives.

Results of clinical trials should be examined wherever possible in replications and extensions in new settings. But this frequently requires resources that are unlikely to be available.

We are unlikely ever to see anything for clinical trials resembling replication initiatives such as the Open Science Collaboration’s Reproducibility Project: Psychology. The OSC depends on mass replications involving either samples of college students or recruitment from the Internet. Most of the studies involved in the OSC did not have direct clinical or public health implications. In contrast, clinical trials usually do, and they require different approaches to ensure the trustworthiness of the findings that are claimed.

Access to the internal documents of Forest Laboratories revealed a deliberate, concerted effort to produce results consistent with the agenda of vested interests, even where prespecified analyses yielded contradictory findings. There was clear intent. But we don’t need to assume an attempt to deceive and defraud in order to insist on the opportunity to re-examine findings that affect patients and public health. As US Vice President Joseph Biden recently declared, securing advances in biomedicine and public health depends on broad and routine sharing and re-analysis of data.

My usual disclaimer: All views that I express are my own and do not necessarily reflect those of PLOS or other institutional affiliations.

Remission of suicidal ideation by magnetic seizure therapy? Neuro-nonsense in JAMA: Psychiatry

A recent article in JAMA: Psychiatry:

Sun Y, Farzan F, Mulsant BH, Rajji TK, Fitzgerald PB, Barr MS, Downar J, Wong W, Blumberger DM, Daskalakis ZJ. Indicators for remission of suicidal ideation following magnetic seizure therapy in patients with treatment-resistant depression. JAMA Psychiatry. 2016 Mar 16.

was accompanied by an editorial commentary:

Camprodon JA, Pascual-Leone A. Multimodal Applications of Transcranial Magnetic Stimulation for Circuit-Based Psychiatry. JAMA: Psychiatry. 2016 Mar 16.

Together, the article and commentary can be studied as:

  • An effort by the authors and the journal itself to promote prematurely a treatment for reducing suicide.
  • A payback to sources of financial support for the authors. Both groups have industry ties that provide them with consulting fees, equipment, grants, and other unspecified rewards. One author has a patent that should increase in value as a result of this article and commentary.
  • A bid for successful applications to new grant initiatives with a pledge of allegiance to the NIMH Research Domain Criteria (RDoC).

After considering just how bad the science and reporting are:

We have sufficient reason to ask: How did this promotional campaign come about? Why was this article accepted by JAMA: Psychiatry? Why was it deemed worthy of comment?

I think a skeptical look at this article would lead to a warning label:

Warning: Results reported in this article are neither robust nor trustworthy, but considerable effort has gone into promoting them as innovative and even breakthrough. Skepticism warranted.

As we will see, the article is seriously flawed as a contribution to neuroscience, identification of biomarkers, treatment development, and suicidology, but we can nonetheless learn a lot from it in terms of how to detect such flaws when they are more subtle. If nothing else, your skepticism will be raised about articles accompanied by commentaries in prestigious journals and you will learn tools for probing such pairs of articles.

 

This article involves intimidating technical details and awe-inspiring figures.

[Images: Figure 1, panels 1 and 2, from the article]

Yet, as in some past blog posts concerning neuroscience and the NIMH RDoC, we will gloss over some technical details that would be readily interpreted by experts. I would welcome comments and critiques from experts.

I nonetheless expect readers to agree when they have finished this blog post that I have demonstrated that you don’t have to be an expert to detect neurononsense and crass publishing of articles that fit vested interests.

The larger trial from which these patients were drawn is registered as:

ClinicalTrials.gov. Magnetic Seizure Therapy (MST) for Treatment Resistant Depression, Schizophrenia, and Obsessive Compulsive Disorder. NCT01596608.

Because this article is strikingly lacking in crucial details or details in places where we would expect to find them, it will be useful at times to refer to the trial registration.

The title and abstract of the article

As we will soon see, the title, Indicators for remission of suicidal ideation following MST in patients with treatment-resistant depression, is misleading. The article has too small a sample and too inappropriate a design to establish anything as a reproducible “indicator.”

That the article is going to fail to deliver is already apparent in the abstract.

The abstract states:

 Objective  To identify a biomarker that may serve as an indicator of remission of suicidal ideation following a course of MST by using cortical inhibition measures from interleaved transcranial magnetic stimulation and electroencephalography (TMS-EEG).

Design, Setting, and Participants  Thirty-three patients with TRD were part of an open-label clinical trial of MST treatment. Data from 27 patients (82%) were available for analysis in this study. Baseline TMS-EEG measures were assessed within 1 week before the initiation of MST treatment using the TMS-EEG measures of cortical inhibition (ie, N100 and long-interval cortical inhibition [LICI]) from the left dorsolateral prefrontal cortex and the left motor cortex, with the latter acting as a control site.

Interventions The MST treatments were administered under general anesthesia, and a stimulator coil consisting of 2 individual cone-shaped coils was used.

Main Outcomes and Measures Suicidal ideation was evaluated before initiation and after completion of MST using the Scale for Suicide Ideation (SSI). Measures of cortical inhibition (ie, N100 and LICI) from the left dorsolateral prefrontal cortex were selected. N100 was quantified as the amplitude of the negative peak around 100 milliseconds in the TMS-evoked potential (TEP) after a single TMS pulse. LICI was quantified as the amount of suppression in the double-pulse TEP relative to the single-pulse TEP.

Results  Of the 27 patients included in the analyses, 15 (56%) were women; mean (SD) age of the sample was 46.0 (15.3) years. At baseline, patients had a mean SSI score of 9.0 (6.8), with 8 of 27 patients (30%) having a score of 0. After completion of MST, patients had a mean SSI score of 4.2 (6.3) (pre-post treatment mean difference, 4.8 [6.7]; paired t26 = 3.72; P = .001), and 18 of 27 individuals (67%) had a score of 0 for a remission rate of 53%. The N100 and LICI in the frontal cortex—but not in the motor cortex—were indicators of remission of suicidal ideation with 89% accuracy, 90% sensitivity, and 89% specificity (area under the curve, 0.90; P = .003).

Conclusions and Relevance  These results suggest that cortical inhibition may be used to identify patients with TRD who are most likely to experience remission of suicidal ideation following a course of MST. Stronger inhibitory neurotransmission at baseline may reflect the integrity of transsynaptic networks that are targeted by MST for optimal therapeutic response.

Even viewing the abstract alone, we can see this article is in trouble. It claims to identify a biomarker following a course of magnetic seizure therapy (MST). That is an extraordinary claim when the study started with only 33 patients, of whom only 27 remained for analysis. Furthermore, at the initial assessment of suicidal ideation, eight of the 27 patients did not have any and so could show no benefit of treatment.

Any results could be substantially changed by any of the six excluded patients being recovered for analysis or any of the 27 included patients being dropped from analyses as an outlier. Statistical adjustment for potential confounds will produce spurious results because of overfitted equations, even with a single confound. We also know well that in situations requiring control of possible confounding factors, control of only one is rarely sufficient and often produces worse results than leaving variables unadjusted.

Identification of any biomarkers is unlikely to be reproducible in larger, more representative samples. Any claims of performance characteristics of the biomarkers (accuracy, sensitivity, specificity, area under the curve) are likely to capitalize on sampling and chance in ways that are unlikely to be reproducible.
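
To make the sampling problem concrete: the reported 90% sensitivity rests on a handful of remitters. The counts below are my own assumption inferred from the reported percentages (roughly 9 of 10 remitters correctly classified), not figures taken from the article, but they show how wide the uncertainty around such an estimate is:

```python
# Illustrative only: a 95% confidence interval for a sensitivity of ~90%
# estimated from roughly 10 remitters (hypothetical counts inferred from the
# reported percentages, not taken from the article).
from statsmodels.stats.proportion import proportion_confint

correct, total = 9, 10
low, high = proportion_confint(correct, total, alpha=0.05, method="wilson")
print(f"95% CI for sensitivity: {low:.2f} to {high:.2f}")  # roughly 0.60 to 0.98
```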

Nonetheless, the accompanying figures are dazzling, even if not readily interpretable or representative of what would be found in another sample.

Comparison of the article to the trial registration

According to the trial registration, the study started in February 2012 and the registration was received in May 2012. There were unspecified changes as recently as this month (March 2016), and final collection of primary outcome data is expected in December 2016.

Primary outcome

The registration indicates that patients will have been diagnosed with severe major depression, schizophrenia or obsessive compulsive disorder. The primary outcome will depend on diagnosis. For depression it is the Hamilton Rating Scale for Depression.

There is no mention of suicidal ideation as either a primary or secondary outcome.

Secondary outcomes

According to the registration, outcomes include (1) cognitive functioning as measured by episodic memory and non-memory cognitive functions; (2) changes in neuroimaging measures of brain structure and activity derived from fMRI and MRI from baseline to 24th treatment or 12 weeks, whichever comes sooner.

Comparison to the article suggests that some important neuroimaging assessments proposed in the registration were compromised: (1) only baseline measures were obtained, and without MRI or fMRI; and (2) the article states:

Although magnetic resonance imaging (MRI)–guided TMS-EEG is more accurate than non–MRI-guided methods, the added step of obtaining an MRI for every participant would have significantly slowed recruitment for this study owing to the pressing need to begin treatment in acutely ill patients, many of whom were experiencing suicidal ideation. As such, we proceeded with non–MRI-guided TMS-EEG using EEG-guided methods according to a previously published study.

Treatment

The article provides some details of the magnetic seizure treatment:

The MST treatments were administered under general anesthesia using a stimulator machine (MagPro MST; MagVenture) with a twin coil. Methohexital sodium (n = 14), methohexital with remifentanil hydrochloride (n = 18), and ketamine hydrochloride (n = 1) were used as the anesthetic agents. Succinylcholine chloride was used as the neuromuscular blocker. Patients had a mean (SD) seizure duration of 45.1 (21.4) seconds. The twin coil consists of 2 individual cone-shaped coils. Stimulation was delivered over the frontal cortex at the midline position directly over the electrode Fz according to the international 10-20 system.36 Placing the twin coil symmetrically over electrode Fz results in the centers of the 2 coils being over F3 and F4. Based on finite element modeling, this configuration produces a maximum induced electric field between the 2 coils, which is over electrode Fz in this case.37 Patients were treated for 24 sessions or until remission of depressive symptoms based on the 24-item Hamilton Rating Scale for Depression (HRSD) (defined as an HRSD-24 score ≤10 and 60% reduction in symptoms for at least 2 days after the last treatment).38 These remission criteria were standardized from previous ECT depression trials.39,40 Further details of the treatment protocol are available,30 and comprehensive clinical and neurophysiologic trial results will be reported separately.

The article intended to refer the reader to the trial registration for further description of the treatment, but the superscript citation in the article is inaccurate. Regardless, given other deviations from the registration, readers can’t tell whether there were any deviations from what was proposed. In the registration, seizure therapy was described as involving:

100% machine output at between 25 and 100 Hz, with coil directed over frontal brain regions, until adequate seizure achieved. Six treatment sessions, at a frequency of two or three times per week will be administered. If subjects fail to achieve the pre-defined criteria of remission at that point, the dose will be increased to the maximal stimulator output and 3 additional treatment sessions will be provided. This will be repeated a total of 5 times (i.e., maximum treatment number is 24). 24 treatments is typically longer than a conventional ECT treatment course.

One important implication concerns this treatment being proposed as resolving suicidal ideation: it takes place over a considerable period of time. Patients who die by suicide notoriously break contact before doing so. It would seem that a required 24 treatments delivered on an outpatient basis would provide ample opportunities for breaks in contact – including from demoralization because so many treatments are needed in some cases – and therefore for death by suicide.

But a protocol that involves continuing treatment until a prespecified reduction in the Hamilton Depression Rating Scale is achieved assures that there will be a drop in suicidal ideation. The interview-based Hamilton depression rating scales and suicidal ideation are highly correlated.
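
A small simulation makes the point. Everything below is my own hypothetical illustration – the means, SDs, and correlation are invented – but it shows that if treatment continues until a Hamilton remission criterion is met, and suicidal ideation tracks the Hamilton score, the “remitters” will show lower suicidal ideation by construction:

```python
# Hypothetical illustration: when SSI is strongly correlated with HRSD,
# selecting patients who meet an HRSD remission criterion guarantees lower
# SSI in that group, without any specific anti-suicidal effect of treatment.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
# invented post-treatment scores: HRSD mean 18 (SD 8), SSI mean 8 (SD 5), r = 0.7
cov = [[8**2, 0.7 * 8 * 5],
       [0.7 * 8 * 5, 5**2]]
hrsd, ssi = rng.multivariate_normal([18, 8], cov, size=n).T
ssi = np.clip(ssi, 0, None)  # SSI scores cannot be negative

remitted = hrsd <= 10  # analogue of the HRSD-24 remission criterion
print(f"Mean SSI among HRSD remitters:     {ssi[remitted].mean():.1f}")
print(f"Mean SSI among HRSD non-remitters: {ssi[~remitted].mean():.1f}")
```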

There is no randomization or even adequate description of patient accrual in terms of the population from which the patients came. There is no control group and therefore no control for nonspecific factors. In terms of nonspecific effects, patients are being subjected to an elaborate, intrusive ritual, starting with electroencephalographic (EEG) assessment [http://www.mayoclinic.org/tests-procedures/eeg/basics/definition/prc-20014093].

The ritual will undoubtedly have strong nonspecific factors associated with it – instilling positive expectations and providing considerable personal attention.

The article’s discussion of results

The discussion opens with some strong claims, unjustified by the modesty of the study and the likelihood that its specific results are not reproducible:

We found that TMS-EEG measures of cortical inhibition (ie, the N100 and LICI) in the frontal cortex, but not in the motor cortex, were strongly correlated with changes in suicidal ideation in patients with TRD who were treated with MST. These findings suggest that patients who benefitted the most from MST demonstrated the greatest cortical inhibition at baseline. More important, when patients were divided into remitters and nonremitters based on their SSI score, our results show that these measures can indicate remission of suicidal ideation from a course of MST with 90% sensitivity and 89% specificity.

The discussion contains a Pledge of Allegiance to the research domain criteria approach that is not actually a reflection of the results at hand. Among the many things that we knew before the study was done, and that were not shown by the study, is that suicidal ideation is so hopelessly linked to hopelessness, negative affect, and attentional biases that in such a situation it is best seen as a surrogate measure of depression, rather than a marker of risk for suicidal acts or death by suicide.

 

 

Wave that RDoC flag and maybe you will attract money from NIMH.

Our results also support the research domain criteria approach, that is, that suicidal ideation represents a homogeneous symptom construct in TRD that is targeted by MST. Suicidal ideation has been shown to be linked to hopelessness, negative affect, and attentional biases. These maladaptive behaviors all fall under the domain of negative valence systems and are associated with the specific constructs of loss, sustained threat, and frustrative nonreward. Suicidal ideation may represent a better phenotype through which to understand the neurobiologic features of mental illnesses. In this case, variations in GABAergic-mediated inhibition before MST treatment explained much of the variance for improvements in suicidal ideation across individuals with TRD.

Debunking ‘a better phenotype through which to understand the neurobiologic features of mental illnesses.’

  • Suicide is not a disorder or a symptom, but an infrequent, difficult to predict and complex act that varies greatly in nature and circumstances.
  • While some features of the brain or brain functioning may be correlated with eventual death by suicide, most of the identifications they provide of persons at risk of eventually dying by suicide will be false positives (see the sketch after this list).
  • In the United States, access to a firearm is a reliable proximal cause of suicide and is likely to be more so than anything in the brain. However, this basic observation is not consistent with American politics and can lead to grant applications not being funded.
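
To make the base-rate problem concrete, here is a minimal sketch. The 90% sensitivity and 89% specificity are the figures the authors report for predicting remission of suicidal ideation; applying them to a rare outcome like death by suicide, with an assumed base rate of 1 in 1,000, is purely my illustration.

```python
# Rough illustration of the base-rate problem (assumed base rate, not study data).
sensitivity, specificity = 0.90, 0.89   # figures the authors report for predicting remission
base_rate = 0.001                       # assumption: 1 in 1,000 of those screened die by suicide

true_pos = sensitivity * base_rate
false_pos = (1 - specificity) * (1 - base_rate)
ppv = true_pos / (true_pos + false_pos)
print(f"positive predictive value: {ppv:.1%}")   # under 1% -- nearly all flags are false alarms
```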

In an important sense,

  • It’s not what’s going on in the brain, but what’s going on in the interpersonal context of the brain, that matters in terms of modifiable risk for death by suicide.

The editorial commentary

On the JAMA: Psychiatry website, both the article and the editorial commentary contain sidebar links to each other. It is only in the last two paragraphs of a 14-paragraph commentary that the target article is mentioned. However, the commentary ends with a resounding celebration of the innovation this article represents [emphasis added]:

Sun and colleagues10 report that 2 different EEG measures of cortical inhibition (a negative evoked potential in the EEG that happens approximately 100 milliseconds after a stimulus or event of interest and long-interval cortical inhibition) evoked by TMS to the left dorsolateral prefrontal cortex, but not to the left motor cortex, predicted remission of suicidal ideation with great sensitivity and specificity. This study10 illustrates the potential of multimodal TMS to study physiological properties of relevant circuits in neuropsychiatric populations. Significantly, it also highlights the anatomical specificity of these measures because the predictive value was exclusive to the inhibitory properties of prefrontal circuits but not motor systems.

Multimodal TMS applications allow us to study the physiology of human brain circuitry noninvasively and with causal resolution, expanding previous motor applications to cognitive, behavioral, and affective systems. These innovations can significantly affect psychiatry at multiple levels, by studying disease-relevant circuits to further develop systems for neuroscience models of disease and by developing tools that could be integrated into clinical practice, as they are in clinical neurophysiology clinics, to inform decision making, the differential diagnosis, or treatment planning.

Disclosures of conflicts of interest

The article’s disclosure of conflicts of interest statement is longer than the abstract.

conflict of interest disclosure

The disclosure for the conflicts of interest for the editorial commentary is much shorter but nonetheless impressive:

editorial commentary disclosures

How did this article get into JAMA: Psychiatry with an editorial comment?

Editorial commentaries are often provided by reviewers who simply check the box on the reviewers’ form indicating their willingness to provide a comment. For reviewers who already have a conflict of interest, this provides an additional one: a non-peer-reviewed paper in which they can promote their interests.

Alternatively, commentators are simply picked by an editor who judges an article worthy of special recognition. It’s noteworthy that at least one of the associate editors of JAMA: Psychiatry is actively campaigning for a particular direction to suicide research funded by NIMH, as seen in an editorial comment of his own that I recently discussed. One of the authors of the paper currently under discussion was until recently a senior member of this associate editor’s department, before departing to become Chair of the Department of Psychiatry at the University of Toronto.

Essentially, the authors of the paper and the authors of the commentary are providing carefully constructed advertisements for themselves and their agenda. The opportunity for them to do so arises because of consistency with the agenda of at least one of the editors, if not the journal itself.

The Committee on Publication Ethics (COPE) requires that non-peer-reviewed material in ostensibly peer-reviewed journals be labeled as such. This requirement is seldom met.

The journal further promoted this article by providing 10 free continuing medical education credits for reading it.

I could go on much longer identifying other flaws in this paper and its editorial commentary. I could raise other objections to the article being published in JAMA: Psychiatry. But out of mercy for the authors, the editor, and my readers, I’ll stop here.

I would welcome comments about other flaws.

Special thanks to Bernard “Barney” Carroll for his helpful comments and encouragement, but all opinions expressed and all factual errors are my own responsibility.

Is risk of Alzheimer’s Disease reduced by taking a more positive attitude toward aging?

Unwarranted claims that “modifiable” negative beliefs cause Alzheimer’s disease lead to blaming persons who develop Alzheimer’s disease for not having been more positive.

Lesson: A source’s impressive credentials are no substitute for independent critical appraisal of what sounds like junk science and is.

More lessons on how to protect yourself from dodgy claims in press releases of prestigious universities promoting their research.

If you judge the credibility of health-related information based on the credentials of the source, this article  is a clear winner:

Levy BR, Ferrucci L, Zonderman AB, Slade MD, Troncoso J, Resnick SM. A Culture–Brain Link: Negative Age Stereotypes Predict Alzheimer’s Disease Biomarkers. Psychology and Aging. Dec 7, 2015, No Pagination Specified. http://dx.doi.org/10.1037/pag0000062

alzheimers
From INI

As noted in the press release from Yale University, two of the authors are from Yale School of Medicine, another is a neurologist at Johns Hopkins School of Medicine, and the remaining three authors are from the US National Institute on Aging (NIA), including NIA’s Scientific Director.

The press release Negative beliefs about aging predict Alzheimer’s disease in Yale-led study declared:

“Newly published research led by the Yale School of Public Health demonstrates that individuals who hold negative beliefs about aging are more likely to have brain changes associated with Alzheimer’s disease.

“The study suggests that combatting negative beliefs about aging, such as elderly people are decrepit, could potentially offer a way to reduce the rapidly rising rate of Alzheimer’s disease, a devastating neurodegenerative disorder that causes dementia in more than 5 million Americans.

The press release posited a novel mechanism:

“We believe it is the stress generated by the negative beliefs about aging that individuals sometimes internalize from society that can result in pathological brain changes,” said Levy. “Although the findings are concerning, it is encouraging to realize that these negative beliefs about aging can be mitigated and positive beliefs about aging can be reinforced, so that the adverse impact is not inevitable.”

A Google search reveals over 40 stories about the study in the media. Provocative titles of the media coverage suggest a children’s game of telephone or Chinese whispers in which distortions accumulate with each retelling.

Negative beliefs about aging tied to Alzheimer’s (Waltonian)

Distain for the elderly could increase your risk of Alzheimer’s (FinancialSpots)

Lack of respect for elderly may be fueling Alzheimer’s epidemic (Telegraph)

Negative thoughts speed up onset of Alzheimer’s disease (Tech Times)

Karma bites back: Hating on the elderly may put you at risk of Alzheimer’s (LA Times)

How you feel about your grandfather may affect your brain health later in life (Men’s Health News)

Young people pessimistic about aging more likely to develop Alzheimer’s later on (Health.com)

Looking forward to old age can save you from Alzheimer’s (Canonplace News)

If you don’t like old people, you are at higher risk of Alzheimer’s, study says (RedOrbit)

If you think elderly people are icky, you’re more likely to get Alzheimer’s (HealthLine)

In defense of the authors of this article as well as journalists, it is likely that editors added the provocative titles without obtaining approval of the authors or even the journalists writing the articles. So, let’s suspend judgment and write off sometimes absurd titles to editors’ need to establish they are offering distinctive coverage, when they are not necessarily doing so. That’s a lesson for the future: if we’re going to criticize media coverage, better focus on the content of the coverage, not the titles.

However, a number of these stories have direct quotes from the study’s first author. Unless the media coverage is misattributing direct quotes to her, she must have been making herself available to the media.

Was the article such an important breakthrough offering new ways in which consumers could take control of their risk of Alzheimer’s by changing beliefs about aging?

No, not at all. In the following analysis, I’ll show that judging the credibility of claims based on the credentials of the sources can be seriously misleading.

What is troubling about this article and its well-organized publicity effort is that information is being disseminated that is misleading and potentially harmful, with the prestige of Yale and NIA attached.

Before we go any further, you can take your own look at a copy of the article in the American Psychological Association journal Psychology and Aging here, the Yale University press release here, and a fascinating post-publication peer review at PubPeer that I initiated as peer 1.

Ask yourself: if you encountered coverage of this article in the media, would you have been skeptical? If so, what were the clues?

The article is yet another example of trusted authorities exploiting entrenched cultural beliefs about the mind-body connection being able to be harnessed in some mysterious way to combat or prevent physical illness. As Ann Harrington details in her wonderful book, The Cure Within, this psychosomatic hypothesis has a long and checkered history, and gets continually reinvented and misapplied.

We see an example of this in claims that attitude can conquer cancer. What’s the harm of such illusions? If people can be led to believe they have such control, they are set up for blame from themselves and from those around them when they fail to fend off and control the outcome of disease by sheer mental power.

The myth of “fighting spirit” overcoming cancer has survived despite the accumulation of excellent contradictory evidence. Cancer patients are vulnerable to blaming themselves, or being blamed by loved ones, when they do not “win” the fight against cancer. They are also subject to unfair exhortations to fight harder as their health situation deteriorates.

onion composite
From the satirical Onion

 What I saw when I skimmed the press release and the article

  • The first alarm went off when I saw that causal claims were being made from a modest sized correlational study. This should set off anyone’s alarms.
  • The press release and the discussion section of the article refer to this as a “first ever” study. One does not seek nor expect to find robust “first ever” discoveries in such a small data set.
  • The authors do not provide evidence that their key measure of “negative stereotypes” is a valid measure of either stereotyping or likelihood of experiencing stress. They don’t even show it is related to concurrent reports of stress.
  • Like a lot of measures with a negative tone to their items, this one is affected by what Paul Meehl calls the crud factor. Whatever is being measured in this study cannot be distinguished from a full range of confounds that are not even assessed in this study.
  • The mechanism by which effects of this self-report measure somehow get manifested in changes in the brain lacks evidence and is highly dubious.
  • There was no presentation of actual data or basic statistics. Instead, there were only multivariate statistics that require at least some access to basic statistics for independent evaluation.
  • The authors resorted to cheap statistical strategies that play to readers’ confirmation bias: reliance on one-tailed rather than two-tailed tests of significance; use of a discredited backward elimination method for choosing control variables; and exploring too many control/covariate variables, given their modest sample size.
  • The analyses that are reported do not accurately depict what is in the data set, nor generalize to other data sets.

The article

The authors develop their case that stress is a significant cause of Alzheimer’s disease with reference to some largely irrelevant studies by others, but depend on a preponderance of studies that they themselves have done with the same dubious small samples and dubious statistical techniques. Whether you do a casual search with Google scholar or a more systematic review of the literature, you won’t find stress processes of the kind the authors invoke among the usual explanations of the development of the disease.

Basically, the authors are arguing that if you hold views of aging like “Old people are absent-minded” or “Old people cannot concentrate well,” you will experience more stress as you age, and this will accelerate development of Alzheimer’s disease. They then go on to argue that because these attitudes are modifiable, you can take control of your risk for Alzheimer’s by adopting a more positive view of aging and aging people.

The authors used their measure of negative aging stereotypes in other studies, but do not provide the usual evidence of convergent and discriminant validity needed to establish that the measure assesses what is intended. Basically, we should expect authors to show that a measure they have developed is related to existing measures with which it should be associated (convergent validity), but not related to measures of distinct constructs with which it should not be associated (discriminant validity).

Psychology has a long history of researchers claiming that their “new” self-report measures containing negatively toned items assess distinct concepts, despite high correlations with other measures of negative emotion as well as lots of confounds. I poked fun at this unproductive tradition in a presentation, Negative emotions and health: why do we keep stalking bears, when we only find scat in the woods?

The article reported two studies. The first tested whether participants holding more negative age stereotypes would have significantly greater loss of hippocampal volume over time. The study involved 52 individuals selected from a larger cohort enrolled in the brain-neuroimaging program of the Baltimore Longitudinal Study of Aging.

Readers are given none of the basic statistics that would be needed to interpret the complex multivariate analyses. Ideally, we would be given an opportunity to see how the independent variable, negative age stereotypes, is related to other data available on the subjects, and so we could get some sense if we are starting with some basic, meaningful associations.

Instead the authors present the association between negative age stereotyping and hippocampal volume only in the presence of multiple control variables:

Covariates consisted of demographics (i.e., age, sex, and education) and health at time of baseline-age-stereotype assessment, (number of chronic conditions on the basis of medical records; well-being as measured by a subset of the Chicago Attitude Inventory); self-rated health, neuroticism, and cognitive performance, measured by the Benton Visual Retention Test (BVRT; Benton, 1974).

Readers cannot tell why these variables and not others were chosen. Adding or dropping a few variables could produce radically different results. But there are just too many variables being considered. With only 52 research participants, spurious findings that do not generalize to other samples are highly likely.

I was astonished when the authors announced that they were relying on one-tailed statistical tests. This is widely condemned as unnecessary and misleading.

Basically, every time the authors report a significance level in this article, you need to double the number to get what would be obtained with a more conventional two-tailed test. So, if they proudly declare that results are significant, p = .046, then the results are actually (non)significant, p = .092. I know, we should not make such a fuss about significance levels, but journals do. We’re being set up to be persuaded the results are significant, when they are not by conventional standards.

So far, the authors’ accumulating sins against proper statistical techniques and transparent reporting include: no presentation of basic associations; reporting one-tailed tests; and use of multivariate statistics inappropriate for a sample that is so small. Now let’s add another one: in their multivariate regressions, the authors relied on a potentially deceptive backward elimination:

Backward elimination, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) that improves the model the most by being deleted, and repeating this process until no further improvement is possible.

The authors assembled their candidate control/covariate variables and used a procedure that checks them statistically and drops some from consideration, based on whether they fail to add to the significance of the overall equation. This procedure is condemned because the variables that are retained in the equation capitalize on chance. Particular variables that could be theoretically relevant are eliminated simply because they fail to add anything statistically in the context of the other variables being considered. In the context of a different set of candidate variables, these same discarded variables might have been retained.

The final regression equation had fewer control/covariates than when the authors started. Statistical significance will be calculated on the basis of the smaller number of variables remaining, not the number that were picked over, and so results will artificially appear stronger. Again, this is potentially quite misleading to the unwary reader.
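
For readers who want to see the problem in action, here is a small simulation sketch – my own illustration, not the authors’ data or exact procedure – of how backward elimination can retain “significant” covariates even when every candidate predictor is pure noise:

```python
# Simulation sketch: backward elimination applied to pure noise in a small sample.
# With 52 "participants" and 10 random candidate covariates, the procedure will often
# keep one or more predictors whose p-values then look impressively small.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, k = 52, 10
X = rng.normal(size=(n, k))          # candidate covariates: all noise
y = rng.normal(size=n)               # outcome unrelated to every predictor

def backward_eliminate(X, y, keep_if_p_below=0.157):   # a lax retention criterion
    cols = list(range(X.shape[1]))
    fit = None
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = fit.pvalues[1:]                         # skip the intercept
        worst = int(pvals.argmax())
        if pvals[worst] < keep_if_p_below:              # everything left "earns its keep"
            break
        cols.pop(worst)                                 # drop the weakest and refit
    return cols, fit

kept, fit = backward_eliminate(X, y)
print("noise covariates retained:", kept)
if kept and fit is not None:
    print("their p-values:", fit.pvalues[1:].round(3))  # rerun with other seeds to see how often noise survives
```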

The authors nonetheless concluded:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had a significantly steeper decline in hippocampal volume

The second study:

examined whether participants holding more negative age stereotypes would have significantly greater accumulation of amyloid plaques and neurofibrillary tangles.

The outcome was a composite-plaques-and-tangles score and the predictor was the same negative age stereotypes measure from the first study. These measurements were obtained from 74 research participants upon death and autopsy. The same covariates were used in stepwise regression with backward elimination. Once again, the statistical test was one-tailed.

Results were:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had significantly higher composite-plaques-and-tangles scores, t(1,59) = 1.71 p = .046, d = 0.45, adjusting for age, sex, education, self-rated health, well-being, and number of chronic conditions.

Aha! Now we see why the authors committed themselves to a one-tailed test. With a conventional two-tailed test, these results would not be significant. Given a prevailing confirmation bias, aversion to null findings, and obsession with significance levels, this article probably would not have been published without the one-tailed test.
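
To make the doubling concrete, here is a minimal sketch using only the t statistic and degrees of freedom quoted above (t = 1.71, df = 59); everything else is my illustration:

```python
# One-tailed vs. two-tailed p-value for the reported t statistic.
from scipy import stats

t_value, df = 1.71, 59
p_one_tailed = stats.t.sf(t_value, df)            # upper-tail probability
p_two_tailed = 2 * stats.t.sf(abs(t_value), df)   # conventional two-tailed test

print(f"one-tailed p = {p_one_tailed:.3f}")       # ~0.046, "significant"
print(f"two-tailed p = {p_two_tailed:.3f}")       # ~0.092, not significant at .05
```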

The authors’ stirring overall conclusion from the two studies:

By expanding the boundaries of known environmental influences on amyloid plaques, neurofibrillary tangles, and hippocampal volume, our results suggest a new pathway to identifying mechanisms and potential interventions related to Alzheimer’s disease

PubPeer discussion of this paper [https://pubpeer.com/publications/16E68DE9879757585EDD8719338DCD]

Comments accumulated for a couple of days on PubPeer after I posted some concerns about the first study. All of the comments were quite smart; some directly validated points that I had been thinking about, but others took the discussion in new directions, either statistically or because the commentators knew more about neuroscience.

Using a mechanism available at PubPeer, I sent emails to the first author of the paper, the statistician, and one of the NIA personnel inviting them to make comments also. None have responded so far.

Tom Johnstone, a commentator who exercised the option of identifying himself, noted the reliance on inferential statistics in the absence of reporting basic relationships. He also noted that the criterion used to drop covariates was lax. Apparently familiar with neuroscience, he expressed doubts that the results had any clinical significance or relevance to the functioning of the research participants.

Another commentator complained of the small sample size, the use of one-tailed statistical tests without justification, the “convoluted list of covariates,” and the “taboo” strategy for selecting covariates to be retained in the regression equation. This commentator also noted that the authors had examined the effect of outliers, conducting analyses both with and without the inclusion of the most extreme case. While it didn’t affect the overall results, the exclusion dramatically changed the significance level, highlighting the susceptibility of such a small sample to chance variation or sampling error.

Who gets the blame for misleading claims in this article?

There’s a lot of blame to go around. By exaggerating the size and significance of any effects, the first author increases the chance of publication and also of further funding to pursue what is seen as a “tantalizing” association. But it’s the job of editors and peer reviewers to protect the readership from such exaggerations and maybe to protect the author from herself. They failed, maybe because exaggerated findings are consistent with the journal‘s agenda of increasing citations by publishing newsworthy rather than trustworthy findings. The study statistician, Martin Slade, obviously knew that misleading, less than optimal statistics were used; why didn’t he object? Finally, I think the NIA staff, particularly Luigi Ferrucci, the Scientific Director of NIA, should be singled out for the irresponsibility of attaching their names to such misleading claims. Why did they do so? Did they not read the manuscript? I will regularly present instances of NIH staff endorsing dubious claims, such as here. The mind-over-disease, psychosomatic hypothesis gets a lot of support not warranted by the evidence. Perhaps NIH officials in general see this as a way of attracting research monies from Congress. Regardless, I think NIH officials have the responsibility to see that consumers are not misled by junk science.

This article at least provided the opportunity for an exercise that should raise skepticism and convince consumers at all levels – other researchers, clinicians, policymakers, those who suffer from Alzheimer’s disease, and those who care for them – that we just cannot sit back and let trusted sources do our thinking for us.

 

Stalking a Cheshire cat: Figuring out what happened in a psychotherapy intervention trial

John Ioannidis, the “scourge of sloppy science,” has documented again and again that the safeguards being introduced into the biomedical literature against untrustworthy findings are usually ineffective. In Ioannidis’ most recent report, his group:

…Assessed the current status of reproducibility and transparency addressing these indicators in a random sample of 441 biomedical journal articles published in 2000–2014. Only one study provided a full protocol and none made all raw data directly available.

As reported in a recent post in Retraction Watch, Did a clinical trial proceed as planned? New project finds out, Psychiatrist Ben Goldacre has a new project with

…The relatively straightforward task of comparing reported outcomes from clinical trials to what the researchers said they planned to measure before the trial began. And what they’ve found is a bit sad, albeit not entirely surprising.

Ben Goldacre specifically excludes psychotherapy studies from this project. But there are reasons to believe that the psychotherapy literature is less trustworthy than the biomedical literature because psychotherapy trials are less frequently registered, adherence to CONSORT reporting standards is less strict, and investigators more routinely refuse to share data when requested.

Untrustworthiness of information provided in the psychotherapy literature can have important consequences for patients, clinical practice, and public health and social policy.

The study that I will review switched outcomes twice in its reports, had a poorly chosen comparison control group and flawed analyses, and its protocol was registered after the study started. Yet the study will likely provide data for decision-making about what to do with primary care patients with a few unexplained medical symptoms. The recommendation of the investigators is to deny these patients medical tests and workups and instead provide them with an unvalidated psychiatric diagnosis and a treatment that encourages them to believe that their concerns are irrational.

In this post I will attempt to track what should have been an orderly progression from (a) registration of a psychotherapy trial to (b) publishing of its protocol to (c) reporting of the trial’s results in the peer-reviewed literature. This exercise will show just how difficult it is to make sense of studies in a poorly documented psychological intervention literature.

  • I find lots of surprises, including outcome switching in both reports of the trial.
  • The second article reporting results of the trial does not acknowledge registration, minimally cites the first report of outcomes, and hides important shortcomings of the trial. Yet the authors inadvertently expose crucial new shortcomings without comment.
  • Detecting important inconsistencies between registration and protocols and reports in the journals requires an almost forensic attention to detail to assess the trustworthiness of what is reported. Some problems hide in plain sight if one takes the time to look, but others require a certain clinical connoisseurship, a well-developed appreciation of the subtle means by which investigators spin outcomes to get novel and significant findings.
  • Outcome switching and inconsistent cross-referencing of published reports of a clinical trial will bedevil any effort to integrate the results of the trial into the larger literature in a systematic review or meta-analysis.
  • Two journals – Psychosomatic Medicine and particularly Journal of Psychosomatic Research – failed to provide adequate peer review of articles based on this trial, in terms of trial registration, outcome switching, and allowing multiple reports of what could be construed as primary outcomes from the same trial into the literature.
  • Despite serious problems in their interpretability, results of this study are likely to be cited and influence far-reaching public policies.
  • The generalizability of the results of my exercise is unclear, but my findings encourage skepticism more generally about published reports of results of psychotherapy interventions. It is distressing that more alarm bells have not been sounded about the reports of this particular study.

The publicly accessible registration of the trial is:

Cognitive Behaviour Therapy for Abridged Somatization Disorder (Somatic Symptom Index [SSI] 4,6) patients in primary care. Current controlled trials ISRCTN69944771

The publicly accessible full protocol is:

Magallón R, Gili M, Moreno S, Bauzá N, García-Campayo J, Roca M, Ruiz Y, Andrés E. Cognitive-behaviour therapy for patients with Abridged Somatization Disorder (SSI 4, 6) in primary care: a randomized, controlled study. BMC Psychiatry. 2008 Jun 22;8(1):47.

The second report of treatment outcomes in Journal of Psychosomatic Research

Readers can more fully appreciate the problems that I uncovered if I work backwards from the second published report of outcomes from the trial. Published in Journal of Psychosomatic Research, the article is behind a paywall, but readers can write to the corresponding author for a PDF: mgili@uib.es. This person is also the corresponding author for the other paper, in Psychosomatic Medicine, and so readers might want to request both papers.

Gili M, Magallón R, López-Navarro E, Roca M, Moreno S, Bauzá N, García-Campayo J. Health related quality of life changes in somatising patients after individual versus group cognitive behavioural therapy: A randomized clinical trial. Journal of Psychosomatic Research. 2014 Feb 28;76(2):89-93.

The title is misleading in its ambiguity because “somatising” does not refer to an established diagnostic category. In this article, it refers to an unvalidated category that encompasses a considerable proportion of primary care patients, usually those with comorbid anxiety or depression. More about that later.

PubMed, which usually reliably attaches a trial registration number to abstracts, doesn’t do so for this article.

The article does not list the registration, and does not provide the citation when indicating that a trial protocol is available. The only subsequent citations of the trial protocol are ambiguous:

More detailed design settings and study sample of this trial have been described elsewhere [14,16], which explain the effectiveness of CBT reducing number and severity of somatic symptoms.

The above quote is also the sole citation of a key previous paper that presents outcomes for the trial. Only an alert and motivated reader would catch this. No opportunity within the article is provided for comparing and contrasting results of the two papers.

The brief introduction displays a decided puffer fish phenomenon, exaggerating the prevalence and clinical significance of the unvalidated “abridged somatization disorder.” Essentially, the authors invoke the problematic but accepted psychiatric diagnostic categories of somatoform and somatization disorders in claiming validity for a diagnosis with much less stringent criteria. Oddly, the category has different criteria when applied to men and women: men require four unexplained medical symptoms, whereas women require six.

I haven’t previously encountered the term “abridged” in psychiatric diagnosis. Maybe the authors mean “subsyndromal,” as in “subsyndromal depression.” This is dubious labeling because it suggests that not all of the characteristics needed for the full diagnosis are present, some of which may be crucial. Think of it: is a persistent cough subsyndromal lung cancer, or maybe emphysema? References to symptoms being “subsyndromal” often occur in contexts where exaggerated claims about prevalence are being made, with inappropriate, non-evidence-based inferences about treatment of milder cases from the more severe.

A casual reader might infer that the authors are evaluating a psychiatric treatment with wide applicability to as many as 20% of primary care patients. As we will see, the treatment focuses on discouraging any diagnostic medical tests and trying to convince the patient that their concerns are irrational.

The introduction identifies the primary outcome of the trial:

The aim of our study is to assess the efficacy of a cognitive behavioural intervention program on HRQoL [health-related quality of life] of patients with abridged somatization disorder in primary care.

This primary outcome is inconsistent with what was reported in the registration, the published protocol, and the first article reporting outcomes. The earlier report does not even mention the inclusion of a measure of HRQoL, measured by the SF-36. It is listed in the study protocol as a “secondary variable.”

The opening of the methods section declares that the trial is reported in this paper consistent with the Consolidated Standards of Reporting Clinical Trials (CONSORT). This is not true because the flowchart describing patients from recruitment to follow-up is missing. We will see that when it is reported in another paper, some important information is contained in that flowchart.

The methods section reports only three measures were administered: a Standardized Polyvalent Psychiatric Interview (SPPI), a semistructured interview developed by the authors with minimal validation; a screening measure for somatization administered by primary care physicians to patients whom they deemed appropriate for the trial, and the SF-36.

Crucial details are withheld about the screening and diagnosis of “abridged somatization disorder.” If these details had been presented, a reader would further doubt the validity of this unvalidated and idiosyncratic diagnosis.

Few readers, even primary care physicians or psychiatrists, will know what to make of Smith’s guidelines (Googling them won’t yield much), which are essentially a matter of simply sending a letter to the referring GP. Sending such a letter is a notoriously ineffective intervention in primary care. It mainly indicates that patients referred to the trial did not get assigned to an active treatment. As I will document later, the authors were well aware that this would be an ineffectual control/comparison intervention, but using it as such guarantees that their preferred intervention would look quite good in terms of effect size.

The two active interventions are individual- and group-administered CBT which is described as:

Experimental or intervention group: implementation of the protocol developed by Escobar [21,22] that includes ten weekly 90-min sessions. Patients were assessed at 4 time points: baseline, post-treatment, 6 and 12 months after finishing the treatment. The CBT intervention mainly consists of two major components: cognitive restructuring, which focuses on reducing pain-specific dysfunctional cognitions, and coping, which focuses on teaching cognitive and behavioural coping strategies. The program is structured as follows. Session 1: the connection between stress and pain. Session 2: identification of automated thoughts. Session 3: evaluation of automated thoughts. Session 4: questioning the automatic thoughts and constructing alternatives. Session 5: nuclear beliefs. Session 6: nuclear beliefs on pain. Session 7: changing coping mechanisms. Session 8: coping with ruminations, obsessions and worrying. Session 9: expressive writing. Session 10: assertive communication.

There is sparse presentation of data from the trial in the results section, but some fascinating details await a skeptical, motivated reader.

Table 1 displays social demographic and clinical variables. Psychiatric comorbidity is highly prevalent. Readers can’t tell exactly what is going on, because the authors’ own interview schedule is used to assess comorbidity. But it appears that all but a small minority of patients diagnosed with “abridged somatization disorder” have substantial anxiety and depression. Whether these symptoms meet formal criteria cannot be determined. There is no mention of physical comorbidities.

But there is something startling awaiting an alert reader in Table 2.

sf-36 gili

There is something very odd going on here, and very likely a breakdown of randomization. Baseline differences in the key outcome measure, the SF-36, are substantially greater between groups than any within-group change. The treatment as usual (TAU) condition has much lower functioning [lower scores mean lower functioning] than the group CBT condition, which in turn is substantially below the individual CBT condition.

If we compare the scores to adult norms, all three groups of patients are poorly functioning, but those “randomized” to TAU are unusually impaired, strikingly more so than the other two groups.

Keep in mind that evaluations of active interventions, in this case CBT, in randomized trials always involve a difference between groups, not just the change observed within a particular group. That’s because a comparison/control group is supposed to be equivalent for nonspecific factors, including natural recovery. This trial is going to be very biased in its evaluation of individual CBT, a group in which patients started much higher in physical functioning and ended up much higher. Statistical controls fail to correct for such baseline differences. We simply do not have an interpretable clinical trial here.

The first report of treatment outcomes in Psychosomatic Medicine

Moreno S, Gili M, Magallón R, Bauzá N, Roca M, del Hoyo YL, Garcia-Campayo J. Effectiveness of group versus individual cognitive-behavioral therapy in patients with abridged somatization disorder: a randomized controlled trial. Psychosomatic Medicine. 2013 Jul 1;75(6):600-8.

The title indicates that the patients are selected on the basis of “abridged somatization disorder.”

The abstract prominently indicates the trial registration number (ISRCTN69944771), which can be plugged into Google to reach the publicly accessible registration.

If a reader is unaware of the lack of validation for “abridged somatization disorder,” they probably won’t infer that from the introduction. The rationale given for the study is that

A recently published meta-analysis (18) has shown that there has been ongoing research on the effectiveness of therapies for abridged somatization disorder in the last decade.

Checking that meta-analysis, it only included a single null trial for treatment of abridged somatization disorder. This seems like a gratuitous, ambiguous citation.

I was surprised to learn that in three of the five provinces in which the study was conducted, patients

…Were not randomized on a one-to-one basis but in blocks of four patients to avoid a long delay between allocation and the onset of treatment in the group CBT arm (where the minimal group size required was eight patients). This has produced, by chance, relatively big differences in the sizes of the three arms.

This departure from one-to-one randomization was not mentioned in the second article reporting results of the study, and seems an outright contradiction of what is presented there. Nor is it mentioned in the study protocol. This allocation strategy may have been the source of the lack of baseline equivalence between the TAU and the two intervention groups.

For the vigilant skeptic, the authors’ calculation of sample size is an eye-opener. Sample size estimation was based on the effectiveness of TAU in primary care visits, which has been assumed to be very low (approximately 10%).

Essentially, the authors are justifying a modest sample size because they expect the TAU intervention to be utterly ineffective. How could the authors believe there is equipoise, that the comparison/control and active treatments could be expected to be equally effective? The authors seem to say that they don’t believe this. Yet equipoise is an ethical and practical requirement for a clinical trial for which human subjects are being recruited. In terms of trial design, do the authors really think this poor treatment provides an adequate comparison/control?
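
Here is a rough sketch of the arithmetic involved, using statsmodels; the 10%, 30%, and 50% response rates below are purely illustrative assumptions, not figures from the trial. The weaker the comparator is assumed to be, the smaller the sample needed to “demonstrate” superiority.

```python
# Illustrative power calculation: assumed response rates, not the trial's actual figures.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

analysis = NormalIndPower()
for tau_rate in (0.10, 0.30):                       # assumed response under TAU
    h = proportion_effectsize(0.50, tau_rate)       # vs. an assumed 50% response under CBT
    n_per_arm = analysis.solve_power(effect_size=h, alpha=0.05, power=0.80, ratio=1)
    print(f"TAU assumed at {tau_rate:.0%}: about {n_per_arm:.0f} patients per arm")
```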

In the methods section, the authors also provide a study flowchart, which was required for the other paper to adhere to CONSORT standards but was missing there. Note the flow at the end of the study for the TAU comparison/control condition at the far right. There was substantially more dropout in this group. The authors chose to estimate the missing scores with the Last Observation Carried Forward (LOCF) method, which assumes the last available observation can be substituted for every subsequent one. This is a discredited technique and particularly inappropriate in this context. Think about it: the TAU condition was expected by the authors to be quite poor care. Not surprisingly, more patients assigned to it dropped out. But they might have dropped out while deteriorating, and so carrying their last observation forward is particularly inappropriate. Certainly it cannot be assumed that the smaller number of dropouts from the other conditions dropped out for the same reason. We have a methodological and statistical mess on our hands, but it was hidden from readers of the second report.
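
Here is a minimal pandas illustration – made-up numbers, not the trial’s data – of what LOCF does to a patient who deteriorates and then drops out:

```python
# LOCF imputation for one hypothetical TAU patient who deteriorates and then drops out.
import pandas as pd

scores = pd.Series(
    [42.0, 35.0, None, None],                      # None = assessments missed after dropout
    index=["baseline", "post-treatment", "6 months", "12 months"],
)
locf = scores.ffill()                              # Last Observation Carried Forward
print(locf)
# The patient is frozen at 35 for the rest of follow-up, even though the trajectory
# before dropout was downward; with more dropout in TAU than in the CBT arms, this
# kind of imputation can distort between-group comparisons.
```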

 

flowchart

Six measures are mentioned: (1) the Othmer-DeSouza screening instrument used by clinicians to select patients; (2) the Screening for Somatoform Disorders (SOMS), a 39-item questionnaire that includes all bodily symptoms and criteria relevant to somatoform disorders according to either DSM-IV or ICD-10; (3) a Visual Analog Scale of somatic symptoms (the Severity of Somatic Symptoms scale) that patients use to assess changes in severity in each of 40 symptoms; (4) the authors’ own SPPI semistructured psychiatric interview for diagnosis of psychiatric morbidity in primary care settings; (5) the clinician-administered Hamilton Anxiety Rating Scale; and (6) the Hamilton Depression Rating Scale.

We are never actually told what the primary outcome is for the study, but it can be inferred from the opening of the discussion:

The main finding of the trial is a significant improvement regardless of CBT type compared with no intervention at all. CBT was effective for the relief of somatization, reducing both the number of somatic symptoms (Fig. 2) and their intensity (Fig. 3). CBT was also shown to be effective in reducing symptoms related to anxiety and depression.

But I noticed something else here, after a couple of readings. The items used to select patients and identify them with “abridged somatization disorder” reference 39 or 40 symptoms, with men needing only four and women only six symptoms for a diagnosis. That means that most pairs of patients receiving the diagnosis will not have a symptom in common. Whatever “abridged somatization disorder” means, patients who received this diagnosis are likely to be different from each other in terms of somatic symptoms, but probably have other characteristics in common. They are basically depressed and anxious patients, but these mood problems are not being addressed directly.
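
A quick back-of-the-envelope check supports the “no symptom in common” claim. Assuming, purely for illustration, that a patient’s four qualifying symptoms are drawn at random from the roughly 40 candidate symptoms:

```python
# Probability that two hypothetical patients, each with 4 of 40 possible symptoms,
# share no symptom at all (symptoms assumed to be drawn at random, for illustration).
from math import comb

p_no_overlap = comb(36, 4) / comb(40, 4)   # the second patient's 4 symptoms all avoid the first's 4
print(round(p_no_overlap, 2))              # ~0.64 -- most such pairs share nothing
```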

Comparison of this report to the outcomes paper reviewed earlier shows that none of these measures is mentioned there as being assessed, and certainly not as outcomes.

Comparison of this report to the published protocol reveals that number and intensity of somatic symptoms are two of the three main outcomes, but this article makes no mention of the third, utilization of healthcare.

Readers can find something strange in Table 2, which presents what seems to be one of the primary outcomes, severity of symptoms. In this table the order is TAU, group CBT, and individual CBT. Note the large difference in baseline symptoms, with group CBT being much more severe. It’s difficult to make sense of the 12-month follow-up because there was differential dropout and reliance on an inappropriate LOCF imputation of missing data. But if we accept the imputation as the authors did, it appears that there were no differences between TAU and group CBT. That is what the authors reported with inappropriate analyses of covariance.

Moreno severity of symptoms

The authors’ cheerful take away message?

This trial, based on a previous successful intervention proposed by Sumathipala et al. (39), presents the effectiveness of CBT applied at individual and group levels for patients with abridged somatization (somatic symptom indexes 4 and 6).

But hold on! In the introduction, the authors’ justification for their trial was:

Evidence for the group versus individual effectiveness of cognitive-behavioral treatment of medically unexplained physical symptoms in the primary care setting is not yet available.

And let’s take a look at Sumathipala et al.

Sumathipala A, Siribaddana S, Hewege S, Sumathipala K, Prince M, Mann A. Understanding the explanatory model of the patient on their medically unexplained symptoms and its implication on treatment development research: a Sri Lanka Study. BMC Psychiatry. 2008 Jul 8;8(1):54.

The article presents speculations based on an observational study, not an intervention study, so there is no successful intervention being reported.

The formal registration 

The registration of psychotherapy trials typically provides sparse details. The curious must consult the more elaborate published protocol. Nonetheless, the registration can often provide grounds for skepticism, particularly when it is compared to any discrepant details in the published protocol, as well as subsequent publications.

The registration declares:

Study hypothesis

Patients randomized to cognitive behavioural therapy significantly improve in measures related to quality of life, somatic symptoms, psychopathology and health services use.

Primary outcome measures

Severity of Clinical Global Impression scale at baseline, 3 and 6 months and 1-year follow-up

Secondary outcome measures

The following will be assessed at baseline, 3 and 6 months and 1-year follow-up:
1. Quality of life: 36-item Short Form health survey (SF-36)
2. Hamilton Depression Scale
3. Hamilton Anxiety Scale
4. Screening for Somatoform Symptoms [SOMS]

Overall trial start date

15/01/2008

Overall trial end date

01/07/2009

The published protocol 

Primary outcome

Main outcome variables:

– SSS (Severity of somatic symptoms scale) [22]: a scale of 40 somatic symptoms assessed by a 7-point visual analogue scale.

– SSQ (Somatic symptoms questionnaire) [22]: a scale made up of 40 items on somatic symptoms and patients’ illness behaviour.

When I searched the published protocol for the Severity of Clinical Global Impression scale, the primary outcome declared in the registration, I could find no reference to it.

The protocol was submitted on May 14, 2008, and published on June 22, 2008. This suggests that the protocol was submitted after the start of the trial, registered as January 15, 2008.

To calculate the sample size we consider that the effectiveness of usual treatment (Smith’s norms) is rather low, estimated at about 20% in most of the variables [10,11]. We aim to assess whether the new intervention is at least 20% more effective than usual treatment.

Comparison group

Control group or standardized recommended treatment for somatization disorder in primary care (Smith’s norms) [10,11]: standardized letter to the family doctor with Smith’s norms that includes: 1. Provide brief, regularly scheduled visits. 2. Establish a strong patient-physician relationship. 3. Perform a physical examination of the area of the body where the symptom arises. 4. Search for signs of disease instead of relying of symptoms. 5. Avoid diagnostic tests and laboratory or surgical procedures. 6. Gradually move the patient to being “referral ready”.

Basically, TAU, the comparison/control group, involves simply sending a letter to the referring physicians encouraging them to meet regularly with the patients while discouraging diagnostic tests or medical procedures. Keep in mind that patients for this study were selected by the physicians because they found them particularly frustrating to treat. Despite the authors’ repeated claims about the high prevalence of “abridged somatization disorder,” they relied on a large number of general practice settings that each contributed only a few patients. These patients are very heterogeneous in terms of somatic symptoms, but most share anxiety or depressive symptoms.

There is an uncontrolled selection bias here that makes generalization from results of the study problematic. Just who are these patients? I wonder if these patients have some similarity to the frustrating GOMERS (Get Out Of My Emergency Room) in the classic House of God, a book described by Amazon as “an unvarnished, unglorified, and amazingly forthright portrait revealing the depth of caring, pain, pathos, and tragedy felt by all who spend their lives treating patients and stand at the crossroads between science and humanity.”

Imagine the disappointment of the referring physicians and the patients when consent to participate in this study simply left the patients back in routine care provided by the same physicians. It’s no wonder that the patients deteriorated and that patients assigned to this treatment were more likely to drop out.

Whatever active ingredients the individual and group CBT have, they also include some nonspecific factors missing from the TAU comparison group: frequency and intensity of contact, reassurance and support, attentive listening, and positive expectations. These nonspecific factors can readily be confused with active ingredients and may account for any differences between the active treatments and the TAU comparison. What a terrible study.

The two journals providing reports of the studies failed in their responsibility to the readership and the larger audience seeking clinical and public policy relevance. Authors have ample incentive to engage in questionable publication practices, including ignoring and even suppressing registration, switching outcomes, and exaggerating the significance of their results. Journals of necessity must protect authors from their own inclinations, as well as protect the readers and the larger medical community from untrustworthy reports. Psychosomatic Medicine and Journal of Psychosomatic Research failed miserably in their peer review of these articles. Neither journal is likely to be the first choice for authors seeking to publish findings from well-designed and well-reported trials. Who knows, maybe the journals’ standards are compromised by the need to attract randomized trials for what is construed, at least by the psychiatric community, as a psychosomatic condition.

Regardless, it’s futile to require registration and posting of protocols for psychotherapy trials if editors and reviewers ignore these resources in evaluating articles for publication.

Postscript: imagine what will be done with the results of this study

You can’t fix with a meta-analysis what investigators bungled by design.

In a recent blog post, I examined a registration for a protocol for a systematic review and meta-analysis of interventions to address medically unexplained symptoms. The review protocol was inadequately described, had undisclosed conflicts of interest, and one of the senior investigators had a history of switching outcomes in his own study and refusing to share data for independent analysis. Undoubtedly, the study we have been discussing meets the vague criteria for inclusion in this meta-analysis. But what outcomes will be chosen, particularly when there should be only one outcome per study? And will it be recognized that these two reports come from the same study? Will key problems in the designation of the TAU control group be recognized, with its likely inflation of treatment effects, when it is used to calculate effect sizes?

As you can see, it took a lot of effort to compare and contrast documents that should have been in alignment. Do you really expect those who conduct subsequent meta-analyses to make those multiple comparisons, or will they simply extract multiple effect sizes from the two papers so far reporting results?

Obviously, every time we encounter a report of a psychotherapy trial in the literature, we won’t have the time or inclination to undertake such a cross-comparison of articles, registration, and protocol. But maybe we should be skeptical of authors’ conclusions without such checks.

I’m curious what a casual reader would infer from encountering, in a literature search, one of the two reports of this clinical trial that I have reviewed, but not the other.

 

 


Was independent peer review of the PACE trial articles possible?

I ponder this question guided by Le Chevalier C. Auguste Dupin, the first fictional detective, before anyone was called “detective.”

Articles reporting the PACE trial have extraordinary numbers of authors, acknowledgments, and institutional affiliations. A considerable proportion of all persons and institutions involved in researching chronic fatigue and related conditions in the UK have a close connection to PACE.

This raises issues about

  • Obtaining independent peer review of these articles that is not tainted by reviewer conflict of interest.
  • Just what authorship on a PACE trial paper represents and whether granting of authorship conforms to international standards.
  • The security of potential critics contemplating speaking out about whatever bad science they find in the PACE trial articles, and the security of potential reviewers who are negative and can be found out. Critics within the UK risk isolation and blacklisting by a large group who have investments in what could be exaggerated estimates of the quality and outcome of the PACE trial.
  • Whether grants associated with the multimillion-pound PACE study could have received the independent peer review that is so crucial to assuring that proposals selected to be funded are of the highest quality.

Issues about the large number of authors, acknowledgments, and institutional affiliations become all the more salient as critics [1, 2, 3] again find serious flaws in the conduct and the reporting of the Lancet Psychiatry 2015 long-term follow-up study. Numerous obvious Questionable Research Practices (QRPs) survived peer review. That implies at least ineptness in peer review, or even Questionable Publication Practices (QPPs).

The important question becomes: how is the publication of questionable science to be explained?

Maybe there were difficulties finding reviewers with relevant expertise who were not in some way involved in the PACE trial or affiliated with departments and institutions that would be construed as benefiting from a positive review outcome, i.e. a publication?

Or in the enormous smallness of the UK, is independent peer review achieved by persons putting those relationships and affiliations aside to produce an impeccably detached and rigorous review process?

The untrustworthiness of both the biomedical and psychological literatures is well established. Nonpharmacological interventions have fewer safeguards than drug trials, in terms of adherence to preregistration, reporting standards like CONSORT, and enforcement of sharing of data.

Open-minded skeptics should be assured of independent peer review of nonpharmacological clinical trials, particularly when there is evidence that persons and groups with considerable financial interests attempt to control what gets published and what is said about their favored interventions. Reviewers with potential conflicts of interest should be excluded from evaluation of manuscripts.

Independent peer review of the PACE trial by those with relevant expertise might not be possible in the UK, where much of the conceivable expertise is in some way directly or indirectly attached to the PACE trial.

A Dutch observer’s astute observations about the PACE articles

My guest blogger Dutch research biologist Klaas van Dijk  called attention to the exceptionally large number of authors and institutions listed for a pair of PACE trial papers.

Klaas noted

The Pubmed entry for the 2011 Lancet paper lists 19 authors:

B J Angus, H L Baber, J Bavinton, M Burgess, T Chalder, L V Clark, D L Cox, J C DeCesare, K A Goldsmith, A L Johnson, P McCrone, G Murphy, M Murphy, H O’Dowd, PACE trial management group*, L Potts, M Sharpe, R Walwyn, D Wilks and P D White (re-arranged in an alphabetic order).

The actual article from the Lancet website ( http://www.thelancet.com/pdfs/journals/lancet/PIIS0140-6736(11)60096-2.pdf and also http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60096-2/fulltext ) lists 19 authors who are acting ‘on behalf of the PACE trial management group†’. But the end of the paper (page 835) states: “PACE trial group.” This term is not identical to “PACE trial management group”.
In total, another 19 names are listed under “PACE trial group” (page 835): Hiroko Akagi, Mansel Aylward, Barbara Bowman Jenny Butler, Chris Clark, Janet Darbyshire, Paul Dieppe, Patrick Doherty, Charlotte Feinmann, Deborah Fleetwood, Astrid Fletcher, Stella Law, M Llewelyn, Alastair Miller, Tom Sensky, Peter Spencer, Gavin Spickett, Stephen Stansfeld and Alison Wearden (re-arranged in an alphabetic order).

There is no overlap with the first 19 people who are listed as authors of the paper.

So how many people can claim to be an author of this paper? Are all these 19 people of the “PACE trial management group” (not identical to “PACE trial group”???) also some sort of co-author of this paper? Do all these 19 people of the second group also agree with the complete contents of the paper? Do all 38 people agree with the full contents of the paper?

The paper lists many affiliations:
* Queen Mary University of London, UK
* King’s College London, UK
* University of Cambridge, UK
* University of Cumbria, UK
* University of Oxford, UK
* University of Edinburgh, UK
* Medical Research Council Clinical Trials Unit, London, UK
* South London and Maudsley NHS Foundation Trust, London, UK
* The John Radcliffe Hospital, Oxford, UK
* Royal Free Hospital NHS Trust, London, UK
* Barts and the London NHS Trust, London, UK
* Frenchay Hospital NHS Trust, Bristol, UK;
* Western General Hospital, Edinburgh, UK

Do all these affiliations also agree with the full contents of the paper? Am I right to assume that all 38 people (names see above) and all affiliations / institutes (see above) plainly refuse to give critics / other scientists / patients / patient groups (etc.) access to the raw research data of this paper, and am I right with my assumption that it is therefore impossible for all others (including allies of patients / other scientists / interested students, etc.) to conduct re-calculations, check all statements with the raw data, etc?

Decisions whether to accept manuscripts for publication are made in dark places based on opinions offered by people whose identities may be known only to editors. Actually, though, in a small country like the UK, peer review may be a lot less anonymous than intended and possibly a lot less independent and free of conflicts of interest. Without a lot more transparency than is currently available concerning the peer review the published papers underwent, we are left to our speculation.

Prepublication peer review is just one aspect of the process by which research findings are vetted, shaped, and made available to the larger scientific community, an overall process that is now recognized as tainted with untrustworthiness.

Rules for granting authorship

Concerns about gift and unwarranted authorship have increased not only because of growing awareness of unregulated and unfair practices, but also because of the importance attached to citations and authorship for professional advancement. Journals increasingly require documentation that all authors have made an appropriate contribution to a manuscript and have approved the final version.

Yet operating rules for granting authorship in many institutional settings vary greatly from the stringent requirements of journals. Contrary to the signed statements that corresponding authors must make when submitting a manuscript to a journal, many clinicians expect authorship in return for access to patients. Many competitive institutions award and withhold authorship based on politics and on good or bad behavior that have nothing to do with the requirements of journals.

Basically, despite the existence of numerous ethical guidelines and explicit policies, authors and institutions can largely do what they want when it comes to granting and withholding authorship.

Persons who are naïve enough to complain about unwarranted authorships, about being forced to include authors who made no appropriate contribution, or about being denied authorship for an important contribution are quickly disappointed. They discover that whistleblowers are generally considered more of a threat to institutions, and punished more severely, than alleged wrongdoers, no matter how strong the evidence may be.

The Lancet website notes

The Lancet is a signatory journal to the Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals, issued by the International Committee of Medical Journal Editors (ICMJE Recommendations), and to the Committee on Publication Ethics (COPE) code of conduct for editors. We follow COPE’s guidelines.

The ICMJE recommends that an author should meet all four of the following criteria:

  • Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work;
  • Drafting the work or revising it critically for important intellectual content;
  • Final approval of the version to be published;
  • Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

The intent of these widely endorsed recommendations is that persons associated with a large project have to do a lot to claim their places as authors.

Why the fuss about acknowledgments?

I’ve heard from a number of graduate students and junior investigators that they have had their first manuscripts held up in the submission process because they did not obtain written permission for acknowledgments. Why is that considered so important?

Mention in an acknowledgment is an honor. But it implies involvement in a project and approval of the resulting manuscript. In the past, there were numerous instances in which people were named in acknowledgments without having given permission. There was a suspicion, sometimes confirmed, that they had been acknowledged only to improve the manuscript's prospects of getting published. In other instances, persons were included in acknowledgments without permission with the intent of keeping them out of the review process because of the appearance of a conflict of interest.

The expectation is that anyone contributing enough to a manuscript to be acknowledged has a potential conflict of interest in deciding whether it is suitable for publication.

But, as in other aspects of a mysterious and largely anonymous review process, whether people who were acknowledged in manuscripts were barred from participating in review of a manuscript cannot be established by readers.

What is the responsibility of reviewers to declare conflict of interest?

Reviewers are expected to declare conflicts of interest when accepting a manuscript to review. But often they are presented with a tick box without a clear explanation of the criteria for the appearance of a conflict of interest. Reviewers can usually continue with a manuscript after acknowledging that they have an association with the authors or an institutional affiliation but do not consider it a conflict. That statement is generally accepted.

Authors excluding from the review process persons they consider to have a negative bias

In submitting a manuscript, authors are offered an opportunity to identify persons who should be excluded because of the appearance of a negative bias. Editors generally take these requests quite seriously. As an editor, I sometimes receive a large number of requested exclusions from authors who worry about the opinions of particular people.

While we don't know what went on in prepublication peer review, the PACE investigators have repeatedly and aggressively attempted to manipulate post-publication portrayals of their trial in the media. Can we rule out that they similarly tried to control potential critics in the prepublication peer review of their papers?

The 2015 Lancet Psychiatry secondary mediation analysis article

Chalder, T., Goldsmith, K. A., Walker, J., White, P. D., Sharpe, M., & Pickles, A. R. (2015). Rehabilitative therapies for chronic fatigue syndrome: a secondary mediation analysis of the PACE trial. The Lancet Psychiatry, 2, 141–52.

The acknowledgments include

We acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, excluding ARP, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, LV Clark, DL Cox, JC DeCesare, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks. This report is independent research partly arising from a doctoral research fellowship supported by the NIHR.

Fifteen of the authors of the 2011 Lancet PACE paper are no longer present, and another author has been added. The PACE Trial Management Group is again acknowledged, but there is no mention of the separate PACE trial group. We can't tell why there has been such a major reduction in the number of authors and acknowledgments or how it came about, or whether the people who had been dropped participated in the review of this paper. But what is obvious is that this is an exceedingly flawed mediation analysis crafted to a foregone conclusion. I'll say more about that in future blogs, but we can only speculate about how such bad publication practices made it through peer review.

This article is a crime against the practice of secondary mediation analysis. If I were a prospective author present in the discussions, I would have fled before it became a crime scene.

I am told I have over 350 publications, but I consider it vulgar for authors to keep track of exact numbers. There are many potential publications not included in this number because I declined authorship when I could not agree with the spin that others were trying to put on the reporting of the findings. In such instances, I exclude myself from review of the resulting manuscript because of the appearance of a conflict of interest. We can ponder how many of the large pool of past PACE authors refused authorship on this paper when it was offered and then declined to participate in subsequent peer review because of the appearance of a conflict of interest.

The 2015 Lancet Psychiatry long-term follow-up article

Sharpe, M., Goldsmith, K. A., Chalder, T., Johnson, A.L., Walker, J., & White, P. D. (2015). Rehabilitative treatments for chronic fatigue syndrome: long-term follow-up from the PACE trial. The Lancet Psychiatry, http://dx.doi.org/10.1016/S2215-0366(15)00317-X

The acknowledgments include

We gratefully acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, L V Clark, D L Cox, J C DeCesare, E Feldman, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks, and the King’s Clinical Trials Unit. We thank Hannah Baber for facilitating the long-term follow-up data collection.

Again, there are authors and acknowledgments missing from the earlier paper, and we're in the dark about how and why that happened and whether the missing persons were considered free enough of conflicts of interest to evaluate this article when it was in manuscript form. But as documented in a blog post at Mind the Brain, there were serious, obvious flaws in the conduct and reporting of the follow-up study. It is a crime against best practices for the proper conduct and reporting of clinical trials. And again we can only speculate about how it got through peer review.

… And grant reviews?

Where can UK granting agencies obtain independent peer review of past and future grants associated with the PACE trial? To take just one example, the 2015 Lancet Psychiatry secondary mediation analysis was funded in part by an NIHR doctoral research fellowship grant. The resulting paper has many fewer authors than the 2011 Lancet paper. Did everyone who was an author or mentioned in the acknowledgments of that paper exclude themselves from review of the grant? Who, then, would be left?

In Germany and the Netherlands, concerns about avoiding the appearance of conflicts of interest in obtaining independent peer review of grants have led to heavy reliance on expertise from outside the country. This does not imply any improprieties by experts within these countries, but rather the necessity of maintaining a strong appearance that vested interests have not unduly influenced grant review. Perhaps the situation apparent with the PACE trial suggests that journals and grant review panels within the UK might consider similar steps.

Contemplating the evidence against independent peer review

  • We have a mob of people as authors and mentions in acknowledgments. We have a huge conglomerate of institutions acknowledged.
  • We have some papers with blatant questionable research and reporting practices published in prestigious journals after ostensible peer review.
  • We are left in the dark about what exactly happened in peer review, but that the articles were adequately peer reviewed is a crucial part of their credibility.

What are we to conclude?

I think of what Edgar Allan Poe's wise character, Le Chevalier C. Auguste Dupin, would say. For those of you who don't know who he is:

Le Chevalier C. Auguste Dupin  is a fictional detective created by Edgar Allan Poe. Dupin made his first appearance in Poe’s “The Murders in the Rue Morgue” (1841), widely considered the first detective fiction story.[1] He reappears in “The Mystery of Marie Rogêt” (1842) and “The Purloined Letter” (1844)…

Poe created the Dupin character before the word detective had been coined. The character laid the groundwork for fictitious detectives to come, including Sherlock Holmes, and established most of the common elements of the detective fiction genre.

I think if we asked Dupin, he would say the danger is that the question is too fascinating to give up, but impossible to resolve without evidence we cannot access. We can blog, we can discuss this important question, but in the end we cannot answer it with certainty.

Sigh.

Uninterpretable: Fatal flaws in PACE Chronic Fatigue Syndrome follow-up study

Earlier decisions by the investigator group preclude valid long-term follow-up evaluation of CBT for chronic fatigue syndrome (CFS).

At the outset, let me say that I'm skeptical whether we can hold the PACE investigators responsible for the outrageous headlines that have been slapped on their follow-up study and on the comments they have made in interviews.

The Telegraph screamed

Chronic Fatigue Syndrome sufferers ‘can overcome symptoms of ME with positive thinking and exercise’

Oxford University has found ME is not actually a chronic illness

My own experience critiquing media interpretation of scientific studies suggests that neither researchers nor even journalists necessarily control shockingly inaccurate headlines placed on otherwise unexceptional media coverage. On the other hand, much distorted and exaggerated media coverage starts with statements made by researchers and by press releases from their institutions.

The one specific quote attributed to a PACE investigator is unfortunate because of its potential to be misinterpreted by professionals, persons who suffer from chronic fatigue syndrome, and the people around them affected by their functioning.

“It’s wrong to say people don’t want to get better, but they get locked into a pattern and their life constricts around what they can do. If you live within your limits that becomes a self-fulfilling prophesy.”

It suggests that willfulness causes CFS sufferers' impaired functioning. This is as ridiculous as the application of the discredited concept of fighting spirit to cancer patients' failure to triumph over their life-altering and life-threatening condition. Let's practice the principle of charity and assume this was not the intention of the PACE investigator, particularly when there is so much more for which we should hold them responsible.

Go here for a fuller evaluation, which I endorse, of the Telegraph coverage of the PACE follow-up study.

Having read the PACE follow-up study carefully, my assessment is that the data presented are uninterpretable. We can temporarily suspend critical thinking and some basic rules for conducting randomized controlled trials (RCTs), follow-up studies, and analyses of the subsequent data. Even if we do, we should reject some of the interpretations offered by the PACE investigators as unfairly spun to fit what is already a distorted, positive interpretation of the results.

It is important to note that the PACE follow-up study can only be as good as the original data it’s based on. And in the case of the PACE study itself, a recent longread critique by UC Berkeley journalism and public health lecturer David Tuller has arguably exposed such indefensible flaws that any follow-up is essentially meaningless. See it for yourself [1, 2, 3 ].

This week’s report of the PACE long term follow-up study and a commentary  are available free at the Lancet Psychiatry website after a free registration. I encourage everyone to download a copy before reading further. Unfortunately, some crucial details of the article are highly technical and some details crucial to interpreting the results are not presented.

I will provide practical interpretations of the most crucial technical details so that they are more understandable to the nonspecialist. Let me know where I fail.

To encourage proceeding with this longread, but to satisfy those who are unwilling or unable to proceed, I'll reveal my main points:

  • The PACE investigators sacrificed any possibility of meaningful long-term follow-up by breaking protocol and issuing patient testimonials about CBT before accrual was even completed.
  • This already fatal flaw was compounded with a loose recommendation for treatment after the intervention phase of the trial ended. The investigators provide poor documentation of which treatment was taken up by which patients and whether there was crossover in the treatment being received during follow up.
  • Investigators' attempts to correct methodological issues with statistical strategies lapse into voodoo statistics.
  • The primary outcome self-report variables are susceptible to manipulation, investigator preferences for particular treatments, peer pressure, and confounding with mental health variables.
  • The PACE investigators exploited ambiguities in the design and execution of their trial with self-congratulatory, confirmatory bias.

The Lancet Psychiatry summary/abstract of the article

Background. The PACE trial found that, when added to specialist medical care (SMC), cognitive behavioural therapy (CBT), or graded exercise therapy (GET) were superior to adaptive pacing therapy (APT) or SMC alone in improving fatigue and physical functioning in people with chronic fatigue syndrome 1 year after randomisation. In this pre-specified follow-up study, we aimed to assess additional treatments received after the trial and investigate long-term outcomes (at least 2 years after randomisation) within and between original treatment groups in those originally included in the PACE trial.

Findings Between May 8, 2008, and April 26, 2011, 481 (75%) participants from the PACE trial returned questionnaires. Median time from randomisation to return of long-term follow-up assessment was 31 months (IQR 30–32; range 24–53). 210 (44%) participants received additional treatment (mostly CBT or GET) after the trial; with participants originally assigned to SMC alone (73 [63%] of 115) or APT (60 [50%] of 119) more likely to seek treatment than those originally assigned to GET (41 [32%] of 127) or CBT (36 [31%] of 118; p<0·0001). Improvements in fatigue and physical functioning reported by participants originally assigned to CBT and GET were maintained (within-group comparison of fatigue and physical functioning, respectively, at long-term follow-up as compared with 1 year: CBT –2·2 [95% CI –3·7 to –0·6], 3·3 [0·02 to 6·7]; GET –1·3 [–2·7 to 0·1], 0·5 [–2·7 to 3·6]). Participants allocated to APT and to SMC alone in the trial improved over the follow-up period compared with 1 year (fatigue and physical functioning, respectively: APT –3·0 [–4·4 to –1·6], 8·5 [4·5 to 12·5]; SMC –3·9 [–5·3 to –2·6], 7·1 [4·0 to 10·3]). There was little evidence of differences in outcomes between the randomised treatment groups at long-term follow-up.

Interpretation The beneficial effects of CBT and GET seen at 1 year were maintained at long-term follow-up a median of 2·5 years after randomisation. Outcomes with SMC alone or APT improved from the 1 year outcome and were similar to CBT and GET at long-term follow-up, but these data should be interpreted in the context of additional therapies having being given according to physician choice and patient preference after the 1 year trial final assessment. Future research should identify predictors of response to CBT and GET and also develop better treatments for those who respond to neither.

Note the contradiction here, which will persist throughout the paper, the official Oxford University press release, quotes from the PACE investigators to the media, and the media coverage. On the one hand we are told:

Improvements in fatigue and physical functioning reported by participants originally assigned to CBT and GET were maintained…

Yet we are also told:

There was little evidence of differences in outcomes between the randomised treatment groups at long-term follow-up.

Which statement is to be given precedence? To the extent that features of a randomized trial have been preserved in the follow-up (which, as we will see, is not actually the case), a lack of between-group differences at follow-up should be given precedence over any persistence of change within groups from baseline. That is not a controversial point in interpreting clinical trials.

A statement about group differences at follow-up should precede and qualify any statement about within-group change during follow-up. Otherwise, why bother with an RCT in the first place?
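
To make the point concrete, here is a minimal simulation (hypothetical numbers, nothing to do with the actual PACE data) of a two-arm trial in which both arms improve simply with the natural course of the illness. The within-group changes look impressive in every arm, yet the between-group difference, the only contrast the randomization protects, is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150                                      # participants per arm (hypothetical)
baseline = rng.normal(70, 10, size=(2, n))   # a fatigue-like score; higher = worse
natural_improvement = 8                      # happens in BOTH arms over follow-up
treatment_effect = 0                         # the "active" arm confers no real benefit
followup = baseline - natural_improvement + rng.normal(0, 10, size=(2, n))
followup[0] -= treatment_effect              # arm 0 = "active", arm 1 = control

for arm, label in enumerate(["active ", "control"]):
    change = (followup[arm] - baseline[arm]).mean()
    print(f"{label}: mean within-group change = {change:+.1f} (negative = improvement)")

between = followup[0].mean() - followup[1].mean()
print(f"between-group difference at follow-up = {between:+.1f}")
# Both arms show a sizeable within-group improvement, yet the between-group
# difference is close to zero: in an RCT it is this randomized contrast, not
# the within-arm change, that speaks to treatment benefit.
```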

The statement in the Interpretation section of the summary/abstract has an unsubstantiated spin in favor of the investigators’ preferred intervention.

Outcomes with SMC alone or APT improved from the 1 year outcome and were similar to CBT and GET at long-term follow-up, but these data should be interpreted in the context of additional therapies having being given according to physician choice and patient preference after the 1 year trial final assessment.

If we’re going to be cautious and qualified in our statements, there are lots of other explanations for similar outcomes in the intervention and control groups that are more plausible. Simply put and without unsubstantiated assumptions, any group differences observed earlier have dissipated. Poof! Any advantages of CBT and GET are not sustained.

How the PACE investigators destroyed the possibility of an interpretable follow-up study

Neither the Lancet Psychiatry article nor any recent statements by the PACE investigators acknowledged how these investigators destroyed any possibility of analyses of meaningful follow-up data.

Before the intervention phase of the trial was even completed, even before accrual of patients was complete, the investigators published a newsletter in December 2008 directed at trial participants. One article appropriately reminds participants of the upcoming two-and-a-half-year follow-up, but then acknowledges difficulty accruing patients and notes that additional funding has been received from the MRC to extend recruitment. And then glowing testimonials appear on p. 3 of the newsletter about the effects of their intervention.

“Being included in this trial has helped me tremendously. (The treatment) is now a way of life for me, I can’t imagine functioning fully without it. I have nothing but praise and thanks for everyone involved in this trial.”

“I really enjoyed being a part of the PACE Trial. It helped me to learn more about myself, especially (treatment), and control factors in my life that were damaging. It is difficult for me to gauge just how effective the treatment was because 2007 was a particularly strained, strange and difficult year for me but I feel I survived and that the trial armed me with the necessary aids to get me through. It was also hugely beneficial being part of something where people understand the symptoms and illness and I really enjoyed this aspect.”

These testimonials are a horrible breach of protocol. Taken together with the acknowledgment of the difficulty accruing patients, the testimonials solicit expressions of gratitude and apply pressure on participants to endorse the trial by providing a positive account of their outcome. Some minimal effort is made to disguise the conditions from which the testimonials come. However, references to a therapist and, in the final quote above, to "control factors in my life that were damaging" leave no doubt that the CBT and GET favored by the investigators are having positive results.

Probably more than in most chronic illnesses, CFS sufferers turn to each other for support in the face of bewildering and often stigmatizing responses from the medical community. These testimonials represent a form of peer pressure for positive evaluations of the trial.

Any investigator group that would deliberately violate protocol in this manner deserves further scrutiny for other violations and threats to the validity of their results. I challenge defenders of the PACE study to cite other precedents for this kind of manipulation of clinical trials participants. What would they have thought if a drug company had done this for the evaluation of their medication?

The breakdown of randomization as further destruction of the interpretability of follow-up results

Returning to the Lancet Psychiatry article itself, note the following:

After completing their final trial outcome assessment, trial participants were offered an additional PACE therapy if they were still unwell, they wanted more treatment, and their PACE trial doctor agreed this was appropriate. The choice of treatment offered (APT, CBT, or GET) was made by the patient’s doctor, taking into account both the patient’s preference and their own opinion of which would be most beneficial. These choices were made with knowledge of the individual patient’s treatment allocation and outcome, but before the overall trial findings were known. Interventions were based on the trial manuals, but could be adapted to the patient’s needs.

Readers who are methodologically inclined might be interested in a paper in which I discuss incorporating patient preference in randomized trials, as well as another paper describing a clinical trial conducted with German colleagues in which we incorporated patient preference in the evaluation of antidepressants and psychotherapy for depression in primary care. Patient preference can certainly be accommodated in a clinical trial in ways that preserve the benefits of randomization, but not as the PACE investigators have done.

Following completion of the treatment to which particular patients were randomly assigned, the PACE trial offered a complex negotiation between patient and trial physician about further treatment. This represents a thorough breakdown of the benefits of a controlled randomized trial for the evaluation of treatments. Any focus on the long-term effects of initial randomization is sacrificed by what could be substantial departures from that randomization. Any attempts at statistical corrections will fail.

Of course, investigators cannot ethically prevent research participants from seeking additional treatment. But in the case of PACE, the investigators encouraged departures from the randomized treatment yet did not adequately take into account the decisions that were made. An alternative would have been to continue with the randomized treatment, taking into account and quantifying any cross over into another treatment arm.

Voodoo statistics in dealing with incomplete follow-up data

Between May 8, 2008, and April 26, 2011, 481 (75%) participants from the PACE trial returned questionnaires.

This is a very good rate of retention of participants for follow-up. The serious problem is that neither

  • loss to follow-up nor
  • whether there was further treatment, nor
  • whether there was cross over in the treatment received in follow-up versus the actual trial

is random.

Furthermore, any follow-up data is biased by the exhortation of the newsletter.

No statistical controls can restore the quality of the follow-up data to what would’ve been obtained with preservation of the initial randomization. Nothing can correct for the exhortation.

Nonetheless, the investigators tried to correct for loss of participants to follow-up and subsequent treatment. They described their effort in a technically complex passage, which I will subsequently interpret:

We assessed the differences in the measured outcomes between the original randomised treatment groups with linear mixed-effects regression models with the 12, 24, and 52 week, and long-term follow-up measures of outcomes as dependent variables and random intercepts and slopes over time to account for repeated measures.

We included the following covariates in the models: treatment group, trial stratification variables (trial centre and whether participants met the international chronic fatigue syndrome criteria,3 London myalgic encephalomyelitis criteria,4 and DSM IV depressive disorder criteria),18,19 time from original trial randomisation, time by treatment group interaction term, long-term follow-up data by treatment group interaction term, baseline values of the outcome, and missing data predictors (sex, education level, body-mass index, and patient self-help organisation membership), so the differences between groups obtained were adjusted for these variables.

Nearly half (44%; 210 of 479) of all the follow-up study participants reported receiving additional trial treatments after their final 1 year outcome assessment (table 2; appendix p 2). The number of participants who received additional therapy differed between the original treatment groups, with more participants who were originally assigned to SMC alone (73 [63%] of 115) or to APT (60 [50%] of 119) receiving additional therapy than those assigned to GET (41 [32%] of 127) or CBT (36 [31%] of 118; p<0·0001).

In the trial analysis plan we defined an adequate number of therapy sessions as ten of a maximum possible of 15. Although many participants in the follow-up study had received additional treatment, few reported receiving this amount (table 2). Most of the additional treatment that was delivered to this level was either CBT or GET.

The “linear mixed-effects regression models” are rather standard techniques for compensating for missing data by using all of the available data to estimate what is missing. The problem is that this approach assumes that any missing data are missing at random, an untested assumption that is unlikely to be true in this study.
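
As a toy illustration of why the missing-at-random assumption matters (simulated data, not the PACE model): if participants who are doing worse are less likely to return the follow-up questionnaire, the observed data overstate improvement, and no model that treats the missing values as resembling the observed ones can repair this.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_outcome = rng.normal(50, 15, n)   # say, a symptom score; higher = worse

# Missingness depends on the unobserved outcome itself (missing NOT at random):
# the worse the outcome, the less likely the questionnaire is returned.
p_return = 1 / (1 + np.exp((true_outcome - 50) / 10))
returned = rng.random(n) < p_return

print(f"true mean outcome:              {true_outcome.mean():.1f}")
print(f"mean among returned (observed): {true_outcome[returned].mean():.1f}")
print(f"response rate:                  {returned.mean():.0%}")
# The observed mean is systematically lower (looks "better") than the truth.
# Mixed-effects models and missing-data covariates help only if missingness
# is unrelated to the unobserved values, given what was actually measured.
```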

The inclusion of “covariates” is an effort to control for possible threats to the validity of the overall analyses by taking into account what is known about participants. There are numerous problems here. We can't be assured that the results are any more robust and reliable than what would be obtained without these efforts at statistical control. The best publishing practice is to make the unadjusted outcomes available and let readers decide. The greatest confidence in results is obtained when there is no difference between the adjusted and unadjusted analyses.

Methodologically inclined readers should consult an excellent recent article by clinical trial expert Helena Kraemer, A Source of False Findings in Published Research Studies: Adjusting for Covariates.

The effectiveness of statistical controls depends on certain assumptions being met about patterns of variation within the control variables. There is no indication that any diagnostic analyses were done to determine whether possible candidate control variables should be eliminated in order to avoid a violation of assumptions about the multivariate distribution of covariates. With so many control variables, spurious results are likely. Apparent results could change radically with the arbitrary addition or subtraction of control variables. See here for a further explanation of this problem.

We don't even know how this set of covariates, rather than some other set, was established. Notoriously, investigators often try out various combinations of control variables and present only those that make their trial look best. Readers are protected from this questionable research practice only by pre-specification of analyses before investigators know their results; and in an unblinded trial, researchers often know the trends long before they see the actual numbers.
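
A small simulation sketch of this concern (invented data, illustrative only): with a handful of candidate covariates there are dozens of possible adjustment sets, and even when the true treatment effect is zero the adjusted estimate drifts with each choice, leaving room to report the most flattering specification.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(2)
n = 80
group = rng.integers(0, 2, n)                  # randomized arm, no true effect
covs = rng.normal(size=(n, 5))                 # five candidate prognostic covariates
y = covs.sum(axis=1) + rng.normal(size=n)      # outcome driven by covariates only

def group_estimate(extra_cols):
    """OLS coefficient for the group indicator, adjusted for the given covariates."""
    X = np.column_stack([np.ones(n), group] + extra_cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

estimates = []
for k in range(6):
    for subset in combinations(range(5), k):
        estimates.append(group_estimate([covs[:, j] for j in subset]))

estimates = np.array(estimates)
print(f"{len(estimates)} covariate specifications tried")
print(f"adjusted 'treatment effect' ranges from {estimates.min():.2f} to {estimates.max():.2f}")
# The true effect is zero, yet the estimate moves with every choice of covariate
# set; without pre-specification, nothing stops an analyst from reporting the
# specification that looks best.
```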

See JP Simmons' hilarious demonstration that briefly listening to the Beatles' "When I'm Sixty-Four" can leave research participants appearing a year and a half younger than listening to "Kalimba", at least when investigators have free rein to manipulate analyses to get the results they want in a study without pre-registration of analytic plans.

Finally, the efficacy of complex statistical controls is widely overestimated and depends on unrealistic assumptions. First, it is assumed that all relevant variables that need to be controlled have been identified. Second, even when this unrealistic assumption has been met, it is assumed that all statistical control variables have been measured without error. When that is not the case, results can appear significant when they actually are not. See a classic paper by Andrew Phillips and George Davey Smith for further explanation of the problem of measurement error producing spurious findings.
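
A brief sketch of the measurement-error point (simulated data, not drawn from the PACE trial): adjusting for a noisily measured confounder removes only part of the confounding, so an exposure with no true effect can retain a substantial "adjusted" association.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
confounder = rng.normal(size=n)
exposure = confounder + rng.normal(size=n)                  # exposure tracks the confounder
outcome = confounder + rng.normal(size=n)                   # outcome caused by the confounder only
measured_conf = confounder + rng.normal(scale=1.0, size=n)  # noisy measurement of the confounder

def adjusted_coef(x, covariate, y):
    """OLS coefficient for x, adjusted for one covariate."""
    X = np.column_stack([np.ones(n), x, covariate])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

print(f"exposure effect, adjusting for the TRUE confounder:   {adjusted_coef(exposure, confounder, outcome):.3f}")
print(f"exposure effect, adjusting for the NOISY measurement: {adjusted_coef(exposure, measured_conf, outcome):.3f}")
# With the true confounder the exposure coefficient is ~0; with the noisy
# version a substantial spurious association survives "adjustment".
```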

What the investigators claim the study shows

In an intact clinical trial, investigators can analyze outcome data with and without adjustments and readers can decide which to emphasize. However, this is far from an intact clinical trial and these results are not interpretable.

The investigators nonetheless make the following claims in addition to what was said in the summary/abstract.

In the results the investigators state

The improvements in fatigue and physical functioning reported by participants allocated to CBT or GET at their 1 year trial outcome assessment were sustained.

This was followed by

The improvements in impairment in daily activities and in perceived change in overall health seen at 1 year with these treatments were also sustained for those who received GET and CBT (appendix p 4). Participants originally allocated to APT reported further improvements in fatigue, physical functioning, and impairment in daily activities from the 1 year trial outcome assessment to long-term follow-up, as did those allocated to SMC alone (who also reported further improvements in perceived change in overall health; figure 2; table 3; appendix p 4).

If the investigators are taking their RCT design seriously, they should give precedence to the null findings for group differences at follow-up. They should not be emphasizing the sustaining of benefits within the GET and CBT groups.

The investigators increase their positive spin on the trial in the opening sentence of the Discussion

The main finding of this long-term follow-up study of the PACE trial participants is that the beneficial effects of the rehabilitative CBT and GET therapies on fatigue and physical functioning observed at the final 1 year outcome of the trial were maintained at long-term follow-up 2·5 years from randomisation.

This is incorrect. The main finding is that any reported advantages of CBT and GET at the end of the trial were lost by long-term follow-up. Because an RCT is designed to focus on between-group differences, the statement about the sustaining of benefits is post hoc.

The Discussion further states

In so far as the need to seek additional treatment is a marker of continuing illness, these findings support the superiority of CBT and GET as treatments for chronic fatigue syndrome.

This makes the unwarranted and self-serving assumption that treatment choice was mainly driven by the need for further treatment, when decision-making was contaminated by the investigators' preferences, as conveyed in the newsletter. Note also that CBT was a novel treatment for research participants and more likely to be chosen on the basis of novelty alone, in the face of overall modest improvement rates for the trial and the lack of improvement on objective measures. Whether or not the investigators designate a limited range of self-report measures as primary, participants' decision-making may be driven by other, more objective considerations.

Regardless, investigators have yet to present any data concerning how decisions for further treatment were made, if such data exist.

The investigators further congratulate themselves with

There was some evidence from an exploratory analysis that improvement after the 1 year trial final outcome was not associated with receipt of additional treatment with CBT or GET, given according to need. However this finding must be interpreted with caution because it was a post-hoc subgroup analysis that does not allow the separation of patient and treatment factors that random allocation provides.

However, why is this analysis singled out as exploratory and to be interpreted with caution because it is a post-hoc subgroup analysis, when similarly post-hoc subgroup analyses are presented elsewhere without such caution?

The investigators finally get around to depicting what should be their primary finding, but do so in a dismissive fashion.

Between the original groups, few differences in outcomes were seen at long-term follow-up. This convergence in outcomes reflects the observed improvement in those originally allocated to SMC and APT, the possible reasons for which are listed above.

The discussion then discloses a limitation of the study that should have informed earlier presentation and discussion of results

First, participant response was incomplete; some outcome data were missing. If these data were not missing at random it could have led to either overestimates or underestimates of the actual differences between the groups.

This understates the implausibility of the assumption that data are missing at random, as well as the problems introduced by the complex attempts to control confounds statistically.

And then there is an unsubstantiated statement that is sure to upset persons who suffer from CFS and those who care for them.

the outcomes were all self-rated, although these are arguably the most pertinent measures in a condition that is defined by symptoms.

I could double the length of this already lengthy blog post if I fully discussed this. But let me raise a few issues.

  1. The self-report measures do not necessarily capture subjective experience, only forced choice responses to a limited set of statements.
  2. One of the two primary outcome measures, the physical functioning scale of the SF-36, requires forced-choice responses to a limited set of statements selected for general utility across all mental and physical conditions. Despite its wide use, the SF-36 suffers from problems of internal consistency and confounding with mental health variables. Anyone inclined to get excited about it should examine its items and response options closely. Ask yourself: do differences in scores reliably capture clinically and personally significant changes in the experience and functioning associated with the full range of symptoms of CFS?
  3. The validity of the other primary outcome measure, the Chalder Fatigue Scale, depends heavily on research conducted by this investigator group, and its sensitivity to change in objective measures of functioning has been inadequately validated.
  4. Such self-report measures are inexorably confounded with morale and nonspecific mental health symptoms, with a large, unwanted correlation with the tendency to endorse negative self-statements that is not necessarily correlated with objective measures (a toy illustration follows this list).
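
A toy illustration of the confounding concern in point 4 (simulated data; the variable names are hypothetical and are not PACE measures): if a self-report fatigue score partly reflects mood, a treatment that lifts mood but leaves objective functioning untouched will still appear to reduce self-reported fatigue.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000
treated = rng.integers(0, 2, n)

objective_function = rng.normal(400, 50, n)       # e.g., a walking-distance-like measure: no treatment effect
mood = rng.normal(0, 1, n) + 0.6 * treated        # therapy lifts mood
self_report_fatigue = 20 - 0.01 * objective_function - 2.0 * mood + rng.normal(0, 1, n)

for name, var in [("objective measure   ", objective_function),
                  ("self-report fatigue ", self_report_fatigue)]:
    diff = var[treated == 1].mean() - var[treated == 0].mean()
    print(f"{name}: standardized treated-vs-control difference = {diff / var.std():+.2f}")
# The self-report endpoint shows an apparent benefit of roughly half a standard
# deviation, driven entirely by its correlation with mood, while the objective
# measure shows essentially none.
```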

Although it was a long time ago, I recall well my first meeting with Professor Simon Wessely. It was at a closed retreat sponsored by NIH to develop a consensus about the assessment of fatigue by self-report questionnaire. I listened to a lot of nonsense that was not well thought out. Then, I presented slides demonstrating a history of failed attempts to distinguish somatic complaints from mental health symptoms by self-report. Much later, this would become my “Stalking bears, finding bear scat in the woods” slide show.

But then Professor Wessely arrived at the meeting late, claiming to be grumbly because of jet lag and flight delays. Without slides and with devastating humor, he upstaged me in completing the demolition of any illusions that we could create more refined self-report measures of fatigue.

I wonder what he would say now.

But alas, people who suffer from CFS have to contend with a lot more than fatigue. Just ask them.

[To be continued later if there is interest in my doing so. If there is, I will discuss the disappearance of objective measures of functioning from the PACE study, and you will find out why you should find some 3-D glasses if you are going to search for reports of these outcomes.]