The NIMH issued a press release about the publication in JAMA Psychiatry of the results of the ED-SAFE study, the largest suicide intervention trial ever conducted in emergency departments (EDs) in the US.
“We expect that EDs are capable of helping individuals at risk for suicide attempts. Earlier ED-SAFE study findings showed that brief universal screening could improve detection of more individuals at risk,” said Jane Pearson, Ph.D., chair of the Suicide Research Consortium at the NIMH. “These recent findings show that if ED care also includes further assessment, safety planning, and telephone-based support after discharge, there is a significant reduction in later suicide attempts among adults.”
“We were happy that we were able to find these results,” said lead author Ivan Miller, Ph.D., Professor of Psychiatry and Human Behavior at Brown University, Providence, Rhode Island. “We would like to have had an even stronger effect, but the fact that we were able to impact attempts with this population and with a relatively limited intervention is encouraging.”
The recently revamped website for the JAMA network of journals provided updated reports of the heavy traffic being drawn in by the article.
The new Key Points feature for important articles gave a succinct, more quickly digestible summary of the study than the similarly spun abstract.
Question Do emergency department (ED)–initiated interventions reduce subsequent suicidal behavior among a sample of high-risk ED patients?
Findings In this multicenter study of 1376 ED patients with recent suicide attempts or ideation, compared with treatment as usual, an intervention consisting of secondary suicide risk screening by the ED physician, discharge resources, and post-ED telephone calls focused on reducing suicide risk resulted in a 5% absolute decrease in the proportion of patients subsequently attempting suicide and a 30% decrease in the total number of suicide attempts over a 52-week follow-up period.
Meaning For ED patients at risk for suicide, a multifaceted intervention can reduce future suicidal behavior.
The abstract elaborates:
Results A total of 1376 participants were recruited, including 769 females (55.9%) with a median (interquartile range) age of 37 (26-47) years. A total of 288 participants (20.9%) made at least 1 suicide attempt, and there were 548 total suicide attempts among participants. There were no significant differences in risk reduction between the TAU and screening phases (23% vs 22%, respectively). However, compared with the TAU phase, patients in the intervention phase showed a 5% absolute reduction in suicide attempt risk (23% vs 18%), with a relative risk reduction of 20%. Participants in the intervention phase had 30% fewer total suicide attempts than participants in the TAU phase. Negative binomial regression analysis indicated that the participants in the intervention phase had significantly fewer total suicide attempts than participants in the TAU phase (incidence rate ratio, 0.72; 95% CI, 0.52-1.00; P = .05) but no differences between the TAU and screening phases (incidence rate ratio, 1.00; 95% CI, 0.71-1.41; P = .99).
I have the benefit of having read the entire article a number of times, but there are some notable statistics being reported in the abstract and some crucial things being left out.
The phase of the study that involved only introducing screening into treatment as usual (TAU) had no effect on suicide attempts (P = .99). The claim of an effect of the more extensive intervention on suicide attempts depends on multivariate analyses with a confidence interval that includes 1.0 (incidence rate ratio, 0.72; 95% CI, 0.52-1.00; P = .05).
From JAMA Psychiatry
Results are quite weak, at best. Pairwise comparisons are being reported: first screening versus TAU, then the more extensive intervention versus TAU. Missing is any report of an overall test of whether there is at least one significant pairwise difference among the groups. Obtaining such a significant difference would justify a post hoc look at the specific pairs. Given what we have already been told in the abstract, it is safe to assume there was no overall effect. This is a null trial. If we stuck to a priori statistical plans, we would have to say that a phased-in, comprehensive intervention with suicidal patients presenting in an emergency room failed to impact subsequent suicide attempts.
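The borderline character of the intervention result can be seen by back-calculating the Wald test from the reported incidence rate ratio and its confidence interval. This is only an approximation (the published analysis was a negative binomial regression, and rounding in the reported values introduces some slack), but it shows how close the result sits to the conventional threshold:

```python
import math

# Reported: incidence rate ratio 0.72, 95% CI 0.52-1.00
irr, lo, hi = 0.72, 0.52, 1.00

# On the log scale, a 95% CI spans roughly +/- 1.96 standard errors
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)

# Wald z statistic and two-sided p-value
z = math.log(irr) / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"z = {z:.2f}, p = {p:.3f}")  # p lands right at the .05 boundary
```

A confidence interval whose upper bound touches 1.00 is, by construction, a test that just barely reaches P = .05; the result could not be any more marginal.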
These findings contradict the statement of the NIMH Chair of the Suicide Research Consortium.
I know, it is arbitrary to make go/no go decisions based on an arbitrary level of significance, p< .05 or whatever. Yet, the implement/don’t implement and evidence-supported/not evidence-supported distinctions are binary. The best we can do is to set criteria based on a power analysis and avoid switching criteria when we don’t obtain the results that we would have liked.
We can stop here in our critique with the usual messages about avoiding the spinning of results to reach politically expedient and socially satisfying, even if inaccurate, conclusions. Once again, the results of a trial are being exaggerated to justify a conclusion to which the researchers and policy makers are already committed.
But there is a lot more to be learned from this report of a large and historically significant trial.
Who was enrolled and what treatments were offered?
A total of 1376 adult participants were selected from persons presenting to 8 emergency departments across 7 states with a suicide attempt or suicidal ideation within the week prior to the ED visit. Patients under 18 were excluded.
In the TAU phase, participants were treated according to the usual and customary care at each site, serving as the control for the subsequent study phases.
In the screening phase, sites implemented clinical protocols with universal suicide risk screening (the Patient Safety Screener) for all ED patients.
In the intervention phase, in addition to universal screening, all sites implemented a 3-component intervention: (1) a secondary suicide risk screening designed for ED physicians to evaluate suicide risk following an initial positive screen, (2) the provision of a self-administered safety plan and information to patients by nursing staff, and (3) a series of telephone calls to the participant, with the optional involvement of their significant other (SO), for 52 weeks following the index ED visit.
The outcome was the proportion of patients who made a suicide attempt and the total number of suicide attempts occurring during the 52-week follow-up period.
Overall, of 1376 participants, 288 (20.9%) made at least 1 suicide attempt during the 12-month period. In the TAU phase, 114 of 497 participants (22.9%) made a suicide attempt, compared with 81 of 377 participants (21.5%) in the screening phase and 92 of 502 participants (18.3%) in the intervention phase. Five attempts were fatal, with fatalities observed in the TAU phase (n = 2) and intervention phase (n = 3).
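The overall test that the abstract omits can be approximated from these raw counts. This is a crude sketch: a simple chi-square on the 3×2 table of attempters versus non-attempters by phase, ignoring the trial's sequential design and any covariate adjustment in the published models:

```python
from scipy.stats import chi2_contingency

# Attempters vs non-attempters by phase, from the reported counts
table = [
    [114, 497 - 114],  # TAU
    [81,  377 - 81],   # screening
    [92,  502 - 92],   # intervention
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
```

On this unadjusted calculation the omnibus test does not approach significance, which is consistent with the interpretation that this is a null trial and that the pairwise comparisons should not have been pursued.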
Suicide attempts can be interpreted as an outcome in themselves or as a surrogate outcome for deaths by suicide. Despite the substantial sample size, there is no way that this study could have demonstrated a significant reduction in deaths by suicide. That reflects the infrequency of death by suicide, even in such a high-risk population. The observed ratio of 57.6 suicide attempts per death by suicide implies a higher case fatality than what is typically observed (usually in the range of 100 or so attempts per death). This probably reflects the high-risk nature of this population, as well as the methodology for determining the seriousness of suicide attempts.
More evidence that screening for suicide doesn’t improve outcomes
This study adds to an accumulating body of evidence that routine screening for suicide is neither efficient nor effective in reducing suicides.
Previously, I blogged about the SEYLE trial of a school-based intervention to prevent teen suicide. It was a large RCT, but failed to demonstrate that screening affected the likelihood of a suicide attempt. The null findings for the Screening by Professionals programme (ProfScreen) of SEYLE are generally downplayed.
Risk scales following self-harm have limited clinical utility and may waste valuable resources. Most scales performed no better than clinician or patient ratings of risk. Some performed considerably worse. Positive predictive values were modest. In line with national guidelines, risk scales should not be used to determine patient management or predict self-harm.
Nonetheless there is:
The Joint Commission. Detecting and treating suicide ideation in all settings. Sentinel Event Alert. 2016;(56):1-7.
Since the alert, many hospitals have implemented suicide risk screening without the benefit of evidence-based tools and clinical pathways, potentially increasing the risk of underdetection (ie, false-negatives) or overburdening limited mental health resources with false-positives.
Most patients in the ED-SAFE study were not recorded as receiving the intervention as intended.
Medical record review indicated that 449 of 502 participants (89.4%) had received a suicide risk assessment from their physician, but only 17 (3.9%) had documentation that the ED-SAFE standardized secondary screening was used.
Among those participants who completed the initial CLASP call, 114 (37.4%) reported having received a written safety plan in the ED.
You cannot fault these researchers for failing to make a concerted effort to train personnel at the participating sites or to systematically implement the study protocol.
A wealth of evidence suggests that it is difficult to implement formal screening with self-report and interviewer-completed checklists in medical settings. Most medical personnel find such instruments intrusive, and they are not efficient anyway. Alex Mitchell and I documented this in our book, Screening for Depression in Clinical Practice: An Evidence-Based Guide.
In both the screening and intervention phases, it was difficult to get adherence to the protocol, in part because patients entering EDs are not necessarily cooperative. But more importantly, EDs in this study were not well connected to the specialty mental health services needed for timely follow-up. The accompanying editorial notes:
Although EDs have been conceptualized as key sites to identify and treat individuals at high risk for suicide,8 the troubling reality is that mental health resources are not available in most American EDs, and few universally screen for suicide risk.9,10 Notably, participating ED-SAFE study sites did not have psychiatric services within or adjacent to the ED in order to increase generalizability. Although time constraints, inadequate training, and lack of proper screening instruments have been cited as reasons clinicians do not routinely screen for suicide risk,8,10,11 the absence of psychiatric services in most EDs reflects disproportionately low cultural expectations of the ED in addressing potentially life-threatening mental health crises.
The realignment and reallocation of resources needed to address this practical and structural problem are not easily obtained. Clinical instances requiring quick referral and follow-up of a seriously suicidal patient are relatively infrequent. It is difficult to keep the personnel and resources unencumbered until they are needed, especially in the face of other pressing, competing demands.
How will ED-SAFE be cited and entered into the accumulating literature on the difficulty of achieving reductions in lives lost to suicide?
The article reports the Number Needed to Treat (NNT) for patients receiving the comprehensive ED-SAFE intervention:
The NNT to prevent future suicidal behavior ranged between 13 and 22. This level of risk reduction compares favorably with other interventions to prevent major health issues, including statins to prevent heart attack (NNT = 104),23 antiplatelet therapy for acute ischemic stroke (NNT = 143),24 and vaccines to prevent influenza in elderly individuals (NNT = 20).25
But if the intervention is not effective, NNTs are misleading.
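The arithmetic behind an NNT makes the point concrete: it is simply the reciprocal of the absolute risk reduction, so an unreliable risk difference yields an equally unreliable NNT. A sketch using the proportions reported earlier in the article:

```python
# Absolute risk reduction from the reported attempt proportions
p_tau = 114 / 497           # 22.9% attempted in the TAU phase
p_int = 92 / 502            # 18.3% attempted in the intervention phase
arr = p_tau - p_int         # absolute risk reduction

nnt = 1 / arr               # number needed to treat
print(f"ARR = {arr:.3f}, NNT = {nnt:.1f}")
```

An NNT of about 22 matches the upper end of the range the authors report, but the NNT inherits all the statistical uncertainty of the underlying risk difference; if that difference is not reliable, quoting a favorable NNT simply relocates the spin.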
If the NIMH press release is taken as a sign, the ED-SAFE intervention will be interpreted as impressively effective. However, despite some spinning, the ED-SAFE researchers present the problems they encountered and the results they obtained in a way that the formidable obstacles to such a well-conceived effort succeeding are apparent. It would be unfortunate if the lessons to be learned are missed.
A chance to test your rules of thumb for quickly evaluating clinical trials of alternative or integrative medicine in prestigious journals.
A chance to increase your understanding of the importance of well-defined control groups and blinding in evaluating the risk of bias of clinical trials.
A chance to understand the difference between merely evidence-based treatments versus science-based treatments.
Lessons learned can be readily applied to many wasteful evaluations of psychotherapy with shared characteristics.
A press release from the University of Michigan about a study of acupressure for fatigue in cancer patients was churnaled – echoed – throughout the media. It was reproduced dozens of times, with little more than an editor’s title change from one report to the next.
Fortunately, the article that inspired all the fuss was freely available from the prestigious JAMA: Oncology. But when I gained access, I quickly saw that it was not worth my attention, based on what I already knew or, as I often say, my prior probabilities. Rules of thumb is a good enough term.
So the article became another occasion for us to practice our critical appraisal skills, including, importantly, being able to make reliable and valid judgments that some attention in the media is worth dismissing out of hand, even when tied to an article in a prestigious medical journal.
Zick SM, Sen A, Wyatt GK, Murphy SL, Arnedt J, Harris RE. Investigation of 2 Types of Self-administered Acupressure for Persistent Cancer-Related Fatigue in Breast Cancer Survivors: A Randomized Clinical Trial. JAMA Oncol. Published online July 07, 2016. doi:10.1001/jamaoncol.2016.1867.
All I needed to know was contained in a succinct summary at the Journal website:
This is a randomized clinical trial (RCT) in which two active treatments that lacked credible scientific mechanisms were predictably shown to be better than routine care that lacked positive expectations and support. A primary outcome assessed by subjective self-report amplified the illusory effectiveness of the treatments.
The original research appeared in a prestigious peer-reviewed journal published by the American Medical Association, not a disreputable journal on Beall’s List of Predatory Publishers.
Maybe this means publication in a peer-reviewed prestigious journal is insufficient to erase our doubts about the validity of claims.
The original research was performed with a $2.65 million peer-reviewed grant from the National Cancer Institute.
Maybe NIH is wasting scarce money on useless research.
What is acupressure?
According to the article
Acupressure, a method derived from traditional Chinese medicine (TCM), is a treatment in which pressure is applied with fingers, thumbs, or a device to acupoints on the body. Acupressure has shown promise for treating fatigue in patients with cancer,23 and in a study24 of 43 cancer survivors with persistent fatigue, our group found that acupressure decreased fatigue by approximately 45% to 70%. Furthermore, acupressure points termed relaxing (for their use in TCM to treat insomnia) were significantly better at improving fatigue than another distinct set of acupressure points termed stimulating (used in TCM to increase energy).24 Despite such promise, only 5 small studies24– 28 have examined the effect of acupressure for cancer fatigue.
Chairman Mao is quoted as saying “Even though I believe we should promote Chinese medicine, I personally do not believe in it. I don’t take Chinese medicine.”
Alan Levinovitz, author of the Slate article further argues:
In truth, skepticism, empiricism, and logic are not uniquely Western, and we should feel free to apply them to Chinese medicine.
After all, that’s what Wang Qingren did during the Qing Dynasty when he wrote Correcting the Errors of Medical Literature. Wang’s work on the book began in 1797, when an epidemic broke out in his town and killed hundreds of children. The children were buried in shallow graves in a public cemetery, allowing stray dogs to dig them up and devour them, a custom thought to protect the next child in the family from premature death. On daily walks past the graveyard, Wang systematically studied the anatomy of the children’s corpses, discovering significant differences between what he saw and the content of Chinese classics.
And nearly 2,000 years ago, the philosopher Wang Chong mounted a devastating (and hilarious) critique of yin-yang five phases theory: “The horse is connected with wu (fire), the rat with zi (water). If water really conquers fire, [it would be much more convincing if] rats normally attacked horses and drove them away. Then the cock is connected with ya (metal) and the hare with mao (wood). If metal really conquers wood, why do cocks not devour hares?” (The translation of Wang Chong and the account of Wang Qingren come from Paul Unschuld’s Medicine in China: A History of Ideas.)
A 10-week randomized, single-blind trial comparing self-administered relaxing acupressure with stimulating acupressure once daily for 6 weeks vs usual care with a 4-week follow-up was conducted. There were 5 research visits: at screening, baseline, 3 weeks, 6 weeks (end of treatment), and 10 weeks (end of washout phase). The Pittsburgh Sleep Quality Index (PSQI) and Long-Term Quality of Life Instrument (LTQL) were administered at baseline and weeks 6 and 10. The Brief Fatigue Inventory (BFI) score was collected at baseline and weeks 1 through 10.
Note that the trial was “single-blind.” It compared two forms of acupressure, relaxing versus stimulating. Only the patient was blinded to which of these two treatments was being provided, and patients clearly knew whether or not they were randomized to usual care. The providers were not blinded; they were carefully supervised by the investigators, who gave them feedback on their performance.
The combination of providers not being blinded, patients knowing whether they were randomized to routine care, and subjective self-report outcomes together are the makings of a highly biased trial.
Usual care was defined as any treatment women were receiving from health care professionals for fatigue. At baseline, women were taught to self-administer acupressure by a trained acupressure educator.29 The 13 acupressure educators were taught by one of the study’s principal investigators (R.E.H.), an acupuncturist with National Certification Commission for Acupuncture and Oriental Medicine training. This training included a 30-minute session in which educators were taught point location, stimulation techniques, and pressure intensity.
Relaxing acupressure points consisted of yin tang, anmian, heart 7, spleen 6, and liver 3. Four acupoints were performed bilaterally, with yin tang done centrally. Stimulating acupressure points consisted of du 20, conception vessel 6, large intestine 4, stomach 36, spleen 6, and kidney 3. Points were administered bilaterally except for du 20 and conception vessel 6, which were done centrally (eFigure in Supplement 2). Women were told to perform acupressure once per day and to stimulate each point in a circular motion for 3 minutes.
Note that the control/comparison condition was an ill-defined usual care in which it is not clear that patients received any attention and support for their fatigue. As I have discussed before, we need to ask just what was being controlled by this condition. There is no evidence presented that patients had similar positive expectations and felt similar support in this condition to what was provided in the two active treatment conditions. There is no evidence of equivalence of time with a provider devoted exclusively to the patients’ fatigue. Unlike patients assigned to usual care, patients assigned to one of the acupressure conditions received a ritual delivered with enthusiasm by a supervised educator.
Note the absurdity of the naming of the acupressure points, for which the authority of traditional Chinese medicine is invoked, not evidence. This absurdity is reinforced by a look at a diagram of acupressure points provided as a supplement to the article.
Among the many problems with “acupuncture pressure points” is that sham stimulation generally works as well as actual stimulation, especially when the sham is delivered with appropriate blinding of both providers and patients. Another is that targeting places of the body that are not defined as acupuncture pressure points can produce the same results. For more elaborate discussion see Can we finally just say that acupuncture is nothing more than an elaborate placebo?
Worth looking back at credible placebo versus weak control condition
In a recent blog post I discussed an unusual study in the New England Journal of Medicine that compared an established active treatment for asthma to two credible control conditions: an inert spray that was indistinguishable from the active treatment, and acupuncture. Additionally, the study involved a no-treatment control. For subjective self-report outcomes, the active treatment, the inert spray, and acupuncture were indistinguishable, but all were superior to the no-treatment control condition. However, for the objective outcome measure, the active treatment was more effective than all three comparison conditions. The message is that credible placebo control conditions are superior to control conditions lacking positive expectations, including no treatment and, I would argue, ill-defined usual care that lacks positive expectations. A further message is ‘beware of relying on subjective self-report measures to distinguish between active treatments and placebo control conditions’.
At week 6, the change in BFI score from baseline was significantly greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.6 [1.5] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.1 [1.6] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P = .29). At week 10, the change in BFI score from baseline was greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.3 [1.4] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.0 [1.5] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P > .99) (Figure 2). The mean percentage fatigue reductions at 6 weeks were 34%, 27%, and −1% in relaxing acupressure, stimulating acupressure, and usual care, respectively.
These are entirely expectable results. Nothing new was learned in this study.
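For a sense of scale, the reported means and SDs allow a rough back-calculation of the between-group effect size at week 6. This is a hedged approximation: it pools the two reported SDs assuming roughly equal group sizes and ignores any adjustment in the published model:

```python
import math

# Week-6 change in BFI score (mean, SD), as reported in the article
relaxing = (-2.6, 1.5)      # relaxing acupressure
usual    = (-1.1, 1.6)      # usual care

# Cohen's d with a simple pooled SD
pooled_sd = math.sqrt((relaxing[1] ** 2 + usual[1] ** 2) / 2)
d = (usual[0] - relaxing[0]) / pooled_sd
print(f"d = {d:.2f}")
```

An apparent effect size close to 1.0 from an unblinded comparison against an ill-defined usual-care condition is exactly the pattern expected from nonspecific effects, not evidence of a specific treatment effect.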
The bottom line for this study is that there was absolutely nothing to be gained by comparing an inert placebo condition to another inert placebo condition to an uninformative condition without clear evidence that the control condition offered control of nonspecific factors – positive expectations, support, and attention. This was a waste of patient time and effort, as well as government funds, and produced results that are potentially misleading to patients. Namely, the results are likely to be misinterpreted as showing that acupressure is an effective, evidence-based treatment for cancer-related fatigue.
How the authors explained their results
Why might both acupressure arms significantly improve fatigue? In our group’s previous work, we had seen that cancer fatigue may arise through multiple distinct mechanisms.15 Similarly, it is also known in the acupuncture literature that true and sham acupuncture can improve symptoms equally, but they appear to work via different mechanisms.40 Therefore, relaxing acupressure and stimulating acupressure could elicit improvements in symptoms through distinct mechanisms, including both specific and nonspecific effects. These results are also consistent with TCM theory for these 2 acupoint formulas, whereby the relaxing acupressure acupoints were selected to treat insomnia by providing more restorative sleep and improving fatigue and the stimulating acupressure acupoints were chosen to improve daytime activity levels by targeting alertness.
How could acupressure lead to improvements in fatigue? The etiology of persistent fatigue in cancer survivors is related to elevations in brain glutamate levels, as well as total creatine levels in the insula.15 Studies in acupuncture research have demonstrated that brain physiology,41 chemistry,42 and function43 can also be altered with acupoint stimulation. We posit that self-administered acupressure may have similar effects.
Among the fallacies of the authors’ explanation is the key assumption that they are dealing with a specific, active treatment effect rather than a nonspecific placebo intervention. Supposed differences between relaxing and stimulating acupressure arise in trials with a high risk of bias due to unblinded providers of treatment and inadequate control/comparison conditions. ‘There is no there there’ to be explained, to paraphrase a quote attributed to Gertrude Stein.
How much did this project cost?
According to the NIH Research Portfolio Online Reporting Tools website, this five-year project involved support by the federal government of $2,265,212 in direct and indirect costs. The NCI program officer for investigator-initiated R01CA151445 is Ann O’Mara, who serves in a similar role for a number of integrative medicine projects.
How can expenditure of this money be justified for determining whether so-called stimulating acupressure is better than relaxing acupressure for cancer-related fatigue?
Consider what could otherwise have been done with these monies.
Evidence-based versus science based medicine
Proponents of unproven “integrative cancer treatments” can claim on the basis of this study that acupressure is an evidence-based treatment. Future Cochrane Collaboration reviews may even cite this study as evidence for this conclusion.
I normally label myself as an evidence-based skeptic. I require evidence for claims of the efficacy of treatments and am skeptical of the quality of the evidence that is typically provided, especially when it comes from enthusiasts of particular treatments. However, in other contexts, I describe myself as a science-based medicine skeptic. The stricter criterion for this term is that not only do I require evidence of efficacy for treatments, I also require evidence for the plausibility of the claimed mechanisms. Acupressure might be defined by some as an evidence-based treatment, but it is certainly not a science-based treatment.
The efficacy of psychotherapy is often overestimated because of overreliance on RCTs that involve inadequate comparison/control groups. Adequately powered studies of the comparative efficacy of psychotherapy that include active comparison/control groups are infrequent and uniformly provide lower estimates of just how efficacious psychotherapy is. Most psychotherapy research includes subjective patient self-report measures as the primary outcomes, although some RCTs provide independent, blinded interview measures. A dependence on subjective patient self-report measures amplifies the bias associated with inadequate comparison/control groups.
However, there is a broader relevance to trials of psychotherapy provided to medically ill patients with a comparison/control condition that is inadequate in terms of positive expectations and support, along with a reliance on subjective patient self-report outcomes. The relevance is particularly important to note for conditions in which objective measures are appropriate, but not obtained, or obtained but suppressed in reports of the trial in the literature.
The most interesting things to be learned from a recent clinical trial comparing mindfulness-based stress reduction to cognitive behavior therapy for chronic back pain are not what the authors intend.
Noticing that some key information is missing from the study illustrates why we don’t need more studies like it.
We need more studies of mindfulness-based therapies with meaningful comparison/control groups.
We need evidence that patients assigned to mindfulness-based treatments actually practice mindfulness in their everyday lives.
We need to demonstrate that any efficacy of mindfulness depends upon patients assigned to it actually showing up.
We need to be alert how boundaries of the concept of mindfulness-based therapies are expanding. Reviewers should be cautious in integrating results from different studies claiming to evaluate “mindfulness.” There is growing clinical heterogeneity – different interventions, sometimes with very different components–that should be distinguished.
The importance of this study was underscored by (1) an accompanying editorial commentary, (2) free access and continuing education credit for reading it, and (3) three multimedia links – a JAMA Report on the study and audio and video interviews with the author.
My recent discussions of articles in JAMA network journals that are accompanied by editorial commentaries have contemplated why particular studies were chosen for JAMA journals and the conflicts of interest that characterize editorial commentaries. This discussion will be somewhat different.
This commentary is definitely written by authors who have reasons to promote mindfulness. The commentary ends with a predictable non sequitur:
High-quality studies such as the clinical trial by Cherkin et al create a compelling argument for ensuring that an evidence-based health care system should provide access to affordable mind-body therapies.
Not exactly, if you stick to the evidence.
I will eventually comment on my usual questions of:
Why was this article published in a prestigious, generalist medical journal?
Why was it accompanied by an invited editorial commentary?
Why were the particular authors chosen for the commentary?
But the commentary isn’t that bad. It makes some reasonable points that might be overlooked. I will mainly focus on the article itself.
Importance. Mindfulness-based stress reduction (MBSR) has not been rigorously evaluated for young and middle-aged adults with chronic low back pain.
Objective. To evaluate the effectiveness for chronic low back pain of MBSR vs cognitive behavioral therapy (CBT) or usual care.
Design, Setting, and Participants. Randomized, interviewer-blind, clinical trial in an integrated health care system in Washington State of 342 adults aged 20 to 70 years with chronic low back pain enrolled between September 2012 and April 2014 and randomly assigned to receive MBSR (n = 116), CBT (n = 113), or usual care (n = 113).
Interventions. CBT (training to change pain-related thoughts and behaviors) and MBSR (training in mindfulness meditation and yoga) were delivered in 8 weekly 2-hour groups. Usual care included whatever care participants received.
Main Outcomes and Measures. Coprimary outcomes were the percentages of participants with clinically meaningful (≥30%) improvement from baseline in functional limitations (modified Roland Disability Questionnaire [RDQ]; range, 0-23) and in self-reported back pain bothersomeness (scale, 0-10) at 26 weeks. Outcomes were also assessed at 4, 8, and 52 weeks.
Results. There were 342 randomized participants, the mean (SD) [range] age was 49.3 (12.3) [20-70] years, 224 (65.7%) were women, mean duration of back pain was 7.3 years (range, 3 months-50 years), 123 (53.7%) attended 6 or more of the 8 sessions, 294 (86.0%) completed the study at 26 weeks, and 290 (84.8%) completed the study at 52 weeks. In intent-to-treat analyses at 26 weeks, the percentage of participants with clinically meaningful improvement on the RDQ was higher for those who received MBSR (60.5%) and CBT (57.7%) than for usual care (44.1%) (overall P = .04; relative risk [RR] for MBSR vs usual care, 1.37 [95% CI, 1.06-1.77]; RR for MBSR vs CBT, 0.95 [95% CI, 0.77-1.18]; and RR for CBT vs usual care, 1.31 [95% CI, 1.01-1.69]). The percentage of participants with clinically meaningful improvement in pain bothersomeness at 26 weeks was 43.6% in the MBSR group and 44.9% in the CBT group, vs 26.6% in the usual care group (overall P = .01; RR for MBSR vs usual care, 1.64 [95% CI, 1.15-2.34]; RR for MBSR vs CBT, 1.03 [95% CI, 0.78-1.36]; and RR for CBT vs usual care, 1.69 [95% CI, 1.18-2.41]). Findings for MBSR persisted with little change at 52 weeks for both primary outcomes.
Conclusions and Relevance Among adults with chronic low back pain, treatment with MBSR or CBT, compared with usual care, resulted in greater improvement in back pain and functional limitations at 26 weeks, with no significant differences in outcomes between MBSR and CBT. These findings suggest that MBSR may be an effective treatment option for patients with chronic low back pain.
Among the things to note in the abstract: the differences between either MBSR or CBT and usual care were only modest (overall P values of .04 and .01), and usual care was described simply as “whatever care participants received.” The MBSR was augmented by yoga, so we cannot distinguish the effects of mindfulness from this added component.
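As a quick sanity check on the abstract’s arithmetic, the unadjusted relative risks can be recomputed directly from the reported percentages. This is a minimal sketch of my own; the paper’s confidence intervals come from its own models, so small discrepancies would not be surprising:

```python
# Percentages of participants with clinically meaningful RDQ improvement
# at 26 weeks, as reported in the abstract.
improved = {"MBSR": 0.605, "CBT": 0.577, "usual_care": 0.441}

def relative_risk(group: str, reference: str) -> float:
    """Unadjusted relative risk of improvement, group vs reference."""
    return improved[group] / improved[reference]

print(round(relative_risk("MBSR", "usual_care"), 2))  # 1.37, as reported
print(round(relative_risk("CBT", "usual_care"), 2))   # 1.31, as reported
```

The same check on the pain-bothersomeness percentages (43.6%, 44.9%, and 26.6%) reproduces the reported relative risks of 1.64 and 1.69 versus usual care.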
Unfortunately, if you search for “usual care” or “yoga” in the article itself, the trial registration, or the protocol, you won’t learn anything about the nature of the usual care or the yoga. You will learn, however, from the article that:
Thirty of the 103 (29%) participants attending at least 1 MBSR session reported an adverse event (mostly temporarily increased pain with yoga). Ten of the 100 (10%) participants who attended at least 1 CBT session reported an adverse event (mostly temporarily increased pain with progressive muscle relaxation). No serious adverse events were reported.
Some outcomes that would be of interest to policy makers, clinicians, and patients are relegated to secondary status: whether medication was used in the past week, whether back exercises were done on at least three days, and whether there was general exercise on more than three days.
There were no consistent effects of these interventions versus routine care for these variables.
Intensity of treatment
Unless a study is focusing simply on differences in intensity of treatment, comparisons of treatments should ensure that the conditions being compared are equivalent in the intensity and frequency of clinical contact. In this trial:
The interventions were comparable in format (group), duration (2 hours/week for 8 weeks, although the MBSR program also included an optional 6-hour retreat), frequency (weekly), and number of participants per group.
Only about a quarter of the patients assigned to MBSR attended the six-hour retreat, compounding the problems of adherence (around half of patients assigned to either MBSR or CBT attended at least six group sessions). This also suggests that the roughly 15% of patients lost to follow-up may not be missing at random. That poses problems for the fancy statistical techniques used to compensate for attrition, which assume the missing data are random.
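The point about non-random attrition can be illustrated with a toy simulation, entirely my own sketch with made-up numbers, not the trial’s data. If dropout depends on the outcome itself, the completers are a biased sample, and techniques that assume data are missing at random cannot fix that:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
pain = rng.normal(5.0, 2.0, n)  # hypothetical true outcome, full sample

# Dropout probability rises with pain: the worst-off are lost to follow-up.
p_dropout = 1.0 / (1.0 + np.exp(-(pain - 6.0)))
completers = pain[rng.random(n) > p_dropout]

print(round(pain.mean(), 2))        # true mean, about 5.0
print(round(completers.mean(), 2))  # lower: completers look better off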
But the bigger issue is that the interventions provide a lot more contact than is typically available in routine care for chronic pain. There are lots of opportunities for important differences between the interventions and control group in nonspecific factors, like supportive accountability.
More contact communicates to patients that they matter. More interaction with providers gives patients more of a sense that their adherence matters (i.e., that they are accountable) to someone besides themselves for activities like daily back exercises. More intensive treatment also influences self-reported subjective outcomes, even when no effects are shown for other important variables, like decreased use of medication.
Distinguishing MBSR from CBT
MBSR is described as:
MBSR was modeled closely after the original MBSR program—adapted from the 2009 MBSR instructor’s manual by a senior MBSR instructor. The MBSR program does not focus specifically on a particular condition such as pain. All classes included didactic content and mindfulness practice (body scan, yoga, meditation [attention to thoughts, emotions, and sensations in the present moment without trying to change them, sitting meditation with awareness of breathing, and walking meditation]).
Gentle Hatha Yoga – practiced with mindful awareness of the body
Sitting Meditation – mindfulness of breath, body, feelings, thoughts, emotions, and choiceless awareness
My concern is that an RCT published in JAMA concludes that a combined mindfulness and yoga treatment “may be an effective treatment option for patients with chronic low back pain.” Past research by some of the authors of this JAMA article suggests that yoga by itself provides only short-term benefits for patients with chronic pain. And this particular study found worrisome adverse effects from the yoga component. Why add something unnecessary to a treatment if it may have adverse effects?
Although the providers of MBSR are described as having training in MBSR, there is no mention of training specifically in yoga for patients with chronic back pain.
Practitioners of yoga who have intermittent chronic pain tell me that it has been very important for them to find yoga instructors who are competent to deal with pain. A single, ill-chosen exercise can inflict long-term damage on a patient who already has chronic back pain.
CBT is described as:
The CBT protocol included CBT techniques most commonly applied and studied for chronic low back pain. The intervention included (1) education about chronic pain, relationships between thoughts and emotional and physical reactions, sleep hygiene, relapse prevention, and maintenance of gains; and (2) instruction and practice in changing dysfunctional thoughts, setting and working toward behavioral goals, relaxation skills (abdominal breathing, progressive muscle relaxation, and guided imagery), activity pacing, and pain-coping strategies. Between-session activities included reading chapters of The Pain Survival Guide: How to Reclaim Your Life. Mindfulness, meditation, and yoga techniques were proscribed in CBT; methods to challenge dysfunctional thoughts were proscribed in MBSR.
Many stripped-down versions of CBT offered in primary care do not have all these components, leaving out the abdominal breathing, progressive muscle relaxation, and guided imagery. Many eclectic versions of mindfulness training incorporate progressive muscle relaxation.
Given that only about 50% of patients attended at least six sessions, and given the modest uptake of the mindfulness retreat, I’m not sure that these two interventions offered distinctly different experiences. It’s doubtful that this trial could address whether the two treatments operate through distinctly different mechanisms.
Routine Care for Chronic Pain in the US
Routine care for chronic back pain differs widely in the United States. Episodes of care – a clustering of visits around a complaint – do not typically extend beyond a month or a couple of visits.
Routine care can be no care at all after initial evaluation in which diagnosis of chronic back pain is recorded.
But routine care for chronic back pain that is guideline-congruent can ironically prove iatrogenic. It can involve overtreatment, unnecessary exposure to opioids and antidepressants without adequate evaluation or follow up, and unnecessary surgeries.
We are living in the aftermath of pain being identified as the Fifth Vital Sign. In some settings, every patient has to be assessed with a simple pain rating scale, regardless of the reason for the visit. Providers have to document that they asked about pain, and what procedures or referrals they provided if the patient reported anything other than “no pain.” Providers are penalized for not recording interventions when any pain is indicated; they may lose insurance reimbursement for the visit.
There is currently a campaign to overturn these ridiculous and harmful guidelines, which are not evidence-based. One effect of the guidelines has been that prescribed opioid pain medication now rivals heroin in its negative public health impact. There has also been an epidemic of unnecessary back surgery, sometimes with crippling adverse effects.
But the guidelines have also induced despair and an unwillingness to address a condition that often must be endured with minimal intervention, rather than burdening clinicians and patients with the unrealistic expectation that it will be cured or eliminated. Clinicians are not good at dealing with conditions for which they do not have solutions.
I suspect that many of the patients in this study who remained assigned to routine care were getting minimal or no care: little or no monitoring or reassessment of pain medications; little encouragement to engage in back exercises with the regularity needed for them to be effective; and little support in the face of the successes and failures of getting on with life with chronic back pain.
Once again, we have an expensive study of mindfulness that does not address the question of whether any apparent effectiveness is simply due to increased intensity and frequency of contact with the medical system and support.
We don’t know if the intervention is simply correcting the inadequacies or lack of routine care.
We cannot determine whether a better use of funds would be to improve the overall quality of routine care for chronic pain, including for the bulk of patients who have no interest in devoting the necessary time in their daily lives to practicing mindfulness.
The editorial commentary
The intended answer to the question posed by the title is obviously yes: Is It Time to Make Mind-Body Approaches Available for Chronic Low Back Pain?
The assessment provided by the commentary is:
A compelling argument for ensuring that an evidence-based health care system should provide access to affordable mind-body therapies.
Like the authors of the trial itself, the commentators are trying to get reimbursement for treatment that is provided through a designated mind-body center. Whether or not mind-body centers improve patient outcomes, they are useful for the intensive competitive marketing of medical centers.
Like the authors, the commentators are not only competing for funds from the National Center for Complementary and Integrative Health [NCCIH], formerly known as the National Center for Complementary and Alternative Medicine [NCCAM]; they are hoping to direct more funds to this center of the National Institutes of Health.
The authors of the trial are connected. They have previously co-authored a study of acupuncture for chronic back pain with NCCAM program officers who are listed in the article as influencing and revising interpretations of the data. We have ample evidence that acupuncture is not a science-based intervention for chronic back pain. Any apparent effects are nonspecific, and an illusion of effectiveness is likely to emerge in a comparison with routine care that lacks these nonspecific effects. I can’t believe the authors don’t know that.
So we’ve come in another route, but we’ve arrived at the same old story.
Authors with connections get their articles into prestigious, generalist medical journals.
Even though the evidence does not support the strong claims that are made, the claims are amplified with goodies like the article being freely available, free continuing education credit, and other promotions like audio and video links.
The invited commentaries are written by persons with similar connections and similar vested interests.
I don’t think this article should have made it into JAMA, and I don’t think it deserved an editorial commentary. If one were nonetheless provided, it should have interpreted for a general medical audience the inadequacies of routine care, the inadequacy of routine care as a comparison group, and the practical issues of allocating scarce resources. An accompanying editorial should be reserved for articles more special than this one, and should offer a detached, objective assessment of the strengths and weaknesses of a study and their implications.
MBSR straddles New Age religion and science, as well as evidence-based versus alternative, non-evidence-based treatment. The New Agey aspect is emphasized in the title of the trial registration, which includes the designation “CAM [complementary and alternative medicine] and Conventional Mind-Body Therapies.”
We must be alert to MBSR being hyped and promoted beyond what is justified by the available evidence, and now leading the charge of non-evidence-based treatments into reimbursement and competition for scarce resources in an already overexpensive and malfunctioning health system.
I continue to follow Goldacre’s work closely and cite him often. I also pay particular attention to John Ioannidis’ follow-up of his documentation that much of what is found in the biomedical literature is false or exaggerated, like:
Many trials are entirely lost, as they are not even registered. Substantial diversity probably exists across specialties, countries, and settings. Overall, in a survey conducted in 2012, only 30% of journal editors requested or encouraged trial registration.
In a seeming parallel world, I keep showing that in psychology the situation is worse. I had a simple explanation, which I now recognize was naïve: needed reforms enforced by regulatory bodies like the US Food and Drug Administration (FDA) take longer to influence the psychotherapy literature, where there are no such pressures.
I think we now know, in both biomedicine and psychology, that broad declarations by governments, funding bodies, and even journals of a commitment to disclosing conflicts of interest, registering trials, and sharing data are insufficient to ensure that the literature gets cleaned up.
WHO’s 2005 statement called for all interventional clinical trials to be registered. Subsequently, there has been an increase in clinical trial registration prior to the start of trials. This has enabled tracking of the completion and timeliness of clinical trial reporting. There is now a strong body of evidence showing failure to comply with results-reporting requirements across intervention classes, even in the case of large, randomised trials [3–7]. This applies to both industry and investigator-driven trials. In a study that analysed reporting from large clinical trials (over 500 participants) registered on clinicaltrials.gov and completed by 2009, 23% had no results reported even after a median of 60 months following trial completion; unpublished trials included nearly 300,000 participants. Among randomised clinical trials (RCTs) of vaccines against five diseases registered in a variety of databases between 2006 and 2012, only 29% had been published in a peer-reviewed journal by 24 months following study completion. At 48 months after completion, 18% of trials were not reported at all, which included over 24,000 participants. In another study, among 400 randomly selected clinical trials, nearly 30% did not publish the primary outcomes in a journal or post results to a clinical trial registry within four years of completion.
Why is this a problem?
It affects understanding of the scientific state of the art.
It leads to inefficiencies in resource allocation for both research and development and financing of health interventions.
It creates indirect costs for public and private entities, including patients themselves, who pay for suboptimal or harmful treatments.
It potentially distorts regulatory and public health decision making.
Furthermore, it is unethical to conduct human research without publication and dissemination of the results of that research. In particular, withholding results may subject future volunteers to unnecessary risk.
How the psychotherapy literature is different from the medical literature
Unfortunately for the trustworthiness of the psychotherapy literature, the WHO statement is limited to medical interventions. We probably won’t see any direct effects on the psychotherapy literature anytime soon.
The psychotherapy literature has all the problems in implementing reforms that we see in biomedicine – and more. Professional organizations like the American Psychological Association and the British Psychological Society, which publish psychotherapy research, have the other important function of protecting their clinical members’ employment opportunities. More opportunities for employment show that the organizations are meeting their members’ needs, which results in more dues-paying members.
The organizations don’t want to facilitate third-party payers citing research showing that particular interventions their membership already practices are inferior and need to be abandoned. They want the branding of members practicing “evidence-based treatment,” but not the burden of members having to make decisions based on what is evidence-based. More basically, psychologists’ professional organizations are cognizant of the need to demonstrate a place in providing services that are reimbursed because they improve mental and physical health. In this respect, they are competing with biomedical interventions for the same pot of money.
So, journals published by psychological organizations have vested interests in not stringently enforcing standards. The well-known questionable research practices of investigators are reinforced by questionable publication practices, like confirmation bias, that are tied to the organizations’ institutional agendas.
And the lower status journals that are not published by professional organizations may compromise their standards for publishing psychotherapy trials because of the status that having these articles confers.
Increasingly, medical journals like The Lancet and The Lancet Psychiatry are seen as more prestigious outlets for psychotherapy trials, but they take less seriously the need to enforce for psychotherapy studies the standards that regulatory agencies require for biomedical interventions. Example: The Lancet violated its own policies and accepted Tony Morrison’s CBT for psychosis study for publication even though it wasn’t registered until after the trial had started. The declared outcomes were vague enough that they could be re-specified after the results were known.
Bottom line: when it comes to publishing all psychotherapy trials consistent with their published protocols, the problem is taken less seriously than it would be for a medical trial.
Overall, there is less requirement for psychotherapy trials to be registered, and less attention is paid by editors and reviewers to whether trials were registered and whether the outcomes and analytic plans in the published study were consistent with the registration.
In a recent blog post, I identified results of a trial that had been published with switched outcomes and then re-published in another paper with different outcomes, without the registration even being noted.
But for all the same reasons cited by the recent WHO statement, publication of all psychotherapy trials matters.
Recovering an important CBT trial gone missing
I am now going to review the impact of a large, well-resourced study of CBT for psychosis remaining unpublished. I identified the study by a search of the ISRCTN:
The ISRCTN registry is a primary clinical trial registry recognised by WHO and ICMJE that accepts all clinical research studies (whether proposed, ongoing or completed), providing content validation and curation and the unique identification number necessary for publication. All study records in the database are freely accessible and searchable.
I then went back to the literature to see what happened with it. Keep in mind that this step is not even possible for the many psychotherapy trials that are simply not registered at all.
Many trials are not registered because they are considered pilot and feasibility studies and therefore not suitable for entering effect sizes into the literature. Yet, if significant results are found, they will be exaggerated because they come from an underpowered study. Such results then enter the literature as if they came from a planned clinical trial, with considerable likelihood of failing to replicate.
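This “winner’s curse” is easy to demonstrate with a small simulation, an illustrative sketch with made-up numbers rather than any particular trial: when a small true effect is studied in small samples, the pilots that happen to reach significance systematically overestimate it.

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect, n_per_group, n_pilots = 0.2, 20, 20_000

significant_effects = []
for _ in range(n_pilots):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    diff = treated.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / n_per_group
                 + treated.var(ddof=1) / n_per_group)
    if diff / se > 1.96:  # the pilot "finds" a significant effect
        significant_effects.append(diff)

# Among the significant pilots, the average reported effect is
# several times the true effect of 0.2.
print(round(float(np.mean(significant_effects)), 2))
```

Only a small minority of such pilots come out significant, but those are exactly the ones that get written up, so the published estimate is badly inflated.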
There are whole classes of clinical and health psychology interventions that are dominated by underpowered, poor-quality studies that should have been flagged as weak evidence or excluded altogether. So, in centering on this trial, I’m picking an important example because it was available to be discovered; there is much out there that is not available to be discovered, because it was never registered.
CBT versus supportive therapy for persistent positive symptoms in psychotic disorders
The trial registration indicates that recruitment started on January 1, 2007 and ended on December 31, 2008.
No publications are listed. I and others have sent repeated emails to the principal investigator inquiring about any publications and have failed to get a response. I even sent a German colleague to visit him and all he would say was that results were being written up. That was two years ago.
Google Scholar indicates the principal investigator continues to publish, but not the results of this trial.
Klingberg S, Wittorf A, Meisner C, Wölwer W, Wiedemann G, Herrlich J, Bechdolf A, Müller BW, Sartory G, Wagner M, Kircher T. Cognitive behavioural therapy versus supportive therapy for persistent positive symptoms in psychotic disorders: The POSITIVE Study, a multicenter, prospective, single-blind, randomised controlled clinical trial. Trials. 2010 Dec 29;11(1):123.
The methods section makes it sound like a dream study with resources beyond what is usually encountered for psychotherapy research. If the protocol is followed, the study would be an innovative, large, methodologically superior study.
Methods/Design: The POSITIVE study is a multicenter, prospective, single-blind, parallel group, randomised clinical trial, comparing CBT and ST with respect to the efficacy in reducing positive symptoms in psychotic disorders. CBT as well as ST consist of 20 sessions altogether, 165 participants receiving CBT and 165 participants receiving ST. Major methodological aspects of the study are systematic recruitment, explicit inclusion criteria, reliability checks of assessments with control for rater shift, analysis by intention to treat, data management using remote data entry, measures of quality assurance (e.g. on-site monitoring with source data verification, regular query process), advanced statistical analysis, manualized treatment, checks of adherence and competence of therapists.
The study was one of the rare ones providing for systematic assessment of adverse events and any harm to patients. Presumably, if CBT is powerful enough to effect positive change, it can have negative effects as well. But these remain entirely a matter of speculation.
Ratings of outcome were blinded and steps were taken to preserve the blinding even if an adverse event occurred. This is important because blinded trials are less susceptible to investigator bias.
Another unusual feature is the use of supportive therapy (ST), a credible but nonspecific condition, as the control/comparison.
ST is thought as an active treatment with respect to the patient-therapist relationship and with respect to therapeutic commitment. In the treatment of patients suffering from psychotic disorders these ingredients are viewed to be essential as it has been shown consistently that the social network of these patients is limited. To have at least one trustworthy person to talk to may be the most important ingredient in any kind of treatment. However, with respect to specific processes related to modification of psychotic beliefs, ST is not an active treatment. Strategies specifically designed to change misperceptions or reasoning biases are not part of ST.
Use of this control condition allows evaluation of the important question of whether any apparent effects of CBT are due to the active ingredients of that approach or to the supportive therapeutic relationship within which the active ingredients are delivered.
Being able to rule out that the effects of CBT are due to nonspecific factors justifies the extra resources needed to provide specialized training in CBT. If equivalent effects are obtained in the ST group, it suggests that equivalent outcomes can be achieved simply by providing more support to patients, presumably from less trained and maybe even lay personnel.
It is a notorious feature of studies of CBT for psychosis that they lack comparison/control groups in any way equivalent to the CBT in nonspecific intensity, support, encouragement, and positive expectations. Too often, the control group is an ill-defined treatment as usual (TAU) that lacks regular contact and inspires no positive expectations. Basically, CBT is being compared to inadequate treatment, and sometimes to no treatment, so any apparent effects are due to correcting these inadequacies, not to any active ingredient.
The protocol hints in passing at the investigators’ agenda.
This clinical trial is part of efforts to intensify psychotherapy research in the field of psychosis in Germany, to contribute to the international discussion on psychotherapy in psychotic disorders, and to help implement psychotherapy in routine care.
And so, if the results would not contribute to getting psychotherapy implemented in routine care in Germany, do they get buried?
Science & Politics of CBT for Psychosis
The rollout of a CBT study for psychosis published in The Lancet made strong claims in a BBC article and audiotape promotion.
The attention attracted critical scrutiny that these claims couldn’t sustain. After controversy on Twitter, the BBC headline was changed to a more modest claim.
By the end of the study, fewer participants remained in the CBT arm than there were authors on the paper.
The comparison treatment was ill-defined, but for some patients meant no treatment because they were kicked out of routine care for refusing medication.
A substantial proportion of patients assigned to CBT began taking antipsychotic medication by the end of the study.
There was no evidence that the response to CBT was comparable to that achieved with antipsychotic medication alone in clinical trials.
No evidence that less intensive, nonspecific supportive therapy would not have achieved the same results as CBT.
And the authors ended up conceding in a letter to the editor that their trial had been registered after data collection had started and it did not produce evidence of equivalence to antipsychotic medication.
Politics have overcome the science in CBT for psychosis
Recently the British Psychological Society invited me to give a public talk entitled CBT: The Science & Politics behind CBT for Psychosis. In this talk, which was filmed…, I highlight the unquestionable bias shown by the National Institute of Clinical Excellence (NICE) committee (CG178) in their advocacy of CBT for psychosis.
The bias is not concealed, but unashamedly served-up by NICE as a dish that is high in ‘evidence-substitute’, uses data that are past their sell-by-date and is topped-off with some nicely picked cherries. I raise the question of whether committees – with such obvious vested interests – should be advocating on mental health interventions.
I present findings from our own recent meta-analysis (Jauhar et al 2014) showing that three-quarters of all RCTs have failed to find any reduction in the symptoms of psychosis following CBT. I also outline how trials which have used non-blind assessment of outcomes have inflated effect sizes by up to 600%. Finally, I give examples where CBT may have adverse consequences – both for the negative symptoms of psychosis and for relapse rates.
A pair of well-conducted and transparently reported Cochrane reviews suggest there is little evidence for the efficacy of CBT for psychosis (*)
Yet, even after having to be tempered in the face of criticism, the original claims of the Morrison study get echoed in the antipsychiatry Understanding Psychosis:
“Other forms of therapy can also be helpful, but so far it is CBTp that has been most intensively researched. There have now been several meta-analyses (studies using a statistical technique that allows findings from various trials to be averaged out) looking at its effectiveness. Although they each yield slightly different estimates, there is general consensus that on average, people gain around as much benefit from CBT as they do from taking psychiatric medication.”
Such misinformation can confuse patients making difficult decisions about whether to accept antipsychotic medication.
If the results from the missing CBT for psychosis study became available…
If the Klingberg study were available and integrated with existing data, it would be one of the largest and highest-quality studies, and it would provide insight into any advantage of CBT for psychosis. For those who can be convinced by data, a null finding from a large study, added to mostly small and methodologically unsophisticated studies, could be decisive.
Two recent trials of CBT for established psychosis provide examples of good practice for reporting harms (Klingberg et al. 2010, 2012), and CONSORT (Consolidated Standards of Reporting Trials) provides a sensible set of recommendations (Ioannidis et al. 2004).
Yet the review does not indicate why the study is missing, and the trial is not included in its list of completed but unpublished studies, even though the protocol indicates a study considerably larger than any of the studies that were included.
To communicate a better sense of the potential importance of this missing study and perhaps place more pressures on the investigators to release its results, I would suggest that future meta-analyses state:
The protocol for Klingberg et al. Cognitive behavioural treatment for persistent positive symptoms in psychotic disorders indicates that recruitment was completed in 2008. No publications have resulted. Emails to Professor Klingberg about the status of the study failed to get a response. If the study were completed consistent with its protocol, it would represent one of the largest studies of CBT for psychosis ever and one of the few with a fair comparison between CBT and supportive therapy. Inclusion of the results could potentially substantially modify the conclusions of the current meta-analysis.
John Ioannidis, the “scourge of sloppy science,” has documented again and again that the safeguards being introduced into the biomedical literature against untrustworthy findings are usually ineffective. In Ioannidis’ most recent report, his group:
…Assessed the current status of reproducibility and transparency addressing these indicators in a random sample of 441 biomedical journal articles published in 2000–2014. Only one study provided a full protocol and none made all raw data directly available.
…The relatively straightforward task of comparing reported outcomes from clinical trials to what the researchers said they planned to measure before the trial began. And what they’ve found is a bit sad, albeit not entirely surprising.
Ben Goldacre specifically excludes psychotherapy studies from this project. But there are reasons to believe that the psychotherapy literature is less trustworthy than the biomedical literature because psychotherapy trials are less frequently registered, adherence to CONSORT reporting standards is less strict, and investigators more routinely refuse to share data when requested.
Untrustworthiness of information provided in the psychotherapy literature can have important consequences for patients, clinical practice, and public health and social policy.
The study that I will review twice switched outcomes in its reports, had a poorly chosen comparison control group and flawed analyses, and its protocol was registered after the study started. Yet, the study will likely provide data for decision-making about what to do with primary care patients with a few unexplained medical symptoms. The recommendation of the investigators is to deny these patients medical tests and workups and instead provide them with an unvalidated psychiatric diagnosis and a treatment that encourages them to believe that their concerns are irrational.
In this post I will attempt to track what should have been an orderly progression from (a) registration of a psychotherapy trial to (b) publishing of its protocol to (c) reporting of the trial’s results in the peer-reviewed literature. This exercise will show just how difficult it is to make sense of studies in a poorly documented psychological intervention literature.
I find lots of surprises, including outcome switching in both reports of the trial.
The second article reporting results of the trial does not acknowledge registration, minimally cites the first report of outcomes, and hides important shortcomings of the trial. Yet the authors inadvertently expose crucial new shortcomings without comment.
Detecting important inconsistencies between registration and protocols and reports in the journals requires an almost forensic attention to detail to assess the trustworthiness of what is reported. Some problems hide in plain sight if one takes the time to look, but others require a certain clinical connoisseurship, a well-developed appreciation of the subtle means by which investigators spin outcomes to get novel and significant findings.
Outcome switching and inconsistent cross-referencing of published reports of a clinical trial will bedevil any effort to integrate the results of the trial into the larger literature in a systematic review or meta-analysis.
Two journals – Psychosomatic Medicine and particularly Journal of Psychosomatic Research – failed to provide adequate peer review of articles based on this trial, in terms of trial registration, outcome switching, and allowing into the literature multiple reports of what could be construed as primary outcomes from the same trial.
Despite serious problems in their interpretability, results of this study are likely to be cited and influence far-reaching public policies.
The generalizability of results of my exercise is unclear, but my findings encourage skepticism more generally about published reports of results of psychotherapy interventions. It is distressing that more alarm bells have not been sounded about the reports of this particular study.
Magallón R, Gili M, Moreno S, Bauzá N, García-Campayo J, Roca M, Ruiz Y, Andrés E. Cognitive-behaviour therapy for patients with Abridged Somatization Disorder (SSI 4, 6) in primary care: a randomized, controlled study. BMC Psychiatry. 2008 Jun 22;8(1):47.
Readers can more fully appreciate the problems that I uncovered if I work backwards from the second published report of outcomes from the trial. Published in Journal of Psychosomatic Research, the article is behind a paywall, but readers can write to the corresponding author for a PDF: firstname.lastname@example.org. This person is also the corresponding author for the other paper, in Psychosomatic Medicine, so readers might want to request both papers.
Gili M, Magallón R, López-Navarro E, Roca M, Moreno S, Bauzá N, García-Campayo J. Health related quality of life changes in somatising patients after individual versus group cognitive behavioural therapy: A randomized clinical trial. Journal of Psychosomatic Research. 2014 Feb 28;76(2):89-93.
The title is misleading in its ambiguity because “somatising” does not refer to an established diagnostic category. In this article, it refers to an unvalidated category that encompasses a considerable proportion of primary care patients, usually those with comorbid anxiety or depression. More about that later.
The article does not list the registration, and does not provide the citation when indicating that a trial protocol is available. The only subsequent citations of the trial protocol are ambiguous:
More detailed design settings and study sample of this trial have been described elsewhere [14,16], which explain the effectiveness of CBT reducing number and severity of somatic symptoms.
The above quote is also the sole citation of a key previous paper that presents outcomes for the trial. Only an alert and motivated reader would catch this. No opportunity within the article is provided for comparing and contrasting results of the two papers.
The brief introduction displays a decided puffer fish phenomenon, exaggerating the prevalence and clinical significance of the unvalidated “abridged somatization disorder.” Essentially, the authors invoke the problematic but accepted psychiatric diagnostic categories of somatoform or somatization disorder in claiming validity for a diagnosis with much less stringent criteria. Oddly, the category has different criteria when applied to men and women: men require four unexplained medical symptoms, whereas women require six.
I haven’t previously encountered the term “abridged” in psychiatric diagnosis. Maybe the authors mean “subsyndromal,” as in “subsyndromal depression.” This is a dubious labeling because it suggests that not all the characteristics needed for a diagnosis are present, some of which may be crucial. Think of it: is a persistent cough subsyndromal lung cancer, or maybe emphysema? References to symptoms being “subsyndromal” often occur in contexts where exaggerated claims about prevalence are being made, with inappropriate, non-evidence-based inferences about treatment of milder cases drawn from the more severe.
A casual reader might infer that the authors are evaluating a psychiatric treatment with wide applicability to as many as 20% of primary care patients. As we will see, the treatment focuses on discouraging any diagnostic medical tests and trying to convince the patient that their concerns are irrational.
The introduction identifies the primary outcome of the trial:
The aim of our study is to assess the efficacy of a cognitive behavioural intervention program on HRQoL [health-related quality of life] of patients with abridged somatization disorder in primary care.
This primary outcome is inconsistent with what was reported in the registration, the published protocol, and the first article reporting outcomes. The earlier report does not even mention the inclusion of a measure of HRQoL, measured by the SF-36. It is listed in the study protocol as a “secondary variable.”
The opening of the methods section declares that the trial is reported in this paper consistent with the Consolidated Standards of Reporting Trials (CONSORT). This is not true, because the flowchart tracking patients from recruitment to follow-up is missing. We will see that when the flowchart is reported in another paper, it contains some important information.
The methods section reports that only three measures were administered: the Standardized Polyvalent Psychiatric Interview (SPPI), a semistructured interview developed by the authors with minimal validation; a screening measure for somatization administered by primary care physicians to patients whom they deemed appropriate for the trial; and the SF-36.
Crucial details are withheld about the screening and diagnosis of “abridged somatization disorder.” If these details had been presented, a reader would further doubt the validity of this unvalidated and idiosyncratic diagnosis.
Few readers, even primary care physicians or psychiatrists, will know what to make of the Smith’s guidelines (Googling it won’t yield much), which is essentially a matter of simply sending a letter to the referring GP. Sending such a letter is a notoriously ineffective intervention in primary care. It mainly indicates that patients referred to a trial did not get assigned to an active treatment. As I will document later, the authors were well aware that this would be an ineffectual control/comparison intervention, but using it as such guarantees that their preferred intervention would look quite good in terms of effect size.
The two active interventions are individual- and group-administered CBT which is described as:
Experimental or intervention group: implementation of the protocol developed by Escobar [21,22] that includes ten weekly 90-min sessions. Patients were assessed at 4 time points: baseline, post-treatment, 6 and 12 months after finishing the treatment. The CBT intervention mainly consists of two major components: cognitive restructuring, which focuses on reducing pain-specific dysfunctional cognitions, and coping, which focuses on teaching cognitive and behavioural coping strategies. The program is structured as follows. Session 1: the connection between stress and pain. Session 2: identification of automated thoughts. Session 3: evaluation of automated thoughts. Session 4: questioning the automatic thoughts and constructing alternatives. Session 5: nuclear beliefs. Session 6: nuclear beliefs on pain. Session 7: changing coping mechanisms. Session 8: coping with ruminations, obsessions and worrying. Session 9: expressive writing. Session 10: assertive communication.
There is sparse presentation of data from the trial in the results section, but some fascinating details await a skeptical, motivated reader.
Table 1 displays social demographic and clinical variables. Psychiatric comorbidity is highly prevalent. Readers can’t tell exactly what is going on, because the authors’ own interview schedule is used to assess comorbidity. But it appears that all but a small minority of patients diagnosed with “abridged somatization disorder” have substantial anxiety and depression. Whether these symptoms meet formal criteria cannot be determined. There is no mention of physical comorbidities.
But there is something startling awaiting an alert reader in Table 2.
There is something very odd going on here, very likely a breakdown of randomization. Baseline differences between groups in the key outcome measure, the SF-36, are substantially greater than any within-group change. The treatment as usual (TAU) condition has much lower functioning [lower scores mean lower functioning] than the group CBT condition, which in turn is substantially below the individual CBT condition.
If we compare the scores to adult norms, all three groups of patients are poorly functioning, but those “randomized” to TAU are unusually impaired, strikingly more so than the other two groups.
Keep in mind that evaluations of active interventions, in this case CBT, in randomized trials always involve a difference between groups, not just the change observed within a particular group. That’s because a comparison/control group is supposed to be equivalent on nonspecific factors, including natural recovery. This trial is going to be very biased in its evaluation of individual CBT, a group in which patients started much higher in physical functioning and ended up much higher. Statistical controls fail to correct for such baseline differences. We simply do not have an interpretable clinical trial here.
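The arithmetic here is worth making concrete. The numbers below are invented for illustration, not the trial’s data; they simply show why the within-group change in an active arm overstates the treatment effect whenever the control arm also improves:

```python
# Toy illustration (invented numbers, NOT the trial's data): a treatment's
# effect is the between-group difference in change, not the within-group
# change alone.
baseline = {"TAU": 30.0, "group_CBT": 38.0, "individual_CBT": 45.0}
followup = {"TAU": 33.0, "group_CBT": 42.0, "individual_CBT": 52.0}

# Within-group change for each arm.
change = {arm: followup[arm] - baseline[arm] for arm in baseline}
print(change)  # {'TAU': 3.0, 'group_CBT': 4.0, 'individual_CBT': 7.0}

# Naive within-group reading: individual CBT "improved 7 points".
# Proper between-group reading: subtract the change seen under TAU,
# which captures natural recovery and other nonspecific factors.
effect_vs_tau = {arm: change[arm] - change["TAU"]
                 for arm in change if arm != "TAU"}
print(effect_vs_tau)  # {'group_CBT': 1.0, 'individual_CBT': 4.0}
```

And with baseline gaps as large as those in Table 2, even the between-group contrast is suspect: groups that start at very different levels need not change comparably, which is why a broken randomization cannot be rescued by statistical adjustment.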
Moreno S, Gili M, Magallón R, Bauzá N, Roca M, del Hoyo YL, Garcia-Campayo J. Effectiveness of group versus individual cognitive-behavioral therapy in patients with abridged somatization disorder: a randomized controlled trial. Psychosomatic Medicine. 2013 Jul 1;75(6):600-8.
The title indicates that the patients are selected on the basis of “abridged somatization disorder.”
The abstract prominently indicates the trial registration number (ISRCTN69944771), which can be plugged into Google to reach the publicly accessible registration.
If a reader is unaware of the lack of validation for “abridged somatization disorder,” they probably won’t infer that from the introduction. The rationale given for the study is that
A recently published meta-analysis (18) has shown that there has been ongoing research on the effectiveness of therapies for abridged somatization disorder in the last decade.
Checking that meta-analysis, it included only a single null trial of treatment for abridged somatization disorder. This seems a gratuitous, ambiguous citation.
I was surprised to learn that in three of the five provinces in which the study was conducted, patients
…Were not randomized on a one-to-one basis but in blocks of four patients to avoid a long delay between allocation and the onset of treatment in the group CBT arm (where the minimal group size required was eight patients). This has produced, by chance, relatively big differences in the sizes of the three arms.
This departure from one-to-one randomization was not mentioned in the second article reporting results of the study and seems an outright contradiction of what is presented there. Nor is it mentioned in the study protocol. This allocation strategy may have been the source of the lack of baseline equivalence between the TAU and the two intervention groups.
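A small simulation makes clear how the scheme quoted above can, by chance, produce arms of quite different sizes. This is my reconstruction of the described allocation, not the authors’ actual procedure; arm names and block counts are illustrative:

```python
import random

# Sketch (a reconstruction of the allocation scheme described above, not
# the authors' code): when whole blocks of four patients are allocated to
# a single arm, chance runs of same-arm blocks can leave the three arms
# with noticeably different sizes.
def block_allocate(n_blocks, arms=("TAU", "group_CBT", "individual_CBT"), seed=1):
    counts = {arm: 0 for arm in arms}
    rng = random.Random(seed)
    for _ in range(n_blocks):
        # Each block of 4 patients goes wholly to one randomly chosen arm.
        counts[rng.choice(arms)] += 4
    return counts

print(block_allocate(30))  # 120 patients allocated in blocks of four
```

Under one-to-one randomization, 120 patients would split nearly evenly; under block-wise allocation, totals routinely differ by a dozen or more patients, exactly the “relatively big differences in the sizes of the three arms” the authors acknowledge.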
For the vigilant skeptic, the authors’ calculation of sample size is an eye-opener. Sample size estimation was based on the effectiveness of TAU in primary care visits, which has been assumed to be very low (approximately 10%).
Essentially, the authors are justifying a modest sample size on the grounds that they expect the TAU intervention to be utterly ineffective. How could the authors believe there is equipoise, that the comparison/control and active treatments could be expected to be equally effective? They seem to say that they don’t believe this. Yet equipoise is an ethical and practical requirement for a clinical trial recruiting human subjects. And in terms of trial design, do the authors really think such a poor treatment provides an adequate comparison/control?
In the methods section, the authors also provide a study flowchart, which was required for the other paper to adhere to CONSORT standards but was missing there. Note the flow at the end of the study for the TAU comparison/control condition at the far right: there was substantially more dropout in this group. The authors chose to estimate missing scores with the Last Observation Carried Forward (LOCF) method, which assumes the last available observation can be substituted for every subsequent one. This is a discredited technique and particularly inappropriate in this context. Think about it: the authors expected the TAU condition to be quite poor care. Not surprisingly, more patients assigned to it dropped out. But they may have dropped out while deteriorating, so carrying forward the last observation obtained is particularly inappropriate. Certainly it cannot be assumed that the smaller number of dropouts from the other conditions left for the same reasons. We have a methodological and statistical mess on our hands, and it was hidden in the second report discussed earlier.
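For readers unfamiliar with LOCF, here is a minimal sketch of what the imputation does, with an invented patient trajectory (not data from this trial):

```python
# Minimal sketch of Last Observation Carried Forward (LOCF): a patient's
# missing follow-up scores are replaced by their last available score.
def locf(scores):
    """scores: observations over time, None where the visit was missed."""
    filled, last = [], None
    for s in scores:
        last = s if s is not None else last
        filled.append(last)
    return filled

# Hypothetical patient who drops out while deteriorating (50 -> 44, then
# lost to follow-up): LOCF freezes the trajectory at the last observed
# value and hides any continuing decline.
print(locf([50, 44, None, None]))  # -> [50, 44, 44, 44]
```

The problem in this trial is exactly the one the sketch illustrates: dropouts were concentrated in the TAU arm, and if those patients left while getting worse, carrying their last score forward flatters the comparison.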
Six measures are mentioned: (1) the Othmer-DeSouza screening instrument used by clinicians to select patients; (2) the Screening for Somatoform Disorders (SOMS), a 39-item questionnaire that includes all bodily symptoms and criteria relevant to somatoform disorders according to either DSM-IV or ICD-10; (3) a Visual Analog Scale of somatic symptoms (Severity of Somatic Symptoms scale) that patients use to assess changes in severity of each of 40 symptoms; (4) the authors’ own SPPI semistructured psychiatric interview for diagnosis of psychiatric morbidity in primary care settings; (5) the clinician-administered Hamilton Anxiety Rating Scale; and (6) the Hamilton Depression Rating Scale.
We are never actually told what the primary outcome is for the study, but it can be inferred from the opening of the discussion:
The main finding of the trial is a significant improvement regardless of CBT type compared with no intervention at all. CBT was effective for the relief of somatization, reducing both the number of somatic symptoms (Fig. 2) and their intensity (Fig. 3). CBT was also shown to be effective in reducing symptoms related to anxiety and depression.
But I noticed something else here, after a couple of readings. The items used to select patients and identify them with “abridged somatization disorder” reference 39 or 40 symptoms, with men needing only four and women only six for a diagnosis. That means that most pairs of patients receiving the diagnosis will not have a single symptom in common. Whatever “abridged somatization disorder” means, patients who receive this diagnosis are likely to differ from each other in somatic symptoms, but probably have other characteristics in common. They are basically depressed and anxious patients, but these mood problems are not being addressed directly.
Comparison of this report to the outcomes paper reviewed earlier shows that none of these measures is mentioned there as being assessed, and certainly not as outcomes.
Comparison of this report to the published protocol reveals that number and intensity of somatic symptoms are two of the three main outcomes, but this article makes no mention of the third, utilization of healthcare.
Readers can find something strange in Table 2, which presents what seems to be one of the primary outcomes, severity of symptoms. In this table the order is TAU, group CBT, and individual CBT. Note the large difference in baseline symptoms, with group CBT being much more severe. It’s difficult to make sense of the 12-month follow-up because there was differential dropout and reliance on an inappropriate LOCF imputation of missing data. But if we accept the imputation as the authors did, it appears that there were no differences between TAU and group CBT. That is what the authors reported with inappropriate analyses of covariance.
The authors’ cheerful take away message?
This trial, based on a previous successful intervention proposed by Sumathipala et al. (39), presents the effectiveness of CBT applied at individual and group levels for patients with abridged somatization (somatic symptom indexes 4 and 6).
But hold on! In the introduction, the authors’ justification for their trial was:
Evidence for the group versus individual effectiveness of cognitive-behavioral treatment of medically unexplained physical symptoms in the primary care setting is not yet available.
Sumathipala A, Siribaddana S, Hewege S, Sumathipala K, Prince M, Mann A. Understanding the explanatory model of the patient on their medically unexplained symptoms and its implication on treatment development research: a Sri Lanka Study. BMC Psychiatry. 2008 Jul 8;8(1):54.
The article presents speculations based on an observational study, not an intervention study, so there was no success to report.
The registration of psychotherapy trials typically provides sparse details. The curious must consult the more elaborate published protocol. Nonetheless, the registration can often provide grounds for skepticism, particularly when it is compared to any discrepant details in the published protocol, as well as subsequent publications.
This protocol declares
Patients randomized to cognitive behavioural therapy significantly improve in measures related to quality of life, somatic symptoms, psychopathology and health services use.
Primary outcome measures
Severity of Clinical Global Impression scale at baseline, 3 and 6 months and 1-year follow-up
Secondary outcome measures
The following will be assessed at baseline, 3 and 6 months and 1-year follow-up:
1. Quality of life: 36-item Short Form health survey (SF-36)
2. Hamilton Depression Scale
3. Hamilton Anxiety Scale
4. Screening for Somatoform Symptoms [SOMS]
– SSS (Severity of somatic symptoms scale): a scale of 40 somatic symptoms assessed by a 7-point visual analogue scale.
– SSQ (Somatic symptoms questionnaire): a scale made up of 40 items on somatic symptoms and patients’ illness behaviour.
When I searched the published reports for the Severity of Clinical Global Impression scale, the primary outcome declared in the registration, I could find no reference to it.
The protocol was submitted on May 14, 2008 and published on June 22, 2008. This suggests that the protocol was submitted after the start of the trial.
To calculate the sample size we consider that the effectiveness of usual treatment (Smith’s norms) is rather low, estimated at about 20% in most of the variables [10,11]. We aim to assess whether the new intervention is at least 20% more effective than usual treatment.
Control group or standardized recommended treatment for somatization disorder in primary care (Smith’s norms) [10,11]: standardized letter to the family doctor with Smith’s norms that includes: 1. Provide brief, regularly scheduled visits. 2. Establish a strong patient-physician relationship. 3. Perform a physical examination of the area of the body where the symptom arises. 4. Search for signs of disease instead of relying of symptoms. 5. Avoid diagnostic tests and laboratory or surgical procedures. 6. Gradually move the patient to being “referral ready”.
Basically, TAU, the comparison/control group, involves simply sending a letter to referring physicians encouraging them to meet regularly with the patients while discouraging diagnostic tests or medical procedures. Keep in mind that patients for this study were selected by physicians who found them particularly frustrating to treat. Despite the authors’ repeated claims about the high prevalence of “abridged somatization disorder,” they relied on a large number of general practice settings each contributing only a few patients. These patients are very heterogeneous in terms of somatic symptoms, but most share anxiety or depressive symptoms.
There is an uncontrolled selection bias here that makes generalization from results of the study problematic. Just who are these patients? I wonder if these patients have some similarity to the frustrating GOMERS (Get Out Of My Emergency Room) in the classic House of God, a book described by Amazon as “an unvarnished, unglorified, and amazingly forthright portrait revealing the depth of caring, pain, pathos, and tragedy felt by all who spend their lives treating patients and stand at the crossroads between science and humanity.”
Imagine the disappointment of the referring physicians and the patients when consenting to participate in this study simply left the patients back in routine care provided by the same physicians. It’s no wonder that the patients deteriorated and that those assigned to this condition were more likely to drop out.
Whatever active ingredients the individual and group CBT have, they also include some nonspecific factors missing from the TAU comparison group: frequency and intensity of contact, reassurance and support, attentive listening, and positive expectations. These nonspecific factors can readily be confused with active ingredients and may account for any differences between the active treatments and the TAU comparison. What a terrible study.
The two journals publishing reports of this study failed in their responsibility to their readership and to the larger audience seeking clinical and public policy guidance. Authors have ample incentive to engage in questionable publication practices, including ignoring or even suppressing registration, switching outcomes, and exaggerating the significance of their results. Journals of necessity must protect authors from their own inclinations, as well as protect readers and the larger medical community from untrustworthy reports. Psychosomatic Medicine and Journal of Psychosomatic Research failed miserably in their peer review of these articles. Neither journal is likely to be the first choice for authors seeking to publish findings from well-designed and well-reported trials. Who knows, maybe the journals’ standards are compromised by the need to attract randomized trials for what is construed, at least by the psychiatric community, as a psychosomatic condition.
Regardless, it’s futile to require registration and posting of protocols for psychotherapy trials if editors and reviewers ignore these resources in evaluating articles for publication.
Postscript: imagine what will be done with the results of this study
You can’t fix with a meta-analysis what investigators bungled by design.
In a recent blog post, I examined a registration for a protocol for a systematic review and meta-analysis of interventions to address medically unexplained symptoms. The review protocol was inadequately described, had undisclosed conflicts of interest, and one of the senior investigators had a history of switching outcomes in his own study and refusing to share data for independent analysis. Undoubtedly, the study we have been discussing meets the vague criteria for inclusion in this meta-analysis. But what outcomes will be chosen, particularly when there should be only one outcome per study? And will it be recognized that these two reports describe the same study? Will the key problems in the designation of the TAU control group, with its likely inflation of treatment effects, be recognized when effect sizes are calculated?
As you can see, it took a lot of effort to compare and contrast documents that should have been in alignment. Do you really expect those who conduct subsequent meta-analyses to make those multiple comparisons or will they simply extract multiple effect sizes from the two papers so far reporting results?
Obviously, every time we encounter a report of a psychotherapy trial in the literature, we won’t have the time or inclination to undertake such a cross-comparison of articles, registration, and protocol. But maybe we should be skeptical of authors’ conclusions in the absence of such checks.
I’m curious what a casual reader would infer from encountering one of these two reports of the clinical trial in a literature search, but not the other.
Doubts that much of clinical or policy significance was learned from a recent study published in Lancet
Promoters of Acceptance and Commitment Therapy (ACT) notoriously set a record among academics for endorsing a psychotherapy as better than alternatives in the absence of evidence from adequately sized, high-quality studies with suitable active control/comparison conditions. The credibility of designating a psychological intervention as “evidence-based” took a serious hit from the promotion of ACT, before its enthusiasts felt they had attracted enough adherents to abandon claims of “best” or “better than.”
But the tsunami of mindfulness promotion has surpassed anything ACT ever produced, and still with insufficient quality and quantity of evidence.
Could that be changing?
Some might think so with a recent randomized controlled trial, reported in the Lancet, of mindfulness-based cognitive therapy (MBCT) to reduce relapse and recurrence in depression. The headline of a Guardian column by an Oxford colleague of the Lancet article’s first author misleadingly proclaimed that the study showed
And that misrepresentation was echoed in the Mental Health Foundation call for mindfulness to be offered through the UK National Health Service –
The Mental Health Foundation is offering a 10-session online course for £60 and is undoubtedly prepared for an expanded market.
The Declaration of Conflict of Interest for the Lancet article mentions the first author and one other are “co-directors of the Mindfulness Network Community Interest Company and teach nationally and internationally on MBCT.” The first author notes the marketing potential of his study in comments to the media.
Reworded research question. To ensure that readers clearly understand that this trial is not a direct comparison between antidepressant medication (ADM) and Mindfulness-based cognitive therapy (MBCT), but ADM versus MBCT plus tapering support (MBCT-TS), the primary research question has been changed following the recommendation made by the Trial Steering Committee at their meeting on 24 June 2013. The revised primary research question now reads as follows: ‘Is MBCT with support to taper/discontinue antidepressant medication (MBCT-TS) superior to maintenance antidepressant medication (m-ADM) in preventing depression over 24 months?’ In addition, the acronym MBCT-TS will be used to emphasise this aspect of the intervention.
I would agree and amplify: This trial adds nothing to the paucity of evidence from well-controlled trials that MBCT is a first-line treatment for patients experiencing a current episode of major depression. The few studies to date are small and of poor quality and are insufficient to recommend MBCT as a first line treatment of major depression.
I know, you would never guess that from promotions of MBCT for depression, especially not in the current blitz promotion in the UK.
The most salient question is whether MBCT can provide an effective means of preventing relapse in depressed patients who have already achieved remission and seek discontinuation.
Despite a chorus of claims on social media to the contrary, the Lancet trial does not demonstrate:
That formal psychotherapy is needed to prevent relapse and recurrence among patients previously treated with antidepressants in primary care.
That any less benefit would have been achieved with a depression care manager, who requires less formal training than an MBCT therapist.
That any less benefit would have been achieved with primary care physicians simply tapering antidepressant treatment that may not even have been appropriate in the first place.
That the crucial benefit to patients assigned to the MBCT condition was their acquisition of skills.
That practicing mindfulness is needed or even helpful in tapering from antidepressants.
We are all dodos and everyone gets a prize
Something also lost in the promotion of the trial is that it was originally designed to test the hypothesis that MBCT was better than maintenance antidepressant therapy in terms of relapse and recurrence of depression. That is stated in the registration of the trial, but not in the actual Lancet report of the trial outcome.
Across the primary and secondary outcome measures, the trial failed to demonstrate that MBCT was superior. Essentially, the investigators had a null trial on their hands. But in a triumph of marketing over accurate reporting of a clinical trial, they shifted the question to whether MBCT was inferior to maintenance antidepressant therapy and declared success in demonstrating that it was not.
We saw a similar move in an MBCT trial that I critiqued just recently. The authors there opted for the noninformative conclusion that MBCT was “not inferior” to an ill-defined routine primary care for a mixed sample of patients with depression, anxiety, and adjustment disorders.
An important distinction is being lost here. Null findings in a clinical trial with a sample size set to answer whether one treatment is better than another are not the same as a demonstration that the two treatments are equivalent. The latter question requires a noninferiority design with a much larger sample size, in order to demonstrate by some prespecified criterion that the two treatments do not differ in clinically significant terms.
Consider this analogy: we want to test whether yogurt is better than aspirin for a headache. So we do a power analysis tailored to the null hypothesis of no difference between yogurt and aspirin, conduct a trial, and find that yogurt and aspirin do not differ. But if we were actually interested in whether yogurt can be substituted for aspirin in treating headaches, we would have to estimate what size of study would leave us comfortable with the conclusion that treating headaches with yogurt versus aspirin makes no clinically significant difference. That would require a much larger sample size, typically several times that of a clinical trial designed to test the efficacy of an intervention.
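The sample-size contrast can be sketched with the standard two-arm formula for a continuous outcome, n per arm = 2((z_alpha + z_beta)·sigma/delta)². The effect sizes and margins below are illustrative choices, not taken from any of the trials discussed:

```python
from statistics import NormalDist

# Back-of-the-envelope per-arm sample sizes for a two-arm trial with a
# continuous outcome (illustrative numbers only). A superiority test uses
# a two-sided alpha and targets the effect you hope to detect; a
# noninferiority test uses a one-sided alpha and a margin that is
# typically much smaller, inflating the required sample size.
def n_per_arm(delta, sd=1.0, alpha=0.05, power=0.8, two_sided=True):
    z_a = NormalDist().inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_b = NormalDist().inv_cdf(power)
    return 2 * ((z_a + z_b) * sd / delta) ** 2

# Superiority: detect a 0.5 SD difference between treatments.
print(round(n_per_arm(0.5)))                   # -> 63 per arm
# Noninferiority: rule out being worse by more than a 0.2 SD margin.
print(round(n_per_arm(0.2, two_sided=False)))  # -> 309 per arm
```

The point of the sketch is the ratio, not the specific numbers: shrinking the question from “better by delta” to “not worse by a smaller margin” multiplies the required sample several-fold, which is why a null superiority trial cannot simply be rebadged as a noninferiority finding.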
The often confusing differences between standard efficacy trials and noninferiority and superiority trials are nicely explained here.
Do primary care patients prescribed an antidepressant need to continue?
Patients taking antidepressants should not stop without consulting their physician and agreeing on a plan for discontinuation.
NICE guidelines, like many international guidelines, recommend that patients with recurrent depression continue their medication for at least two years, out of concern for a heightened risk of relapse and recurrence. But these recommendations are based on research conducted in specialty mental health settings with patients with an established diagnosis of depression. Generalizing them to primary care patients may not rest on appropriate best evidence.
Major depression is typically a recurrent, episodic condition with onset in the teens or early 20s. Many currently depressed adult patients beyond that age would be characterized as having recurrent depression. In a study conducted at primary care practices associated with the University of Michigan, we found that most patients in waiting rooms identified as depressed, on the basis of two-stage screening and a formal diagnostic interview, had recurrent depression, with the average patient having had over six episodes before our point of contact.
However, depression in primary care may involve less severe symptoms in a given episode and an overall less severe course than in the patients who make it to specialty mental health care. And primary care physicians’ decisions about placing patients on antidepressants are typically not based upon a formal, semi-structured interview with symptom counts to ascertain whether patients have the necessary number of symptoms (five for DSM-5) to meet diagnostic criteria.
My colleagues in Germany and I conducted another relevant study in which we randomized patients to antidepressant medication, behavior therapy, or the patient’s preference between antidepressant and behavior therapy. What was unusual was that we relied on primary care physician diagnosis, not formal research criteria. We found that many patients enrolling in the trial did not meet criteria for major depression and, at least by DSM-IV criteria, would be given the highly ambiguous diagnosis of Depression, Not Otherwise Specified. The patients identified by the primary care physicians as requiring treatment for depression were quite different from those typically entering clinical trials evaluating treatment options. You can find out more about the trial here.
It is thus important to note that patients in the Lancet study were not originally prescribed antidepressants based on a formal research diagnosis of major depression. Rather, the primary care physicians’ decisions to prescribe antidepressants were, as is usual, not based on a systematic interview aimed at a formal diagnosis requiring a minimum number of symptoms to be present. This is a key issue.
The inclusion criteria for the Lancet study were that patients currently be in full or partial remission from a recent episode of depression and have had at least three episodes, counting the recent one. But their diagnosis at the time they were prescribed antidepressants was retrospectively reconstructed and may have been biased by their having received antidepressants.
Patients enrolled in the study were thus a highly select subsample of all patients receiving antidepressants in UK primary care. A complex recruitment procedure, involving not only review of GP records but also advertisement in the community, means that we cannot tell what proportion of patients receiving antidepressants and otherwise meeting criteria would have agreed to be in the study.
The study definitely does not provide a basis for revising guidelines for determining when and if primary care physicians should raise the issue of tapering antidepressant treatment. But that’s a vitally important clinical question.
Questions not answered by the study:
We don’t know the appropriateness of the prescription of antidepressants to these patients in the first place.
We don’t know what review of the appropriateness of prescription of antidepressants had been conducted by the primary care physicians in agreeing that their patients participate in the study.
We don’t know the selectivity with which primary care physicians agreed for their patients to participate. To what extent are the patients to whom they recommended the trial representative of other patients in the maintenance phase of treatment?
We don’t know enough about how the primary care physicians treating the patients in the control groups reacted to the advice from the investigator group to continue medication. Importantly, how often were there meetings with these patients, and did that change as a result of participation in this trial? Like every other trial of CBT in the UK that I have reviewed, this one suffers from an ill-defined control group that was nonequivalent in terms of contact time with professionals and support.
The question persists whether any benefits claimed for cognitive behavior therapy or MBCT from recent UK trials could have been achieved with nonspecific supportive interventions. In this particular Lancet study, we don’t know whether the same results could have been achieved by simply tapering antidepressants with the assistance of a depression care manager less credentialed than what is required to provide MBCT.
The investigators provided a cost analysis. They concluded that there were no savings in health care costs from moving patients in full or partial remission off antidepressants to MBCT. But the cost analysis did not take into account the added patient time invested in practicing MBCT. Indeed, we don’t even know whether the patients assigned to MBCT actually practiced it with any diligence or will continue to do so after treatment.
The authors promise a process analysis that will shed light on what elements of MBCT contributed to the equivalence of outcomes with maintenance antidepressant medication.
But this process analysis will be severely limited by the inability to control for nonspecific factors such as contact time with the patient and support provided to the primary care physician and patient in tapering medication.
The authors seem intent on arguing that MBCT should be disseminated into the UK National Health Services. But a more sober assessment is that this trial only demonstrates that a highly select group of patients currently receiving antidepressants within the UK health system could be tapered without heightened risk of relapse and recurrence. There may be no necessity or benefit of providing MBCT per se during this process.
The study is not comparable to other noteworthy studies of MBCT to prevent relapse, like Zindel Segal’s complex study. That study started with an acutely depressed patient population defined by careful criteria and treated patients with a well-defined algorithm for choosing and making changes in medications. Randomization to continued medication, MBCT, or pill placebo occurred only in the patients who remitted. It is unclear how much the clinical characteristics of the patients in the present Lancet study overlapped with those in Segal’s study.
What would be the consequences of disseminating and implementing MBCT into routine care based on current levels of evidence?
There are lots of unanswered questions concerning whether MBCT should be disseminated and widely implemented in routine care for depression.
One issue is where the resources for this initiative would come from. There are already long waiting lists for cognitive behavior therapy, generally 18 weeks. Would disseminating MBCT draw therapists away from providing conventional cognitive behavior therapy? Therapists are often drawn to therapies based on their novelty and initial, unsubstantiated promises rather than the strength of evidence. And the strength of evidence for MBCT is not such that we could recommend substituting it for CBT in the treatment of acute, current major depression.
Another issue is whether most patients would be willing to commit not only the time for sessions of training in MBCT but also the time to actually practice it in their everyday life. Of course, again, we don’t even know from this trial whether actually practicing MBCT matters.
There hasn’t been a fair comparison of MBCT to equivalent time with a depression care manager who would review patients currently receiving antidepressants and advise physicians as to whether and how to taper suitable candidates for discontinuation.
If I were distributing scarce research resources to reduce unnecessary treatment with antidepressants, I would focus on a descriptive, observational study of the clinical status of patients currently receiving antidepressants, the amount of contact time they are receiving with a primary health care professional, and the adequacy of their response in terms of symptom levels, as well as their adherence. Results could establish the usefulness of targeting long-term use of antidepressants and the level of adherence of patients to taking the medication and of physicians to monitoring symptom levels and adherence. I bet there is a lot of poor-quality maintenance care for depression in the community.
When I was conducting NIMH-funded studies of depression in primary care, I never could get review committees interested in the issue of overtreatment and unnecessarily continued treatment. I recall one reviewer’s snotty comment that these are not pressing public health issues.
That’s too bad, because I think they are key in considering how to distribute scarce resources to study and improve care for depression in the community. Existing evidence suggests that a substantial portion of the cost of treating depression with antidepressants in general medical care is squandered on patients who do not meet guideline criteria for receiving antidepressants or who do not receive adequate monitoring.