Mindfulness research’s huge problem with uninformative control groups

Are enthusiasts protecting cherished beliefs about the power of mindfulness from disconfirmation?

Do any advantages of mindfulness training disappear in a fairly matched cage fight with a treatment of comparable frequency and intensity?

  • Very few of the 1000s of articles retrieved in a literature search with the keyword “mindfulness” represent advances in the limited evidence that mindfulness-based stress reduction (MBSR) is effective for physical health problems.
  • Only a few randomized controlled trials with appropriate control groups are available and they do not offer strong evidence for the efficacy of MBSR.
  • This blog post demonstrates how uninformative and misleading comparisons with no treatment or treatment as usual/routine care can be.
  • While the lack of adequately controlled studies could have initially reflected the naïveté of MBSR researchers, increasing acknowledgment of the problem suggests enthusiasts’ avoidance of confronting cherished beliefs with disconfirming evidence.
  • When cage fights are arranged between MBSR and appropriate active control groups, the alternative treatments are often shown to be superior and more cost-effective, even when MBSR enthusiasts are the referees.

A comprehensive systematic review and meta-analysis prepared for the US Agency for Healthcare Research and Quality (AHRQ)

Goyal M, Singh S, Sibinga EMS, et al. Meditation programs for psychological stress and well-being: a systematic review and meta-analysis. JAMA Intern Med. Epub Jan 6 2014. doi:10.1001/jamainternmed.2013.13018.

Reviewed 18,753 citations and found only 47 trials (about 0.25%), with 3515 participants, that included an active control treatment.

Mindfulness meditation programs had moderate evidence of improved anxiety (effect size, 0.38 [95%CI, 0.12-0.64] at 8 weeks and 0.22 [0.02-0.43] at 3-6 months), depression (0.30 [0.00-0.59] at 8 weeks and 0.23 [0.05-0.42] at 3-6 months), and pain (0.33 [0.03- 0.62]) and low evidence of improved stress/distress and mental health–related quality of life. We found low evidence of no effect or insufficient evidence of any effect of meditation programs on positive mood, attention, substance use, eating habits, sleep, and weight. We found no evidence that meditation programs were better than any active treatment (ie, drugs, exercise, and other behavioral therapies).

An accompanying commentary on the review asked:

The modest benefit found in the study by Goyal et al begs the question of why, in the absence of strong scientifically vetted evidence, meditation in particular and complementary measures in general have become so popular, especially among the influential and well educated…What role is being played by commercial interests? Are they taking advantage of the public’s anxieties to promote use of complementary measures that lack a base of scientific evidence? Do we need to require scientific evidence of efficacy and safety for these measures?

A reminder: treatments do not have effect sizes.

MBSR does not have an effect size. Rather, comparisons of MBSR to other conditions have effect sizes, which will vary greatly with the comparison treatment and population being studied.

Not just any comparison/control condition will do.

A comparison/control condition must be suitably matched with MBSR in terms of frequency and intensity of contact, positive expectations, and overall levels of support and attention. MBSR treatments typically involve weekly meetings, daylong workshops or retreats, and the expectation that patients will practice mindfulness daily.

Construction of an adequate control condition that matches these features can be challenging.

Comparisons of MBSR with wait list and no-treatment control conditions produce exaggerated effect sizes for active treatments and may produce positive findings where no differences would be found with an adequate control group.

The domination of the MBSR literature by nonrandomized trials and randomized trials with inadequate control groups represents one contribution to an exaggeration of the efficacy of MBSR.

Demonstrating how uninformative and even misleading poorly chosen control groups can be.

A study published in NEJM that did not evaluate MBSR nonetheless demonstrates how misleading poorly chosen control groups can be, especially for physical health outcomes.

Wechsler ME, Kelley JM, Boyd IO, Dutile S, Marigowda G, Kirsch I, Israel E, Kaptchuk TJ. Active albuterol or placebo, sham acupuncture, or no intervention in asthma. New England Journal of Medicine. 2011 Jul 14;365(2):119-26.

This randomized, double-blind, crossover pilot study involved screening 79 patients, of whom 46 with mild-to-moderate asthma met the entry criteria and were randomly assigned to one of four study interventions. An inhaled albuterol bronchodilator was compared with three control conditions: a placebo inhaler, sham acupuncture, and no intervention. Figure 4 from the article presents subjective outcomes for two self-report measures: perceived improvement in asthma symptoms on a visual-analogue scale and perceived credibility of treatment.

Patients reported substantial improvement not only with inhaled albuterol (50% improvement) but also with inhaled placebo (45%) and with sham acupuncture (46%). In contrast, the improvement reported with no intervention was only 21%. The difference in the subjective drug effect between the active albuterol inhaler and the placebo inhaler was not significant (P=0.12), and the observed effect size was small (d=0.21). With respect to the placebo effects, however, the difference between the two placebo interventions and no intervention was large (d=1.07 for placebo inhaler and d=1.11 for sham acupuncture) and significant (P<0.001 for both comparisons). Treatment credibility was high, and most patients believed that they had received active treatment (73% for double-blind albuterol, 66% for double-blind placebo inhaler, and 85% for sham acupuncture). The two double-blind conditions did not differ significantly from each other, but sham acupuncture was significantly more credible than both inhaler conditions (P<0.05).

Figure 3 from the article presents the outcomes for an objective measure of physiological response, improvement in forced expiratory volume (FEV1) measured with spirometry, for each intervention (albuterol inhaler, placebo inhaler, sham acupuncture, and no intervention) across the three study visits.

The mean percent improvement in FEV1 was 20.1±1.6% with inhaled albuterol, as compared with 7.5±1.0% with inhaled placebo, 7.3±0.8% with sham acupuncture, and 7.1±0.8% with the no-intervention control. There were no significant differences between the three inactive interventions, none of which resulted in the degree of improvement observed with active albuterol. The difference in drug effect between the albuterol inhaler and the placebo inhaler, as indexed by the difference in mean percent improvement in FEV1, was significant (P<0.001) and large (d=1.48). In contrast, the placebo effects did not differ significantly between the two placebo interventions and the no-intervention control (P=0.65 for the comparison of placebo inhaler with no intervention, and P=0.75 for the comparison of sham acupuncture with no intervention).

The authors concluded:

In this repeated-measures pilot study in which active-drug and placebo effects were assessed in patients with asthma, two different types of placebo had no objective bronchodilator effect beyond the improvement that occurred when patients received no intervention of any kind and simply underwent repeated spirometry (no-intervention control). In contrast, the subjective improvement in asthma symptoms with both inhaled placebo and sham acupuncture was significantly greater than the subjective improvement with the no-intervention control and was similar to that with the active drug.
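The contrast in the passages quoted above can be laid out with a few lines of arithmetic. What follows is a minimal sketch in Python that uses only the mean percent improvements quoted from Wechsler et al. The dictionary layout and the `difference` helper are illustrative names, not anything taken from the article, and raw differences in percentage points are not the standardized effect sizes (d) that the authors report.

```python
# Mean percent improvement by arm, as quoted above from Wechsler et al. (2011).
improvement = {
    "subjective": {
        "albuterol": 50.0,
        "placebo_inhaler": 45.0,
        "sham_acupuncture": 46.0,
        "no_intervention": 21.0,
    },
    "objective_fev1": {
        "albuterol": 20.1,
        "placebo_inhaler": 7.5,
        "sham_acupuncture": 7.3,
        "no_intervention": 7.1,
    },
}


def difference(outcome, arm, comparator):
    """Raw difference in mean percent improvement, in percentage points.

    Illustrative helper: this is NOT Cohen's d, only a subtraction of means.
    """
    arms = improvement[outcome]
    return round(arms[arm] - arms[comparator], 1)


# The same arm looks very different depending on what it is compared with.
comparisons = [
    ("albuterol", "no_intervention"),
    ("albuterol", "placebo_inhaler"),
    ("placebo_inhaler", "no_intervention"),
    ("sham_acupuncture", "no_intervention"),
]

for outcome in improvement:
    for arm, comparator in comparisons:
        points = difference(outcome, arm, comparator)
        print(f"{outcome}: {arm} vs {comparator}: {points} points")
```

Read this way, the apparent benefit of albuterol on the subjective measure is 29 points against no intervention but only 5 points against a placebo inhaler, while on FEV1 the placebo inhaler differs from no intervention by just 0.4 points.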

Relevance to Studies of MBSR.

 Claims for the efficacy of MBSR depend heavily on RCTs comparing MBSR to waitlist. I’m unaware of comparisons of the standard waitlist control condition to more appropriate comparison/control conditions.  However, this unusual pilot study provides some suggestive evidence that a waitlist is seriously deficient when compared to credible comparison/control conditions for which patients are likely to have positive expectations.

We should be cautious in interpreting these results because we will be comparing effect sizes across different kinds of studies. But with this caution in mind, we can see that for subjective self-report measures, the large difference between placebo conditions with positive expectations and no treatment is certainly greater than the differences typically found between MBSR and a waitlist.

The difference between a waitlist control group and a blinded control group with positive expectations is considerably greater than the difference between MBSR and a waitlist control group. This spells trouble for anyone wanting to crow about MBSR.

It’s not an unreasonable inference that comparison with more appropriate comparison/control conditions will eliminate any advantage of MBSR. I would welcome a direct test of this hypothesis by pitting MBSR against a placebo condition with positive expectations and another comparison control condition like waitlist or no treatment.

The contrast between results from subjective self-report and objective outcomes should be troubling to those needing to evaluate MBSR or other psychological interventions for clinical or health policy applications. If one relies on studies with subjective self-report as the primary outcome, the risk is that a lack of difference on objective health measures will be missed and ineffective treatments will be accepted as effective. Ouch!

For a large proportion of studies of psychological interventions for chronic health conditions, the primary outcomes are indeed subjective self-report. Even when conceptually possible, objective measures of health conditions are either not included or they are deemphasized as secondary outcomes. The message of this study, again delivered with appropriate caution, is that we should not be generalizing from results obtained with subjective self-report to objective health outcomes.

In defense of MBSR researchers, they might not just be defending a cherished belief against disconfirmation. They may also be avoiding a threat to continued funding. An investigator who conducted such a trial and got the expected result would jeopardize further funding for MBSR trials.

Cage fights between MBSR and active control conditions.

Comparisons between MBSR and active control conditions are the real test of whether MBSR is effective and distinctively so. Such “cage fights” become particularly important when MBSR enthusiasts are not the referee. Investigator allegiance is an important determinant of outcome. Yet even when cage fights are refereed by investigators rooting for MBSR, the results can be disappointing.

In a recent blog post, I examined a trial of MBSR for smoking cessation that was published too late to be included in the comprehensive systematic review and meta-analysis.


The well-designed study

Vidrine JI, Spears CA, Heppner WL, Reitzel LR, Marcus MT, Cinciripini PM, Waters AJ, Li Y, Nguyen NT, Cao Y, Tindle HA. Efficacy of Mindfulness-Based Addiction Treatment (MBAT) for Smoking Cessation and Lapse Recovery: A Randomized Clinical Trial. Journal of Consulting and Clinical Psychology. 2016 May.

Compared mindfulness-based addiction treatment (MBAT) to cognitive behavior therapy, which was closely matched for frequency and intensity of contact and credibility. A further control/comparison group received four 5-10 minute individual counseling sessions. Although that comparison was lopsided in terms of frequency and intensity of meetings, there were no differences among the three groups. The authors did not emphasize one likely reason for this finding: all three groups received a nicotine patch with instructions.

Another new large study 

Daubenmier J, Moran PJ, Kristeller J, Acree M, Bacchetti P, Kemeny ME, Dallman M, Lustig RH, Grunfeld C, Nixon DF, Milush JM. Effects of a mindfulness‐based weight loss intervention in adults with obesity: A randomized clinical trial. Obesity. 2016 Apr 1;24(4):794-804.

Compared mindfulness training to a 5.5 month active control condition that was carefully matched.

To control for attention, social support, expectations of benefit, food provided during the mindful eating exercises, and home practice time in the mindfulness intervention, the control intervention included additional nutrition and physical activity information, strength training with exercise bands, discussion of societal issues concerning weight loss, snacks, and home activities. We controlled for a mindfulness approach to stress management by including progressive muscle relaxation and cognitive-behavioral training in the control group, although at a lower dose than in the mindfulness intervention.

There were no differences in weight loss between the two groups. Questions can be raised about how different the two treatments actually were, but part of the problem is that it is difficult to design such a treatment of comparable frequency and intensity of contact with credible content that does not overlap.

I would anticipate that comparisons between MBSR and appropriate active control conditions will be slow to accumulate. But the results at this point are not encouraging of the notion that MBSR is distinctively more effective than other active control conditions when delivered with the same frequency of contact, intensity, and positive expectations.

Is mindfulness-based therapy ready for rollout to prevent relapse and recurrence in depression?

Doubts that much of clinical or policy significance was learned from a recent study published in Lancet

Promoters of Acceptance and Commitment Therapy (ACT) notoriously established a record for academics endorsing a psychotherapy as better than alternatives, in the absence of evidence from adequately sized, high quality studies with suitable active control/comparison conditions. The credibility of designating a psychological intervention as “evidence-based” took a serious hit with the promotion of ACT, before its enthusiasts felt they had attracted enough adherents to be able to abandon claims of “best” or “better than.”

But the tsunami of mindfulness promotion has surpassed anything ACT ever produced, and still with insufficient quality and quantity of evidence.

Could that be changing?

Some might think so, given a recent randomized controlled trial, reported in the Lancet, of mindfulness-based cognitive therapy (MBCT) to reduce relapse and recurrence in depression. The headline of a Guardian column by an Oxford colleague of the Lancet article’s first author misleadingly proclaimed that the study showed

[Image: headline of the Guardian column]

And that misrepresentation was echoed in the Mental Health Foundation’s call for mindfulness to be offered through the UK National Health Service:

[Image: Mental Health Foundation call for NHS mindfulness]

The Mental Health Foundation is offering a 10-session online course for £60 and is undoubtedly prepared for an expanded market.

Patient testimonial accompanying Mental Health Foundation’s call for dissemination.




The Declaration of Conflict of Interest for the Lancet article mentions the first author and one other are “co-directors of the Mindfulness Network Community Interest Company and teach nationally and internationally on MBCT.” The first author notes the marketing potential of his study in comments to the media.

To the authors’ credit, they modified the registration of their trial to reduce the likelihood of it being misinterpreted.

Reworded research question. To ensure that readers clearly understand that this trial is not a direct comparison between antidepressant medication (ADM) and Mindfulness-based cognitive therapy (MBCT), but ADM versus MBCT plus tapering support (MBCT-TS), the primary research question has been changed following the recommendation made by the Trial Steering Committee at their meeting on 24 June 2013. The revised primary research question now reads as follows: ‘Is MBCT with support to taper/discontinue antidepressant medication (MBCT-TS) superior to maintenance antidepressant medication (m-ADM) in preventing depression over 24 months?’ In addition, the acronym MBCT-TS will be used to emphasise this aspect of the intervention.

I would agree and amplify: This trial adds nothing to the paucity of evidence from well-controlled trials that MBCT is a first-line treatment for patients experiencing a current episode of major depression. The few studies to date are small and of poor quality and are insufficient to recommend MBCT as a first-line treatment of major depression.

I know, you would never guess that from promotions of MBCT for depression, especially not in the current blitz promotion in the UK.

The most salient question is whether MBCT can provide an effective means of preventing relapse in depressed patients who have already achieved remission and seek discontinuation.

Despite a chorus of claims in the social media to the contrary, the Lancet trial does not demonstrate that

  • Formal psychotherapy is needed to prevent relapse and recurrence among patients previously treated with antidepressants in primary care.
  • Any less benefit would have been achieved with a depression care manager who requires less formal training than an MBCT therapist.
  • Any less benefit would have been achieved with primary care physicians simply tapering antidepressant treatment that may not even have been appropriate in the first place.
  • The crucial benefit to patients being assigned to the MBCT condition was their acquisition of skills.
  • Practicing mindfulness is needed or even helpful in tapering from antidepressants.

We are all dodos and everyone gets a prize

Something also lost in the promotion of the trial is that it was originally designed to test the hypothesis that MBCT was better than maintenance antidepressant therapy in terms of relapse and recurrence of depression. That is stated in the registration of the trial, but not in the actual Lancet report of the trial outcome.

Across the primary and secondary outcome measures, the trial failed to demonstrate that MBCT was superior. Essentially the investigators had a null trial on their hands. But in a triumph of marketing over accurate reporting of a clinical trial, they shifted the question to whether MBCT is inferior to maintenance antidepressant therapy and declared success in demonstrating that it was not.

We saw a similar move in an MBCT trial that I critiqued just recently. The authors there opted for the noninformative conclusion that MBCT was “not inferior” to an ill-defined routine primary care for a mixed sample of patients with depression and anxiety and adjustment disorders.

An important distinction is being lost here. A null finding in a clinical trial whose sample size was set to answer whether one treatment is better than another is not the same as a demonstration that the two treatments are equivalent. The latter requires a non-inferiority design with a much larger sample size in order to demonstrate that, by some pre-specified criterion, the two treatments do not differ from each other in clinically significant terms.

Consider this analogy: we want to test whether yogurt is better than aspirin for a headache. So we do a power analysis for detecting a difference between yogurt and aspirin, conduct a trial, and find that yogurt and aspirin do not differ. But if we were actually interested in the question of whether yogurt can be substituted for aspirin in treating headaches, we would have to estimate what size of study would leave us comfortable with the conclusion that treating headaches with yogurt rather than aspirin makes no clinically significant difference. That would require a much larger sample size, typically several times the size of a clinical trial designed to test the efficacy of an intervention.
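The yogurt-versus-aspirin point can be made concrete with the usual normal-approximation sample size formula for comparing two means, n per group = 2(z_alpha + z_beta)^2 / d^2, with d expressed in standard deviation units. The sketch below is a rough illustration in Python, not a substitute for a proper power analysis. The 0.5 SD difference used for the superiority question and the 0.25 SD non-inferiority margin are hypothetical values chosen only for illustration, the function names are my own, and the non-inferiority calculation assumes the true difference between treatments is zero.

```python
from math import ceil
from statistics import NormalDist


def z(p):
    """Standard normal quantile for probability p."""
    return NormalDist().inv_cdf(p)


def n_per_group_superiority(effect_sd, alpha=0.05, power=0.80):
    """Patients per group to detect a difference of effect_sd (in SD units)
    with a two-sided test at level alpha."""
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_sd ** 2)


def n_per_group_noninferiority(margin_sd, alpha=0.025, power=0.80):
    """Patients per group to rule out a deficit larger than margin_sd
    (in SD units) with a one-sided test at level alpha, assuming the
    true difference between treatments is zero."""
    return ceil(2 * (z(1 - alpha) + z(power)) ** 2 / margin_sd ** 2)


# Hypothetical inputs: a "medium" difference for the superiority question,
# and a margin half that size for the non-inferiority question.
n_sup = n_per_group_superiority(effect_sd=0.5)
n_noninf = n_per_group_noninferiority(margin_sd=0.25)

print(f"superiority trial:     {n_sup} per group")
print(f"non-inferiority trial: {n_noninf} per group")
print(f"ratio: {n_noninf / n_sup:.1f}")
```

With these assumed inputs the non-inferiority question needs about four times as many patients per group, because the required sample size grows with the inverse square of the difference one wants to rule out.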

The often confusing differences between standard efficacy trials and noninferiority and superiority trials are nicely explained here.

Do primary care patients prescribed an antidepressant need to continue?

Patients taking antidepressants should not stop without consulting their physician and agreeing on a plan for discontinuation.

NICE Guidelines, like many international guidelines, recommend that patients with recurrent depression continue their medication for at least two years, out of concern for a heightened risk of relapse and recurrence. But these recommendations are based on research in specialty mental health settings conducted with patients with an established diagnosis of depression. Generalizing them to primary care patients may not be an appropriate use of the best evidence.

Major depression is typically a recurrent, episodic condition with onset in the teens or early 20s. Many currently depressed adult patients beyond that age would be characterized as having recurrent depression. In a study conducted at primary care practices associated with the University of Michigan, we found that most patients in waiting rooms identified as depressed on the basis of a two-stage screening and formal diagnostic interview had recurrent depression, with the average patient having had over six episodes before our point of contact.

However, depression in primary care may have less severe symptoms in a given episode and an overall less severe course than in the patients who make it to specialty mental health care. And primary care physicians’ decisions about placing patients on antidepressants are typically not based upon a formal, semistructured interview in which there are symptom counts to ascertain whether patients have the necessary number of symptoms (5 for the Diagnostic and Statistical Manual-5) to meet diagnostic criteria.

My colleagues in Germany and I conducted another relevant study in which we randomized patients to either antidepressant, behavior therapy, or the patient’s preference of antidepressant versus behavior therapy. However, what was unusual was that we relied on primary care physician diagnosis, not our formal research criteria. We found that many patients enrolling in the trial would not meet criteria for major depression and, at least by DSM-IV-TR criteria, would be given the highly ambiguous diagnosis of Depression, Not Otherwise Specified. The patients identified by the primary care physicians as requiring treatment for depression were quite different from those typically entering clinical trials evaluating treatment options. You can find out more about the trial here.

It is thus important to note that patients in the Lancet study were not originally prescribed antidepressants based on a formal, research diagnosis of major depression. Rather, the decisions of primary care physicians to prescribe antidepressants are not usually based on a systematic interview aimed at a formal diagnosis that requires a minimum number of symptoms to be present. This is a key issue.

The inclusion criteria for the Lancet study were that patients currently be in full or partial remission from a recent episode of depression and have had at least three episodes, counting the recent one. But their diagnosis at the time they were prescribed antidepressants was retrospectively reconstructed and may have been biased by their having received antidepressants.

Patients enrolled in the study were thus a highly select subsample of all patients receiving antidepressants in UK primary care. A complex recruitment procedure involving not only review of GP records but also advertisement in the community means that we cannot tell what proportion of patients receiving antidepressants and otherwise meeting criteria would have agreed to be in the study.

The study definitely does not provide a basis for revising guidelines for determining when and if primary care physicians should raise the issue of tapering antidepressant treatment. But that’s a vitally important clinical question.

Questions not answered by the study:

  • We don’t know the appropriateness of the prescription of antidepressants to these patients in the first place.
  • We don’t know what review of the appropriateness of prescription of antidepressants had been conducted by the primary care physicians in agreeing that their patients participate in the study.
  • We don’t know the selectivity with which primary care physicians agreed for their patients to participate. To what extent are the patients to whom they recommended the trial representative of other patients in the maintenance phase of treatment?
  • We don’t know enough about how the primary care physicians treating the patients in the control group reacted to the advice from the investigator group to continue medication. Importantly, how often were there meetings with these patients and did that change as a result of participation in this trial? Like every other trial of CBT in the UK that I have reviewed, this one suffers from an ill-defined control group that was nonequivalent in terms of contact time with professionals and support.
  • The question persists whether any benefits claimed for cognitive behavior therapy or MBCT from recent UK trials could have been achieved with nonspecific supportive interventions. In this particular Lancet study, we don’t know whether the same results could have been achieved by simply tapering antidepressants with the assistance of a depression care manager less credentialed than what is required to provide MBCT.

The investigators provided a cost analysis. They concluded that there were no savings in health care costs from moving patients in full or partial remission off antidepressants to MBCT. But the cost analysis did not take into account the added patient time invested in practicing MBCT. Indeed, we don’t even know whether the patients assigned to MBCT actually practiced it with any diligence or will continue to do so after treatment.

The authors promise a process analysis that will shed light on what element of MBCT contributed to the equivalency of outcomes with the maintenance of antidepressant medication.

But this process analysis will be severely limited by the inability to control for nonspecific factors such as contact time with the patient and support provided to the primary care physician and patient in tapering medication.

The authors seem intent on arguing that MBCT should be disseminated into the UK National Health Service. But a more sober assessment is that this trial only demonstrates that a highly select group of patients currently receiving antidepressants within the UK health system could be tapered without heightened risk of relapse and recurrence. There may be no necessity or benefit of providing MBCT per se during this process.

The study is not comparable to other noteworthy studies of MBCT to prevent relapse, like Zindel Segal’s complex study. That study started with an acutely depressed patient population defined by careful criteria and treated patients with a well-defined algorithm for choosing and making changes in medications. Randomization to continued medication, MBCT, or pill placebo occurred only in the patients who remitted. It is unclear how much the clinical characteristics of the patients in the present Lancet study overlapped with those in Segal’s study.

What would be the consequences of disseminating and implementing MBCT into routine care based on current levels of evidence?

There are lots of unanswered questions concerning whether MBCT should be disseminated and widely implemented in routine care for depression.

One issue is where the resources for this initiative would come from. There already are long waiting lists for cognitive behavior therapy, generally 18 weeks. Would disseminating MBCT draw therapists away from providing conventional cognitive behavior therapy? Therapists are often drawn to therapies based on their novelty and initial, unsubstantiated promises rather than strength of evidence. And the strength of evidence for MBCT is not such that we could recommend substituting it for CBT for treatment of acute, current major depression.

Another issue is whether most patients would be willing to commit not only the time for sessions of training in MBCT but also to actually practicing it in their everyday life. Of course, again, we don’t even know from this trial whether actually practicing MBCT matters.

There hasn’t been a fair comparison of MBCT to equivalent time with a depression care manager who would review patients currently receiving antidepressants and advise physicians as to whether and how to taper suitable candidates for discontinuation.

If I were distributing scarce resources to research to reduce unnecessary treatment with antidepressants, I would focus on a descriptive, observational study of the clinical status of patients currently receiving antidepressants, the amount of contact time they are receiving with some primary health care professional, and the adequacy of their response in terms of symptom levels, but also adherence. Results could establish the usefulness of targeting long-term use of antidepressants, patients’ adherence to taking the medication, and physicians’ monitoring of symptom levels and adherence. I bet there is a lot of poor quality maintenance care for depression in the community.

When I was conducting NIMH-funded studies of depression in primary care, I never could get review committees interested in the issue of overtreatment and unnecessarily continued treatment. I recall one reviewer’s snotty comment that these are not pressing public health issues.

That’s too bad, because I think they are key in considering how to distribute scarce resources to study and improve care for depression in the community. Existing evidence suggests that a substantial share of the cost of treating depression with antidepressants in general medical care is squandered on patients who do not meet guideline criteria for receiving antidepressants or who do not receive adequate monitoring.