Headspace mindfulness training app no better than a fake mindfulness procedure for improving critical thinking, open-mindedness, and well-being.

The Headspace app increased users’ critical thinking and open-mindedness. So did practicing a sham mindfulness procedure: participants simply sat with their eyes closed, but thought they were meditating.



Results call into question claims about Headspace coming from other studies that did not have such a credible, active control group comparison.

Results also call into question the widespread use of standardized self-report measures of mindfulness to establish whether someone is in the state of mindfulness. These measures don’t distinguish between the practice of standard versus fake mindfulness.

Results can be seen as further evidence that the effects of practicing mindfulness depend on nonspecific factors (AKA placebo), rather than any active, distinctive ingredient.

Hopefully this study will prompt better studies evaluating the Headspace App, as well as evaluations of mindfulness training more generally, using credible active treatments, rather than no treatment or waitlist controls.

Maybe it is time for a moratorium on trials of mindfulness without such an active control, or at least a tempering of claims based on poorly controlled trials.

This study points to the need for development of more psychometrically sophisticated measures of mindfulness that are not so vulnerable to experimenter expectations and demand characteristics.

Until the accumulation of better studies with better measures, claims about the effects of practicing mindfulness ought to be recognized as based on relatively weak evidence.

The study

Noone, C. & Hogan, M. Randomised active-controlled trial of effects of online mindfulness intervention on executive control, critical thinking and key thinking dispositions. BMC Psychology, 2018.

Trial registration

The study was initially registered in the AEA Social Science Registry before the recruitment was initiated (RCT ID: AEARCTR-0000756; 14/11/2015) and retrospectively registered in the ISRCTN registry (RCT ID: ISRCTN16588423) in line with requirements for publishing the study protocol.

Excerpts from the Abstract

The aim of this study was…investigating the effects of an online mindfulness intervention on executive function, critical thinking skills, and associated thinking dispositions.


Participants recruited from a university were randomly allocated, following screening, to either a mindfulness meditation group or a sham meditation group. Both the researchers and the participants were blind to group allocation. The intervention content for both groups was delivered through the Headspace online application, an application which provides guided meditations to users.


Primary outcome measures assessed mindfulness, executive functioning, critical thinking, actively open-minded thinking, and need for cognition. Secondary outcome measures assessed wellbeing, positive and negative affect, and real-world outcomes.


Significant increases in mindfulness dispositions and critical thinking scores were observed in both the mindfulness meditation and sham meditation groups. However, no significant effects of group allocation were observed for either primary or secondary measures. Furthermore, mediation analyses testing the indirect effect of group allocation through executive functioning performance did not reveal a significant result and moderation analyses showed that the effect of the intervention did not depend on baseline levels of the key thinking dispositions, actively open-minded thinking, and need for cognition.

The authors conclude

While further research is warranted, claims regarding the benefits of mindfulness practice for critical thinking should be tempered in the meantime.

Headspace can be used on an iPhone

The active control condition

The sham treatment control condition was embarrassingly straightforward and simple. But as we will see, participants found it credible.

This condition presented the participants with guided breathing exercises. Each session began by inviting the participants to sit with their eyes closed. These exercises were referred to as meditation but participants were not given guidance on how to control their awareness of their body or breath. This approach was designed to control for the effects of expectations surrounding mindfulness and physiological relaxation to ensure that the effect size could be attributed to mindfulness practice specifically. This content was also delivered by Andy Puddicombe and was developed based on previous work by Zeidan and colleagues [55, 57, 58].

What can we conclude about the standard self-report measures of the state of mindfulness?

The study used the Five Facet Mindfulness Questionnaire, which is widely used to assess whether people are in a state of mindfulness. It has been cited almost 4000 times.

Participants assigned to the mindfulness condition had significant changes for all five facets from baseline to follow up: observing, non-reactivity, non-judgment, acting with awareness, and describing. In the absence of a comparison with change in the sham mindfulness group, these pre-post results would seem to suggest that the measure was sensitive to whether participants had practiced mindfulness. However, there were no differences between the changes observed for the participants assigned to mindfulness and those who were simply asked to sit with their eyes closed.

I asked Chris Noone about the questionnaires his group used to assess mindfulness:

The participants genuinely thought they were meditating in the sham condition so I think both non-specific and demand characteristics were roughly equivalent across both groups. I’m also skeptical regarding the ability of the Five-Facet Mindfulness Questionnaire (or any mindfulness questionnaire for that matter) to capture anything other than “perceived mindfulness”. The items used in these questionnaires feature similar content to the scripts used by the people delivering the mindfulness (and sham) guided meditations. The improvement in critical thinking across both groups is just a mix of learning across a semester and habituation to the task (as the same problems were posed at both measurements).

What I like about this trial

The trial provides a critical test of a key claim for mindfulness:

Mindfulness should facilitate critical thinking in higher-education, based on early Buddhist conceptualizations of mindfulness as clarity of thought.

The trial was registered before recruitment and departures from protocol were noted.

Sample size was determined by power analysis.

The study had a closely matched, active control condition, a sham mindfulness treatment.

The credibility and equivalence of this sham condition versus the active treatment under study was repeatedly assessed.

“Manipulation checks were carried out to assess intervention acceptability, technology acceptance and meditation quality 2 weeks after baseline and 4 weeks after baseline.”

The study tested some a priori hypotheses about mediation and moderation.

Analyses were intention to treat.

How the study conflicts with past studies

Previous studies claimed to show positive effects of mindfulness on aspects of executive functioning [25, 26].

How the contradiction of past studies by these results is resolved

 “There are many studies using guided meditations similar to those in our mindfulness meditation condition, delivered through smartphone applications [49, 50, 52, 90, 91], websites [92, 93, 94, 95, 96, 97] and CDs [98, 99], which show effects on measures of outcomes reliably associated with increases in mindfulness such as depression, anxiety, stress, wellbeing and compassion. There are two things to note about these studies – they tend not to include a measure of dispositional mindfulness (e.g. only 4% of all mindfulness intervention studies reviewed in a recent meta-analysis include such measures at baseline and follow-up; [54]) and they usually employ a weak form of control group such as a no-treatment control or waitlist control [54]. Therefore, even when change in mindfulness is assessed in mindfulness meditation intervention studies, it is usually overestimated and this must be borne in mind when comparing the results of this study with those of previous studies. This combined with generally only moderate correlations with behavioural outcomes [54] suggests that when mindfulness interventions are effective, dispositional measures do not fully capture what has changed.”

The broader take away messages

“Our results show that, for most outcomes, there were significant changes from baseline to follow-up but none which can be specifically attributed to the practice of mindfulness.”

This creative use of a sham mindfulness control condition is a breakthrough that should be widely followed. First, it allowed a fair test of whether mindfulness is any better than another active, credible treatment. Second, because the active treatment was a sham, results provide a challenge to the notion that apparent effects of mindfulness on critical thinking are anything more than a placebo effect.

The Headspace App is enormously popular and successful, based on claims about what benefits its use will provide. Some of these claims may need to be tempered, not only with respect to critical thinking, but also with respect to effects on well-being.

The Headspace App platform lends itself to such critical evaluations with respect to a sham treatment with a degree of standardization that is not readily possible with face-to-face mindfulness training. This opportunity should be exploited further with other active control groups constructed on the basis of specific hypotheses.

There is far too much research on the practice of mindfulness being done that does not advance understanding of what works or how it works. We need a lot fewer studies, and more with adequate control/comparison groups.

Perhaps we should have a moratorium on evaluations of mindfulness without adequate control groups.

Perhaps articles that make enthusiastic claims to their audiences about the benefits of mindfulness should routinely note whether these claims are based on adequately controlled studies. Most are not.

Creating illusions of wondrous effects of yoga and meditation on health: A skeptic exposes tricks

The tour of the sausage factory is starting, here’s your brochure telling you what you’ll see.


A recent review has received a lot of attention, and it is being used to support claims that mind-body interventions have distinct molecular signatures that point to potentially dramatic health benefits for those who take up these practices.

What Is the Molecular Signature of Mind–Body Interventions? A Systematic Review of Gene Expression Changes Induced by Meditation and Related Practices.  Frontiers in Immunology. 2017;8.

Few who are tweeting about this review or its press coverage are likely to have read it or to understand it, if they read it. Most of the new agey coverage in social media does nothing more than echo or amplify the message of the review’s press release.  Lazy journalists and bloggers can simply pass on direct quotes from the lead author or even just the press release’s title, ‘Meditation and yoga can ‘reverse’ DNA reactions which cause stress, new study suggests’:

“These activities are leaving what we call a molecular signature in our cells, which reverses the effect that stress or anxiety would have on the body by changing how our genes are expressed.”


“Millions of people around the world already enjoy the health benefits of mind-body interventions like yoga or meditation, but what they perhaps don’t realise is that these benefits begin at a molecular level and can change the way our genetic code goes about its business.”

[The authors of this review actually identified some serious shortcomings to the studies they reviewed. I’ll be getting to some excellent points at the end of this post that run quite counter to the hype. But the lead author’s press release emphasized unwarranted positive conclusions about the health benefits of these practices. That is what is most popular in media coverage, especially from those who have stuff to sell.]

Interpretation of the press release and review authors’ claims requires going back to the original studies, which most enthusiasts are unlikely to do. If readers do go back, they will have trouble interpreting some of the deceptive claims that are made.

Yet, a lot is at stake. This review is being used to recommend mind-body interventions for people who have, or are at risk of, serious health problems. In particular, unfounded claims that yoga and mindfulness can increase the survival of cancer patients are sometimes hinted at, and occasionally made outright.

This blog post is written with the intent of protecting consumers from such false claims and providing tools so they can spot pseudoscience for themselves.

Discussion in the media of the review speaks broadly of alternative and complementary interventions. The coverage is aimed at inspiring confidence in this broad range of treatments and at encouraging people who are facing health crises to invest time and money in outright quackery. Seemingly benign recommendations for yoga, tai chi, and mindfulness (after all, what’s the harm?) often become the entry point to more dubious and expensive treatments that substitute for established treatments. Once they are drawn to centers for integrative health care for classes, cancer patients are likely to spend hundreds or even thousands on other products and services that are unlikely to benefit them. One study reported:

More than 72 oral or topical, nutritional, botanical, fungal and bacterial-based medicines were prescribed to the cohort during their first year of IO care…Costs ranged from $1594/year for early-stage breast cancer to $6200/year for stage 4 breast cancer patients. Of the total amount billed for IO care for 1 year for breast cancer patients, 21% was out-of-pocket.

Coming up, I will take a skeptical look at the six randomized trials that were highlighted by this review.  But in this post, I will provide you with some tools and insights so that you do not have to make such an effort in order to make an informed decision.

Like many of the other studies cited in the review, these randomized trials were quite small and underpowered. But I will focus on the six because they are as good as it gets. Randomized trials are considered a higher form of evidence than simple observational studies or case reports. [It is too bad the authors of the review don’t even highlight which studies are randomized trials. They are lumped with others as “longitudinal studies.”]

As a group, the six studies do not actually add any credibility to the claims that mind-body interventions – specifically yoga, tai chi, and mindfulness training or retreats – improve health by altering DNA. We can be no more confident with what the trials provide than we would be if they had never been done.

I found the task of probing and interpreting the studies quite labor-intensive and ultimately unrewarding.

I had to get past poor reporting of what was actually done in the trials, to which patients, and with what results. My task often involved seeing through cover-ups, with authors exercising considerable flexibility in reporting what measures they actually collected and what analyses were attempted, before arriving at the best possible tale of the wondrous effects of these interventions.

Interpreting clinical trials should not be so hard, because they should be honestly and transparently reported, have a registered protocol, and stick to it. These reports of trials were sorely lacking. The full extent of the problems took some digging to uncover, but some things emerged before I got to the methods and results.

The introductions of these studies consistently exaggerated the strength of existing evidence for the effects of these interventions on health, even while somehow coming to the conclusion that this particular study was urgently needed and might even be the “first ever”. The introductions to the six papers typically cross-referenced each other, without giving any indication of how poor the quality of the evidence from the other papers was. What a mutual admiration society these authors are.

One giveaway is how the introductions  referred to the biggest, most badass, comprehensive and well-done review, that of Goyal and colleagues.

That review clearly states that the evidence for the effects of mindfulness is poor quality because of the lack of comparisons with credible active treatments. The typical randomized trial of mindfulness involves a comparison with no-treatment, a waiting list, or patients remaining in routine care where the target problem is likely to be ignored.  If we depend on the bulk of the existing literature, we cannot rule out the likelihood that any apparent benefits of mindfulness are due to having more positive expectations, attention, and support over simply getting nothing.  Only a handful  of hundreds of trials of mindfulness include appropriate, active treatment comparison/control groups. The results of those studies are not encouraging.

One of the first things I do in probing the introduction of a study claiming health benefits for mindfulness is see how they deal with the Goyal et al review. Did the study cite it, and if so, how accurately? How did the authors deal with its message, which undermines claims of the uniqueness or specificity of any benefits to practicing mindfulness?

For yoga, we cannot yet rule out that it is no better than regular exercising – in groups or alone – or having relaxing routines. The literature concerning tai chi is even smaller and of poorer quality, but there is the same need to show that practicing tai chi has any benefits over exercising in groups with comparable positive expectations and support.

Even more than mindfulness, yoga and tai chi attract a lot of pseudoscientific mumbo jumbo about integrating Eastern wisdom and Western science. We need to look past that and insist on evidence.

Like their introductions, the discussion sections of these articles are quite prone to exaggerating how strong and consistent the evidence is from existing studies. The discussion sections cherry pick positive findings in the existing literature, sometimes recklessly distorting them. The authors then discuss how their own positively spun findings fit with what is already known, while minimizing or outright neglecting discussion of any of their negative findings. I was not surprised to see one trial of mindfulness for cancer patients obtain no effects on depressive symptoms or perceived stress, but then go on to explain mindfulness might powerfully affect the expression of DNA.

If you want to dig into the details of these studies, the going can get rough and the yield for doing a lot of mental labor is low. For instance, these studies involved drawing blood and analyzing gene expression. Readers will inevitably encounter passages like:

In response to KKM treatment, 68 genes were found to be differentially expressed (19 up-regulated, 49 down-regulated) after adjusting for potentially confounded differences in sex, illness burden, and BMI. Up-regulated genes included immunoglobulin-related transcripts. Down-regulated transcripts included pro-inflammatory cytokines and activation-related immediate-early genes. Transcript origin analyses identified plasmacytoid dendritic cells and B lymphocytes as the primary cellular context of these transcriptional alterations (both p < .001). Promoter-based bioinformatic analysis implicated reduced NF-κB signaling and increased activity of IRF1 in structuring those effects (both p < .05).

Intimidated? Before you defer to the “experts” doing these studies, I will show you some things I noticed in the six studies and how you can debunk the relevance of these studies for promoting health and dealing with illness. Actually, I will show that even if these six studies got the results that the authors claimed, and they did not, at best the effects would be trivial and lost among the other things going on in patients’ lives.

Fortunately, there are lots of signs that you can dismiss such studies and go on to something more useful, if you know what to look for.

Some general rules:

  1. Don’t accept claims of efficacy/effectiveness based on underpowered randomized trials. Dismiss them. A reliable rule of thumb is to dismiss trials that have fewer than 35 patients in the smallest group. Over half the time, true moderate sized effects will be missed in such studies, even if they are actually there.
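
To make the arithmetic behind this rule concrete, here is a rough sketch rather than a definitive power analysis. It assumes that a "moderate" effect means a standardized mean difference of d = 0.5 (Cohen's convention, not a figure taken from any of these trials), a two-sided test at alpha = 0.05, and a normal approximation to the t-test, which slightly overstates power in small samples. The function name `approx_power` is made up for illustration.

```python
from statistics import NormalDist


def approx_power(n_per_group: int, d: float = 0.5, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of two group means.

    Assumes equal group sizes and a normal approximation to the t-test.
    d is the assumed true standardized mean difference.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value of the test
    noncentrality = d * (n_per_group / 2) ** 0.5  # expected test statistic
    # Chance that the statistic lands beyond either critical value.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


if __name__ == "__main__":
    for n in (10, 20, 35, 64):
        print(f"n per group = {n:3d}: approximate power = {approx_power(n):.2f}")
```

Under these assumptions, power at 35 patients per group comes out only slightly above one half, and it drops below one half at around 30 per group, so smaller trials miss a true moderate effect more often than they detect it.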

Due to publication bias, most of the positive effects that are published from such sized trials will be false positives and won’t hold up in well-designed, larger trials.

When significant positive effects from such trials are reported in published papers, they have to be large to have reached significance. If not outright false, these effect sizes won’t be matched in larger trials. So, significant, positive effect sizes from small trials are likely to be false positives and exaggerated and probably won’t replicate. For that reason, we can consider small studies to be pilot or feasibility studies, but not as providing estimates of how large an effect size we should expect from a larger study. Investigators do it all the time, but they should not: They do power calculations estimating how many patients they need for a larger trial from results of such small studies. No, no, no!
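
The exaggeration described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of any of the six trials: the true effect (d = 0.3), the group size (20), and the simplification that the outcome's standard deviation is known to be 1 (so a z-test stands in for a t-test) are all chosen for illustration, and `simulate_significant_effects` is a hypothetical helper.

```python
import random
from statistics import NormalDist


def simulate_significant_effects(true_d=0.3, n_per_group=20, n_trials=5000,
                                 alpha=0.05, seed=1):
    """Simulate many small two-arm trials and keep only the 'positive' ones.

    Returns the share of trials that reach significance in the favourable
    direction and the average estimated effect among those trials.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # standard error of the difference when SD = 1
    significant = []
    for _ in range(n_trials):
        treated = sum(rng.gauss(true_d, 1) for _ in range(n_per_group)) / n_per_group
        control = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        diff = treated - control
        if diff / se > z_crit:  # the kind of result that gets published
            significant.append(diff)
    return len(significant) / n_trials, sum(significant) / len(significant)


if __name__ == "__main__":
    share, avg = simulate_significant_effects()
    print(f"share of trials reaching significance: {share:.2f}")
    print(f"average 'significant' effect: {avg:.2f} (true effect: 0.30)")
```

With 20 patients per group, a difference has to exceed roughly 0.62 standard deviations to reach significance in this setup, so every "significant" estimate is more than double the assumed true effect of 0.3. That is why such estimates make a poor basis for power calculations.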

Having spent decades examining clinical trials, I am generally comfortable dismissing effect sizes that come from trials with fewer than 35 patients in the smaller group. I agree with a suggestion that if two larger trials are available in a given literature, go with those and ignore the smaller studies. If there are not at least two larger studies, keep the jury out on whether there is a significant effect.

Applying the Rule of 35, 5 of the 6 trials can be dismissed and the sixth is ambiguous because of loss of patients to follow up.  If promoters of mind-body interventions want to convince us that they have beneficial effects on physical health by conducting trials like these, they have to do better. None of the individual trials should increase our confidence in their claims. Collectively, the trials collapse in a mess without providing a single credible estimate of effect size. This attests to the poor quality of evidence and disrespect for methodology that characterizes this literature.

  1. Don’t be taken in by titles to peer-reviewed articles that are themselves an announcement that these interventions work. Titles may not be telling the truth.

What I found extraordinary is that five of the six randomized trials had a title indicating that a positive effect was found. I suspect that most people encountering the title will not actually go on to read the study. So, they will be left with the false impression that positive results were indeed obtained. It’s quite a clever trick to make the title of an article, by which most people will remember it, into a false advertisement for what was actually found.

For a start, we can simply remind ourselves that with these underpowered studies, investigators should not even be making claims about efficacy/effectiveness. So, one trick of the developing skeptic is to confirm that the claims being made in the title don’t fit with the size of the study. Going on to the results section, however, one can find other evidence of discrepancies between what was found and what is being claimed.

I think it’s a general rule of thumb that we should be careful of titles for reports of randomized trials that declare results. Even when what is claimed in the title fits with the actual results, it often creates the illusion of a greater consistency with what already exists in the literature. Furthermore, even when future studies inevitably fail to replicate what is claimed in the title, the false claim lives on, because failing to replicate key findings is almost never a condition for retracting a paper.

  1. Check the institutional affiliations of the authors. These six trials serve as a depressing reminder that we can’t rely on researchers’ institutional affiliations or federal grants to reassure us of the validity of their claims. These authors are not from Quack-Quack University and they get funding for their research.

In all cases, the investigators had excellent university affiliations, mostly in California. Most studies were conducted with some form of funding, often federal grants. A quick check of Google would reveal that at least one of the authors on a study, usually more, had federal funding.

  1. Check the conflicts of interest, but don’t expect the declarations to be informative, and be skeptical of what you find. It is disappointing that a check of conflict of interest statements for these articles would be unlikely to arouse the suspicion that the results that were claimed might have been influenced by financial interests. One cannot readily see that the studies were generally done in settings promoting alternative, unproven treatments that would benefit from the publicity generated from the studies. One cannot see that some of the authors have lucrative book contracts and speaking tours that require making claims for dramatic effects of mind-body treatments that could not possibly be supported by transparent reporting of the results of these studies. As we will see, one of the studies was actually conducted in collaboration with Deepak Chopra and with money from his institution. That would definitely raise flags in the skeptic community. But the dubious tie might be missed by patients and their families, who are vulnerable to unwarranted claims and unrealistic expectations of what can be obtained outside of conventional medicine, like chemotherapy, surgery, and pharmaceuticals.

Based on what I found probing these six trials, I can suggest some further rules of thumb. (1) Don’t assume for articles about health effects of alternative treatments that all relevant conflicts of interest are disclosed. Check the setting in which the study was conducted and whether an integrative [complementary and alternative, meaning mostly unproven] care setting was used for recruiting or running the trial. Not only would this represent potential bias on the part of the authors, it would represent selection bias in recruitment of patients and their responsiveness to placebo effects consistent with the marketing themes of these settings. (2) Google authors and see if they have lucrative pop psychology book contracts, TED talks, or speaking gigs at positive psychology or complementary and alternative medicine gatherings. None of these lucrative activities are typically expected to be disclosed as conflicts of interest, but all require making strong claims that are not supported by available data. Such rewards are perverse incentives for authors to distort and exaggerate positive findings and to suppress negative findings in peer-reviewed reports of clinical trials. (3) Check and see if known quacks have prepared recruitment videos for the study, informing patients what will be found. (Seriously, I was tipped off to look and I found that.)

  1. Look for the usual suspects. A surprisingly small, tight, interconnected group is generating this research. You could look the authors up on Google or Google Scholar or browse through my previous blog posts and see what I have said about them. As I will point out in my next blog post, one got withering criticism for her claim that drinking carbonated sodas, but not sweetened fruit drinks, shortened your telomeres so that drinking soda was worse than smoking. My colleagues and I re-analyzed the data of another of the authors. We found no support for his claim that pursuing meaning, rather than pleasure, in your life affected gene expression related to immune function. We also showed that substituting randomly generated data worked as well as what he got from blood samples in replicating his original results. I don’t think it is ad hominem to point out a history for both of the authors of making implausible claims. It speaks to source credibility.

  1. Check and see if there is a trial registration for a study, but don’t stop there. You can quickly check with PubMed if a report of a randomized trial is registered. Trial registration is intended to ensure that investigators commit themselves to a primary outcome, or maybe two, so that readers can see whether that is what they emphasized in their paper. You can then check to see if what is said in the report of the trial fits with what was promised in the protocol. Unfortunately, I could find that only one of these trials was registered. The trial registration was vague on what outcome variables would be assessed and did not mention the outcome emphasized in the published paper (!). The registration also said the sample would be larger than what was reported in the published study. When researchers have difficulty in recruitment, their study is often compromised in other ways. I’ll show how this study was compromised.

Well, it looks like applying these generally useful rules of thumb is not always so easy with these studies. I think the small sample size across all of the studies would be enough to decide this research has yet to yield meaningful results and certainly does not support the claims that are being made.

But readers who are motivated to put in the time to probe deeper will come up with strong signs of p-hacking and questionable research practices.

  1. Check the report of the randomized trial and see if you can find any declaration of one or two primary outcomes and a limited number of secondary outcomes. What you will find instead is that the studies always have more outcome variables than patients receiving these interventions. The opportunities for cherry picking positive findings and discarding the rest are huge, especially because it is so hard to assess what data were collected but not reported.

  1. Check and see if you can find tables of unadjusted primary and secondary outcomes. Honest and transparent reporting involves giving readers a look at simple statistics so they can decide if results are meaningful. For instance, if effects on stress and depressive symptoms are claimed, are the results impressive and clinically relevant? In almost all cases, there is no peeking allowed. Instead, authors provide analyses and statistics with lots of adjustments made. They break lots of rules in doing so, especially with such a small sample. These authors are virtually assured to get results to crow about.
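
The worry about having many outcome variables can be put in rough numbers. The sketch below assumes that no outcome has a true effect and that the tests are independent, which is a simplification (real outcomes, and especially gene expression measures, are correlated), so the figures are only indicative. Both function names are hypothetical.

```python
def chance_of_false_positive(n_outcomes: int, alpha: float = 0.05) -> float:
    """Chance of at least one 'significant' result among n_outcomes tests.

    Assumes independent tests and no true effects anywhere.
    """
    return 1 - (1 - alpha) ** n_outcomes


def bonferroni_threshold(n_outcomes: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / n_outcomes


if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        print(f"{k:2d} outcomes: chance of a false positive = "
              f"{chance_of_false_positive(k):.2f}, "
              f"Bonferroni threshold = {bonferroni_threshold(k):.4f}")
```

With 20 such outcomes, the chance of at least one spurious "finding" is already close to two in three, which is why a declared primary outcome matters so much.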

Famously, Joe Simmons and Leif Nelson hilariously published claims that briefly listening to the Beatles’ “When I’m 64” left students a year and a half younger than if they were assigned to listening to “Kalimba.” Simmons and Nelson knew this was nonsense, but their intent was to show what researchers can do if they have free rein with how they analyze their data and what they report. They revealed the tricks they used, but these were minor league and amateurish compared to what the authors of these trials consistently did in claiming that yoga, tai chi, and mindfulness modified expression of DNA.

Stay tuned for my next blog post where I go through the six studies. But consider this if you or a loved one have to make an immediate decision about whether to plunge into the world of woo woo unproven medicine in hopes of altering DNA expression. I will show the authors of these studies did not get the results they claimed. But who should care if they did? Effects were laughably trivial. As the authors of this review about which I have been complaining noted:

One other problem to consider are the various environmental and lifestyle factors that may change gene expression in similar ways to MBIs [Mind-Body Interventions]. For example, similar differences can be observed when analyzing gene expression from peripheral blood mononuclear cells (PBMCs) after exercise. Although at first there is an increase in the expression of pro-inflammatory genes due to regeneration of muscles after exercise, the long-term effects show a decrease in the expression of pro-inflammatory genes (55). In fact, 44% of interventions in this systematic review included a physical component, thus making it very difficult, if not impossible, to discern between the effects of MBIs from the effects of exercise. Similarly, food can contribute to inflammation. Diets rich in saturated fats are associated with pro-inflammatory gene expression profile, which is commonly observed in obese people (56). On the other hand, consuming some foods might reduce inflammatory gene expression, e.g., drinking 1 l of blueberry and grape juice daily for 4 weeks changes the expression of the genes related to apoptosis, immune response, cell adhesion, and lipid metabolism (57). Similarly, a diet rich in vegetables, fruits, fish, and unsaturated fats is associated with anti-inflammatory gene profile, while the opposite has been found for Western diet consisting of saturated fats, sugars, and refined food products (58). Similar changes have been observed in older adults after just one Mediterranean diet meal (59) or in healthy adults after consuming 250 ml of red wine (60) or 50 ml of olive oil (61). However, in spite of this literature, only two of the studies we reviewed tested if the MBIs had any influence on lifestyle (e.g., sleep, diet, and exercise) that may have explained gene expression changes.

How about taking tango lessons instead? You would at least learn dance steps, get exercise, and decrease any social isolation. And so what if there were more benefits than taking up these other activities?



Before you enroll your child in the MAGENTA chronic fatigue syndrome study: Issues to be considered

[October 3 8:23 AM Update: I have now inserted Article 21 of the Declaration of Helsinki below, which is particularly relevant to discussions of the ethical problems of Dr. Esther Crawley’s previous SMILE trial.]

Petitions are calling for shutting down the MAGENTA trial. Those who organized the effort and signed the petition are commendably brave, given past vilification of any effort by patients and their allies to have a say about such trials.

Below I identify a number of issues that parents should consider in deciding whether to enroll their children in the MAGENTA trial or to withdraw them if they have already been enrolled. I take a strong stand, but I believe I have adequately justified and documented my points. I welcome discussion to the contrary.

This is a long read but to summarize the key points:

  • The MAGENTA trial does not promise any health benefits for the children participating in the trial. The information sheet for the trial was recently modified to suggest they might benefit. However, earlier versions clearly stated that no benefit was anticipated.
  • There is inadequate disclosure of likely harms to children participating in the trial.
  • An estimate of a health benefit can be evaluated from the existing literature concerning the effectiveness of the graded exercise therapy intervention with adults. Obtaining funding for the MAGENTA trial depended on a misrepresentation of the strength of evidence that it works in adult populations.  I am talking about the PACE trial.
  • Beyond any direct benefit to their children, parents might be motivated by the hope of contributing to science and the availability of effective treatments. However, these possible benefits depend on publication of results of a trial after undergoing peer review. The Principal Investigator for the MAGENTA trial, Dr. Esther Crawley, has a history of obtaining parents’ consent for participation of their children in the SMILE trial, but then not publishing the results in a timely fashion. Years later, we are still waiting.
  • Dr. Esther Crawley exposed children to unnecessary risk without likely benefit in her conduct of the SMILE trial. This clinical trial involved inflicting a quack treatment on children. Parents were not adequately informed of the nature of the treatment and the absence of evidence for any mechanism by which the intervention could conceivably be effective. This reflects on the due diligence that Dr. Crawley can be expected to exercise in the MAGENTA trial.
  • The consent form for the MAGENTA trial involves parents granting permission for the investigator to use children and parents’ comments concerning effects of the treatment for its promotion. Insufficient restrictions are placed on how the comments can be used. There is the clear precedent of comments made in the context of the SMILE trial being used to promote the quack Lightning Process treatment in the absence of evidence that treatment was actually effective in the trial. There is no guarantee that any comments collected from children and parents in the MAGENTA trial would not similarly be misused.
  • Dr. Esther Crawley participated in a smear campaign against parents having legitimate concerns about the SMILE trial. Parents making legitimate use of tools provided by the government such as Freedom of Information Act requests, appeals of decisions of ethical review boards and complaints to the General Medical Council were vilified and shamed.
  • Dr. Esther Crawley has provided direct, self-incriminating quotes in the newsletter of the Science Media Centre about how she was coached and directed by their staff to slam the patient community. She played a key role in a concerted and orchestrated attack on the credibility of not only parents of participants in the MAGENTA trial, but of all patients having chronic fatigue syndrome/myalgic encephalomyelitis, as well as their advocates and allies.

I am not a parent of a child eligible for recruitment to the MAGENTA trial. I am not even a citizen or resident of the UK. Nonetheless, I have considered the issues and lay out some of my considerations below. On this basis, I signed the global support version  of the UK petition to suspend all trials of graded exercise therapy in children and adults with ME/CFS. I encourage readers who are similarly in my situation outside the UK to join me in signing the global support petition.

If I were a parent of an eligible child or a resident of the UK, I would not enroll my child in MAGENTA. I would immediately withdraw my child if he or she were currently participating in the trial. I would request all the child’s data be given back or evidence that it had been destroyed.

I recommend my PLOS Mind the Brain post, What patients should require before consenting to participate in research…  as either a prelude or epilogue to the following blog post.

What you will find here is a discussion of matters that parents should consider before enrolling their children in the MAGENTA trial of graded exercise for chronic fatigue syndrome. The previous blog post [http://blogs.plos.org/mindthebrain/2015/12/09/what-patients-should-require-before-consenting-to-participate-in-research/ ]  is rich in links to an ongoing initiative from The BMJ to promote broader involvement of patients (and implicitly, parents of patients) in the design, implementation, and interpretation of clinical trials. The views put forth by The BMJ are quite progressive, even if there is a gap between their expression of views and their actual implementation. Overall, that blog post presents a good set of standards for patients (and parents) making informed decisions concerning enrollment in clinical trials.

Late-breaking update: See also

Simon McGrath: PACE trial shows why medicine needs patients to scrutinise studies about their health

Basic considerations.

Patients are under no obligation to participate in clinical trials. It should be recognized that any participation typically involves burden and possibly risk over what is involved in receiving medical care outside of a clinical trial.

It is a deprivation of their human rights and a violation of the Declaration of Helsinki to coerce patients to participate in medical research without freely given, fully informed consent.

Patients cannot be denied any medical treatment or attention to which they would otherwise be entitled if they fail to enroll in a clinical trial.

Issues are compounded when consent from parents is sought for participation of vulnerable children and adolescents for whom they have legal responsibility. Although assent to participate in clinical trials is sought from children and adolescents, it remains for their parents to consent to their participation.

Parents can at any time withdraw their consent for their children and adolescents participating in trials and have their data removed, without requiring the approval of any authorities of their reason for doing so.

Declaration of Helsinki

The World Medical Association (WMA) has developed the Declaration of Helsinki as a statement of ethical principles for medical research involving human subjects, including research on identifiable human material and data.

It includes:

In medical research involving human subjects capable of giving informed consent, each potential subject must be adequately informed of the aims, methods, sources of funding, any possible conflicts of interest, institutional affiliations of the researcher, the anticipated benefits and potential risks of the study and the discomfort it may entail, post-study provisions and any other relevant aspects of the study. The potential subject must be informed of the right to refuse to participate in the study or to withdraw consent to participate at any time without reprisal. Special attention should be given to the specific information needs of individual potential subjects as well as to the methods used to deliver the information.

[October 3 8:23 AM Update]: I have now inserted Article 21 of the Declaration of Helsinki which really nails the ethical problems of the SMILE trial:

21. Medical research involving human subjects must conform to generally accepted scientific principles, be based on a thorough knowledge of the scientific literature, other relevant sources of information, and adequate laboratory and, as appropriate, animal experimentation. The welfare of animals used for research must be respected.

There is clearly inadequate scientific justification for testing the quack Lightning Process treatment.

What Is the Magenta Trial?

The published MAGENTA study protocol states

This study aims to investigate the acceptability and feasibility of carrying out a multicentre randomised controlled trial investigating the effectiveness of graded exercise therapy compared with activity management for children/teenagers who are mildly or moderately affected with CFS/ME.

Methods and analysis 100 paediatric patients (8–17 years) with CFS/ME will be recruited from 3 specialist UK National Health Service (NHS) CFS/ME services (Bath, Cambridge and Newcastle). Patients will be randomised (1:1) to receive either graded exercise therapy or activity management. Feasibility analysis will include the number of young people eligible, approached and consented to the trial; attrition rate and treatment adherence; questionnaire and accelerometer completion rates. Integrated qualitative methods will ascertain perceptions of feasibility and acceptability of recruitment, randomisation and the interventions. All adverse events will be monitored to assess the safety of the trial.

The first of two treatments being compared is:

Arm 1: activity management

This arm will be delivered by CFS/ME specialists. As activity management is currently being delivered in all three services, clinicians will not require further training; however, they will receive guidance on the mandatory, prohibited and flexible components (see online supplementary appendix 1). Clinicians therefore have flexibility in delivering the intervention within their National Health Service (NHS) setting. Activity management aims to convert a ‘boom–bust’ pattern of activity (lots 1 day and little the next) to a baseline with the same daily amount before increasing the daily amount by 10–20% each week. For children and adolescents with CFS/ME, these are mostly cognitive activities: school, schoolwork, reading, socialising and screen time (phone, laptop, TV, games). Those allocated to this arm will receive advice about the total amount of daily activity, including physical activity, but will not receive specific advice about their use of exercise, increasing exercise or timed physical exercise.

So, the first arm of the trial is a comparison condition consisting of standard care delivered without further training of providers. The treatment is flexibly delivered, expected to vary between settings, and thus largely uncontrolled. The treatment represents a methodologically weak condition that does not adequately control for attention and positive expectations. Control conditions should be equivalent to the intervention being evaluated in these dimensions.

The second arm of the study:

Arm 2: graded exercise therapy (GET)

This arm will be delivered by referral to a GET-trained CFS/ME specialist who will receive guidance on the mandatory, prohibited and flexible components (see online supplementary appendix 1). They will be encouraged to deliver GET as they would in their NHS setting.[20] Those allocated to this arm will be offered advice that is focused on exercise with detailed assessment of current physical activity, advice about exercise and a programme including timed daily exercise. The intervention will encourage children and adolescents to find a baseline level of exercise which will be increased slowly (by 10–20% a week, as per NICE guidance[5] and the Pacing, graded Activity and Cognitive behaviour therapy – a randomised Evaluation (PACE)[12,21]). This will be the median amount of daily exercise done during the week. Children and adolescents will also be taught to use a heart rate monitor to avoid overexertion. Participants will be advised to stay within the target heart rate zones of 50–70% of their maximum heart rate.[5,7]
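For parents trying to picture what these numbers mean in practice, here is a small sketch of the arithmetic. It rests on assumptions of mine: that the 10–20% weekly increase compounds on the previous week's amount (the protocol excerpt does not say), and that the child's maximum heart rate is already known. The function names and the sample figures are illustrative only.

```python
def weekly_targets(baseline_minutes: float, weekly_increase: float, weeks: int) -> list[float]:
    """Daily exercise target for each week, starting from the baseline.

    Assumption: each week's increase is applied to the previous week's
    target (compounding). The protocol text quoted above does not specify this.
    """
    if not 0.10 <= weekly_increase <= 0.20:
        raise ValueError("the protocol describes increases of 10-20% a week")
    return [baseline_minutes * (1 + weekly_increase) ** week for week in range(weeks + 1)]


def target_heart_rate_zone(max_heart_rate: float) -> tuple[float, float]:
    """Lower and upper bounds of the 50-70% zone named in the protocol."""
    return (0.50 * max_heart_rate, 0.70 * max_heart_rate)


if __name__ == "__main__":
    # Hypothetical child: 10 minutes a day at baseline, 10% weekly increase.
    for week, minutes in enumerate(weekly_targets(10, 0.10, 4)):
        print(f"week {week}: {minutes:.1f} minutes/day")
    print(target_heart_rate_zone(200))  # hypothetical maximum of 200 bpm
```

If the increases do compound, even the lower 10% figure adds up quickly over the months of a trial, which is worth keeping in mind when reading what follows about reported harms.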

The outcome of the trial will be evaluated in terms of

Quantitative analysis

The percentage recruited of those eligible will be calculated …Retention will be estimated as the percentage of recruited children and adolescents reaching the primary 6-month follow-up point, who provide key outcome measures (the Chalder Fatigue Scale and the 36-Item Short-Form Physical Functioning Scale (SF-36 PFS)) at that assessment point.

Objective data will be collected in the form of physical activity measured by accelerometers. These are

Small, matchbox-sized devices that measure physical activity. They have been shown to provide reliable indicators of physical activity among children and adults.

However, actual evaluation of the outcome of the trial will focus on recruitment and retention and subjective, self-report measures of fatigue and physical functioning. These subjective measures have been shown to be less valid than objective measures. Scores are vulnerable to participants knowing what condition they are assigned to (called ‘being unblinded’) and their perception of which intervention the investigators prefer.

It is notable that in the PACE trial of CBT and GET for chronic fatigue syndrome in adults, the investigators manipulated participants’ self-reports with praise in newsletters sent out during the trial. The investigators also switched their scoring of the self-report measures and produced results that they later conceded to have been exaggerated by their change in scoring of the self-report measures [http://www.wolfson.qmul.ac.uk/current-projects/pace-trial#news ].

Tom Kindlon, Irish ME/CFS Association Officer

See an excellent commentary by Tom Kindlon at PubMed Commons [What’s that? ]

The validity of using subjective outcome measures as primary outcomes is questionable in such a trial

The bottom line is that the investigators have a poorly designed study with an inadequate control condition. They have chosen subjective self-reports that are prone to invalidity and manipulation over objective measures like actual changes in activity or practical real-world measures like school attendance. Not very good science here. But they are asking parents to sign their children up.

What is promised to parents consenting to have their children enrolled in the trial?

The published protocol to which the investigators supposedly committed themselves stated

What are the possible benefits and risks of participating?
Participants will not benefit directly from taking part in the study although it may prove enjoyable contributing to the research. There are no risks of participating in the study.

Version 7 of the information sheet provided to parents states

Your child may benefit from the treatment they receive, but we cannot guarantee this. Some children with CFS/ME like to know that they are helping other children in the future. Your child may also learn about research.

Survey assessments conducted by the patient community strongly contradict the suggestion that there is no risk of harm with GET.

Alem Matthees, the patient activist who obtained release of the PACE data and participated in reanalysis, has commented:

“Given that post-exertional symptomatology is a hallmark of ME/CFS, it is premature to do trials of graded exercise on children when safety has not first been properly established in adults. The assertion that graded exercise is safe in adults is generally based on trials where harms are poorly reported or where the evidence of objectively measured increases in total activity levels is lacking. Adult patients commonly report that their health was substantially worsened after trying to increase their activity levels, sometimes severely and permanently, therefore this serious issue cannot be ignored when recruiting children for research.”

See also

Kindlon T. Reporting of harms associated with graded exercise therapy and cognitive behavioural therapy in myalgic encephalomyelitis/chronic fatigue syndrome. Bulletin of the IACFS/ME. 2011;19(2):59-111.

This thorough systematic review reports inadequacy in harm reporting in clinical trials, but:

Exercise-related physiological abnormalities have been documented in recent studies and high rates of adverse reactions to exercise have been recorded in a number of patient surveys. Fifty-one percent of survey respondents (range 28-82%, n=4338, 8 surveys) reported that GET worsened their health while 20% of respondents (range 7-38%, n=1808, 5 surveys) reported similar results for CBT.

The unpublished results of Dr. Esther Crawley’s SMILE trial

A Bristol University website indicates that recruitment of the SMILE trial was completed in 2013. The published protocol for the SMILE trial is cited below, followed by its provisions for monitoring adverse events.

[Note the ® in the title below, indicating a test of trademarked commercial product. The significance of that is worthy of a whole other blog post. ]

Crawley E, Mills N, Hollingworth W, Deans Z, Sterne JA, Donovan JL, Beasant L, Montgomery A. Comparing specialist medical care with specialist medical care plus the Lightning Process® for chronic fatigue syndrome or myalgic encephalomyelitis (CFS/ME): study protocol for a randomised controlled trial (SMILE Trial). Trials. 2013 Dec 26;14(1):1.


The data monitoring group will receive notice of serious adverse events (SAEs) for the sample as whole. If the incidence of SAEs of a similar type is greater than would be expected in this population, it will be possible for the data monitoring group to receive data according to trial arm to determine any evidence of excess in either arm.

Primary outcome data at six months will be examined once data are available from 50 patients, to ensure that neither arm is having a detrimental effect on the majority of patients. An independent statistician with no other involvement in the study will investigate whether more than 20 participants in the study sample as a whole have experienced a reduction of ≥ 30 points on the SF-36 at six months. In this case, the data will then be summarised separately by trial arm, and sent to the data monitoring group for review. This process will ensure that the trial team will not have access to the outcome data separated by treatment arm.
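To make the monitoring rule quoted above concrete, here is a minimal sketch of how I read it. The data layout (one SF-36 change score per participant, six-month score minus baseline) and the function name are my own assumptions, not anything taken from the trial's analysis plan.

```python
def review_triggered(sf36_changes: list[float],
                     min_patients: int = 50,
                     drop_threshold: float = 30,
                     max_deteriorated: int = 20) -> bool:
    """Return True if by-arm data would be sent to the data monitoring group.

    Assumptions: each entry is (six-month SF-36 score - baseline score),
    so a drop of 30 points or more appears as a value <= -30, and
    "more than 20 participants" is read as a strict inequality.
    """
    if len(sf36_changes) < min_patients:
        return False  # the check only happens once 50 patients have data
    deteriorated = sum(1 for change in sf36_changes if change <= -drop_threshold)
    return deteriorated > max_deteriorated


if __name__ == "__main__":
    # Hypothetical sample: 20 of 50 children drop by 30 points.
    sample = [-30] * 20 + [0] * 30
    print(review_triggered(sample))
```

On this reading, the review applied at the 50-patient check would not begin until at least 21 of those 50 children, or 42%, had dropped by 30 points or more.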

A Bristol University website indicates that recruitment of the SMILE trial was completed in 2013. The trial was thus completed a number of years ago, but these valuable data have never been published.

The only publication from the trial so far uses selective quotes from child participants that cannot be independently evaluated. Readers are not told how representative these quotes are, what the outcomes were for the children being quoted, or what the overall outcomes of the trial were.

Parslow R, Patel A, Beasant L, Haywood K, Johnson D, Crawley E. What matters to children with CFS/ME? A conceptual model as the first stage in developing a PROM. Archives of Disease in Childhood. 2015 Dec 1;100(12):1141-7.

The “evaluation” of the quack Lightning Process treatment in the SMILE trial and quotes from patients have also been used to promote Phil Parker’s products as being used in NHS clinics.

How can I say the Lightning Process is quackery?

Dr. Crawley describes the Lightning Process in the Research Ethics Application Form for the SMILE study as combining the principles of neurolinguistic programming, osteopathy, and clinical hypnotherapy.

That is an amazing array of three different frameworks from different disciplines. You would be hard pressed to find an example other than the Lightning Process that claimed to integrate them. Yet, any mechanisms for explaining therapeutic interventions cannot be a creative stir fry of whatever is on hand being thrown together. For a treatment to be considered science-based, there has to be a solid basis of evidence that these presumably complex processes fit together as assumed and work as assumed. I challenge Dr. Crawley or anyone else to produce a shred of credible, peer-reviewed evidence for the basic mechanism of the Lightning Process.

The entry for Neuro-linguistic programming (NLP) in Wikipedia states

There is no scientific evidence supporting the claims made by NLP advocates and it has been discredited as a pseudoscience by experts.[1][12] Scientific reviews state that NLP is based on outdated metaphors of how the brain works that are inconsistent with current neurological theory and contain numerous factual errors.[13][14]

The respected Skeptic’s Dictionary offers a scathing critique of Phil Parker’s Lightning Process. The critique specifically cites concerns that Crawley’s SMILE trial switched outcomes to increase the likelihood of obtaining evidence of effectiveness.

 The Hampshire (UK) County Council Trading Standards Office filed a formal complaint against Phil Parker for claims made on the Lightning Process website concerning effects on CFS/ME:

The “CFS/ME” page of the website included the statements “Our survey found that 81.3 %* of clients report that they no longer have the issues they came with by day three of the LP course” and “The Lightning Process is working with the NHS on a feasibility study, please click here for further details, and for other research information click here”.

Seeming endorsements on Parker’s website. Two of the organizations shown, Northern Ireland and NHS Suffolk, subsequently complained that use of their insignias was unauthorized, and the insignias were quickly removed.

The “working with the NHS” refers to the collaboration with Dr. Esther Crawley.

The UK Advertising Standards Authority upheld this complaint, as well as complaints about Parker’s claims of effectiveness with other conditions, including multiple sclerosis, irritable bowel syndrome and fibromyalgia.

 Another complaint in 2013 about claims on Phil Parker’s website was similarly upheld:

 The claims must not appear again in their current form. We welcomed the decision to remove the claims. We told Phil Parker Group not to make claims on websites within their control that were directly connected with the supply of their goods and services if those claims could not be supported with robust evidence. We also told them not to refer to conditions for which advice should be sought from suitably qualified health professionals.

 As we will see, these upheld charges of quackery occurred when parents of children participating in the SMILE trial were being vilified in the BMJ and elsewhere. Dr. Crawley was prominently featured in this vilification and was quoted in a celebration of its success by the Science Media Centre, which had orchestrated the vilification.


The Research Ethics Committee approval of the SMILE trial and the aftermath

 I was not very aware of the CFS/ME literature, and certainly not all its controversies when the South West Research Ethics Committee (REC) reviewed the application for the SMILE trial and ultimately approved it on September 8, 2010.

I would have had strong opinions about it. I only first started blogging a little afterwards. But I was very concerned about any patients being exposed to alternative and unproven medical treatments in other contexts that were not evidence-based, and even more so to treatments for which promoters claimed implausible mechanisms by which they worked. I would not have felt it appropriate to inflict the Lightning Process on unsuspecting children. It is insufficient justification to put them in a clinical trial simply because a particular treatment has not been evaluated.

Prince Charles once advocated organic coffee enemas to treat advanced cancer. His endorsement generated a lot of curiosity from cancer patients. But that would not justify a randomized trial of coffee enemas. By analogy, I don’t think Dr. Esther Crawley had sufficient justification to conduct her trial, especially without warnings that there was no scientific basis to expect the Lightning Process to work or that it would not hurt the children.

I am concerned about clinical trials that have little likelihood of producing evidence that a treatment is effective, but that seem designed to get these treatments into routine clinical care. It is now appreciated that some clinical trials have little scientific value but serve as experimercials or means of placing products in clinical settings. Pharmaceutical companies notoriously do this.

As it turned out, the SMILE trial succeeded admirably as a promotion for the Lightning Process, earning Phil Parker unknown but substantial fees through its use in the SMILE trial, but also in successful marketing throughout the NHS afterwards.

In short, I would have been concerned about the judgment of Dr. Esther Crawley in organizing the SMILE trial. I would have been quite curious about conflicts of interest and whether patients were adequately informed of how Phil Parker was benefiting.

The ethics review of the SMILE trial gave short shrift to these important concerns.

When the patient community and its advocate, Dr. Charles Shepherd, became aware of the SMILE trial’s approval, there were protests leading to re-evaluations all the way up to the National Patient Safety Agency. Examining an Extract of Minutes from South West 2 REC meeting held on 2 December 2010, I see many objections to the approval being raised and I am unsatisfied by the way in which they were discounted.

Patient, parent, and advocate protests escalated. If some acted inappropriately, this did not undermine the righteousness of others’ legitimate protest. By analogy, I feel strongly about police violence aimed against African-Americans and racist policies that disproportionately target African-Americans for police scrutiny and stopping. I’m upset when agitators and provocateurs become violent at protests, but that does not delegitimize my concerns about the way black people are treated in America.

Dr. Esther Crawley undoubtedly experienced considerable stress and unfair treatment, but I don’t understand why she was not responsive to patient concerns nor why she failed to honor her responsibility to protect child patients from exposure to unproven and likely harmful treatments.

Dr. Crawley is extensively quoted in a British Medical Journal opinion piece authored by a freelance journalist, Nigel Hawkes:

Hawkes N. Dangers of research into chronic fatigue syndrome. BMJ. 2011 Jun 22;342:d3780.

If I had been on the scene, Dr. Crawley might well have been describing me in terms of how I would react, including my exercising of appropriate, legally-provided means of protest and complaint:

Critics of the method opposed the trial, first, Dr Crawley says, by claiming it was a terrible treatment and then by calling for two ethical reviews. Dr Shepherd backed the ethical challenge, which included the claim that it was unethical to carry out the trial in children, made by the ME Association and the Young ME Sufferers Trust. After re-opening its ethical review and reconsidering the evidence in the light of the challenge, the regional ethical committee of the NHS reiterated its support for the trial.

There was arguably some smearing of Dr. Shepherd, even in some distancing of him from the action of others:

This point of view, if not the actions it inspires, is defended by Charles Shepherd, medical adviser to and trustee of the ME Association. “The anger and frustration patients have that funding has been almost totally focused on the psychiatric side is very justifiable,” he says. “But the way a very tiny element goes about protesting about it is not acceptable.”

This article escalated with unfair comparisons to animal rights activists, with condemnation of appropriate use of channels of complaint – reporting physicians to the General Medical Council.

The personalised nature of the campaign has much in common with that of animal rights activists, who subjected many scientists to abuse and intimidation in the 1990s. The attitude at the time was that the less said about the threats the better. Giving them publicity would only encourage more. Scientists for the most part kept silent and journalists desisted from writing about the subject, partly because they feared anything they wrote would make the situation worse. Some journalists have also been discouraged from writing about CFS/ME, such is the unpleasant atmosphere it engenders.

While the campaigners have stopped short of the violent activities of the animal rights groups, they have another weapon in their armoury—reporting doctors to the GMC. Willie Hamilton, an academic general practitioner and professor of primary care diagnostics at Peninsula Medical School in Exeter, served on the panel assembled by the National Institute for Health and Clinical Excellence (NICE) to formulate treatment advice for CFS/ME.

Simon Wessely and the Principal Investigator of the PACE trial, Peter White, were given free rein to dramatize their predicament posed by the protest. Much later, in the 2016 Lower Tribunal Hearing, testimony given by PACE Co-Investigator Trudie Chalder would cast doubt on whether the harassment was as severe or violent as it was portrayed. Before that, the financial conflicts of interest of Peter White that were denied in the article would be exposed.

In response to her testimony, the UK Information Officer stated:

Professor Chalder’s evidence when she accepts that unpleasant things have been said to and about PACE researchers only, but that no threats have been made either to researchers or participants.

But in 2012, a pamphlet celebrating the success of The Science Media Centre started by Wessely would be rich in indiscreet quotes from Esther Crawley. The article in BMJ was revealed to be part of a much larger orchestrated campaign to smear, discredit and silence patients, parents, advocates and their allies.

Dr. Esther Crawley’s participation in a campaign organized by the Science Media Center to discredit patients, parents, advocates and supporters.

The SMC would later organize a letter writing campaign to Parliament in support of Peter White and his refusal to release the PACE data to Alem Matthees, who had made a request under the Freedom of Information Act. The letter writing campaign was an effort to get scientific data excluded from the provisions of the Freedom of Information Act. The effort failed and the data were subsequently released.

But here is how Esther Crawley described her assistance:

The SMC organised a meeting so we could discuss what to do to protect researchers. Those who had been subject to abuse met with press officers, representatives from the GMC and, importantly, police who had dealt with the  animal rights campaign. This transformed my view of  what had been going on. I had thought those attacking us were “activists”; the police explained they were “extremists”.


We were told that we needed to make better use of the law and consider using the press in our favour – as had researchers harried by animal rights extremists. “Let the public know what you are trying to do and what is happening to you,” we were told. “Let the public decide.”


I took part in quite a few interviews that day, and have done since. I was also inundated with letters, emails and phone calls from patients with CFS/ME all over the world asking me to continue and not “give up”. The malicious, they pointed out, are in a minority. The abuse has stopped completely. I never read the activists’ blogs, but friends who did told me that they claimed to be “confused” and “upset” – possibly because their role had been switched from victim to abuser. “We never thought we were doing any harm…”

 The patient community and its allies are still burdened by the damage of this effort and are rebuilding its credibility only slowly. Only now are they beginning to get an audience as suffering human beings with significant, legitimate unmet needs. Only now are they escaping the stigmatization that occurred at this time with Esther Crawley playing a key role.

Where does this leave us?

Parents are being asked to enroll in a clinical trial without clear benefit to the children but with the possibility of considerable risk from the graded exercise. They are being asked by Esther Crawley, a physician, who has previously inflicted a quack treatment on their children with CFS/ME in the guise of a clinical trial, for which she has never published the resulting data. She has played an effective role in damaging the legitimacy and capacity of patients and parents to complain.

Given this history and these factors, why would a parent possibly want to enroll their children in the MAGENTA trial? Somebody please tell me.

Special thanks to all the patient citizen-scientists who contributed to this blog post. Any inaccuracies or excesses are entirely my own, but these persons gave me substantial help. Some are named in the blog, but others prefer anonymity.

 All opinions expressed are solely those of James C Coyne. The blog post in no way conveys any official position of Mind the Brain, PLOS blogs or the larger PLOS community. I appreciate the free expression of  personal opinion that I am allowed.







Relaxing vs Stimulating Acupressure for Fatigue Among Breast Cancer Patients: Lessons to be Learned

  • A chance to test your rules of thumb for quickly evaluating clinical trials of alternative or integrative  medicine in prestigious journals.
  • A chance to increase your understanding of the importance of  well-defined control groups and blinding in evaluating the risk of bias of clinical trials.
  • A chance to understand the difference between merely evidence-based treatments versus science-based treatments.
  • Lessons learned can be readily applied to many wasteful evaluations of psychotherapy with shared characteristics.

A press release from the University of Michigan about a study of acupressure for fatigue in cancer patients was churnaled  – echoed – throughout the media. It was reproduced dozens of times, with little more than an editor’s title change from one report to the next.

Fortunately, the article that inspired all the fuss was freely available from the prestigious JAMA Oncology. But when I gained access, I quickly saw that it was not worth my attention, based on what I already knew or, as I often say, my prior probabilities. Rules of thumb is a good enough term.

So the article became another occasion for us to practice our critical appraisal skills, including, importantly, being able to make reliable and valid judgments that some attention in the media is worth dismissing out of hand, even when tied to an article in a prestigious medical journal.

The press release is here: Acupressure reduced fatigue in breast cancer survivors: Relaxing acupressure improved sleep, quality of life.

A sampling of the coverage:

sample coverage

As we’ve come to expect, the UK Daily Mail editor added its own bit of spin:

daily mail

Here is the article:

Zick SM, Sen A, Wyatt GK, Murphy SL, Arnedt J, Harris RE. Investigation of 2 Types of Self-administered Acupressure for Persistent Cancer-Related Fatigue in Breast Cancer Survivors: A Randomized Clinical Trial. JAMA Oncol. Published online July 07, 2016. doi:10.1001/jamaoncol.2016.1867.

Here is the Trial registration:

All I needed to know was contained in a succinct summary at the Journal website:

key points

This is a randomized clinical trial (RCT) in which two active treatments that

  • Lacked credible scientific mechanisms
  • Were predictably shown to be better than
  • A routine care that lacked the positive expectations and support.
  • A primary outcome assessed by subjective self-report amplified the illusory effectiveness of the treatments.

But wait!

The original research appeared in a prestigious peer-reviewed journal published by the American Medical Association, not a  disreputable journal on Beall’s List of Predatory Publishers.

Maybe  this means publication in a peer-reviewed prestigious journal is insufficient to erase our doubts about the validity of claims.

The original research was performed with a $2.65 million peer-reviewed grant from the National Cancer Institute.

Maybe NIH is wasting scarce money on useless research.

What is acupressure?

 According to the article

Acupressure, a method derived from traditional Chinese medicine (TCM), is a treatment in which pressure is applied with fingers, thumbs, or a device to acupoints on the body. Acupressure has shown promise for treating fatigue in patients with cancer,23 and in a study24 of 43 cancer survivors with persistent fatigue, our group found that acupressure decreased fatigue by approximately 45% to 70%. Furthermore, acupressure points termed relaxing (for their use in TCM to treat insomnia) were significantly better at improving fatigue than another distinct set of acupressure points termed stimulating (used in TCM to increase energy).24 Despite such promise, only 5 small studies24– 28 have examined the effect of acupressure for cancer fatigue.

You can learn more about acupressure here. It is a derivative of acupuncture that does not involve needles but uses the same pressure points, or acupoints, as acupuncture.

Don’t be fooled by references to traditional Chinese medicine (TCM) as a basis for claiming a scientific mechanism.

See Chairman Mao Invented Traditional Chinese Medicine.

Chairman Mao is quoted as saying “Even though I believe we should promote Chinese medicine, I personally do not believe in it. I don’t take Chinese medicine.”


Alan Levinovitz, author of the Slate article further argues:


In truth, skepticism, empiricism, and logic are not uniquely Western, and we should feel free to apply them to Chinese medicine.

After all, that’s what Wang Qingren did during the Qing Dynasty when he wrote Correcting the Errors of Medical Literature. Wang’s work on the book began in 1797, when an epidemic broke out in his town and killed hundreds of children. The children were buried in shallow graves in a public cemetery, allowing stray dogs to dig them up and devour them, a custom thought to protect the next child in the family from premature death. On daily walks past the graveyard, Wang systematically studied the anatomy of the children’s corpses, discovering significant differences between what he saw and the content of Chinese classics.

And nearly 2,000 years ago, the philosopher Wang Chong mounted a devastating (and hilarious) critique of yin-yang five phases theory: “The horse is connected with wu (fire), the rat with zi (water). If water really conquers fire, [it would be much more convincing if] rats normally attacked horses and drove them away. Then the cock is connected with you (metal) and the hare with mao (wood). If metal really conquers wood, why do cocks not devour hares?” (The translation of Wang Chong and the account of Wang Qingren come from Paul Unschuld’s Medicine in China: A History of Ideas.)

Trial design

A 10-week randomized, single-blind trial comparing self-administered relaxing acupressure with stimulating acupressure once daily for 6 weeks vs usual care with a 4-week follow-up was conducted. There were 5 research visits: at screening, baseline, 3 weeks, 6 weeks (end of treatment), and 10 weeks (end of washout phase). The Pittsburgh Sleep Quality Index (PSQI) and Long-Term Quality of Life Instrument (LTQL) were administered at baseline and weeks 6 and 10. The Brief Fatigue Inventory (BFI) score was collected at baseline and weeks 1 through 10.

Note that the trial was “single-blind.” It compared two forms of acupressure, relaxing versus stimulating. Patients were blinded only to which of these two treatments they were receiving; they clearly knew whether or not they had been randomized to usual care. The providers were not blinded. They were carefully supervised by the investigators and given feedback on their performance.

The combination of unblinded providers, patients knowing whether they were randomized to routine care, and subjective self-report outcomes makes for a highly biased trial.


Usual care was defined as any treatment women were receiving from health care professionals for fatigue. At baseline, women were taught to self-administer acupressure by a trained acupressure educator.29 The 13 acupressure educators were taught by one of the study’s principal investigators (R.E.H.), an acupuncturist with National Certification Commission for Acupuncture and Oriental Medicine training. This training included a 30-minute session in which educators were taught point location, stimulation techniques, and pressure intensity.

Relaxing acupressure points consisted of yin tang, anmian, heart 7, spleen 6, and liver 3. Four acupoints were performed bilaterally, with yin tang done centrally. Stimulating acupressure points consisted of du 20, conception vessel 6, large intestine 4, stomach 36, spleen 6, and kidney 3. Points were administered bilaterally except for du 20 and conception vessel 6, which were done centrally (eFigure in Supplement 2). Women were told to perform acupressure once per day and to stimulate each point in a circular motion for 3 minutes.

Note that the control/comparison condition was an ill-defined usual care in which it is not clear that patients received any attention and support for their fatigue. As I have discussed before, we need to ask just what was being controlled by this condition. There is no evidence presented that patients had similar positive expectations and felt similar support in this condition to what was provided in the two active treatment conditions. There is no evidence of equivalence of time with a provider devoted exclusively to the patients’ fatigue. Unlike patients assigned to usual care, patients assigned to one of the acupressure conditions received a ritual delivered with enthusiasm by a supervised educator.

Note the absurdity of the  naming of the acupressure points,  for which the authority of traditional Chinese medicine is invoked, not evidence. This absurdity is reinforced by a look at a diagram of acupressure points provided as a supplement to the article.

relaxation acupressure points

stimulation acupressure points


Among the many problems with “acupuncture pressure points” is that sham stimulation generally works as well as actual stimulation, especially when the sham is delivered with appropriate blinding of both providers and patients. Another is that targeting places of the body that are not defined as acupuncture pressure points can produce the same results. For more elaborate discussion see Can we finally just say that acupuncture is nothing more than an elaborate placebo?

 Worth looking back at credible placebo versus weak control condition

In a recent blog post I discussed an unusual study in the New England Journal of Medicine that compared an established active treatment for asthma to two credible control conditions: one an inert spray that was indistinguishable from the active treatment, and the other acupuncture. Additionally, the study involved a no-treatment control. For subjective self-report outcomes, the active treatment, the inert spray and acupuncture were indistinguishable, but all were superior to the no-treatment control condition. However, for the objective outcome measure, the active treatment was more effective than all three comparison conditions. The message is that credible placebo control conditions are superior to control conditions lacking positive expectations, including no treatment and, I would argue, ill-defined usual care that lacks positive expectations. A further message is ‘beware of relying on subjective self-report measures to distinguish between active treatments and placebo control conditions’.


At week 6, the change in BFI score from baseline was significantly greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.6 [1.5] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.1 [1.6] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P  = .29). At week 10, the change in BFI score from baseline was greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.3 [1.4] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.0 [1.5] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P > .99) (Figure 2). The mean percentage fatigue reductions at 6 weeks were 34%, 27%, and −1% in relaxing acupressure, stimulating acupressure, and usual care, respectively.
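To put the quoted week-6 BFI changes on a common scale, here is a rough sketch that converts them into standardized mean differences. The per-arm sample sizes are not given in the passage quoted above, so the sketch assumes equal arm sizes when pooling the standard deviations. The function name is my own, and the numbers are illustrative only, not a reanalysis of the trial.

```python
import math

def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d from summary statistics, ASSUMING equal group sizes
    (the quoted passage does not report the per-arm n)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Week-6 change in BFI score from baseline as (mean, SD), from the quoted results.
relaxing = (-2.6, 1.5)
stimulating = (-2.0, 1.5)
usual_care = (-1.1, 1.6)

d_relax_vs_usual = standardized_mean_difference(*relaxing, *usual_care)
d_stim_vs_usual = standardized_mean_difference(*stimulating, *usual_care)
d_relax_vs_stim = standardized_mean_difference(*relaxing, *stimulating)

print(f"relaxing vs usual care:    d = {d_relax_vs_usual:.2f}")
print(f"stimulating vs usual care: d = {d_stim_vs_usual:.2f}")
print(f"relaxing vs stimulating:   d = {d_relax_vs_stim:.2f}")
```

Under that equal-size assumption the three values come out at roughly -0.97, -0.58 and -0.40, where a negative value means a larger drop in self-reported fatigue in the first-named arm. Both acupressure arms separate from an unblinded usual care arm on a subjective measure, which is what nonspecific factors alone would be expected to produce.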

These are entirely expectable results. Nothing new was learned in this study.

The bottom line for this study is that there was absolutely nothing to be gained by comparing one inert placebo condition to another inert placebo condition and to an uninformative control condition, without clear evidence that the control condition offered control of nonspecific factors – positive expectations, support, and attention. This was a waste of patient time and effort, as well as government funds, and produced results that were potentially misleading to patients. Namely, results are likely to be misinterpreted as showing that acupressure is an effective, evidence-based treatment for cancer-related fatigue.

How the authors explained their results

Why might both acupressure arms significantly improve fatigue? In our group’s previous work, we had seen that cancer fatigue may arise through multiple distinct mechanisms.15 Similarly, it is also known in the acupuncture literature that true and sham acupuncture can improve symptoms equally, but they appear to work via different mechanisms.40 Therefore, relaxing acupressure and stimulating acupressure could elicit improvements in symptoms through distinct mechanisms, including both specific and nonspecific effects. These results are also consistent with TCM theory for these 2 acupoint formulas, whereby the relaxing acupressure acupoints were selected to treat insomnia by providing more restorative sleep and improving fatigue and the stimulating acupressure acupoints were chosen to improve daytime activity levels by targeting alertness.

How could acupressure lead to improvements in fatigue? The etiology of persistent fatigue in cancer survivors is related to elevations in brain glutamate levels, as well as total creatine levels in the insula.15 Studies in acupuncture research have demonstrated that brain physiology,41 chemistry,42 and function43 can also be altered with acupoint stimulation. We posit that self-administered acupressure may have similar effects.

Among the fallacies of the authors’ explanation is the key assumption that they are dealing with a specific, active treatment effect rather than a nonspecific placebo intervention. Supposed differences between relaxing and stimulating acupressure arise in trials with a high risk of bias due to unblinded providers of treatment and inadequate control/comparison conditions. ‘There is no there there’ to be explained, to paraphrase a quote attributed to Gertrude Stein.

How much did this project cost?

According to the NIH Research Portfolios Online Reporting Tools website, this five-year project involved support by the federal government of $2,265,212 in direct and indirect costs. The NCI program officer for investigator-initiated R01CA151445 is Ann O’Mara, who serves in a similar role for a number of integrative medicine projects.

How can expenditure of this money be justified for determining whether so-called stimulating acupressure is better than relaxing acupressure for cancer-related fatigue?

 Consider what could otherwise have been done with these monies.

 Evidence-based versus science based medicine

Proponents of unproven “integrative cancer treatments” can claim on the basis of the study that acupressure is an evidence-based treatment. Future Cochrane Collaboration Reviews may even cite this study as evidence for this conclusion.

I normally label myself as an evidence-based skeptic. I require evidence for claims of the efficacy of treatments and am skeptical of the quality of the evidence that is typically provided, especially when it comes from enthusiasts of particular treatments. However, in other contexts, I describe myself as a science-based medicine skeptic. The stricter criterion for this term is that not only do I require evidence of efficacy for treatments, I also require evidence for the plausibility of the science-based claims of mechanism. Acupressure might be defined by some as an evidence-based treatment, but it is certainly not a science-based treatment.

For further discussion of this important distinction, see Why “Science”-Based Instead of “Evidence”-Based?

Broader relevance to psychotherapy research

The efficacy of psychotherapy is often overestimated because of overreliance on RCTs that involve inadequate comparison/control groups. Adequately powered studies of the comparative efficacy of psychotherapy that include active comparison/control groups are infrequent and uniformly provide lower estimates of just how efficacious psychotherapy is. Most psychotherapy research includes subjective patient self-report measures as the primary outcomes, although some RCTs provide independent, blinded interview measures. A dependence on subjective patient self-report measures amplifies the bias associated with inadequate comparison/control groups.

I have raised these issues with respect to mindfulness-based stress reduction (MBSR) for physical health problems and for prevention of relapse and recurrence in patients being tapered from antidepressants.

However, there is a broader relevance to trials of psychotherapy provided to medically ill patients with a comparison/control condition that is inadequate in terms of positive expectations and support, along with a reliance on subjective patient self-report outcomes. The relevance is particularly important to note for conditions in which objective measures are appropriate, but not obtained, or obtained but suppressed in reports of the trial in the literature.

Study protocol violations, outcomes switching, adverse events misreporting: A peek under the hood

An extraordinary, must-read article is now available open access:

Jureidini, JN, Amsterdam, JD, McHenry, LB. The citalopram CIT-MD-18 pediatric depression trial: Deconstruction of medical ghostwriting, data mischaracterisation and academic malfeasance. International Journal of Risk & Safety in Medicine, vol. 28, no. 1, pp. 33-43, 2016

The authors had access to internal documents written with the belief that they would be left buried in corporate files. However, these documents became publicly available in a class-action product liability suit concerning the marketing of the antidepressant citalopram for treating children and adolescents.

Detailed evidence of ghost writing by industry sponsors has considerable shock value. But there is a broader usefulness to this article allowing us to peek in on the usually hidden processes by which null findings and substantial adverse events are spun into a positive report of the efficacy and safety of a treatment.

another peeking under the hood

We are able to see behind the scenes how an already underspecified protocol was violated, primary and secondary outcomes were switched or dropped, and adverse events were suppressed in order to obtain the kind of results needed for a planned promotional effort and the FDA approval for use of the drug in these populations.

We can see how subtle changes in analyses that would otherwise go unnoticed can have a profound impact on clinical and public policy.

In so many other situations, we are left only with our skepticism about results being too good to be true. We are usually unable to evaluate investigators’ claims independently because protocols are unavailable, deviations are not noted, and analyses are conducted and reported without transparency. Importantly, there usually is no access to data that would be necessary for reanalysis.

The authors whose work is being criticized are among the most prestigious child psychiatrists in the world. The first author is currently President-elect of the American Academy of Child and Adolescent Psychiatry. The journal is among the top psychiatry journals in the world. A subscription is provided as part of membership in the American Psychiatric Association. Appearing in this journal is thus strategic because its readership includes many practitioners and clinicians who will simply defer to academics publishing in a journal they respect, without inclination to look carefully.

Indeed, I encourage readers to go to the original article and read it before proceeding further in the blog. Witness the unmasking of how null findings were turned positive. Unless you had been alerted, would you have detected that something was amiss?

Some readers have participated in multisite trials other than as a lead investigator. I ask them to imagine that they had received the manuscript for review and approval and assumed it was vetted by the senior investigators – and only the senior investigators. Would they have subjected it to the scrutiny needed to detect data manipulation?

I similarly ask reviewers for scientific journals if they would have detected something amiss. Would they have compared the manuscript to the study protocol? Note that when this article was published, they probably would’ve had to contact the authors or the pharmaceutical company.

Welcome to a rich treasure trove

Separate from the civil action that led to these documents and data being released, the federal government later filed criminal charges and False Claims Act allegations against Forest Laboratories. The pharmaceutical company pleaded guilty and accepted a $313 million fine.

Links to the filing and the announcement from the federal government of a settlement are available in a supplementary blog at Quick Thoughts. That blog post also has rich links to the actual emails accessed by the authors, as well as blog posts by John M Nardo, M.D. that detail the difficulties these authors had publishing the paper we are discussing.

Aside from his popular blog, Dr. Nardo is one of the authors of a reanalysis that was published in The BMJ of a related trial:

Le Noury J, Nardo JM, Healy D, Jureidini J, Raven M, Tufanaru C, Abi-Jaoude E. Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence. BMJ 2015; 351: h4320

My supplementary blog post contains links to discussions of that reanalysis of data obtained from GlaxoSmithKline, the original publication based on these data, 30 Rapid Responses to the reanalysis in The BMJ, as well as federal criminal complaints and the guilty plea of GlaxoSmithKline.

With Dr. Nardo’s assistance, I’ve assembled a full set of materials that should be valuable in stimulating discussion among senior and junior investigators, as well as in student seminars. I agree with Dr. Nardo’s assessment:

I think it’s now our job to insure that all this dedicated work is rewarded with a wide readership, one that helps us move closer to putting this tawdry era behind us…John Mickey Nardo

The citalopram CIT-MD-18 pediatric depression trial

The original article that we will be discussing is:

Wagner KD, Robb AS, Findling RL, Jin J, Gutierrez MM, Heydorn WE. A randomized, placebo-controlled trial of citalopram for the treatment of major depression in children and adolescents. American Journal of Psychiatry. 2004 Jun 1;161(6):1079-83.

It reports:

An 8-week, randomized, double-blind, placebo-controlled study compared the safety and efficacy of citalopram with placebo in the treatment of children (ages 7–11) and adolescents (ages 12–17) with major depressive disorder.

The results and conclusion:

Results: The overall mean citalopram dose was approximately 24 mg/day. Mean Children’s Depression Rating Scale—Revised scores decreased significantly more from baseline in the citalopram treatment group than in the placebo treatment group, beginning at week 1 and continuing at every observation point to the end of the study (effect size=2.9). The difference in response rate at week 8 between placebo (24%) and citalopram (36%) also was statistically significant. Citalopram treatment was well tolerated. Rates of discontinuation due to adverse events were comparable in the placebo and citalopram groups (5.9% versus 5.6%, respectively). Rhinitis, nausea, and abdominal pain were the only adverse events to occur with a frequency exceeding 10% in either treatment group.

Conclusions: In this population of children and adolescents, treatment with citalopram reduced depressive symptoms to a significantly greater extent than placebo treatment and was well tolerated.

The article ends with an elaboration of what is said in the abstract:

In conclusion, citalopram treatment significantly improved depressive symptoms compared with placebo within 1 week in this population of children and adolescents. No serious adverse events were reported, and the rate of discontinuation due to adverse events among the citalopram-treated patients was comparable to that of placebo. These findings further support the use of citalopram in children and adolescents suffering from major depression.

The study protocol

The protocol for CIT-MD-18, IND Number 22,368 was obtained from Forest Laboratories. It was dated September 1, 1999 and amended April 8, 2002.

The primary outcome measure was the change from baseline to week 8 on the Children’s Depression Rating Scale-Revised (CDRS-R) total score.

Comparison between citalopram and placebo will be performed using three-way analysis of covariance (ANCOVA) with age group, treatment group and center as the three factors, and the baseline CDRS-R score as covariate.

The secondary outcome measures were the Clinical Global Impression severity and improvement subscales, Kiddie Schedule for Affective Disorders and Schizophrenia – depression module, and Children’s Global Assessment Scale.

Comparison between citalopram and placebo will be performed using the same approach as for the primary efficacy parameter. Two-way ANOVA will be used for CGI-I, since improvement relative to Baseline is inherent in the score.

 There was no formal power analysis but:

The primary efficacy variable is the change from baseline in CDRS-R score at Week 8.

Assuming an effect size (treatment group difference relative to pooled standard deviation) of 0.5, a sample size of 80 patients in each treatment group will provide at least 85% power at an alpha level of 0.05 (two-sided).
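As a check on the protocol's arithmetic, here is a minimal sketch of the power calculation it implies. It uses a normal approximation to a two-sided, two-sample comparison of means, which is my simplification (the protocol actually specifies an ANCOVA); the function name and defaults are my own.

```python
import math
from statistics import NormalDist

def approx_power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation, not the exact noncentral t distribution,
    so it runs slightly high for small samples.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the assumed effect size.
    noncentrality = effect_size * math.sqrt(n_per_group / 2)
    return (std_normal.cdf(noncentrality - z_crit)
            + std_normal.cdf(-noncentrality - z_crit))

# Protocol assumptions: effect size 0.5, 80 patients per treatment group.
protocol_power = approx_power_two_sample(0.5, 80)
print(f"approximate power: {protocol_power:.2f}")
```

Under these assumptions the approximate power comes out at roughly 0.88, which is consistent with the protocol's "at least 85%". Note that this applies to the combined sample; splitting the sample into separate child and adolescent subgroups would leave far fewer patients per comparison.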

The deconstruction

 Selective reporting of subtle departures from the protocol could easily have been missed or simply excused as accidental and inconsequential, except that there was unrestricted access to communication within Forest Laboratories and to the data for reanalysis.

3.2 Data

The fact that Forest controlled the CIT-MD-18 manuscript production allowed for selection of efficacy results to create a favourable impression. The published Wagner et al. article concluded that citalopram produced a significantly greater reduction in depressive symptoms than placebo in this population of children and adolescents [10]. This conclusion was supported by claims that citalopram reduced the mean CDRS-R scores significantly more than placebo beginning at week 1 and at every week thereafter (effect size = 2.9); and that response rates at week 8 were significantly greater for citalopram (36%) versus placebo (24%). It was also claimed that there were comparable rates of tolerability and treatment discontinuation for adverse events (citalopram = 5.6%; placebo = 5.9%). Our analysis of these data and documents has led us to conclude that these claims were based on a combination of: misleading analysis of the primary outcome and implausible calculation of effect size; introduction of post hoc measures and failure to report negative secondary outcomes; and misleading analysis and reporting of adverse events.

3.2.1 Mischaracterisation of primary outcome

Contrary to the protocol, Forest’s final study report synopsis increased the study sample size by adding eight of nine subjects who, per protocol, should have been excluded because they were inadvertently dispensed unblinded study drug due to a packaging error [23]. The protocol stipulated: “Any patient for whom the blind has been broken will immediately be discontinued from the study and no further efficacy evaluations will be performed” [10]. Appendix Table 6 of the CIT-MD-18 Study Report [24] showed that Forest had performed a primary outcome calculation excluding these subjects (see our Fig. 2). This per protocol exclusion resulted in a ‘negative’ primary efficacy outcome.

Ultimately however, eight of the excluded subjects were added back into the analysis, turning the (albeit marginally) statistically insignificant outcome (p < 0.052) into a statistically significant outcome (p < 0.038). Despite this change, there was still no clinically meaningful difference in symptom reduction between citalopram and placebo on the mean CDRS-R scores (Fig. 3).

The unblinding error was not reported in the published article.

Forest also failed to follow their protocol stipulated plan for analysis of age-by-treatment interaction. The primary outcome variable was the change in total CDRS-R score at week 8 for the entire citalopram versus placebo group, using a 3-way ANCOVA test of efficacy [24]. Although a significant efficacy value favouring citalopram was produced after including the unblinded subjects in the ANCOVA, this analysis resulted in an age-by-treatment interaction with no significant efficacy demonstrated in children. This important efficacy information was withheld from public scrutiny and was not presented in the published article. Nor did the published article report the power analysis used to determine the sample size, and no adequate description of this analysis was available in either the study protocol or the study report. Moreover, no indication was made in these study documents as to whether Forest originally intended to examine citalopram efficacy in children and adolescent subgroups separately or whether the study was powered to show citalopram efficacy in these subgroups. If so, then it would appear that Forest could not make a claim for efficacy in children (and possibly not even in adolescents). However, if Forest powered the study to make a claim for efficacy in the combined child plus adolescent group, this may have been invalidated as a result of the ANCOVA age-by-treatment interaction and would have shown that citalopram was not effective in children.

A further exaggeration of the effect of citalopram was to report “effect size on the primary outcome measure” of 2.9, which was extraordinary and not consistent with the primary data. This claim was questioned by Martin et al. who criticized the article for miscalculating effect size or using an unconventional calculation, which clouded “communication among investigators and across measures” [25]. The origin of the effect size calculation remained unclear even after Wagner et al. publicly acknowledged an error and stated that “With Cohen’s method, the effect size was 0.32,” [20] which is more typical of antidepressant trials. Moreover, we note that there was no reference to the calculation of effect size in the study protocol.

3.2.2 Failure to publish negative secondary outcomes, and undeclared inclusion of Post Hoc Outcomes

Wagner et al. failed to publish two of the protocol-specified secondary outcomes, both of which were unfavourable to citalopram. While CGI-S and CGI-I were correctly reported in the published article as negative [10], (see p1081), the Kiddie Schedule for Affective Disorders and Schizophrenia-Present (depression module) and the Children’s Global Assessment Scale (CGAS) were not reported in either the methods or results sections of the published article.

In our view, the omission of secondary outcomes was no accident. On October 15, 2001, Ms. Prescott wrote: “I’ve heard through the grapevine that not all the data look as great as the primary outcome data. For these reasons (speed and greater control) I think it makes sense to prepare a draft in-house that can then be provided to Karen Wagner (or whomever) for review and comments” (see Fig. 1). Subsequently, Forest’s Dr. Heydorn wrote on April 17, 2002: “The publications committee discussed target journals, and recommended that the paper be submitted to the American Journal of Psychiatry as a Brief Report. The rationale for this was the following: … As a Brief Report, we feel we can avoid mentioning the lack of statistically significant positive effects at week 8 or study termination for secondary endpoints” [13].

Instead the writers presented post hoc statistically positive results that were not part of the original study protocol or its amendment (visit-by-visit comparison of CDRS-R scores, and ‘Response’, defined as a score of ≤28 on the CDRS-R) as though they were protocol-specified outcomes. For example, ‘Response’ was reported in the results section of the Wagner et al. article between the primary and secondary outcomes, likely predisposing a reader to regard it as more important than the selected secondary measures reported, or even to mistake it for a primary measure.

It is difficult to reconcile what the authors of the original article reported in terms of adverse events and what our “deconstructionists” found in the unpublished final study report. The deconstruction article also notes that a letter to the editor appearing at the time of publication of the original paper called attention to another citalopram study that remained unpublished but was known to be a null study with substantial adverse events.

3.2.3 Mischaracterisation of adverse events

Although Wagner et al. correctly reported that “the rate of discontinuation due to adverse events among citalopram-treated patients was comparable to that of placebo”, the authors failed to mention that the five citalopram-treated subjects discontinuing treatment did so due to one case of hypomania, two of agitation, and one of akathisia. None of these potentially dangerous states of over-arousal occurred with placebo [23]. Furthermore, anxiety occurred in one citalopram patient (and none on placebo) of sufficient severity to temporarily stop the drug and irritability occurred in three citalopram (compared to one placebo). Taken together, these adverse events raise concerns about dangers from the activating effects of citalopram that should have been reported and discussed. Instead Wagner et al. reported “adverse events associated with behavioral activation (such as insomnia or agitation) were not prevalent in this trial” [10] and claimed that “there were no reports of mania”, without acknowledging the case of hypomania [10].

Furthermore, examination of the final study report revealed that there were many more gastrointestinal adverse events for citalopram than placebo patients. However, Wagner et al. grouped the adverse event data in a way that in effect masked this possibly clinically significant gastrointestinal intolerance. Finally, the published article also failed to report that one patient on citalopram developed abnormal liver function tests [24].

In a letter to the editor of the American Journal of Psychiatry, Mathews et al. also criticized the manner in which Wagner et al. dealt with adverse outcomes in the CIT-MD-18 data, stating that: “given the recent concerns about the risk of suicidal thoughts and behaviors in children treated with SSRIs, this study could have attempted to shed additional light on the subject” [26]. Wagner et al. responded: “At the time the [CIT-MD-18] manuscript was developed, reviewed, and revised, it was not considered necessary to comment further on this topic” [20]. However, concerns about suicidal risk were prevalent before the Wagner et al. article was written and published [27]. In fact, undisclosed in both the published article and Wagner’s letter-to-the-editor, the 2001 negative Lundbeck study had raised concern over heightened suicide risk [10, 20, 21].

A later blog post will discuss the letters to the editor that appeared shortly after the original study in American Journal of Psychiatry. But for now, it would be useful to clarify the status of the negative Lundbeck study at that time.

The letter by Barbe published in AJP  remarked:

It is somewhat surprising that the authors do not compare their results with those of another trial, involving 244 adolescents (13–18-year-olds), that showed no evidence of efficacy of citalopram compared to placebo and a higher level of self-harm (16 [12.9%] of 124 versus nine [7.5%] of 120) in the citalopram group compared to the placebo group (5). Although these data were not available to the public until December 2003, one would expect that the authors, some of whom are employed by the company that produces citalopram in the United States and financed the study, had access to this information.

The study authors replied:

It may be considered premature to compare the results of this trial with unpublished data from the results of a study that has not undergone the peer-review process. Once the investigators involved in the European citalopram adolescent depression study publish the results in a peer-reviewed journal, it will be possible to compare their study population, methods, and results with our study with appropriate scientific rigor.

Conflict of interest

The authors of the deconstruction study indicate they do not have any conventional industry or speaker’s bureau support to declare, but they have had relevant involvement in litigation. Their disclosure includes:

The authors are not members of any industry-sponsored advisory board or speaker’s bureau, and have no financial interest in any pharmaceutical or medical device company.

Drs. Amsterdam and Jureidini were engaged by Baum, Hedlund, Aristei & Goldman as experts in the Celexa and Lexapro Marketing and Sales Practices Litigation. Dr. McHenry was also engaged as a research consultant in the case. Dr. McHenry is a research consultant for Baum, Hedlund, Aristei & Goldman.

Concluding remarks

I don’t have many illusions about the trustworthiness of the literature reporting clinical trials, whether pharmaceutical or psychotherapy. But I found this deconstruction article quite troubling. Among the authors’ closing observations are:

The research literature on the effectiveness and safety of antidepressants for children and adolescents is relatively small, and therefore vulnerable to distortion by just one or two badly conducted and/or reported studies. Prescribing rates are high and increasing, so that prescribers who are misinformed by misleading publications risk doing real harm to many children, and wasting valuable health resources.

I recommend that readers go to my supplementary blog and review a very similar case concerning the efficacy and harms of paroxetine and imipramine in the treatment of major depression in adolescence. I also recommend another of my blog posts that summarizes action taken by the US government against both Forest Laboratories and GlaxoSmithKline for promotion of misleading claims about the efficacy and safety of antidepressants for children and adolescents.

We should scrutinize studies of the efficacy and safety of antidepressants for children and adolescents, because of the weakness of data from relatively small studies with serious difficulties in their methodology and reporting. But we should certainly not stop there. We should critically examine other studies of psychotherapy and psychosocial interventions.

I previously documented [1, 2] interference by promoters of the lucrative Triple P Parenting in the implementation of a supposedly independent evaluation of it, including tampering with plans for data analysis. The promoters then followed up by attempting to block publication of a meta-analysis casting doubt on their claims.

But suppose we are not dealing with the threat of conflict of interest associated with high financial stakes, as with pharmaceutical companies or a globally promoted psychosocial program. There are still the less clear conflicts associated with investigator egos and the pressures to produce positive results in order to get re-funded. We should require scrutiny of protocols and of whether they were faithfully implemented, with the resulting data analyzed according to a priori plans. To do that, we need unrestricted access to data and the opportunity to reanalyze it from multiple perspectives.

Results of clinical trials should be examined wherever possible in replications and extensions in new settings. But this frequently requires resources that are unlikely to be available.

We are unlikely ever to see anything for clinical trials resembling replication initiatives such as the Open Science Collaboration’s (OSC) Replication Project: Psychology. The OSC depends on mass replications involving either samples of college students or recruitment from the Internet. Most of the studies involved in the OSC did not have direct clinical or public health implications. In contrast, clinical trials usually do, and they require different approaches to ensure the trustworthiness of the findings that are claimed.

Access to the internal documents of Forest Laboratories revealed a deliberate, concerted effort to produce results consistent with the agenda of vested interests, even where prespecified analyses yielded contradictory findings. There was clear intent. But we don’t need to assume an attempt to deceive and defraud in order to insist on the opportunity to re-examine findings that affect patients and public health. As US Vice President Joseph Biden recently declared, securing advances in biomedicine and public health depends on broad and routine sharing and re-analysis of data.

My usual disclaimer: All views that I express are my own and do not necessarily reflect those of PLOS or other institutional affiliations.

Deep Brain Stimulation: Unproven treatment promoted with a conflict of interest in JAMA: Psychiatry [again]

“Even with our noisy ways and cattle prods in the brain, we have to take care of sick people, now,” – Helen Mayberg

“All of us—researchers, journalists, patients and their loved ones–are desperate for genuine progress in treatments for severe mental illness. But if the history of such treatments teaches us anything, it is that we must view claims of dramatic progress with skepticism, or we will fall prey to false hopes.” – John Horgan

An email alert announced the early release of an article in JAMA: Psychiatry reporting effects of deep brain stimulation (DBS) for depression. The article was accompanied by an editorial commentary.

Oh no! Is an unproven treatment once again being promoted by one of the most prestigious psychiatry journals with an editorial commentary reeking of vested interests?

Indeed it is, but we can use the article and commentary as a way of honing our skepticism about such editorial practices and to learn better where to look to confirm or dispel our suspicions when they arise.

Like many readers of this blog, I once turned to a trusted, prestigious source like JAMA: Psychiatry with great expectations. Not being an expert in a particular area like DBS, I would be inclined to accept uncritically what I read. But then I noticed how much of what I read conflicted with what I already knew about research design and basic statistics. Time and time again, this knowledge proved sufficient to detect serious hype, exaggeration, and simply false claims.

The problem was no longer simply one of the authors adopting questionable research practices. It expanded to journals and professional organizations adopting questionable publication practices that fit with financial, political, and other, not strictly scientific agendas.

What is found in the most prestigious biomedical journals is not necessarily the most robust and trustworthy of scientific findings. Rather, content is picked in terms of its ability to be portrayed as innovative and breakthrough medicine. But beyond that, the content is consistent with prevailing campaigns to promote particular viewpoints and themes. There is apparently no restriction on those who might most personally profit being selected for accompanying commentaries.

We need to recognize that editorial commentaries often receive weak or no peer review. Invitations from editors to provide commentaries are often a matter of shared nonscientific agendas and simple cronyism.

Coming to these conclusions, I have been on a mission to learn better how to detect hype and hokum and I have invited readers of my blog posts to come along.

This installment builds on my recent discussion of an article claiming remission of suicidal ideation by magnetic seizure therapy. Like the editorial commentary accompanying the previous JAMA: Psychiatry article, the commentary discussed here had an impressive conflict of interest disclosure. The disclosure probably would not have prompted me to search on the Internet for other material about one of the authors. Yet, a search revealed some information that is quite relevant to our interpretation of the new article and its commentary. We can ponder whether this information should have been withheld. I think it should have been disclosed.

The lesson I learned is that a higher level of vigilance is needed to interpret highly touted article-commentary combos in prestigious journals, unless we are simply going to dismiss them as advertisements or propaganda rather than a highlighting of solid biomedical science.

Sadly, though, this exercise convinced me that efforts to scrutinize claims by turning to seemingly trustworthy supplementary sources can provide a misleading picture.

The article under discussion is:

Bergfeld IO, Mantione M, Hoogendoorn MC, et al. Deep Brain Stimulation of the Ventral Anterior Limb of the Internal Capsule for Treatment-Resistant Depression: A Randomized Clinical Trial. JAMA Psychiatry. Published online April 06, 2016. doi:10.1001/jamapsychiatry.2016.0152.

The commentary is:

Mayberg HS, Riva-Posse P, Crowell AL. Deep Brain Stimulation for Depression: Keeping an Eye on a Moving Target. JAMA Psychiatry. Published online April 06, 2016. doi:10.1001/jamapsychiatry.2016.0173.

The trial registration is

Deep Brain Stimulation in Treatment-refractory patients with Major Depressive Disorder.

Pursuing my skepticism by searching on the Internet, I immediately discovered a series of earlier blog posts about DBS by Neurocritic [1] [2] [3] that saved me a lot of time and directed me to still other useful sources. I refer to what I learned from Neurocritic in this blog post. But as always, all opinions are entirely my responsibility, along with misstatements and any inaccuracies.

What I learned immediately from Neurocritic is that DBS is a hot area of research, even if it continues to produce disappointing outcomes.

DBS had a commitment of $70 million from President Obama’s Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative. Premised on the causes of psychopathology being in precise, isolated neural circuitry, it is the poster child of the Research Domain Criteria (RDoC) of former NIMH director Thomas Insel. Neurocritic points to Insel’s promotion of “electroceuticals” like DBS in his NIMH Director’s Blog 10 Best of 2013:

The key concept: if mental disorders are brain circuit disorders, then successful treatments need to tune circuits with precision. Chemicals may be less precise than electrical or cognitive interventions that target specific circuits.

The randomized trial of deep brain stimulation for depression.

The objective of the trial was:

To assess the efficacy of DBS of the ventral anterior limb of the internal capsule (vALIC), controlling for placebo effects with active and sham stimulation phases.

Inclusion criteria were a diagnosis of major depressive disorder designated as being treatment resistant (TRD) on the basis of

A failure of at least 2 different classes of second-generation antidepressants (eg, selective serotonin reuptake inhibitor), 1 trial of a tricyclic antidepressant, 1 trial of a tricyclic antidepressant with lithium augmentation, 1 trial of a monoamine oxidase inhibitor, and 6 or more sessions of bilateral electroconvulsive therapy.

Twenty-five patients with TRD from 2 Dutch hospitals first received surgery that implanted four contact electrodes deep within their brains. The electrodes were attached to tiny wires leading to a battery-powered pulse generator implanted under their collar bones.

The standardized DBS treatment started after a three-week recovery from the surgery. Brain stimulation was continuous one week after surgery, but at three weeks, patients began visits with psychiatrists or psychologists on what was at first a biweekly basis, but later less frequently.

At the visits, level of depression was assessed and adjustments were made to various parameters of the DBS, such as the specific site targeted in the brain, voltage, and pulse frequency and amplitude. Treatment continued until optimization – either four weeks of sustained improvement on depression rating scales or the end of the 52-week period. In the original protocol, this phase of the study was limited to six months, but was extended after experience with a few patients. Six patients went even longer than the 52 weeks to achieve optimization.

Once optimization was achieved, patients were randomized to a crossover phase in which they received two blocks of six weeks of either continued active or sham treatment that involved simply turning off the stimulation. Outcomes were classified in terms of investigator-rated changes in the 17-item Hamilton Depression Rating Scale.

The outcome of the open-label phase of the study was the change of the investigator-rated HAM-D-17 score (range, 0-52) from baseline to T2. In addition, we classified patients as responders (≥50% reduction of HAM-D-17 score at T2 compared with baseline) or nonresponders (<50% reduction of HAM-D-17 score at T2 compared with baseline). Remission was defined as a HAM-D-17 score of 7 or less at T2. The primary outcome measure of the randomized, double-blind crossover trial was the difference in HAM-D-17 scores between the active and sham stimulation phases. In a post hoc analysis, we tested whether a subset of nonresponders showed a partial response (≥25% but <50% reduction of HAM-D-17 score at T2 compared with baseline).


Clinical outcomes. The mean time to first response in responders was 53.6 (50.6) days (range, 6-154 days) after the start of treatment optimization. The mean HAM-D-17 scores decreased from 22.2 (95%CI, 20.3-24.1) at baseline to 15.9 (95% CI, 12.3-19.5) at T2.

An already small sample shrank further from initial assessment of eligibility until retention at the end of the crossover study. Of the 52 patients assessed for eligibility, 23 were ineligible and four refused. Once the optimization phase of the trial started, four patients withdrew for lack of effect. Another five could not be randomized in the crossover phase, three because of an unstable psychiatric status, one because of fear of worsening symptoms, and one because of their physical health. So, the randomized phase of the trial consisted of nine patients randomized to the active treatment and then the sham and another seven patients randomized to the sham and then active treatment.
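The attrition is easy to lose track of, so here is a minimal sketch in Python that simply tallies the counts reported in the paragraph above. The variable names are my own; nothing here comes from the trial's analysis code.

```python
# Patient flow as described in the text (counts taken from the paragraph above).
assessed = 52
ineligible = 23
refused = 4
enrolled = assessed - ineligible - refused  # patients who received surgery

withdrew_no_effect = 4
# Not randomized: unstable psychiatric status, fear of worsening, physical health.
not_randomized = 3 + 1 + 1
randomized = enrolled - withdrew_no_effect - not_randomized

# The two arms of the crossover should account for everyone randomized.
active_first, sham_first = 9, 7
assert randomized == active_first + sham_first

print(f"enrolled: {enrolled}, randomized: {randomized}")
print(f"share of assessed patients reaching randomization: {randomized / assessed:.0%}")
```

The point of the tally is only that the randomized comparison rests on far fewer patients than the number who started the trial.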

The crossover to sham treatment did not go as planned. Of the nine (three responders and six nonresponders) randomized to the active-then-sham condition, all had to be crossed over early – one because the patient requested a crossover, two because of a gradual increase in symptoms, and three because of logistics. Of the seven patients assigned to sham-first (four responders and three nonresponders), all had to be crossed over within a day because of increases in symptoms.

I don’t want to get lost in the details here. But we are getting into small numbers with nonrandom attrition, imbalanced assignment of responders versus nonresponders in the randomization, and the breakdown of the planned sham treatment. From what I’ve read elsewhere about DBS, I don’t think that providers or patients were blinded to the sham treatment. Patients should be able to feel the shutting off of the stimulator.

Adverse events. DBS has safety issues. Serious adverse events included severe nausea during surgery (1 patient), suicide attempt (4 patients), and suicidal ideation (2 patients). Two nonresponders died several weeks after they withdrew from the study and DBS had been stopped (1 suicide, 1 euthanasia). Two patients developed full blown mania during treatment and another patient became hypomanic.

The article’s Discussion claims

We found a significant reduction of depressive symptoms following vALIC DBS, resulting in response in 10 patients (40%) and partial response in 6 (24%) patients with TRD.

Remission was achieved in 5 (20%) patients. The randomized active-sham phase study design indicates that reduction of depressive symptoms cannot be attributed to placebo effects…


This trial shows efficacy of DBS in patients with TRD and supports the possible benefits of DBS despite a previous disappointing randomized clinical trial. Further specification of targets and the most accurate setting optimization as well as larger randomized clinical trials are necessary.

A clinical trial starting with 25 patients does not have much potential to shift our confidence in the efficacy of DBS. Any hope of doing so was further dashed when the sample was reduced to 16 patients who remained for the investigators’ attempted randomization to an active treatment versus sham comparison (seven responders and nine nonresponders). Then the sham condition could not be maintained as planned in the protocol for any patients.

The authors interpreted the immediate effects of shifting to sham treatment as ruling out any placebo effect. However, it’s likely that shutting off the stimulator was noticeable to the patients, and the immediacy of effects speaks to the likelihood of an effect due to the strong expectations of patients with intolerable depression having their hope taken away. Some of the immediate response could’ve been a nocebo response.
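To give a rough sense of how little 25 patients can tell us, the sketch below computes a 95% Wilson score interval for the reported response rate of 10 of 25 patients. The function name and the choice of the Wilson method are my own assumptions for illustration; the trial authors do not report this calculation.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Response in 10 of 25 patients, as claimed in the article's Discussion.
low, high = wilson_interval(10, 25)
print(f"response 10/25: 95% interval {low:.2f} to {high:.2f}")
```

With a sample this small the interval spans several tens of percentage points, and this is for the uncontrolled open-label response rate, before any question of placebo effects or blinding is considered.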

Helen Mayberg and colleagues’ invited commentary

The commentary attempted to discourage a pessimistic assessment of DBS based on the difficulties implementing the original plans for the study as described in the protocol.

A cynical reading of the study by Bergfeld et al1 might lead to the conclusion that the labor-intensive and expert-driven tuning of the DBS device required for treatment response makes this a nonviable clinical intervention for TRD. On the contrary, we see a tremendous opportunity to retrospectively characterize the various features that best define patients who responded well to this treatment. New studies could test these variables prospectively.

The substantial deviation from protocol that occurred after only two patients were entered into the trial was praised in terms of the authors’ “tenacious attempts to establish a stable response”:

We appreciate the reality of planning a protocol with seemingly conservative time points based on the initial patients, only to find these time points ultimately to be insufficient. The authors’ tenacious attempts to establish a stable response by extending the optimization period from the initial protocol using 3 to 6 months to a full year is commendable and provides critical information for future trials.

Maybe, but I think the need for this important change, along with the other difficulties that were encountered in implementing the study, speaks to a randomized controlled trial of DBS being premature.

Conflict of Interest Disclosures: Dr Mayberg has a paid consulting agreement with St Jude Medical Inc, which licensed her intellectual property to develop deep brain stimulation for the treatment of severe depression (US 2005/0033379A1). The terms of this agreement have been reviewed and approved by Emory University in accordance with their conflict of interest policies. No other disclosures were reported.

Helen Mayberg’s declaration of interest clearly identifies her as someone who is not a detached observer, but who would benefit financially and professionally from any strengthening of the claims for the efficacy of DBS. We are alerted by this declaration, but I think there were some things that were not mentioned in the article or editorial about Helen Mayberg’s work that would influence her credibility even more if they were known.

Helen Mayberg’s anecdotes and statistics about the success of DBS

Mayberg has been attracting attention for over a decade with her contagious exuberance for DBS. A 2006 article in the New York Times by David Dobbs started with a compelling anecdote of one of Mayberg’s patients being able to resume a normal life after previous ineffective treatments for severe depression. The story reported the success with 8 of 12 patients treated with DBS:

They’ve re-engaged their families, resumed jobs and friendships, started businesses, taken up hobbies old and new, replanted dying gardens. They’ve regained the resilience that distinguishes the healthy from the depressed.

Director of NIMH Tom Insel chimed in:

“People often ask me about the significance of small first studies like this,” says Dr. Thomas Insel, who as director of the National Institute of Mental Health enjoys an unparalleled view of the discipline. “I usually tell them: ‘Don’t bother. We don’t know enough.’ But this is different. Here we know enough to say this is something significant. I really do believe this is the beginning of a new way of understanding depression.”

A 2015 press release from Emory University, Targeting depression with deep brain stimulation, gives another anecdote of a dramatic treatment success.

Okay, we know to be skeptical about university press releases, but then there are the dramatic anecdotes and even numbers in a news article in Science, Short-Circuiting Depression, that borders on an infomercial for Mayberg’s work.

Since 2003, Mayberg and others have used DBS in area 25 to treat depression in more than 100 patients. Between 30% and 40% of patients do “extremely well”—getting married, going back to work, and reclaiming their lives, says Sidney Kennedy, a psychiatrist at Toronto General Hospital in Canada who is now running a DBS study sponsored by the medical device company St. Jude Medical. Another 30% show modest improvement but still experience residual depression. Between 20% and 25% do not experience any benefit, he says. People contemplating brain surgery might want better odds, but patients with extreme, relentless depression often feel they have little to lose. “For me, it was a last resort,” Patterson says.

By making minute adjustments in the positions of the electrodes, Mayberg says, her team has gradually raised its long-term response rates to 75% to 80% in 24 patients now being treated at Emory University.

A chronically depressed person or someone who cares for someone who is depressed might be motivated to go on the Internet and try to find more information about Mayberg’s trial. A website for Mayberg’s BROADEN (BROdmann Area 25 DEep brain Neuromodulation) study once provided a description of the study, answers to frequently asked questions, and an opportunity to register for screening for the study. However, it is no longer accessible through Google or other search engines. You can still reach an archived version of the website through a link provided by Neurocritic, but its links are no longer functional.

Neurocritic’s blog posts about Mayberg and DBS

If you are lucky, a Google search for Mayberg deep brain stimulation might bring you to any of three blog posts by Neurocritic [1] [2] [3] that have rich links and provide a very different story of Mayberg and DBS.

One link takes you to the trial registration for Mayberg’s BROADEN study: A Clinical Evaluation of Subcallosal Cingulate Gyrus Deep Brain Stimulation for Treatment-Resistant Depression. The updated file registration indicates that the study will end in September 2017, and that the study is ongoing but not recruiting participants.

This information should have been updated, as should other publicity about Mayberg’s BROADEN study. Namely, as Neurocritic documents, St. Jude Medical, the company attempting to commercialize DBS by funding the study, terminated it after futility analyses indicated that further enrollment of patients had only a 17% probability of achieving a significant effect. At the point of terminating the trial, 125 patients had been enrolled.

Neurocritic also provides a link to an excellent, open access review paper:

Morishita T, Fayad SM, Higuchi MA, Nestor KA, Foote KD. Deep brain stimulation for treatment-resistant depression: systematic review of clinical outcomes. Neurotherapeutics. 2014 Jul 1;11(3):475-84.

The article reveals that although there are 22 published studies of DBS for treatment-resistant depression, only three are randomized trials, one of which was completed with null results. Two – including Mayberg’s BROADEN trial – were discontinued because futility analyses indicated that a finding of efficacy for the treatment was unlikely.

Finally, Neurocritic  also provides a link to a Neurotech Business Report, Depressing Innovation:

The news that St. Jude Medical failed a futility analysis of its BROADEN trial of DBS for treatment of depression cast a pall over an otherwise upbeat attendance at the 2013 NANS meeting [see Conference Report, p7]. Once again, the industry is left to pick up the pieces as a promising new technology gets set back by what could be many years.

It’s too early to assess blame for this failure. It’s tempting to wonder if St. Jude management was too eager to commence this trial, since that has been a culprit in other trial failures. But there’s clearly more involved here, not least the complexity of specifying the precise brain circuits involved with major depression. Indeed, Helen Mayberg’s own thinking on DBS targeting has evolved over the years since the seminal paper she and colleague Andres Lozano published in Neuron in 2005, which implicated Cg25 as a lucrative target for depression. Mayberg now believes that neuronal tracts emanating from Cg25 toward medial frontal areas may be more relevant [NBR Nov13 p1]. Research that she, Cameron McIntyre, and others are conducting on probabilistic tractography to identify the patient-specific brain regions most relevant to the particular form of depression the patient is suffering from will likely prove to be very fruitful in the years ahead.

So, we have a heavily hyped unproven treatment for which the only clinical trials have either been null or terminated following a futility analysis. Helen Mayberg, a patent holder associated with one of these trials, was an inappropriate choice to provide commentary on another, more modestly sized trial that also ran into numerous difficulties suggesting it was premature. However, I find it outrageous that so little effort has been made to correct the record concerning her BROADEN trial or even to acknowledge its closing in the JAMA: Psychiatry commentary.

Untold numbers of depressed patients who don’t get expected benefits from available treatments are being misled with false hope from anecdotes and statistics from a trial that was ultimately terminated.

I find troubling what my exercise showed might happen when someone who is motivated by skepticism goes to the Internet and tries to get additional information about the JAMA: Psychiatry paper. They could be careful to rely on only seemingly credible sources – a trial registration and a Science article. The Science article is not peer-reviewed but nonetheless has a credibility conveyed by appearing in the premier and respected Science. The trial registration has not been updated with valuable information and the Science article gives no indication how it is contradicted by better quality evidence. So, they would be misled.



Remission of suicidal ideation by magnetic seizure therapy? Neuro-nonsense in JAMA: Psychiatry

A recent article in JAMA: Psychiatry:

Sun Y, Farzan F, Mulsant BH, Rajji TK, Fitzgerald PB, Barr MS, Downar J, Wong W, Blumberger DM, Daskalakis ZJ. Indicators for remission of suicidal ideation following magnetic seizure therapy in patients with treatment-resistant depression. JAMA Psychiatry. 2016 Mar 16.

Was accompanied by an editorial commentary:

Camprodon JA, Pascual-Leone A. Multimodal Applications of Transcranial Magnetic Stimulation for Circuit-Based Psychiatry. JAMA: Psychiatry. 2016 Mar 16.

Together both the article and commentary can be studied as:

  • An effort by the authors and the journal itself to promote prematurely a treatment for reducing suicide.
  • A payback to sources of financial support for the authors. Both groups have industry ties that provide them with consulting fees, equipment, grants, and other unspecified rewards. One author has a patent that should increase in value as a result of this article and commentary.
  • A bid for successful applications to new grant initiatives with a pledge of allegiance to the NIMH Research Domain Criteria (RDoC).

After considering just how bad the science and reporting are, we have sufficient reason to ask: How did this promotional campaign come about? Why was this article accepted by JAMA: Psychiatry? Why was it deemed worthy of comment?

I think a skeptical look at this article would lead to a warning label:

Warning: Results reported in this article are neither robust nor trustworthy, but considerable effort has gone into promoting them as innovative and even breakthrough. Skepticism warranted.

As we will see, the article is seriously flawed as a contribution to neuroscience, identification of biomarkers, treatment development, and suicidology, but we can nonetheless learn a lot from it in terms of how to detect such flaws when they are more subtle. If nothing else, your skepticism will be raised about articles accompanied by commentaries in prestigious journals and you will learn tools for probing such pairs of articles.


This article involves intimidating technical details and awe-inspiring figures.

[Images: Figure 1, picture one; Figure 1, picture two]

Yet, as in some past blog posts concerning neuroscience and the NIMH RDoC, we will gloss over some technical details, which would be readily interpreted by experts. I would welcome the comments and critiques from experts.

I nonetheless expect readers to agree when they have finished this blog post that I have demonstrated that you don’t have to be an expert to detect neurononsense and crass publishing of articles that fit vested interests.

The larger trial from which these patients were drawn is registered as:

ClinicalTrials.gov. Magnetic Seizure Therapy (MST) for Treatment Resistant Depression, Schizophrenia, and Obsessive Compulsive Disorder. NCT01596608.

Because this article is strikingly lacking in crucial details or details in places where we would expect to find them, it will be useful at times to refer to the trial registration.

The title and abstract of the article

As we will soon see, the title, Indicators for remission of suicidal ideation following MST in patients with treatment-resistant depression, is misleading. The article has too small a sample and too inappropriate a design to establish anything as a reproducible “indicator.”

That the article is going to fail to deliver is already apparent in the abstract.

The abstract states:

 Objective  To identify a biomarker that may serve as an indicator of remission of suicidal ideation following a course of MST by using cortical inhibition measures from interleaved transcranial magnetic stimulation and electroencephalography (TMS-EEG).

Design, Setting, and Participants  Thirty-three patients with TRD were part of an open-label clinical trial of MST treatment. Data from 27 patients (82%) were available for analysis in this study. Baseline TMS-EEG measures were assessed within 1 week before the initiation of MST treatment using the TMS-EEG measures of cortical inhibition (ie, N100 and long-interval cortical inhibition [LICI]) from the left dorsolateral prefrontal cortex and the left motor cortex, with the latter acting as a control site.

Interventions The MST treatments were administered under general anesthesia, and a stimulator coil consisting of 2 individual cone-shaped coils was used.

Main Outcomes and Measures Suicidal ideation was evaluated before initiation and after completion of MST using the Scale for Suicide Ideation (SSI). Measures of cortical inhibition (ie, N100 and LICI) from the left dorsolateral prefrontal cortex were selected. N100 was quantified as the amplitude of the negative peak around 100 milliseconds in the TMS-evoked potential (TEP) after a single TMS pulse. LICI was quantified as the amount of suppression in the double-pulse TEP relative to the single-pulse TEP.

Results  Of the 27 patients included in the analyses, 15 (56%) were women; mean (SD) age of the sample was 46.0 (15.3) years. At baseline, patients had a mean SSI score of 9.0 (6.8), with 8 of 27 patients (30%) having a score of 0. After completion of MST, patients had a mean SSI score of 4.2 (6.3) (pre-post treatment mean difference, 4.8 [6.7]; paired t26 = 3.72; P = .001), and 18 of 27 individuals (67%) had a score of 0 for a remission rate of 53%. The N100 and LICI in the frontal cortex—but not in the motor cortex—were indicators of remission of suicidal ideation with 89% accuracy, 90% sensitivity, and 89% specificity (area under the curve, 0.90; P = .003).

Conclusions and Relevance  These results suggest that cortical inhibition may be used to identify patients with TRD who are most likely to experience remission of suicidal ideation following a course of MST. Stronger inhibitory neurotransmission at baseline may reflect the integrity of transsynaptic networks that are targeted by MST for optimal therapeutic response.

Even viewing the abstract alone, we can see this article is in trouble. It claims to identify a biomarker following a course of magnetic seizure therapy (MST). That is an extraordinary claim when a study started with only 33 patients, of whom only 27 remain for analysis. Furthermore, at the initial assessment of suicidal ideation, eight of the 27 patients did not have any and so could show no benefit of treatment.
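The numbers in the abstract can be checked with a few lines of arithmetic. What follows is only a sketch in Python using the figures quoted above. One step is my assumption, not something the abstract states: I reconcile "67% had a score of 0" with "a remission rate of 53%" by supposing that the eight patients who scored 0 at baseline also scored 0 afterwards and were left out of the remission denominator.

```python
import math

# Figures quoted in the abstract.
n_enrolled = 33
n_analyzed = 27
n_zero_at_baseline = 8      # SSI score of 0 before treatment
n_zero_after = 18           # SSI score of 0 after treatment
mean_diff, sd_diff = 4.8, 6.7


def pct(numerator, denominator):
    """Percentage rounded to a whole number, as reported in the abstract."""
    return round(100 * numerator / denominator)


analyzed_pct = pct(n_analyzed, n_enrolled)
zero_baseline_pct = pct(n_zero_at_baseline, n_analyzed)
zero_after_pct = pct(n_zero_after, n_analyzed)

# ASSUMPTION (mine, not stated in the abstract): patients with no ideation
# at baseline stayed at 0 and are excluded from the remission denominator.
n_could_remit = n_analyzed - n_zero_at_baseline
n_remitted = n_zero_after - n_zero_at_baseline
remission_pct = pct(n_remitted, n_could_remit)

# Paired t statistic: mean difference divided by its standard error.
t_stat = mean_diff / (sd_diff / math.sqrt(n_analyzed))

print(analyzed_pct, zero_baseline_pct, zero_after_pct)
print(n_remitted, "of", n_could_remit, "->", remission_pct, "%")
print(round(t_stat, 2))
```

If that assumption holds, every later claim about "remission" rests on 19 patients, of whom 10 remitted, not on 27.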

Any results could be substantially changed with any of the six excluded patients being recovered for analysis or any of the 27 included patients being dropped from analyses as an outlier. Statistical controls for potential confounds will produce spurious results because of overfit equations with even one confound. We also know well that in situations requiring control of possible confounding factors, control of only one is rarely sufficient and often produces worse results than leaving variables unadjusted.

Identification of any biomarkers is unlikely to be reproducible in larger more representative samples. Any claims of performance characteristics of the biomarkers (accuracy, sensitivity, specificity, area under the curve) are likely to capitalize on sampling and chance in ways that are unlikely to be reproducible.
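To make the small-sample problem concrete, here is a rough sketch of how much uncertainty surrounds the reported sensitivity and specificity. The counts are my own back-calculation, not figures from the article: 9 of 10 remitters and 8 of 9 nonremitters correctly classified would reproduce the reported 90% and 89%, assuming classification was done among the 19 patients with ideation at baseline. The Wilson score interval is a standard method for a proportion; I chose it for illustration and do not know what, if anything, the authors used.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# ASSUMED counts, back-calculated from the reported percentages.
sens_lo, sens_hi = wilson_interval(9, 10)   # sensitivity: 9 of 10 remitters
spec_lo, spec_hi = wilson_interval(8, 9)    # specificity: 8 of 9 nonremitters

# Effect of reclassifying a single remitter.
sens_if_one_changes = 8 / 10

print(f"sensitivity 95% CI: {sens_lo:.2f} to {sens_hi:.2f}")
print(f"specificity 95% CI: {spec_lo:.2f} to {spec_hi:.2f}")
print(f"sensitivity if one remitter is reclassified: {sens_if_one_changes:.0%}")
```

Under these assumed counts, both intervals run from below 0.60 to about 0.98, and moving a single patient shifts sensitivity by ten percentage points.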

Nonetheless, the accompanying figures are dazzling, even if not readily interpretable or representative of what would be found in another sample.

Comparison of the article to the trial registration

According to the trial registration, the study started in February 2012 and the registration was received in May 2012. There were unspecified changes as recently as this month (March 2016), and final collection of primary outcome data is expected in December 2016.

Primary outcome

The registration indicates that patients will have been diagnosed with severe major depression, schizophrenia or obsessive compulsive disorder. The primary outcome will depend on diagnosis. For depression it is the Hamilton Rating Scale for Depression.

There is no mention of suicidal ideation as either a primary or secondary outcome.

Secondary outcomes

According to the registration, outcomes include (1) cognitive functioning as measured by episodic memory and non-memory cognitive functions; (2) changes in neuroimaging measures of brain structure and activity derived from fMRI and MRI from baseline to 24th treatment or 12 weeks, whichever comes sooner.

Comparison to the article suggests some important neuroimaging assessments proposed in the registration were compromised: (1) only baseline measures were obtained, and without MRI or fMRI; and (2) the article states:

Although magnetic resonance imaging (MRI)–guided TMS-EEG is more accurate than non–MRI-guided methods, the added step of obtaining an MRI for every participant would have significantly slowed recruitment for this study owing to the pressing need to begin treatment in acutely ill patients, many of whom were experiencing suicidal ideation. As such, we proceeded with non–MRI-guided TMS-EEG using EEG-guided methods according to a previously published study.


The article provides some details of the magnetic seizure treatment:

The MST treatments were administered under general anesthesia using a stimulator machine (MagPro MST; MagVenture) with a twin coil. Methohexital sodium (n = 14), methohexital with remifentanil hydrochloride (n = 18), and ketamine hydrochloride (n = 1) were used as the anesthetic agents. Succinylcholine chloride was used as the neuromuscular blocker. Patients had a mean (SD) seizure duration of 45.1 (21.4) seconds. The twin coil consists of 2 individual cone-shaped coils. Stimulation was delivered over the frontal cortex at the midline position directly over the electrode Fz according to the international 10-20 system.36 Placing the twin coil symmetrically over electrode Fz results in the centers of the 2 coils being over F3 and F4. Based on finite element modeling, this configuration produces a maximum induced electric field between the 2 coils, which is over electrode Fz in this case.37 Patients were treated for 24 sessions or until remission of depressive symptoms based on the 24-item Hamilton Rating Scale for Depression (HRSD) (defined as an HRSD-24 score ≤10 and 60% reduction in symptoms for at least 2 days after the last treatment).38 These remission criteria were standardized from previous ECT depression trials.39,40 Further details of the treatment protocol are available,30 and comprehensive clinical and neurophysiologic trial results will be reported separately.

The article intended to refer the reader to the trial registration for further description of treatment, but the superscript citation in the article is inaccurate. Regardless, given other deviations from registration, readers can’t tell whether there were any deviations from what was proposed. In the registration, seizure therapy was described as involving:

100% machine output at between 25 and 100 Hz, with coil directed over frontal brain regions, until adequate seizure achieved. Six treatment sessions, at a frequency of two or three times per week will be administered. If subjects fail to achieve the pre-defined criteria of remission at that point, the dose will be increased to the maximal stimulator output and 3 additional treatment sessions will be provided. This will be repeated a total of 5 times (i.e., maximum treatment number is 24). 24 treatments is typically longer that a conventional ECT treatment course.

One important implication concerns this treatment being proposed as resolving suicidal ideation. It takes place over a considerable period of time. Patients who die by suicide notoriously break contact before doing so. It would seem that a required 24 treatments delivered on an outpatient basis would provide ample opportunities for breaks – including demoralization because so many treatments are needed in some cases – and therefore death by suicide.

But a protocol that involves continuing treatment until a prespecified reduction in the Hamilton Depression Rating Scale is achieved assures that there will be a drop in suicidal ideation. The interview-based Hamilton depression rating scales and suicidal ideation are highly correlated.

There is no randomization or even adequate description of patient accrual in terms of the population from which the patients came. There is no control group and therefore no control for nonspecific factors. In terms of nonspecific effects, the patients are being subjected to an elaborate, intrusive ritual, starting with electroencephalographic (EEG) assessment [http://www.mayoclinic.org/tests-procedures/eeg/basics/definition/prc-20014093].

The ritual will undoubtedly have strong nonspecific factors associated with it – instilling positive expectations and providing considerable personal attention.

The article’s discussion of results

The discussion opens with some strong claims, unjustified by the modesty of the study and the likelihood that its specific results are not reproducible:

We found that TMS-EEG measures of cortical inhibition (ie, the N100 and LICI) in the frontal cortex, but not in the motor cortex, were strongly correlated with changes in suicidal ideation in patients with TRD who were treated with MST. These findings suggest that patients who benefitted the most from MST demonstrated the greatest cortical inhibition at baseline. More important, when patients were divided into remitters and nonremitters based on their SSI score, our results show that these measures can indicate remission of suicidal ideation from a course of MST with 90% sensitivity and 89% specificity.

The discussion contains a Pledge of Allegiance to the research domain criteria approach that is not actually a reflection of the results at hand. Among the many things that we knew before the study was done, and that were not shown by the study, is that suicidal ideation is so hopelessly linked to hopelessness, negative affect, and attentional biases that in such a situation it is best seen as a surrogate measure of depression, rather than a marker for risk of suicidal acts or death by suicide.



Wave that RDoC flag and maybe you will attract money from NIMH.

Our results also support the research domain criteria approach, that is, that suicidal ideation represents a homogeneous symptom construct in TRD that is targeted by MST. Suicidal ideation has been shown to be linked to hopelessness, negative affect, and attentional biases. These maladaptive behaviors all fall under the domain of negative valence systems and are associated with the specific constructs of loss, sustained threat, and frustrative nonreward. Suicidal ideation may represent a better phenotype through which to understand the neurobiologic features of mental illnesses. In this case, variations in GABAergic-mediated inhibition before MST treatment explained much of the variance for improvements in suicidal ideation across individuals with TRD.

Debunking ‘a better phenotype through which to understand the neurobiologic features of mental illnesses.’

  • Suicide is not a disorder or a symptom, but an infrequent, difficult to predict and complex act that varies greatly in nature and circumstances.
  • While some features of a brain or brain functioning may be correlated with eventual death by suicide, most identifications they provide of persons at risk to eventually die by suicide will be false positives.
  • In the United States, access to a firearm is a reliable proximal cause of suicide and is likely to be more so than anything in the brain. However, this basic observation is not consistent with American politics and can lead to grant applications not being funded.
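The false-positive point in the list above is just base-rate arithmetic. The sketch below is purely illustrative: the base rate is a hypothetical number I picked to stand for a rare outcome, not an estimate of actual suicide risk, and the 90% and 89% figures from the article describe remission of ideation, not prediction of death by suicide. I borrow them only to show what happens to even a seemingly excellent marker when the outcome is rare.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Share of positive test results that are true positives (Bayes' rule)."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


# HYPOTHETICAL base rate: 1 in 1,000 people screened has the outcome.
hypothetical_base_rate = 0.001

ppv = positive_predictive_value(
    sensitivity=0.90,
    specificity=0.89,
    base_rate=hypothetical_base_rate,
)
false_positive_share = 1 - ppv

print(f"positive predictive value: {ppv:.2%}")
print(f"share of positives that are false: {false_positive_share:.2%}")
```

With these illustrative inputs, fewer than 1 in 100 of the people flagged would actually have the outcome.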

In an important sense,

  • It’s not what’s going on in the brain, but what’s going on in the interpersonal context of the brain, that matters in terms of modifiable risk for death by suicide.

The editorial commentary

On the JAMA: Psychiatry website, both the article and the editorial commentary contain sidebar links to each other. It is only in the last two paragraphs of a 14-paragraph commentary that the target article is mentioned. However, the commentary ends with a resounding celebration of the innovation this article represents [emphasis added]:

Sun and colleagues10 report that 2 different EEG measures of cortical inhibition (a negative evoked potential in the EEG that happens approximately 100 milliseconds after a stimulus or event of interest and long-interval cortical inhibition) evoked by TMS to the left dorsolateral prefrontal cortex, but not to the left motor cortex, predicted remission of suicidal ideation with great sensitivity and specificity. This study10 illustrates the potential of multimodal TMS to study physiological properties of relevant circuits in neuropsychiatric populations. Significantly, it also highlights the anatomical specificity of these measures because the predictive value was exclusive to the inhibitory properties of prefrontal circuits but not motor systems.

Multimodal TMS applications allow us to study the physiology of human brain circuitry noninvasively and with causal resolution, expanding previous motor applications to cognitive, behavioral, and affective systems. These innovations can significantly affect psychiatry at multiple levels, by studying disease-relevant circuits to further develop systems for neuroscience models of disease and by developing tools that could be integrated into clinical practice, as they are in clinical neurophysiology clinics, to inform decision making, the differential diagnosis, or treatment planning.

Disclosures of conflicts of interest

The article’s conflict of interest disclosure statement is longer than the abstract.

[Image: conflict of interest disclosure]

The disclosure of conflicts of interest for the editorial commentary is much shorter but nonetheless impressive:

[Image: editorial commentary disclosures]

How did this article get into JAMA: Psychiatry with an editorial comment?

Editorial commentaries are often provided by reviewers who simply check the box on the reviewers’ form indicating their willingness to provide a comment. For reviewers who already have a conflict of interest, this provides an additional one: a non-peer-reviewed paper in which they can promote their interest.

Alternatively, commentators are simply picked by an editor who judges an article to be worthy of special recognition. It’s noteworthy that at least one of the associate editors of JAMA: Psychiatry is actively campaigning for a particular direction to suicide research funded by NIMH, as seen in an editorial comment of his own that I recently discussed. One of the authors of the paper currently under discussion was until recently a senior member of this associate editor’s department, before departing to become Chair of the Department of Psychiatry at University of Toronto.

Essentially, the authors of the paper and the authors of the commentary are providing carefully constructed advertisements for themselves and their agenda. They have the opportunity to do so because of consistency with the agenda of at least one of the editors, if not the journal itself.

The Committee on Publication Ethics (COPE) requires that non-peer-reviewed material in ostensibly peer-reviewed journals be labeled as such. This requirement is seldom met.

The journal further promoted this article by providing 10 free continuing medical education credits for reading it.

I could go on much longer identifying other flaws in this paper and its editorial commentary. I could raise other objections to the article being published in JAMA: Psychiatry. But out of mercy for the authors, the editor, and my readers, I’ll stop here.

I would welcome comments about other flaws.

Special thanks to Bernard “Barney” Carroll for his helpful comments and encouragement, but all opinions expressed and all factual errors are my own responsibility.