Creating illusions of wondrous effects of yoga and meditation on health: A skeptic exposes tricks

The tour of the sausage factory is starting; here’s your brochure telling you what you’ll see.

 

A recent review has received a lot of attention and is being used to claim that mind-body interventions have distinct molecular signatures that point to potentially dramatic health benefits for those who take up these practices.

What Is the Molecular Signature of Mind–Body Interventions? A Systematic Review of Gene Expression Changes Induced by Meditation and Related Practices.  Frontiers in Immunology. 2017;8.

Few who are tweeting about this review or its press coverage are likely to have read it, or to understand it if they did. Most of the new-agey coverage in social media does nothing more than echo or amplify the message of the review’s press release. Lazy journalists and bloggers can simply pass on direct quotes from the lead author or even just the press release’s title, ‘Meditation and yoga can ‘reverse’ DNA reactions which cause stress, new study suggests’:

“These activities are leaving what we call a molecular signature in our cells, which reverses the effect that stress or anxiety would have on the body by changing how our genes are expressed.”

And

“Millions of people around the world already enjoy the health benefits of mind-body interventions like yoga or meditation, but what they perhaps don’t realise is that these benefits begin at a molecular level and can change the way our genetic code goes about its business.”

[The authors of this review actually identified some serious shortcomings in the studies they reviewed. I’ll be getting to some excellent points at the end of this post that run quite counter to the hype. But the lead author’s press release emphasized unwarranted positive conclusions about the health benefits of these practices. That is what is most popular in media coverage, especially from those who have stuff to sell.]

Interpretation of the press release and review authors’ claims requires going back to the original studies, which most enthusiasts are unlikely to do. If readers do go back, they will have trouble interpreting some of the deceptive claims that are made.

Yet a lot is at stake. This review is being used to recommend mind-body interventions for people who have, or are at risk of, serious health problems. In particular, unfounded claims that yoga and mindfulness can increase the survival of cancer patients are sometimes hinted at, and occasionally made outright.

This blog post is written with the intent of protecting consumers from such false claims and providing tools so they can spot pseudoscience for themselves.

Discussion of the review in the media speaks broadly of alternative and complementary interventions. The coverage is aimed at inspiring confidence in this broad range of treatments and at encouraging people facing health crises to invest time and money in outright quackery. Seemingly benign recommendations for yoga, tai chi, and mindfulness (after all, what’s the harm?) often become the entry point to more dubious and expensive treatments that substitute for established treatments. Once they are drawn to centers for integrative health care for classes, cancer patients are likely to spend hundreds or even thousands of dollars on other products and services that are unlikely to benefit them. One study reported:

More than 72 oral or topical, nutritional, botanical, fungal and bacterial-based medicines were prescribed to the cohort during their first year of IO care…Costs ranged from $1594/year for early-stage breast cancer to $6200/year for stage 4 breast cancer patients. Of the total amount billed for IO care for 1 year for breast cancer patients, 21% was out-of-pocket.

Coming up, I will take a skeptical look at the six randomized trials that were highlighted by this review. But in this post, I will provide you with some tools and insights so that you do not have to make such an effort in order to reach an informed decision.

Like many of the other studies cited in the review, these randomized trials were quite small and underpowered. But I will focus on the six because they are as good as it gets. Randomized trials are considered a higher form of evidence than simple observational studies or case reports. [It is too bad the authors of the review don’t even highlight which studies are randomized trials. They are lumped with the others as “longitudinal studies.”]

As a group, the six studies do not actually add any credibility to the claims that mind-body interventions – specifically yoga, tai chi, and mindfulness training or retreats – improve health by altering DNA. We can be no more confident with the trials than we would be had they never been done.

I found the task of probing and interpreting the studies quite labor-intensive and ultimately unrewarding.

I had to get past poor reporting of what was actually done in the trials, to which patients, and with what results. My task often involved seeing through cover-ups, with authors exercising considerable flexibility in reporting which measures they actually collected and which analyses they attempted, before arriving at the best possible tale of the wondrous effects of these interventions.

Interpreting clinical trials should not be so hard, because they should be honestly and transparently reported, with a registered protocol that is actually followed. These reports of trials were sorely lacking. The full extent of the problems took some digging to uncover, but some things emerged before I even got to the methods and results.

The introductions of these studies consistently exaggerated the strength of existing evidence for the effects of these interventions on health, even while somehow coming to the conclusion that this particular study was urgently needed and might even be the “first ever”. The introductions to the six papers typically cross-referenced each other, without giving any indication of how poor the quality of the evidence from the other papers was. What a mutual admiration society these authors are.

One giveaway is how the introductions  referred to the biggest, most badass, comprehensive and well-done review, that of Goyal and colleagues.

That review clearly states that the evidence for the effects of mindfulness is of poor quality because of the lack of comparisons with credible active treatments. The typical randomized trial of mindfulness involves a comparison with no treatment, a waiting list, or patients remaining in routine care where the target problem is likely to be ignored. If we depend on the bulk of the existing literature, we cannot rule out the likelihood that any apparent benefits of mindfulness are due to patients receiving more positive expectations, attention, and support than those simply getting nothing. Only a handful of the hundreds of trials of mindfulness include appropriate, active treatment comparison/control groups. The results of those studies are not encouraging.

One of the first things I do in probing the introduction of a study claiming health benefits for mindfulness is see how they deal with the Goyal et al review. Did the study cite it, and if so, how accurately? How did the authors deal with its message, which undermines claims of the uniqueness or specificity of any benefits to practicing mindfulness?

For yoga, we cannot yet rule out that it is no better than regular exercise – in groups or alone – that includes relaxing routines. The literature concerning tai chi is even smaller and of poorer quality, but there is the same need to show that practicing tai chi has any benefits over exercising in groups with comparable positive expectations and support.

Even more than mindfulness, yoga and tai chi attract a lot of pseudoscientific mumbo jumbo about integrating Eastern wisdom and Western science. We need to look past that and insist on evidence.

Like their introductions, the discussion sections of these articles are quite prone to exaggerating how strong and consistent the evidence from existing studies is. The discussion sections cherry-pick positive findings in the existing literature, sometimes recklessly distorting them. The authors then discuss how their own positively spun findings fit with what is already known, while minimizing or outright neglecting any of their negative findings. I was not surprised to see one trial of mindfulness for cancer patients find no effects on depressive symptoms or perceived stress, but then go on to claim that mindfulness might powerfully affect the expression of DNA.

If you want to dig into the details of these studies, the going can get rough and the yield for doing a lot of mental labor is low. For instance, these studies involved drawing blood and analyzing gene expression. Readers will inevitably encounter passages like:

In response to KKM treatment, 68 genes were found to be differentially expressed (19 up-regulated, 49 down-regulated) after adjusting for potentially confounded differences in sex, illness burden, and BMI. Up-regulated genes included immunoglobulin-related transcripts. Down-regulated transcripts included pro-inflammatory cytokines and activation-related immediate-early genes. Transcript origin analyses identified plasmacytoid dendritic cells and B lymphocytes as the primary cellular context of these transcriptional alterations (both p < .001). Promoter-based bioinformatic analysis implicated reduced NF-κB signaling and increased activity of IRF1 in structuring those effects (both p < .05).

Intimidated? Before you defer to the “experts” doing these studies, I will show you some things I noticed in the six studies and how you can debunk their relevance for promoting health and dealing with illness. Actually, I will show that even if these six studies had gotten the results the authors claimed – and they did not – at best the effects would be trivial and lost among the other things going on in patients’ lives.

Fortunately, there are lots of signs that you can dismiss such studies and go on to something more useful, if you know what to look for.

Some general rules:

  1. Don’t accept claims of efficacy/effectiveness based on underpowered randomized trials. Dismiss them. A reliable rule of thumb is to dismiss trials that have fewer than 35 patients in the smallest group. Over half the time, such trials will miss true moderate-sized effects, even when those effects are actually there.

Due to publication bias, most of the positive effects that are published from trials of this size will be false positives and won’t hold up in well-designed, larger trials.

When significant positive effects from such trials are reported in published papers, they have to be large to have reached significance. If not outright false, these effect sizes won’t be matched in larger trials. So, significant positive effect sizes from small trials are likely to be false positives, exaggerated, and unlikely to replicate. For that reason, we can consider small studies to be pilot or feasibility studies, but not as providing estimates of how large an effect we should expect from a larger study. Investigators do it all the time, but they should not: they run power calculations estimating how many patients they need for a larger trial from the results of such small studies. No, no, no!

Having spent decades examining clinical trials, I am generally comfortable dismissing effect sizes that come from trials with fewer than 35 patients in the smaller group. I agree with the suggestion that if two larger trials are available in a given literature, go with those and ignore the smaller studies. If there are not at least two larger studies, keep the jury out on whether there is a significant effect.
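To see why the line falls roughly at 35 per group, here is a minimal sketch (my own illustration, not from the review; `two_sample_power` is a made-up helper name) of the standard normal-approximation power calculation for a two-arm trial:

```python
import math
from statistics import NormalDist

def two_sample_power(n_per_group: int, d: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison for a
    standardized effect size d (Cohen's d), using the normal
    approximation to the t-test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)            # critical z, ~1.96 for alpha = .05
    noncentrality = d * math.sqrt(n_per_group / 2.0)  # expected z when the effect is real
    return nd.cdf(noncentrality - z_crit)

# A "moderate" effect (d = 0.5) with 35 patients per arm:
print(round(two_sample_power(35, 0.5), 2))   # about 0.55 - barely better than a coin flip
# The same effect with 100 patients per arm:
print(round(two_sample_power(100, 0.5), 2))  # about 0.94
```

With 35 per arm, a true moderate effect is detected only slightly more than half the time; with fewer patients, the trial is more likely than not to miss it. And since the normal approximation slightly overstates the power of the exact t-test, the real picture is a bit worse.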

Applying the Rule of 35, five of the six trials can be dismissed, and the sixth is ambiguous because of loss of patients to follow-up. If promoters of mind-body interventions want to convince us that these practices have beneficial effects on physical health by conducting trials like these, they have to do better. None of the individual trials should increase our confidence in their claims. Collectively, the trials collapse in a mess without providing a single credible estimate of effect size. This attests to the poor quality of evidence and the disrespect for methodology that characterize this literature.

  2. Don’t be taken in by titles of peer-reviewed articles that are themselves an announcement that these interventions work. Titles may not be telling the truth.

What I found extraordinary is that five of the six randomized trials had a title indicating that a positive effect was found. I suspect that most people encountering the title will not actually go on to read the study. So, they will be left with the false impression that positive results were indeed obtained. It’s quite a clever trick to make the title of an article, by which most people will remember it, into a false advertisement for what was actually found.

For a start, we can simply remind ourselves that with these underpowered studies, investigators should not even be making claims about efficacy/effectiveness. So, one trick of the developing skeptic is to check whether the claims being made in the title fit the size of the study. Actually going to the results section, one can find further evidence of discrepancies between what was found and what is being claimed.

I think it’s a general rule of thumb that we should be wary of titles for reports of randomized trials that declare results. Even when what is claimed in the title fits the actual results, it often creates the illusion of a greater consistency with what already exists in the literature. Furthermore, even when future studies inevitably fail to replicate what is claimed in the title, the false claim lives on, because failure to replicate key findings is almost never grounds for retracting a paper.

  3. Check the institutional affiliations of the authors. These six trials serve as a depressing reminder that we can’t rely on researchers’ institutional affiliations or federal grants to reassure us of the validity of their claims. These authors are not from Quack-Quack University, and they get funding for their research.

In all cases, the investigators had excellent university affiliations, mostly in California. Most studies were conducted with some form of funding, often federal grants. A quick Google check would reveal that at least one of the authors on each study, usually more, had federal funding.

  4. Check the conflict of interest declarations, but don’t expect them to be informative, and be skeptical of what you find. It is disappointing that a check of the conflict of interest statements for these articles would be unlikely to arouse the suspicion that the claimed results might have been influenced by financial interests. One cannot readily see that the studies were generally done in settings promoting alternative, unproven treatments that would benefit from the publicity generated by the studies. One cannot see that some of the authors have lucrative book contracts and speaking tours that require making claims for dramatic effects of mind-body treatments that could not possibly be supported by transparent reporting of the results of these studies. As we will see, one of the studies was actually conducted in collaboration with Deepak Chopra and with money from his institution. That would definitely raise flags in the skeptic community. But the dubious tie might be missed by patients and their families vulnerable to unwarranted claims and unrealistic expectations of what can be obtained outside of conventional medicine, with its chemotherapy, surgery, and pharmaceuticals.

Based on what I found probing these six trials, I can suggest some further rules of thumb. (1) Don’t assume for articles about the health effects of alternative treatments that all relevant conflicts of interest are disclosed. Check the setting in which the study was conducted and whether an integrative [i.e., complementary and alternative, meaning mostly unproven] care setting was used for recruiting or running the trial. Not only would this represent potential bias on the part of the authors; it would represent selection bias in the recruitment of patients and in their responsiveness to placebo effects consistent with the marketing themes of these settings. (2) Google the authors and see if they have lucrative pop psychology book contracts, TED talks, or speaking gigs at positive psychology or complementary and alternative medicine gatherings. None of these lucrative activities are typically expected to be disclosed as conflicts of interest, but all require making strong claims that are not supported by available data. Such rewards are perverse incentives for authors to distort and exaggerate positive findings and to suppress negative findings in peer-reviewed reports of clinical trials. (3) Check and see if known quacks have prepared recruitment videos for the study, informing patients what will be found. (Seriously, I was tipped off to look, and I found exactly that.)

  5. Look for the usual suspects. A surprisingly small, tight, interconnected group is generating this research. You could look the authors up on Google or Google Scholar, or browse through my previous blog posts and see what I have said about them. As I will point out in my next blog post, one got withering criticism for her claim that drinking carbonated sodas, but not sweetened fruit drinks, shortened your telomeres, so that drinking soda was worse than smoking. My colleagues and I re-analyzed the data of another of the authors. Contrary to what he claimed, we found no evidence that pursuing meaning, rather than pleasure, in your life affected gene expression related to immune function. We also showed that substituting randomly generated data worked as well as what he got from blood samples in replicating his original results. I don’t think it is ad hominem to point out that both of these authors have a history of making implausible claims. It speaks to source credibility.
  6. Check and see if there is a trial registration for a study, but don’t stop there. You can quickly check with PubMed whether a report of a randomized trial is registered. Trial registration is intended to ensure that investigators commit themselves to a primary outcome, or maybe two, so that readers can check whether that is what they emphasized in their paper. You can then check whether what is said in the report of the trial fits with what was promised in the protocol. Unfortunately, I could find only one of these trials that was registered. The trial registration was vague on what outcome variables would be assessed and did not mention the outcome emphasized in the published paper (!). The registration also said the sample would be larger than what was reported in the published study. When researchers have difficulty with recruitment, their study is often compromised in other ways. I’ll show how this study was compromised.

Well, it looks like applying these generally useful rules of thumb is not always so easy with these studies. I think the small sample size across all of the studies would be enough to decide this research has yet to yield meaningful results and certainly does not support the claims that are being made.

But readers who are motivated to put in the time to probe deeper will come up with strong signs of p-hacking and questionable research practices.

  7. Check the report of the randomized trial and see if you can find any declaration of one or two primary outcomes and a limited number of secondary outcomes. What you will find instead is that the studies always have more outcome variables than patients receiving the interventions. The opportunities for cherry-picking positive findings and discarding the rest are huge, especially because it is so hard to assess what data were collected but not reported.
  8. Check and see if you can find tables of unadjusted primary and secondary outcomes. Honest and transparent reporting involves giving readers a look at simple statistics so they can decide whether the results are meaningful. For instance, if effects on stress and depressive symptoms are claimed, are the results impressive and clinically relevant? In almost all cases, no peeking is allowed. Instead, the authors provide analyses and statistics with lots of adjustments made. They break lots of rules in doing so, especially with such small samples. These authors are virtually assured of getting results to crow about.
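The arithmetic behind the cherry-picking concern is simple. As a back-of-the-envelope illustration (mine, not from the review; `chance_of_spurious_hit` is a made-up helper), here is the probability that at least one of several truly null, independent outcomes comes up “significant” at p < .05:

```python
def chance_of_spurious_hit(n_outcomes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_outcomes independent, truly
    null outcome measures reaches p < alpha by chance alone."""
    return 1.0 - (1.0 - alpha) ** n_outcomes

print(round(chance_of_spurious_hit(1), 2))   # 0.05
print(round(chance_of_spurious_hit(20), 2))  # 0.64
```

With 20 outcome measures and no real effects at all, the chance of at least one publishable “finding” is about 64 percent, before any adjustments, subgroups, or flexible modeling are layered on top. Real outcomes are correlated rather than independent, so the exact number differs, but the incentive to cherry-pick is the same.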

Famously, Joe Simmons and Leif Nelson hilariously published claims that briefly listening to the Beatles’ “When I’m Sixty-Four” left students a year and a half younger than if they had been assigned to listen to “Kalimba.” Simmons and Nelson knew this was nonsense, but their intent was to show what researchers can do if they have free rein in how they analyze their data and what they report. They revealed the tricks they used, but those tricks were minor league and amateurish compared to what the authors of these trials consistently did in claiming that yoga, tai chi, and mindfulness modified the expression of DNA.

Stay tuned for my next blog post, where I go through the six studies. But consider this if you or a loved one has to make an immediate decision about whether to plunge into the world of woo-woo unproven medicine in hopes of altering DNA expression: I will show that the authors of these studies did not get the results they claimed. But who should care if they had? The effects were laughably trivial. As the authors of the review about which I have been complaining noted:

One other problem to consider are the various environmental and lifestyle factors that may change gene expression in similar ways to MBIs [Mind-Body Interventions]. For example, similar differences can be observed when analyzing gene expression from peripheral blood mononuclear cells (PBMCs) after exercise. Although at first there is an increase in the expression of pro-inflammatory genes due to regeneration of muscles after exercise, the long-term effects show a decrease in the expression of pro-inflammatory genes (55). In fact, 44% of interventions in this systematic review included a physical component, thus making it very difficult, if not impossible, to discern between the effects of MBIs from the effects of exercise. Similarly, food can contribute to inflammation. Diets rich in saturated fats are associated with pro-inflammatory gene expression profile, which is commonly observed in obese people (56). On the other hand, consuming some foods might reduce inflammatory gene expression, e.g., drinking 1 l of blueberry and grape juice daily for 4 weeks changes the expression of the genes related to apoptosis, immune response, cell adhesion, and lipid metabolism (57). Similarly, a diet rich in vegetables, fruits, fish, and unsaturated fats is associated with anti-inflammatory gene profile, while the opposite has been found for Western diet consisting of saturated fats, sugars, and refined food products (58). Similar changes have been observed in older adults after just one Mediterranean diet meal (59) or in healthy adults after consuming 250 ml of red wine (60) or 50 ml of olive oil (61). However, in spite of this literature, only two of the studies we reviewed tested if the MBIs had any influence on lifestyle (e.g., sleep, diet, and exercise) that may have explained gene expression changes.

How about taking tango lessons instead? You would at least learn dance steps, get exercise, and decrease any social isolation. And so what if yoga or meditation offered no more benefits than taking up such activities?

 

 

Unintended consequences of universal mindfulness training for schoolchildren?

This is the first installment of what will be a series of occasional posts about the UK Mindfulness All Party Parliamentary Group report, Mindful Nation.

  • Mindful Nation is seriously deficient as a document supposedly arguing for policy based on evidence.
  • The professional and financial interests of lots of people involved in preparation of the document will benefit from implementation of its recommendations.
  • After an introduction, I focus on two studies singled out in Mindful Nation as offering support for the benefits of mindfulness training for school children.
  • Results of the group’s cherry-picked studies do not support implementation of mindfulness training in the schools, but inadvertently highlight some issues.
  • Investment in universal mindfulness training in the schools is unlikely to yield measurable, socially significant results, but will serve to divert resources from schoolchildren more urgently in need of effective intervention and support.
  • Mindful Nation is another example of delivery of low-intensity services to mostly low-risk persons to the detriment of those in greatest and most urgent need.

The launch event for the Mindful Nation report billed it as the “World’s first official report” on mindfulness.

Mindful Nation is a report written by the UK Mindfulness All-Party Parliamentary Group.

The Mindfulness All-Party Parliamentary Group (MAPPG)  was set up to:

  • review the scientific evidence and current best practice in mindfulness training
  • develop policy recommendations for government, based on these findings
  • provide a forum for discussion in Parliament for the role of mindfulness and its implementation in public policy.

The Mindfulness All-Party Parliamentary Group describes itself as having been

impressed by the levels of both popular and scientific interest, and [having] launched an inquiry to consider the potential relevance of mindfulness to a range of urgent policy challenges facing government.

Don’t get confused by this being a government-commissioned report. The report stands in sharp contrast to one commissioned by the US government in terms of the unbalanced constitution of the committee undertaking the review, the lack of transparency in the search for relevant literature, and the methodology for rating and interpreting the quality of available evidence.

Compare the claims of Mindful Nation to a comprehensive systematic review and meta-analysis prepared for the US Agency for Healthcare Research and Quality (AHRQ), which reviewed 18,753 citations and found only 47 trials (3%) that included an active control treatment. The vast majority of studies available for inclusion had only a wait-list or no-treatment control group and so exaggerated any estimate of the efficacy of mindfulness.

Although the US report was available to those preparing the UK Mindful Nation report, no mention is made of either the full report or the resulting publication in a peer-reviewed journal. Instead, the UK Mindful Nation report emphasized narrative and otherwise unsystematic reviews, and meta-analyses that did not adequately control for bias.

When the abridged version of the AHRQ report was published in JAMA Internal Medicine, an accompanying commentary raised issues even more applicable to the Mindful Nation report:

The modest benefit found in the study by Goyal et al begs the question of why, in the absence of strong scientifically vetted evidence, meditation in particular and complementary measures in general have become so popular, especially among the influential and well educated…What role is being played by commercial interests? Are they taking advantage of the public’s anxieties to promote use of complementary measures that lack a base of scientific evidence? Do we need to require scientific evidence of efficacy and safety for these measures?

The members of the UK Mindfulness All-Party Parliamentary Group were selected for their positive attitude toward mindfulness. The collection of witnesses they called to hearings was saturated with advocates of mindfulness and those having professional and financial interests in arriving at a positive view. There is no transparency in terms of how studies or testimonials were selected, but the bias is notable. Many of the scientific studies were methodologically poor, if there was any methodology at all. Many were strongly stated but weakly substantiated opinion pieces. Authors often included those having financial interests in obtaining positive results, but with no acknowledgment of conflict of interest. The glowing testimonials were accompanied by smiling photos and were unanimous in their praise of the transformative benefits of mindfulness.

As Mark B. Cope and David B. Allison concluded about obesity research, such packing of the committee and a highly selective review of the literature lead to a “distortion of information in the service of what might be perceived to be righteous ends.” [I thank Tim Caulfield for calling this quote to my attention.]

Mindfulness in the schools

The recommendations of Mindful Nation are:

  1. The Department for Education (DfE) should designate, as a first step, three teaching schools116 to pioneer mindfulness teaching, co-ordinate and develop innovation, test models of replicability and scalability and disseminate best practice.
  2. Given the DfE’s interest in character and resilience (as demonstrated through the Character Education Grant programme and its Character Awards), we propose a comparable Challenge Fund of £1 million a year to which schools can bid for the costs of training teachers in mindfulness.
  3. The DfE and the Department of Health (DOH) should recommend that each school identifies a lead in schools and in local services to co-ordinate responses to wellbeing and mental health issues for children and young people117. Any joint training for these professional leads should include a basic training in mindfulness interventions.
  4. The DfE should work with voluntary organisations and private providers to fund a freely accessible, online programme aimed at supporting young people and those who work with them in developing basic mindfulness skills118.
Payoff of Mindful Nation to Oxford Mindfulness Centre will be huge.

Leading up to these recommendations, the report outlined an “alarming crisis” in the mental health of children and adolescents and proposes:

Given the scale of this mental health crisis, there is real urgency to innovate new approaches where there is good preliminary evidence. Mindfulness fits this criterion and we believe there is enough evidence of its potential benefits to warrant a significant scaling-up of its availability in schools.

Think of all the financial and professional opportunities that proponents of mindfulness involved in preparation of this report have garnered for themselves.

Mindfulness to promote executive functioning in children and adolescents

For the remainder of the blog post, I will focus on the two studies cited in support of the following statement:

What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.

The terms “executive control” and “emotional stability” were clarified:

Many argue that the most important prerequisites for child development are executive control (the management of cognitive processes such as memory, problem solving, reasoning and planning) and emotion regulation (the ability to understand and manage the emotions, including and especially impulse control). These main contributors to self-regulation underpin emotional wellbeing, effective learning and academic attainment. They also predict income, health and criminality in adulthood69. American psychologist, Daniel Goleman, is a prominent exponent of the research70 showing that these capabilities are the biggest single determinant of life outcomes. They contribute to the ability to cope with stress, to concentrate, and to use metacognition (thinking about thinking: a crucial skill for learning). They also support the cognitive flexibility required for effective decision-making and creativity.

Actually, Daniel Goleman is the former editor of the pop magazine Psychology Today and an author of numerous pop books.

The first cited paper.

73 Flook L, Smalley SL, Kitil MJ, Galla BM, Kaiser-Greenland S, Locke J, et al. Effects of mindful  awareness practices on executive functions in elementary school children. Journal of Applied School Psychology. 2010;26(1):70-95.

Journal of Applied School Psychology is a Taylor & Francis journal, formerly known as Special Services in the Schools (1984–2002). Its Journal Impact Factor is 1.30.

One of the authors of the article, Susan Kaiser-Greenland, is a mindfulness entrepreneur. Her website describes her as an author, public speaker, and educator on the subject of sharing secular mindfulness and meditation with children and families. Her books are The Mindful Child: How to Help Your Kid Manage Stress and Become Happier, Kinder, and More Compassionate; Mindful Games: Sharing Mindfulness and Meditation with Children, Teens, and Families; and the forthcoming The Mindful Games Deck: 50 Activities for Kids and Teens.

This article represents the main research available on Kaiser-Greenland’s Inner Kids program and figures prominently in her promotion of her products.

The sample consisted of 64 children assigned to either mindful awareness practices (MAPs; n = 32) or a control group consisting of a silent reading period (n = 32).

The MAPs training used in the current study is a curriculum developed by one of the authors (SKG). The program is modeled after classical mindfulness training for adults and uses secular and age appropriate exercises and games to promote (a) awareness of self through sensory awareness (auditory, kinesthetic, tactile, gustatory, visual), attentional regulation, and awareness of thoughts and feelings; (b) awareness of others (e.g., awareness of one’s own body placement in relation to other people and awareness of other people’s thoughts and feelings); and (c) awareness of the environment (e.g., awareness of relationships and connections between people, places, and things).

A majority of exercises involve interactions among students and between students and the instructor.

Outcomes.

The primary EF outcomes were the Metacognition Index (MI), Behavioral Regulation Index (BRI), and Global Executive Composite (GEC), as reported by teachers and parents.

Wikipedia presents the results of this study as:

The program was delivered for 30 minutes, twice per week, for 8 weeks. Teachers and parents completed questionnaires assessing children’s executive function immediately before and following the 8-week period. Multivariate analysis of covariance on teacher and parent reports of executive function (EF) indicated an interaction effect baseline EF score and group status on posttest EF. That is, children in the group that received mindful awareness training who were less well regulated showed greater improvement in EF compared with controls. Specifically, those children starting out with poor EF who went through the mindful awareness training showed gains in behavioral regulation, metacognition, and overall global executive control. These results indicate a stronger effect of mindful awareness training on children with executive function difficulties.

The finding that both teachers and parents reported changes suggests that improvements in children’s behavioral regulation generalized across settings. Future work is warranted using neurocognitive tasks of executive functions, behavioral observation, and multiple classroom samples to replicate and extend these preliminary findings.

What I discovered when I scrutinized the study.

This study is unblinded: the teachers and parents providing the subjective ratings of the students, as well as the students themselves, were well aware of the group to which each student had been assigned. We are not given any correlations among or between the ratings, so we don’t know whether a single global subjective factor (easy or difficult child, well behaved or not) is operating for teachers, for parents, or for both.

It is unclear for what features of the mindfulness training the comparison reading group offers control or equivalence. The two groups differ in the positive expectations, attention, and support they receive, and those differences are likely to be reflected in the parent and teacher ratings. There is a high likelihood that any difference in outcomes is nonspecific rather than an active, distinct ingredient of mindfulness training. In any comparison with the students assigned to reading time, students assigned to mindfulness training have the benefit of any active ingredient it might have, as well as any nonspecific, placebo ingredients.

This is an exceedingly weak design, but one that dominates evaluations of mindfulness.

Note too that with only 32 students per group, this is a seriously underpowered study: it has less than a 50% probability of detecting a moderate-sized effect if one is present. And because a larger effect size is needed to reach statistical significance with such a small sample, any statistically significant effect will necessarily be large, even if it is unlikely to replicate in a larger sample. That is the paradox of small samples we need to keep in mind in these situations.
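The power claim is easy to check with a quick Monte Carlo simulation. A minimal sketch in plain Python, assuming the “moderate” effect is Cohen’s d = 0.5 and using 2.0 as an approximation of the two-sided 5% critical t for 62 degrees of freedom:

```python
import math
import random

random.seed(1)

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def power(n_per_group=32, d=0.5, sims=20000, t_crit=2.0):
    """Fraction of simulated trials reaching |t| > t_crit."""
    hits = 0
    for _ in range(sims):
        treat = [random.gauss(d, 1) for _ in range(n_per_group)]
        ctrl = [random.gauss(0, 1) for _ in range(n_per_group)]
        if abs(pooled_t(treat, ctrl)) > t_crit:
            hits += 1
    return hits / sims

print(power())  # roughly 0.5: about a coin flip's chance of detecting d = 0.5
```

With 32 students per group, the simulated power hovers around 50%, consistent with the claim above.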

Not surprisingly, there were no differences between the mindfulness and reading control groups on any outcome variable, whether rated by parents or teachers. Nonetheless, the authors rescued their claims for an effective intervention with:

However, as shown by the significance of interaction terms, baseline levels of EF (GEC reported by teachers) moderated improvement in posttest EF for those children in the MAPs group compared to children in the control group. That is, on the teacher BRIEF, children with poorer initial EF (higher scores on BRIEF) who went through MAPs training showed improved EF subsequent to the training (indicated by lower GEC scores at posttest) compared to controls.

Similar claims were made about parent ratings. But let’s look at figure 3 depicting post-test scores. These are from the teachers, but results for the parent ratings are essentially the same.

teacher BRIEF quartiles

Note the odd scaling of the X axis. The data are divided into four quartiles and then the middle two are collapsed, so that there are three data points. I’m curious about what is being hidden. Even with this sleight of hand, scores for the intervention and control groups appear identical except in the top quartile, where just a couple of students in the control group seem to account for any appearance of a difference. But keep in mind that the upper quartile amounts to only eight students in each group.

This scatter plot is further revealing:

teacher BRIEF

It appears that the differences, which are limited to the upper quartile, are due to a couple of outlier control students. Without them, even the post hoc differences found in the upper quartile between the intervention and control groups would likely disappear.
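How easily two outliers can manufacture a subgroup “effect” with only eight students per cell can be shown with a toy example. The numbers below are hypothetical, chosen only to illustrate the arithmetic, not taken from the study:

```python
# Hypothetical top-quartile BRIEF scores (lower = better regulation).
# Six of the eight controls match the intervention group exactly;
# two outlier students inflate the control mean.
intervention = [70, 71, 72, 73, 74, 75, 76, 77]
control      = [70, 71, 72, 73, 74, 75, 95, 99]  # last two are outliers

def mean(xs):
    return sum(xs) / len(xs)

gap_with_outliers = mean(control) - mean(intervention)
gap_without = mean(control[:6]) - mean(intervention[:6])

print(gap_with_outliers)  # 5.125 points: looks like a treatment effect
print(gap_without)        # 0.0: the "effect" vanishes without the two outliers
```

With cells this small, a group mean is not a robust summary; a couple of unusual students can carry the entire apparent difference.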

Basically, what we are seeing is that most students show no benefit whatsoever from mindfulness training over being in a reading group. It is not surprising that students who were not particularly elevated on the variables of interest register no effect; that is the usual ceiling effect in universally delivered interventions in general population samples.

Essentially, if we focus on the designated outcome variables, we are wasting the students’ time as well as that of the staff. Think of what could be done if the same resources were applied in more effective ways. A couple of students in this study were outliers with low executive function; we don’t know how else they differ. Neither in the study nor in the validation of these measures is much attention given to their discriminant validity, i.e., what variables influence the ratings that shouldn’t. I strongly suspect there are global, nonspecific aspects to both parent and teacher ratings, such that they are influenced by other aspects of these couple of students’ engagement with their classroom environment, and perhaps other environments.

I see little basis for the authors’ self-congratulatory conclusion:

The present findings suggest that mindfulness introduced in a general education setting is particularly beneficial for children with EF difficulties.

And

Introduction of these types of awareness practices in elementary education may prove to be a viable and cost-effective way to improve EF processes in general, and perhaps specifically in children with EF difficulties, and thus enhance young children’s socio-emotional, cognitive, and academic development.

Maybe the authors started with this conviction, and it was unshaken by disappointing findings.

Or the statement made in Mindfulness Nation:

What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.

But we have another study that is cited for this statement.

74. Huppert FA, Johnson DM. A controlled trial of mindfulness training in schools: The importance of practice for an impact on wellbeing. The Journal of Positive Psychology. 2010; 5(4):264-274.

The first author, Felicia Huppert, is Founder and Director of the Well-being Institute and Emeritus Professor of Psychology at the University of Cambridge, as well as a member of the academic staff of the Institute for Positive Psychology and Education at the Australian Catholic University.

This study involved 173 14- and 15-year-old boys from a private Catholic school.

The Journal of Positive Psychology is not known for its high methodological standards. A look at its editorial board suggests a high likelihood that manuscripts submitted will be reviewed by sympathetic reviewers publishing their own methodologically flawed studies, often with results in support of undeclared conflicts of interest.

The mindfulness training was based on the program developed by Kabat-Zinn and colleagues at the University of Massachusetts Medical School (Kabat-Zinn, 2003). It comprised four 40 minute classes, one per week, which presented the principles and practice of mindfulness meditation. The mindfulness classes covered the concepts of awareness and acceptance, and the mindfulness practices included bodily awareness of contact points, mindfulness of breathing and finding an anchor point, awareness of sounds, understanding the transient nature of thoughts, and walking meditation. The mindfulness practices were built up progressively, with a new element being introduced each week. In some classes, a video clip was shown to highlight the practical value of mindful awareness (e.g. “The Last Samurai”, “Losing It”). Students in the mindfulness condition were also provided with a specially designed CD, containing three 8-minute audio files of mindfulness exercises to be used outside the classroom. These audio files reflected the progressive aspects of training which the students were receiving in class. Students were encouraged to undertake daily practice by listening to the appropriate audio files. During the 4-week training period, students in the control classes attended their normal religious studies lessons.

A total of 155 participants had complete data at baseline and 134 at follow-up (78 in the mindfulness and 56 in the control condition). Any student with missing data at either time point was simply dropped from the analysis. The effects of this statistical decision are difficult to track in the paper. Regardless, there was no difference between the intervention and control groups on any of a host of outcome variables, with none designated as the primary outcome.

Actual practicing of mindfulness by students was inconsistent.

One third of the group (33%) practised at least three times a week, 34.8% practised more than once but less than three times a week, and 32.7% practised once a week or less (of whom 7 respondents, 8.4%, reported no practice at all). Only two students reported practicing daily. The practice variable ranged from 0 to 28 (number of days of practice over four weeks). The practice variable was found to be highly skewed, with 79% of the sample obtaining a score of 14 or less (skewness = 0.68, standard error of skewness = 0.25).

The authors rescue their claim of a significant effect for the mindfulness intervention with highly complex multivariate analyses with multiple control variables, in which within-group outcomes for students assigned to mindfulness were related to the extent to which students actually practiced. Without controlling for the numerous (and post hoc) multiple comparisons, results were still largely nonsignificant.

One simple conclusion that can be drawn is that despite a lot of encouragement, there was little actual practice of mindfulness by relatively well-off students in a relatively well-resourced school. We could hardly expect results to improve with wider dissemination to schools with fewer resources and less privileged students.

The authors conclude:

The main finding of this study was a significant improvement on measures of mindfulness and psychological well-being related to the degree of individual practice undertaken outside the classroom.

Recall that Mindful Nation cited the study in the following context:

What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.

These are two methodologically weak studies with largely null findings. They are hardly the basis for launching a national policy implementing universal mindfulness in the schools.

As noted in the US AHRQ report, despite the huge number of studies of mindfulness that have been conducted, few involved a test with an adequate control group, so there is little evidence that mindfulness has any advantage over any active treatment. Neither of these studies disturbs that conclusion, although both are spun as positive, in the original papers and in the Mindful Nation report. Both papers were published in journals where reviewers were likely to be overly sympathetic and inattentive to serious methodological and statistical problems.

The committee writing Mindful Nation arrived at conclusions consistent with their prior enthusiasm for mindfulness and their vested interest in it. They sorted through evidence to find what supported their pre-existing assumptions.

Like the UK resilience programs, the recommendations of Mindful Nation put considerable resources into delivering services to a large population unlikely to have the threshold of need to register a socially and clinically significant effect. On a population level, results of the implementation are doomed to fall short of its claims. The many fewer students who need more timely, intensive, and tailored services are left underserved. Their presence is ignored or, worse, invoked to justify the delivery of services to the larger group, with the needy students not benefiting.

In this blog post, I mainly focused on two methodologically poor studies. But in selecting these particular studies, I depended on the search conducted by the authors of Mindful Nation and the emphasis the report gave these two studies in support of some sweeping claims. I will continue writing about the recommendations of Mindful Nation. I welcome reader feedback, particularly from readers whose enthusiasm for mindfulness is offended. But I urge them not simply to go to Google, cherry-pick an isolated study, and ask me to refute its claims.

Rather, we need to pay attention to the larger literature concerning mindfulness, its serious methodological problems, and the sociopolitical forces and vested interests that preserve a strong confirmation bias, both in the “scientific” literature and its echoing in documents like Mindful Nation.

Effect of a missing clinical trial on what we think about cognitive behavior therapy

  • Data collection for a large, well-resourced study of cognitive behavior therapy (CBT) for psychosis was completed years ago, but the study remains unpublished.
  • Its results could influence the overall evaluation of CBT versus alternative treatments if integrated with what is already known.
  • Political considerations can determine whether completed psychotherapy studies get published or remain lost.
  • This rich example demonstrates the strong influence of publication bias on how we assess psychotherapies.
  • What can be done to reduce the impact of this particular study having gone missing?

A few years ago Ben Goldacre suggested that we do a study of the registration of clinical trials.


I can’t remember the circumstances, but Goldacre and I did not pursue the idea further. I was already committed to studying psychological interventions, in which Goldacre was much less interested. Having battled to get the American Psychological Association to fully accept and implement CONSORT in its journals, I was well aware how difficult it is to get the professional organizations offering the prime outlets for psychotherapy studies to accept needed reform. I wanted to stay focused on that.

I continue to follow Goldacre’s work closely and cite him often. I also pay particular attention to John Ioannidis’ follow-up of his documentation that much of what is published in the biomedical literature is false or exaggerated, for example:

Ioannidis JP. Clinical trials: what a waste. BMJ. 2014 Dec 10;349:g7089

Many trials are entirely lost, as they are not even registered. Substantial diversity probably exists across specialties, countries, and settings. Overall, in a survey conducted in 2012, only 30% of journal editors requested or encouraged trial registration.

In a seemingly parallel world, I keep showing that in psychology the situation is worse. I had a simple explanation why, one I now recognize was naïve: needed reforms enforced by regulatory bodies like the US Food and Drug Administration (FDA) take longer to influence the psychotherapy literature, where there are no such pressures.

I think we now know, in both biomedicine and psychology, that broad declarations by governments, funding bodies, and even journals of a commitment to disclosing conflicts of interest, registering trials, and sharing data are insufficient to ensure that the literature gets cleaned up.

Statements were published across 14 major medical journals endorsing routine data sharing. Editors of some of the top journals immediately took steps to undermine the implementation in their particular journals. Think of the specter of “research parasites” raised by the editors of the New England Journal of Medicine (NEJM).

Another effort at reform

Following each demonstration that reforms are not being implemented, we get more pressures to do better. For instance, the 2015 World Health Organization (WHO) position paper:

Rationale for WHO’s New Position Calling for Prompt Reporting and Public Disclosure of Interventional Clinical Trial Results

WHO’s 2005 statement called for all interventional clinical trials to be registered. Subsequently, there has been an increase in clinical trial registration prior to the start of trials. This has enabled tracking of the completion and timeliness of clinical trial reporting. There is now a strong body of evidence showing failure to comply with results-reporting requirements across intervention classes, even in the case of large, randomised trials [37]. This applies to both industry and investigator-driven trials. In a study that analysed reporting from large clinical trials (over 500 participants) registered on clinicaltrials.gov and completed by 2009, 23% had no results reported even after a median of 60 months following trial completion; unpublished trials included nearly 300,000 participants [3]. Among randomised clinical trials (RCTs) of vaccines against five diseases registered in a variety of databases between 2006–2012, only 29% had been published in a peer-reviewed journal by 24 months following study completion [4]. At 48 months after completion, 18% of trials were not reported at all, which included over 24,000 participants. In another study, among 400 randomly selected clinical trials, nearly 30% did not publish the primary outcomes in a journal or post results to a clinical trial registry within four years of completion [5].

Why is this a problem?

  • It affects understanding of the scientific state of the art.

  • It leads to inefficiencies in resource allocation for both research and development and financing of health interventions.

  • It creates indirect costs for public and private entities, including patients themselves, who pay for suboptimal or harmful treatments.

  • It potentially distorts regulatory and public health decision making.

Furthermore, it is unethical to conduct human research without publication and dissemination of the results of that research. In particular, withholding results may subject future volunteers to unnecessary risk.

How the psychotherapy literature is different from the medical literature.

Unfortunately for the trustworthiness of the psychotherapy literature, the WHO statement is limited to medical interventions. We probably won’t see any direct effects on the psychotherapy literature anytime soon.

The psychotherapy literature has all the problems in implementing reforms that we see in biomedicine – and more. Professional organizations like the American Psychological Association and the British Psychological Society, which publish psychotherapy research, have the other important function of developing their clinical members’ employment opportunities. More opportunities for employment show that the organizations are meeting their members’ needs, and this results in more dues-paying members.

The organizations don’t want to facilitate third-party payers citing research showing that particular interventions their membership already practices are inferior and need to be abandoned. They want the branding of members practicing “evidence-based treatment” but not the burden of members having to make decisions based on what is evidence based. More basically, psychologists’ professional organizations are cognizant of the need to demonstrate a place in providing services that are reimbursed because they improve mental and physical health. In this respect, they are competing with biomedical interventions for the same pot of money.

So, journals published by psychological organizations have vested interests in not stringently enforcing standards. The well-known questionable research practices of investigators are strengthened by questionable publication practices, like confirmation bias, that are tied to the organizations’ institutional agenda.

And the lower status journals that are not published by professional organizations may compromise their standards for publishing psychotherapy trials because of the status that having these articles confers.

Increasingly, medical journals like The Lancet and The Lancet Psychiatry are seen as more prestigious outlets for psychotherapy trials, but they take less seriously the need to enforce for psychotherapy studies the standards regulatory agencies require for biomedical interventions. Example: The Lancet violated its own policies and accepted Tony Morrison’s CBT for psychosis study for publication even though it wasn’t registered until after the trial had started. The declared outcomes were vague enough that they could be re-specified after results were known.

Bottom line: when it comes to publishing all psychotherapy trials consistent with their published protocols, the problem is taken less seriously than it would be for a medical trial.

Overall, there is less requirement that psychotherapy trials be registered, and less attention is paid by editors and reviewers to whether trials were registered and whether outcomes and analytic plans were consistent between the registration and the published study.

In a recent blog post, I identified results of a trial that had been published with switched outcomes and then re-published in another paper with different outcomes, without the registration even being noted.

But for all the same reasons cited by the recent WHO statement, publication of all psychotherapy trials matters.

Recovering an important CBT trial gone missing

I am now going to review the impact of a large, well-resourced study of CBT for psychosis remaining unpublished. I identified the study by a search of the ISRCTN:

The ISRCTN registry is a primary clinical trial registry recognised by WHO and ICMJE that accepts all clinical research studies (whether proposed, ongoing or completed), providing content validation and curation and the unique identification number necessary for publication. All study records in the database are freely accessible and searchable.

I then went back to the literature to see what had happened with it. Keep in mind that this step is not even possible for the many psychotherapy trials that are simply not registered at all.

Many trials are not registered because they are considered pilot and feasibility studies and therefore not suitable for entering effect sizes into the literature. Yet if significant results are found, they will be exaggerated precisely because they come from an underpowered study. And such results then enter the literature as if they came from a planned clinical trial, with considerable likelihood that they cannot be replicated.
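This exaggeration, sometimes called the winner’s curse, can be demonstrated with a small simulation. A sketch in plain Python, assuming an illustrative pilot of 20 participants per group and a true effect of d = 0.2 (2.02 approximates the two-sided 5% critical t for 38 degrees of freedom):

```python
import math
import random

random.seed(2)

def observed_d(x, y):
    """Observed standardized mean difference (Cohen's d, pooled SD)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / sp

true_d, n = 0.2, 20
t_crit = 2.02  # two-sided 5% critical t, df = 38
sig = []
for _ in range(20000):
    d = observed_d([random.gauss(true_d, 1) for _ in range(n)],
                   [random.gauss(0, 1) for _ in range(n)])
    # t = d * sqrt(n/2); keep only the "significant" pilot studies
    if abs(d) * math.sqrt(n / 2) > t_crit:
        sig.append(abs(d))

print(len(sig) / 20000)     # power is low (well under 20%)
print(sum(sig) / len(sig))  # mean "significant" |d| is several times the true 0.2
```

Only the rare pilots that happen to overshoot reach significance, so the effect sizes that get into the literature are systematically inflated relative to the true effect.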

There are whole classes of clinical and health psychology interventions that are dominated by underpowered, poor-quality studies that should have been flagged as weak evidence or excluded altogether. So, in centering on this trial, I’m picking an important example because it was available to be discovered; there is much out there that is not available to be discovered, because it was never registered.

CBT versus supportive therapy for persistent positive symptoms in psychotic disorders

The trial registration is:

Cognitive behavioural treatment for persistent positive symptoms in psychotic disorders. ISRCTN29242879. DOI 10.1186/ISRCTN29242879.

The trial registration indicates that recruitment started on January 1, 2007 and ended on December 31, 2008.

No publications are listed. I and others have sent repeated emails to the principal investigator inquiring about any publications and have failed to get a response. I even sent a German colleague to visit him and all he would say was that results were being written up. That was two years ago.

Google Scholar indicates the principal investigator continues to publish, but not the results of this trial.

A study to die for

The study protocol is available as a PDF:

Klingberg S, Wittorf A, Meisner C, Wölwer W, Wiedemann G, Herrlich J, Bechdolf A, Müller BW, Sartory G, Wagner M, Kircher T. Cognitive behavioural therapy versus supportive therapy for persistent positive symptoms in psychotic disorders: The POSITIVE Study, a multicenter, prospective, single-blind, randomised controlled clinical trial. Trials. 2010 Dec 29;11(1):123.

The methods section makes it sound like a dream study with resources beyond what is usually encountered for psychotherapy research. If the protocol is followed, the study would be an innovative, large, methodologically superior study.

Methods/Design: The POSITIVE study is a multicenter, prospective, single-blind, parallel group, randomised clinical trial, comparing CBT and ST with respect to the efficacy in reducing positive symptoms in psychotic disorders. CBT as well as ST consist of 20 sessions altogether, 165 participants receiving CBT and 165 participants receiving ST. Major methodological aspects of the study are systematic recruitment, explicit inclusion criteria, reliability checks of assessments with control for rater shift, analysis by intention to treat, data management using remote data entry, measures of quality assurance (e.g. on-site monitoring with source data verification, regular query process), advanced statistical analysis, manualized treatment, checks of adherence and competence of therapists.

The study was one of the rare ones providing for systematic assessment of adverse events and any harm to patients. Presumably, if CBT is powerful enough to effect positive change, it can have negative effects as well. But these remain entirely a matter of speculation.

Ratings of outcome were blinded and steps were taken to preserve the blinding even if an adverse event occurred. This is important because blinded trials are less susceptible to investigator bias.

Another unusual feature is the use of supportive therapy (ST), a credible but nonspecific condition, as the control/comparison.

ST is thought as an active treatment with respect to the patient-therapist relationship and with respect to therapeutic commitment [21]. In the treatment of patients suffering from psychotic disorders these ingredients are viewed to be essential as it has been shown consistently that the social network of these patients is limited. To have at least one trustworthy person to talk to may be the most important ingredient in any kind of treatment. However, with respect to specific processes related to modification of psychotic beliefs, ST is not an active treatment. Strategies specifically designed to change misperceptions or reasoning biases are not part of ST.

Use of this control condition allows evaluation of the important question of whether any apparent effects of CBT are due to the active ingredients of that approach or to the supportive therapeutic relationship within which the active ingredients are delivered.

Being able to rule out that the effects of CBT are due to nonspecific factors justifies the extra resources needed to provide specialized training in CBT. If equivalent effects are obtained in the ST group, that suggests equivalent outcomes can be achieved simply by providing more support to patients, presumably from less trained and maybe even lay personnel.

It is a notorious feature of studies of CBT for psychosis that they lack comparison/control groups in any way equivalent to the CBT in nonspecific intensity, support, encouragement, and positive expectations. Too often, the control group is an ill-defined treatment as usual (TAU) that lacks regular contact and fails to inspire any positive expectations. Basically, CBT is being compared to inadequate treatment, and sometimes to no treatment, so any apparent effects that are observed are due to correcting these inadequacies, not to any active ingredient.

The protocol hints in passing at the investigators’ agenda.

This clinical trial is part of efforts to intensify psychotherapy research in the field of psychosis in Germany, to contribute to the international discussion on psychotherapy in psychotic disorders, and to help implement psychotherapy in routine care.

Here we see an aim to justify implementation of CBT for psychosis in routine care in Germany. We have seen something similar in repeated efforts of German investigators to demonstrate that long-term psychodynamic psychotherapy is more effective than shorter, less expensive treatments, despite the lack of credible data [ ].

And so, if the results would not contribute to getting psychotherapy implemented in routine care in Germany, do they get buried?

Science & Politics of CBT for Psychosis

The rollout of a CBT study for psychosis published in The Lancet made strong claims in a BBC article and accompanying audio promotion.


The attention attracted critical scrutiny that these claims could not withstand. After controversy on Twitter, the BBC headline was changed to a more modest claim.

Criticism mounted:

  • The study retained fewer participants receiving CBT at the end of the study than the authors acknowledged.
  • The comparison treatment was ill-defined, but for some patients meant no treatment because they were kicked out of routine care for refusing medication.
  • A substantial proportion of patients assigned to CBT began taking antipsychotic medication by the end of the study.
  • There was no evidence that the response to CBT was comparable to that achieved with antipsychotic medication alone in clinical trials.
  • No evidence that less intensive, nonspecific supportive therapy would not have achieved the same results as CBT.

And the authors ended up conceding in a letter to the editor that their trial had been registered after data collection had started and it did not produce evidence of equivalence to antipsychotic medication.

In a blog post containing the actual video of his presentation before the British Psychological Society, Keith Laws declares

Politics have overcome the science in CBT for psychosis

Recently the British Psychological Society invited me to give a public talk entitled CBT: The Science & Politics behind CBT for Psychosis. In this talk, which was filmed…, I highlight the unquestionable bias shown by the National Institute of Clinical Excellence (NICE) committee  (CG178) in their advocacy of CBT for psychosis.

The bias is not concealed, but unashamedly served-up by NICE as a dish that is high in ‘evidence-substitute’, uses data that are past their sell-by-date and is topped-off with some nicely picked cherries. I raise the question of whether committees – with such obvious vested interests – should be advocating on mental health interventions.

I present findings from our own recent meta-analysis (Jauhar et al 2014) showing that three-quarters of all RCTs have failed to find any reduction in the symptoms of psychosis following CBT. I also outline how trials which have used non-blind assessment of outcomes have inflated effect sizes by up to 600%. Finally, I give examples where CBT may have adverse consequences – both for the negative symptoms of psychosis and for relapse rates.

A pair of well-conducted and transparently reported Cochrane reviews suggest there is little evidence for the efficacy of CBT for psychosis (*)

[slides from the two Cochrane reviews]

These and other slides are available in a slideshow presentation of a talk I gave at the Edinburgh Royal Infirmary.

Yet, even after having to be tempered in the face of criticism, the original claims of the Morrison study get echoed in the antipsychiatry report Understanding Psychosis:

“Other forms of therapy can also be helpful, but so far it is CBTp that has been most intensively researched. There have now been several meta-analyses (studies using a statistical technique that allows findings from various trials to be averaged out) looking at its effectiveness. Although they each yield slightly different estimates, there is general consensus that on average, people gain around as much benefit from CBT as they do from taking psychiatric medication.”

Such misinformation can confuse patients making difficult decisions about whether to accept antipsychotic medication.

If the results from the missing CBT for psychosis study became available…

If the Klingberg study were available and integrated with existing data, it would be one of the largest and highest-quality studies, and it would provide insight into any advantage of CBT for psychosis. For those who can be convinced by data, a null finding from a large study added to mostly small and methodologically unsophisticated studies could be decisive.

A recent meta-analysis of CBT for prevention of psychosis by Hutton and Taylor includes six studies and mentions the trial protocol in passing:

Two recent trials of CBT for established psychosis provide examples of good practice for reporting harms (Klingberg et al. 2010, 2012) and CONSORT (Consolidated Standards of Reporting Trials) provides a sensible set of recommendations (Ioannidis et al. 2004).

Yet it does not indicate why the study is missing, and the study is not included in a list of completed but unpublished studies, even though the protocol indicates a study considerably larger than any of those that were included.

To communicate a better sense of the potential importance of this missing study and perhaps place more pressures on the investigators to release its results, I would suggest that future meta-analyses state:

The protocol for Klingberg et al. Cognitive behavioural treatment for persistent positive symptoms in psychotic disorders indicates that recruitment was completed in 2008. No publications have resulted. Emails to Professor Klingberg about the status of the study failed to get a response. If the study were completed consistent with its protocol, it would represent one of the largest studies of CBT for psychosis ever and one of the few with a fair comparison between CBT and supportive therapy. Inclusion of the results could potentially substantially modify the conclusions of the current meta-analysis.

 

Was independent peer review of the PACE trial articles possible?

I ponder this question guided by Le Chevalier C. Auguste Dupin, the first fictional detective, before anyone was called “detective.”

Articles reporting the PACE trial have extraordinary numbers of authors, acknowledgments, and institutional affiliations. A considerable proportion of all persons and institutions involved in researching chronic fatigue and related conditions in the UK have a close connection to PACE.

This raises issues about

  • Obtaining independent peer review of these articles that is not tainted by reviewer conflict of interest.
  • Just what authorship on a PACE trial paper represents and whether granting of authorship conforms to international standards.
  • The security of potential critics contemplating speaking out about whatever bad science they find in the PACE trial articles, and of reviewers whose negative reviews might be traced back to them. Critics within the UK risk isolation and blacklisting by a large group who have investments in what could be exaggerated estimates of the quality and outcome of the PACE trial.
  • Whether grants associated with the multimillion-pound PACE study could have received the independent peer review that is so crucial to assuring that proposals selected for funding are of the highest quality.

Issues about the large number of authors, acknowledgments, and institutional affiliations become all the more salient as critics [1, 2, 3] again find serious flaws in the conduct and the reporting of the Lancet Psychiatry 2015 long-term follow-up study. Numerous obvious Questionable Research Practices (QRPs) survived peer review. That implies at least ineptness in peer review, or even Questionable Publication Practices (QPPs).

The important question becomes: how is the publication of questionable science to be explained?

Maybe there were difficulties finding reviewers with relevant expertise who were not in some way involved in the PACE trial or affiliated with departments and institutions that would be construed as benefiting from a positive review outcome, i.e. a publication?

Or in the enormous smallness of the UK, is independent peer review achieved by persons putting those relationships and affiliations aside to produce an impeccably detached and rigorous review process?

The untrustworthiness of both the biomedical and psychological literatures is well established. Nonpharmacological interventions have fewer safeguards than drug trials in terms of adherence to preregistration, reporting standards like CONSORT, and enforcement of data sharing.

Open-minded skeptics should be assured of independent peer review of nonpharmacological clinical trials, particularly when there is evidence that persons and groups with considerable financial interests attempt to control what gets published and what is said about their favored interventions. Reviewers with potential conflicts of interest should be excluded from evaluation of manuscripts.

Independent peer review of the PACE trial by those with relevant expertise might not be possible in the UK, where much of the conceivable expertise is in some way directly or indirectly attached to the PACE trial.

A Dutch observer’s astute observations about the PACE articles

My guest blogger, Dutch research biologist Klaas van Dijk, called attention to the exceptionally large number of authors and institutions listed for a pair of PACE trial papers.

Klaas noted

The Pubmed entry for the 2011 Lancet paper lists 19 authors:

B J Angus, H L Baber, J Bavinton, M Burgess, T Chalder, L V Clark, D L Cox, J C DeCesare, K A Goldsmith, A L Johnson, P McCrone, G Murphy, M Murphy, H O’Dowd, PACE trial management group*, L Potts, M Sharpe, R Walwyn, D Wilks and P D White (re-arranged in an alphabetic order).

The actual article from the Lancet website ( http://www.thelancet.com/pdfs/journals/lancet/PIIS0140-6736(11)60096-2.pdf and also http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60096-2/fulltext ) lists 19 authors who are acting ‘on behalf of the PACE trial management group†’. But the end of the paper (page 835) states: “PACE trial group.” This term is not identical to “PACE trial management group”.

In total, another 19 names are listed under “PACE trial group” (page 835): Hiroko Akagi, Mansel Aylward, Barbara Bowman, Jenny Butler, Chris Clark, Janet Darbyshire, Paul Dieppe, Patrick Doherty, Charlotte Feinmann, Deborah Fleetwood, Astrid Fletcher, Stella Law, M Llewelyn, Alastair Miller, Tom Sensky, Peter Spencer, Gavin Spickett, Stephen Stansfeld and Alison Wearden (re-arranged in an alphabetic order).

There is no overlap with the first 19 people who are listed as authors of the paper.

So how many people can claim to be an author of this paper? Are all these 19 people of the “PACE trial management group” (not identical to “PACE trial group”???) also some sort of co-author of this paper? Do all these 19 people of the second group also agree with the complete contents of the paper? Do all 38 people agree with the full contents of the paper?

The paper lists many affiliations:
* Queen Mary University of London, UK
* King’s College London, UK
* University of Cambridge, UK
* University of Cumbria, UK
* University of Oxford, UK
* University of Edinburgh, UK
* Medical Research Council Clinical Trials Unit, London, UK
* South London and Maudsley NHS Foundation Trust, London, UK
* The John Radcliffe Hospital, Oxford, UK
* Royal Free Hospital NHS Trust, London, UK
* Barts and the London NHS Trust, London, UK
* Frenchay Hospital NHS Trust, Bristol, UK;
* Western General Hospital, Edinburgh, UK

Do all these affiliations also agree with the full contents of the paper? Am I right to assume that all 38 people (names see above) and all affiliations/institutes (see above) plainly refuse to give critics, other scientists, patients, patient groups, etc. access to the raw research data of this paper, and am I right with my assumption that it is therefore impossible for all others (including allies of patients, other scientists, interested students, etc.) to conduct re-calculations, check all statements against the raw data, and so on?

Decisions whether to accept manuscripts for publication are made in dark places, based on opinions offered by people whose identities may be known only to editors. Actually, though, in a small country like the UK, peer review may be a lot less anonymous than intended, and possibly a lot less independent and free of conflicts of interest. Without a lot more transparency than is currently available concerning the peer review the published papers underwent, we are left to speculation.

Prepublication peer review is just one aspect of the process by which research findings are vetted, shaped, and made available to the larger scientific community, an overall process that is now recognized as tainted with untrustworthiness.

Rules for granting authorship

Concerns about gift and unwarranted authorship have increased not only because of growing awareness of unregulated and unfair practices, but because of the importance attached to citations and authorship for professional advancement. Journals increasingly require documentation that all authors have made an appropriate contribution to a manuscript and have approved the final version.

Yet operating rules for granting authorship in many institutional settings vary greatly from the stringent requirements of journals. Contrary to the signed statements that corresponding authors must make in submitting a manuscript to a journal, many clinicians expect authorship in return for access to patients. Many competitive institutions award and withhold authorship based on politics and on good or bad behavior that have nothing to do with the requirements of journals.

Basically, despite the existence of numerous ethical guidelines and explicit policies, authors and institutions can largely do what they want when it comes to granting and withholding authorship.

Persons naïve enough to complain about unwarranted authorship, about being forced to include authors who made no appropriate contribution, or about being denied authorship for an important contribution are quickly disappointed. They discover that whistleblowers are generally considered more of a threat to institutions, and are punished more severely, than alleged wrongdoers, no matter how strong the evidence may be.

The Lancet website notes

The Lancet is a signatory journal to the Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals, issued by the International Committee of Medical Journal Editors (ICMJE Recommendations), and to the Committee on Publication Ethics (COPE) code of conduct for editors. We follow COPE’s guidelines.

The ICMJE recommends that an author should meet all four of the following criteria:

  • Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work;
  • Drafting the work or revising it critically for important intellectual content;
  • Final approval of the version to be published;
  • Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

The intent of these widely endorsed recommendations is that persons associated with a large project have to do a lot to claim their places as authors.

Why the fuss about acknowledgments?

I’ve heard from a number of graduate students and junior investigators that they have had their first manuscripts held up in the submission process because they did not obtain written permission for acknowledgments. Why is that considered so important?

Mention in an acknowledgment is an honor. But it implies involvement in a project and approval of the resulting manuscript. In the past, there were numerous instances where people were named in acknowledgments without having given permission. There was a suspicion, sometimes confirmed, that they had been acknowledged only to improve the manuscript’s prospects of getting published. In other instances, persons were included in acknowledgments without permission with the intent of knocking them out of the pool of potential reviewers because of the appearance of a conflict of interest.

The expectation is that anyone contributing enough to a manuscript to be acknowledged has a potential conflict of interest in deciding whether it is suitable for publication.

But, as in other aspects of a mysterious and largely anonymous review process, whether people who were acknowledged in manuscripts were barred from participating in review of a manuscript cannot be established by readers.

What is the responsibility of reviewers to declare conflict of interest?

Reviewers are expected to declare conflicts of interest before accepting a manuscript for review. But often they are presented with a tick box without a clear explanation of the criteria for the appearance of a conflict of interest. And reviewers can usually continue considering a manuscript after acknowledging that they have an association with the authors or an institutional affiliation but do not consider it a conflict. Such statements are generally accepted at face value.

Authors excluding from the review process persons they consider to have a negative bias

In submitting a manuscript, authors are offered an opportunity to identify persons who should be excluded because of the appearance of a negative bias. Editors generally take these requests quite seriously. As an editor, I sometimes receive a large number of requested exclusions by authors who worry about opinions of particular people.

While we don’t know what went on in prepublication peer review, the PACE investigators have repeatedly and aggressively attempted to manipulate post-publication portrayals of their trial in the media. Can we rule out that they similarly tried to control potential critics in the prepublication peer review of their papers?

The 2015 Lancet Psychiatry secondary mediation analysis article

Chalder, T., Goldsmith, K. A., White, P. D., Sharpe, M., & Pickles, A. R. Rehabilitative therapies for chronic fatigue syndrome: a secondary mediation analysis of the PACE trial. The Lancet Psychiatry, 2: 141–52.

The acknowledgments include

We acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, excluding ARP, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, LV Clark, DL Cox, JC DeCesare, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks. This report is independent research partly arising from a doctoral research fellowship supported by the NIHR.

Fifteen of the authors of the 2011 Lancet PACE paper are no longer present, and another author has been added. The PACE Trial Management Group is again acknowledged, but there is no mention of the separate PACE trial group. We can’t tell why there has been a major reduction in the number of authors and acknowledgments, or how it came about, or whether people who had been dropped participated in review of this paper. But what is obvious is that this is an exceedingly flawed mediation analysis crafted to a foregone conclusion. I’ll say more about that in future blogs, but we can only speculate how such bad publication practices made it through peer review.

This article is a crime against the practice of secondary mediation analysis. If I were a prospective author present in the discussion, I would have fled before it became a crime scene.

I am told I have over 350 publications, but I consider it vulgar for authors to keep track of exact numbers. There are many potential publications not included in this number because I declined authorship, being unable to agree with the spin that others were trying to put on the reporting of the findings. In such instances, I exclude myself from review of the resulting manuscript because of the appearance of a conflict of interest. We can ponder how many of the large pool of past PACE authors refused authorship on this paper when it was offered, and then declined to participate in subsequent peer review because of the appearance of a conflict of interest.

The 2015 Lancet Psychiatry long-term follow-up article

Sharpe, M., Goldsmith, K. A., Chalder, T., Johnson, A.L., Walker, J., & White, P. D. (2015). Rehabilitative treatments for chronic fatigue syndrome: long-term follow-up from the PACE trial. The Lancet Psychiatry, http://dx.doi.org/10.1016/S2215-0366(15)00317-X

The acknowledgments include

We gratefully acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, L V Clark, D L Cox, J C DeCesare, E Feldman, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks, and the King’s Clinical Trials Unit. We thank Hannah Baber for facilitating the long-term follow-up data collection.

Again, there are authors and acknowledgments missing from the earlier paper, and we’re in the dark about how and why that happened and whether the missing persons were considered free enough of conflicts of interest to evaluate this article when it was in manuscript form. But as documented in a blog post at Mind the Brain, there were serious, obvious flaws in the conduct and reporting of the follow-up study. It is a crime against best practices for the proper conduct and reporting of clinical trials. And again, we can only speculate how it got through peer review.

… And grant reviews?

Where can UK granting agencies obtain independent peer review of past and future grants associated with the PACE trial? To take just one example, the 2015 Lancet Psychiatry secondary mediation analysis was funded in part by an NIHR doctoral research fellowship grant. The resulting paper has many fewer authors than the 2011 Lancet paper. Did everyone who was an author or mentioned in the acknowledgments of that paper exclude themselves from review of the application? Who, then, would be left?

In Germany and the Netherlands, concerns about avoiding the appearance of conflicts of interest in obtaining independent peer review of grants have led to heavy reliance on expertise from outside the country. This does not imply any improprieties by experts within these countries, but rather the necessity of maintaining a strong appearance that vested interests have not unduly influenced grant review. Perhaps the situation apparent with the PACE trial suggests that journals and grant review panels within the UK might consider similar steps.

Contemplating the evidence against independent peer review

  • We have a mob of people as authors and mentions in acknowledgments. We have a huge conglomerate of institutions acknowledged.
  • We have some papers with blatant questionable research and reporting practices published in prestigious journals after ostensible peer review.
  • We are left in the dark about what exactly happened in peer review, but that the articles were adequately peer reviewed is a crucial part of their credibility.

What are we to conclude?

I think of what Edgar Allan Poe’s wise character, Le Chevalier C. Auguste Dupin, would say. For those of you who don’t know who he is:

Le Chevalier C. Auguste Dupin  is a fictional detective created by Edgar Allan Poe. Dupin made his first appearance in Poe’s “The Murders in the Rue Morgue” (1841), widely considered the first detective fiction story.[1] He reappears in “The Mystery of Marie Rogêt” (1842) and “The Purloined Letter” (1844)…

Poe created the Dupin character before the word detective had been coined. The character laid the groundwork for fictitious detectives to come, including Sherlock Holmes, and established most of the common elements of the detective fiction genre.

I think if we asked Dupin, he would say the danger is that the question is too fascinating to give up, but impossible to resolve without evidence we cannot access. We can blog, we can discuss this important question, but in the end we cannot answer it with certainty.

Sigh.

Amazingly spun mindfulness trial in British Journal of Psychiatry: How to publish a null trial

Since when is “mindfulness therapy is not inferior to routine primary care” newsworthy?

 

Spinning makes null results a virtue to be celebrated…and publishable.

An article reporting a RCT of group mindfulness therapy

Sundquist, J., Lilja, Å., Palmér, K., Memon, A. A., Wang, X., Johansson, L. M., & Sundquist, K. (2014). Mindfulness group therapy in primary care patients with depression, anxiety and stress and adjustment disorders: randomised controlled trial. The British Journal of Psychiatry.

was previously reviewed in Mental Elf. You might want to consider their briefer evaluation before beginning mine. I am going to be critical not only of the article, but the review process that got it into British Journal of Psychiatry (BJP).

I am an Academic Editor of PLOS One,* where we have the laudable goal of publishing all papers that are transparently reported and not technically flawed. Beyond that, we leave decisions about scientific quality to post-publication commentary of the many, not a couple of reviewers whom the editor has handpicked. Yet, speaking for myself, and not PLOS One, I would have required substantial revisions or rejected the version of this paper that got into the presumably highly selective, even vanity journal BJP**.

The article is paywalled, but you can get a look at the abstract here  and write to the corresponding author for a PDF at Jan.sundquist@med.lu.se

As always, examine the abstract carefully  when you suspect spin, but expect that you will not fully appreciate the extent of spin until you have digested the whole paper. This abstract declares

Mindfulness-based group therapy was non-inferior to treatment as usual for patients with depressive, anxiety or stress and adjustment disorders.

“Non-inferior” meaning ‘no worse than routine care?’ How could that null result be important enough to get into a journal presumably having a strong confirmation bias? The logic sounds just like US Senator George Aiken famously proposing getting America out of the war it was losing in Vietnam by declaring America had won and going home.

There are hints of other things going on, like no reporting of how many patients were retained for analysis or whether there were intention-to-treat analyses. And then the weird mention of outcomes being analyzed with “ordinal mixed models.”  Have you ever seen that before? And finally, do the results hold for patients with any of those disorders or only a particular sample of unknown mix and maybe only representing those who could be recruited from specific settings? Stay tuned…

What is a non-inferiority trial and when should one conduct one?

An NHS website explains

The objective of non-inferiority trials is to compare a novel treatment to an active treatment with a view of demonstrating that it is not clinically worse with regards to a specified endpoint. It is assumed that the comparator treatment has been established to have a significant clinical effect (against placebo). These trials are frequently used in situations where use of a superiority trial against a placebo control may be considered unethical.

Noninferiority trials (NIs) have a bad reputation. Consistent with a large literature, a recent systematic review of NI HIV trials  found the overall methodological quality to be poor, with a high risk of bias. The people who brought you CONSORT saw fit to develop special reporting standards for NIs  so that misuse of the design in the service of getting publishable results is more readily detected. You might want to download the CONSORT checklist for NI and apply the checklist to the trial under discussion. Right away, you can see how deficient the reporting is in the abstract of the paper under discussion.

Basically, an NI RCT commits investigators and readers to accepting null results as support for a new treatment because it is no worse than an existing one. Suspicions are immediately raised as to why investigators might want to make that point.
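To make that logic concrete, here is a minimal sketch in Python of the confidence-interval reasoning behind an NI claim. Everything in it is an invented illustration, not the trial’s actual analysis: the margin must be prespecified, and the test asks only whether the lower confidence bound for the treatment difference stays above that margin.

```python
# Minimal illustration of non-inferiority (NI) logic with invented numbers.
# Higher scores are assumed to mean better outcomes; "margin" is the largest
# disadvantage of the new treatment we are willing to call "no worse".
import math

def noninferiority_test(mean_new, sd_new, n_new,
                        mean_ref, sd_ref, n_ref,
                        margin, z=1.96):
    """Return (ci_low, ci_high, non_inferior) for the difference
    new - reference, using a normal-approximation 95% CI."""
    diff = mean_new - mean_ref
    se = math.sqrt(sd_new**2 / n_new + sd_ref**2 / n_ref)
    ci_low, ci_high = diff - z * se, diff + z * se
    # NI is claimed only if even the worst plausible difference
    # (the CI lower bound) stays above the prespecified -margin.
    return ci_low, ci_high, ci_low > -margin

# Invented example: the new treatment averages 0.5 points worse, but with
# n=200 per arm the CI lower bound (about -1.68) stays above -2.0, so with
# a prespecified margin of 2.0 points NI would be declared.
low, high, ni = noninferiority_test(24.5, 6.0, 200, 25.0, 6.0, 200, margin=2.0)
```

Note how easy it is to “win” such a test by choosing a generous margin after seeing the data, which is exactly why pre-registration of the margin matters.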

Conflicts of interest could be a reason. Demonstration that the treatment is as good as existing treatments might warrant marketing of the new treatment or dissemination into existing markets. There could be financial rewards or simply promoters and enthusiasts favoring what they would find interesting. Yup, some bandwagons, some fads and fashions psychotherapy are in large part due to promoters simply seeking the new and different, without evidence that a treatment is better than existing ones.

Suspicions are reduced when the new treatment has other advantages, like greater acceptability or a lack of side effects, or when the existing treatments are so good that an RCT of the new treatment with a placebo-control condition would be unethical.

We should evaluate whether there is an adequate rationale for the authors conducting an NI RCT, rather than relying on the conventional test of whether the null hypothesis of no difference between the intervention and a control condition can be rejected. Suitable support would be a strong record of efficacy for a well-defined control condition. It would also help if the trial were pre-registered as NI, quieting concerns that it was declared as such after peeking at the data.

The first things I noticed in the methods section…trouble

  • The recruitment procedure is strangely described, but seems to indicate that the therapists providing mindfulness training were present during recruitment, probably weren’t blinded to group assignment, and conceivably could have influenced it. The study thus lacks clear evidence of an appropriate randomization procedure and initial blinding. Furthermore, the GPs administering concurrent treatment were also not blinded and might have taken group assignment into account in subsequent prescribing and monitoring of medication.
  • During the recruitment procedure, GPs assessed whether medication was needed and made prescriptions before randomization occurred. We will need to see – we are not told in the methods section – but I suspect a lot of medication is being given to both intervention and control patients. That is going to complicate interpretation of results.
  • In terms of diagnosis, a truly mixed group of patients was recruited. Patients experiencing stress or adjustment reactions were thrown in with patients who had mild or moderate depression or anxiety disorders. Patients were excluded who were considered severe enough to need psychiatric care.
  • Patients receiving any psychotherapy at the start of the trial were excluded, but the authors ignored whether patients were receiving medication.

This appears to be a mildly distressed sample that is likely to show some recovery in the absence of any treatment. The authors’ failure to control for the medication that was received is going to be a big problem later. Readers won’t be able to tell whether any improvement in the intervention condition is due to its more intensive support and encouragement resulting in better adherence to medication.

  • The authors go overboard in defending their use of multiple overlapping measures and overboard in praising the validity of their measures. For instance, the Hospital Anxiety and Depression Scale (HADS) is a fatally flawed instrument, even if still widely used. I consider the instrument dead in terms of reliability and validity, but like Elvis, it is still being cited.

    Play Elvis is Dead at http://tinyurl.com/p78pzcn

Okay, the authors claim these measures are great and attach clinical importance to cut points that others no longer consider valid. But then why do they decide that the scales are ordinal, not interval? Basically, they are saying the scales are so bad that the difference between one number and the next higher or lower can’t be considered equal across the scale. This is getting weird. If the scales are as good as the authors claim, why do the authors take the unusual step of treating them as psychometrically inadequate?

I know, I’m getting technical to the point that I risk losing some readers, but the authors are setting readers up to be comfortable with a decision to focus on medians, not mean scores – making it more difficult to detect any differences between the mindfulness therapy and routine care. Spin, spin!

There are lots of problems with the ill described control condition, treatment as usual (TAU). My standing gripe with this choice is  that TAU varies greatly across settings, and often is so inadequate that at best the authors are comparing whether mindfulness therapy is better than some unknown mix of no treatment and inadequate treatment.

We know enough about mindfulness therapy at this point not to worry about whether it is better than nothing at all; we should be focusing on whether it is better than another active treatment and whether its effectiveness is due to particular factors. The authors state that most of the control patients were receiving CBT, but don’t indicate how they knew that, except from case records. Notoriously, a lot of the therapy done in primary care that is labeled by practitioners as CBT does not pass muster. I would be much more comfortable with some sort of control over what patients were receiving in the control arm, or at least better specification.

Analyses

I’m again trying to avoid getting very technical here, but I’ll point out, for those who have a developed interest in statistics, that there were strange things going on.

  • Particular statistical analyses (depending on group medians rather than means) are chosen that are less likely to reveal differences between intervention and control groups than the parametric statistics that are typically used.
  • Complicated decisions justify throwing away data and then using multivariate techniques to estimate what the missing data would have been. These multivariate techniques require assumptions that are not tested.
  • The power analysis is not conducted to detect differences between groups, but to be able to provide a basis for saying that mindfulness does not differ from routine care. Were the authors really interested in that question rather than whether mindfulness is better than routine care in initially designing a study and its analytic plan? Without pre-registration, we cannot know.
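To make the medians point concrete, here is a toy illustration with entirely hypothetical scores (not the trial’s data): two groups can have identical medians even when one has clearly improved on average, so a median-focused comparison can miss a real difference.

```python
import statistics

# Hypothetical post-treatment symptom scores (lower = better).
# A few patients in one arm improve dramatically, but the middle
# of the distribution is untouched.
control      = [8, 9, 10, 11, 12]
intervention = [2, 3, 10, 11, 12]

# Medians are identical, so a median-based comparison sees nothing...
print(statistics.median(control), statistics.median(intervention))  # 10 10

# ...while the means differ substantially.
print(statistics.mean(control), statistics.mean(intervention))      # 10 7.6
```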

Results

There are extraordinary revelations in Table 1, baseline characteristics.


  • The intervention and control groups initially differed on two of the four outcome variables before they even received the intervention. Thus, intervention and control conditions were not comparable on important baseline characteristics. This is in itself a risk of bias, but it also raises further questions about the adequacy of the randomization procedure and blinding.
  • We are told nothing about the distribution of diagnoses across the intervention and control groups, which is very important in interpreting results and considering what generalizations can be made.
  • Most patients in both the intervention and control groups were receiving antidepressants, and about a third in either condition were receiving a “tranquilizer” or had missing data for that variable.

Signals that there is something amiss in this study are growing stronger. Given the mildness of disturbance and high rates of prescription of medication, we are likely dealing with a primary care sample where medications are casually distributed and poorly monitored. Yet, this study is supposedly designed to inform us whether adding mindfulness to this confused picture produces outcomes that are not worse.

Table 5 adds to the suspicions. There were comparable, significant changes in both the intervention and control condition over time. But we can’t know if that was due to the mildness of distress or effectiveness of both treatments.


Twice as many patients assigned to mindfulness dropped out of treatment, compared to those assigned to routine care. Readers are given some information about how many sessions of mindfulness patients attended, but not the extent to which they practiced mindfulness.

Discussion

We are told

The main finding of the present RCT is that mindfulness group therapy given in a general practice setting, where a majority of patients with depression, anxiety, and stress and adjustment disorders are treated, is non-inferior to individual-based therapy, including CBT. To the best of our knowledge, this is the first RCT performed in a general practice setting where the effect of mindfulness group therapy was compared with an active control group.

Although a growing body of research has examined the effect of mindfulness on somatic as well as psychiatric conditions, scientific knowledge from RCT studies is scarce. For example, a 2007 review…

It’s debatable whether the statement was true in 2007, but a lot has happened since then. Recent reviews suggest that mindfulness therapy is better than nothing and better than inactive control conditions that do not provide comparable levels of positive expectations and support. Studies are accumulating that indicate mindfulness therapy is not consistently better than active control conditions. Differences become less likely when the alternative treatments are equivalent in the positive expectations conveyed to patients and providers, support, and intensity in terms of frequency and amount of contact. Resolving this latter question of whether mindfulness is better than reasonable alternatives is now critical, and this study provides no relevant data.

An Implications section states

Patients who receive antidepressants have a reported remission rate of only 35–40%.41 Additional treatment is therefore needed for non-responders as well as for those who are either unable or unwilling to engage in traditional psychotherapy.

The authors are being misleading to the point of being irresponsible in making this statement in the context of discussing the implications of their study. The reference is to the American STAR*D treatment study, which dealt with a very different, more chronically and unremittingly depressed population.

An appropriately referenced statement about primary care populations like the one from which this study recruited would point to the lack of diagnosis on which prescription of medication was based, unnecessary treatment with medication of patients who would not be expected to benefit from it, and poor monitoring and follow-up of patients who could conceivably benefit from medication if appropriately monitored. Such a statement would reflect the poor state of routine care for depression in the community, but it would undermine claims that the control group received an active treatment with suitable specification to allow any generalizations about the efficacy of mindfulness.

MY ASSESSMENT

This RCT has numerous flaws in its conduct and reporting that preclude making any contribution to the current literature about mindfulness therapy. What is extraordinary is that, as a null trial, it got published in BJP. Maybe its publication in its present form represents incompetent reviewing and editing, or maybe a strategic, but inept decision to publish a flawed study with null findings because it concerns the trendy topic of mindfulness and GPs to whom British psychiatrists want to reach out.

An RCT of mindfulness psychotherapy is attention-getting. Maybe the BJP is willing to sacrifice trustworthiness of the interpretation of results for newsworthiness. BJP will attract readership it does not ordinarily get with publication of this paper.

What is most fascinating is that the study was framed as a noninferiority trial and therefore null results are to be celebrated. I challenge anyone to find similar instances of null results for a psychotherapy trial being published in BJP except in the circumstances that make a lack of effect newsworthy because it suggests that investment in the dissemination of a previously promising treatment is not justified. I have a strong suspicion that this particular paper got published because the results were dressed up as a successful demonstration of noninferiority.

I would love to see the reviews this paper received, almost as much as any record of what the authors intended when they planned the study.

Will this be the beginning of a trend? Does BJP want to encourage submission of noninferiority psychotherapy studies? Maybe the simple explanation is that the editor and reviewers do not understand what a noninferiority trial is and what it can conceivably conclude.

Please, some psychotherapy researcher with a null trial sitting in the drawer, test the waters by dressing the study up as a noninferiority trial and submitting it to BJP.

How bad is this study?

The article provides a non-intention-to-treat analysis of a comparison of mindfulness to an ill-specified control condition that would not qualify as an active treatment. The comparison does not allow generalization to other treatments in other settings. The intervention and control conditions had significant differences in key characteristics at baseline. The patient population is ill-described in ways that do not allow generalization to other patient populations. The high rate of co-treatment confounding due to antidepressants and tranquilizers precludes determination of any effects of the mindfulness therapy. We don’t know if there were any effects, or if both the mindfulness therapy and control condition benefited from the natural decline in distress of a patient population largely without psychiatric diagnoses. Without a control group like a waiting list, we can’t tell whether these patients would have improved anyway. I could go on but…

This study was not needed and may be unethical

The accumulation of literature is such that we need less mindfulness therapy research, not more. We need comparisons with well-specified active control groups that can answer the question of whether mindfulness therapy offers any advantage over alternative treatments, not only in efficacy, but in the ability to retain patients so they get an adequate exposure to the treatment. We need mindfulness studies with cleverly chosen comparison conditions that allow determination of whether it is the mindfulness component of mindfulness group therapy that has any effectiveness, rather than the relaxation that mindfulness therapy shares with other treatments.

To conduct research in patient populations, investigators must have hypotheses and methods with the likelihood of making a meaningful contribution to the literature commensurate with all the extra time and effort they are asking of patients. This particular study fails this ethical test.

Finally, the publication of this null trial as a noninferiority trial pushes the envelope in terms of the need for preregistration of design and analytic plans for trials. If authors are going to claim a successful demonstration of non-inferiority, we need to know that is what they set out to do, rather than their just being stuck with null findings they could not otherwise publish.

*DISCLAIMER: This blog post presents solely the opinions of the author, and not necessarily PLOS. Opinions about the publishability of papers reflect only the author’s views and not necessarily an editorial decision for a manuscript submitted to PLOS One.

**I previously criticized the editorial process at BJP, calling for the retraction of a horribly flawed meta-analysis of the mental health effects of abortion written by an American antiabortion activist. I have pointed out how another flawed review of the efficacy of long-term psychodynamic psychotherapy represented duplicate publication . But both of these papers were published under the last editor. I still hope that the current editor can improve the trustworthiness of what is published at BJP. I am not encouraged by this particular paper, however.

Failing grade for highly cited meta-analysis of positive psychology interventions

The many sins of Sin and  Lyubomirsky

I recently blogged about Linda Bolier and colleagues’ meta-analysis of positive psychology interventions [PPIs] in BMC Public Health. It is the new kid on the block. Sin and Lyubomirsky’s meta-analysis is accepted as the authoritative summary of the evidence and has been formally identified by Web of Science as among the top 1% of papers in psychology and psychiatry for 2009 in terms of citations, with 187 citations according to Web of Science and 487 according to Google Scholar.

This meta-analysis ends on a resoundingly positive note:

Do positive psychology interventions effectively boost well-being and ameliorate depression? The overwhelming evidence from our meta-analysis suggests that the answer is ‘‘yes.’’ The combined results of 49 studies revealed that PPIs do, in fact, significantly enhance WB, and the combined results of 25 studies showed that PPIs are also effective for treating depressive symptoms. The magnitude of these effects is medium-sized (mean r =.29 for WB, mean r= .31 for depression), indicating that not only do PPIs work, they work well.

According to Sin and Lyubomirsky, the strength of evidence justifies PPIs being disseminated and implemented in the community:

The field of positive psychology is young, yet much has already been accomplished that practitioners can effectively integrate into their daily practices. As our metaanalysis confirms, positive psychology interventions can materially improve the wellbeing of many.

The authors also claimed to have dispensed with concerns that clinically depressed persons may be less able to benefit from PPIs.  Hmm…

In this blog post I will critically review Sin and Lyubomirsky’s meta-analysis, focusing on effects of PPIs on depressive symptoms, as I did in the earlier blog post concerning Bolier and colleagues’ meta-analysis. As the title of this blog post suggests, I found the Sin and Lyubomirsky meta-analysis misleading, falling far short of accepted standards for doing and reporting meta-analyses. I hope to convince you that authors who continue to cite this meta-analysis are either naïve, careless, or eager to promote PPIs in defiance of the available evidence. And I will leave you with the question of what its uncritical acceptance and citation says about the positive psychology community’s standards.

Read on and I will compare and contrast the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses, and you will get a chance to see how to grade a meta-analysis using the validated checklist, AMSTAR.

[If you are interested in using AMSTAR yourself to evaluate the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses independently, this would be a good place to stop and get the actual checklist and the article explaining it.]

The Sin and  Lyubomirsky meta-analysis

The authors indicate the purpose of the meta-analysis was to

Provide guidance to clinical practitioners by answering the following vital questions:

  • Do PPIs effectively enhance WB and ameliorate depression relative to control groups and, if so, with what magnitude?
  • Which variables—with respect to both the characteristics of the participants and the methodologies used—moderate the effectiveness of PPIs?

Similar to Bolier and colleagues, this meta-analysis focused primarily on interventions

aimed at increasing positive feelings, positive behaviors, or positive cognitions, as opposed to ameliorating pathology or fixing negative thoughts or maladaptive behavior patterns.

However, Sin and  Lyubomirsky’s  meta-analysis was less restrictive than Bolier et al in including interventions such as mindfulness, life review therapy, and forgiveness therapy.  These approaches were not developed explicitly within the positive psychology framework, even if they’ve been appropriated by positive psychology.

Positive psychologists have a bad habit of selectively claiming older interventions as their own, as they did with specific interventions from Aaron T Beck’s cognitive therapy for depression. We need to ask if what is considered effective in “positive psychology interventions” is new and distinctly positive psychology or if what is effective is mainly what is old and borrowed from elsewhere.

Sin and Lyubomirsky’s meta-analysis also differs from Bolier et al. in including nonrandomized trials, although that is nowhere explicitly acknowledged. Sin and Lyubomirsky included studies in which what was done to student participants depended on what classrooms they were in, not on their individually being randomized. Lots of problems are introduced. For instance, any pre-existing differences associated with students being in particular classrooms are attributed to the participants having gotten PPIs. One should not combine studies with randomization by individual with studies in which the intervention depended on being in a particular classroom – unless, perhaps, a statistical check has been made of whether the two kinds of studies can legitimately be combined.

[I know, I’m getting into technical details that casual readers of the meta-analysis might want to ignore, but the validity of authors’ conclusions depend on such details. Time and time again, we will see Sin and  Lyubomirsky not providing them.]

Using AMSTAR

If authors have done a meta-analysis and want to submit it to a journal like PLOS One, they must accompany their submission with a completed PRISMA checklist. That allows the editor and reviewers to determine whether the authors have provided the basic details needed for them, and for future readers, to evaluate what was actually done. PRISMA is a checklist about transparency in reporting; it does not evaluate the appropriateness or competency of what authors do. Authors can do a meta-analysis badly and still score points on PRISMA, because readers at least have the details to see for themselves.

In contrast, AMSTAR evaluates both what is reported and what was done. So authors don’t get points simply for transparently reporting a meta-analysis that was done inappropriately. And unlike a lot of checklists, the items of AMSTAR have been externally validated.

One final thing before we start: you can add up the number of items for which a meta-analysis meets AMSTAR criteria, but a higher score does not necessarily indicate that one meta-analysis is better than another. That’s because some items are more important than others in terms of what the authors of a meta-analysis have done and whether they’ve given enough details to readers. So two meta-analyses may get the same moderate score using AMSTAR, yet differ in whether the items they failed to meet are fatal to the meta-analysis being able to make a valid contribution to the literature.

Some of the problems of Sin and Lyubomirsky’s meta-analysis revealed by AMSTAR

5. Was a list of studies (included and excluded) provided?

While a list of included studies was provided, there was no list of excluded studies. It is puzzling, for instance, why Barbara Fredrickson et al.’s (2008) study of loving kindness meditation with null findings is never mentioned. The study is never identified as a randomized trial in the original article, but it is subsequently cited by Barbara Fredrickson and many others within positive psychology as such. That’s a serious problem with the positive psychology literature: you never know whether an experimental manipulation is a randomized trial or whether a study will later be cited as evidence of the effectiveness of positive psychology interventions.

Most of the rest of the psychological intervention literature adheres to CONSORT, and one of its first requirements is that articles indicate either in their title or abstract that a randomized trial is being reported. So, when it comes to a meta-analysis of PPIs, it is particularly important to know what studies were excluded so that readers can judge how that might have affected the effect size that was obtained.

6. Were the characteristics of the included studies provided?

Sin and Lyubomirsky’s Table 1 is incomplete and misleading in reporting characteristics of the included studies. It doesn’t indicate whether or not studies involved randomization. It is misleading about which studies selected for depression, because it lumps together studies of mildly depressed students – selected on the basis of self-report questionnaires and not necessarily clinically depressed – with studies of patients with more severe depression who met criteria for formal clinical diagnoses. The table indicates sample size, but it is not total sample size that matters most; it is the size of the smallest group, whether intervention or control. A number of positive psychology studies have a big imbalance in the size of the intervention versus the control group. So there may be a seemingly sufficient number of participants in the study overall, but the size of the control group would leave the study underpowered, with a suspicion that effect sizes were exaggerated.

7. Was the scientific quality of the included studies assessed and documented?

Sin and Lyubomirsky made no effort to evaluate the quality of the included studies! That is a serious, fatal flaw.

On this basis alone, I would judge the meta-analysis either to have somehow evaded adequate peer review, or the editor of Journal of Clinical Psychology and the reviewers of this particular paper to have been incompetent. Certainly this problem would not have been missed at PLOS One, and I would hope that other journals would have readily picked it up.

Bolier and colleagues explained their rating system and presented its application in evaluating the individual trials included in the meta-analysis. Readers had the opportunity to examine the rating system and its application. We were able to see that the studies evaluating positive psychology interventions tend to be of low quality. We can also see that the studies producing the largest effect sizes tend to be those of the lowest quality and small size.

I was somewhat critical of Bolier and colleagues in an earlier blog, because they liberalized the quality rating scales in order to be able to conduct a meta-analysis at all. Nonetheless, they were transparent enough to allow me to make that independent evaluation. Because we have their ratings available, we can extrapolate to the studies included in Sin and Lyubomirsky and be warned that this analysis is likely to provide an overly positive evaluation of PPIs. But we have to go outside of what Sin and Lyubomirsky provide.

8. Was the scientific quality of the included studies used appropriately in formulating conclusions?

AMSTAR indicates

The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations.

Sin and Lyubomirsky could not take quality into account in interpreting their meta-analysis because they did not rate quality. And so they didn’t allow readers a chance to use quality ratings to independently evaluate for themselves.  We are now further in the realm of fatal flaws. We know from other sources that much of the “evidence” for positive psychology interventions comes from small, underpowered studies likely to produce exaggerated estimates of effects. If this is not taken into account, conclusions are invalid.

9. Were the methods used to combine the findings of studies appropriate?

AMSTAR indicates

For the pooled results, a test should be done to ensure the studies were combinable, to assess their homogeneity (i.e. Chi-squared test for homogeneity, I²). If heterogeneity exists a random effects model should be used and/or the clinical appropriateness of combining should be taken into consideration (i.e. is it sensible to combine?).

Sin and Lyubomirsky used an ordinary chi-squared test and found

the set of effect sizes was heterogeneous (χ²(23) = 146.32, one-tailed p < 2 × 10⁻¹⁹), indicating that moderators may account for the variation in effect sizes.

[I’ll try to be as non-technical as possible in explaining a vital point. Do try to struggle through this, rather than simply accepting my conclusion that this one statistic alone indicates a meta-analysis in serious trouble. Think of it like a warning message on your car dashboard that should compel you to immediately pull to the side of the road, shut off the engine, and call a tow truck.]

Tests for heterogeneity basically tell you whether there are enough similarities among the effect sizes of individual studies to warrant combining them. A test for heterogeneity examines whether the hypothesis of too much variation can be rejected within certain limits. The Cochrane Collaboration specifically warns against relying on an ordinary chi-squared test for heterogeneity, because it is low powered in situations where the studies vary greatly in sample size, with some of them being small. The Cochrane Collaboration presents alternatives derived from the chi-square statistic (Q) that quantify inconsistency in effect sizes, such as I². Sin and Lyubomirsky did not use these, but instead used the standard chi-square, which is prone to miss inconsistency between studies.

But don’t worry, the results are so wild that serious problems are indicated anyway. Look above at the significance of the chi-square that Sin and Lyubomirsky report. Have you ever seen anything so highly significant: p < .0000000000000000002?
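For readers who want to check, the inconsistency statistic Cochrane recommends (I²) can be computed directly from the chi-square that Sin and Lyubomirsky do report. A minimal sketch:

```python
# Heterogeneity figures as reported: chi-square(23) = 146.32,
# which in meta-analytic terms is Cochran's Q with df = 23.
Q, df = 146.32, 23

# Higgins & Thompson's I^2: the percentage of variation in effect
# sizes attributable to heterogeneity rather than chance.
i_squared = max(0.0, (Q - df) / Q) * 100

print(f"I^2 = {i_squared:.1f}%")  # about 84% -- "considerable" heterogeneity
```

By Cochrane’s rough bands, anything much above 75% counts as considerable heterogeneity, reinforcing the point that pooling these effect sizes was questionable.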

Rather than panicking like they should have, Sin and Lyubomirsky simply proceeded to examine moderators of effect size and concluded that most of them did not matter for depressive symptoms, including initial depression status of participants and whether participants individually volunteered to be in the study, rather than being assigned because they were in a particular classroom.

Sin and Lyubomirsky’s moderator analyses are not much help in figuring out what was going wrong. If they had examined quality of the studies and sample size, they would’ve gotten on the right path. But they really don’t have many studies, and so they can’t carefully examine these factors. They are basically left with a very serious warning not to proceed, but do so anyway. Once again, where the hell were the editor and reviewers when they could have saved Sin and Lyubomirsky from embarrassing themselves and misleading readers?

10. Was the likelihood of publication bias assessed?

AMSTAR indicates

An assessment of publication bias should include a combination of graphical aids (e.g., funnel plot, other available tests) and/or statistical tests (e.g., Egger regression test).

Bolier and colleagues provided a funnel plot of effect sizes that gave a clear indication that small studies with negative or null effects were somehow missing from the studies they had selected for the meta-analysis. Readers with some familiarity with meta-analysis can interpret it for themselves.

Sin and Lyubomirsky did no such thing. Instead, they used Rosenthal’s failsafe N to give readers the false reassurance that hundreds of unpublished null studies of PPIs would have to be lurking in file drawers for their glowing assessment to be unseated. Perhaps they should be forgiven for using failsafe N, because they acknowledged Rosenthal as a consultant. But outside of psychology, experts on meta-analysis reject failsafe N as providing false reassurance.
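To see why failsafe N reassures so easily, here is a sketch of Rosenthal’s formula with made-up inputs (the z-scores below are hypothetical, not taken from the meta-analysis):

```python
import math

def failsafe_n(z_scores, z_alpha=1.645):
    """Rosenthal's file-drawer N: the number of unpublished null
    (z = 0) studies needed to drag the combined one-tailed p above
    alpha = .05. Derived from z_combined = sum(z) / sqrt(k + N)."""
    k = len(z_scores)
    return max(0, math.ceil(sum(z_scores) ** 2 / z_alpha ** 2 - k))

# Ten modest hypothetical studies, each with z = 2.0, already "require"
# well over a hundred hidden null studies -- an easy bar to clear.
print(failsafe_n([2.0] * 10))
```

Because the numerator grows with the square of the summed z-scores, even a handful of modestly positive published studies yields an impressively large N, which is exactly why the statistic so rarely raises an alarm.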

11. Was the conflict of interest stated?

AMSTAR indicates

Potential sources of support should be clearly acknowledged in both the systematic review and the included studies.

Lyubomirsky had already published The How of Happiness:  A New Approach to Getting the Life You Want. Its extravagant claims prompted a rare display of negativity from within the positive psychology community, an insightful negative review from the editor of Journal of Happiness Studies.

Conflict of interest among the authors of the included studies – many of whom are also involved in the sale of positive psychology products – was ignored. We certainly know from analyses of studies conducted by pharmaceutical companies that the prospect of financial gain tends to lead to exaggerated effect sizes. Indeed, my colleagues and I were awarded the Bill Silverman award by the Cochrane Collaboration for alerting them to their lack of attention to conflict of interest as a formal indicator of risk of bias. The Collaboration is now in the process of revising its risk of bias tool to incorporate conflict of interest as a consideration.

Conclusion

Sin and Lyubomirsky provide a biased and seriously flawed assessment of the efficacy of positive psychology interventions. Anyone who uncritically cites this paper is either naïve, careless, or bent on presenting a positive evaluation of positive psychology interventions in defiance of available evidence. Whatever limitations I pointed out in the meta-analysis of Bolier and colleagues, I prefer it to this one. Yet just watch. I predict Sin and Lyubomirsky will continue to be cited without acknowledging Bolier and colleagues. If so, it will add to lots of other evidence of the confirmatory bias and lack of critical thinking within the positive psychology community.

Postscript

Presumably if you’re reading this postscript, you’ve read through my scathing analysis. But I had already noticed something was wrong in my initial 15-minute casual reading of the meta-analysis, undertaken after completion of my blog post about Linda Bolier and colleagues. Among the things I noted were:

  1. In their introduction, Sin and Lyubomirsky made positive statements about the efficacy of PPIs based on two underpowered, flawed studies (Fava et al., 2005; Seligman et al., 2006 ) that were outliers in Bolier and colleagues’ analyses. Citing these two studies as positive evidence suggests both prejudgment and a lack of application of critical skills that foreshadowed what followed.
  2. Their method section gave no indication of attention to quality of studies they were going to review. Bad, bad.
  3. Their method section declared that they would use one-tailed tests for the significance of effect sizes. Since the 1950s, psychologists have consistently relied on two-tailed tests. Unwary readers might accept one-tailed tests at p < .05 without realizing that, with the more customary two-tailed test, the same results would be p < .10. Reliance on one-tailed tests is almost always an indication of a bias towards finding significant results or an attempt to mislead readers.
  4. The article included no forest plot that would’ve allowed a quick assessment of the distribution of effect sizes, whether they differed greatly, and whether some were outliers. As I analyzed in an earlier blog post, Bolier and colleagues’ inclusion of a forest plot, along with details in their Table 1, allowed quick assessment that the overall effect size for positive psychology interventions was strongly influenced by outlier small studies of poor methodological quality.
  5. The wild chi-square concerning heterogeneity was glossed over.
  6. The resoundingly positive assessment of positive psychology interventions that opened the discussion was subsequently contradicted by acknowledgment of some, but not the most serious, limitations of the meta-analysis. Other conclusions in the discussion section were not based on any results of the meta-analysis.
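The one-tailed issue in point 3 is easy to make concrete. A sketch with a hypothetical test statistic, using only the standard normal distribution:

```python
import math

def one_tailed_p(z):
    """Upper-tail p-value for a standard normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 1.70  # a hypothetical, modest effect
p_one = one_tailed_p(z)
p_two = 2 * p_one  # for a symmetric null distribution, the two-tailed p doubles

print(f"one-tailed p = {p_one:.3f}")  # ~0.045: "significant"
print(f"two-tailed p = {p_two:.3f}")  # ~0.089: not significant by the usual standard
```

The same data cross the conventional .05 threshold one-tailed but miss it two-tailed, which is exactly the kind of result a one-tailed analysis quietly promotes to “significant.”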

I speak only for myself, and not for the journal PLOS One or the other Academic Editors. I typically take 15 minutes or so to decide whether to send a paper out for review. My perusal of this one would have led to sending it back to the authors, requesting that they attempt to adhere to basic standards for conducting and reporting meta-analyses before even considering resubmission to me. If they did resubmit, I would check again before sending it out to reviewers. We need to protect reviewers and subsequent readers from meta-analyses that are not only poorly conducted, but that lack transparency while promoting interventions with undisclosed conflicts of interest.

Salvaging psychotherapy research: a manifesto

“Everybody has won, and all must have prizes.” Chapter 3 of Lewis Carroll’s Alice’s Adventures in Wonderland

NOTE: Additional documentation and supplementary links and commentary are available at What We Need to Do to Redeem Psychotherapy Research.

Fueling Change in Psychotherapy Research with Greater Scrutiny and Public Accountability

John Ioannidis’s declarations that most positive findings are false and that most breakthrough discoveries are exaggerated or fail to replicate apply as much to psychotherapy as they do to biomedicine.

We should take a few tips from Ben Goldacre’s Bad Pharma and clean up the psychotherapy literature, paralleling what is being accomplished with pharmaceutical trials. Sure, much remains to be done to ensure the quality and transparency of drug studies and to get all of the data into public view. But the psychotherapy literature lags far behind and is far less reliable than the pharmaceutical literature.

As it now stands, the psychotherapy literature does not provide a dependable guide to policy makers, clinicians, and consumers attempting to assess the relative costs and benefits of choosing a particular therapy over others. If such stakeholders uncritically depend upon the psychotherapy literature to evaluate the evidence-supported status of treatments, they will be confused or misled.

Psychotherapy research is scandalously bad.

Many RCTs are underpowered, yet consistently obtain positive results by redefining the primary outcomes after results are known. The typical RCT is a small, methodologically flawed study conducted by investigators with strong allegiances to one of the treatments being evaluated. Which treatment is preferred by investigators is a better predictor of the outcome of the trial than the specific treatment being evaluated.

Many positive findings are created by spin: a combination of confirmatory bias; flexible rules of design, data analysis, and reporting; and significance chasing.

Many studies considered positive, including those that become highly cited, are basically cherry-picked null trials: results for the primary outcome are ignored, and post-hoc analyses of secondary outcomes and subgroups are emphasized. Spin starts in abstracts, and the results reported there are almost always positive.

The bulk of psychotherapy RCTs involve comparisons between a single active treatment and an inactive or neutral control group, such as a wait list, no treatment, or “routine care” that is typically left undefined and in which exposure to treatment of adequate quality and intensity is not assured. At best, these studies can tell us whether a treatment is better than doing nothing at all, or better than patients expecting treatment because they have enrolled in a trial and not getting it (a nocebo condition).

Meta-silliness?

Meta-analyses of psychotherapy often do not qualify conclusions by grade of evidence, ignore clinical and statistical heterogeneity, inadequately address investigator allegiance, downplay the domination by small trials with statistically improbable rates of positive findings, and ignore the extent to which positive effect sizes occur mainly in comparisons between active and inactive treatments.
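Statistical heterogeneity of the kind these meta-analyses ignore is not hard to quantify. A minimal sketch in Python of Cochran’s Q and the I² index, with made-up effect sizes and variances that are purely illustrative, not drawn from any reviewed study:

```python
import numpy as np

# Hypothetical study effect sizes (d) and their variances -- illustrative only
d = np.array([0.8, 0.1, 0.9, 0.05, 0.7])
v = np.array([0.04, 0.02, 0.05, 0.02, 0.03])

w = 1 / v                                   # inverse-variance weights
d_pooled = np.sum(w * d) / np.sum(w)        # fixed-effect pooled estimate
Q = np.sum(w * (d - d_pooled) ** 2)         # Cochran's Q (chi-square distributed under homogeneity)
df = len(d) - 1
I2 = max(0.0, (Q - df) / Q) * 100           # I^2: % of variation due to between-study heterogeneity

print(round(Q, 2), round(I2, 1))
```

With effect sizes this discordant, Q far exceeds its degrees of freedom and I² lands above the conventional 75% threshold for “considerable” heterogeneity, which is exactly the situation in which a single pooled effect size is misleading.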

Meta-analyses of psychotherapies are strongly biased toward concluding that treatments work, especially when conducted by those who have undeclared conflicts of interest, including developers and promoters of treatments that stand to gain financially from their branding as “evidence-supported.”

Overall, meta-analyses depend too heavily on underpowered, flawed studies conducted by investigators with strong allegiances to a particular treatment or to finding that psychotherapy is in general efficacious. When controls are introduced for risk of bias or investigator allegiance, effects greatly diminish or even disappear.

Conflicts of interest associated with authors having substantial financial benefits at stake are rarely disclosed in the studies that are reviewed or the meta-analyses themselves.

Designations of Treatments as Evidence-Supported

There are low thresholds for professional groups such as the American Psychological Association Division 12, or governmental organizations such as the US Substance Abuse and Mental Health Services Administration (SAMHSA), to declare treatments “evidence-supported.” Seldom are any treatments deemed ineffective or harmful by these groups.

Professional groups have conflicts of interest in wanting their members to be able to claim the treatments they practice are evidence-supported, while not wanting to restrict practitioner choice with labels of treatment as ineffective. Other sources of evaluation like SAMHSA depend heavily and uncritically on what promoters of particular psychotherapies submit in applications for “evidence supported status.”

“Everybody has won, and all must have prizes.” Chapter 3 of Lewis Carroll’s Alice’s Adventures in Wonderland

The possibility that there are no consistent differences among standardized, credible treatments across clinical problems is routinely ridiculed as the “dodo bird verdict” and rejected without systematic consideration of the literature for particular clinical problems. Yes, some studies find differences between two active, credible treatments in the absence of clear investigator allegiance, but these are unusual.

The Scam of Continuing Education Credit

Requirements that therapists obtain continuing education credit are intended to protect consumers from outdated, ineffective treatments, but there is inadequate oversight of the scientific quality of what is offered. Bogus treatments are promoted with pseudoscientific claims. Organizations like the American Psychological Association (APA) prohibit groups of their members from making statements protesting the quality of what is being offered, and the APA continues to allow CE credit for bogus and unproven treatments like thought field therapy and somatic experiencing.

Providing opportunities for continuing education credit is a lucrative business for both accrediting agencies and sponsors. In the competitive world of workshops and trainings, entertainment value trumps evidence. Training in delivery of manualized evidence-supported treatments has little appeal when alternative trainings emphasize patient testimonials and dramatic displays of sudden therapeutic gain in carefully edited videotapes, often with actors rather than actual patients.

Branding treatments as evidence supported is used to advertise workshops and trainings in which the particular crowd-pleasing interventions that are presented are not evidence supported.

Those who attend Acceptance and Commitment Therapy (ACT) workshops may see videotapes in which the presenter cries with patients, recalling his own childhood.  They should ask themselves: “Entertaining, moving perhaps, but is this an evidence-supported technique?”

Psychotherapies with some support from evidence are advocated for conditions for which there is no evidence for their efficacy. What would be disallowed as “off label applications” for pharmaceuticals is routinely accepted in psychotherapy workshops.

We Know We Can Do Better

Psychotherapy research has achieved considerable sophistication in design, analyses, and strategies to compensate for missing data and elucidate mechanisms of change.

Psychotherapy research lags behind pharmaceutical research, but it nonetheless has CONSORT recommendations and requirements for trial preregistration, including specification of primary outcomes; completion of CONSORT checklists to ensure basic details of trials are reported; and preregistration of meta-analyses and systematic reviews at sites like PROSPERO, as well as completion of the PRISMA checklist for adequacy of reporting of meta-analyses and systematic reviews.

Declarations of conflicts of interest are rare, and exposure of authors who routinely fail to disclose conflicts of interest is even rarer.

Departures from preregistered protocols in published reports of RCTs are common, and there is little checking of abstracts for discrepancies from the results that were actually obtained or that were promised in preregistration. Adherence to these requirements is inconsistent and incomplete. There is little likelihood that noncompliant authors will be held accountable, and there is a high incentive to report positive findings if a study is to be published in a prestigious journal such as the APA’s Journal of Consulting and Clinical Psychology (JCCP). Examining the abstracts of papers published in JCCP gives the impression that trials are almost always positive, even when seriously underpowered.

Psychotherapy research is conducted and evaluated within a club, a mutual admiration society in which members are careful not to disparage others’ results or enforce standards that they themselves might want relaxed when it comes to publishing their own research. There are rivalries between tribes, like psychodynamic therapy versus cognitive behavior therapy, but criticism is suppressed within the tribes, and strenuous efforts are made to create the appearance that members of the tribes only do what works.

Reform from Without

Journals and their editors have often resisted changes such as adoption of CONSORT, structured abstracts, and preregistration of trials. The Communications and Publications Board of the American Psychological Association made the APA one of the last major holdout publishers to endorse CONSORT, and initially provided an escape clause that CONSORT applied only to articles explicitly labeled as randomized trials. The board also blocked a push by the Editor of Health Psychology for structured abstracts that reliably reported the details needed to evaluate what had actually been done in trials and what results were obtained. In both instances, the board was most concerned about the implications for the major outlet for clinical trials among its journals, the Journal of Consulting and Clinical Psychology.

Although they are generally not outlets for psychotherapy trials, the journals of the Association for Psychological Science (APS) show signs of being even worse offenders in ignoring standards and indulging confirmatory bias. For instance, it takes a reader a great deal of probing to discover that a high-profile paper by Barbara Fredrickson in Psychological Science was actually a randomized trial, and further detective work to discover that it was a null trial. There is no sign that a CONSORT checklist was ever filed for the study. And despite Fredrickson using the spun Psychological Science trial report to promote her workshops, no conflict of interest was declared.

The new APS journal Clinical Psychological Science shows signs of even more selective publication and confirmatory bias than the APA journals, producing newsworthy articles to the exclusion of null and modest findings. There will undoubtedly be a struggle between APS and APA clinical journals for top position in a hierarchy that publishes only papers that are attention-grabbing, even if flawed, while leaving the publishing of negative trials and failed replications to journals considered less prestigious.

If there is to be reform, pressures must come from outside the field of psychotherapy, from those without vested interest in promoting particular treatments or the treatments offered by members of professional organizations. Pressures must come from skeptical external review by consumers and policymakers equipped to understand the games that psychotherapy researchers play in creating the appearance that all treatments work, but the dodo bird is dead.

Specific journals are reluctant to publish criticism of their own publishing practices.  If we cannot at first get our concerns published in the offending journals, we can rely on blogs and Twitter to call out editors and demand explanations of lapses in peer review and in the upholding of quality.

We need to raise stakeholders’ levels of skepticism, disseminate critical appraisal skills widely, and provide for their application in evaluating exaggerated claims and methodological flaws in articles published in prestigious, high-impact journals. Bad science in the evaluation of psychotherapy must be recognized as the current norm, not an anomaly.

We could get far by enforcing rules that we already have.

We need to continually expose journals’ failures to enforce rules about preregistration, disclosure of conflicts of interest, and discrepancies between published clinical trials and their preregistration.

There are too many blatant examples of investigators failing to deliver what they promised in the preregistration, registering after trials have started to accrue patients, and reviewers apparently not ever checking if the primary outcomes and analyses promised in trial registration are actually delivered.

Editors should

  • Require an explicit statement of whether the trial has been registered and where.
  • Insist that reviewers consult trial registration, including modifications, and comment on any deviation.
  • Explicitly label registration dated after patient accrual has started.

CONSORT for abstracts should be disseminated and enforced. A lot of hype and misrepresentation in the media starts with authors’ own spin in the abstract. Editors should insist that the main analyses for the preregistered primary outcome be presented in the abstract and highlighted in any interpretation of results.

No longer should underpowered exploratory pilot and feasibility studies be passed off as RCTs when they achieve positive results. An orderly sequence of treatment development should occur before conducting what are essentially phase 3 randomized trials.

Here as elsewhere in reforming psychotherapy research, there is something to be learned from drug trials. A process of intervention development that establishes feasibility and the basic parameters of a clinical trial needs to precede phase 3 randomized trials, but such preliminary studies cannot be expected to substitute for phase 3 trials or to provide effect sizes for the purposes of demonstrating efficacy or comparison to other treatments.

Use of wait list, no treatment, and ill-defined routine care as control groups should be discouraged. For clinical conditions for which there are well-established treatments, head-to-head comparisons should be conducted, along with control groups that might elucidate mechanism. A key example of the latter would be structured, supportive therapy that controls for attention and positive expectation. There is little to be gained by further accumulation of studies in which the efficacy of the preferred treatment is assured by comparison to a lame control group that lacks any conceivably effective element of care.

Evaluations of treatment effects should take into account prior probabilities suggested by the larger literature concerning comparisons between two active, credible treatments. The well-studied depression treatment literature suggests some parameters: effect sizes associated with a treatment are greatly reduced when comparisons are restricted to credible, active treatments and to better-quality studies, and when controls are introduced for investigator allegiance. It is unlikely that initial claims about a breakthrough treatment exceeding the efficacy of existing treatments will be sustained in larger studies conducted by investigators independent of developers and promoters.

Disclosure of conflict of interest should be enforced and nondisclosure identified in correction statements and further penalized. Investigator allegiance should be considered in assessing risk of bias.

Developers of treatments and persons with significant financial gain from a treatment being declared “evidence-supported” should be discouraged from conducting meta-analyses of their own treatments.

Trials should be conducted with sample sizes adequate to detect at least moderate effects. When positive findings from underpowered studies are published, readers should scrutinize the literature for similarly underpowered trials that achieve similarly positive effects.
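What “adequate to detect at least moderate effects” means can be made concrete with a standard power calculation. A sketch using the normal approximation, assuming a two-arm trial, a moderate effect of Cohen’s d = 0.5, a two-tailed alpha of .05, and 80% power (the usual conventions):

```python
from scipy import stats

d = 0.5          # moderate effect size (Cohen's d)
alpha = 0.05     # two-tailed significance threshold
power = 0.80     # conventional target power

z_a = stats.norm.ppf(1 - alpha / 2)   # critical z for two-tailed alpha
z_b = stats.norm.ppf(power)           # z corresponding to target power

# Normal-approximation sample size per group for a two-arm trial
n_per_group = 2 * ((z_a + z_b) / d) ** 2
print(round(n_per_group))  # 63 per group (the exact t-based calculation gives ~64)
```

Roughly 64 participants per arm, about 128 in total, is the floor for a credible two-arm trial targeting a moderate effect; many published psychotherapy RCTs fall well short of it.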

Meta-analyses of psychotherapy should incorporate tests for p-hacking and excess significance, to evaluate whether the pattern of significant findings exceeds what is statistically plausible.
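One such check is the excess-significance test of Ioannidis and Trikalinos: given each trial’s power to detect the pooled effect, compare the observed number of “significant” trials with the number expected by chance. A minimal sketch with purely illustrative numbers:

```python
from scipy import stats

# Illustrative scenario: 10 small trials, each with ~35% power to detect
# the pooled effect, yet 9 of them reported significant results.
n_trials, observed_sig, mean_power = 10, 9, 0.35

expected_sig = n_trials * mean_power   # number of significant trials expected by chance
# One-sided binomial test: are 9 significant results out of 10 implausibly many?
p_excess = stats.binomtest(observed_sig, n_trials, mean_power,
                           alternative='greater').pvalue
print(expected_sig, round(p_excess, 4))  # a tiny p-value flags excess significance
```

When only about 3.5 significant trials are expected but 9 are observed, the binomial test flags a literature too good to be true, which is precisely the pattern in many bodies of underpowered psychotherapy trials.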

Adverse events and harms should routinely be reported, including lost opportunity costs such as failure to obtain more effective treatment.

We need to shift the culture of doing and reporting psychotherapy research. We need to shift away from praising exaggerated claims about treatments and the faux evidence generated to promote opportunities for therapists and their professional organizations. It is much more praiseworthy to offer robust, sustainable, even if more modest, claims and to call out hype and hokum in ways that preserve the credibility of psychotherapy.


The alternative is to continue protecting psychotherapy research from stringent criticism and from enforcement of standards for conducting and reporting research, and to simply allow the branding of psychotherapies as “evidence supported” to fall into appropriate disrepute.