Peter was exceptionally prepared and had a definite point of view, but was open to what I said. In the end, he seemed to be persuaded by me on a number of points. The resulting article in Inverse faithfully conveyed my perspective and juxtaposed quotes from me with those from an author of the Psych Science piece in a kind of debate.
My point of view
When evaluating an article about mindfulness in a peer-reviewed journal, we need to take into account that authors may not necessarily be striving to do the best science, but to maximally benefit their particular brand of mindfulness, their products, or the settings in which they operate. Many studies of mindfulness are little more than infomercials: weak research intended only to get mindfulness promoters’ advertisements of themselves into print or to allow the labeling of claims as “peer-reviewed”. Caveat lector.
We cannot assume authors of mindfulness studies are striving to do the best possible science, including being prepared for the possibility of being proven incorrect by their results. Rather, they may simply be trying to get the strongest possible claims through peer review, ignoring best research practices and best publication practices.
There was much from the author of the Psych Science article with which I would agree:
“In my opinion, there are far too many organizations, companies, and therapists moving forward with the implementation of ‘mindfulness-based’ treatments, apps, et cetera before the research can actually tell us whether it actually works, and what the risk-reward ratio is,” corresponding author and University of Melbourne research fellow Nicholas Van Dam, Ph.D. tells Inverse.
“People are spending a lot of money and time learning to meditate, listening to guest speakers about corporate integration of mindfulness, and watching TED talks about how mindfulness is going to supercharge their brain and help them live longer. Best case scenario, some of the advertising is true. Worst case scenario: very little to none of the advertising is true and people may actually get hurt (e.g., experience serious adverse effects).”
But there were some statements that renewed the discomfort and disappointment I experienced when I read the original article in Psychological Science:
“I think the biggest concern among my co-authors and I is that people will give up on mindfulness and/or meditation because they try it and it doesn’t work as promised,” says Van Dam.
“There may really be something to mindfulness, but it will be hard for us to find out if everyone gives up before we’ve even started to explore its best potential uses.”
So, how long before we “give up” on thousands of studies pouring out of an industry? In the meantime, should consumers act on what seem to be extravagant claims?
The Inverse article segued into some quotes from me after delivering another statement from the author with which I could agree:
The authors of the study make their attitudes clear when it comes to the current state of the mindfulness industry: “Misinformation and poor methodology associated with past studies of mindfulness may lead public consumers to be harmed, misled, and disappointed,” they write. And while this comes off as unequivocal, some think they don’t go far enough in calling out specific instances of quackery.
“It’s not bare-knuckle, that’s for sure. I’m sure it got watered down in the review process,” James Coyne, Ph.D., an outspoken psychologist who’s extensively criticized the mindfulness industry, tells Inverse.
Coyne agrees with the conceptual issues outlined in the paper, specifically the fact that many mindfulness therapies are based on science that doesn’t really prove their efficacy, as well as the fact that researchers with copyrights on mindfulness therapies have financial conflicts of interest that could influence their research. But he thinks the authors are too concerned with tone policing.
“I do appreciate that they acknowledged other views, but they kept out anybody who would have challenged their perspective,” he says.
Regarding Coyne’s criticism about calling out individuals, Van Dam says the authors avoided doing that so as not to alienate people and stifle dialogue.
“I honestly don’t think that my providing a list of ‘quacks’ would stop people from listening to them,” says Van Dam. “Moreover, I suspect my doing so would damage the possibility of having a real conversation with them and the people that have been charmed by them.” If you need any evidence of this, look at David “Avocado” Wolfe, whose notoriety as a quack seems to make him even more popular as a victim of “the establishment.” So yes, this paper may not go so far as some would like, but it is a first step toward drawing attention to the often flawed science underlying mindfulness therapies.
To whom is the dialogue directed about unwarranted claims from the mindfulness industry?
As one of the authors of an article claiming to be an authoritative review from a group of psychologists with diverse expertise, Van Dam says he is speaking to consumers. Why won’t he and his co-authors provide citations and name names so that readers can evaluate for themselves what they are being told? Is the risk of reputational damage and embarrassment to the psychologists so great as to cause Van Dam to protect them, rather than protecting consumers from the exaggerated and even fraudulent claims of psychologists hawking their products branded as ‘peer-reviewed psychological and brain science’?
I use the term ‘quack’ sparingly outside of discussing unproven and unlikely-to-be-proven products supposed to promote physical health and well-being or to prevent or cure disease and distress.
I think Harvard psychologist Ellen Langer deserves the term “quack” for her selling of expensive trips to spas in Mexico to women with advanced cancer so that they can change their mindset to reverse the course of their disease. Strong evidence, please! Given that this self-proclaimed mother of mindfulness gets her claims promoted through the Association for Psychological Science website, I think it particularly appropriate for Van Dam and his coauthors to name her in their publication in an APS journal. Were they censored, or only censoring themselves?
Let’s put aside psychologists who can be readily named as quacks. How about Van Dam and co-authors naming names of psychologists claiming to alter the brains and immune systems of cancer patients with mindfulness practices so that they improve their physical health and fight cancer, not just cope better with a life-altering disease?
I simply don’t buy Van Dam’s suggestion that to name names promotes quackery any more than I believe exposing anti-vaxxers promotes the anti-vaccine cause.
Is Van Dam only engaged in a polite discussion with fellow psychologists that needs to be strictly tone-policed to avoid offense or is he trying to reach, educate, and protect consumers as citizen scientists looking after their health and well-being? Maybe that is where we parted ways.
The SMILE trial holds many anomalies and leaves us with more questions than answers.
A guest post by Dr. Keith Geraghty
Honorary Research Fellow at the University of Manchester, Centre for Primary Care, Division of Population Health and Health Services Research
The Advertising Standards Authority previously ruled that the Lightning Process (LP) should not be advertised as a treatment for CFS/ME. So how, then, did LP end up getting tested as a treatment in a clinical trial involving adolescents with CFS/ME? Publication of the trial sparked controversy after it was claimed that LP, in addition to specialist medical care, out-performed specialist medical care alone. This blog attempts to shed light on just how a quack alternative online teaching programme ended up in a costly clinical trial, and discusses how the SMILE trial exemplifies all that is wrong with contemporary psycho-behavioural trials, which are clearly vulnerable to bias and spin.
The SMILE trial compared LP plus specialist medical care (SMC) to SMC alone (commonly a mix of cognitive behavioural therapy and graded exercise therapy). LP is a trademarked training programme created by Phil Parker, drawing on osteopathy, life coaching and neuro-linguistic programming. It costs over £600; after assessment and telephone briefings, clients attend group sessions over three days. While there is much secrecy about what exactly these sessions involve, a cursory search online shows that past clients were told to ‘block out all negative thoughts’ and to consider themselves well, not sick. A person with an illness is said to be ‘doing illness’ (LP spells doing as duing, to signify that LP means more than just doing). LP appears to attempt to get a participant to ‘stop doing’ by blocking negative thoughts and making positive affirmations.
Leading psychologists have raised concerns. Professor James Coyne called LP “quackery” and said neuro-linguistic programming “…has been thoroughly debunked for its pseudoscience”. In an expert reaction to the SMILE trial for the Science Media Centre, Professor Dorothy Bishop of Oxford University stated: “the intervention that was assessed is commercial and associated with a number of warning signs. The Lightning Process appears based on neuro-linguistic programming, which, despite its scientific-sounding name, has long been recognised as pseudoscience”.
The first and most obvious question is why did the SMILE trial take place? Trial lead Professor Esther Crawley, who runs an NHS paediatric CFS/ME clinic, says she undertook the trial after many of her patients and their parents asked about LP. Patients with CFS/ME often report a lack of support from doctors and health care providers and some turn to the internet seeking help; some are drawn to try alternative approaches, such as LP. But is that justification enough for spending over £160,000 on testing LP on children? I think not. Should we test every quack approach peddled online – herbs, crystals, spiritual healing – particularly when funding for CFS/ME research is currently so limited? There must also be compelling scientific plausibility to justify a trial. Simply wanting to see if something helps does not constitute adequate justification.
The SMILE trial has a fundamental design flaw. The trial compared specialist medical care alone (SMC) against SMC plus LP (SMC+LP). To the novice observer this may appear acceptable, but clinical trials are meant to test item x against item y. Imagine trying to determine which drug works better, drug A or drug B: you would not give drug A to one group and both drugs A and B to another group – yet this is exactly what happened in SMILE. In seeking to test LP, Prof. Crawley gave LP and SMC together, rendering any findings from this trial arm essentially meaningless. The proper controls were missing. In addition, a trial of this magnitude would normally have a third arm – a do-nothing or usual-care group, or another talk-therapy control – yet such controls were absent.
Next we turn to the trial’s primary outcome measures. These were subjective self-reports of changes in physical function (using the SF-36). Secondary outcomes were quality of life, anxiety and school attendance. These outcomes were assessed at 6 months with a follow-up at 12 months. It is reported that SMC+LP outperformed SMC alone on these measures at 6 months, with gains maintained at 12 months. However, there is no way to determine whether any claimed improvements came from LP alone, given that LP was mixed with SMC. We could assume that LP+SMC meant more support, more positive expectations and increased contact time. Here we see how farcical SMILE is as a trial: one group got two treatments (possible double help) and one group got one treatment (possible half help).
Of particular concern is how few of the available patients enrolled in and completed the trial: 637 children aged 12-18 attended screening or appointment at a specialist CFS/ME clinic; fewer than half (310) were deemed eligible; just 136 consented to receiving trial information and then only 100 were randomised (less than 1/3 of the eligible group). 49 had SMC and 51 had SMC+LP. Overall 207 patients either declined to participate or were not sufficiently interested to return the consent form. Were patients self-selecting? Were those less likely to respond to nonspecific factors choosing not to participate, leaving a group already interested in LP – given that Prof. Crawley said many patients asked about LP?
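For what it’s worth, the recruitment arithmetic is easy to check. A trivial sketch, using only the figures quoted above:

```python
# Recruitment funnel for SMILE, using the figures quoted above
screened = 637      # attended screening or a clinic appointment
eligible = 310      # deemed eligible
consented = 136     # consented to receive trial information
randomised = 100    # randomised (49 to SMC, 51 to SMC+LP)

print(f"randomised as share of eligible: {randomised / eligible:.1%}")  # 32.3%, under a third
print(f"randomised as share of screened: {randomised / screened:.1%}")  # 15.7%
```

Fewer than a sixth of the children seen at the clinic ended up randomised, which is what makes the self-selection question above so pressing.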
As the trial progressed, patients dropped out: of the 51 participants allocated to SMC+LP, only 39 received full SMC+LP. At the 6-month assessment, just 38 of the 48 allocated to SMC and 46 of the 51 in SMC+LP are fully recorded. At 12 months there are further losses to follow-up in both cohorts: 14% in LP and 24% in SMC. The reasons for participant loss are not fully clear, though the paper reports 5 adverse events (3 in the SMC+LP arm). It is worth noting that physical function at 6 months deteriorated in 9 participants (roughly 10% overall), 8 in the SMC arm, with 5 participants having a fall of ≤10 on the SF-36 physical function subscale (deemed not clinically important). Again, questions arise as to whether some degree of self-selection took place. The fact that 3 of the participants assigned to SMC alone appear to have received LP reflects possible contamination of research cohorts that are meant to be kept apart.
Seven problems stand out in SMILE:
The use of the SF-36 physical function test was questionable. This self-report instrument is not designed or adequately validated for use in children.
Many of the participants appear to have had symptoms of anxiety and depression at the start of the trial. SMILE defined anxiety and depression as a score of ≥12 out of 22 on the self-report HADS. Usually a score of 8 or above is considered positive for mild anxiety and depression, and above 12 for moderate anxiety and depression. The mean HADS score at trial entry was 9.6 (meaning that, using standard cut-offs, most participants met criteria for anxiety and depression). On the Spence Anxiety Scale (SCAS) the mean entry score was 35, with above 33 indicative of anxiety in this age group. Such mild to moderate elevations in depression and anxiety symptoms are very responsive to nonspecific support.
There is an anomaly in the data on improvement: on the physical function test, the mean baseline score of the children at entry into the trial was 54.5 (n=99), considered severely physically impaired. Only 52.5% of participants had been able to attend at least 3 days of school in the week prior to their entry into the study. Yet those assigned to SMC+LP were well enough to attend 3 consecutive days of sessions lasting 4 hours. The reports of severe physical disablement do not match the capabilities of those who participated in the course. Were the children’s self-reported poor physical abilities exaggerated to justify enrolment in the trial? Were the children’s elevated depression and anxiety symptoms responsive to the nonspecific elements and extra contact time of being assigned to LP plus standard care?
If subjective self-report is accepted as a recovery criterion, then LP, just 12 hours of talk therapy added to SMC, would cure the majority of children with CFS. Such an effect would be astonishing, if true. In randomized controlled trials in adults with CFS/ME, such dramatic restoration of physical function (a wholesale return to near normal) is simply never seen. The SMILE trial is clearly unbelievable.
SMILE’s reliance on the broad NICE criteria means there is a clear risk that patients were included in the trial who would not have met stricter definitions of the illness. There is growing concern that loose entry criteria in clinical trials in ME/CFS allow enrolment of many participants who do not in fact have ME/CFS. A detailed study of CFS prevalence found that many children are wrongly diagnosed with CFS when they may just be suffering from general fatigue and/or mental health complaints (Jones et al., 2004). SMILE used NICE guidelines to diagnose CFS: fatigue must be present for at least 3 months with one or more of four other symptoms, which can be as general as sleep disturbance. In contrast, Jones et al. showed that using the Centers for Disease Control criteria of at least four specific symptoms alongside detailed clinical examination, many children believed to have CFS are instead diagnosed with other exclusionary disorders, often general fatigue, mental health complaints, drug and alcohol abuse or eating disorders (which are often not readily disclosed to parents or doctors).
LP involves attempting to coerce clients into believing that they have control over their symptoms and into blocking out symptoms. This alone would distort any response by a participant on a follow-up questionnaire about symptoms.
LP was delivered by people from the Lightning Process Company. Phil Parker and his employees held a clear financial interest in a positive outcome in SMILE. Such an obvious conflict of interest is hard to disentangle and totally nullifies any outcomes from this trial.
The SMILE trial holds many anomalies and leaves us with more questions than answers.
It is not clear whether the children enrolled in the trial, diagnosed with CFS using NICE criteria, might have been deemed non-CFS using more stringent clinical screening (e.g. CDC or IOM criteria).
There is no way of determining whether any effect following SMC+LP was anything more than the result of non-specific factors, psychological tricks and persuasion.
The fact that LP+SMC appears to have cured the majority of participants with as little as 12 hours of talk therapy is a big flashing red light that this trial is fundamentally flawed.
There is a very real danger in promoting LP as a treatment for CFS/ME: the UK ME Association conducted a survey of its members (4,217 members) and found that around 20% of those who tried LP reported feeling worse (7.9% slightly worse, 12.9% much worse). SMILE cannot be, and should not be, used to justify LP as a treatment for CFS/ME.
The Lightning Process has no scientific credibility, and this trial highlights a fundamental flaw in contemporary clinical trials: they are susceptible to suggestion, bias and spin. The SMILE trial appears to draw clinical care for children with CFS/ME into a swamp of pseudoscience and mysticism. This is a clear step backward. There is little to smile about after reviewing the SMILE trial.
Dr. Geraghty is currently an Honorary Research Fellow within the Centre for Primary Care, Division of Population Health and Health Services Research at the University of Manchester. He previously worked as a research associate at Cardiff University and Imperial College London. He left a career in clinical medicine after becoming ill with ME/CFS. The main themes of his work are doctor-patient relationships, medically unexplained symptoms, quality and safety in health care delivery, physician well-being and evidence-based medicine. He has a special interest in medically unexplained symptoms (MUS), and Myalgic Encephalomyelitis/Chronic Fatigue Syndrome.
1. Crawley, E., et al., Chronic disabling fatigue at age 13 and association with family adversity. Pediatrics, 2012. 130(1): p. e71-e79.
2. Crawley, E.M., et al., Clinical and cost-effectiveness of the Lightning Process in addition to specialist medical care for paediatric chronic fatigue syndrome: randomised controlled trial. Archives of Disease in Childhood, 2017.
3. Jones, J.F., et al., Chronic fatigue syndrome and other fatiguing illnesses in adolescents: a population-based study. Journal of Adolescent Health, 2004. 35(1): p. 34-40.
The tour of the sausage factory is starting; here’s your brochure telling you what you’ll see.
A recent review has received a lot of attention, with claims being made that mind-body interventions have distinct molecular signatures that point to potentially dramatic health benefits for those who take up these practices.
Few who are tweeting about this review or its press coverage are likely to have read it, or to understand it if they did. Most of the new-agey coverage in social media does nothing more than echo or amplify the message of the review’s press release. Lazy journalists and bloggers can simply pass on direct quotes from the lead author, or even just the press release’s title, ‘Meditation and yoga can ‘reverse’ DNA reactions which cause stress, new study suggests’:
“These activities are leaving what we call a molecular signature in our cells, which reverses the effect that stress or anxiety would have on the body by changing how our genes are expressed.”
“Millions of people around the world already enjoy the health benefits of mind-body interventions like yoga or meditation, but what they perhaps don’t realise is that these benefits begin at a molecular level and can change the way our genetic code goes about its business.”
[The authors of this review actually identified some serious shortcomings of the studies they reviewed. I’ll be getting to some excellent points at the end of this post that run quite counter to the hype. But the lead author’s press release emphasized unwarranted positive conclusions about the health benefits of these practices. That is what is most popular in media coverage, especially from those who have stuff to sell.]
Interpretation of the press release and review authors’ claims requires going back to the original studies, which most enthusiasts are unlikely to do. If readers do go back, they will have trouble interpreting some of the deceptive claims that are made.
Yet, a lot is at stake. This review is being used to recommend mind-body interventions for people having or who are at risk of serious health problems. In particular, unfounded claims that yoga and mindfulness can increase the survival of cancer patients are sometimes hinted at, but occasionally made outright.
This blog post is written with the intent of protecting consumers from such false claims and providing tools so they can spot pseudoscience for themselves.
Discussion of the review in the media speaks broadly of alternative and complementary interventions. The coverage is aimed at inspiring confidence in this broad range of treatments and encouraging people who are facing health crises to invest time and money in outright quackery. Seemingly benign recommendations for yoga, tai chi, and mindfulness (after all, what’s the harm?) often become the entry point to more dubious and expensive treatments that substitute for established treatments. Once they are drawn to centers for integrative health care for classes, cancer patients are likely to spend hundreds or even thousands of dollars on other products and services that are unlikely to benefit them. One study reported:
More than 72 oral or topical, nutritional, botanical, fungal and bacterial-based medicines were prescribed to the cohort during their first year of IO care…Costs ranged from $1594/year for early-stage breast cancer to $6200/year for stage 4 breast cancer patients. Of the total amount billed for IO care for 1 year for breast cancer patients, 21% was out-of-pocket.
Coming up, I will take a skeptical look at the six randomized trials that were highlighted by this review. But in this post, I will provide you with some tools and insights so that you do not have to make such an effort in order to make an informed decision.
Like many of the other studies cited in the review, these randomized trials were quite small and underpowered. But I will focus on the six because they are as good as it gets. Randomized trials are considered a higher form of evidence than simple observational studies or case reports. [It is too bad the authors of the review don’t even highlight which studies are randomized trials. They are lumped with others as “longitudinal studies.”]
As a group, the six studies do not actually add any credibility to the claims that mind-body interventions – specifically yoga, tai chi, and mindfulness training or retreats – improve health by altering DNA. We can be no more confident with what the trials provide than we would be without them ever having been done.
I found the task of probing and interpreting the studies quite labor-intensive and ultimately unrewarding.
I had to get past poor reporting of what was actually done in the trials, to which patients, and with what results. My task often involved seeing through cover-ups, with authors exercising considerable flexibility in reporting which measures were actually collected and which analyses were attempted, before arriving at the best possible tale of the wondrous effects of these interventions.
Interpreting clinical trials should not be so hard, because they should be honestly and transparently reported, with a registered protocol that is adhered to. These reports of trials were sorely lacking. The full extent of the problems took some digging to uncover, but some things emerged before I even got to the methods and results.
The introductions of these studies consistently exaggerated the strength of existing evidence for the effects of these interventions on health, even while somehow coming to the conclusion that this particular study was urgently needed and might even be the “first ever”. The introductions of the six papers typically cross-referenced each other, without giving any indication of how poor the quality of the evidence from the other papers was. What a mutual admiration society these authors are.
The Goyal et al. review clearly states that the evidence for the effects of mindfulness is of poor quality because of the lack of comparisons with credible active treatments. The typical randomized trial of mindfulness involves a comparison with no treatment, a waiting list, or patients remaining in routine care where the target problem is likely to be ignored. If we depend on the bulk of the existing literature, we cannot rule out the likelihood that any apparent benefits of mindfulness are due to having more positive expectations, attention, and support rather than simply getting nothing. Only a handful of the hundreds of trials of mindfulness include appropriate, active treatment comparison/control groups. The results of those studies are not encouraging.
One of the first things I do in probing the introduction of a study claiming health benefits for mindfulness is to see how it deals with the Goyal et al. review. Did the study cite it, and if so, how accurately? How did the authors deal with its message, which undermines claims of the uniqueness or specificity of any benefits of practicing mindfulness?
For yoga, we cannot yet rule out that it is no better than regular exercise – in groups or alone – with relaxing routines. The literature on tai chi is even smaller and of poorer quality, but there is the same need to show that practicing tai chi has any benefit over exercising in groups with comparable positive expectations and support.
Even more than mindfulness, yoga and tai chi attract a lot of pseudoscientific mumbo jumbo about integrating Eastern wisdom and Western science. We need to look past that and insist on evidence.
Like their introductions, the discussion sections of these articles are quite prone to exaggerating how strong and consistent the evidence from existing studies is. The discussion sections cherry-pick positive findings in the existing literature, sometimes recklessly distorting them. The authors then discuss how their own positively spun findings fit with what is already known, while minimizing or outright neglecting discussion of any of their negative findings. I was not surprised to see one trial of mindfulness for cancer patients obtain no effects on depressive symptoms or perceived stress, but then go on to explain how mindfulness might nonetheless powerfully affect the expression of DNA.
If you want to dig into the details of these studies, the going can get rough and the yield for doing a lot of mental labor is low. For instance, these studies involved drawing blood and analyzing gene expression. Readers will inevitably encounter passages like:
In response to KKM treatment, 68 genes were found to be differentially expressed (19 up-regulated, 49 down-regulated) after adjusting for potentially confounded differences in sex, illness burden, and BMI. Up-regulated genes included immunoglobulin-related transcripts. Down-regulated transcripts included pro-inflammatory cytokines and activation-related immediate-early genes. Transcript origin analyses identified plasmacytoid dendritic cells and B lymphocytes as the primary cellular context of these transcriptional alterations (both p < .001). Promoter-based bioinformatic analysis implicated reduced NF-κB signaling and increased activity of IRF1 in structuring those effects (both p < .05).
Intimidated? Before you defer to the “experts” doing these studies, I will show you some things I noticed in the six studies and how you can debunk the relevance of these studies for promoting health and dealing with illness. Actually, I will show that even if these six studies had gotten the results that the authors claimed – and they did not – at best, the effects would be trivial and lost among the other things going on in patients’ lives.
Fortunately, there are lots of signs that you can dismiss such studies and go on to something more useful, if you know what to look for.
Some general rules:
Don’t accept claims of efficacy/effectiveness based on underpowered randomized trials. Dismiss them. A reliable rule of thumb is to dismiss trials that have fewer than 35 patients in the smallest group. Over half the time, such studies will miss true moderate-sized effects, even when they are actually there.
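A back-of-the-envelope power calculation shows where that “over half the time” figure comes from. This is a sketch using the normal approximation to the two-sample t-test, so the numbers run a point or two optimistic relative to the exact test:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(d, n_per_group, z_crit=1.959964):
    """Approximate power of a two-sided two-sample comparison of means
    for a standardized effect size d with n_per_group patients per arm
    (normal approximation; ignores negligible wrong-direction rejections)."""
    noncentrality = d * math.sqrt(n_per_group / 2.0)
    return 1.0 - normal_cdf(z_crit - noncentrality)

# A "moderate" effect (Cohen's d = 0.5):
print(round(approx_power(0.5, 35), 2))  # ~0.55 with 35 per arm
print(round(approx_power(0.5, 30), 2))  # ~0.49: below 35 per arm, a moderate
                                        # effect is missed more often than not
```

At 35 per arm the chance of detecting a moderate effect is barely better than a coin flip, and below that it drops under 50%, which is the basis for treating 35 per group as a rough dismissal threshold.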
Due to publication bias, most of the positive effects that get published from trials of this size will be false positives that won’t hold up in well-designed, larger trials.
When significant positive effects from such trials are reported in published papers, they have to be large to have reached significance. If not outright false, these effect sizes won’t be matched in larger trials. So, significant positive effect sizes from small trials are likely to be false positives or exaggerated, and probably won’t replicate. For that reason, we can consider small studies to be pilot or feasibility studies, but not as providing estimates of how large an effect size we should expect from a larger study. Investigators do it all the time, but they should not: they run power calculations estimating how many patients they need for a larger trial from the results of such small studies. No, no, no!
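This exaggeration of published effects from small trials (the “winner’s curse”) is easy to demonstrate by simulation. A minimal sketch under assumed numbers – a true effect of d = 0.3 and 20 patients per arm, chosen purely for illustration and not taken from any of the six trials:

```python
import random
import statistics

random.seed(1)

TRUE_D = 0.3                    # assumed true standardized effect
N_PER_ARM = 20                  # an underpowered trial, well under 35 per arm
SE = (2.0 / N_PER_ARM) ** 0.5   # SE of the mean difference when SD = 1
Z_CRIT = 1.96                   # two-sided 5% threshold (normal approximation)

published = []                  # effects from "significant" trials, the ones that get published
n_trials = 10_000
for _ in range(n_trials):
    observed = random.gauss(TRUE_D, SE)   # observed effect in one simulated trial
    if abs(observed) > Z_CRIT * SE:       # reaches "significance"
        published.append(observed)

power = len(published) / n_trials
print(f"power: {power:.2f}")              # around 0.16
print(f"mean published effect: {statistics.mean(published):.2f}")
# the mean published effect comes out well over double the true 0.3
```

Because only significant results survive into `published`, the surviving estimates are drawn from the extreme tail of the sampling distribution: the average published effect is more than twice the true one, which is exactly why such estimates should never feed the power calculation for a larger trial.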
Having spent decades examining clinical trials, I am generally comfortable dismissing effect sizes that come from trials with fewer than 35 patients in the smaller group. I agree with the suggestion that if two larger trials are available in a given literature, go with those and ignore the smaller studies. If there are not at least two larger studies, keep the jury out on whether there is a significant effect.
Applying the Rule of 35, five of the six trials can be dismissed, and the sixth is ambiguous because of loss of patients to follow-up. If promoters of mind-body interventions want to convince us that they have beneficial effects on physical health by conducting trials like these, they have to do better. None of the individual trials should increase our confidence in their claims. Collectively, the trials collapse in a mess without providing a single credible estimate of effect size. This attests to the poor quality of evidence and disrespect for methodology that characterizes this literature.
Don’t be taken in by titles of peer-reviewed articles that are themselves an announcement that these interventions work. Titles may not be telling the truth.
What I found extraordinary is that five of the six randomized trials had a title indicating that a positive effect was found. I suspect that most people encountering the title will not actually go on to read the study. So they will be left with the false impression that positive results were indeed obtained. It’s quite a clever trick to make the title of an article, by which most people will remember it, into a false advertisement for what was actually found.
For a start, we can simply remind ourselves that with these underpowered studies, investigators should not even be making claims about efficacy/effectiveness. So, one trick of the developing skeptic is to confirm that the claims being made in the title don’t fit with the size of the study. However, actually going to the results section, one can find other discrepancies between what was found and what is being claimed.
I think it’s a good general rule of thumb to be careful with titles of reports of randomized trials that declare results. Even when what is claimed in the title fits the actual results, it often creates an illusion of greater consistency with what already exists in the literature. Furthermore, even when future studies inevitably fail to replicate what is claimed in the title, the false claim lives on, because failure to replicate key findings is almost never grounds for retracting a paper.
Check the institutional affiliations of the authors. These 6 trials serve as a depressing reminder that we can’t rely on researchers’ institutional affiliations, or their having federal grants, to reassure us of the validity of their claims. These authors are not from Quack-Quack University, and they get funding for their research.
In all cases, the investigators had excellent university affiliations, mostly in California. Most studies were conducted with some form of funding, often federal grants. A quick check of Google would reveal that at least one of the authors on each study, usually more, had federal funding.
Check the conflict of interest statements, but don’t expect the declarations to be informative, and be skeptical of what you find. It is disappointing that a check of the conflict of interest statements for these articles would be unlikely to arouse suspicion that the claimed results might have been influenced by financial interests. One cannot readily see that the studies were generally done in settings promoting alternative, unproven treatments that would benefit from the publicity generated by the studies. One cannot see that some of the authors have lucrative book contracts and speaking tours that require making claims for dramatic effects of mind-body treatments, claims that could not possibly be supported by transparent reporting of the results of these studies. As we will see, one of the studies was actually conducted in collaboration with Deepak Chopra and with money from his institution. That would definitely raise flags in the skeptic community. But the dubious tie might be missed by patients and their families vulnerable to unwarranted claims and unrealistic expectations of what can be obtained outside of conventional medicine, like chemotherapy, surgery, and pharmaceuticals.
Based on what I found probing these six trials, I can suggest some further rules of thumb. (1) Don’t assume for articles about health effects of alternative treatments that all relevant conflicts of interest are disclosed. Check the setting in which the study was conducted and whether an integrative [complementary and alternative, meaning mostly unproven] care setting was used for recruiting or running the trial. Not only would this represent potential bias on the part of the authors, it would represent selection bias in the recruitment of patients and in their responsiveness to placebo effects consistent with the marketing themes of these settings. (2) Google the authors and see if they have lucrative pop psychology book contracts, TED talks, or speaking gigs at positive psychology or complementary and alternative medicine gatherings. None of these lucrative activities is typically expected to be disclosed as a conflict of interest, but all require making strong claims that are not supported by available data. Such rewards are perverse incentives for authors to distort and exaggerate positive findings and to suppress negative findings in peer-reviewed reports of clinical trials. (3) Check whether known quacks have prepared recruitment videos for the study, informing patients what will be found. (Seriously, I was tipped off to look, and that is what I found.)
Look for the usual suspects. A surprisingly small, tight, interconnected group is generating this research. You can look the authors up on Google or Google Scholar, or browse through my previous blog posts and see what I have said about them. As I will point out in my next blog post, one got withering criticism for her claim that drinking carbonated sodas, but not sweetened fruit drinks, shortened your telomeres, so that drinking soda was worse than smoking. My colleagues and I re-analyzed the data of another of the authors. We found that, contrary to what he claimed, pursuing meaning rather than pleasure in your life did not affect gene expression related to immune function. We also showed that substituting randomly generated data worked as well as what he got from blood samples in replicating his original results. I don’t think it is ad hominem to point out that both of these authors have a history of making implausible claims. It speaks to source credibility.
Check whether there is a trial registration for a study, but don’t stop there. You can quickly check with PubMed whether a report of a randomized trial is registered. Trial registration is intended to ensure that investigators commit themselves in advance to a primary outcome, or maybe two. You can then check whether what is said in the report of the trial fits with what was promised in the protocol. Unfortunately, I could find a registration for only one of these trials. That registration was vague about which outcome variables would be assessed and did not mention the outcome emphasized in the published paper (!). The registration also said the sample would be larger than what was reported in the published study. When researchers have difficulty with recruitment, their study is often compromised in other ways. I’ll show how this study was compromised.
Well, it turns out that applying these generally useful rules of thumb is not always so easy with these studies. I think the small sample sizes across all of the studies would be enough to decide that this research has yet to yield meaningful results, and it certainly does not support the claims that are being made.
But readers who are motivated to put in the time probing deeper will come up with strong signs of p-hacking and questionable research practices.
Check the report of the randomized trial and see if you can find any declaration of one or two primary outcomes and a limited number of secondary outcomes. What you will find instead is that the studies consistently have more outcome variables than patients receiving the interventions. The opportunities for cherry picking positive findings and discarding the rest are huge, especially because it is so hard to assess what data were collected but not reported.
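The arithmetic behind this cherry picking is simple. If an intervention truly does nothing and an investigator tests many outcomes at p < .05, the chance of at least one "significant" finding balloons. A minimal Python sketch (the outcome counts are hypothetical, and the calculation assumes the tests are independent, which only understates the problem of undisclosed flexibility):

```python
def familywise_error_rate(k_outcomes, alpha=0.05):
    """Probability of at least one false positive among k independent
    outcome tests when the intervention truly does nothing."""
    return 1 - (1 - alpha) ** k_outcomes

for k in (1, 10, 20, 40):
    print(f"{k:>2} outcomes -> {familywise_error_rate(k):.0%} chance of at least one 'finding'")
```

With 20 outcome variables, a completely inert treatment has roughly a two-in-three chance of producing something to crow about.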
Check whether you can find tables of unadjusted primary and secondary outcomes. Honest and transparent reporting involves giving readers a look at simple statistics so they can decide whether the results are meaningful. For instance, if effects on stress and depressive symptoms are claimed, are the results impressive and clinically relevant? In almost all cases, there is no peeking allowed. Instead, the authors provide analyses and statistics with lots of adjustments. They break lots of rules in doing so, especially with such small samples. These authors are virtually assured of getting results to crow about.
Famously, Joe Simmons and Leif Nelson hilariously published claims that briefly listening to the Beatles’ “When I’m 64” left students a year and a half younger than if they were assigned to listen to “Kalimba.” Simmons and Nelson knew this was nonsense, but their intent was to show what researchers can do if they have free rein in how they analyze their data and what they report. They revealed the tricks they used, but those tricks were minor league and amateurish compared to what the authors of these trials consistently did in claiming that yoga, tai chi, and mindfulness modified expression of DNA.
Stay tuned for my next blog post, where I go through the six studies. But consider this if you or a loved one must make an immediate decision about whether to plunge into the world of woo-woo unproven medicine in hopes of altering DNA expression: I will show that the authors of these studies did not get the results they claimed. But who should care if they did? The effects were laughably trivial. As the authors of the review about which I have been complaining noted:
One other problem to consider are the various environmental and lifestyle factors that may change gene expression in similar ways to MBIs [Mind-Body Interventions]. For example, similar differences can be observed when analyzing gene expression from peripheral blood mononuclear cells (PBMCs) after exercise. Although at first there is an increase in the expression of pro-inflammatory genes due to regeneration of muscles after exercise, the long-term effects show a decrease in the expression of pro-inflammatory genes (55). In fact, 44% of interventions in this systematic review included a physical component, thus making it very difficult, if not impossible, to discern between the effects of MBIs from the effects of exercise. Similarly, food can contribute to inflammation. Diets rich in saturated fats are associated with pro-inflammatory gene expression profile, which is commonly observed in obese people (56). On the other hand, consuming some foods might reduce inflammatory gene expression, e.g., drinking 1 l of blueberry and grape juice daily for 4 weeks changes the expression of the genes related to apoptosis, immune response, cell adhesion, and lipid metabolism (57). Similarly, a diet rich in vegetables, fruits, fish, and unsaturated fats is associated with anti-inflammatory gene profile, while the opposite has been found for Western diet consisting of saturated fats, sugars, and refined food products (58). Similar changes have been observed in older adults after just one Mediterranean diet meal (59) or in healthy adults after consuming 250 ml of red wine (60) or 50 ml of olive oil (61). However, in spite of this literature, only two of the studies we reviewed tested if the MBIs had any influence on lifestyle (e.g., sleep, diet, and exercise) that may have explained gene expression changes.
How about taking tango lessons instead? You would at least learn dance steps, get exercise, and decrease any social isolation. And so what if there were no more benefit than from taking up these other activities?
The trial was registered long after patient recruitment had started, and the trial protocol can be found here.
[Aside: What is the value of registering a trial long after recruitment has commenced? Do journals have a responsibility to acknowledge that a trial registration link they publish is for a registration made after the trial commenced? Is trial registration just another ritual, like acupuncture?]
Uncritical reports of the results of the trial as interpreted by the authors echoed through both the lay and physician-aimed media.
Coverage by Reuters was somewhat more interesting than the rest. The trial authors’ claim that acupuncture for preventing migraines was ready for prime time was paired with some reservations expressed in the accompanying editorial.
“Placebo response is strong in migraine treatment studies, and it is possible that the Deqi sensation . . . that was elicited in the true acupuncture group could have led to a higher degree of placebo response because there was no attempt made to elicit the Deqi sensation in the sham acupuncture group,” Dr. Amy Gelfand writes in an accompanying editorial.
Come on, Dr. Gelfand: if you checked the article, you would have seen that Deqi was not measured. If you checked the literature, even proponents concede that Deqi remains a vague, highly subjective judgment, in this case being made by an unblinded acupuncturist. Basically, the acupuncturist persisted in whatever was being done until a sensation of soreness, numbness, distention, or radiating seemed to be elicited from the patient. What part of a subjective response to acupuncture, with or without Deqi, would you consider NOT a placebo response?
Dr. Gelfand also revealed some reasons why she would bother to write an editorial for a treatment with an incoherent, implausible, nonscientific rationale.
“When I’m a researcher, placebo response is kind of a troublesome thing, because it makes it difficult to separate signal from noise,” she said. But when she’s thinking as a doctor about the patient in front of her, placebo response is welcome, Gelfand said.
“You know, what I really want is my patient to feel better, and to be improved and not be in pain. So, as long as something is safe, even if it’s working through a placebo mechanism, it may still be something that some patients might want to use,” she said.
Let’s contemplate the implications of this. This editorial in JAMA Internal Medicine accompanies an article in which the trial authors suggest acupuncture is ready to become a standard treatment for migraine. There is nothing in the article suggesting that the unscientific basis of acupuncture had been addressed, only that it might have achieved a placebo response. Is Dr. Gelfand suggesting that would be sufficient, despite some problems in the trial? What if that became the standard for recommending medications and medical procedures?
With increasing success in getting acupuncture and other so-called “integrative medicine” approaches ensconced in cancer centers and reimbursed by insurance, we will be facing again and again some of the issues that started this blog post. Is acupuncture’s not doing obvious harm a reason for reimbursing it? Trials like this one can be cited in support of reimbursement.
The JAMA Internal Medicine report of an RCT of acupuncture for preventing migraines
Participants were randomly assigned to one of three groups: true acupuncture, sham acupuncture, or a waiting-list control group.
Participants in the true acupuncture and sham acupuncture groups received treatment 5 days per week for 4 weeks for a total of 20 sessions.
Participants in the waiting-list group did not receive acupuncture but were informed that 20 sessions of acupuncture would be provided free of charge at the end of the trial.
As the editorial comment noted, this is incredibly intensive treatment that burdens patients with coming in five days a week for four weeks. Yet the effects were quite modest in terms of the number of migraine attacks, even if statistically significant:
The mean (SD) change in frequency of migraine attacks differed significantly among the 3 groups at 16 weeks after randomization (P < .001); the mean (SD) frequency of attacks decreased in the true acupuncture group by 3.2 (2.1), in the sham acupuncture group by 2.1 (2.5), and the waiting-list group by 1.4 (2.5); a greater reduction was observed in the true acupuncture than in the sham acupuncture group (difference of 1.1 attacks; 95%CI, 0.4-1.9; P = .002) and in the true acupuncture vs waiting-list group (difference of 1.8 attacks; 95%CI, 1.1-2.5; P < .001). Sham acupuncture was not statistically different from the waiting-list group (difference of 0.7 attacks; 95%CI, −0.1 to 1.4; P = .07).
There were no group-by-time differences in the use of medication for migraine. Receiving “true” versus sham acupuncture did not matter.
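To put the reported numbers on a common scale, one can compute a rough standardized effect size from the means and SDs quoted above. A back-of-envelope Python sketch (the simple pooled-SD formula assumes roughly equal group sizes, which is approximately true in a three-arm trial like this):

```python
import math

def cohens_d(mean_diff, sd1, sd2):
    """Standardized mean difference using a simple pooled SD
    (assumes roughly equal group sizes)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return mean_diff / pooled_sd

# Reductions in monthly migraine attacks from the published report:
# true acupuncture 3.2 (SD 2.1), sham 2.1 (SD 2.5), waiting list 1.4 (SD 2.5)
print(f"true vs sham:         d = {cohens_d(3.2 - 2.1, 2.1, 2.5):.2f}")
print(f"true vs waiting list: d = {cohens_d(3.2 - 1.4, 2.1, 2.5):.2f}")
```

The true-versus-sham difference of about one attack works out to roughly half a standard deviation, before any consideration of the unblinded providers and subjective outcomes discussed below.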
Four acupoints were used per treatment. All patients received acupuncture on 2 obligatory points, including GB20 and GB8. The 2 other points were chosen according to the syndrome differentiation of meridians in the headache region. The potential acupoints included SJ5, GB34, BL60, SI3, LI4, ST44, LR3, and GB40.20. The use of additional acupoints other than the prescribed ones was not allowed. We chose the prescriptions as a result of a systematic review of ancient and modern literature,22,23 consensus meetings with clinical experts, and experience from our previous study.
Note that the “headache region” is not the region of the head where headaches occur, and there is no scientific basis for the selection. Since when does such a stir fry of ancient and contemporary wisdom, consensus meetings with experts, and the clinical experience of the investigators become the justification for the mechanism studied in a clinical trial published in a prestigious American medical journal?
What was sham about the sham acupuncture (SA) treatment?
The number of needles, electric stimulation, and duration of treatment in the SA group were identical in the TA group except that an attempt was not made to induce the Deqi sensation. Four nonpoints were chosen according to our previous studies.
From the trial protocol, we learn that the effort to induce the Deqi sensation involves the acupuncturist twirling and rotating the needles.
In a manner that can easily escape notice, the authors indicate that the acupuncture was administered with electrostimulation.
In the methods section, they abruptly state:
Electrostimulation generates an analgesic effect, as manual acupuncture does.21
I wonder if the reviewers or the editorialist checked this reference. It is to an article that provides the insight that “meridians” (the 365 designated acupuncture points) are identified on a particular patient by
feeling for 12 organ-specific pulses located on the wrists and with cosmological interpretations including a representation of five elements: wood, water, metal, earth, and fire.
The authors further state that they undertook a program of research to counter the perception in the United States in the 1970s that acupuncture was quackery and even “Oriental hypnosis.” Their article describes some of the experiments they conducted, including one in which the benefits of finger-pressure acupuncture received by one rabbit were transferred to another via a transfusion of cerebrospinal fluid.
In discussing the results of the present study in JAMA Internal Medicine, the authors again comment in passing:
We added electrostimulation to manual acupuncture because manual acupuncture requires more time until it reaches a similar analgesic effect as electrical stimulation.27 Previous studies have reported that electrostimulation is better than manual acupuncture in relieving pain27-30 and could induce a longer lasting effect.28
The citations are to methodologically poor laboratory studies in which dramatic results were obtained with very small cell sizes (n = 10).
Can we dispense with the myth that the acupuncture provided in this study is an extension of traditional Chinese needle therapy?
It is high time that we dispense with the notion that acupuncture applied to migraines and other ailments represents a traditional Chinese medicine that is therefore not subject to any effort to critique its plausibility and status as a science-based treatment. If we dispense with that idea, we still have to confront how unscientific and nonsensical the rationale is for the highly ritualized treatment provided in this study.
What we are dealing with is the reformed and “sanitized” acupuncture and the makeshift theoretical framework of Maoist China that have flourished in the West as “Traditional,” “Chinese,” “Oriental,” and most recently “Asian” medicine.
Kavoussi, who studied to become an acupuncturist, notes that:
Traditional theories for selecting points and means of stimulation are not based on an empirical rationale, but on ancient cosmology, astrology and mythology. These theories significantly resemble those that underlined European and Islamic astrological medicine and bloodletting in the Middle-Ages. In addition, the alleged predominance of acupuncture amongst the scholarly medical traditions of China is not supported by evidence, given that for most of China’s long medical history, needling, bloodletting and cautery were largely practiced by itinerant and illiterate folk-healers, and frowned upon by the learned physicians who favored the use of pharmacopoeia.
In the early 1930s a Chinese pediatrician by the name of Cheng Dan’an (承淡安, 1899-1957) proposed that needling therapy should be resurrected because its actions could potentially be explained by neurology. He therefore repositioned the points towards nerve pathways and away from blood vessels-where they were previously used for bloodletting. His reform also included replacing coarse needles with the filiform ones in use today.38 Reformed acupuncture gained further interest through the revolutionary committees in the People’s Republic of China in the 1950s and 1960s along with a careful selection of other traditional, folkloric and empirical modalities that were added to scientific medicine to create a makeshift medical system that could meet the dire public health and political needs of Maoist China while fitting the principles of Marxist dialectics. In deconstructing the events of that period, Kim Taylor in her remarkable book on Chinese medicine in early communist China, explains that this makeshift system has achieved the scale of promotion it did because it fitted in, sometimes in an almost accidental fashion, with the ideals of the Communist Revolution. As a result, by the 1960s acupuncture had passed from a marginal practice to an essential and high-profile part of the national health-care system under the Chinese Communist Party, who, as Kim Taylor argues, had laid the foundation for the institutionalized and standardized format of modern Chinese medicine and acupuncture found in China and abroad today.39 This modern construct was also a part of the training of the “barefoot doctors,” meaning peasants with an intensive three- to six-month medical and paramedical training, who worked in rural areas during the nationwide healthcare disarray of the Cultural Revolution era.40 They provided basic health care, immunizations, birth control and health education, and organized sanitation campaigns. 
Chairman Mao believed, however, that ancient natural philosophies that underlined these therapies represented a spontaneous and naive dialectical worldview based on social and historical conditions of their time and should be replaced by modern science.41 It is also reported that he did not use acupuncture and Chinese medicine for his own ailments.42
What is a suitable comparison/control group for a theatrical administration of a placebo?
A randomized, double-blind, crossover pilot study published in NEJM highlights some of the problems arising from poorly chosen control groups. The study compared an inhaled albuterol bronchodilator to each of three control conditions: placebo inhaler, sham acupuncture, or no intervention. Subjective self-report measures of perceived improvement in asthma symptoms and perceived credibility of the treatments revealed only that the no-intervention condition was inferior to the active treatment and the two placebo conditions; no difference was found between the active treatment and the placebo conditions. However, strong differences were found between the active treatment and the three comparison/control conditions on an objective physiological measure: improvement in forced expiratory volume (FEV1), measured with spirometry.
One take-away lesson is that we should be careful about accepting subjective self-report measures when objective measures are available. One objective measure in the present study was the taking of medication for migraines, and there were no differences between groups. This point is missed in both the target article in JAMA Internal Medicine and the accompanying editorial.
The editorial does comment on the acupuncturists being unblinded: they clearly knew when they were providing the preferred “true” acupuncture and when they were providing sham. They had instructions to avoid creating a Deqi sensation in the sham group, but some latitude in working until it was achieved in the “true” group. Unblinded treatment providers are always a serious risk of bias in clinical trials, but here we have a trial where the primary outcomes are subjective, the scientific status of Deqi is dubious, and the providers might be seen as highly motivated to promote the “true” treatment.
I’m not sure why the editorialist was not stopped in her tracks by the unblinded acupuncturists, or, for that matter, why the journal published this article. But let’s ponder a bit the difficulties in coming up with a suitable comparison/control group for what is, until proven otherwise, a theatrical and highly ritualized placebo. If a treatment has no scientifically valid crucial ingredient, how do we construct a comparison/control group that differs only in the absence of the active ingredient but is otherwise equivalent?
There is a long history of futile efforts to define sham acupuncture by needling at what practitioners consider the inappropriate meridians. An accumulation of failures to distinguish such sham from “true” acupuncture in clinical trials has led to arguments that the distinction may not be valid: the efficacy of acupuncture may depend only on the procedure, not on choice of a correct meridian. Other studies would seem to show some advantage for the active or “true” treatments, but these are generally clinical trials with high risk of bias, especially from the inability to blind practitioners as to which treatment they are providing.
There have been some clever efforts to develop sham acupuncture techniques that can fool even experienced practitioners. A recent PLOS One article tested needles that collapse into themselves.
Up to 68% of patients and 83% of acupuncturists correctly identified the treatment, but for patients the distribution was not far from 50/50. Also, there was a significant interaction between actual or perceived treatment and the experience of de qi (p = 0.027), suggesting that the experience of de qi and possible non-verbal clues contributed to correct identification of the treatment. Yet, of the patients who perceived the treatment as active or placebo, 50% and 23%, respectively, reported de qi. Patients’ acute pain levels did not influence the perceived treatment. In conclusion, acupuncture treatment was not fully double-blinded which is similar to observations in pharmacological studies. Still, the non-penetrating needle is the only needle that allows some degree of practitioner blinding. The study raises questions about alternatives to double-blind randomized clinical trials in the assessment of acupuncture treatment.
Thirty-six studies were included for qualitative analysis while 14 were in the meta-analysis. The meta-analysis does not support the notion of either the Streitberger or the Park Device being inert control interventions while none of the studies involving the Takakura Device was included in the meta-analysis. Sixteen studies reported the occurrence of adverse events, with no significant difference between verum and placebo acupuncture. Author-reported blinding credibility showed that participant blinding was successful in most cases; however, when blinding index was calculated, only one study, which utilised the Park Device, seemed to have an ideal blinding scenario. Although the blinding index could not be calculated for the Takakura Device, it was the only device reported to enable practitioner blinding. There are limitations with each of the placebo devices and more rigorous studies are needed to further evaluate their effects and blinding credibility.
Really, must we await better technology that more successfully fools acupuncturists and their patients as to whether the skin is actually being penetrated?
Results Between baseline and weeks 9 to 12, the mean (SD) number of days with headache of moderate or severe intensity decreased by 2.2 (2.7) days from a baseline of 5.2 (2.5) days in the acupuncture group compared with a decrease to 2.2 (2.7) days from a baseline of 5.0 (2.4) days in the sham acupuncture group, and by 0.8 (2.0) days from a baseline of 5.4 (3.0) days in the waiting list group. No difference was detected between the acupuncture and the sham acupuncture groups (0.0 days, 95% confidence interval, −0.7 to 0.7 days; P = .96) while there was a difference between the acupuncture group compared with the waiting list group (1.4 days; 95% confidence interval; 0.8-2.1 days; P<.001). The proportion of responders (reduction in headache days by at least 50%) was 51% in the acupuncture group, 53% in the sham acupuncture group, and 15% in the waiting list group.
Conclusion Acupuncture was no more effective than sham acupuncture in reducing migraine headaches although both interventions were more effective than a waiting list control.
I welcome someone with more time on their hands to compare and contrast the results of these two studies and decide which one has more credibility.
Maybe we should step back and ask why anyone should care about such questions, when there is such doubt that a plausible scientific mechanism is in play.
Time for JAMA Internal Medicine to come clean
The JAMA Internal Medicine article on acupuncture for prophylaxis of migraines is yet another example of a publication where revelation of earlier drafts, reviewer critiques, and author responses would be enlightening. Just what standard are the authors being held to? What issues were raised in the review process? Beyond resolving crucial limitations like the blinding of acupuncturists, under what conditions would the journal conclude that studies of acupuncture in general are too scientifically unsound and medically irrelevant to warrant publication in a prestigious JAMA journal?
Alternatively, is the journal willing to go on record that it is sufficient to establish that patients are satisfied with a pain treatment in terms of self-reported subjective experiences? Could we then simply close the issue of whether there is a plausible scientific mechanism involved, when the existence of one can be seriously doubted? If so, why stop with evaluations using subjective pain or days without pain as the primary outcome?
We must question the wisdom of JAMA Internal Medicine in inviting Dr. Amy Gelfand to provide editorial comment. She is apparently willing, as a clinician, to accept demonstration of a placebo response as sufficient. She is also attached to the University of California, San Francisco Headache Center, which offers “alternative medicine, such as acupuncture, herbs, massage and meditation for treating headaches.” Endorsement of acupuncture as effective in a prestigious journal becomes part of the evidence considered for its reimbursement. I think there are enough editorial commentators out there without such conflicts of interest.
I will soon be offering scientific writing courses on the web as I have been doing face-to-face for almost a decade. Sign up at my new website to get notified about these courses, as well as upcoming blog posts at this and other blog sites. Get advance notice of forthcoming e-books and web courses. Lots to see at CoyneoftheRealm.com.
According to the website of an advocacy foundation, coverage of two recent clinical trials published in the Journal of Psychopharmacology evaluating psilocybin for distress among cancer patients garnered over 1 billion views on social media. To put that in context, the advocacy group claimed that this is one sixth of the attention that the Super Bowl received.
In this blog post, I’ll review the second of the two clinical trials. Then I will discuss some reasons why we should be concerned about the success of this public relations campaign in terms of what it means for the integrity of scientific publishing, as well as for health and science journalism.
The issue is not that I doubt cancer patients will find benefit from ingesting psychedelic mushrooms in a safe environment, nor that I endorse the current criminalization of the sale and ingestion of psilocybin (Schedule 1, the same classification as heroin).
We can appreciate the futility of the war on drugs, and the absurdity of the criminalization of psilocybin, but still object to how we were strategically and effectively manipulated by this PR campaign.
Even if we approve of a cause, we need to be careful about subordinating the peer-review process and independent press coverage to the intended message of advocates.
Tolerating causes being promoted in this fashion undermines the trustworthiness of peer review and of independent press coverage of scientific papers.
To contradict a line from the 1964 acceptance speech of Republican presidential candidate Barry Goldwater: “Extremism in pursuit of virtue is [a] vice.”
In this PR campaign, we witnessed the breakdown of the expected buffer of checks and balances between:
An advocacy group versus reporting of clinical trials in a scientific journal evaluating its claims.
Investigators’ exaggerated self-promotional claims versus editorial review and peer commentary.
Materials from the publicity campaign versus supposedly independent evaluation by journalists.
Is this part of a larger trend, where advocacy and marketing shape supposedly peer-reviewed publications in prestigious medical journals?
The public relations campaign for the psilocybin RCTs also left in tatters the credibility of altmetrics as an alternative to journal impact factors. The orchestration of 1 billion views is a dramatic demonstration of how readily altmetrics can be gamed. Articles published in a journal with a modest impact factor scored spectacularly, as seen in the altmetrics graphics the Journal of Psychopharmacology posted.
I reviewed one of the clinical trials in detail in my last blog post and will review the second in this one. Both are mediocre, poorly designed clinical trials that were lavishly praised as being of the highest quality by an impressive panel of commentators. I’ll suggest that the second trial in particular is best seen as what Barney Carroll has labeled an experimercial: a clinical trial aimed at generating enthusiasm for a product, rather than a dispassionate evaluation undertaken with some possibility of not being able to reject the null hypothesis. If this sounds harsh, please indulge me and read on. I think you will be entertained, and persuaded that this was not a clinical trial but an elaborate ritual, complete with psychobabble woo that has no place in a discussion of the safety and effectiveness of a medicine.
After skeptically scrutinizing the second trial, I’ll consider the commentaries and media coverage of the two trials.
I’ll end with a complaint that this PR effort is aimed only at securing the right of wealthy people with cancer to obtain psilocybin under the supervision of a psychiatrist and in the context of woo psychotherapy. The risk to other people ingesting psilocybin in other circumstances is deliberately exaggerated. If psilocybin is as safe and beneficial as these articles claim, why should use remain criminalized for persons who don’t have cancer, don’t want to get a phony diagnosis from a psychiatrist, or don’t want to submit to woo psychotherapy?
The normally paywalled Journal of Psychopharmacology granted free access to the two articles, along with most but not all of the commentaries. However, extensive uncritical coverage in Medscape Medical News provides a fairly accurate summary, complete with direct quotes of the lavish self-praise distributed by the advocacy-affiliated investigators and echoed in seemingly tightly coordinated commentaries.
Here is the praise that one of the two senior authors heaped upon the two studies, as captured by Medscape Medical News and echoed elsewhere:
The new findings have “the potential to transform the care of cancer patients with psychological and existential distress, but beyond that, it potentially provides a completely new model in psychiatry of a medication that works rapidly as both an antidepressant and anxiolytic and has sustained benefit for months,” Stephen Ross, MD, director of Substance Abuse Services, Department of Psychiatry, New York University (NYU), Langone Medical Center, told Medscape Medical News.
“That is potentially earth shattering and a big paradigm shift within psychiatry,” Dr Ross told Medscape Medical News.
The trial’s registration at ClinicalTrials.gov is available here.
The trial’s website is rather drab and typical for clinical trials. It contrasts sharply with the slick PR of the website for the NYU trial. The latter includes a gushy, emotional video from a clinical psychologist participating as a patient in the study. She delivers a passionate pitch for the “wonderful ritual” of the transformative experimental session. You can also get a sense of how the session monitor structured the session and cultivated positive expectations, and of how the psilocybin experience is being slickly marketed to appeal to the same well-heeled patients who pay out-of-pocket for complementary and alternative medicine at integrative medicine centers.
Conflict of interest
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Roland Griffiths is on the Board of Directors of the Heffter Research Institute.
Heffter Research Institute is listed as one of the funders of the study.
The Hopkins study starts with some familiar claims from psycho-oncology that portray cancer as a mental health issue. The exaggerated estimate that 40% of cancer patients experience a mood disorder is arrived at by lumping adjustment reactions together with a smaller proportion of diagnoses of generalized anxiety and major depression.
The introduction ends with a strong claim to the rigor and experimental control exercised in the clinical trial:
The present study provides the most rigorous evaluation to date of the efficacy of a classic hallucinogen for treatment of depressed mood and anxiety in psychologically distressed cancer patients. The study evaluated a range of clinically relevant measures using a double-blind cross-over design to compare a very low psilocybin dose (intended as a placebo) to a moderately high psilocybin dose in 51 patients under conditions that minimized expectancy effects.
The methods and results
In a nutshell: Despite claims to the contrary, this study cannot be considered a blinded study. At the six-month follow-up, which is the outcome assessment point of greatest interest, it could no longer meaningfully be considered a randomized trial; all benefits of randomization were lost. In addition, the effects of psilocybin were confounded with a woo psychotherapy in which positive expectations and support were provided and reinforced in a way that likely influenced assessments of outcome. Outcomes at six months also reflected changes in distress that would have occurred in the absence of treatment. The sample is inappropriate for generalizations about the treatment of major depression and generalized anxiety, and the characterization of patients as facing impending death is inaccurate.
The study involved a crossover design, which provides a lower level of evidence than a placebo-controlled comparison study. It compared a high psilocybin dose (22 or 30 mg/70 kg) with a low dose (1 or 3 mg/70 kg) administered in identical-appearing capsules. While the low dose might not be homeopathic, it could readily be distinguished from the larger dose soon after administration. The second drug administration occurred approximately 5 weeks later. Not surprisingly, given the large difference in dosage, session monitors who were supposedly blinded readily identified the group to which the participant they were observing had been assigned.
Within a crossover design, the six-month follow-up data basically attribute any naturalistic decline in distress to the drug treatments. As David Colquhoun would argue, any estimate of the effects of the drug was inflated by including regression to the mean and get-better-anyway effects. Furthermore, the focus on outcomes at six months meant that patients assigned to either group in the crossover design had received high-dose psilocybin by at least five weeks into the study. Any benefits of randomization were lost.
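To make the regression-to-the-mean point concrete, here is a minimal simulation of my own (not drawn from the trial; all numbers are illustrative assumptions). Patients are enrolled only if their observed baseline distress is elevated, the drug has zero effect, and there is only a small natural recovery, yet the uncontrolled pre-post comparison still shows a sizable “improvement”:

```python
import random
import statistics

random.seed(1)

def observed(true_score):
    # observed distress = stable true score + measurement noise
    return true_score + random.gauss(0, 8)

# hypothetical population of patients (true distress scores)
patients = [random.gauss(50, 10) for _ in range(100_000)]

# the trial enrolls only those whose *observed* baseline is elevated
enrolled = [(t, b) for t in patients for b in [observed(t)] if b >= 60]

baseline = [b for t, b in enrolled]
# follow-up with NO drug effect, plus a small (3-point) natural recovery
followup = [observed(t - 3) for t, b in enrolled]

drop = statistics.mean(baseline) - statistics.mean(followup)
print(f"apparent improvement with zero drug effect: {drop:.1f} points")
```

Most of the apparent drop comes from having selected patients on a noisy, elevated baseline; without a concurrent randomized control at six months, this cannot be separated from a drug effect.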
Like the NYU study, the Johns Hopkins study involved selecting a small, unrepresentative sample from a larger group responding to a mixed recruitment strategy of flyers, the internet, and physician referral.
Less than 10% of the cancer patients calling in were randomized.
Almost half of the final sample were currently using marijuana and, similarly, almost half had used hallucinogens in the past.
The sample was relatively young for cancer patients and well educated. More than half had postgraduate education, and almost all were white; only two participants were black.
The sample is quite heterogeneous with respect to psychiatric diagnoses, with almost half having an adjustment disorder, and the rest anxiety and mood disorders.
In terms of cancer diagnoses and staging, it was also a select and heterogeneous group, with only about a quarter having recurrent/metastatic disease with less than two years of expected survival. This suggests that the adjective “life-threatening” in the title is misleading.
Any mental health effects of psilocybin as a drug are inseparable from the effects of the accompanying psychotherapy designed by a clinical psychologist “with extensive experience in studies of classic hallucinogens.” Participants met with that “session monitor” several times before the session in which the psilocybin was ingested, and the monitor guided and aided in the interpretation of the drug experience. Aside from providing therapy, the session monitor instructed the patient to have positive expectations before ingesting the drug and to work to maintain these expectations throughout the experience.
I found this psychotherapeutic aspect of the trial strikingly similar to one that was included in a trial of homeopathy in Germany that I accepted for publication in PLOS One. [See here for my rationale for accepting the trial and the ensuing controversy.] Trials of alternative therapies notoriously have such an imbalance of nonspecific placebo factors favoring the intervention group.
The clinical trial registration indicates that the primary outcome was the Pahnke-Richards Mystical Experience Questionnaire. This measure is included among 20 participant questionnaires listed in Table 3 of the article as completed seven hours after administration of psilocybin. Although I haven’t reviewed all of these measures, I’m skeptical about their psychometric development, intercorrelation, and validation beyond face validity. What could possibly be learned from administering such a battery?
The authors make unsubstantiated assumptions in suggesting that these measures either individually or collectively capture mediation of later response assessed by mental health measures. A commentary echoed this:
Mediation analysis indicates that the mystical experience was a significant mediator of the effects of psilocybin dose on therapeutic outcomes.
But one of the authors of the commentary later walked that back with a statement to Medscape Medical News:
As for the mystical experiences that some patients reported, it is not clear whether these are “a cause, consequence or corollary of the anxiolytic effect or unconstrained cognition.”
Clinical outcomes at six months are discussed in terms of multiple measures derived from the unblinded, clinician-rated Hamilton scales. However, there are repeated references to box scores of the number of significant findings from at least 17 clinical measures (for instance, significant effects for 11 of the 17 measures), in addition to other subjective patient and significant-other measures. It is unclear why the authors would choose to administer so many measures that are very likely highly intercorrelated.
There were no adverse events attributed to administration of psilocybin, and while there were a number of adverse psychological effects during the psilocybin sessions, none was deemed serious.
My summary evaluation
The clinical trial registration indicates broad inclusion criteria, which may suggest the authors anticipated difficulty both in recruiting patients who had a significant psychiatric disorder for which psychotropic medication would be appropriate and in obtaining cancer patients who actually had poorer prognoses. Regardless, descriptions of the study as focusing on anxiety and depression and on “life-threatening” cancer seem to be marketing. You typically do not see a mixed sample with a large proportion of adjustment reactions characterized in the title of a psychiatric journal article as treatment of “anxiety” and “depression”. You typically do not see the adjective “life-threatening” in the title of an oncology article with such a mixed sample of cancer patients.
The authors could readily have anticipated that at the six-month assessment point of interest they would no longer have a comparison that could be described as a rigorous double-blind, randomized trial. They should have thought through exactly what was being controlled by a comparison group receiving a minimal dose of psilocybin. They should have been clearer that they were not simply evaluating psilocybin, but psilocybin administered in the context of a psychotherapy, with an induction of strong positive expectations and a promise of psychological support.
The finding of a lack of adverse events is consistent with a large literature, but it is contradicted by the way the study was described to the media.
The accompanying editorial and commentary
Medscape Medical News reports that the numerous commentaries accompanying these two clinical trials were hastily assembled. Many of the commentaries read that way, with the authors uncritically passing on the psilocybin authors’ lavish self-praise of their work, after a lot of redundant recounting of the chemical nature of psilocybin and its history in psychiatry. When I repeatedly encountered claims that these trials represented rigorous, double-blinded clinical trials, or suggestions that the cancer was in a terminal phase, I assumed that the authors had not read the studies, only the publicity material, or had simply suspended all commitment to truth.
I have great admiration for David Nutt and respect his intellectual courage in campaigning for the decriminalization of recreational drugs, even when he knew that it would lead to his dismissal as chairman of the UK’s Advisory Council on the Misuse of Drugs (ACMD). He has repeatedly countered irrationality and prejudice with solid evidence. His graph depicting the harms of various substances to users and others deserves the wide distribution it has received.
He ends his editorial with praise for the two trials as “the most rigorous double-blind placebo-controlled trials of a psychedelic drug in the past 50 years.” I’ll give him a break and assume that this reflects his dismal assessment of the quality of the other trials. I applaud his declaration, found nowhere else in the commentaries, that:
There was no evidence of psilocybin being harmful enough to be controlled when it was banned, and since then, it has continued to be used safely by millions of young people worldwide with a very low incidence of problems. In a number of countries, it has remained legal, for example in Mexico where all plant products are legal, and in Holland where the underground bodies of the mushrooms (so-called truffles) were exempted from control.
His description of the other commentaries accompanying the two trials is apt:
The honours list of the commentators reads like a ‘who’s who’ of American and European psychiatry, and should reassure any waverers that this use of psilocybin is well within the accepted scope of modern psychiatry. They include two past presidents of the American Psychiatric Association (Lieberman and Summergrad) and the past-president of the European College of Neuropsychopharmacology (Goodwin), a previous deputy director of the Office of USA National Drug Control Policy (Kleber) and a previous head of the UK Medicines and Healthcare Regulatory Authority (Breckenridge). In addition, we have input from experienced psychiatric clinical trialists, leading pharmacologists and cancer-care specialists. They all essentially say the same thing.
The other commentaries: I do not find many of them worthy of further comment. However, one by Guy M. Goodwin, “Psilocybin: Psychotherapy or drug?”, is unusual in offering even mild skepticism about the way the investigators are marketing their claims:
The authors consider this mediating effect as ‘mystical’, and show that treatment effects correlate with a subjective scale to measure such experience. The Oxford English Dictionary defines mysticism as ‘belief that union with or absorption into the Deity or the absolute, or the spiritual apprehension of knowledge inaccessible to the intellect, may be attained through contemplation and self-surrender’. Perhaps a scale really can measure a relevant kind of experience, but it raises the caution that the investigation of hallucinogens as treatments may be endangered by grandiose descriptions of their effects and unquestioning acceptance of their value.
The experiences of salience, meaningfulness, and healing that accompanied these powerful spiritual experiences and that were found to be mediators of clinical response in both of these carefully performed studies are also important to understand in their own right and are worthy of further study and contemplation. None of us are immune from the transitory nature of human life, which can bring fear and apprehension or conversely a real sense of meaning and preciousness if we carefully number our days. Understanding where these experiences fit in healing, well-being, and our understanding of consciousness may challenge many aspects of how we think about mental health or other matters, but these well-designed studies build upon a recent body of work that confronts us squarely with that task.
Coverage of the two studies in the media
The website of the Heffter Research Institute provides a handy set of links to some of the press coverage the studies have received. There is a remarkable sameness to the portrayal of the studies in the media, suggesting that journalists stuck closely to the press releases, occasionally supplementing them with direct quotes from the authors. Any appearance of independent evaluation of the trials depended almost entirely on the commentaries published with the two articles.
There’s a lot of slick marketing by the two studies’ authors. In addition to what I noted earlier in the blog, there are recurring unscientific statements marketing the psilocybin experience:
“They are defined by a sense of oneness – people feel that their separation between the personal ego and the outside world is sort of dissolved and they feel that they are part of some continuous energy or consciousness in the universe. Patients can feel sort of transported to a different dimension of reality, sort of like a waking dream.
The new studies, however, suggest psilocybin be used only in a medical setting, said Dr. George Greer, co-founder, medical director and secretary at the Heffter Research Institute in Santa Fe, New Mexico, which funded both studies.
“Our focus is scientific, and we’re focused on medical use by medical doctors,” Greer said at the news conference. “This is a special type of treatment, a special type of medicine. Its use can be highly controlled in clinics with specially trained people.”
He added he doubts the drug would ever be distributed to patients to take home.
There are only rare admissions from an author of one of the studies that:
The results were similar to those they had found in earlier studies in healthy volunteers. “In spite of their unique vulnerability and the mood disruption that the illness and contemplation of their death has prompted, these participants have the same kind of experiences, that are deeply meaningful, spiritually significant and producing enduring positive changes in life and mood and behaviour,” he said.
I’m not sure that demand would be great except among previous users of psychedelics and current users of cannabis.
But should psilocybin remain criminalized outside of cancer centers where wealthy patients can purchase a diagnosis of adjustment reaction from a psychiatrist? Cancer is not especially traumatic, and PTSD is almost as common in the waiting rooms of primary care physicians. Why not extend to primary care physicians the option of prescribing psilocybin to their patients? What would be accomplished is that purity could be assured. But why should psilocybin use be limited to mental health conditions, once we accept that a diagnosis of adjustment reaction is such a distorted extension of the term? Should we exclude patients who are atheists and only want a satisfying experience, not a spiritual one?
Experience in other countries suggests that psilocybin can safely be ingested in a supportive, psychologically safe environment. Why not allow cancer patients and others to obtain psilocybin with assured purity and dosage? They could then ingest it in the comfort of friends and intimate partners who have been briefed on how the experience needs to be managed. The patients in the studies were mostly not facing immediate death from terminal cancer. But should we require that persons need to be dying in order to have a psilocybin experience without the risk of criminal penalties? Why not allow psilocybin to be ingested in the presence of pastoral counselors or priests whose religious beliefs are more congruent with the persons seeking such experiences than are New York City psychiatrists?
This is the first installment of what will be a series of occasional posts about the UK Mindfulness All Party Parliamentary Group report, Mindful Nation.
Mindful Nation is seriously deficient as a document supposedly arguing for policy based on evidence.
The professional and financial interests of lots of people involved in preparation of the document will benefit from implementation of its recommendations.
After an introduction, I focus on two studies singled out in Mindful Nation as offering support for the benefits of mindfulness training for school children.
Results of the group’s cherrypicked studies do not support implementation of mindfulness training in the schools, but inadvertently highlight some issues.
Investment in universal mindfulness training in the schools is unlikely to yield measurable, socially significant results, but will serve to divert resources from schoolchildren more urgently in need of effective intervention and support.
Mindful Nation is another example of the delivery of low-intensity services to mostly low-risk persons to the detriment of those in greatest and most urgent need.
The launch event for the Mindful Nation report billed it as the “World’s first official report” on mindfulness.
The Mindfulness All-Party Parliamentary Group (MAPPG) was set up to:
review the scientific evidence and current best practice in mindfulness training
develop policy recommendations for government, based on these findings
provide a forum for discussion in Parliament for the role of mindfulness and its implementation in public policy.
The Mindfulness All-Party Parliamentary Group describes itself as
Impressed by the levels of both popular and scientific interest, and launched an inquiry to consider the potential relevance of mindfulness to a range of urgent policy challenges facing government.
Don’t get confused by this being a government-commissioned report. It stands in sharp contrast to one commissioned by the US government in terms of the unbalanced constitution of the committee undertaking the review, the lack of transparency in the search for relevant literature, and the methodology for rating and interpreting the quality of the available evidence.
Compare the claims of Mindful Nation to a comprehensive systematic review and meta-analysis prepared for the US Agency for Healthcare Research and Quality (AHRQ), which reviewed 18,753 citations and found only 47 trials (3%) that included an active control treatment. The vast majority of studies available for inclusion had only a wait-list or no-treatment control group and so exaggerated any estimate of the efficacy of mindfulness.
Although the US report was available to those preparing the UK Mindful Nation report, no mention is made of either the full contents of the report or the resulting publication in a peer-reviewed journal. Instead, the UK Mindful Nation report emphasized narrative and otherwise unsystematic reviews, and meta-analyses not adequately controlling for bias.
When the abridged version of the AHRQ report was published in JAMA Internal Medicine, an accompanying commentary raised issues even more applicable to the Mindful Nation report:
The modest benefit found in the study by Goyal et al begs the question of why, in the absence of strong scientifically vetted evidence, meditation in particular and complementary measures in general have become so popular, especially among the influential and well educated…What role is being played by commercial interests? Are they taking advantage of the public’s anxieties to promote use of complementary measures that lack a base of scientific evidence? Do we need to require scientific evidence of efficacy and safety for these measures?
The members of the UK Mindfulness All-Party Parliamentary Group were selected for their positive attitude towards mindfulness. The collection of witnesses called to its hearings was saturated with advocates of mindfulness and with those having professional and financial interests in arriving at a positive view. There is no transparency about how studies or testimonials were selected, but the bias is notable. Many of the scientific studies were methodologically poor, if there was any methodology at all. Many were strongly stated but weakly substantiated opinion pieces. Authors often included people with financial interests in obtaining positive results, but with no acknowledgment of conflicts of interest. The glowing testimonials were accompanied by smiling photos and were unanimous in their praise of the transformative benefits of mindfulness.
As Mark B. Cope and David B. Allison concluded about obesity research, such packing of the committee and a highly selective review of the literature lead to a “distortion of information in the service of what might be perceived to be righteous ends.” [I thank Tim Caulfield for calling this quote to my attention.]
Mindfulness in the schools
The recommendations of Mindful Nation are:
The Department for Education (DfE) should designate, as a first step, three teaching schools116 to pioneer mindfulness teaching, co-ordinate and develop innovation, test models of replicability and scalability and disseminate best practice.
Given the DfE’s interest in character and resilience (as demonstrated through the Character Education Grant programme and its Character Awards), we propose a comparable Challenge Fund of £1 million a year to which schools can bid for the costs of training teachers in mindfulness.
The DfE and the Department of Health (DOH) should recommend that each school identifies a lead in schools and in local services to co-ordinate responses to wellbeing and mental health issues for children and young people117. Any joint training for these professional leads should include a basic training in mindfulness interventions.
The DfE should work with voluntary organisations and private providers to fund a freely accessible, online programme aimed at supporting young people and those who work with them in developing basic mindfulness skills118.
Leading up to these recommendations, the report outlines an “alarming crisis” in the mental health of children and adolescents and proposes:
Given the scale of this mental health crisis, there is real urgency to innovate new approaches where there is good preliminary evidence. Mindfulness fits this criterion and we believe there is enough evidence of its potential benefits to warrant a significant scaling-up of its availability in schools.
Think of all the financial and professional opportunities that proponents of mindfulness involved in preparation of this report have garnered for themselves.
Mindfulness to promote executive functioning in children and adolescents
For the remainder of this blog post, I will focus on the two studies cited in support of the following statement:
What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.
The terms “executive control” and “emotional stability” were clarified:
Many argue that the most important prerequisites for child development are executive control (the management of cognitive processes such as memory, problem solving, reasoning and planning) and emotion regulation (the ability to understand and manage the emotions, including and especially impulse control). These main contributors to self-regulation underpin emotional wellbeing, effective learning and academic attainment. They also predict income, health and criminality in adulthood69. American psychologist, Daniel Goleman, is a prominent exponent of the research70 showing that these capabilities are the biggest single determinant of life outcomes. They contribute to the ability to cope with stress, to concentrate, and to use metacognition (thinking about thinking: a crucial skill for learning). They also support the cognitive flexibility required for effective decision-making and creativity.
Actually, Daniel Goleman is the former editor of the pop magazine Psychology Today and an author of numerous pop books.
The first cited paper.
73 Flook L, Smalley SL, Kitil MJ, Galla BM, Kaiser-Greenland S, Locke J, et al. Effects of mindful awareness practices on executive functions in elementary school children. Journal of Applied School Psychology. 2010;26(1):70-95.
Journal of Applied School Psychology is a Taylor-Francis journal, formerly known as Special Services in the Schools (1984 – 2002). Its Journal Impact Factor is 1.30.
One of the authors of the article, Susan Kaiser-Greenland, is a mindfulness entrepreneur, as seen on her website describing her as an author, public speaker, and educator on the subject of sharing secular mindfulness and meditation with children and families. Her books are The Mindful Child: How to Help Your Kid Manage Stress and Become Happier, Kinder, and More Compassionate and Mindful Games: Sharing Mindfulness and Meditation with Children, Teens, and Families, along with the forthcoming The Mindful Games Deck: 50 Activities for Kids and Teens.
This article represents the main research available on Kaiser-Greenland’s Inner Kids program and figures prominently in her promotion of her products.
The sample consisted of 64 children assigned to either mindful awareness practices (MAPs; n = 32) or a control group consisting of a silent reading period (n = 32).
The MAPs training used in the current study is a curriculum developed by one of the authors (SKG). The program is modeled after classical mindfulness training for adults and uses secular and age appropriate exercises and games to promote (a) awareness of self through sensory awareness (auditory, kinesthetic, tactile, gustatory, visual), attentional regulation, and awareness of thoughts and feelings; (b) awareness of others (e.g., awareness of one’s own body placement in relation to other people and awareness of other people’s thoughts and feelings); and (c) awareness of the environment (e.g., awareness of relationships and connections between people, places, and things).
A majority of exercises involve interactions among students and between students and the instructor.
The primary EF outcomes were the Metacognition Index (MI), Behavioral Regulation Index (BRI), and Global Executive Composite (GEC), as reported by teachers and parents.
The program was delivered for 30 minutes, twice per week, for 8 weeks. Teachers and parents completed questionnaires assessing children’s executive function immediately before and following the 8-week period. Multivariate analysis of covariance on teacher and parent reports of executive function (EF) indicated an interaction effect of baseline EF score and group status on posttest EF. That is, children in the group that received mindful awareness training who were less well regulated showed greater improvement in EF compared with controls. Specifically, those children starting out with poor EF who went through the mindful awareness training showed gains in behavioral regulation, metacognition, and overall global executive control. These results indicate a stronger effect of mindful awareness training on children with executive function difficulties.
The finding that both teachers and parents reported changes suggests that improvements in children’s behavioral regulation generalized across settings. Future work is warranted using neurocognitive tasks of executive functions, behavioral observation, and multiple classroom samples to replicate and extend these preliminary findings.
What I discovered when I scrutinized the study.
This study is unblinded: the students, teachers, and parents providing the subjective ratings were all well aware of which group each student was assigned to. We are not given any correlations among or between the teacher and parent ratings, so we don’t know whether a single global subjective factor (easy or difficult child, well-behaved or not) is operating for teachers, parents, or both.
It is unclear for which features of the mindfulness training the comparison reading group offers control or equivalence. The two groups differ in positive expectations and in the attention and support received, and these differences are likely to be reflected in the parent and teacher ratings. There is a high likelihood that any differences in outcomes are nonspecific, rather than reflecting an active and distinct ingredient of mindfulness training. In any comparison with the students assigned to reading time, students assigned to mindfulness training have the benefit of any active ingredient it might have, as well as of any nonspecific, placebo ingredients.
This is an exceedingly weak design, but one that dominates evaluations of mindfulness.
Note too that with only 32 students per group, this is a seriously underpowered study: it has less than a 50% probability of detecting a moderate-sized effect if one is present. And because such a small sample requires a large effect size to achieve statistical significance, any statistically significant effect will necessarily appear large, even if it is unlikely to replicate in a larger sample. That is the paradox of small samples that we need to keep in mind in these situations.
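To make that arithmetic concrete, here is a rough simulation sketch (my own illustration, not from the paper) estimating the power of a two-sample t-test with 32 students per group, along with the smallest observed effect that could reach p < .05 in such a sample:

```python
# Monte Carlo sketch: power to detect a moderate effect (Cohen's d = 0.5)
# with n = 32 per group, and the minimum observed effect that reaches p < .05.
import random
import statistics

random.seed(0)

n = 32          # students per group
d = 0.5         # true standardized group difference (a "moderate" effect)
t_crit = 2.0    # approximate two-sided critical t for alpha = .05, df = 62
reps = 20000

hits = 0
for _ in range(reps):
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    mindful = [random.gauss(d, 1.0) for _ in range(n)]
    pooled_var = (statistics.variance(control) + statistics.variance(mindful)) / 2
    t = (statistics.mean(mindful) - statistics.mean(control)) / (
        (2 * pooled_var / n) ** 0.5)
    if abs(t) > t_crit:
        hits += 1

power = hits / reps
print(f"estimated power to detect d = 0.5 with n = 32/group: {power:.2f}")

# The flip side of low power: the smallest observed effect that can
# possibly reach significance is itself large.
d_min = t_crit * (2 / n) ** 0.5
print(f"minimum significant observed effect: d = {d_min:.2f}")
```

Both numbers come out near 0.5: the study has roughly a coin-flip chance of detecting a real moderate effect, and any effect it does declare significant must appear at least moderate in size, which is exactly why such findings tend to shrink or vanish on replication.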
Not surprisingly, there were no differences between the mindfulness and reading control groups on any outcome variable, whether rated by parents or teachers. Nonetheless, the authors rescued their claims for an effective intervention with:
However, as shown by the significance of interaction terms, baseline levels of EF (GEC reported by teachers) moderated improvement in posttest EF for those children in the MAPs group compared to children in the control group. That is, on the teacher BRIEF, children with poorer initial EF (higher scores on BRIEF) who went through MAPs training showed improved EF subsequent to the training (indicated by lower GEC scores at posttest) compared to controls.
Similar claims were made about parent ratings. But let’s look at figure 3 depicting post-test scores. These are from the teachers, but results for the parent ratings are essentially the same.
Note the odd scaling of the X axis. The data are divided into quartiles, and then the middle two quartiles are collapsed, leaving three data points. I’m curious about what is being hidden. Even with this sleight-of-hand, scores for the intervention and control groups appear identical except in the top quartile. It appears that just a couple of students in the control group account for any appearance of a difference. And keep in mind that the upper quartile amounts to only eight students in each group.
This scatter plot is further revealing:
It appears that the differences limited to the upper quartile are due to a couple of outlier control students. Without them, even the post-hoc differences found in the upper quartile between the intervention and control groups would likely disappear.
Basically, what we are seeing is that most students do not show any benefit whatsoever from mindfulness training over being in a reading group. It’s not surprising that students who were not particularly elevated on the variables of interest did not register an effect. That’s a common ceiling effect in universally delivered interventions in general population samples.
Essentially, if we focus on the designated outcome variables, we are wasting the students’ time as well as that of the staff. Think of what could be done if the same resources were applied in more effective ways. A couple of students in this study were outliers with low executive function. We don’t know how else they might otherwise differ. Neither the study nor the validation of these measures gives much attention to their discriminant validity, i.e., to which variables that shouldn’t influence the ratings nevertheless do. I strongly suspect that there are global, nonspecific aspects to both parent and teacher ratings, such that they are influenced by other aspects of these couple of students’ engagement with their classroom environment, and perhaps other environments.
I see little basis for the authors’ self-congratulatory conclusion:
The present findings suggest that mindfulness introduced in a general education setting is particularly beneficial for children with EF difficulties.
Introduction of these types of awareness practices in elementary education may prove to be a viable and cost-effective way to improve EF processes in general, and perhaps specifically in children with EF difficulties, and thus enhance young children’s socio-emotional, cognitive, and academic development.
Maybe the authors started with this conviction, and it was unshaken by disappointing findings.
Or consider the statement made in Mindful Nation:
What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.
But another study is also cited for this statement.
74. Huppert FA, Johnson DM. A controlled trial of mindfulness training in schools: The importance of practice for an impact on wellbeing. The Journal of Positive Psychology. 2010; 5(4):264-274.
The first author, Felicia Huppert, is Founder and Director of the Well-being Institute and Emeritus Professor of Psychology at the University of Cambridge, as well as a member of the academic staff of the Institute for Positive Psychology and Education of the Australian Catholic University.
This study involved 173 14- and 15-year-old boys from a private Catholic school.
The Journal of Positive Psychology is not known for its high methodological standards. A look at its editorial board suggests a high likelihood that manuscripts submitted will be reviewed by sympathetic reviewers publishing their own methodologically flawed studies, often with results in support of undeclared conflicts of interest.
The mindfulness training was based on the program developed by Kabat-Zinn and colleagues at the University of Massachusetts Medical School (Kabat-Zinn, 2003). It comprised four 40 minute classes, one per week, which presented the principles and practice of mindfulness meditation. The mindfulness classes covered the concepts of awareness and acceptance, and the mindfulness practices included bodily awareness of contact points, mindfulness of breathing and finding an anchor point, awareness of sounds, understanding the transient nature of thoughts, and walking meditation. The mindfulness practices were built up progressively, with a new element being introduced each week. In some classes, a video clip was shown to highlight the practical value of mindful awareness (e.g. “The Last Samurai”, “Losing It”). Students in the mindfulness condition were also provided with a specially designed CD, containing three 8-minute audio files of mindfulness exercises to be used outside the classroom. These audio files reflected the progressive aspects of training which the students were receiving in class. Students were encouraged to undertake daily practice by listening to the appropriate audio files. During the 4-week training period, students in the control classes attended their normal religious studies lessons.
A total of 155 participants had complete data at baseline and 134 at follow-up (78 in the mindfulness and 56 in the control condition). Any student who had missing data at either time point was simply dropped from the analysis. The effects of this statistical decision are difficult to track in the paper. Regardless, there was no difference between the intervention and control groups on any of a host of outcome variables, none of which was designated as a primary outcome.
Actual practicing of mindfulness by students was inconsistent.
One third of the group (33%) practised at least three times a week, 34.8% practised more than once but less than three times a week, and 32.7% practised once a week or less (of whom 7 respondents, 8.4%, reported no practice at all). Only two students reported practicing daily. The practice variable ranged from 0 to 28 (number of days of practice over four weeks). The practice variable was found to be highly skewed, with 79% of the sample obtaining a score of 14 or less (skewness = 0.68, standard error of skewness = 0.25).
The authors rescue their claim of a significant effect for the mindfulness intervention with highly complex multivariate analyses involving multiple control variables, in which within-group outcomes for students assigned to mindfulness were related to the extent to which students actually practiced. Even without controlling for the numerous (and post-hoc) multiple comparisons, results were still largely nonsignificant.
One simple conclusion that can be drawn is that despite a lot of encouragement, there was little actual practice of mindfulness by relatively well-off students in a relatively well-resourced school setting. We could hardly expect results to improve with wider dissemination to schools with fewer resources and less privileged students.
The authors conclude:
The main finding of this study was a significant improvement on measures of mindfulness and psychological well-being related to the degree of individual practice undertaken outside the classroom.
Recall that Mindful Nation cited the study in the following context:
What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.
These are two methodologically weak studies with largely null findings. They are hardly the basis for launching a national policy implementing universal mindfulness in the schools.
As noted in the US AHRQ report, despite the huge number of studies of mindfulness that have been conducted, few involved a test against an adequate control group, and so there’s little evidence that mindfulness has any advantage over any active treatment. Neither of these studies disturbs that conclusion, although both are spun as positive, in the original articles and in the Mindful Nation report alike. Both papers were published in journals where the reviewers were likely to be overly sympathetic and inattentive to serious methodological and statistical problems.
The committee writing Mindful Nation arrived at conclusions consistent with their prior enthusiasm for mindfulness and their vested interest in it. They sorted through evidence to find what supported their pre-existing assumptions.
Like UK resilience programs, the recommendations of Mindful Nation put considerable resources into the delivery of services to a large population unlikely to have the threshold of need required to register a socially and clinically significant effect. On a population level, results of the implementation are doomed to fall short of its claims. The much smaller number of students who need more timely, intensive, and tailored services are left underserved. Their presence is ignored or, worse, invoked to justify the delivery of services to the larger group, with the needy students not benefiting.
In this blog post, I mainly focused on two methodologically poor studies. But for the selection of these particular studies, I depended on the search conducted by the authors of Mindful Nation and the emphasis that was given to these two studies for some sweeping claims in the report. I will continue writing about the recommendations of Mindful Nation. I welcome reader feedback, particularly from readers whose enthusiasm for mindfulness is offended. But I urge them not simply to go to Google, cherry-pick an isolated study, and ask me to refute its claims.
Rather, we need to pay attention to the larger literature concerning mindfulness, its serious methodological problems, and the sociopolitical forces and vested interests that preserve a strong confirmation bias, both in the “scientific” literature and its echoing in documents like Mindful Nation.
If The Lancet COBRA study had evaluated homeopathy rather than behavioural activation (BA), homeopathy would likely have similarly been found “non-inferior” to cognitive behavior therapy.
This is not an argument for treating depression with homeopathy, but an argument that the 14 talented authors of The Lancet COBRA study stacked the deck for their conclusion that BA could be substituted for CBT in routine care for depression without loss of effectiveness. Conflict of interest and catering to politics intruded on science in the COBRA trial.
If a study like COBRA produces essentially identical results with treatments based on distinct mechanisms of change, one possibility is that background nonspecific factors are dominating the results. Insert homeopathy, a bogus treatment with strong nonspecific effects, in place of BA, and noninferiority may well be shown.
Consider homeopathy: a super-diluted and essentially inert substance is selected and delivered within a complex ritual. The choice of the particular substance being diluted, and the extent of its dilution, is determined through detailed questioning of patients about their background, lifestyle, and personal functioning. Naïve and unskeptical patients are likely to perceive themselves as receiving exceptionally personalized medicine delivered by a sympathetic and caring provider. Homeopathy thus has potentially strong nonspecific (placebo) elements that may be lacking in the briefer and less attentive encounters of routine medical care.
As an academic editor at PLOS One, I received considerable criticism for having accepted a failed trial of homeopathy for depression. The study had been funded by the German government and had fallen miserably short in its efforts to recruit the intended sample size. I felt the study should be published in PLOS One to provide evidence for deciding whether such weak and likely worthless studies should be undertaken in the future. But I also wanted readers to have the opportunity to see what I had learned from the article about just how ritualized homeopathy can be, with a strong potential for placebo effects.
Presumably, readers would then be better equipped to evaluate claims made in other contexts that homeopathy has been shown effective in clinical trials with inadequate control of nonspecific effects. But that is also a pervasive problem in psychotherapy trials [1,2] that do not have a suitable comparison/control group.
The Lancet COBRA study has received extraordinary promotion as evidence for the cost-effectiveness of substituting behavioural activation therapy (BA) delivered by minimally trained professionals for cognitive behaviour therapy (CBT) for depression. The study is serving as the basis for proposals to cut costs in the UK National Health Service by replacing more expensive clinical psychologists with less trained and experienced providers.
Coached by the Science Media Centre, the authors of The Lancet study focused our attention on their finding of noninferiority of BA to CBT. They are distracting us from the more important question of whether either treatment had any advantage over nonspecific interventions in the unusual context in which they were evaluated.
The editorial accompanying the COBRA study suggests that BA involves a simple message delivered by providers with very little training:
“Life will inevitably throw obstacles at you, and you will feel down. When you do, stay active. Do not quit. I will help you get active again.”
I encourage readers to stop and think how depressed persons suffering substantial impairment, including reduced ability to experience pleasure, would respond to such suggestions. It sounds all too much like the “Snap out of it, Debbie” they may have already heard from people around them or in their own self-blame.
In such a system, when emergent mild to moderate depressive symptoms are uncovered in a primary medical care setting, providers are encouraged neither to initiate an active treatment nor even make a formal psychiatric diagnosis of a condition that could prove self-limiting with a brief passage of time. Rather, providers are encouraged to defer diagnosis and schedule a follow-up appointment. This is more than simple watchful waiting. Until the next appointment, providers encourage patients to undertake some guided self-help, including engagement in pleasant activities of their choice, much as apparently done in the BA condition in the COBRA study. Increasingly, they may encourage Internet-based therapy.
In a few parts of the UK, general practitioners may refer patients to a green gym.
It’s now appreciated that to have any effectiveness, such prescriptions have to be made in a relationship of supportive accountability. For patients to adhere adequately to such prescriptions, and not feel they are simply being dismissed by the provider and sent away, they need a sense that the prescription occurs within the context of a relationship with someone who cares whether they carry it out and benefit from it.
Used in this way, this BA component of stepped care could possibly be part of reducing unnecessary medication and the need for more intensive treatment. However, evaluation of cost effectiveness is complicated by the need for a support structure in which treatment can be monitored, including any antidepressant medication that is subsequently prescribed. Otherwise, the needs of a substantial number of patients needing more intensive, quality care for depression would be neglected.
The shortcomings of COBRA as an evaluation of BA in context
COBRA does not provide an evaluation of any system offering BA to the large pool of patients who do not require more intensive treatment in a system where they would be provided appropriate timely evaluation and referral onwards.
It is in the nature of mild to moderate depressive symptoms presenting in primary care, especially when patients are not specifically seeking mental health treatment, that the threshold for a formal diagnosis of major depression is often met with the minimum five required symptoms, or only one more. Diagnoses are of necessity unreliable, in part because the judgment of whether particular symptoms meet a minimal threshold of severity is unreliable. After a brief passage of time and in the absence of formal treatment, a substantial proportion of patients will no longer meet diagnostic criteria.
COBRA also does not evaluate BA versus CBT in the more select population that participates in clinical trials of treatment for depression. Sir David Goldberg is credited with first describing the filters that operate on the pathway of patients from presenting a complex combination of problems in living and psychiatric symptoms in primary medical care to treatment in specialty settings.
Results of the COBRA study cannot be meaningfully integrated into the existing literature concerning BA as a component of stepped care or treatment for depression that is sufficient in itself.
More recently, I reviewed The Lancet COBRA study in detail, highlighting how one of the most ambitious and heavily promoted psychotherapy studies ever conducted was nonetheless uninformative. The authors’ claim that it would be wise to substitute BA delivered by minimally trained providers for cognitive behavior therapy delivered by clinical psychologists was unwarranted.
I refer readers to that blog post for further elaboration of some points I will be making here. For instance, some readers might want to refresh their sense of how a noninferiority trial differs from a conventional comparison of two treatments.
Risk of bias in a noninferiority trial
Published reports of clinical trials are notoriously unreliable and biased in terms of the authors’ favored conclusions.
With the typical evaluation of an active treatment versus a control condition, the risk of bias is that reported results will favor the active treatment. The issue of bias in a noninferiority trial is more complex. The investigators’ interest is in demonstrating that, within certain limits, there are no significant differences between two treatments. Yet, although it is not always tested directly, the intention is to show that this lack of difference is due to both treatments being effective, rather than ineffective.
In COBRA, the authors’ clear intention was to show that less expensive BA was not inferior to CBT, with the assumption that both were effective. Bias can emerge from building in features of the design, analysis, and interpretation of the study that minimize differences between the two treatments. But bias can also arise from a study design in which nonspecific effects are distributed across interventions so that any difference in active ingredients is obscured by shared features of the circumstances in which the interventions are delivered. As with the Dodo bird verdict from Alice in Wonderland [https://en.wikipedia.org/wiki/Dodo_bird_verdict], the race is rigged so that almost everybody can get a prize.
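For readers unfamiliar with the mechanics, a noninferiority comparison typically checks whether the confidence interval for the between-group difference stays on the right side of a prespecified margin. The sketch below uses invented numbers, not COBRA's data, to show why the test says nothing about whether either treatment beats doing nothing:

```python
# Toy noninferiority check with invented numbers: is the "new" treatment's
# loss relative to the reference bounded by a prespecified margin?
import math

# Hypothetical mean symptom improvement, SD, and analyzed n per arm
mean_ba, sd_ba, n_ba = 8.7, 7.0, 135     # "new" treatment (BA stand-in)
mean_cbt, sd_cbt, n_cbt = 8.8, 7.0, 151  # reference treatment (CBT stand-in)
margin = 1.9                              # largest loss still called "noninferior"

diff = mean_ba - mean_cbt
se = math.sqrt(sd_ba**2 / n_ba + sd_cbt**2 / n_cbt)
lo, hi = diff - 1.96 * se, diff + 1.96 * se

# Noninferior if even the worst end of the CI loses less than the margin
noninferior = lo > -margin
print(f"difference = {diff:.1f}, 95% CI [{lo:.1f}, {hi:.1f}], noninferior: {noninferior}")
```

Note that the check compares the two arms only against each other. If both arms improve mainly through shared nonspecific factors (attention, support, expectations), the difference will hover near zero and noninferiority is declared, whether the "new" treatment is BA, homeopathy, or anything else with a plausible ritual.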
Why COBRA could have shown that almost any treatment with nonspecific effects was noninferior to CBT for depression
1. The investigators chose a population and a recruitment strategy that increased the likelihood that patients participating in the trial would get better with the minimal support and contact available in either of the two conditions, BA or CBT.
The recruited patients were not actively seeking treatment. They were identified from GP records as having had a diagnosis of depression, but were required not to be currently in psychotherapy.
Here is a dirty secret from someone who has supervised thousands of SCID interviews of medical patients: the developers of the SCID recognized that it yielded a lot of false positives and inflated rates of disorder among patients who are not seeking mental health care.
They attempted to compensate by requiring that respondents not only endorse symptoms, but also indicate that the symptoms are a source of impairment. This is the so-called clinical significance criterion. Respondents automatically meet the criterion if they are seeking mental health treatment. Those who are not seeking treatment are asked directly whether the symptoms impair them. This is a particularly poorly validated aspect of the SCID, and such patients typically do not endorse their symptoms as a source of impairment.
When we asked breast cancer patients who otherwise met SCID criteria for depression whether the depressive symptoms impaired them, they uniformly said something like ‘No, my cancer impairs me.’ When we conducted a systematic study of the clinical significance criterion, we found that whether or not it was endorsed substantially affected individual and overall rates of diagnosis. Robert Spitzer, who developed the SCID interview along with his wife Janet Williams, conceded to me in a symposium that application of the clinical significance criterion was a failure.
What is the relevance in a discussion of the COBRA study? I would wager that the authors, like most investigators who use the SCID, did not inquire about the clinical significance criterion, and as a result they had a lot of false positives.
The population sampled and the recruitment strategy used in COBRA are likely to yield a sample unrepresentative of patients participating in the usual trials of psychotherapy and medication for depression.
2. Most patients participating in COBRA reported already receiving antidepressants at baseline; their adherence and follow-up are unknown, but likely to be inadequate.
Notoriously, patients receiving a prescription for an antidepressant in primary care actually take the medication inconsistently and for only a short time, if at all. They receive inadequate follow-up and reassessment. Their depression outcomes may actually be poorer than for patients receiving a pill placebo in the context of a clinical trial, where there is blinding and a high degree of positive expectations, attention and support.
Studies, including one by an author of the COBRA study, suggest that augmenting adequately managed antidepressant treatment with psychotherapy is unlikely to improve outcomes.
Here we stumble upon one of the messier features of COBRA. Most patients had already been prescribed medication at baseline, but their adherence and follow-up are left unreported, and both are likely to be poor. The prescription may have been made up to two years before baseline.
It would not be cost-effective to introduce psychotherapy to such a sample without reassessing whether they were adequately receiving medication. Such a sample would also be highly susceptible to nonspecific interventions providing positive expectations, support, and attention that they are not receiving in their antidepressant treatment. There are multiple ways in which nonspecific effects could improve outcomes – perhaps by improving adherence, but perhaps because of the healing effects of support on mild depressive symptoms.
3. The COBRA authors’ way of dealing with co-treatment with antidepressants blocked readers’ ability to independently evaluate main effects and interactions with BA versus CBT.
The authors used antidepressant treatment as a stratification factor, ensuring that the 70% of patients receiving antidepressants were evenly distributed between the BA and CBT conditions. This strategy made it more difficult to separate out the effects of antidepressants. The problem is compounded by the authors’ failure to provide subgroup analyses based on whether patients had received an antidepressant prescription, as well as their failure to describe the extent to which patients’ antidepressants were managed at baseline or during active psychotherapy and follow-up. The authors incorporated data concerning the cost of medication into their economic analyses, but did not report the data in a way that could be scrutinized.
I anticipate requesting these data from the authors to find out more, although they have not responded to my previous query concerning anomalies in the reporting of how long since patients had first received a prescription for antidepressants.
4. The 12-month assessment designated as the primary outcome capitalized on natural recovery patterns, unreliability of initial diagnosis, and simple regression to the mean.
Depression identified in the community and in primary care patient populations is variable in its course, but typically resolves within nine months. Assessing primary outcomes at 12 months increases the likelihood that effects of any active ingredients of the two treatments would be lost in a natural recovery process.
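Regression to the mean alone can mimic recovery here. The toy simulation below (my own construction, with invented parameters) shows that when patients are selected for high scores on an imperfectly reliable measure, their scores drift back down at reassessment even with zero treatment effect:

```python
# Toy demonstration of regression to the mean: select "depressed" patients by
# high baseline scores on a noisy measure, then retest with no treatment at all.
import random

random.seed(2)

def observed(true_score):
    # Observed score = stable true severity plus measurement noise
    return true_score + random.gauss(0.0, 2.0)

# Simulate a primary care population's true symptom severity
patients = [random.gauss(10.0, 3.0) for _ in range(5000)]
baseline = [observed(t) for t in patients]

# Select the study sample: everyone scoring high at baseline
selected = [(b, observed(t)) for t, b in zip(patients, baseline) if b > 13.0]

mean_baseline = sum(b for b, _ in selected) / len(selected)
mean_retest = sum(r for _, r in selected) / len(selected)
print(f"selected sample, baseline mean: {mean_baseline:.1f}")
print(f"same sample at retest (no treatment): {mean_retest:.1f}")
```

The retest mean falls well below the baseline mean purely because selecting on high scores also selects for positive measurement error. Stretch the follow-up to 12 months, add natural remission on top, and both arms of a trial will look like they improved regardless of what was delivered.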
5. The intensity of treatment offered in the study (an allowable 20 sessions, plus four additional sessions) exceeded what is available in typical psychotherapy trials and exceeded what was actually accessed by patients.
Allowing this level of treatment intensity generates a lot of noise in any interpretation of the resulting data. Offering so much treatment encourages patients to drop out, with the loss of their follow-up data. We can’t tell whether they dropped out because they felt they had received sufficient treatment or because they were dissatisfied. This intensity of offered treatment also reduces generalizability to what actually occurs in routine care and complicates comparing and contrasting results of the COBRA study with the existing literature.
6. The low rate of actual uptake of psychotherapy and retention of patients for follow-up present serious problems for interpreting the results of the COBRA study.
Intent-to-treat analyses with imputation of missing data are simply voodoo statistics when there is this much missing data. Imputation and other multivariate techniques assume that data are missing at random, but as I just noted, that is an improbable assumption. [Readers who want to learn more about intent-to-treat versus per-protocol analyses can refer back to my previous blog post.]
The authors cite past literature to justify their choice to emphasize the per-protocol analyses. That means they based their interpretation of the results on 135 of the 221 patients originally assigned to BA and 151 of the 219 patients originally assigned to CBT. This is a messy approach and precludes generalizing back to the original assignment. That’s why intent-to-treat analyses are emphasized in conventional evaluations of psychotherapy.
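The problem with analyzing completers only can be shown in a few lines. This toy simulation is my own construction, not COBRA's data: when dropout is related to outcome, the per-protocol estimate no longer reflects the group that was randomized.

```python
# Toy demonstration: per-protocol analysis is biased when dropout is
# related to outcome (i.e., data are not missing at random).
import random

random.seed(1)

# Simulate symptom-change scores for 221 randomized patients
# (higher = more improvement)
outcomes = [random.gauss(5.0, 3.0) for _ in range(221)]
true_mean = sum(outcomes) / len(outcomes)

# Suppose the patients improving least are the ones who drop out
completers = [x for x in outcomes if x > 2.0]
per_protocol_mean = sum(completers) / len(completers)

print(f"mean improvement, all randomized patients: {true_mean:.2f}")
print(f"mean improvement, per-protocol completers: {per_protocol_mean:.2f}")
```

By construction, the completers-only mean is inflated relative to the randomized sample, because the least-improved patients were removed. Intent-to-treat analysis avoids this, but only honestly so when there is little missing data or when the missing-at-random assumption behind imputation is credible.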
A skeptical view of what will be done with the COBRA data
The authors’ clear intent was to produce data supporting an argument that more expensive clinical psychologists could be replaced by less trained clinicians providing a simplified treatment. The striking lack of differences between BA and CBT might be seen as strong evidence that BA could replace CBT. Yet I am suggesting that the striking lack of differences could also reflect features built into the design that swamped any differences and limited any generalizability to what would happen if all depressed patients were referred to BA delivered by clinicians with little training versus CBT. I’m arguing that homeopathy would have done as well.
BA is already being implemented in the UK and elsewhere as part of stepped care initiatives for depression. Inclusion of BA is inadequately evaluated, as is the overall strategy of stepped care. See here for an excellent review of stepped care initiatives and a tentative conclusion that they are moderately effective, but that many questions remain.
If the COBRA authors were most committed to improving the quality of depression care in the UK, they would either have designed their study as a fairer test of substituting BA for CBT or have tackled the more urgent task of rigorously evaluating whether stepped care initiatives work.
Years ago, collaborative care programs for depression were touted as reducing overall costs. These programs, which were found to be robustly effective in many contexts, involved placing depression care managers in primary care to assist GPs in improved monitoring and management of treatment. Often the most immediate and effective improvement was that patients got adequate follow-up, where previously they were simply being ignored. Collaborative care programs did not prove to be cheaper, and not surprisingly so, because better care is often more expensive than ineptly provided, inadequate care.
We should be extremely skeptical of experienced investigators who claim to have demonstrated that they can cut costs and maintain quality with a wholesale reduction in the level of training of providers treating depression, a complex and heterogeneous disorder, especially when their expensive study fails to deal with this complexity and heterogeneity.