Chronic pain and tragic irony…
Paul: “For three years I kept my faith that relief had to be just around the corner, but my disappointment is now as chronic as my pain. Hope has become a distraction.”
Paul Ingraham is an important figure in the Science-Based Skeptics movement and played a key role in my becoming involved in it. He emailed me after a long spell without contact, wanting to explain how he had been out of touch: his life had been devastated by as-yet medically unexplained pain and other mysterious symptoms.
Paul modestly describes himself at his blog site as “a health writer in Vancouver, Canada, best known for my work debunking common myths about treating common pain problems on PainScience.com. I actually make a living doing that. On this blog, I just mess around. ~ Paul Ingraham (@painsci, Facebook).”
Detailed, readable tutorials about common stubborn pain problems & injuries, like back pain or runner’s knee.
Many common painful problems are often misunderstood, misdiagnosed, and mistreated. Made for patients, but strong enough for professionals, these book-length tutorials are crammed with tips, tricks, and insights about what works, what doesn’t, and why. No miracle cures are for sale here — just sensible information, scientifically current, backed up by hundreds of free articles and a huge pain and injury science bibliography.
Paul offered me invaluable assistance and support when I began blogging at the prestigious Science-Based Medicine. See, for instance, my:
I have not blogged there consistently, because my topics don’t always fit. Whenever I do, I learn a lot from the wealth of thoughtful comments I receive.
I have great respect for Science-Based Medicine’s authoritative, well-documented and evidence-based analyses. I highly recommend the blog for those who are looking for sophistication delivered in a way that an intelligent lay person can understand.
What’s the difference between science-based medicine (SBM) and evidence-based medicine (EBM)?
This important distinction puzzles some readers every time I bring it up. Bloggers at SBM frequently distinguish between science-based and evidence-based medicine. They offer careful analyses of unproven treatments like acupuncture and homeopathy. Proponents of these treatments increasingly sell them as evidence-based, citing randomized trials that do not involve an active comparison treatment. The illusion of efficacy is often created by the positive expectations and mysterious rituals with which these treatments are delivered. Comparison conditions in these studies often lack this boost, particularly in unblinded comparisons.
The SBM bloggers like to point out that there are no plausible, tested scientific mechanisms by which these treatments might conceivably work. The name of the blog, Science-Based Medicine, calls attention to their higher standard for considering treatments efficacious: to be considered science-based, treatments must be proven as effective as evidence-based active treatments and must have a mechanism beyond nonspecific placebo effects.
Paul Ingraham reappears after a disappearance.
Paul mysteriously disappeared for a while. Now he has reemerged with a tale that is getting a lot of attention. He gave me permission to blog excerpts, and I include a link to the full story, which I strongly recommend.
A decade ago I devoted myself to helping people with chronic pain, and now it’s time to face my ironic new reality: I have serious unexplained chronic pain myself. It may never stop, and I need to start learning to live with it rather than trying to fix it.
I have always been “prone” to aches and pains, and that’s why I became a massage therapist and then moved on to publishing PainScience.com. But that tendency was a pain puppy humping my leg compared to the Cerberus of suffering that’s mauling me now. I’ve graduated to the pain big leagues.
For three years I kept my faith that relief had to be just around the corner, but my disappointment is now as chronic as my pain. Hope has become a distraction. I’ve been like a blind man waiting for my sight to return instead of learning braille. It’s acceptance time.
Paul describes how his pain drove him into hiding.
… why I’ve become one of those irritating people who answers every invitation with a “maybe” and bails on half the things I commit to. I never know what I’m going to be able to cope with on a given day until it’s right in front of me.
He struggled to define the problem:
Mostly widespread soreness and joint pain like the early stages of the flu, a parade of agonizing hot spots that are always on the verge of breaking my spirit, and a lot of sickly fatigue. All of which is easily provoked by exercise.
But there was a dizzying array of other symptoms…
Any diagnosis would be simply a label, not an explanation.
Nothing turned up in a few phases of medical investigation in 2015 and 2016. My “MS hug” is not caused by MS. My thunderclap headaches are not brain bleeds. My tremors are not Parkinsonian. I am not deficient in vitamins B or D. There is no tumour lurking in my chest or skull, nor any markers of inflammation in my blood. My heart beats as steadily as an atomic clock, and my nerves conduct impulses like champs.
Paul was not seriously tempted by alternative and complementary medicine
I am not tempted to try alternative medicine. The best of alt-med is arguably not alternative at all — e.g. nutrition, mindfulness, relaxation, massage, and so on — and the rest of what alt-med offers ranges from dubious at best to insane bollocks at the worst. You can’t fool a magician with his own tricks, and you can’t give false hope to an alt-med apostate like me: I’ve seen how the sausage is made, and I feel no surge of false hope when someone tells me (and they have) “it’s all coming from your jaw, you should see this guy in Seattle, he’s a Level 17 TMJ Epic Master, namaste.” Most of what sounds promising to the layperson just sounds like a line of bull to me.
It is fascinating how many people clearly think Paul’s story is almost identical to their own.
All these seemingly “identical” cases have got me pondering: syndromes consist of non-specific symptoms by definition, and batches of such symptoms will always seem more similar than they actually are… because blurry pictures look more alike than sharp and clear ones. Non-specific symptoms are generalized biological reactions to adversity. Anxiety can cause any of them, and so can cancer. Any complex cases without pathognomonic (specific, defining) symptoms are bound to have extensive overlap of their non-specific symptoms.
There are many ways to be sick, and relatively few ways to feel bad.
Peter was exceptionally prepared, had a definite point of view, but was open to what I said. In the end, he seemed to be persuaded by me on a number of points. The resulting article in Inverse faithfully conveyed my perspective and juxtaposed quotes from me with those from an author of the Psych Science piece in a kind of debate.
My point of view
When evaluating an article about mindfulness in a peer-reviewed journal, we need to take into account that authors may not necessarily be striving to do the best science, but to maximally benefit their particular brand of mindfulness, their products, or the settings in which they operate. Many studies of mindfulness are little more than infomercials: weak research intended only to get mindfulness promoters’ advertisement of themselves into print or to allow the labeling of claims as “peer-reviewed”. Caveat lector.
We cannot assume authors of mindfulness studies are striving to do the best possible science, including being prepared for the possibility of being proven incorrect by their results. Rather, they may simply be trying to get the strongest possible claims through peer review, ignoring best research practices and best publication practices.
There was much from the author of the Psych Science article with which I would agree:
“In my opinion, there are far too many organizations, companies, and therapists moving forward with the implementation of ‘mindfulness-based’ treatments, apps, et cetera before the research can actually tell us whether it actually works, and what the risk-reward ratio is,” corresponding author and University of Melbourne research fellow Nicholas Van Dam, Ph.D. tells Inverse.
“People are spending a lot of money and time learning to meditate, listening to guest speakers about corporate integration of mindfulness, and watching TED talks about how mindfulness is going to supercharge their brain and help them live longer. Best case scenario, some of the advertising is true. Worst case scenario: very little to none of the advertising is true and people may actually get hurt (e.g., experience serious adverse effects).”
But there were some statements that renewed the discomfort and disappointment I experienced when I read the original article in Psychological Science:
“I think the biggest concern among my co-authors and I is that people will give up on mindfulness and/or meditation because they try it and it doesn’t work as promised,” says Van Dam.
“There may really be something to mindfulness, but it will be hard for us to find out if everyone gives up before we’ve even started to explore its best potential uses.”
So, how long before we “give up” on thousands of studies pouring out of an industry? In the meantime, should consumers act on what seem to be extravagant claims?
The Inverse article segued into some quotes from me after delivering another statement from the author with which I could agree:
The authors of the study make their attitudes clear when it comes to the current state of the mindfulness industry: “Misinformation and poor methodology associated with past studies of mindfulness may lead public consumers to be harmed, misled, and disappointed,” they write. And while this comes off as unequivocal, some think they don’t go far enough in calling out specific instances of quackery.
“It’s not bare-knuckle, that’s for sure. I’m sure it got watered down in the review process,” James Coyne, Ph.D., an outspoken psychologist who’s extensively criticized the mindfulness industry, tells Inverse.
Coyne agrees with the conceptual issues outlined in the paper, specifically the fact that many mindfulness therapies are based on science that doesn’t really prove their efficacy, as well as the fact that researchers with copyrights on mindfulness therapies have financial conflicts of interest that could influence their research. But he thinks the authors are too concerned with tone policing.
“I do appreciate that they acknowledged other views, but they kept out anybody who would have challenged their perspective,” he says.
Regarding Coyne’s criticism about calling out individuals, Van Dam says the authors avoided doing that so as not to alienate people and stifle dialogue.
“I honestly don’t think that my providing a list of ‘quacks’ would stop people from listening to them,” says Van Dam. “Moreover, I suspect my doing so would damage the possibility of having a real conversation with them and the people that have been charmed by them.” If you need any evidence of this, look at David “Avocado” Wolfe, whose notoriety as a quack seems to make him even more popular as a victim of “the establishment.” So yes, this paper may not go so far as some would like, but it is a first step toward drawing attention to the often flawed science underlying mindfulness therapies.
To whom is the dialogue directed about unwarranted claims from the mindfulness industry?
As one of the authors of an article claiming to be an authoritative review from a group of psychologists with diverse expertise, Van Dam says he is speaking to consumers. Why, then, won’t he and his co-authors provide citations and name names so that readers can evaluate for themselves what they are being told? Is the risk of reputational damage and embarrassment to the psychologists so great that Van Dam protects them rather than protecting consumers from the exaggerated and even fraudulent claims of psychologists hawking products branded as ‘peer-reviewed psychological and brain science’?
I use the term ‘quack’ sparingly outside of discussing unproven and unlikely-to-be-proven products supposed to promote physical health and well-being or to prevent or cure disease and distress.
I think Harvard psychologist Ellen Langer deserves the term “quack” for her selling of expensive trips to spas in Mexico to women with advanced cancer so that they can change their mind set to reverse the course of their disease. Strong evidence, please! Given that this self-proclaimed mother of mindfulness gets her claims promoted through the Association for Psychological Science website, I think it particularly appropriate for Van Dam and his coauthors to name her in their publication in an APS journal. Were they censored or only censoring themselves?
Let’s put aside psychologists who can be readily named as quacks. How about Van Dam and co-authors naming names of psychologists claiming to alter the brains and immune systems of cancer patients with mindfulness practices so that they improve their physical health and fight cancer, not just cope better with a life-altering disease?
I simply don’t buy Van Dam’s suggestion that to name names promotes quackery any more than I believe exposing anti-vaxxers promotes the anti-vaccine cause.
Is Van Dam only engaged in a polite discussion with fellow psychologists that needs to be strictly tone-policed to avoid offense or is he trying to reach, educate, and protect consumers as citizen scientists looking after their health and well-being? Maybe that is where we parted ways.
The SMILE trial holds many anomalies and leaves us with more questions than answers.
A guest post by Dr. Keith Geraghty
Honorary Research Fellow at the University of Manchester, Centre for Primary Care, Division of Population Health and Health Services Research
The Advertising Standards Authority previously ruled that the Lightning Process (LP) should not be advertised as a treatment for CFS/ME. So how, then, did LP end up being tested as a treatment in a clinical trial involving adolescents with CFS/ME? Publication of the trial sparked controversy after it was claimed that LP, in addition to specialist medical care, out-performed specialist medical care alone. This blog attempts to shed light on just how a quack alternative online teaching programme ended up in a costly clinical trial, and discusses how the SMILE trial exemplifies all that is wrong with contemporary psycho-behavioural trials, which are clearly vulnerable to bias and spin.
The SMILE trial compared LP plus specialist medical care (SMC) to SMC alone (commonly a mix of cognitive behavioural therapy and graded exercise therapy). LP is a trademarked training programme created by Phil Parker from osteopathy, life coaching and neuro-linguistic programming. It costs over £600 and after assessment and telephone briefings, clients attend group sessions over three days. While there is much secrecy about what exactly these sessions involve, a cursory search online shows us that past clients were told to ‘block out all negative thoughts’ and to consider themselves well, not sick. A person with an illness is said to be ‘doing illness’ (LP spells doing as duing, to signify LP means more than just doing). LP appears to attempt to get a participant to ‘stop doing’ by blocking negative thoughts and making positive affirmations.
Leading psychologists have raised concerns. Professor James Coyne called LP “quackery” and said neuro-linguistic programming “…has been thoroughly debunked for its pseudoscience”. In an expert reaction to the SMILE trial for the Science Media Centre, Professor Dorothy Bishop of Oxford University stated: “the intervention that was assessed is commercial and associated with a number of warning signs. The Lightning Process appears based on neuro-linguistic programming, which, despite its scientific-sounding name, has long been recognised as pseudoscience“.
The first and most obvious question is why the SMILE trial took place at all. Trial lead Professor Esther Crawley, who runs an NHS paediatric CFS/ME clinic, says she undertook the trial after many of her patients and their parents asked about LP. Patients with CFS/ME often report a lack of support from doctors and health care providers, and some turn to the internet seeking help; some are drawn to try alternative approaches, such as LP. But is that justification enough for spending over £160,000 on testing LP on children? I think not. Should we test every quack approach peddled online: herbs, crystals, spiritual healing, particularly when funding in CFS/ME research is currently so limited? There must also be compelling scientific plausibility to justify a trial. Simply wanting to see if something helps does not constitute adequate justification.
The SMILE trial has a fundamental design flaw. The trial compared specialist medical care alone (SMC) against SMC plus LP (SMC+LP). To the novice observer this may appear acceptable, but clinical trials are meant to test item x against item y. For example, in trying to see which drug works better, drug A or drug B, you would not give drug A to one group and both drugs A and B to another group; yet this is exactly what happened in SMILE. In seeking to test LP, Prof. Crawley gave LP and SMC together, rendering any findings from this trial arm pretty meaningless. The proper controls were missing. In addition, a trial of this magnitude would normally have a third arm, a do-nothing or usual-care group, or another talk-therapy control; such controls were absent.
Next we turn to the trial’s primary outcome measures. These were subjective self-reports of changes in physical function (using the SF-36). Secondary outcomes were quality of life, anxiety and school attendance. These outcomes were assessed at 6 months, with a follow-up at 12 months. It is reported that SMC+LP outperformed SMC alone on these measures at 6 months and that this was maintained at 12 months. However, there is no way to determine whether any claimed improvements came from LP alone, given that LP was mixed with SMC. We could assume that SMC+LP meant more support, positive expectations and increased contact time. Here we see how farcical SMILE is as a trial: we have one group getting two treatments (possible double help) and one group getting one treatment (possible half help).
Of particular concern is how few of the available patients enrolled in and completed the trial: 637 children aged 12-18 attended screening or appointment at a specialist CFS/ME clinic; fewer than half (310) were deemed eligible; just 136 consented to receiving trial information; and only 100 were randomised (less than a third of the eligible group), 49 to SMC and 51 to SMC+LP. Overall, 207 patients either declined to participate or were not sufficiently interested to return the consent form. Were patients self-selecting? Were those less likely to respond to nonspecific factors choosing not to participate, leaving a group already interested in LP, given that Prof. Crawley said many patients asked about it?
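The attrition described above amounts to a steep recruitment funnel. A minimal sketch of the arithmetic, using only the figures quoted in this paragraph (the variable names are my own):

```python
# Recruitment funnel for the SMILE trial, using the figures quoted above.
screened = 637    # children who attended screening/appointment at the clinic
eligible = 310    # deemed eligible for the trial
consented = 136   # consented to receive trial information
randomised = 100  # actually randomised: 49 to SMC, 51 to SMC+LP

for label, n in [("eligible", eligible),
                 ("consented", consented),
                 ("randomised", randomised)]:
    print(f"{label}: {n} ({100 * n / screened:.0f}% of those screened)")

# The randomised group as a share of those deemed eligible
print(f"randomised as share of eligible: {100 * randomised / eligible:.0f}%")
```

The last figure is the "less than a third" mentioned above: barely 32% of eligible patients ended up randomised, which is why self-selection is a live concern.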
As the trial progressed, patients dropped out: of the 51 participants allocated to SMC+LP, only 39 received full SMC+LP. At the 6-month assessment, just 38 of the 48 allocated to SMC and 46 of the 51 in SMC+LP are fully recorded. At 12 months there are further losses to follow-up in both cohorts: 14% in SMC+LP and 24% in SMC. The reasons for participant loss are not fully clear, though the paper reports 5 adverse events (3 in the SMC+LP arm). It is worth noting that physical function at 6 months deteriorated in 9 participants (roughly 10% overall), 8 of them in the SMC arm, with 5 participants having a fall of ≤10 on the SF-36 physical function subscale (deemed not clinically important). Again, questions arise as to whether some degree of self-selection took place. The fact that 3 of the participants assigned to SMC alone appear to have received LP reflects possible contamination of research cohorts that are meant to be kept apart.
Seven problems stand out in SMILE:
The use of the SF-36 physical function test was questionable. This self-report instrument is not designed or adequately validated for use in children.
Many of the participants appear to have had symptoms of anxiety and depression at the start of the trial. SMILE defined anxiety and depression as a score of ≥12 out of 22 on the self-report HADS. Usually a score of 8 or above is considered positive for mild anxiety and depression, and above 12 for moderate anxiety and depression. The mean HADS score at trial entry was 9.6, meaning that, by standard cut-offs, most participants met criteria for anxiety and depression. On the Spence Anxiety Scale (SCAS) the mean entry score was 35, with above 33 indicative of anxiety in this age group. Such mild to moderate elevations in depression and anxiety symptoms are very responsive to nonspecific support.
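The cut-off argument can be made concrete with a small sketch, assuming the thresholds quoted above (the banding function and its labels are my own illustration, not part of the trial's analysis):

```python
# Standard HADS cut-offs versus the stricter threshold used in SMILE,
# as described above. The function and labels are illustrative only.
def hads_band(score):
    if score >= 12:
        return "moderate or worse"  # SMILE's threshold for 'anxiety/depression'
    if score >= 8:
        return "mild"               # conventional threshold for possible caseness
    return "normal"

mean_entry_score = 9.6  # reported mean HADS score at trial entry
print(hads_band(mean_entry_score))  # 'mild': above the usual cut-off, below SMILE's
```

In other words, the average participant screened positive by conventional standards even though SMILE's stricter definition would not have flagged them.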
There is an anomaly in the data on improvement: on the physical function test, the mean baseline level of the children at entry into the trial was 54.5 (n=99), considered severely physically impaired. Only 52.5% of participants had been able to attend at least 3 days of school in the week prior to their entry into the study. Yet those assigned to SMC+LP were well enough to attend 3 consecutive days of sessions lasting 4 hours. The reports of severe physical disablement do not match the capabilities of those who participated in the course. Were the children’s self-reported poor physical abilities exaggerated to justify enrolment in the trial? Were the children’s elevated depression and anxiety symptoms responsive to the nonspecific support and extra contact time that came with assignment to LP plus standard care?
If subjective self-report is accepted as a recovery criterion, then LP, just 12 hours of talk therapy added to SMC, would appear to cure the majority of children with CFS. Such an effect would be astonishing, if true. In randomized controlled trials in adults with CFS/ME, such dramatic restoration of physical function (a wholesale return to near normal) is simply not seen. The SMILE trial’s results are clearly unbelievable.
SMILE’s reliance on the broad NICE criteria means there is a clear risk that patients were included in the trial who would not have met stricter definitions of the illness. There is growing concern that loose entry criteria in clinical trials in ME/CFS allow enrolment of many participants who do not in fact have ME/CFS. A detailed study of CFS prevalence found that many children are wrongly diagnosed with CFS when they may just be suffering from general fatigue and/or mental health complaints (Jones et al., 2004). SMILE used NICE guidelines to diagnose CFS: fatigue must be present for at least 3 months with one or more of four other symptoms, which can be as general as sleep disturbance. In contrast, Jones et al. showed that, using the Centers for Disease Control criteria of at least four specific symptoms alongside detailed clinical examination, many children believed to have CFS are diagnosed with other exclusionary disorders: often general fatigue, mental health complaints, drug and alcohol abuse, or eating disorders (which are often not readily disclosed to parents or doctors).
LP involves attempting to coerce clients into thinking that they have control over their symptoms and into blocking out those symptoms. This alone would distort any response by a participant in a follow-on questionnaire about symptoms.
LP was delivered by people from the Lightning Process Company. Phil Parker and his employees held a clear financial interest in a positive outcome in SMILE. Such an obvious conflict of interest is hard to disentangle and totally nullifies any outcomes from this trial.
The SMILE trial holds many anomalies and leaves us with more questions than answers.
It is not clear whether the children enrolled in the trial, diagnosed with CFS using NICE criteria, might have been deemed non-CFS using more stringent clinical screening (e.g. CDC or IOM criteria).
There is no way of determining whether any effect following SMC+LP was anything more than the result of non-specific factors, psychological tricks and persuasion.
The fact that SMC+LP appears to have cured the majority of participants with as little as 12 hours of talk therapy is a big flashing red light that this trial is fundamentally flawed.
There is a very real danger in promoting LP as a treatment for CFS/ME: the UK ME Association conducted a survey of its members (4,217 members) and found that 20% of those who tried LP reported feeling worse (7.9% slightly worse, 12.9% much worse). SMILE cannot be, and should not be, used to justify LP as a treatment for CFS/ME.
The Lightning Process has no scientific credibility and this trial highlights a fundamental flaw in contemporary clinical trials: they are susceptible to suggestion, bias and spin. The SMILE trial appears to draw paediatric CFS/ME clinical care for children into a swamp of pseudoscience and mysticism. This is a clear step backward. There is little to smile about after reviewing the SMILE trial.
Dr. Geraghty is currently an Honorary Research Fellow within the Centre for Primary Care, Division of Population Health and Health Services Research at the University of Manchester. He previously worked as a research associate at Cardiff University and Imperial College London. He left a career in clinical medicine after becoming ill with ME/CFS. The main themes of his work are doctor-patient relationships, medically unexplained symptoms, quality and safety in health care delivery, physician well-being and evidence-based medicine. He has a special interest in medically unexplained symptoms (MUS), and Myalgic Encephalomyelitis/Chronic Fatigue Syndrome.
1. Crawley, E., et al., Chronic disabling fatigue at age 13 and association with family adversity. Pediatrics, 2012. 130(1): p. e71-e79.
2. Crawley, E.M., et al., Clinical and cost-effectiveness of the Lightning Process in addition to specialist medical care for paediatric chronic fatigue syndrome: randomised controlled trial. Archives of Disease in Childhood, 2017.
3. Jones, J.F., et al., Chronic fatigue syndrome and other fatiguing illnesses in adolescents: a population-based study. Journal of Adolescent Health, 2004. 35(1): p. 34-40.
The tour of the sausage factory is starting; here’s your brochure telling you what you’ll see.
A recent review has received a lot of attention and is being used to claim that mind-body interventions have distinct molecular signatures that point to potentially dramatic health benefits for those who take up these practices.
Few who are tweeting about this review or its press coverage are likely to have read it, or to understand it if they did. Most of the new-agey coverage in social media does nothing more than echo or amplify the message of the review’s press release. Lazy journalists and bloggers can simply pass on direct quotes from the lead author or even just the press release’s title, ‘Meditation and yoga can ‘reverse’ DNA reactions which cause stress, new study suggests’:
“These activities are leaving what we call a molecular signature in our cells, which reverses the effect that stress or anxiety would have on the body by changing how our genes are expressed.”
“Millions of people around the world already enjoy the health benefits of mind-body interventions like yoga or meditation, but what they perhaps don’t realise is that these benefits begin at a molecular level and can change the way our genetic code goes about its business.”
[The authors of this review actually identified some serious shortcomings to the studies they reviewed. I’ll be getting to some excellent points at the end of this post that run quite counter to the hype. But the lead author’s press release emphasized unwarranted positive conclusions about the health benefits of these practices. That is what is most popular in media coverage, especially from those who have stuff to sell.]
Interpretation of the press release and review authors’ claims requires going back to the original studies, which most enthusiasts are unlikely to do. If readers do go back, they will have trouble interpreting some of the deceptive claims that are made.
Yet, a lot is at stake. This review is being used to recommend mind-body interventions for people having or who are at risk of serious health problems. In particular, unfounded claims that yoga and mindfulness can increase the survival of cancer patients are sometimes hinted at, but occasionally made outright.
This blog post is written with the intent of protecting consumers from such false claims and providing tools so they can spot pseudoscience for themselves.
Discussion of the review in the media speaks broadly of alternative and complementary interventions. The coverage is aimed at inspiring confidence in this broad range of treatments and encourages people facing health crises to invest time and money in outright quackery. Seemingly benign recommendations for yoga, tai chi, and mindfulness (after all, what’s the harm?) often become the entry point to more dubious and expensive treatments that substitute for established treatments. Once they are drawn to centers for integrative health care for classes, cancer patients are likely to spend hundreds or even thousands of dollars on other products and services that are unlikely to benefit them. One study reported:
More than 72 oral or topical, nutritional, botanical, fungal and bacterial-based medicines were prescribed to the cohort during their first year of IO care…Costs ranged from $1594/year for early-stage breast cancer to $6200/year for stage 4 breast cancer patients. Of the total amount billed for IO care for 1 year for breast cancer patients, 21% was out-of-pocket.
Coming up, I will take a skeptical look at the six randomized trials that were highlighted by this review. But in this post, I will provide you with some tools and insights so that you do not have to make such an effort in order to make an informed decision.
Like many of the other studies cited in the review, these randomized trials were quite small and underpowered. But I will focus on the six because they are as good as it gets: randomized trials are considered a higher form of evidence than simple observational studies or case reports. [It is too bad the authors of the review don’t even highlight which studies are randomized trials. They are lumped with others as “longitudinal studies.”]
As a group, the six studies do not actually add any credibility to the claims that mind-body interventions (specifically yoga, tai chi, and mindfulness training or retreats) improve health by altering DNA. We can be no more confident with what the trials provide than we would be had they never been done.
I found the task of probing and interpreting the studies quite labor-intensive and ultimately unrewarding.
I had to get past poor reporting of what was actually done in the trials, to which patients, and with what results. My task often involved seeing through cover-ups, with authors exercising considerable flexibility in reporting what measures they actually collected and what analyses they attempted before arriving at the best possible tale of the wondrous effects of these interventions.
Interpreting clinical trials should not be so hard: they should be honestly and transparently reported, with a registered protocol that is adhered to. These reports fell sorely short. The full extent of the problems took some digging to uncover, but some things emerged before I even got to the methods and results.
The introductions of these studies consistently exaggerated the strength of existing evidence for the effects of these interventions on health, even while somehow coming to the conclusion that this particular study was urgently needed and it might even be the “first ever”. The introductions to the six papers typically cross-referenced each other, without giving any indication of how poor quality the evidence was from the other papers. What a mutual admiration society these authors are.
That review clearly states that the evidence for the effects of mindfulness is of poor quality because of the lack of comparisons with credible active treatments. The typical randomized trial of mindfulness involves a comparison with no treatment, a waiting list, or patients remaining in routine care where the target problem is likely to be ignored. If we depend on the bulk of the existing literature, we cannot rule out the likelihood that any apparent benefits of mindfulness are due to having more positive expectations, attention, and support rather than simply getting nothing. Only a handful of the hundreds of trials of mindfulness include appropriate, active treatment comparison/control groups. The results of those studies are not encouraging.
One of the first things I do in probing the introduction of a study claiming health benefits for mindfulness is see how they deal with the Goyal et al review. Did the study cite it, and if so, how accurately? How did the authors deal with its message, which undermines claims of the uniqueness or specificity of any benefits to practicing mindfulness?
For yoga, we cannot yet rule out that it is no better than regular exercise – in groups or alone – or than having relaxing routines. The literature concerning tai chi is even smaller and of poorer quality, but there is the same need to show that practicing tai chi has any benefit over exercising in groups with comparable positive expectations and support.
Even more than mindfulness, yoga and tai chi attract a lot of pseudoscientific mumbo jumbo about integrating Eastern wisdom and Western science. We need to look past that and insist on evidence.
Like their introductions, the discussion sections of these articles are quite prone to exaggerating how strong and consistent the evidence from existing studies is. The discussion sections cherry-pick positive findings in the existing literature, sometimes recklessly distorting them. The authors then discuss how their own positively spun findings fit with what is already known, while minimizing or outright neglecting any of their negative findings. I was not surprised to see one trial of mindfulness for cancer patients find no effects on depressive symptoms or perceived stress, but then go on to explain how mindfulness might nonetheless powerfully affect the expression of DNA.
If you want to dig into the details of these studies, the going can get rough and the yield for doing a lot of mental labor is low. For instance, these studies involved drawing blood and analyzing gene expression. Readers will inevitably encounter passages like:
In response to KKM treatment, 68 genes were found to be differentially expressed (19 up-regulated, 49 down-regulated) after adjusting for potentially confounded differences in sex, illness burden, and BMI. Up-regulated genes included immunoglobulin-related transcripts. Down-regulated transcripts included pro-inflammatory cytokines and activation-related immediate-early genes. Transcript origin analyses identified plasmacytoid dendritic cells and B lymphocytes as the primary cellular context of these transcriptional alterations (both p < .001). Promoter-based bioinformatic analysis implicated reduced NF-κB signaling and increased activity of IRF1 in structuring those effects (both p < .05).
Intimidated? Before you defer to the “experts” doing these studies, I will show you some things I noticed in the six studies and how you can debunk their relevance for promoting health and dealing with illness. Actually, I will show that even if these six studies had gotten the results the authors claimed – and they did not – at best the effects would be trivial and lost among the other things going on in patients’ lives.
Fortunately, there are lots of signs that you can dismiss such studies and go on to something more useful, if you know what to look for.
Some general rules:
Don’t accept claims of efficacy/effectiveness based on underpowered randomized trials. Dismiss them. A reliable rule of thumb is to dismiss trials that have fewer than 35 patients in the smallest group. Over half the time, such studies will miss a true moderate-sized effect, even when it is actually there.
Due to publication bias, most of the positive effects published from trials of this size will be false positives that won’t hold up in well-designed, larger trials.
When significant positive effects are reported from such trials, they had to be large to have reached significance. If not outright false, these effect sizes won’t be matched in larger trials. So, significant positive effects from small trials are likely to be exaggerated, if not false positives, and probably won’t replicate. For that reason, we can treat small studies as pilot or feasibility studies, but not as providing estimates of the effect size we should expect from a larger study. Investigators do it all the time, but they should not: they run power calculations estimating how many patients they need for a larger trial from the results of such small studies. No, no, no!
Having spent decades examining clinical trials, I am generally comfortable dismissing effect sizes that come from trials with fewer than 35 patients in the smaller group. I agree with the suggestion that if two larger trials are available in a given literature, go with those and ignore the smaller studies. If there are not at least two larger studies, keep the jury out on whether there is a significant effect.
Applying the Rule of 35, five of the six trials can be dismissed, and the sixth is ambiguous because of loss of patients to follow-up. If promoters of mind-body interventions want to convince us that these interventions have beneficial effects on physical health by conducting trials like these, they have to do better. None of the individual trials should increase our confidence in their claims. Collectively, the trials collapse in a mess without providing a single credible estimate of effect size. This attests to the poor quality of evidence and disrespect for methodology that characterizes this literature.
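The arithmetic behind the Rule of 35 is easy to check for yourself. Here is a minimal sketch using the standard normal approximation to a two-sided, two-sample test (the hardcoded 1.96 is the 5% two-sided critical value); the function name and the numbers plugged in are illustrative, not taken from any of the six trials:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sample_power(d, n_per_group, z_crit=1.96):
    """Approximate power of a two-sided two-sample test (alpha = .05)
    for standardized effect size d, via the normal approximation."""
    ncp = d * sqrt(n_per_group / 2)   # noncentrality of the test statistic
    return norm_cdf(ncp - z_crit)     # the tiny opposite-tail term is ignored

# A "moderate" effect (d = 0.5) with 30 patients per group:
print(round(two_sample_power(0.5, 30), 2))   # ~0.49: the trial misses it over half the time
# Roughly 64 per group are needed for the conventional 80% power:
print(round(two_sample_power(0.5, 64), 2))   # ~0.81
```

In other words, a trial below the Rule-of-35 threshold is literally a coin flip against detecting a true moderate effect, which is why a "significant" result from such a trial had to be implausibly large to clear the bar.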
Don’t be taken in by titles of peer-reviewed articles that are themselves an announcement that these interventions work. Titles may not be telling the truth.
What I found extraordinary is that five of the six randomized trials had a title indicating that a positive effect was found. I suspect that most people encountering the title will not actually go on to read the study. So they will be left with the false impression that positive results were indeed obtained. It’s quite a clever trick to turn the title of an article – by which most people will remember it – into false advertising for what was actually found.
For a start, we can simply remind ourselves that with these underpowered studies, investigators should not even be making claims about efficacy/effectiveness. So, one trick of the developing skeptic is to check whether the claims being made in the title fit the size of the study. But by actually going to the results section, one can find further discrepancies between what was found and what is being claimed.
I think it’s a general rule of thumb that we should be wary of titles for reports of randomized trials that declare results. Even when what is claimed in the title fits the actual results, it often creates an illusion of greater consistency with the existing literature. Furthermore, even when future studies inevitably fail to replicate what is claimed in the title, the false claim lives on, because failure to replicate key findings is almost never grounds for retracting a paper.
Check the institutional affiliations of the authors. These six trials serve as a depressing reminder that we cannot rely on researchers’ institutional affiliations or federal grant funding to reassure us of the validity of their claims. These authors are not from Quack-Quack University, and they get funding for their research.
In all cases, the investigators had excellent university affiliations, mostly in California. Most studies were conducted with some form of funding, often federal grants. A quick Google check would reveal that at least one of the authors on each study, usually more, had federal funding.
Check the conflicts of interest, but don’t expect the declarations to be informative, and be skeptical of what you find. It is disappointing that a check of the conflict of interest statements for these articles would be unlikely to arouse suspicion that the claimed results might have been influenced by financial interests. One cannot readily see that the studies were generally done in settings promoting alternative, unproven treatments that would benefit from the publicity generated by the studies. One cannot see that some of the authors have lucrative book contracts and speaking tours that require making claims of dramatic effects of mind-body treatments – claims that could not possibly be supported by transparent reporting of the results of these studies. As we will see, one of the studies was actually conducted in collaboration with Deepak Chopra and with money from his institution. That would definitely raise flags in the skeptic community. But the dubious tie might be missed by patients and their families vulnerable to unwarranted claims and unrealistic expectations of what can be obtained outside of conventional medicine – chemotherapy, surgery, and pharmaceuticals.
Based on what I found probing these six trials, I can suggest some further rules of thumb. (1) Don’t assume that articles about the health effects of alternative treatments disclose all relevant conflicts of interest. Check the setting in which the study was conducted and whether an integrative [complementary and alternative, meaning mostly unproven] care setting was used for recruiting or running the trial. Not only would this represent potential bias on the part of the authors, it would represent selection bias in the recruitment of patients and in their responsiveness to placebo effects consistent with the marketing themes of these settings. (2) Google the authors and see if they have lucrative pop psychology book contracts, TED talks, or speaking gigs at positive psychology or complementary and alternative medicine gatherings. None of these lucrative activities is typically expected to be disclosed as a conflict of interest, but all require making strong claims that are not supported by available data. Such rewards are perverse incentives for authors to distort and exaggerate positive findings and to suppress negative findings in peer-reviewed reports of clinical trials. (3) Check whether known quacks have prepared recruitment videos for the study, informing patients what will be found. (Seriously, I was tipped off to look, and I found exactly that.)
Look for the usual suspects. A surprisingly small, tight, interconnected group is generating this research. You could look the authors up on Google or Google Scholar, or browse through my previous blog posts and see what I have said about them. As I will point out in my next blog post, one got withering criticism for her claim that drinking carbonated sodas, but not sweetened fruit drinks, shortened your telomeres – so that drinking soda was worse than smoking. My colleagues and I re-analyzed the data of another of the authors. We found that, contrary to what he claimed, pursuing meaning rather than pleasure in your life did not affect gene expression related to immune function. We also showed that substituting randomly generated data worked as well as what he got from blood samples in replicating his original results. I don’t think it is ad hominem to point out that both of these authors have a history of making implausible claims. It speaks to source credibility.
Check and see if there is a trial registration for a study, but don’t stop there. You can quickly check with PubMed whether a report of a randomized trial is registered. Trial registration is intended to ensure that investigators commit themselves to a primary outcome, or maybe two, so you can check whether that is what they emphasized in their paper, and whether what is said in the report of the trial fits what was promised in the protocol. Unfortunately, I could find only one of these trials registered. The registration was vague about which outcome variables would be assessed and did not mention the outcome emphasized in the published paper (!). The registration also said the sample would be larger than what was reported in the published study. When researchers have difficulty with recruitment, their study is often compromised in other ways. I’ll show how this study was compromised.
Well, it looks like applying these generally useful rules of thumb is not always so easy with these studies. I think the small sample size across all of the studies would be enough to decide this research has yet to yield meaningful results and certainly does not support the claims that are being made.
But readers who are motivated to put in the time probing deeper will come up with strong signs of p-hacking and questionable research practices.
Check the report of the randomized trial and see if you can find any declaration of one or two primary outcomes and a limited number of secondary outcomes. What you will find instead is that the studies always have more outcome variables than patients receiving these interventions. The opportunities for cherry picking positive findings and discarding the rest are huge, especially because it is so hard to assess what data were collected but not reported.
Check and see if you can find tables of unadjusted primary and secondary outcomes. Honest and transparent reporting involves giving readers a look at simple statistics so they can decide if results are meaningful. For instance, if effects on stress and depressive symptoms are claimed, are the results impressive and clinically relevant? In almost all cases, no peeking is allowed. Instead, the authors provide analyses and statistics with lots of adjustments. They break lots of rules in doing so, especially with such small samples. These authors are virtually assured of getting results to crow about.
Famously, Joe Simmons and Leif Nelson hilariously published claims that briefly listening to the Beatles’ “When I’m Sixty-Four” left students a year and a half younger than if they had been assigned to listen to “Kalimba.” Simmons and Nelson knew this was nonsense, but their intent was to show what researchers can do if they have free rein in how they analyze their data and what they report. They revealed the tricks they used, but those tricks were minor league and amateurish compared to what the authors of these trials consistently did in claiming that yoga, tai chi, and mindfulness modified the expression of DNA.
Stay tuned for my next blog post, where I go through the six studies. But consider this if you or a loved one must make an immediate decision about plunging into the world of woo-woo unproven medicine in hopes of altering DNA expression: I will show that the authors of these studies did not get the results they claimed. But who should care if they did? The effects were laughably trivial. As the authors of the review about which I have been complaining noted:
One other problem to consider are the various environmental and lifestyle factors that may change gene expression in similar ways to MBIs [Mind-Body Interventions]. For example, similar differences can be observed when analyzing gene expression from peripheral blood mononuclear cells (PBMCs) after exercise. Although at first there is an increase in the expression of pro-inflammatory genes due to regeneration of muscles after exercise, the long-term effects show a decrease in the expression of pro-inflammatory genes (55). In fact, 44% of interventions in this systematic review included a physical component, thus making it very difficult, if not impossible, to discern between the effects of MBIs from the effects of exercise. Similarly, food can contribute to inflammation. Diets rich in saturated fats are associated with pro-inflammatory gene expression profile, which is commonly observed in obese people (56). On the other hand, consuming some foods might reduce inflammatory gene expression, e.g., drinking 1 l of blueberry and grape juice daily for 4 weeks changes the expression of the genes related to apoptosis, immune response, cell adhesion, and lipid metabolism (57). Similarly, a diet rich in vegetables, fruits, fish, and unsaturated fats is associated with anti-inflammatory gene profile, while the opposite has been found for Western diet consisting of saturated fats, sugars, and refined food products (58). Similar changes have been observed in older adults after just one Mediterranean diet meal (59) or in healthy adults after consuming 250 ml of red wine (60) or 50 ml of olive oil (61). However, in spite of this literature, only two of the studies we reviewed tested if the MBIs had any influence on lifestyle (e.g., sleep, diet, and exercise) that may have explained gene expression changes.
How about taking tango lessons instead? You would at least learn dance steps, get exercise, and decrease any social isolation. And so what if there were no more benefits than from taking up these other activities?
The trial was registered long after patient recruitment had started, and the trial protocol can be found here.
[Aside: What is the value of registering a trial long after recruitment commenced? Do journal articles have a responsibility to acknowledge a link they publish for trial registration is for what occurred after the trial commenced? Is trial registration another ritual like acupuncture?]
Uncritical reports of the results of the trial as interpreted by the authors echoed through both the lay and physician-aimed media.
Coverage by Reuters was somewhat more interesting than the rest. The trial authors’ claim that acupuncture for preventing migraines was ready for prime time was paired with some reservations expressed in the accompanying editorial.
“Placebo response is strong in migraine treatment studies, and it is possible that the Deqi sensation . . . that was elicited in the true acupuncture group could have led to a higher degree of placebo response because there was no attempt made to elicit the Deqi sensation in the sham acupuncture group,” Dr. Amy Gelfand writes in an accompanying editorial.
Come on, Dr. Gelfand, if you had checked the article, you would have seen that Deqi was not measured. If you had checked the literature, even proponents concede that Deqi remains a vague, highly subjective judgment – in this case, one being made by an unblinded acupuncturist. Basically, the acupuncturist persisted in whatever was being done until a sensation of soreness, numbness, distention, or radiating seemed to be elicited from the patient. What part of a subjective response to acupuncture, with or without Deqi, would you consider NOT a placebo response?
Dr. Gelfand also revealed some reasons why she would bother to write an editorial for a treatment with an incoherent and implausible nonscientific rationale.
“When I’m a researcher, placebo response is kind of a troublesome thing, because it makes it difficult to separate signal from noise,” she said. But when she’s thinking as a doctor about the patient in front of her, placebo response is welcome, Gelfand said.
“You know, what I really want is my patient to feel better, and to be improved and not be in pain. So, as long as something is safe, even if it’s working through a placebo mechanism, it may still be something that some patients might want to use,” she said.
Let’s contemplate the implications of this. This editorial in JAMA Internal Medicine accompanies an article in which the trial authors suggest acupuncture is ready to become a standard treatment for migraine. There is nothing in the article which suggests that the unscientific basis of acupuncture has been addressed, only that it might have achieved a placebo response. Is Dr. Gelfand suggesting that would be sufficient, despite some problems in the trial? What if that became the standard for recommending medications and medical procedures?
With increasing success in getting acupuncture and other approaches now called “integrative medicine” ensconced in cancer centers and reimbursed by insurance, we will be facing again and again some of the issues that started this blog post. Is acupuncture doing no obvious harm a sufficient reason for reimbursing it? Trials like this one can be cited in support of reimbursement.
The JAMA: Internal Medicine report of an RCT of acupuncture for preventing migraines
Participants were randomly assigned to one of three groups: true acupuncture, sham acupuncture, or a waiting-list control group.
Participants in the true acupuncture and sham acupuncture groups received treatment 5 days per week for 4 weeks for a total of 20 sessions.
Participants in the waiting-list group did not receive acupuncture but were informed that 20 sessions of acupuncture would be provided free of charge at the end of the trial.
As the editorial comment noted, this is incredibly intensive treatment that burdens patients with coming in five days a week for four weeks. Yet the effects were quite modest in terms of number of migraine attacks, even if statistically significant:
The mean (SD) change in frequency of migraine attacks differed significantly among the 3 groups at 16 weeks after randomization (P < .001); the mean (SD) frequency of attacks decreased in the true acupuncture group by 3.2 (2.1), in the sham acupuncture group by 2.1 (2.5), and the waiting-list group by 1.4 (2.5); a greater reduction was observed in the true acupuncture than in the sham acupuncture group (difference of 1.1 attacks; 95%CI, 0.4-1.9; P = .002) and in the true acupuncture vs waiting-list group (difference of 1.8 attacks; 95%CI, 1.1-2.5; P < .001). Sham acupuncture was not statistically different from the waiting-list group (difference of 0.7 attacks; 95%CI, −0.1 to 1.4; P = .07).
There were no group-by-time differences in use of medication for migraine. Receiving “true” versus sham acupuncture did not matter.
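How modest are these effects? A back-of-envelope standardized effect size can be computed from the means and SDs quoted above. This is only a sketch: it pools the SDs assuming roughly equal group sizes, which the published report would be needed to confirm.

```python
from math import sqrt

# Reported mean (SD) reductions in monthly migraine attacks at 16 weeks:
true_mean, true_sd = 3.2, 2.1   # true acupuncture
sham_mean, sham_sd = 2.1, 2.5   # sham acupuncture

# Pooled SD (equal-n approximation) and Cohen's d for true vs. sham
pooled_sd = sqrt((true_sd**2 + sham_sd**2) / 2)
d = (true_mean - sham_mean) / pooled_sd
print(round(d, 2))   # ~0.48
```

That is about one fewer attack per month, roughly half a standard deviation, for an intensive twenty-session regimen, with no difference at all on the objective measure of medication use.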
Four acupoints were used per treatment. All patients received acupuncture on 2 obligatory points, including GB20 and GB8. The 2 other points were chosen according to the syndrome differentiation of meridians in the headache region. The potential acupoints included SJ5, GB34, BL60, SI3, LI4, ST44, LR3, and GB40.20. The use of additional acupoints other than the prescribed ones was not allowed. We chose the prescriptions as a result of a systematic review of ancient and modern literature,22,23 consensus meetings with clinical experts, and experience from our previous study.
Note that the “headache region” is not the region of the head where headaches occur, and there is no scientific basis for its selection. Since when does such a stir fry of ancient and contemporary wisdom, consensus meetings with experts, and the clinical experience of the investigators become the justification for the mechanism studied in a clinical trial published in a prestigious American medical journal?
What was sham about the sham acupuncture (SA) treatment?
The number of needles, electric stimulation, and duration of treatment in the SA group were identical in the TA group except that an attempt was not made to induce the Deqi sensation. Four nonpoints were chosen according to our previous studies.
From the trial protocol, we learn that the effort to induce the Deqi sensation involves the acupuncturist twirling and rotating the needles.
In a manner that can easily escape notice, the authors indicate that the acupuncture was administered with electrostimulation.
In the methods section, they abruptly state:
Electrostimulation generates an analgesic effect, as manual acupuncture does.21
I wonder if the reviewers or the editorialist checked this reference. It is to an article providing the insight that “meridians” – the 365 designated acupuncture points – are identified on a particular patient by
feeling for 12 organ-specific pulses located on the wrists and with cosmological interpretations including a representation of five elements: wood, water, metal, earth, and fire.
The authors further state that they undertook a program of research to counter the perception in the United States in the 1970s that acupuncture was quackery and even “Oriental hypnosis.” Their article describes some of the experiments they conducted, including one in which the benefits a rabbit received from finger-pressure acupuncture were transferred to another rabbit via a transfusion of cerebrospinal fluid.
In discussing the results of the present study in JAMA Internal Medicine, the authors again comment in passing:
We added electrostimulation to manual acupuncture because manual acupuncture requires more time until it reaches a similar analgesic effect as electrical stimulation.27 Previous studies have reported that electrostimulation is better than manual acupuncture in relieving pain27-30 and could induce a longer lasting effect.28
The citations are to methodologically poor laboratory studies in which dramatic results are often obtained with very small cell sizes (n = 10).
Can we dispense with the myth that the acupuncture provided in this study is an extension of traditional Chinese needle therapy?
It is high time that we dispense with the notion that acupuncture applied to migraines and other ailments represents a traditional Chinese medicine that is therefore not subject to any effort to critique its plausibility and status as a science-based treatment. If we dispense with that idea, we still have to confront how unscientific and nonsensical the rationale is for the highly ritualized treatment provided in this study.
What we are dealing with instead is reformed and “sanitized” acupuncture and the makeshift theoretical framework of Maoist China that have flourished in the West as “Traditional,” “Chinese,” “Oriental,” and most recently as “Asian” medicine.
Kavoussi, who studied to become an acupuncturist, notes that:
Traditional theories for selecting points and means of stimulation are not based on an empirical rationale, but on ancient cosmology, astrology and mythology. These theories significantly resemble those that underlined European and Islamic astrological medicine and bloodletting in the Middle-Ages. In addition, the alleged predominance of acupuncture amongst the scholarly medical traditions of China is not supported by evidence, given that for most of China’s long medical history, needling, bloodletting and cautery were largely practiced by itinerant and illiterate folk-healers, and frowned upon by the learned physicians who favored the use of pharmacopoeia.
In the early 1930s a Chinese pediatrician by the name of Cheng Dan’an (承淡安, 1899-1957) proposed that needling therapy should be resurrected because its actions could potentially be explained by neurology. He therefore repositioned the points towards nerve pathways and away from blood vessels-where they were previously used for bloodletting. His reform also included replacing coarse needles with the filiform ones in use today.38 Reformed acupuncture gained further interest through the revolutionary committees in the People’s Republic of China in the 1950s and 1960s along with a careful selection of other traditional, folkloric and empirical modalities that were added to scientific medicine to create a makeshift medical system that could meet the dire public health and political needs of Maoist China while fitting the principles of Marxist dialectics. In deconstructing the events of that period, Kim Taylor in her remarkable book on Chinese medicine in early communist China, explains that this makeshift system has achieved the scale of promotion it did because it fitted in, sometimes in an almost accidental fashion, with the ideals of the Communist Revolution. As a result, by the 1960s acupuncture had passed from a marginal practice to an essential and high-profile part of the national health-care system under the Chinese Communist Party, who, as Kim Taylor argues, had laid the foundation for the institutionalized and standardized format of modern Chinese medicine and acupuncture found in China and abroad today.39 This modern construct was also a part of the training of the “barefoot doctors,” meaning peasants with an intensive three- to six-month medical and paramedical training, who worked in rural areas during the nationwide healthcare disarray of the Cultural Revolution era.40 They provided basic health care, immunizations, birth control and health education, and organized sanitation campaigns. 
Chairman Mao believed, however, that ancient natural philosophies that underlined these therapies represented a spontaneous and naive dialectical worldview based on social and historical conditions of their time and should be replaced by modern science.41 It is also reported that he did not use acupuncture and Chinese medicine for his own ailments.42
What is a suitable comparison/control group for a theatrical administration of a placebo?
A randomized double-blind crossover pilot study published in NEJM highlights some of the problems arising from poorly chosen control groups. The study compared an inhaled albuterol bronchodilator to one of three control conditions: a placebo inhaler, sham acupuncture, or no intervention. Subjective self-report measures of perceived improvement in asthma symptoms and perceived credibility of the treatments revealed only that the no-intervention condition was inferior to the active treatment and the two placebo conditions; no difference was found between the active treatment and the placebo conditions. However, strong differences were found between the active treatment and the three comparison/control conditions on an objective measure of physiological response – improvement in forced expiratory volume (FEV1), measured with spirometry.
One take-away lesson is that we should be careful about accepting subjective self-report measures when objective measures are available. One objective measure in the present study was the taking of medication for migraines, and there were no differences between groups. This point is missed in both the target article in JAMA Internal Medicine and the accompanying editorial.
The editorial does comment on the acupuncturists being unblinded – they clearly knew when they were providing the preferred “true” acupuncture and when they were providing sham. They had instructions to avoid creating a Deqi sensation in the sham group, but some latitude to keep working until it was achieved in the “true” group. Unblinded treatment providers are always a serious risk of bias in clinical trials, but here we have a trial in which the primary outcomes are subjective, the scientific status of Deqi is dubious, and the providers might be seen as highly motivated to promote the “true” treatment.
I’m not sure why the editorialist was not stopped in her tracks by the unblinded acupuncturists – or, for that matter, why the journal published this article. But let’s ponder a bit the difficulties of coming up with a suitable comparison/control group for what is – until proven otherwise – a theatrical and highly ritualized placebo. If a treatment has no scientifically valid crucial ingredient, how do we construct a comparison/control group that differs only in the absence of the active ingredient but is otherwise equivalent?
There is a long history of futile efforts to devise sham acupuncture, defined by what practitioners consider the inappropriate meridians. An accumulation of failures to distinguish such sham from “true” acupuncture in clinical trials has led to arguments that the distinction may not be valid: the efficacy of acupuncture may depend only on the procedure, not on choice of a correct meridian. Other studies would seem to show some advantage for the active or “true” treatments, but these are generally clinical trials with a high risk of bias, especially from the inability to blind practitioners as to which treatment they are providing.
There have been some clever efforts to develop sham acupuncture techniques that can fool even experienced practitioners. A recent PLOS One article tested needles that collapse into themselves.
Up to 68% of patients and 83% of acupuncturists correctly identified the treatment, but for patients the distribution was not far from 50/50. Also, there was a significant interaction between actual or perceived treatment and the experience of de qi (p = 0.027), suggesting that the experience of de qi and possible non-verbal clues contributed to correct identification of the treatment. Yet, of the patients who perceived the treatment as active or placebo, 50% and 23%, respectively, reported de qi. Patients’ acute pain levels did not influence the perceived treatment. In conclusion, acupuncture treatment was not fully double-blinded which is similar to observations in pharmacological studies. Still, the non-penetrating needle is the only needle that allows some degree of practitioner blinding. The study raises questions about alternatives to double-blind randomized clinical trials in the assessment of acupuncture treatment.
Thirty-six studies were included for qualitative analysis while 14 were in the meta-analysis. The meta-analysis does not support the notion of either the Streitberger or the Park Device being inert control interventions while none of the studies involving the Takakura Device was included in the meta-analysis. Sixteen studies reported the occurrence of adverse events, with no significant difference between verum and placebo acupuncture. Author-reported blinding credibility showed that participant blinding was successful in most cases; however, when blinding index was calculated, only one study, which utilised the Park Device, seemed to have an ideal blinding scenario. Although the blinding index could not be calculated for the Takakura Device, it was the only device reported to enable practitioner blinding. There are limitations with each of the placebo devices and more rigorous studies are needed to further evaluate their effects and blinding credibility.
Really, must we await better technology that more successfully fools acupuncturists and their patients as to whether the needles are actually penetrating the skin?
Results Between baseline and weeks 9 to 12, the mean (SD) number of days with headache of moderate or severe intensity decreased by 2.2 (2.7) days from a baseline of 5.2 (2.5) days in the acupuncture group compared with a decrease to 2.2 (2.7) days from a baseline of 5.0 (2.4) days in the sham acupuncture group, and by 0.8 (2.0) days from a baseline of 5.4 (3.0) days in the waiting list group. No difference was detected between the acupuncture and the sham acupuncture groups (0.0 days, 95% confidence interval, −0.7 to 0.7 days; P = .96) while there was a difference between the acupuncture group compared with the waiting list group (1.4 days; 95% confidence interval, 0.8-2.1 days; P<.001). The proportion of responders (reduction in headache days by at least 50%) was 51% in the acupuncture group, 53% in the sham acupuncture group, and 15% in the waiting list group.
Conclusion Acupuncture was no more effective than sham acupuncture in reducing migraine headaches although both interventions were more effective than a waiting list control.
I welcome someone with more time on their hands to compare and contrast the results of these two studies and decide which one has more credibility.
Maybe we should step back and ask, “why does anyone care about such questions, when there is such doubt that a plausible scientific mechanism is in play?”
Time for JAMA: Internal Medicine to come clean
The JAMA: Internal Medicine article on acupuncture for prophylaxis of migraines is yet another example of a publication where revelation of earlier drafts, reviewer critiques, and author responses would be enlightening. Just what standard are the authors being held to? What issues were raised in the review process? Beyond resolving crucial limitations like blinding of acupuncturists, under what conditions would the journal conclude that studies of acupuncture in general are too scientifically unsound and medically irrelevant to warrant publication in a prestigious JAMA journal?
Alternatively, is the journal willing to go on record that it is sufficient to establish that patients are satisfied with a pain treatment in terms of self-reported subjective experiences? Could we then simply close the issue of whether there is a plausible scientific mechanism involved where the existence of one can be seriously doubted? If so, why stop with evaluations of subjective pain or days without pain as the primary outcome?
We must question the wisdom of JAMA: Internal Medicine in inviting Dr. Amy Gelfand for editorial comment. She is apparently willing to allow that demonstration of a placebo response is sufficient for a treatment’s acceptance by clinicians. She also is attached to the University of California, San Francisco Headache Center, which offers “alternative medicine, such as acupuncture, herbs, massage and meditation for treating headaches.” Endorsement of acupuncture as effective in a prestigious journal becomes part of the evidence considered for its reimbursement. I think there are enough editorial commentators out there without such conflicts of interest.
I will soon be offering scientific writing courses on the web as I have been doing face-to-face for almost a decade. Sign up at my new website to get notified about these courses, as well as upcoming blog posts at this and other blog sites. Get advance notice of forthcoming e-books and web courses. Lots to see at CoyneoftheRealm.com.
According to the website of an advocacy foundation, coverage of two recent clinical trials published in the Journal of Psychopharmacology evaluating psilocybin for distress among cancer patients garnered over 1 billion views on social media. To put that in context, the advocacy group claimed that this is one sixth of the attention that the Super Bowl received.
In this blog post I’ll review the second of the two clinical trials. Then, I will discuss some reasons why we should be concerned about the success of this public relations campaign in terms of what it means for both the integrity of scientific publishing, as well as health and science journalism.
The issue is not whether cancer patients will find benefit from ingesting psychedelic mushrooms in a safe environment. Nor is it that the sale and ingestion of psilocybin is currently criminalized (Schedule 1, the same classification as heroin).
We can appreciate the futility of the war on drugs, and the absurdity of the criminalization of psilocybin, but still object to how we were strategically and effectively manipulated by this PR campaign.
Even if we approve of a cause, we need to be careful about subordinating the peer-review process and independent press coverage to the intended message of advocates.
Tolerating causes being promoted in this fashion undermines the trustworthiness of peer review and of independent press coverage of scientific papers.
To contradict a line from the 1964 acceptance speech of Republican Presidential candidate Barry Goldwater: “Extremism in pursuit of virtue is [indeed] a vice.”
In this PR campaign, we witnessed the breakdown of the expected buffer of checks and balances between:
An advocacy group versus reporting of clinical trials in a scientific journal evaluating its claims.
Investigators’ exaggerated self-promotional claims versus editorial review and peer commentary.
Materials from the publicity campaign versus supposedly independent evaluation by journalists.
Is this part of a larger trend, where advocacy and marketing shape supposedly peer-reviewed publications in prestigious medical journals?
The public relations campaign for the psilocybin RCTs also left in tatters the credibility of altmetrics as an alternative to journal impact factors. The orchestrating of 1 billion views is a dramatic demonstration of how altmetrics can be readily gamed. Articles published in a journal with a modest impact factor scored spectacularly, as seen in the altmetrics graphics the Journal of Psychopharmacology posted.
I reviewed in detail one of the clinical trials in my last blog post and will review the second in this one. They are both mediocre, poorly designed clinical trials that got lavishly praised as being of the highest quality by an impressive panel of commentators. I’ll suggest that the second trial in particular is best seen as what Barney Carroll has labeled an experimercial, a clinical trial aimed at generating enthusiasm for a product, rather than a dispassionate evaluation undertaken with some possibility of not being able to reject the null hypothesis. If this sounds harsh, please indulge me and read on: be entertained and, I think, persuaded that this was not a clinical trial but an elaborate ritual, complete with psychobabble woo that has no place in the discussion of the safety and effectiveness of medicine.
After skeptically scrutinizing the second trial, I’ll consider the commentaries and media coverage of the two trials.
I’ll end with a complaint that this PR effort is only aimed at securing the right of wealthy people with cancer to obtain psilocybin under supervision of a psychiatrist and in the context of woo psychotherapy. The risk of other people in other circumstances ingesting psilocybin is deliberately exaggerated. If psilocybin is as safe and beneficial as claimed by these articles, why should use remain criminalized for persons who don’t have cancer or don’t want to get a phony diagnosis from a psychiatrist or don’t want to submit to woo psychotherapy?
The normally paywalled Journal of Psychopharmacology granted free access to the two articles, along with most but not all of the commentaries. However, extensive uncritical coverage in Medscape Medical News provides a fairly accurate summary, complete with direct quotes of lavish self-praise distributed by the advocacy-affiliated investigators and echoed in seemingly tightly coordinated commentaries.
Consider the praise one of the two senior authors heaped upon the two studies, as captured in Medscape Medical News and echoed elsewhere:
The new findings have “the potential to transform the care of cancer patients with psychological and existential distress, but beyond that, it potentially provides a completely new model in psychiatry of a medication that works rapidly as both an antidepressant and anxiolytic and has sustained benefit for months,” Stephen Ross, MD, director of Substance Abuse Services, Department of Psychiatry, New York University (NYU), Langone Medical Center, told Medscape Medical News.
“That is potentially earth shattering and a big paradigm shift within psychiatry,” Dr Ross told Medscape Medical News.
The trial’s registration at ClinicalTrials.gov is available here.
The trial’s website is rather drab and typical for clinical trials. It contrasts sharply with the slick PR of the website for the NYU trial. The latter includes a gushy, emotional video from a clinical psychologist participating as a patient in the study. She delivers a passionate pitch for the “wonderful ritual” of the transformative experimental session. You can also get a sense of how the session monitor structured the session and cultivated positive expectations, and of how the psilocybin experience is being slickly marketed to appeal to the same well-heeled patients who pay out-of-pocket for complementary and alternative medicine at integrative medicine centers.
Conflict of interest
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Roland Griffiths is on the Board of Directors of the Heffter Research Institute.
Heffter Research Institute is listed as one of the funders of the study.
The Hopkins study starts with some familiar claims from psycho-oncology that portray cancer as a mental health issue. The exaggerated estimate of 40% of cancer patients experiencing a mood disorder is arrived at by lumping adjustment reactions with a smaller proportion of diagnoses of generalized anxiety and major depression.
The introduction ends with a strong claim to the rigor and experimental control exercised in the clinical trial:
The present study provides the most rigorous evaluation to date of the efficacy of a classic hallucinogen for treatment of depressed mood and anxiety in psychologically distressed cancer patients. The study evaluated a range of clinically relevant measures using a double-blind cross-over design to compare a very low psilocybin dose (intended as a placebo) to a moderately high psilocybin dose in 51 patients under conditions that minimized expectancy effects.
The methods and results
In a nutshell: Despite claims to the contrary, this study cannot be considered a blinded study. At the six month follow-up, which is the outcome assessment point of greatest interest, it could no longer be meaningfully considered a randomized trial. All benefits of randomization were lost. In addition, the effects of psilocybin were confounded with a woo psychotherapy in which positive expectations and support were provided and reinforced in a way that likely influenced assessments of outcome. Outcomes at six months also reflected changes in distress which would have occurred in the absence of treatment. The sample is inappropriate for generalizations about the treatment of major depression and generalized anxiety. The characterization of patients as facing impending death is inaccurate.
The study involved a crossover design, which provides a lower level of evidence than a placebo controlled comparison study. The study compared a high psilocybin dose (22 or 30 mg/70 kg) with a low dose (1 or 3 mg/70 kg) administered in identically appearing capsules. While the low dose might not be homeopathic, it can be readily distinguished from the larger dosage soon after administration. The second drug administration occurred approximately 5 weeks later. Not surprisingly, with the large difference in dosage, session monitors who were supposedly blinded readily identified the group to which the participant they were observing had been assigned.
Within a crossover design, the six month follow-up data basically attribute any naturalistic decline in distress to the drug treatments. As David Colquhoun would argue, any estimate of the effects of the drug was inflated by including regression to the mean and get-better-anyway effects. Furthermore, the focus on outcomes at six months meant patients assigned to either group in the crossover design had received high dosage psilocybin by at least five weeks into the study. Any benefits of randomization were lost.
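The regression-to-the-mean point can be made concrete with a toy simulation. This is a hypothetical sketch (made-up distress scale, cutoff, and noise level, not the study’s data): patients enrolled because they score high on a noisy distress measure will, on average, score lower at follow-up even with no treatment at all.

```python
# Minimal sketch of regression to the mean, under assumed numbers:
# a hypothetical distress scale with stable mean 50, measurement/state
# noise SD 10, and an enrollment cutoff of 60 ("distressed today").
import random

random.seed(1)
n = 10_000
stable, noise_sd, cutoff = 50.0, 10.0, 60.0

enrolled_baseline, followup = [], []
for _ in range(n):
    baseline = random.gauss(stable, noise_sd)
    if baseline >= cutoff:                     # enrolled because distressed at screening
        enrolled_baseline.append(baseline)
        followup.append(random.gauss(stable, noise_sd))  # no treatment effect at all

mean_change = sum(b - f for b, f in zip(enrolled_baseline, followup)) / len(enrolled_baseline)
print(f"apparent 'improvement' with zero true treatment effect: {mean_change:.1f} points")
```

With these assumed numbers, the untreated enrollees appear to improve by roughly 15 points purely because they were selected at a high-scoring moment; a follow-up without a concurrent untreated comparison group cannot separate this from a drug effect.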
Like the NYU study, the Johns Hopkins study involved selecting a small, unrepresentative sample of a larger group responding to a mixed recruitment strategy utilizing flyers, the internet, and physician referral.
Less than 10% of the cancer patients calling in were randomized.
Almost half of the final sample were currently using marijuana and, similarly, almost half had used hallucinogens in the past.
The sample was relatively young for cancer patients and well educated. More than half had postgraduate education, and almost all were white; there were only two black participants.
The sample is quite heterogeneous with respect to psychiatric diagnoses, with almost half having an adjustment disorder, and the rest anxiety and mood disorders.
In terms of cancer diagnoses and staging, it was also a select and heterogeneous group, with only about a quarter having recurrent/metastatic disease with less than two years of expected survival. This suggests the odd use of “life-threatening” in the title is misleading.
Any mental health effects of psilocybin as a drug are inseparable from the effects of accompanying psychotherapy designed by a clinical psychologist “with extensive experience in studies of classic hallucinogens.” Participants met with that “session monitor” several times before the session in which the psilocybin was ingested, and the monitor guided and aided in the interpretation of the drug experience. Aside from providing therapy, the session monitor instructed the patient to have positive expectations before the ingestion of the drug and to work to maintain these expectations throughout the experience.
I found this psychotherapeutic aspect of the trial strikingly similar to one that was included in a trial of homeopathy in Germany that I accepted for publication in PLOS One. [See here for my rationale for accepting the trial and the ensuing controversy.] Trials of alternative therapies notoriously have such an imbalance of nonspecific placebo factors favoring the intervention group.
The clinical trial registration indicates that the primary outcome was the Pahnke-Richards Mystical Experience Questionnaire. This measure is included among 20 participant questionnaires listed in Table 3 of the article as completed seven hours after administration of psilocybin. Although I haven’t reviewed all of these measures, I’m skeptical about their psychometric development, intercorrelation, and validation beyond face validity. What possibly could be learned from administering such a battery?
The authors make unsubstantiated assumptions in suggesting that these measures either individually or collectively capture mediation of later response assessed by mental health measures. A commentary echoed this:
Mediation analysis indicates that the mystical experience was a significant mediator of the effects of psilocybin dose on therapeutic outcomes.
But one of the authors of the commentary later walked that back with a statement to Medscape Medical News:
As for the mystical experiences that some patients reported, it is not clear whether these are “a cause, consequence or corollary of the anxiolytic effect or unconstrained cognition.”
Clinical outcomes at six months are discussed in terms of multiple measures derived from the unblinded, clinician-rated Hamilton scales. However, there are repeated references to box scores of the number of significant findings from at least 17 clinical measures (for instance, significant effects for 11 of the 17 measures), in addition to other subjective patient and significant-other measures. It is unclear why the authors would choose to administer so many measures that are highly likely to be intercorrelated.
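The problem with box scores across a battery of outcome measures is simple arithmetic: even under the global null hypothesis, the chance of at least one “significant” finding grows quickly with the number of tests. A back-of-the-envelope illustration, assuming independent tests at alpha = 0.05 (correlated measures fall somewhere between one and n independent tests):

```python
# Probability of at least one "significant" result among n independent
# null tests at alpha = 0.05: 1 - (1 - alpha)^n.
alpha = 0.05
for n_measures in (1, 5, 17):
    p_any = 1 - (1 - alpha) ** n_measures
    print(f"{n_measures:2d} measures -> P(at least one significant) = {p_any:.2f}")
```

For 17 measures the chance of at least one false-positive “significant” effect is roughly 58%, which is why counts of significant findings across an uncorrected battery carry little evidential weight.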
There were no adverse events attributed to administration of psilocybin, and while there were a number of adverse psychological effects during the session with the psilocybin, none were deemed serious.
My summary evaluation
The clinical trial registration indicates broad inclusion criteria, which may suggest the authors anticipated difficulty in recruiting patients who had a significant psychiatric disorder for which psychotropic medication would be appropriate, as well as difficulty obtaining cancer patients who actually had poorer prognoses. Regardless, descriptions of the study as focusing on anxiety and depression and on “life-threatening” cancer seem to be marketing. You typically do not see a mixed sample with a large proportion of adjustment reactions characterized in the title of a psychiatric journal as treatment of “anxiety” and “depression”. You typically do not see the adjective “life-threatening” in the title of an oncology article with such a mixed sample of cancer patients.
The authors could readily have anticipated that at the six-month assessment point of interest they no longer had a comparison that could be described as a rigorous double-blind, randomized trial. They should have thought through exactly what was being controlled by a comparison group receiving a minimal dose of psilocybin. They should have been clearer that they were not simply evaluating psilocybin, but psilocybin administered in the context of a psychotherapy and an induction of strong positive expectations and promise of psychological support.
The finding of a lack of adverse events is consistent with a large literature, but is contradicted by the way the study was described to the media.
The accompanying editorial and commentary
Medscape Medical News reports that the numerous commentaries accompanying these two clinical trials were hastily assembled. Many of the commentaries read that way, with the authors uncritically passing on the psilocybin authors’ lavish self-praise of their work, after a lot of redundant recounting of the chemical nature of psilocybin and its history in psychiatry. When I repeatedly encountered claims that these trials represented rigorous, double-blinded clinical trials, or suggestions that the cancer was in a terminal phase, I assumed that the authors had not read the studies, only the publicity material, or had simply suspended all commitment to truth.
I have great admiration for David Nutt and respect his intellectual courage in campaigning for the decriminalization of recreational drugs, even when he knew that it would lead to his dismissal as chairman of the UK’s Advisory Council on the Misuse of Drugs (ACMD). He has repeatedly countered irrationality and prejudice with solid evidence. His graph depicting the harms of various substances to the users and others deserves the wide distribution that it has received.
He ends his editorial with praise for the two trials as “the most rigorous double-blind placebo-controlled trials of a psychedelic drug in the past 50 years.” I’ll give him a break and assume that that reflects his dismal assessment of the quality of the other trials. I applaud his declaration, available nowhere else in the commentaries, that:
There was no evidence of psilocybin being harmful enough to be controlled when it was banned, and since then, it has continued to be used safely by millions of young people worldwide with a very low incidence of problems. In a number of countries, it has remained legal, for example in Mexico where all plant products are legal, and in Holland where the underground bodies of the mushrooms (so-called truffles) were exempted from control.
His description of the other commentaries accompanying the two trials is apt:
The honours list of the commentators reads like a ‘who’s who’ of American and European psychiatry, and should reassure any waverers that this use of psilocybin is well within the accepted scope of modern psychiatry. They include two past presidents of the American Psychiatric Association (Lieberman and Summergrad) and the past-president of the European College of Neuropsychopharmacology (Goodwin), a previous deputy director of the Office of USA National Drug Control Policy (Kleber) and a previous head of the UK Medicines and Healthcare Regulatory Authority (Breckenridge). In addition, we have input from experienced psychiatric clinical trialists, leading pharmacologists and cancer-care specialists. They all essentially say the same thing…
The other commentaries: I do not find most of them worthy of further comment. However, one by Guy M Goodwin, “Psilocybin: Psychotherapy or drug?”, is unusual in offering even mild skepticism about the way the investigators are marketing their claims:
The authors consider this mediating effect as ‘mystical’, and show that treatment effects correlate with a subjective scale to measure such experience. The Oxford English Dictionary defines mysticism as ‘belief that union with or absorption into the Deity or the absolute, or the spiritual apprehension of knowledge inaccessible to the intellect, may be attained through contemplation and self-surrender’. Perhaps a scale really can measure a relevant kind of experience, but it raises the caution that the investigation of hallucinogens as treatments may be endangered by grandiose descriptions of their effects and unquestioning acceptance of their value.
The experiences of salience, meaningfulness, and healing that accompanied these powerful spiritual experiences and that were found to be mediators of clinical response in both of these carefully performed studies are also important to understand in their own right and are worthy of further study and contemplation. None of us are immune from the transitory nature of human life, which can bring fear and apprehension or conversely a real sense of meaning and preciousness if we carefully number our days. Understanding where these experiences fit in healing, well-being, and our understanding of consciousness may challenge many aspects of how we think about mental health or other matters, but these well-designed studies build upon a recent body of work that confronts us squarely with that task.
Coverage of the two studies in the media
The website for the Heffter Research Institute provides a handy set of links to some of the press coverage the studies have received. There’s a remarkable sameness to the portrayal of the studies in the media, suggesting that journalists stuck closely to the press releases, except for occasionally supplementing them with direct quotes from the authors. Any appearance of independent evaluation of the trials was almost entirely dependent on the commentaries published with the two articles.
There’s a lot of slick marketing by the two studies’ authors. In addition to what I noted earlier in the blog, there are recurring unscientific statements marketing the psilocybin experience:
“They are defined by a sense of oneness – people feel that their separation between the personal ego and the outside world is sort of dissolved and they feel that they are part of some continuous energy or consciousness in the universe. Patients can feel sort of transported to a different dimension of reality, sort of like a waking dream.”
The new studies, however, suggest psilocybin be used only in a medical setting, said Dr. George Greer, co-founder, medical director and secretary at the Heffter Research Institute in Santa Fe, New Mexico, which funded both studies.
“Our focus is scientific, and we’re focused on medical use by medical doctors,” Greer said at the news conference. “This is a special type of treatment, a special type of medicine. Its use can be highly controlled in clinics with specially trained people.”
He added he doubts the drug would ever be distributed to patients to take home.
There are only rare admissions from an author of one of the studies that:
The results were similar to those they had found in earlier studies in healthy volunteers. “In spite of their unique vulnerability and the mood disruption that the illness and contemplation of their death has prompted, these participants have the same kind of experiences, that are deeply meaningful, spiritually significant and producing enduring positive changes in life and mood and behaviour,” he said.
I’m not sure that demand would be great except among previous users of psychedelics and current users of cannabis.
But should psilocybin remain criminalized outside of cancer centers where wealthy patients can purchase a diagnosis of adjustment reaction from a psychiatrist? Cancer is not especially traumatic, and PTSD is almost as common in the waiting rooms of primary care physicians. Why not extend to primary care physicians the option of prescribing psilocybin to their patients? At least purity could then be assured. But why should psilocybin use be limited to mental health conditions, once we accept that a diagnosis of adjustment reaction is such a distorted extension of the term? Should we exclude patients who are atheists and only want a satisfying experience, not a spiritual one?
Experience in other countries suggests that psilocybin can safely be ingested in a supportive, psychologically safe environment. Why not allow cancer patients and others to obtain psilocybin with assured purity and dosage? They could then ingest it in the comfort of friends and intimate partners who have been briefed on how the experience needs to be managed. The patients in the studies were mostly not facing immediate death from terminal cancer. But should we require that persons need to be dying in order to have a psilocybin experience without the risk of criminal penalties? Why not allow psilocybin to be ingested in the presence of pastoral counselors or priests whose religious beliefs are more congruent with the persons seeking such experiences than are New York City psychiatrists?
This is the first installment of what will be a series of occasional posts about the UK Mindfulness All Party Parliamentary Group report, Mindful Nation.
Mindful Nation is seriously deficient as a document supposedly arguing for policy based on evidence.
The professional and financial interests of lots of people involved in preparation of the document will benefit from implementation of its recommendations.
After an introduction, I focus on two studies singled out in Mindful Nation as offering support for the benefits of mindfulness training for school children.
Results of the group’s cherry-picked studies do not support implementation of mindfulness training in the schools, but inadvertently highlight some issues.
Investment in universal mindfulness training in the schools is unlikely to yield measurable, socially significant results, but will serve to divert resources from schoolchildren more urgently in need of effective intervention and support.
Mindful Nation is another example of the delivery of low intensity services to mostly low risk persons to the detriment of those in greatest and most urgent need.
The launch event for the Mindful Nation report billed it as the “World’s first official report” on mindfulness.
The Mindfulness All-Party Parliamentary Group (MAPPG) was set up to:
review the scientific evidence and current best practice in mindfulness training
develop policy recommendations for government, based on these findings
provide a forum for discussion in Parliament for the role of mindfulness and its implementation in public policy.
The Mindfulness All-Party Parliamentary Group describes itself as
impressed by the levels of both popular and scientific interest, and as having launched an inquiry to consider the potential relevance of mindfulness to a range of urgent policy challenges facing government.
Don’t get confused by this being a government-commissioned report. The report stands in sharp contrast to one commissioned by the US government in terms of the unbalanced constitution of the committee undertaking the review, the lack of transparency in the search for relevant literature, and the methodology for rating and interpreting the quality of available evidence.
Compare the claims of Mindful Nation to a comprehensive systematic review and meta-analysis prepared for the US Agency for Healthcare Research and Quality (AHRQ) that reviewed 18,753 citations, and found only 47 trials (3%) that included an active control treatment. The vast majority of studies available for inclusion had only a wait list or no-treatment control group and so exaggerated any estimate of the efficacy of mindfulness.
Although the US report was available to those preparing the UK Mindful Nation report, no mention is made of either the full contents of the report or a resulting publication in a peer-reviewed journal. Instead, the UK Mindful Nation report emphasized narrative and otherwise unsystematic reviews, and meta-analyses not adequately controlling for bias.
When the abridged version of the AHRQ report was published in JAMA: Internal Medicine, an accompanying commentary raised issues even more applicable to the Mindful Nation report:
The modest benefit found in the study by Goyal et al begs the question of why, in the absence of strong scientifically vetted evidence, meditation in particular and complementary measures in general have become so popular, especially among the influential and well educated…What role is being played by commercial interests? Are they taking advantage of the public’s anxieties to promote use of complementary measures that lack a base of scientific evidence? Do we need to require scientific evidence of efficacy and safety for these measures?
The members of the UK Mindfulness All-Party Parliamentary Group were selected for their positive attitude towards mindfulness. The collection of witnesses called to its hearings was saturated with advocates of mindfulness and those having professional and financial interests in arriving at a positive view. There is no transparency about how studies or testimonials were selected, but the bias is notable. Many of the scientific studies were methodologically poor, if there was any methodology at all. Many were strongly stated but weakly substantiated opinion pieces. Authors often included those having financial interests in obtaining positive results, but with no acknowledgment of conflict of interest. The glowing testimonials were accompanied by smiling photos and were unanimous in their praise of the transformative benefits of mindfulness.
As Mark B. Cope and David B. Allison concluded about obesity research, such a packing of the committee and a highly selective review of the literature leads to a “distortion of information in the service of what might be perceived to be righteous ends.” [I thank Tim Caulfield for calling this quote to my attention.]
Mindfulness in the schools
The recommendations of Mindful Nation are:
The Department for Education (DfE) should designate, as a first step, three teaching schools116 to pioneer mindfulness teaching, co-ordinate and develop innovation, test models of replicability and scalability and disseminate best practice.
Given the DfE’s interest in character and resilience (as demonstrated through the Character Education Grant programme and its Character Awards), we propose a comparable Challenge Fund of £1 million a year to which schools can bid for the costs of training teachers in mindfulness.
The DfE and the Department of Health (DOH) should recommend that each school identifies a lead in schools and in local services to co-ordinate responses to wellbeing and mental health issues for children and young people117. Any joint training for these professional leads should include a basic training in mindfulness interventions.
The DfE should work with voluntary organisations and private providers to fund a freely accessible, online programme aimed at supporting young people and those who work with them in developing basic mindfulness skills118.
Leading up to these recommendations, the report outlined an “alarming crisis” in the mental health of children and adolescents and proposed:
Given the scale of this mental health crisis, there is real urgency to innovate new approaches where there is good preliminary evidence. Mindfulness fits this criterion and we believe there is enough evidence of its potential benefits to warrant a significant scaling-up of its availability in schools.
Think of all the financial and professional opportunities that the proponents of mindfulness involved in the preparation of this report have garnered for themselves.
Mindfulness to promote executive functioning in children and adolescents
For the remainder of the blog post, I will focus on the two studies cited in support of the following statement:
What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.
The terms “executive control” and “emotional stability” were clarified:
Many argue that the most important prerequisites for child development are executive control (the management of cognitive processes such as memory, problem solving, reasoning and planning) and emotion regulation (the ability to understand and manage the emotions, including and especially impulse control). These main contributors to self-regulation underpin emotional wellbeing, effective learning and academic attainment. They also predict income, health and criminality in adulthood69. American psychologist, Daniel Goleman, is a prominent exponent of the research70 showing that these capabilities are the biggest single determinant of life outcomes. They contribute to the ability to cope with stress, to concentrate, and to use metacognition (thinking about thinking: a crucial skill for learning). They also support the cognitive flexibility required for effective decision-making and creativity.
Actually, Daniel Goleman is the former editor of the pop magazine Psychology Today and an author of numerous pop books.
The first cited paper.
73 Flook L, Smalley SL, Kitil MJ, Galla BM, Kaiser-Greenland S, Locke J, et al. Effects of mindful awareness practices on executive functions in elementary school children. Journal of Applied School Psychology. 2010;26(1):70-95.
Journal of Applied School Psychology is a Taylor-Francis journal, formerly known as Special Services in the Schools (1984 – 2002). Its Journal Impact Factor is 1.30.
One of the authors of the article, Susan Kaiser-Greenland, is a mindfulness entrepreneur, as seen on her website, which describes her as an author, public speaker, and educator on the subject of sharing secular mindfulness and meditation with children and families. Her books are The Mindful Child: How to Help Your Kid Manage Stress and Become Happier, Kinder, and More Compassionate and Mindful Games: Sharing Mindfulness and Meditation with Children, Teens, and Families and the forthcoming The Mindful Games Deck: 50 Activities for Kids and Teens.
This article represents the main research available on Kaiser-Greenland’s Inner Kids program and figures prominently in her promotion of her products.
The sample consisted of 64 children assigned to either mindful awareness practices (MAPs; n = 32) or a control group consisting of a silent reading period (n = 32).
The MAPs training used in the current study is a curriculum developed by one of the authors (SKG). The program is modeled after classical mindfulness training for adults and uses secular and age appropriate exercises and games to promote (a) awareness of self through sensory awareness (auditory, kinesthetic, tactile, gustatory, visual), attentional regulation, and awareness of thoughts and feelings; (b) awareness of others (e.g., awareness of one’s own body placement in relation to other people and awareness of other people’s thoughts and feelings); and (c) awareness of the environment (e.g., awareness of relationships and connections between people, places, and things).
A majority of exercises involve interactions among students and between students and the instructor.
The primary EF outcomes were the Metacognition Index (MI), Behavioral Regulation Index (BRI), and Global Executive Composite (GEC), as reported by teachers and parents.
The program was delivered for 30 minutes, twice per week, for 8 weeks. Teachers and parents completed questionnaires assessing children’s executive function immediately before and following the 8-week period. Multivariate analysis of covariance on teacher and parent reports of executive function (EF) indicated an interaction effect between baseline EF score and group status on posttest EF. That is, children in the group that received mindful awareness training who were less well regulated showed greater improvement in EF compared with controls. Specifically, those children starting out with poor EF who went through the mindful awareness training showed gains in behavioral regulation, metacognition, and overall global executive control. These results indicate a stronger effect of mindful awareness training on children with executive function difficulties.
The finding that both teachers and parents reported changes suggests that improvements in children’s behavioral regulation generalized across settings. Future work is warranted using neurocognitive tasks of executive functions, behavioral observation, and multiple classroom samples to replicate and extend these preliminary findings.
What I discovered when I scrutinized the study.
This study is unblinded: the teachers and parents who provided the subjective ratings of the students were well aware of the group to which each student was assigned. We are not given any correlations among or between their ratings, so we don’t know whether a single global subjective factor (easy or difficult child, well-behaved or not) is operating for teachers, parents, or both.
It is unclear for which features of the mindfulness training the comparison reading group offers control or equivalence. The two groups differ in the positive expectations, attention, and support they received, and those differences are likely to be reflected in the parent and teacher ratings. There is a high likelihood that any differences in outcomes are nonspecific, rather than due to some active and distinct ingredient of mindfulness training. In any comparison with the students assigned to reading time, students assigned to mindfulness training have the benefit of any active ingredient it might have, as well as any nonspecific, placebo ingredients.
This is an exceedingly weak design, but one that dominates evaluations of mindfulness.
Note too that with only 32 students per group, this is a seriously underpowered study: it has less than a 50% probability of detecting a moderate-sized effect if one is present. And because a large effect size is needed to achieve statistical significance with such a small sample, any statistically significant effects will necessarily be large, even if unlikely to replicate in a larger sample. That is the paradox of small samples that we need to understand in these situations.
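The power claim is easy to check with a quick calculation. This is a sketch using the normal approximation (the exact noncentral-t power is slightly lower), taking “moderate” to mean a standardized effect of d = 0.5:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison of means
    for a standardized effect size d (normal approximation)."""
    se = math.sqrt(2.0 / n_per_group)  # standard error of d with equal groups
    z_crit = 1.96                      # two-sided critical value for alpha = .05
    z = d / se
    return norm_cdf(z - z_crit) + norm_cdf(-z - z_crit)

# With 32 students per group and a moderate effect (d = 0.5),
# power is only about 0.5 -- a coin flip.
print(round(two_sample_power(0.5, 32), 2))
```

Roughly quadrupling the sample size would be needed to bring power near the conventional 80% for an effect of this size.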
Not surprisingly, there were no differences between the mindfulness and reading control groups on any outcome variable, whether rated by parents or teachers. Nonetheless, the authors rescued their claims for an effective intervention with:
However, as shown by the significance of interaction terms, baseline levels of EF (GEC reported by teachers) moderated improvement in posttest EF for those children in the MAPs group compared to children in the control group. That is, on the teacher BRIEF, children with poorer initial EF (higher scores on BRIEF) who went through MAPs training showed improved EF subsequent to the training (indicated by lower GEC scores at posttest) compared to controls.
Similar claims were made about parent ratings. But let’s look at figure 3 depicting post-test scores. These are from the teachers, but results for the parent ratings are essentially the same.
Note the odd scaling of the X axis. The data are divided into four quartiles and then the middle half is collapsed so that there are three data points. I’m curious about what is being hidden. Even with the sleight-of-hand, it appears that scores for the intervention and control groups are identical except for the top quartile. It appears that just a couple of students in the control group are accounting for any appearance of a difference. But keep in mind that the upper quartile is only a matter of eight students in each group.
This scatter plot is further revealing:
It appears that the differences that are limited to the upper quartile are due to a couple of outlier control students. Without them, even the post-hoc differences that were found in the upper quartile between intervention control groups would likely disappear.
Basically, what we are seeing is that most students do not show any benefit whatsoever from mindfulness training over being in a reading group. It is not surprising that students who were not particularly elevated on the variables of interest do not register an effect; that is a common ceiling effect in universally delivered interventions in general population samples.
Essentially, if we focus on the designated outcome variables, we are wasting the students’ time as well as that of the staff. Think of what could be done if the same resources were applied in more effective ways. A couple of students in this study were outliers with low executive function. We don’t know how else they differ. Neither in the study, nor in the validation of these measures, is much attention given to their discriminant validity, i.e., what variables influence the ratings that shouldn’t. I strongly suspect that there are global, nonspecific aspects to both parent and teacher ratings, such that they are influenced by other aspects of these couple of students’ engagement with their classroom environment, and perhaps other environments.
I see little basis for the authors’ self-congratulatory conclusion:
The present findings suggest that mindfulness introduced in a general education setting is particularly beneficial for children with EF difficulties.
Introduction of these types of awareness practices in elementary education may prove to be a viable and cost-effective way to improve EF processes in general, and perhaps specifically in children with EF difficulties, and thus enhance young children’s socio-emotional, cognitive, and academic development.
Maybe the authors started with this conviction, and it was unshaken by disappointing findings.
Or the statement made in Mindful Nation:
What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.
But we have another study that is cited for this statement.
74. Huppert FA, Johnson DM. A controlled trial of mindfulness training in schools: The importance of practice for an impact on wellbeing. The Journal of Positive Psychology. 2010; 5(4):264-274.
The first author, Felicia Huppert, is Founder and Director of the Well-being Institute and Emeritus Professor of Psychology at the University of Cambridge, as well as a member of the academic staff of the Institute for Positive Psychology and Education at the Australian Catholic University.
This study involved 173 14- and 15-year-old boys from a private Catholic school.
The Journal of Positive Psychology is not known for its high methodological standards. A look at its editorial board suggests a high likelihood that manuscripts submitted will be reviewed by sympathetic reviewers publishing their own methodologically flawed studies, often with results in support of undeclared conflicts of interest.
The mindfulness training was based on the program developed by Kabat-Zinn and colleagues at the University of Massachusetts Medical School (Kabat-Zinn, 2003). It comprised four 40 minute classes, one per week, which presented the principles and practice of mindfulness meditation. The mindfulness classes covered the concepts of awareness and acceptance, and the mindfulness practices included bodily awareness of contact points, mindfulness of breathing and finding an anchor point, awareness of sounds, understanding the transient nature of thoughts, and walking meditation. The mindfulness practices were built up progressively, with a new element being introduced each week. In some classes, a video clip was shown to highlight the practical value of mindful awareness (e.g. “The Last Samurai”, “Losing It”). Students in the mindfulness condition were also provided with a specially designed CD, containing three 8-minute audio files of mindfulness exercises to be used outside the classroom. These audio files reflected the progressive aspects of training which the students were receiving in class. Students were encouraged to undertake daily practice by listening to the appropriate audio files. During the 4-week training period, students in the control classes attended their normal religious studies lessons.
A total of 155 participants had complete data at baseline and 134 at follow-up (78 in the mindfulness and 56 in the control condition). Any student who had missing data at either time point was simply dropped from the analysis. The effects of this statistical decision are difficult to track in the paper. Regardless, there was no difference between the intervention and control groups on any of a host of outcome variables, with none designated as a primary outcome.
Actual practicing of mindfulness by students was inconsistent.
One third of the group (33%) practised at least three times a week, 34.8% practised more than once but less than three times a week, and 32.7% practised once a week or less (of whom 7 respondents, 8.4%, reported no practice at all). Only two students reported practicing daily. The practice variable ranged from 0 to 28 (number of days of practice over four weeks). The practice variable was found to be highly skewed, with 79% of the sample obtaining a score of 14 or less (skewness = 0.68, standard error of skewness = 0.25).
The authors rescue their claim of a significant effect for the mindfulness intervention with highly complex multivariate analyses with multiple control variables, in which within-group outcomes for students assigned to mindfulness were related to the extent to which students actually practiced mindfulness. Even without controlling for the numerous (and post hoc) multiple comparisons, results were still largely nonsignificant.
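To see why uncorrected post hoc comparisons matter, here is a minimal Bonferroni sketch; the p-values are illustrative numbers, not the study’s actual results:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """For each p-value, report whether it survives a Bonferroni
    correction for the total number of comparisons made."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# A nominally "significant" p = .03 fails once even three comparisons
# are acknowledged, because the per-test threshold drops to .05/3 = .017.
print(bonferroni_significant([0.03, 0.20, 0.45]))  # [False, False, False]
```

The more comparisons an analysis quietly makes, the stricter the threshold each individual result must clear; analyses with many control variables and subgroup contrasts that skip this step inflate the apparent number of “findings.”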
One simple conclusion that can be drawn is that despite a lot of encouragement, there was little actual practice of mindfulness by these relatively well-off students in a relatively well-resourced school setting. We could hardly expect results to improve with wider dissemination to schools with fewer resources and less privileged students.
The authors conclude:
The main finding of this study was a significant improvement on measures of mindfulness and psychological well-being related to the degree of individual practice undertaken outside the classroom.
Recall that Mindful Nation cited the study in the following context:
What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.
These are two methodologically weak studies with largely null findings. They are hardly the basis for launching a national policy implementing universal mindfulness in the schools.
As noted in the US AHRQ report, despite the huge number of studies of mindfulness that have been conducted, few involved a test against an adequate control group, and so there is little evidence that mindfulness has any advantage over any active treatment. Neither of these studies disturbs that conclusion, although they are spun, both in the original papers and in the Mindful Nation report, as positive. Both papers were published in journals where the reviewers were likely to be overly sympathetic and inattentive to serious methodological and statistical problems.
The committee writing Mindful Nation arrived at conclusions consistent with their prior enthusiasm for mindfulness and their vested interest in it. They sorted through evidence to find what supported their pre-existing assumptions.
Like UK resilience programs, the recommendations of Mindful Nation put considerable resources into the delivery of services to a large population unlikely to have the threshold of need required to register a socially and clinically significant effect. On a population level, the results of implementation are doomed to fall short of the report’s claims. The much smaller number of students who need more timely, intensive, and tailored services are left underserved. Their presence is ignored or, worse, invoked to justify the delivery of services to the larger group, without the needy students benefiting.
In this blog post, I mainly focused on two methodologically poor studies. But for the selection of these particular studies, I depended on the search conducted by the authors of Mindful Nation and the emphasis that was given to these two studies for some sweeping claims in the report. I will continue writing about the recommendations of Mindful Nation. I welcome reader feedback, particularly from readers whose enthusiasm for mindfulness is offended. But I urge them not simply to go to Google, cherry-pick an isolated study, and ask me to refute its claims.
Rather, we need to pay attention to the larger literature concerning mindfulness, its serious methodological problems, and the sociopolitical forces and vested interests that preserve a strong confirmation bias, both in the “scientific” literature and its echoing in documents like Mindful Nation.