Did a placebo affect allergic reactions to a pin prick or only in the authors’ minds?

Can placebo effects be harnessed to improve treatment outcomes? Stories of a placebo changing bodily function are important in promoting mind-body medicine, but mostly turn out to be false positives. Was this one an exception?

A lesson in critical appraisal: How to screen complicated studies in order to decide whether to put the time and energy into a closer look.

The study:

Howe LC, Goyer JP, Crum AJ. Harnessing the Placebo Effect: Exploring the Influence of Physician Characteristics on Placebo Response. Health Psychology Vol 36(11), Nov 2017, 1074-1082 http://dx.doi.org/10.1037/hea0000499

From the Abstract:

After inducing an allergic reaction in participants through a histamine skin prick test, a health care provider administered a cream with no active ingredients and set either positive expectations (cream will reduce reaction) or negative expectations (cream will increase reaction).

The provider demonstrated either high or low warmth, or either high or low competence.

Results: The impact of expectations on allergic response was enhanced when the provider acted both warmer and more competent and negated when the provider acted colder and less competent.

Conclusion: This study suggests that placebo effects should be construed not as a nuisance variable with mysterious impact but instead as a psychological phenomenon that can be understood and harnessed to improve treatment outcomes.

Why I dismissed this study

The small sample size was set in a power analysis based on the authors’ hope of finding a moderate effect size, not on any existing results. With only 20 participants per cell, most significant findings are likely to be false positives.

The authors had a complicated design with multiple manipulations and time points. They examined 2 physiological measures, but only reported results for one of them in the paper, the one with stronger results.

The authors did not report a key overall test of whether there was a significant main or interaction effect. Without such a finding, jumping down to significant comparisons between groups is likely to yield false positives.

The authors did not adjust for multiple comparisons, despite doing a huge number.

The authors did not report raw mean differences for comparisons, only differences at two time points controlling for gender, race, and the first two time points. No rationale is given.

The authors used language like ‘marginally significant’ and ‘different, but not significantly so,’ which might suggest they were chasing and selectively reporting significant findings.

The phenomenon under study was a mild allergic reaction in the short term: three time points, 9-15 minutes, with data for 2 earlier time points not reported as outcomes. The mechanism by which an experimental manipulation could have an observable effect on such a mild reaction in such a short period of time is unclear.
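The sample-size objection above can be made concrete with a rough calculation. The sketch below is not the authors' power analysis. It assumes a simple two-sided comparison of two cells of 20 participants each, uses a normal approximation to the t-test (slightly optimistic for small samples), and treats Cohen's conventional d = 0.5 as the "moderate" effect. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_groups(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of two independent group means.

    Normal approximation to the two-sample t-test; d is the standardized
    mean difference (Cohen's d), assumed here rather than taken from the paper.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # two-sided critical value
    noncentrality = d * sqrt(n_per_group / 2)  # expected z statistic under the effect
    # Probability of landing in either rejection region
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        power = approx_power_two_groups(d, n_per_group=20)
        print(f"d = {d}: approximate power = {power:.2f}")
```

Under these assumptions, a comparison of two cells of 20 has roughly a one-in-three chance of detecting a true moderate effect. When power is that low, the significant results that do turn up are disproportionately false positives or inflated estimates.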

Overview

Claims of placebo effects figure heavily in discussions of the power of the mind over the body. Yet this power is greatly exaggerated by lay persons and in the lay press and social media. Effects of a placebo manipulation on objective physiological measures, as opposed to subjective self-report measures, are uncommon and usually turn out to be false positives.

A New England Journal of Medicine review of 130 clinical trials found:

Little evidence in general that placebos had powerful clinical effects. Although placebos had no significant effects on objective or binary outcomes, they had possible small benefits in studies with continuous subjective outcomes and for the treatment of pain. Outside the setting of clinical trials, there is no justification for the use of placebos.

I often cite another great NEJM study showing the sharp contrast between positive results obtained with subjective self-report measures and negative results obtained with objective measures of physical functioning.

That is probably the case with a recent report of effects of expectancies and interpersonal relationship on a mild allergic reaction induced by a histamine skin prick test (SPT). The study involved manipulation of the perceived warmth and competence of a provider, as well as whether research participants were told that an inert cream being applied would have a positive or negative effect.

The authors claim their results support the view that psychological variables do indeed influence a mild allergic reaction. Examining all of the numerous pairwise comparisons would be a long and tedious task. However, from some details of the design and analysis of the study, I decided I would not proceed.

Some notable features of the study.

The key manipulations of high versus low warmth and high versus low competence were in the behavior of a single unblinded experimenter.

The design is described as 2x2x2 with a cell size of n = 20 (19 in one cell).

It is more properly described as 2x2x2x(5) because of the 5 time points after the provider administered the skin prick:

(T1 = 3 min post-SPT, T2 = 6 min post-SPT and cream administered directly afterward, T3 = 9 min post-SPT and 3 min post-cream, T4 = 12 min post-SPT and 6 min post-cream, T5 = 15 min post-SPT and 9 min post-cream).

The small number of participants per cell was set in a power analysis based on the hope that a moderate effect size could be shown, not on past results.

The physiological reaction was measured in terms of size of a wheal (raised bump) and size of the flare (redness surrounding the bump).

Numerous other physiological measures were obtained, including blood pressure and pre-post session saliva samples. It is not stated what was done with these data, but they could have been used to evaluate further the manipulation of experimenter behavior.

No simple correlation between participants’ perceptions of warmth and competence is reported, which would have been helpful in interpreting the 2×2 crossing of warmth and competence.

In the supplementary materials, readers are told ratings of itchiness and mood were obtained after the skin prick. No effects of the experimental manipulation were observed, which would seem not to support the effectiveness of the intervention.

No overall ANOVA or test for significance of interactions is presented.

Instead, numerous paired comparisons are presented without correction for post hoc multiplicity.

Further comparisons were conducted with a sample that was constructed post hoc:

To better understand the mechanism by which expectations differed, within a setting of high warmth and high competence, we compared the wheal and flare size for the positive and negative expectations conditions to a follow-up sample who received neutral expectations. This resulted in a total sample of N=62.

Differences arising using this sample were discussed, despite significance levels being p = .095 and p = .155.

Raw mean scores are neither presented nor discussed. Instead, all comparisons controlled for gender and race and size of the wheal at Times 1 and 2.

Only the size of the wheal is reported in the body of the paper, but the authors state:

The results on the flare of the reaction were mostly similar (see the supplemental material available online).

Actually, the results reported in the supplemental material were considerably weaker, with claims of differences being marginally significant and favoring results that were only significant at particular time points.
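The concern about uncorrected paired comparisons noted above can be illustrated with the standard familywise error calculation. This is a minimal sketch, assuming independent tests at alpha = .05. The numbers of comparisons shown are illustrative, not counts taken from the paper, and the function names are hypothetical.

```python
def familywise_error_rate(n_comparisons: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_alpha(n_comparisons: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Illustrative numbers of comparisons, not counts from the paper
    for m in (1, 5, 10, 20):
        print(
            f"{m:2d} comparisons: familywise error = {familywise_error_rate(m):.2f}, "
            f"Bonferroni per-test alpha = {bonferroni_alpha(m):.4f}"
        )
```

With 10 uncorrected comparisons, the chance of at least one spurious "significant" result is already about 40 percent, even when nothing real is going on. Real comparisons within one study are usually correlated, so the exact figure will differ, but the direction of the problem is the same.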

So, what do you think? If you are interested, take a look at the study and let me know if I was premature to dismiss it.

Preorders are being accepted for e-books providing skeptical looks at mindfulness and positive psychology, and arming citizen scientists with critical thinking skills. Right now there is a special offer for free access to a Mindfulness Master Class. But hurry, it won’t last.

I will also be offering scientific writing courses on the web as I have been doing face-to-face for almost a decade. I want to give researchers the tools to get into the journals where their work will get the attention it deserves.

Sign up at my website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. Lots to see at CoyneoftheRealm.com.

Power Poseur: The lure of lucrative pseudoscience and the crisis of untrustworthiness of psychology

This is the second of two segments of Mind the Brain aimed at redirecting the conversation concerning power posing to the importance of conflicts of interest in promoting and protecting its scientific status. 

The market value of many lines of products offered to consumers depends on their claims of being “science-based”. Products from psychologists that invoke wondrous mind-body or brain-behavior connections are particularly attractive. My colleagues and I have repeatedly scrutinized such claims, sometimes reanalyzing the original data, and consistently find the claims false or premature and exaggerated.

There is so little risk and so much money and fame to be gained in promoting questionable and even junk psychological science to lay audiences. Professional organizations confer celebrity status on psychologists who succeed, provide them with forums and free publicity that enhance their credibility, and protect their claims of being “science-based” from critics.

How much money academics make from popular books, corporate talks, and workshops and how much media attention they garner serve as alternative criteria for a successful career, sometimes seeming to be valued more than the traditional ones of quality and quantity of publications and the amount of grant funding obtained.

Efforts to improve the trustworthiness of what psychologists publish in peer-reviewed journals have no parallel in any efforts to improve the accuracy of what psychologists say to the public outside of the scientific literature.

By the following reasoning, there may be limits to how much the former efforts at reform can succeed without the latter. In the hypercompetitive marketplace, only the most dramatic claims gain attention. Seldom are the results of rigorously done, transparently reported scientific work strong and unambiguous enough to back up the claims with the broadest appeal, especially in psychology. Psychologists who remain in academic settings but want to market their merchandise to consumers face a dilemma: How much do they have to hype and distort their findings in peer-reviewed journals to fit with what they say to the public?

It is important for readers of scientific articles to know that authors are engaged in these outside activities and are under pressure to obtain particular results. The temptation to make bold claims clashes with the requirements to conduct solid science and report results transparently and completely. Let readers decide if this matters for their receptivity to what authors say in peer-reviewed articles by having information available to them. But almost never is a conflict of interest declared. Just search articles in Psychological Science and see if you can find a single declaration of a COI, even when the authors have booking agents and give high-priced corporate talks and seminars.

The discussion of the quality of science backing power posing should have been shorter.

Up until now, much attention to power posing in academic circles has been devoted to the quality of the science behind it, whether results can be independently replicated, and whether critics have behaved badly. The last segment of Mind the Brain examined the faulty science of the original power posing paper in Psychological Science and showed why it could not contribute a credible effect size to the literature.

The discussion of the science behind power posing should have been much shorter and should have reached a definitive conclusion: the original power posing paper should never have been published in Psychological Science. Once the paper had been published, a succession of editors failed in their expanded Pottery-Barn responsibility to publish critiques by Steven J. Stanton and by Marcus Crede and Leigh A. Phillips that were quite reasonable in their substance and tone. As is almost always the case, bad science was accorded an incumbent advantage once it was published. Any disparagement or criticism of this paper would be held by editors to strict and even impossibly high standards if it were to be published. Let’s review the bad science uncovered in the last blog post. Readers who are familiar with that post can skip to the next section.

A brief unvarnished summary of the bad science of the original power posing paper as a biobehavioral intervention study

Reviewers of the original paper should have balked at the uninformative and inaccurate abstract. Minimally, readers need to know at the outset that there were only 42 participants (26 females and 16 males) in the study comparing high-power versus low-power poses. Studies with so few participants cannot be expected to provide reproducible effect sizes. Furthermore, there is no basis for claiming that results held for both men and women because that claim depended on analyses with even smaller numbers. Note the 16 males were distributed in some unknown way across the two conditions. Because power is limited by the smaller cell size, even the optimal split of 8 males per cell is far too few to contribute a credible effect size. Any apparent significant effects in this study are likely to be meaning imposed on noise.
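To see how little these cell sizes can support, it helps to ask how large a true effect would have to be for such a study to detect it reliably. The sketch below uses the usual normal-approximation formula for the minimum detectable standardized difference at 80 percent power. The even split of 42 participants into 21 per condition is an assumption, as is the best-case split of 8 males per cell. For samples this small the approximation understates the required effect, so the real picture is somewhat worse. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized mean difference (Cohen's d) that a two-sided,
    two-group comparison can detect with the given power.

    Normal approximation; optimistic for very small groups.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)


if __name__ == "__main__":
    # 21 per group assumes an even split of the 42 participants (assumption);
    # 8 per group is the best-case split of the 16 males.
    for n in (21, 8):
        print(f"n = {n} per group: minimum detectable d = {min_detectable_d(n):.2f}")
```

Under these assumptions the full sample could reliably detect only effects approaching d = 0.9, and the male subgroup only effects around d = 1.4. Effects that large are implausible for two one-minute postures.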

The end sentence in the abstract is an outrageously untrue statement of results. Yet, as we will see, it served as the basis of a product launch worth in the seven-figure range that was already taking shape:

That a person can, by assuming two simple 1-minute poses, embody power and instantly become more powerful has real-world, actionable implications.

Quite apart from the small sample size, as an author, editor, and critic in clinical and health psychology for over 40 years, I greet a claim of ‘real-world actionable implications’ from two one-minute manipulations of participants’ posture with extreme skepticism. My skepticism grows as we delve into the details of the study.

Investigators’ collecting a single pair of pre-post assessments of salivary cortisol is at best a meaningless ritual, and can contribute nothing to understanding what is going on in the study at a hormonal level.

Men in the age range of the participants in this study have six times more testosterone than women. Statistical “control” of testosterone by controlling for gender is a meaningless gesture producing uninterpretable results. Controlling for baseline testosterone in analyses of cortisol and vice versa eliminates any faint signal in the loud noise of the hormonal data.

Although they were intended as a manipulation check (and subsequently claimed as evidence of the effect of power posing on feelings), the crude subjective self-report ratings of how “powerful” and “in charge” participants felt on a 1-4 scale could simply communicate the experimenters’ expectancies to participants. Endorsing whether they felt more powerful indicated how smart participants were and whether they were willing to go along with the purpose of the study. Inferences beyond that uninteresting finding require external validation.

In clinical and health psychology trials, we are quite wary of simple subjective self-report analogue scales, particularly when there is poor control of the unblinded experimenters’ behavior and what they communicate to participants.

The gambling task lacks external validation. Low stakes could simply reduce it to another communication of experimenters’ expectancies. Note that the saliva assessments were obtained after completion of the task and if there is any confidence left in the assessments of hormones, this is an important confound.

The unblinded experimenters’ physically placing participants in either two 1-minute high-power or two 1-minute low-power poses is a weird, unvalidated experimental manipulation that could not have the anticipated effects on hormonal levels. Neither high- nor low-power poses are credible, but the hypothesis that the low-power pose would actually raise cortisol particularly strains credibility, if cortisol assessments in the study had any meaning at all.

Analyses were not accurately described, and statistical controls of any kind with such a small sample are likely to add to spurious findings. The statistical controls in this study were particularly inappropriate, and there is evidence of the investigators choosing the analyses to present after the results were known.

There is no there there: The original power pose paper did not introduce a credible effect size into the literature.

The published paper cannot introduce a credible effect size into the scientific literature. Power posing may be an interesting and important idea that deserves careful scientific study, but any future study of the idea would be a “first ever,” not a replication of the Psychological Science article. The two commentaries that were blocked from publication in Psychological Science but published elsewhere amplify any dismissal of the paper, but we are already well over the top. But then there is the extraordinary repudiation of the paper by the first author and her exposure of the exploitation of investigator degrees of freedom and outright p-hacking. How many stakes do you have to plunge into the heart of a vampire idea?

Product launch

Even before the power posing article appeared in Psychological Science, Amy Cuddy was promoting it at Harvard, first in Power Posing: Fake It Until You Make It in Harvard Business School’s Working Knowledge: Business Research for Business Leaders. Shortly afterwards came the redundant but elaborated article in Harvard Magazine, subtitled Amy Cuddy probes snap judgments, warm feelings, and how to become an “alpha dog.”

Amy Cuddy is the middle author on the actual Psychological Science article, between first author Dana Carney and third author Andy J Yap, Dana Carney’s graduate student. Yet, the Harvard Magazine article lists Cuddy first. The Harvard Magazine article is also noteworthy in unveiling what would grow into Cuddy’s redemptive self narrative, although Susan Fiske’s role as the “attachment figure” who nurtures Cuddy’s realization of her inner potential was only hinted at.

QUITE LITERALLY BY ACCIDENT, Cuddy became a psychologist. In high school and in college at the University of Colorado at Boulder, she was a serious ballet dancer who worked as a roller-skating waitress at the celebrated L.A. Diner. But one night, she was riding in a car whose driver fell asleep at 4:00 A.M. while doing 90 miles per hour in Wyoming; the accident landed Cuddy in the hospital with severe head trauma and “diffuse axonal injury,” she says. “It’s hard to predict the outcome after that type of injury, and there’s not much they can do for you.”

Cuddy had to take years off from school and “relearn how to learn,” she explains. “I knew I was gifted–I knew my IQ, and didn’t think it could change. But it went down by two standard deviations after the injury. I worked hard to recover those abilities and studied circles around everyone. I listened to Mozart–I was willing to try anything!” Two years later her IQ was back. And she could dance again.

Yup, leading up to promoting the idea that overcoming circumstances and getting what you want is as simple as adopting these 2 minutes of behavioral manipulation.

The last line of the Psychological Science abstract was easily fashioned into the pseudoscientific basis for this ease of changing behavior and outcomes, which now include the success of venture-capital pitches:

“Tiny changes that people can make can lead to some pretty dramatic outcomes,” Cuddy reports. This is true because changing one’s own mindset sets up a positive feedback loop with the neuroendocrine secretions, and also changes the mindset of others. The success of venture-capital pitches to investors apparently turns, in fact, on nonverbal factors like “how comfortable and charismatic you are.”

Soon, The New York Times columnist David Brooks placed power posing solidly within the positive thinking product line of positive psychology, even if Cuddy had no need to go out on that circuit: “If you act powerfully, you will begin to think powerfully.”

In 2011, both first author Dana Carney and Amy Cuddy received the Rising Star Award from the Association for Psychological Science (APS) for having “already made great advancements in science.” Carney cited her power posing paper as one that she liked. Cuddy didn’t nominate the paper, but reported that her recent work examined “how brief nonverbal expressions of competence/power and warmth/connection actually alter the neuroendocrine levels, expressions, and behaviors of the people making the expressions, even when the expressions are ‘posed.’”

The same year, Cuddy also appeared at PopTech, which is a “global community of innovators, working together to expand the edge of change,” with tickets selling for $2,000. According to an article in The Chronicle of Higher Education:

When her turn came, Cuddy stood on stage in front of a jumbo screen showing Lynda Carter as Wonder Woman while that TV show’s triumphant theme song announced the professor’s arrival (“All the world is waiting for you! And the power you possess!”). After the music stopped, Cuddy proceeded to explain the science of power poses to a room filled with would-be innovators eager to expand the edge of change.

But that performance was just a warm-up for Cuddy’s TedGlobal Talk, which has now received almost 42 million views.

A Ted Global talk that can serve as a model for all Ted talks: Your body language may shape who you are  

This link takes you not only to Amy Cuddy’s Ted Global talk but to a transcript in 49 different languages

Amy Cuddy’s TedGlobal Talk is brilliantly crafted and masterfully delivered. It has two key threads. The first thread is what Dan McAdams has described as an obligatory personal narrative of a redeemed self. McAdams summarizes the basic structure:

As I move forward in life, many bad things come my way—sin, sickness, abuse, addiction, injustice, poverty, stagnation. But bad things often lead to good outcomes—my suffering is redeemed. Redemption comes to me in the form of atonement, recovery, emancipation, enlightenment, upward social mobility, and/or the actualization of my good inner self. As the plot unfolds, I continue to grow and progress. I bear fruit; I give back; I offer a unique contribution.

This is interwoven with a second thread, the claims of the strong science of power pose derived from the Psychological Science article. Without the science thread, the talk is reduced to a motivational talk of the genre of Oprah Winfrey or Navy SEAL Admiral William McRaven sharing reasons you should make your bed every day.

It is not clear that we should hold the redeemed self of a Ted Talk to the criteria of historical truth. Does it really matter whether Amy Cuddy’s IQ temporarily fell two standard deviations after an auto accident (13:22)? That Cuddy’s “angel adviser” Susan Fiske saved her from feeling like an imposter with the pep talk that inspired the “fake it until you make it” theme of power posing (17:03)? That Cuddy similarly transformed the life of her graduate student (18:47) with:

So I was like, “Yes, you are! You are supposed to be here! And tomorrow you’re going to fake it, you’re going to make yourself powerful, and, you know –

This last segment of the Ted talk is best viewed, rather than read in the transcript. It brings Cuddy to tears and the cheering, clapping audience to their feet. And Cuddy wraps up with her takeaway message:

The last thing I’m going to leave you with is this. Tiny tweaks can lead to big changes. So, this is two minutes. Two minutes, two minutes, two minutes. Before you go into the next stressful evaluative situation, for two minutes, try doing this, in the elevator, in a bathroom stall, at your desk behind closed doors. That’s what you want to do. Configure your brain to cope the best in that situation. Get your testosterone up. Get your cortisol down. Don’t leave that situation feeling like, oh, I didn’t show them who I am. Leave that situation feeling like, I really feel like I got to say who I am and show who I am.

So I want to ask you first, you know, both to try power posing, and also I want to ask you to share the science, because this is simple. I don’t have ego involved in this. (Laughter) Give it away. Share it with people, because the people who can use it the most are the ones with no resources and no technology and no status and no power. Give it to them because they can do it in private. They need their bodies, privacy and two minutes, and it can significantly change the outcomes of their life.

Who cares if the story is literal historical truth? Maybe we should not. But I think psychologists should care about the misrepresentation of the study. For that matter, anyone concerned with truth in advertising to consumers. Anyone who believes that consumers have the right to fair and accurate portrayal of science in being offered products, whether anti-aging cream, acupuncture, or self-help merchandise:

Here’s what we find on testosterone. From their baseline when they come in, high-power people experience about a 20-percent increase, and low-power people experience about a 10-percent decrease. So again, two minutes, and you get these changes. Here’s what you get on cortisol. High-power people experience about a 25-percent decrease, and the low-power people experience about a 15-percent increase. So two minutes lead to these hormonal changes that configure your brain to basically be either assertive, confident and comfortable, or really stress-reactive, and feeling sort of shut down. And we’ve all had the feeling, right? So it seems that our nonverbals do govern how we think and feel about ourselves, so it’s not just others, but it’s also ourselves. Also, our bodies change our minds.

Why should we care? Buying into such simple solutions prepares consumers to accept other outrageous claims. It can be a gateway drug for other quack treatments like Harvard psychologist Ellen Langer’s claims that changing mindset can overcome advanced cancer.

Unwarranted claims break down the barriers between evidence-based recommendations and nonsense. Such claims discourage consumers from accepting the more deliverable promise that evidence-based interventions like psychotherapy can indeed make a difference, but that they take work and effort, and that effects can be modest. Who would invest time and money in cognitive behavior therapy, when two one-minute self-manipulations can transform lives? Like all unrealistic promises of redemption, such advice may ultimately lead people to blame themselves when they don’t overcome adversity. After all, it is so simple and just a matter of taking charge of your life. Their predicament indicates that they did not take charge or that they are simply losers.

But some consumers can be turned cynical about psychology. Here is a Harvard professor trying to sell them crap advice. Psychology sucks, it is crap.

Conflict of interest: Nothing to declare?

In an interview with The New York Times, Amy Cuddy said: “I don’t care if some people view this research as stupid. I feel like it’s my duty to share it.”

Amy Cuddy may have been giving her power pose advice away for free in her Ted Talk, but she already had given it away at the $2,000 a ticket PopTech talk. The book contract for Presence: Bringing Your Boldest Self to Your Biggest Challenges was reportedly for around a million dollars. And of course, like many academics who leave psychology for schools of management, Cuddy had a booking agency soliciting corporate talks and workshops. With the Ted talk, she could command $40,000-$100,000.

Does this discredit the science of power posing? Not necessarily, but readers should be informed and free to decide for themselves. Certainly, all this money in play might make Cuddy more likely to respond defensively to criticism of her work. If she repudiated this work the way that first author Dana Carney did, would there be a halt to her speaking gigs, a product recall, or refunds issued by Amazon for Presence?

I think it is fair to suggest that there is too much money in play for Cuddy to respond to academic debate. Maybe things are outside that realm because of these stakes.

The replicationados attempt replications: Was it counterproductive?

Faced with overwhelming evidence of the untrustworthiness of the psychological literature, some psychologists have organized replication initiatives and accumulated considerable resources for multisite replications. But replication initiatives are insufficient to salvage the untrustworthiness of many areas of psychology, particularly clinical and health psychology intervention studies, and may inadvertently dampen more direct attacks on bad science. Many of those who promote replication initiatives are silent when investigators refuse to share data for studies with important clinical and public health implications. They are also silent when journals like Psychological Science fail to publish criticism of papers with blatantly faulty science.

Replication initiatives take time and results are often, but not always, ultimately published outside of the journals where a flawed original work was published. But an important unintended consequence is that they lend credibility to effect sizes that had no validity whatsoever when they occurred in the original papers. In debate attempting to resolve discrepancies between original studies and large-scale replications, the original underpowered studies are often granted a more entrenched incumbent advantage.

It should be no surprise that in a large-scale attempted replication, Ranehill, Dreber, Johannesson, Leiberg, Sul, and Weber failed to replicate the key, nontrivial findings of the original power pose study.

Consistent with the findings of Carney et al., our results showed a significant effect of power posing on self-reported feelings of power. However, we found no significant effect of power posing on hormonal levels or in any of the three behavioral tasks.

It is also not surprising that Cuddy invoked her I-said-it-first-and-I-was-peer-reviewed incumbent advantage in reasserting her original claim, along with a review of 33 studies including the attempted replication:

The work of Ranehill et al. joins a body of research that includes 33 independent experiments published with a total of 2,521 research participants. Together, these results may help specify when nonverbal expansiveness will and will not cause embodied psychological changes.

Cuddy asserted that methodological differences between the original study and the attempted Ranehill replication may have moderated the effects of posing. But no study has shown that putting participants into a power pose affects hormones.

Joe Simmons and Uri Simonsohn performed a meta-analysis of the studies nominated by Cuddy that was ultimately published in Psychological Science. Their blog Data Colada succinctly summarized the results:

Consistent with the replication motivating this post, p-curve indicates that either power-posing overall has no effect, or the effect is too small for the existing samples to have meaningfully studied it. Note that there are perfectly benign explanations for this: e.g., labs that run studies that worked wrote them up, labs that run studies that didn’t, didn’t.

While the simplest explanation is that all studied effects are zero, it may be that one or two of them are real (any more and we would see a right-skewed p-curve). However, at this point the evidence for the basic effect seems too fragile to search for moderators or to advocate for people to engage in power posing to better their lives.
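The right-skew reasoning in the quoted passage can be illustrated with a deliberately simplified check. The sketch below is not the procedure Simmons and Simonsohn ran; it is a cruder binomial version of the same idea, and the function name and the p-values are made up purely for illustration. If the studied effects are all zero, significant p-values should fall evenly on either side of .025; a real effect piles them up near zero.

```python
import math

def right_skew_binomial(p_values, alpha=0.05):
    """One-sided binomial check for right skew among significant p-values.

    Under a true null, a significant p-value is equally likely to land
    below or above alpha / 2, so the count of "low" p-values follows
    Binomial(n, 0.5). Returns P(X >= observed low count).
    """
    significant = [p for p in p_values if p < alpha]
    n = len(significant)
    low = sum(1 for p in significant if p < alpha / 2)
    # Upper tail of Binomial(n, 0.5), computed exactly.
    return sum(math.comb(n, k) for k in range(low, n + 1)) / 2 ** n

# Hypothetical inputs, NOT data from any power pose study.
skewed = [0.001, 0.002, 0.004, 0.01, 0.02, 0.03]   # mostly very small p-values
flat = [0.01, 0.02, 0.03, 0.04]                    # evenly spread p-values

print(right_skew_binomial(skewed))  # smaller value: more consistent with a real effect
print(right_skew_binomial(flat))    # larger value: no sign of right skew
```

A small tail probability would hint at evidential value; a large one is what selective reporting of null effects would be expected to produce.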

Come on, guys, there was never a there there. Don’t invent one by continuing to try to explain it.

It is interesting that none of these three follow up articles in Psychological Science have abstracts, especially in contrast to the original power pose paper that effectively delivered its misleading message in the abstract.

Just as this blog post was being polished, a special issue of Comprehensive Results in Social Psychology (CRSP) on Power Poses was released.

  1. No preregistered tests showed positive effects of expansive poses on any behavioral or hormonal measures. This includes direct replications and extensions.
  2. Surprise: A Bayesian meta-analysis across the studies reveals a credible effect of expansive poses on felt power. (Note that this is described as a ‘manipulation check’ by Cuddy in 2015.) Whether this is anything beyond a demand characteristic and whether it has any positive downstream behavioral effects is unknown.

No, not a surprise, just an uninteresting artifact. But stay tuned for the next model of poser pose dropping the tainted name and focusing on “felt power.” Like rust, commercialization of bad psychological science never really sleeps, it only takes power naps.

Meantime, professional psychological organizations, with their flagship journals and publicity machines, need to:

  • Lose their fascination with psychologists whose celebrity status depends on TED talks and the marketing of dubious advice products grounded in pseudoscience.
  • Embrace and adhere to an expanded Pottery Barn rule that covers not only direct replications, but corrections to bad science that has been published.
  • Make the protection of  consumers from false and exaggerated claims a priority equivalent to the vulnerable reputations of academic psychologists in efforts to improve the trustworthiness of psychology.
  • Require detailed conflicts of interest statements for talks and articles.

All opinions expressed here are solely those of Coyne of the Realm and not necessarily of PLOS blogs, PLOS One or his other affiliations.

Disclosure:

I receive money for writing these blog posts, less than $200 per post. I am also marketing a series of e-books,  including Coyne of the Realm Takes a Skeptical Look at Mindfulness and Coyne of the Realm Takes a Skeptical Look at Positive Psychology.

Maybe I am just making a fuss to attract attention to these enterprises. Maybe I am just monetizing what I have been doing for years virtually for free. Regardless, be skeptical. But to get more information and get on a mailing list for my other blogging, go to coyneoftherealm.com and sign up.

Calling out pseudoscience, radically changing the conversation about Amy Cuddy’s power posing paper

Part 1: Reviewed as the clinical trial that it is, the power posing paper should never have been published.

Has too much already been written about Amy Cuddy’s power pose paper? The conversation should not be stopped until its focus shifts and we change our ways of talking about psychological science.

The dominant narrative is now that a junior scientist published an influential paper on power posing and was subject to harassment and shaming by critics, pointing to the need for greater civility in scientific discourse.

Attention has shifted away from the scientific quality of the paper and the dubious products the paper has been used to promote, and onto the behavior of its critics.

Amy Cuddy and powerful allies are given forums to attack and vilify critics, accusing them of damaging the environment in which science is done and discouraging prospective early career investigators from entering the field.

Meanwhile, Amy Cuddy commands large speaking fees and has a top-selling book claiming the original paper provides strong science for simple behavioral manipulations altering mind-body relations and producing socially significant behavior.

This misrepresentation of psychological science does potential harm to consumers and the reputation of psychology among lay persons.

This blog post is intended to restart the conversation with a reconsideration of the original paper as a clinical and health psychology randomized trial (RCT) and, on that basis, identifying the kinds of inferences that are warranted from it.

In the first of a two post series, I argue that:

The original power pose article in Psychological Science should never have been published.

-Basically, we have a therapeutic analog intervention delivered in 2 1-minute manipulations by unblinded experimenters who had flexibility in what they did,  what they communicated to participants, and which data they chose to analyze and how.

-It’s unrealistic to expect that 2 1-minute behavioral manipulations would have robust and reliable effects on salivary cortisol or testosterone 17 minutes later.

-It’s absurd to assume that the hormones mediated changes in behavior in this context.

-If Amy Cuddy retreats to the idea that she is simply manipulating “felt power,” we are solidly in the realm of trivial nonspecific and placebo effects.

The original power posing paper

Carney DR, Cuddy AJ, Yap AJ. Power posing brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science. 2010 Oct 1;21(10):1363-8.

The Psychological Science article can be construed as a brief mind-body intervention consisting of 2 1-minute behavioral manipulations. Central to the attention that the paper attracted is the argument that this manipulation affected psychological state and social performance via the effects of the manipulation on the neuroendocrine system.

The original study is, in effect, a disguised randomized clinical trial (RCT) of a biobehavioral intervention. Once this is recognized, a host of standards can come into play for reporting this study and interpreting the results.

CONSORT

All major journals and publishers, including the Association for Psychological Science, have adopted the Consolidated Standards of Reporting Trials (CONSORT). Any submission of a manuscript reporting a clinical trial is required to be accompanied by a checklist indicating that the article reports particular details of how the trial was conducted. Item 1 on the checklist specifies that both the title and abstract indicate the study was a randomized trial. This is important and intended to aid readers in evaluating the study, but also to allow the study to be picked up in systematic searches for reviews that depend on screening of titles and abstracts.

I can find no evidence that Psychological Science adheres to CONSORT. For instance, my colleagues and I provided a detailed critique of a widely promoted study of loving-kindness meditation that was published in Psychological Science the same year as Cuddy’s power pose study. We noted that it was actually a poorly reported null trial with switched outcomes. With that recognition, we went on to identify serious conceptual, methodological and statistical problems. After overcoming considerable resistance, we were able  to publish a muted version of our critique. Apparently reviewers of the original paper had failed to evaluate it in terms of it being an RCT.

The submission of the completed CONSORT checklist has become routine in most journals considering manuscripts for studies of clinical and health psychology interventions. Yet, additional CONSORT requirements that developed later about what should be included in abstracts are largely being ignored.

It would be unfair to single out Psychological Science and the Cuddy article for noncompliance with CONSORT for abstracts. However, the checklist can be a useful frame of reference for noting just how woefully inadequate the abstract was as a report of a scientific study.

CONSORT for abstracts

Hopewell S, Clarke M, Moher D, Wager E, Middleton P, Altman DG, Schulz KF, CONSORT Group. CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration. PLOS Medicine. 2008 Jan 22;5(1):e20.

Journal and conference abstracts should contain sufficient information about the trial to serve as an accurate record of its conduct and findings, providing optimal information about the trial within the space constraints of the abstract format. A properly constructed and well-written abstract should also help individuals to assess quickly the validity and applicability of the findings and, in the case of abstracts of journal articles, aid the retrieval of reports from electronic databases.

Even if CONSORT for abstracts did not exist, we could argue that readers, starting with the editor and reviewers, were faced with an abstract with extraordinary claims that required better substantiation. The lack of basic details left them unable to evaluate these claims.

In effect, the abstract reduces the study to an experimercial for products about to be marketed in corporate talks and workshops, but let’s persist in evaluating it as an abstract of a scientific study.

Humans and other animals express power through open, expansive postures, and they express powerlessness through closed, contractive postures. But can these postures actually cause power? The results of this study confirmed our prediction that posing in high-power nonverbal displays (as opposed to low-power nonverbal displays) would cause neuroendocrine and behavioral changes for both male and female participants: High-power posers experienced elevations in testosterone, decreases in cortisol, and increased feelings of power and tolerance for risk; low-power posers exhibited the opposite pattern. In short, posing in displays of power caused advantaged and adaptive psychological, physiological, and behavioral changes, and these findings suggest that embodiment extends beyond mere thinking and feeling, to physiology and subsequent behavioral choices. That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.

I don’t believe I have ever encountered in an abstract the extravagant claims with which this abstract concludes. But readers are not provided any basis for evaluating the claim until the Methods section. Undoubtedly, many holding opinions about the paper did not read that far.

Namely:

Forty-two participants (26 females and 16 males) were randomly assigned to the high-power-pose or low-power-pose condition.

Testosterone levels were in the normal range at both Time 1 (M = 60.30 pg/ml, SD = 49.58) and Time 2 (M = 57.40 pg/ml, SD = 43.25). As would be suggested by appropriately taken and assayed samples (Schultheiss & Stanton, 2009), men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.

Too small a study to provide an effect size

Hold on! First, only 42 participants (26 females and 16 males) would readily be recognized as insufficient for an RCT, particularly in an area of research without past RCTs.

After decades of witnessing the accumulation of strong effect sizes from underpowered studies, many of us have reacted by requiring 35 participants per group as the minimum acceptable level for a generalizable effect size. Actually, that could be an overly liberal criterion. Why?

Many RCTs are underpowered, yet a lack of enforcement of preregistration allows investigators to obtain positive results by redefining the primary outcomes after results are known. A psychotherapy trial with 30 or fewer patients in the smallest cell has less than a 50% probability of detecting a moderate-sized significant effect, even if it is present (Coyne, Thombs, & Hagedoorn, 2010). Yet an examination of the studies mustered for treatments deemed evidence supported by APA Division 12 ( http://www.div12.org/empirically-supported-treatments/ ) indicates that many studies were too underpowered to be reliably counted as evidence of efficacy, but were included without comment about this problem. Taking an overview, it is striking the extent to which the literature continues to depend on small, methodologically flawed RCTs conducted by investigators with strong allegiances to one of the treatments being evaluated. Yet, which treatment is preferred by investigators is a better predictor of the outcome of the trial than the specific treatment being evaluated (Luborsky et al., 2006).
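The "less than 50%" figure can be sanity-checked with a rough calculation. The sketch below uses a normal approximation to a two-sided, two-group comparison at alpha = .05 and takes "moderate" to mean a standardized difference of d = 0.5; both choices, and the function names, are my assumptions for illustration. The 21-per-group case assumes the 42 power pose participants were split evenly, which is also an assumption. The normal approximation slightly overstates the power of a t test, so real power is, if anything, a bit lower.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def approx_power(n_per_group, d=0.5, z_crit=1.959964):
    """Approximate power of a two-sided, two-sample comparison.

    d is the assumed standardized mean difference; z_crit is the
    two-sided critical value for alpha = .05. Normal approximation only.
    """
    ncp = d * math.sqrt(n_per_group / 2.0)  # expected z statistic under the alternative
    return normal_cdf(ncp - z_crit) + normal_cdf(-ncp - z_crit)

# 21 per group (assumed even split of 42), 30 per cell, and the 35-per-group rule of thumb.
for n in (21, 30, 35):
    print(n, round(approx_power(n), 2))
```

Under these assumptions, 30 per cell falls just short of even odds of detecting a moderate effect, and 21 per group falls well short.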

Earlier my colleagues and I had argued for the non-accumulative  nature of evidence from small RCTs:

Kraemer, Gardner, Brooks, and Yesavage (1998) propose excluding small, underpowered studies from meta-analyses. The risk of including studies with inadequate sample size is not limited to clinical and pragmatic decisions being made on the basis of trials that cannot demonstrate effectiveness when it is indeed present. Rather, Kraemer et al. demonstrate that inclusion of small, underpowered trials in meta-analyses produces gross overestimates of effect size due to substantial, but unquantifiable confirmatory publication bias from non-representative small trials. Without being able to estimate the size or extent of such biases, it is impossible to control for them. Other authorities voice support for including small trials, but generally limit their argument to trials that are otherwise methodologically adequate (Sackett & Cook, 1993; Schulz & Grimes, 2005). Small trials are particularly susceptible to common methodological problems…such as lack of baseline equivalence of groups; undue influence of outliers on results; selective attrition and lack of intent-to-treat analyses; investigators being unblinded to patient allotment; and not having a pre-determined stopping point so investigators are able to stop a trial when a significant effect is present.

In the power posing paper, the authors controlled for sex in all analyses because a peek at the data revealed baseline sex differences in testosterone dwarfing any other differences. What do we make of investigators conducting a study depending on testosterone mediating a behavioral manipulation who did not anticipate large baseline sex differences in testosterone?

In a PubPeer comment leading up to this post, I noted:

We are then told “men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.”

The findings alluded to in the abstract should be recognizable as weird and uninterpretable. Most basically, how could the 16 males be distributed across the two groups so that the authors could confidently say that differences held for both males and females? Especially when all analyses control for sex? Sex is highly correlated with testosterone and so an analysis that controlled for both the variables, sex and testosterone would probably not generalize to testosterone without such controls.

We are never given the basic statistics in the paper to independently assess what the authors are doing, not the correlation between cortisol and testosterone, only differences in time 2 cortisol controlling for time 1 cortisol, time 1 testosterone and gender. These multivariate statistics are not very generalizable in a sample with 42 participants distributed across 2 groups. Certainly not for the 26 females and 16 males taken separately.

The behavioral manipulation

The original paper reports:

Participants’ bodies were posed by an experimenter into high-power or low-power poses. Each participant held two poses for 1 min each. Participants’ risk taking was measured with a gambling task; feelings of power were measured with self-reports. Saliva samples, which were used to test cortisol and testosterone levels, were taken before and approximately 17 min after the power-pose manipulation.

And then elaborates:

To configure the test participants into the poses, the experimenter placed an electrocardiography lead on the back of each participant’s calf and underbelly of the left arm and explained, “To test accuracy of physiological responses as a function of sensor placement relative to your heart, you are being put into a certain physical position.” The experimenter then manually configured participants’ bodies by lightly touching their arms and legs. As needed, the experimenter provided verbal instructions (e.g., “Keep your feet above heart level by putting them on the desk in front of you”). After manually configuring participants’ bodies into the two poses, the experimenter left the room. Participants were videotaped; all participants correctly made and held either two high-power or two low-power poses for 1 min each. While making and holding the poses, participants completed a filler task that consisted of viewing and forming impressions of nine faces.

The behavioral task and subjective self-report assessment

Measure of risk taking and powerful feelings. After they finished posing, participants were presented with the gambling task. They were endowed with $2 and told they could keep the money—the safe bet—or roll a die and risk losing the $2 for a payoff of $4 (a risky but rational bet; odds of winning were 50/50). Participants indicated how “powerful” and “in charge” they felt on a scale from 1 (not at all) to 4 (a lot).

An imagined bewildered review from someone accustomed to evaluating clinical trials

Although the authors don’t seem to know what they’re doing, we have an underpowered therapy analogue study with extraordinary claims. It’s unconvincing that the 2 1-minute behavioral manipulations would change subsequent psychological states and behavior with any extralaboratory implications.

The manipulation poses a puzzle to research participants, challenging them to figure out what is being asked of them. The $2 gambling task presumably is meant to simulate effects on real-world behavior. But the low stakes could mean that participants believed the task evaluated whether they “got” the purpose of the intervention and behaved accordingly. Within that perspective, the unvalidated subjective self-report rating scale would serve as a clue to the intentions of the experimenter and an opportunity to show the participants were smart. The manipulation of putting participants into a low power pose is even more unconvincing as a contrasting active intervention or a control condition. Claims that this manipulation did anything but communicate experimenter expectancies are even less credible.

This is a very weak form of evidence: A therapy analogue study with such a brief, low intensity behavioral manipulation followed by assessments of outcomes that might just inform participants of what they needed to do to look smart (i.e., demand characteristics). Add in that the experimenters were unblinded and undoubtedly had flexibility in how they delivered the intervention and what they said to participants. As a grossly underpowered trial, the study cannot make a contribution to the literature and certainly not an effect size.

Furthermore, if the authors had even a basic understanding of gender differences in social status or sex differences in testosterone, they would have stratified the study with respect to participant gender, not attempted to obtain control by post hoc statistical manipulation.

I could comment on signs of p-hacking and widespread signs of inappropriate naming, use, and interpretation of statistics, but why bother? There are no vital signs of a publishable paper here.

Is power posing salvaged by fashionable hormonal measures?

Perhaps the skepticism of the editor and reviewers was overcome by the introduction of mind-body explanations of what some salivary measures supposedly showed. Otherwise, we would be left with a single subjective self-report measure and a behavioral task susceptible to demand characteristics and nonspecific effects.

We recognize that the free availability of powerful statistical packages risks people using them without any idea of the appropriateness of their use or interpretation. The same observation should be made of the ready availability of means of collecting spit samples from research participants to be sent off to outside laboratories for biochemical analysis.

The clinical health psychology literature is increasingly filled with studies incorporating easily collected saliva samples intended to establish that psychological interventions influence mind-body relations. These have become particularly applied in attempts to demonstrate that mindfulness meditation and even tai chi can have beneficial effects on physical health and even cancer outcomes.

These measures are often inaccurately described as “biomarkers,” rather than merely as biological measurements, and there is seldom much learned by their inclusion that is generalizable within participants or across studies.

Let’s start with salivary-based cortisol measures.

A comprehensive review suggests that:

  • A single measurement on a participant or a pre-post pair of assessments would not be informative.
  • Single measurements are unreliable and large intra- and inter-individual differences not attributable to intervention can be in play.
  • Minor variations in experimental procedures can have large, unwanted effects.
  • The current standard is cortisol awakening response in the diurnal slope over more than one day, which would not make sense for the effects of 2 1-minute behavioral manipulations.
  • Even with sophisticated measurement strategies there is low agreement across and even within studies and low agreement with behavioral and self-report data.
  • The idea that collecting saliva samples would serve the function the investigators intended is an unscientific, but attractive illusion.

Another relevant comprehensive theoretical review and synthesis of cortisol reactivity was available at the time the power pose study was planned. The article identifies no basis for anticipating that experimenters putting participants into 1-minute expansive poses would lower cortisol. And certainly no basis for assuming that putting participants into a 1-minute slumped position would raise cortisol. Or what such findings could possibly mean.

But we are clutching at straws. The authors’ interpretations of their hormonal data depend on bizarre post hoc decisions about how to analyze their data in a small sample in which participant sex is treated in incomprehensible fashion. The process of trying to explain spurious results risks giving the results a credibility that authors have not earned for them. And don’t even try to claim we are getting signals of hormonal mediation from this study.

Another system failure: The incumbent advantage given to a paper that should not have been published.

Even when publication is based on inadequate editorial oversight and review, any likelihood of correction is diminished by published results having been blessed as “peer reviewed” and accorded an incumbent advantage over whatever follows.

A succession of editors have protected the power pose paper from post-publication peer review. Postpublication review has been relegated to other journals and social media, including PubPeer and blogs.

Soon after publication of the power pose paper, a critique was submitted to Psychological Science, but it was desk rejected. The editor informally communicated to the author that the critique read like a review and the original article had already been peer reviewed.

The critique by Steven J. Stanton nonetheless eventually appeared in Frontiers in Behavioral Neuroscience and is worth a read.

Stanton took seriously the science being invoked in the claims of the power pose paper.

A sampling:

Carney et al. (2010) collapsed over gender in all testosterone analyses. Testosterone conforms to a bimodal distribution when including both genders (see Figure 13; Sapienza et al., 2009). Raw testosterone cannot be considered a normally distributed dependent or independent variable when including both genders. Thus, Carney et al. (2010) violated a basic assumption of the statistical analyses that they reported, because they used raw testosterone from pre- and post-power posing as independent and dependent variables, respectively, with all subjects (male and female) included.

And

Mean cortisol levels for all participants were reported as 0.16 ng/mL pre-posing and 0.12 ng/mL post-posing, thus showing that for all participants there was an average decrease of 0.04 ng/mL from pre- to post-posing, regardless of condition. Yet, Figure 4 of Carney et al. (2010) shows that low-power posers had mean cortisol increases of roughly 0.025 ng/mL and high-power posers had mean cortisol decreases of roughly 0.03 ng/mL. It is unclear given the data in Figure 4 how the overall cortisol change for all participants could have been a decrease of 0.04 ng/mL.
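Stanton’s arithmetic point is easy to check. An overall mean change is a weighted average of the two group mean changes, so it has to fall between them no matter how the participants were divided. The sketch below tries every possible split of 42 participants, using the approximate per-group changes Stanton read off Figure 4. The function name is mine, the group sizes are unknown here (hence the loop over all splits), and this checks raw means only, not covariate-adjusted ones.

```python
def overall_change(n_low, n_high, change_low=0.025, change_high=-0.03):
    """Weighted average of the two group mean cortisol changes (ng/mL).

    change_low and change_high are the rough values Stanton reports
    from Figure 4 of Carney et al. (2010).
    """
    return (n_low * change_low + n_high * change_high) / (n_low + n_high)

N = 42  # total participants reported in the paper

# Try every way of dividing 42 participants between the two conditions.
possible = [overall_change(n_low, N - n_low) for n_low in range(N + 1)]

print(min(possible), max(possible))
# No split can yield an overall decrease as large as 0.04 ng/mL.
print(any(change <= -0.04 for change in possible))
```

Whatever the split, the overall change stays between the two group values, which is why a reported overall decrease of 0.04 ng/mL does not square with the figure.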

Another editor of Psychological Science received a critical comment from Marcus Crede and Leigh A. Phillips. After the first round of reviews, Crede and Phillips removed references to changes in the published power pose paper from earlier drafts that they had received from the first author, Dana Carney. However, Crede and Phillips withdrew their critique when asked to respond to a review by Amy Cuddy in a second resubmission.

The critique is now forthcoming in Social Psychological and Personality Science

Revisiting the Power Pose Effect: How Robust Are the Results Reported by Carney, Cuddy and Yap (2010) to Data Analytic Decisions

The article investigates effects of choices made in p-hacking in the original paper. An excerpt from the abstract:

In this paper we use multiverse analysis to examine whether the findings reported in the original paper by Carney, Cuddy, and Yap (2010) are robust to plausible alternative data analytic specifications: outlier identification strategy; the specification of the dependent variable; and the use of control variables. Our findings indicate that the inferences regarding the presence and size of an effect on testosterone and cortisol are highly sensitive to data analytic specifications. We encourage researchers to routinely explore the influence of data analytic choices on statistical inferences and also encourage editors and reviewers to require explicit examinations of the influence of alternative data analytic specifications on the inferences that are drawn from data.

Dana Carney, the first author of the paper, has now posted an explanation of why she no longer believes the originally reported findings are genuine and why “the evidence against the existence of power poses is undeniable.” She discloses a number of important confounds and important “researcher degrees of freedom” in the analyses reported in the published paper.

Coming Up Next

A different view of Amy Cuddy’s TED talk in terms of its selling of pseudoscience to consumers and its acknowledgment of a strong debt to Cuddy’s adviser Susan Fiske.

A disclosure of some of the financial interests that distort discussion of the scientific flaws of the power pose.

How the reflexive response of the replicationados inadvertently reinforced the illusion that the original pose study provided meaningful effect sizes.

How Amy Cuddy and her allies marshalled the resources of the Association for Psychological Science to vilify and intimidate critics of bad science and of the exploitation of consumers by psychological pseudoscience.

How journalists played into this vilification.

What needs to be done to avoid a future fiasco for psychology like the power pose phenomenon and protect reformers of the dissemination of science.

Note: Time to reiterate that all opinions expressed here are solely those of Coyne of the Realm and not necessarily of PLOS blogs, PLOS One or his other affiliations.

Unmasking Jane Brody’s “A Positive Outlook May Be Good for Your Health” in The New York Times

A recipe for coercing ill people with positive psychology pseudoscience in the New York Times

  • Judging by the play she gets in social media and the 100s of comments on her articles in the New York Times, Jane Brody has a successful recipe for using positive psychology pseudoscience to bolster down-home advice you might’ve gotten from your grandmother.
  • Her recipe might seem harmless enough, but her articles are directed at people struggling with chronic and catastrophic physical illnesses. She offers them advice.
  • The message is that persons with physical illness should engage in self-discipline, practice positive psychology exercises – or else they are threatening their health and shortening their lives.
  • People struggling with physical illness have enough to do already. The admonition they individually and collectively should do more -they should become more self-disciplined- is condescending and presumptuous.
  • Jane Brody’s carrot is basically a stick. The implied threat is simply coercive: that people with chronic illness are not doing what they can to improve their physical health unless they engage in these exercises.
  • It takes a careful examination of Jane Brody’s sources to discover that the “scientific basis” for this positive psychology advice is quite weak. In many instances it is patently junk, pseudoscience.
  • The health benefits claimed for positivity are unfounded.
  • People with chronic illness are often desperate or simply vulnerable to suggestions that they can and should do more.  They are being misled by this kind of article in what is supposed to be the trusted source of a quality news outlet, The New York Times, not The Daily News.
  • There is a sneaky, ill-concealed message that persons with chronic illness will obtain wondrous benefits by just adopting a positive attitude – even a hint that cancer patients will live longer.

In my blog post about positive psychology and health, I try to provide tools so that consumers can probe for themselves the usually false and certainly exaggerated claims that are being showered on them.

However, in the case of Jane Brody’s articles, we will see that the task is difficult because she draws on a selective sampling of the literature in which researchers generate junk self-promotional claims.

That’s a general problem with the positive psychology “science” literature, but the solution for journalists like Jane Brody is to seek independent evaluation of claims from outside the positive psychology community. Journalists, did you hear that message?

The article, along with its 100s of comments from readers, is available here:

A Positive Outlook May Be Good for Your Health by Jane E. Brody

The article starts with some clichéd advice about being positive. Brody seems to be on the side of the autonomy of her readers. She makes seemingly derogatory comments that the advice is “cockeyed optimism.” [Don’t you love that turn of phrase? I’m sure to borrow it in the future.]

“Look on the sunny side of life.”

“Turn your face toward the sun, and the shadows will fall behind you.”

“Every day may not be good, but there is something good in every day.”

“See the glass as half-full, not half-empty.”

Researchers are finding that thoughts like these, the hallmarks of people sometimes called “cockeyed optimists,” can do far more than raise one’s spirits. They may actually improve health and extend life.

See?  The clever putdown of this advice was just a rhetorical device, just a set up for what follows. Very soon Brody is delivering some coercive pseudoscientific advice, backed by the claim that “there is no longer any doubt” and that the links between positive thinking and health benefits are “indisputable.”

There is no longer any doubt that what happens in the brain influences what happens in the body. When facing a health crisis, actively cultivating positive emotions can boost the immune system and counter depression. Studies have shown an indisputable link between having a positive outlook and health benefits like lower blood pressure, less heart disease, better weight control [Emphasis added.].

I found the following passage particularly sneaky and undermining of people with cancer.

Even when faced with an incurable illness, positive feelings and thoughts can greatly improve one’s quality of life. Dr. Wendy Schlessel Harpham, a Dallas-based author of several books for people facing cancer, including “Happiness in a Storm,” was a practicing internist when she learned she had non-Hodgkin’s lymphoma, a cancer of the immune system, 27 years ago. During the next 15 years of treatments for eight relapses of her cancer, she set the stage for happiness and hope, she says, by such measures as surrounding herself with people who lift her spirits, keeping a daily gratitude journal, doing something good for someone else, and watching funny, uplifting movies. Her cancer has been in remission now for 12 years.

“Fostering positive emotions helped make my life the best it could be,” Dr. Harpham said. “They made the tough times easier, even though they didn’t make any difference in my cancer cells.”

Sure, Jane Brody is careful to avoid the explicit claim that the positive attitude is somehow connected to the cancer being in remission for 12 years, but the implication is there. Brody pushes the advice with a hint of the transformation available to cancer patients, if only they follow the advice.

After all, Jane Brody had just earlier asserted that positive attitude affects the immune system and this well-chosen example happens to be a cancer of the immune system.

Jane Brody immediately launches into a description of a line of research conducted by a positive psychology group at Northwestern University and University of California San Francisco.

Taking her cue from the investigators, Brody blurs the distinction between findings based in correlational studies and the results of intervention studies in which patients actually practiced positive psychology exercises.

People with new diagnoses of H.I.V. infection who practiced these skills carried a lower load of the virus, were more likely to take their medication correctly, and were less likely to need antidepressants to help them cope with their illness.

But Brody’s sins as a journalist are worse than that. With a great deal of difficulty, I have chased her claims back into the literature. I found some made-up facts.

In my literature search, I could find only one study from these investigators that seemed directly related to these claims. The mediocre retrospective correlational study was mainly focused on use of psychostimulants, but it included a crude 6-item summary measure  of positive states of mind.

The authors didn’t present the results in a simple way that allows direct independent examination of whether positive affect is indeed related to other outcomes in any simple fashion. They did not allow a check of the simple correlations needed to determine whether their measure was not simply a measure of depressive symptoms turned on its head. They certainly had the data, but did not report them. Instead, they present some multivariate analyses that do not show impressive links. Any direct links to viral load are not shown and presumably are not there, although the investigators tested statistically for them. Technically speaking, I would write off the findings to measurement and specification error, certainly not worthy of reporting in The New York Times.

Less technically speaking, Brody is leading up to using HIV as an exemplar illness where cultivating positivity can do so much. But if this study is worth anything at all, it is to illustrate that even correlationally, positive affect is not related to much, other than – no surprise – alternative measures of positive affect.

Brody then goes on to describe in detail an intervention study. You’d never know from her description that her source of information is not a report of the results of the intervention study, but a promissory protocol that supposedly describes how the intervention study was going to be done.

I previously blogged about this protocol. At first, I thought it was praiseworthy that a study of a positive psychology intervention for health had even complied with the requirement that studies be preregistered and have a protocol available. Most such studies do not, but they are supposed to do that. In plain English, protocols are supposed to declare ahead of time what researchers are going to do and precisely how they are going to evaluate whether an intervention works. That is because, notoriously, researchers are inclined to say later they were really trying to do something else and to pick another outcome that makes the intervention look best.

But then I got corrected by James Heathers on Facebook. Duh, he had looked at the date the protocol was published.

He pointed out that this protocol was actually published years after collection of data had begun. The researchers already had a lot to peek at. Rather than identifying just a couple of variables on which the investigators were prepared to stake their claim that the intervention was effective, the protocol listed 25 variables that would be examined as outcomes (!) in order to pick one or two.
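
To put a rough number on why 25 candidate outcomes is a problem, here is a minimal Python sketch. It assumes, purely for illustration, that the intervention does nothing, that each outcome is tested at the conventional 0.05 level, and that the tests are independent (real outcomes are correlated, so the true figure would differ). The function name is mine, not anything from the protocol.

```python
def prob_any_false_positive(n_outcomes: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_outcomes null tests reaches p < alpha.

    Assumes the tests are independent, which is a simplification.
    """
    return 1.0 - (1.0 - alpha) ** n_outcomes


# Compare a pre-specified primary outcome with a list of 25 to choose from.
for k in (1, 2, 25):
    print(f"{k:>2} outcomes: {prob_any_false_positive(k):.2f}")
```

Under those assumptions, with 25 outcomes the chance of at least one spurious “significant” result is roughly 72 percent, which is why a protocol is supposed to commit to one or two outcomes before the data are in.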

So I updated what I said in my earlier blog. I pointed out that the published protocol was misleading. It was posted after the researchers were able to see how their study was unfolding and to change their plans accordingly. The vagueness of the protocol gave the authors lots of wiggle room for selectively reporting and hyping their findings with confirmation bias. They would later take advantage of this when they actually published the results of their study.

The researchers studied 159 people who had recently learned they had H.I.V. and randomly assigned them to either a five-session positive emotions training course or five sessions of general support. Fifteen months past their H.I.V. diagnosis, those trained in the eight skills maintained higher levels of positive feelings and fewer negative thoughts related to their infection.

Brody is not being accurate here. When the authors finally got around to publishing the results, they told a very different story if you probe carefully. Even with the investigators doing a lot of spinning, they showed null results, no effects for the intervention. Appearances to the contrary were created by the investigators ignoring what they actually reported in their tables. If you go to my earlier blog post, I point this out in detail, so you can see for yourself.

Brody goes on to describe the regimen that the published study did not show to be effective.

An important goal of the training is to help people feel happy, calm and satisfied in the midst of a health crisis. Improvements in their health and longevity are a bonus. Each participant is encouraged to learn at least three of the eight skills and practice one or more each day. The eight skills are:

■ Recognize a positive event each day.

■ Savor that event and log it in a journal or tell someone about it.

■ Start a daily gratitude journal.

■ List a personal strength and note how you used it.

■ Set an attainable goal and note your progress.

■ Report a relatively minor stress and list ways to reappraise the event positively.

■ Recognize and practice small acts of kindness daily.

■ Practice mindfulness, focusing on the here and now rather than the past or future.

For chrissakes, this is a warmed-over version of Émile Coué de la Châtaigneraie’s autosuggestion “Every day in every way, I’m getting better and better.” Surely, contemporary positive psychology’s science of health can do better than that. To Coué’s credit, he gave away his advice for free. He did not charge for his coaching, even if he was giving away something he had no evidence would improve people’s physical health.

Dr. Moskowitz said she was inspired by observations that people with AIDS, Type 2 diabetes and other chronic illnesses lived longer if they demonstrated positive emotions. She explained, “The next step was to see if teaching people skills that foster positive emotions can have an impact on how well they cope with stress and their physical health down the line.”

She listed as the goals improving patients’ quality of life, enhancing adherence to medication, fostering healthy behaviors, and building personal resources that result in increased social support and broader attention to the good things in life.

Let me explain why I am offended here. None of these activities have been shown to improve the health of persons with newly diagnosed HIV. It’s reasonable to assume that newly diagnosed persons have a lot with which to contend. It’s a bad time to give them advice to clutter their life with activities that will not make a difference in their health.

The published study was able to recruit and retain a sample of persons with newly diagnosed HIV because it paid them well to keep coming. I’ve worked with this population before, in a study aiming at helping them solve specific practical problems that they said got in the way of their adherence.

Many persons with newly diagnosed HIV are low income and are unemployed or marginally employed. They will enroll in studies to get the participant fees. When I lived in the San Francisco Bay area, I recall one patient telling a recruiter from UCSF that he was too busy and unable to make a regular visit to the medical center for the intervention, but he would be willing to accept being in the study if he was assigned to the control group. It did not involve attending intervention sessions and would give him a little cash.

Based on my clinical and research experience, I don’t believe that such patients would regularly show up for this kind of useless positive psychology treatment without getting paid, particularly if they were informed of the actual results of this misrepresented study.

Gregg De Meza, a 56-year-old architect in San Francisco who learned he was infected with H.I.V. four years ago, told me that learning “positivity” skills turned his life around. He said he felt “stupid and careless” about becoming infected and had initially kept his diagnosis a secret.

“When I entered the study, I felt like my entire world was completely unraveling,” he said. “The training reminded me to rely on my social network, and I decided to be honest with my friends. I realized that to show your real strength is to show your weakness. No pun intended, it made me more positive, more compassionate, and I’m now healthier than I’ve ever been.”

I object to this argument by quotes-from-an-unrepresentative-patient. The intervention did not have the intended effect, and it is misleading to find somebody who claims to have turned their life around.

Jane Brody proceeds with some more fake facts.

In another study among 49 patients with Type 2 diabetes, an online version of the positive emotions skills training course was effective in enhancing positivity and reducing negative emotions and feelings of stress. Prior studies showed that, for people with diabetes, positive feelings were associated with better control of blood sugar, an increase in physical activity and healthy eating, less use of tobacco and a lower risk of dying.

The study was so small and underpowered, aside from being methodologically flawed, that even if such effects were actually present, most of the time they would be missed because the study did not have enough patients to achieve significance.
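
Here is a hedged back-of-the-envelope check of that point in Python. Brody reports only the total of 49 patients, so the split into two arms of 25 and 24, the moderate true effect size (d = 0.5), and the two-sided 0.05 threshold are all my assumptions for illustration. The calculation uses a normal approximation to a two-sample comparison, not the study’s actual analysis.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test (normal approximation).

    d is the assumed true standardized mean difference between the arms.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)       # critical value, about 1.96
    ncp = d * sqrt(n1 * n2 / (n1 + n2))     # expected z statistic under d
    # Probability the observed statistic lands beyond either critical value.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Hypothetical split of the 49 patients into two arms.
print(f"power for d=0.5: {approx_power(0.5, 25, 24):.2f}")
```

Under these assumptions the power comes out a little over 40 percent: a real moderate effect would be missed more often than it would be found.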

In a pilot study of 39 women with advanced breast cancer, Dr. Moskowitz said an online version of the skills training decreased depression among them. The same was true with caregivers of dementia patients.

“None of this is rocket science,” Dr. Moskowitz said. “I’m just putting these skills together and testing them in a scientific fashion.”

It’s not rocket science, it’s misleading hogwash.

In a related study of more than 4,000 people 50 and older published last year in the Journal of Gerontology, Becca Levy and Avni Bavishi at the Yale School of Public Health demonstrated that having a positive view of aging can have a beneficial influence on health outcomes and longevity. Dr. Levy said two possible mechanisms account for the findings. Psychologically, a positive view can enhance belief in one’s abilities, decrease perceived stress and foster healthful behaviors. Physiologically, people with positive views of aging had lower levels of C-reactive protein, a marker of stress-related inflammation associated with heart disease and other illnesses, even after accounting for possible influences like age, health status, sex, race and education than those with a negative outlook. They also lived significantly longer.

This is even deeper into the woo. Give me a break, Jane Brody. Stop misleading people with chronic illness with false claims and fake facts. Adopting these attitudes will not prevent dementia.

Don’t believe me? I previously debunked these patently false claims in detail. You can see my critique here.

Here is what the original investigators claimed about Alzheimer’s:

“We believe it is the stress generated by the negative beliefs about aging that individuals sometimes internalize from society that can result in pathological brain changes,” said Levy. “Although the findings are concerning, it is encouraging to realize that these negative beliefs about aging can be mitigated and positive beliefs about aging can be reinforced, so that the adverse impact is not inevitable.”

I exposed some analysis of voodoo statistics on which this claim is based. I concluded:

The authors develop their case that stress is a significant cause of Alzheimer’s disease with reference to some largely irrelevant studies by others, but depend on a preponderance of studies that they themselves have done with the same dubious small samples and dubious statistical techniques. Whether you do a casual search with Google scholar or a more systematic review of the literature, you won’t find stress processes of the kind the authors invoke among the usual explanations of the development of the disease.

Basically, the authors are arguing that if you hold views of aging like “Old people are absent-minded” or “Old people cannot concentrate well,” you will experience more stress as you age, and this will accelerate development of Alzheimer’s disease. They then go on to argue that because these attitudes are modifiable, you can take control of your risk for Alzheimer’s by adopting a more positive view of aging and aging people.

Nonsense, utter nonsense.

Let chronically ill people and those facing cancer adopt whatever attitude is comfortable or natural for them. It’s a bad time to ask for change, particularly when there isn’t any promised benefit in improved health or prolonged life.

Rather than Jane Brody’s recipe for positive psychology improving your health, I strongly prefer Lila Downs’s La Cumbia del Mole.

It is great on chicken. If it does not extend your life, it will give you some moments of happiness, but you will have to adjust the spices to your personal taste.

I will soon be offering e-books providing skeptical looks at positive psychology, as well as mindfulness. As in this blog post, I will take claims I find in the media and trace them back to the scientific studies on which they are based. I will show you what I see so you can see it too.


The Prescription Pain Pill Epidemic: A Conversation with Dr. Anna Lembke

My colleague, Dr. Anna Lembke is the Program Director for the Stanford University Addiction Medicine Fellowship, and Chief of the Stanford Addiction Medicine Dual Diagnosis Clinic. She is the author of a newly released book on the prescription pain pill epidemic: “Drug Dealer, MD: How Doctors Were Duped, Patients Got Hooked, and Why It’s So Hard to Stop” (Johns Hopkins University Press, October 2016).

I spoke with her recently about the scope of this public health tragedy, how we got here and what we need to do about it.

Dr. Jain: About 15-20 years ago American medicine underwent a radical cultural shift in its attitude towards pain, a shift that ultimately culminated in a public health tragedy. Can you comment on factors that contributed to that shift occurring in the first place?
Dr. Lembke: Sure. So the first thing that happened (and it was really more like the early 1980’s when this shift occurred) was that there were more people with daily pain. Overall, our population is getting healthier, but we also have more people with more pain conditions. No one really knows exactly the reason for that, but it probably involves people living longer with chronic illnesses, and more people getting surgical interventions for all types of conditions. Any time you cut into the body, you cut across the nerves and you create the potential for some kind of neuropathic pain problem.
The other thing that happened in the 1980’s was the beginning of the hospice movement. This movement helped people at the very end of life (the last month to weeks to days of their lives) to transition to death in a more humane and peaceful way. There was growing recognition that we weren’t doing enough for people at the end of life. As part of this movement, many doctors began advocating for using opioids more liberally at the end of life.
There was also a broader cultural shift regarding the meaning of pain. Prior to 1900 people viewed pain as having positive value: “what does not kill you makes you stronger” or “after darkness comes the dawn”. There were spiritual and biblical connotations and positive meaning in enduring suffering. What arose, through the 20th century, was this idea that pain is actually something that you need to avoid because pain itself can lead to a psychic scar that contributes to future pain. Today, not only is pain painful, but pain begets future pain. By the 1990’s, pain was viewed as a very bad thing and something that had to be eliminated at all cost.
Growing numbers of people experiencing chronic pain, the influence of the hospice movement, and a shifting paradigm about the meaning and consequences of experiencing pain, led to increased pressures within medicine for doctors to prescribe more opioids. This shift was a departure from prior practice, when doctors were loath to prescribe opioids, for fear of creating addiction, except in cases of severe trauma, cases involving surgery, or cases of the very end of life.
Dr. Jain: The American Pain Society had introduced “pain as the 5th vital sign,” a term which suggested physicians, who were not taking their patients’ pain seriously, were being neglectful. What are your thoughts about this term?
Dr. Lembke: “Pain is the 5th vital sign” is a slogan. It’s kind of an advertising campaign. We use slogans all the time in medicine, many times to good effect, to raise awareness both inside and outside the profession about a variety of medical issues. The reason that “pain is the 5th vital sign” went awry, however, has to do with the ways in which professional medical societies, like the American Pain Society, and so-called “academic thought leaders”, began to collaborate and cooperate with the pharmaceutical industry. That’s where “pain is the 5th vital sign” went from being an awareness campaign to being a brand for a product, namely prescription opioids.
So the good intentions in the early 1980’s turned into something really quite nefarious when it came to the way that we started treating patients. To really understand what happened, you have to understand the ways in which the pharmaceutical industry, particularly the makers of opioid analgesics, covertly collaborated with various institutions within what I’ll call Big Medicine, in order to promote opioid prescribing.
Dr. Jain: So by Big Medicine what do you mean?
Dr. Lembke: I mean the Federation of State Medical Boards, The Joint Commission (JCAHO), pain societies, academic thought leaders, and the Food and Drug Administration (FDA). These are the leading organizations within medicine whose job it is to guide and regulate medicine. None of these are pharmaceutical companies per se, but what happened around opioid pain pills was that Big Pharma infiltrated these various organizations in order to use false evidence to encourage physicians to prescribe more opioids. They used a Trojan Horse approach. They didn’t come out and say, “We want you to prescribe more opioids because we’re Big Pharma and we want to make more money.” Instead, what they said was, “We want you to prescribe more opioids because that’s what the scientific evidence supports.”
The story of how they did that is really fascinating. Let’s take The Joint Commission (JCAHO) as an example. In 1996, when oxycontin was introduced to the market, JCAHO launched a nationwide pain management educational program where they sold educational materials to hospitals, which they acquired for free from Purdue Pharma. These materials included statements which we now know to be patently false. JCAHO sold the Purdue Pharma videos and literature on pain to hospitals.
These educational materials perpetuated four myths about opioid prescribing. The first myth was that opioids work for chronic pain. We have no evidence to support that. The second was that no dose is too high. So if your patient responds to opioids initially and then develops tolerance, just keep going up. And that’s how we got patients on astronomical amounts of opioids. The third myth was about pseudo addiction. If you have a patient who appears to be demonstrating drug seeking behavior, they’re not addicted. They just need more pain meds. The fourth and most insidious myth was that there is a halo effect when opioids are prescribed by a doctor, that is, they’re not addictive as long as they’re being used to treat pain.
So getting back to JCAHO, not only did they use material propagating myths about the use of opioids to treat pain, but they also did something that was very insidious and, ultimately, very bad for patients. They made pain a “quality measure”. By The Joint Commission’s own definition of a quality measure, it must be something that you can count. So what they did was they created this visual analog scale, also known as the “pain scale”. The scale consists of numbers from one to ten describing pain, with sad and happy faces to match. JCAHO told doctors they needed to use this pain scale in order to assess a patient’s pain. What we know today is that this pain scale has not led to improved treatment or functional outcomes for patients with pain. The only thing that it has been correlated with is increased opioid prescribing.
This sort of stealth maneuver by Big Pharma to use false evidence or pseudo-science to infiltrate academic medicine, regulatory agencies, and academic societies in order to promote more opioid prescribing: that’s an enduring theme throughout any analysis of this epidemic.
Dr. Jain: Can you comment specifically on the breadth and depth of the opioid epidemic in the US? What were the key factors involved?
Dr. Lembke: Drug overdose is now the leading cause of accidental death in this country, exceeding death due to motor vehicle accidents or firearms. Driving this statistic is opioid deaths and driving opioid deaths is opioid pain prescription deaths, which in turn correlates with excessive opioid prescribing. There are more than 16,000 deaths per year due to prescription opioid overdoses.
What’s really important to understand is that an opioid overdose is not a suicide attempt. The vast majority of these people are not trying to kill themselves, and many of them are not even taking the medication in excess. They’re often taking it as prescribed, but over time are developing a low grade hypoxia. They may get a minor cold, let’s say a pneumonia, then they’ll take the pills and they’ll fall asleep and won’t wake up again because their tolerance to the euphorigenic and pain effects of the opioids is very robust, but their tolerance to the respiratory suppressant effect doesn’t keep pace with that. You can feel like you need to take more in order to eliminate the pain, but at the same time the opioid is suppressing your respiratory drive, so you eventually become hypoxemic and can’t breathe anymore and just fall into a gradual sleep that way.
There are more than two million people today who are addicted to prescription opioids. So not only is there this horrible risk of accidental death, but there’s obviously the risk of addiction. We also have heroin overdose deaths and heroin addiction on the rise, most likely on the coattails of the prescription opioids epidemic, driven largely by young people who don’t have reservations about switching from pills to heroin.
Dr. Jain: I was curious about meds like oxycontin, vicodin, and percocet. Are they somehow more addictive than other opioid pills?
Dr. Lembke: All opioids are addictive, especially if you’re dealing with an opioid naive person. But it is certainly true that some of the opioids are more addictive than others because of pharmacology. Let’s consider oxycontin. The major ingredient in oxycontin is oxycodone. Oxycodone is a very potent synthetic opioid. When Purdue formulated it into oxycontin, what they wanted to create was a twice daily pain medication for cancer patients. So they put this hard shell on a huge 12 hours worth of oxycodone. That hard shell was intended to release oxycodone slowly over the course of the day. But what people discovered is that if they chewed the oxycontin and broke that hard shell, then they got a whole day’s worth of very potent oxycodone at once. With that came the typical rush that people who are addicted to opioids describe, as well as this long and powerful and sustained high. So that is why oxycontin was really at the center of the prescription opioid epidemic. It basically was more addictive because of the quantity and potency once that hard shell was cracked.
Dr. Jain: So has the epidemic plateaued? And if so, why?
Dr. Lembke: The last year for which we have CDC data is 2014, when there were more prescription opioid-related deaths, and more opioid prescriptions written by doctors, than in any year prior. This is remarkable when you think that by 2014, there was already wide-spread awareness of the problem. Yet doctors were not changing their prescribing habits, and patients were dying in record numbers.
I’m really looking forward to the next round of CDC data to come out and tell us what 2015 looked like. I do not believe we have reached the end or even the waning days of this epidemic. Doctors continue to write over 250 million opioid prescriptions annually; what was written three decades ago was a mere fraction of that.
Also, the millions of people who have been taking opioids for years are not easily weaned from opioids. They now have neuroadaptive changes in their brains which are very hard to undo. I can tell you from clinical experience that even when I see patients motivated to get off of their prescription opioids, it can take weeks, months, and even years to make that happen.
So I don’t think that the epidemic has plateaued, and this is one of the major points that I try to make in my book. The prescription drug epidemic is the canary in the coal mine. It speaks to deeper problems within medicine. Doctors get reimbursed for prescribing a pill or doing a procedure, but not for talking to our patients and educating them. That’s a problem. With the turmoil in the insurance system, we can’t even establish long term relationships with our patients. So as a proxy for real healing and attachment, we prescribe opioids. Those kinds of endemic issues within medicine have not changed, and until they do, I believe this prescription drug problem will continue unabated.

Danish RCT of cognitive behavior therapy for whatever ails your physician about you

I was asked by a Danish journalist to examine a randomized controlled trial (RCT) of cognitive behavior therapy (CBT) for functional somatic symptoms. I had not previously given the study a close look.

I was dismayed by how highly problematic the study was in so many ways.

I doubted that the results of the study showed any benefits to the patients or have any relevance to healthcare.

I then searched and found the website for the senior author’s clinical offerings.  I suspected that the study was a mere experimercial or marketing effort of the services he offered.

Overall, I think what I found hiding in plain sight has broader relevance to scrutinizing other studies claiming to evaluate the efficacy of CBT for what are primarily physical illnesses, not psychiatric disorders. Look at the other RCTs. I am confident you will find similar problems. But then there is the bigger picture…

[A controversial assessment ahead? You can stop here and read the full text of the RCT and its trial registration before continuing with my analysis.]

Schröder A, Rehfeld E, Ørnbøl E, Sharpe M, Licht RW, Fink P. Cognitive–behavioural group treatment for a range of functional somatic syndromes: randomised trial. The British Journal of Psychiatry. 2012 Apr 13:bjp-p.

A summary overview of what I found:

 The RCT:

  • Was unblinded to patients, interventionists, and to the physicians continuing to provide routine care.
  • Had a grossly unmatched, inadequate control/comparison group that leads to any benefit from nonspecific (placebo) factors in the trial counting toward the estimated efficacy of the intervention.
  • Relied on subjective self-report measures for primary outcomes.
  • With such a familiar trio of design flaws, even an inert homeopathic treatment would be found effective, if it were provided with the same positive expectations and support as the CBT in this RCT. [This may seem a flippant comment that reflects on my credibility, not the study. But please keep reading to my detailed analysis where I back it up.]
  • The study showed an inexplicably high rate of deterioration in both treatment and control group. Apparent improvement in the treatment group might only reflect less deterioration than in the control group.
  • The study is focused on unvalidated psychiatric diagnoses being applied to patients with multiple somatic complaints, some of whom may not yet have a medical diagnosis, but most clearly had confirmed physical illnesses.
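
The claim that an inert treatment would look effective under the first three flaws can be illustrated with a toy simulation. This is only a sketch: the size of the expectancy bias (0.4 standard deviations), the sample size, and the function name are assumptions of mine, not numbers taken from the trial. Every simulated patient has a true change of zero; the only difference between arms is that unblinded treated patients, given positive expectations and support, rate themselves a bit better on the self-report outcome.

```python
import random
from statistics import mean


def simulate_inert_trial(n_per_arm: int = 1000,
                         expectancy_bias: float = 0.4,
                         seed: int = 0) -> float:
    """Return the apparent benefit (treated minus control) of an inert treatment.

    True change is zero in both arms. Self-reported change is pure noise,
    plus an expectancy bias in the unblinded treated arm only.
    """
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = [rng.gauss(0.0, 1.0) + expectancy_bias for _ in range(n_per_arm)]
    return mean(treated) - mean(control)


print(f"apparent benefit of an inert treatment: {simulate_inert_trial():.2f} SD")
```

The apparent benefit lands near whatever bias is fed in, even though the treatment does nothing. A matched control condition with equal attention and expectations, or an objective outcome, is what would remove it.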

But wait, there is more!

  • It’s not CBT that was evaluated, but a complex multicomponent intervention in which what was called CBT is embedded in a way that its contribution cannot be evaluated.

The “CBT” did not map well onto international understandings of the assumptions and delivery of CBT. The complex intervention included weeks of indoctrination of the patient with an understanding of their physical problems that incorporated simplistic pseudoscience before any CBT was delivered. It focused on goals imposed by a psychiatrist that didn’t necessarily fit with patients’ sense of their most pressing problems and the solutions.

And the kicker.

  • The authors switched primary outcomes – reconfiguring the scoring of their subjective self-report measures years into the trial, based on peeking at the results with the original scoring.

The investigators have a website that markets their services. Rather than a quality contribution to the literature, this study can be seen as an experimercial doomed to bad science and questionable results from before the first patient was enrolled. Is an undeclared conflict of interest in play? There is another serious undeclared conflict of interest for one of the authors.

For the uninformed and gullible, the study handsomely succeeds as an advertisement for the investigators’ services to professionals and patients.

Personally, I would be indignant if a primary care physician tried to refer me or a friend or family member to this trial. In the absence of overwhelming evidence to the contrary, I assume that people around me who complain of physical symptoms have legitimate physical concerns. If they do not yet have a confirmed diagnosis, it serves little purpose to stop the probing and refer them to psychiatrists. This trial operates with an anachronistic Victorian definition of psychosomatic condition.

But why should we care about a patently badly conducted trial with switched outcomes? Is it only a matter of something being rotten in the state of Denmark? Aside from the general impact on the existing literature concerning CBT for somatic conditions, results of this trial were entered into a Cochrane review of nonpharmacological interventions for medically unexplained symptoms. I previously complained about one of the authors of this RCT also being listed as an author on another Cochrane review protocol. Prior to that, I complained to Cochrane about this author’s larger research group influencing a decision to include switched outcomes in another Cochrane review. A lot of us rightfully depend heavily on the verdict of Cochrane reviews for deciding best evidence. That trust is being put into jeopardy.

Detailed analysis

1. This is an unblinded trial, a particularly weak methodology for examining whether a treatment works.

The letter that alerted physicians to the trial had essentially encouraged them to refer patients they were having difficulty managing.

‘Patients with a long-term illness course due to medically unexplained or functional somatic symptoms who may have received diagnoses like fibromyalgia, chronic fatigue syndrome, whiplash associated disorder, or somatoform disorder.’

Patients and the physicians who referred them subsequently received feedback about which group patients were assigned to, either routine care or what was labeled as CBT. This information could have had a strong influence on the outcomes that were reported, particularly for the patients left in routine care.

Patients’ learning that they did not get assigned to the intervention group was undoubtedly disappointing and demoralizing. The information probably did nothing to improve the positive expectations and support available to patients in routine care. This could have had a nocebo effect. The feedback may have contributed to the otherwise inexplicably high rates of subjective deterioration [to be noted below] reported by patients left in the routine care condition. In contrast, the authors’ disclosure that patients had been assigned to the intervention group undoubtedly boosted the morale of both patients and physicians and also increased the gratitude of the patients. This would be reflected in the responses to the subjective outcome measures.

The gold standard alternative to an unblinded trial is a double-blind, placebo-controlled trial in which neither providers, nor patients, nor even the assessors rating outcomes know to which group particular patients were assigned. Of course, this is difficult to achieve in a psychotherapy trial. Yet a fair alternative is a psychotherapy trial in which patients and those who refer them are blind to the nature of the different treatments, and in which an effort is made to communicate credible positive expectations about the comparison control group.

Conclusion: A lack of blinding seriously biases this study toward finding a positive effect for the intervention, regardless of whether the intervention has any active, effective component.

2. A claim that this is a randomized controlled trial depends on the adequacy of the control offered by the comparison group, enhanced routine care. Just what is being controlled by the comparison? In evaluating a psychological treatment, it’s important that the comparison/control group offers the same frequency and intensity of contact, positive expectations, attention and support. This trial decidedly did not.

There were large differences between the intervention and control conditions in the amount of contact time. Patients assigned to the cognitive therapy condition received an additional 9 group sessions of 3.5 hours each with a psychiatrist, plus the option of even more consultations. The over 30 hours of contact time with a psychiatrist should be very attractive to patients who wanted it and could not otherwise obtain it. For some, it undoubtedly represented an opportunity to have someone listen to their complaints of pain and suffering in a way that had not previously happened. This is also more than the intensity of psychotherapy typically offered in clinical trials, which is closer to 10 to 15 sessions of 50 minutes each.
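To put the contact-time gap in rough numbers, here is a minimal Python sketch using only the figures quoted above (9 group sessions of 3.5 hours; a typical trial offering 10 to 15 sessions of 50 minutes). The function and variable names are my own, and the optional extra consultations are ignored, so the trial figure is if anything an underestimate.

```python
# Contact-time arithmetic based only on figures stated in the text.
# Names are illustrative; optional extra consultations are not counted.

def total_hours(n_sessions, minutes_per_session):
    """Total contact time in hours for a course of sessions."""
    return n_sessions * minutes_per_session / 60

# Intervention arm: 9 group sessions of 3.5 hours (210 minutes) each.
trial_hours = total_hours(9, 210)

# Typical psychotherapy trial: 10 to 15 sessions of 50 minutes each.
typical_low = total_hours(10, 50)
typical_high = total_hours(15, 50)

print(f"Intervention arm: {trial_hours:.1f} hours")
print(f"Typical trial:    {typical_low:.1f} to {typical_high:.1f} hours")
print(f"Ratio:            {trial_hours / typical_high:.1f}x to "
      f"{trial_hours / typical_low:.1f}x the typical contact time")
```

Even against the upper end of what is typical, the intervention arm received roughly two and a half times the contact time, before counting anything the control group did not get at all.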

The intervention group thus received substantially more support and contact time, which was delivered with more positive expectations. This wealth of nonspecific factors favoring the intervention group compromises an effort to disentangle the specific effects of any active ingredient in the CBT intervention package. From what has been said so far, the trial’s providing a fair and generalizable evaluation of the CBT intervention is nigh impossible.

Conclusion: This is a methodologically poor choice of control groups with the dice loaded to obtain a positive effect for CBT.

3. The primary outcomes, both as originally scored and after switching, are subjective self-report measures that are highly responsive to nonspecific treatments, alleviation of mild depressive symptoms and demoralization. They are not consistently related to objective changes in functioning. They are particularly problematic when used as outcome measures in the context of an unblinded clinical trial with an inadequate control group.

There have been consistent demonstrations that assigning patients to inert treatments and measuring the outcomes with subjective measures may register improvements that will not correspond to what would be found with objective measures.

For instance, a provocative New England Journal of Medicine study showed that sham acupuncture was as effective as an established medical treatment – an albuterol inhaler – for asthma when judged with subjective measures, but there was a large superiority for the established medical treatment when judged with objective measures.

There have been a number of demonstrations that treatments such as the one offered in the present study to patient populations similar to those in the study produce changes in subjective self-report that are not reflected in objective measures.

Much of the improvement in primary outcomes occurred before the first assessment after baseline and not very much afterwards. The early response is consistent with a placebo response.

The study actually included one largely unnoticed objective measure, utilization of routine care. Presumably if the CBT was effective as claimed, it would have produced a significant reduction in healthcare utilization. After all, isn’t the point of this trial to demonstrate that CBT can reduce health-care utilization associated with (as yet) medically unexplained symptoms? Curiously, utilization of routine care did not differ between groups.

The combination of the choice of subjective outcomes, the unblinded nature of the trial, and the poorly chosen control group brings together features that are highly likely to produce the appearance of positive effects, without any substantial benefit to the functioning and well-being of the patients.

Conclusion: Evidence for the efficacy of a CBT package for somatic complaints that depends solely on subjective self-report measures is unreliable, and unlikely to generalize to more objective measures of meaningful impact on patients’ lives.

4. We need to take into account the inexplicably high rates of deterioration in both groups, but particularly in the control group receiving enhanced care.

There was unexplained deterioration in 50% of the control group and 25% of the intervention group. Rates of deterioration are given only a one-sentence mention in the article, but deserve much more attention. These rates need to qualify and dampen any generalizable clinical interpretation of other claims about outcomes attributed to the CBT. We need to keep in mind that clinical trials cannot determine how effective treatments are, but only how different a treatment is from a control group. So, an effect claimed for a treatment relative to a control can largely or entirely come from deterioration in the control group, not what the treatment offers. The claim of success for CBT probably largely depends on the deterioration in the control group.

One interpretation of this trial is that spending an extraordinary 30 hours with a psychiatrist leads to only half the deterioration experienced with nothing more than routine care. But this raises the question of why half the patients left in routine care are deteriorating. What possibly could be going on?
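The dependence of the apparent treatment effect on what happens in the control group can be illustrated with a small Python sketch. The 50% and 25% deterioration rates are the ones discussed above; the second scenario, in which the control group deteriorates no more than the treated group, is purely hypothetical and is included only to show how the between-group difference behaves. The function name is my own.

```python
# Illustration only: how a between-group difference depends on the control arm.
# 0.50 and 0.25 are the deterioration rates discussed in the text;
# the "hypothetical" scenario is an assumption, not a result from the trial.

def compare_arms(control_rate, treatment_rate):
    """Return (risk difference, relative risk) for deterioration rates."""
    risk_difference = control_rate - treatment_rate
    relative_risk = treatment_rate / control_rate
    return risk_difference, relative_risk

# As reported: half of the controls and a quarter of treated patients deteriorate.
reported = compare_arms(0.50, 0.25)

# Hypothetical: controls deteriorate at the same rate as treated patients,
# for example if they had not been demoralized by their assignment.
hypothetical = compare_arms(0.25, 0.25)

for label, (rd, rr) in [("reported", reported), ("hypothetical", hypothetical)]:
    print(f"{label:>12}: risk difference = {rd:.2f}, relative risk = {rr:.2f}")
```

In both scenarios a quarter of the treated patients deteriorate. The treatment only looks beneficial in the first because the control group did so badly.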

Conclusion: Unexplained deterioration in the control group may explain apparent effects of the treatment, but both groups are doing badly.

5. The diagnosis of “functional somatic symptoms” or, as the authors prefer – Severe Bodily Distress Syndromes – is considered by the authors to be a psychiatric diagnosis. It is not accepted as a valid diagnosis internationally. Its validation is limited to the work done almost entirely within the author group, which is explicitly labeled as “preliminary.” This biased sample of patients is quite heterogeneous, beyond their physicians having difficulty managing them. They have a full range of subjective complaints and documented physical conditions. Many of these patients would not be considered as primarily having a psychiatric disorder internationally and certainly within the US, except where they had major depression or an anxiety disorder. Such psychiatric disorders were not exclusion criteria.

Once sent on the pathway to a psychiatric diagnosis by their physicians’ making a referral to the study, patients had to meet additional criteria:

To be eligible for participation individuals had to have a chronic (i.e. of at least 2 years duration) bodily distress syndrome of the severe multi-organ type, which requires functional somatic symptoms from at least three of four bodily systems, and moderate to severe impairment in daily living.

The condition identified in the title of the article is not validated as a psychiatric diagnosis. The two papers to which the authors refer are their own studies (1, 2), drawn from a single sample. The title of one of these papers makes a rather immodest claim:

Fink P, Schröder A. One single diagnosis, bodily distress syndrome, succeeded to capture 10 diagnostic categories of functional somatic syndromes and somatoform disorders. Journal of Psychosomatic Research. 2010 May 31;68(5):415-26.

In neither the two papers nor the present RCT is there sufficient effort to rule out a physical basis for the complaints qualifying these patients for a psychiatric diagnosis. There is also a lack of follow-up to see if physical diagnoses were later applied.

Citation patterns of these papers strongly suggest that the authors have not gained much traction internationally. The criterion of symptoms from three out of four bodily systems is arbitrary and unvalidated. Many patients with known physical conditions would meet this criterion without any psychiatric diagnosis being warranted.

The authors relate what is essentially their homegrown diagnosis to functional somatic syndromes, diagnoses which are themselves subject to serious criticism. See for instance the work of Allen Frances M.D., who had been the chair of the American Psychiatric Association’s Diagnostic and Statistical Manual (DSM-IV) Task Force. He became a harsh critic of its shortcomings and the failures of APA to correct coverage of functional somatic syndromes in the next DSM.

Mislabeling Medical Illness As Mental Disorder

Unless DSM-5 changes these incredibly over inclusive criteria, it will greatly increase the rates of diagnosis of mental disorders in the medically ill – whether they have established diseases (like diabetes, coronary disease or cancer) or have unexplained medical conditions that so far have presented with somatic symptoms of unclear etiology.

And:

The diagnosis of mental disorder will be based solely on the clinician’s subjective and fallible judgment that the patient’s life has become ‘subsumed’ with health concerns and preoccupations, or that the response to distressing somatic symptoms is ‘excessive’ or ‘disproportionate,’ or that the coping strategies to deal with the symptom are ‘maladaptive’.

And:

These are inherently unreliable and untrustworthy judgments that will open the floodgates to the overdiagnosis of mental disorder and promote the missed diagnosis of medical disorder.

The DSM 5 Task force refused to adopt changes proposed by Dr. Frances.

Bad News: DSM 5 Refuses to Correct Somatic Symptom Disorder

Leading Frances to apologize to patients:

My heart goes out to all those who will be mislabeled with this misbegotten diagnosis. And I regret and apologize for my failure to be more effective.

The chair of the DSM Somatic Symptom Disorder work group has delivered a scathing critique of the very concept of medically unexplained symptoms.

Dimsdale JE. Medically unexplained symptoms: a treacherous foundation for somatoform disorders?. Psychiatric Clinics of North America. 2011 Sep 30;34(3):511-3.

Dimsdale noted that applying this psychiatric diagnosis sidesteps the quality of medical examination that led up to it. Furthermore:

Many illnesses present initially with nonspecific signs such as fatigue, long before the disease progresses to the point where laboratory and physical findings can establish a diagnosis.

And such diagnoses may encompass far too varied a group of patients for any intervention to make sense:

One needs to acknowledge that diseases are very heterogeneous. That heterogeneity may account for the variance in response to intervention. Histologically, similar tumors have different surface receptors, which affect response to chemotherapy. Particularly in chronic disease presentations such as irritable bowel syndrome or chronic fatigue syndrome, the heterogeneity of the illness makes it perilous to diagnose all such patients as having MUS and an underlying somatoform disorder.

I tried making sense of a table of the additional diagnoses that the patients in this study had been given. A considerable proportion of patients had physical conditions that would not be considered psychiatric problems in the United States. Many patients could be suffering from multiple symptoms arising not only from the conditions, but also from side effects of the medications being offered. It is very difficult to manage multiple medications required by multiple comorbidities. Physicians from the community found that these patients taxed their competence and their ability to spend time with them.

table of functional somatic symptoms

Most patients had a diagnosis of “functional headaches.” It’s not clear what this designation means, but conceivably it could include migraine headaches, which are accompanied by multiple physical complaints. CBT is not an evidence-based treatment of choice for functional headaches, much less migraines.

Over a third of the patients had irritable bowel syndrome (IBS). A systematic review of the comorbidity  of irritable bowel syndrome concluded physical comorbidity is the norm in IBS:

The nongastrointestinal nonpsychiatric disorders with the best-documented association are fibromyalgia (median of 49% have IBS), chronic fatigue syndrome (51%), temporomandibular joint disorder (64%), and chronic pelvic pain (50%).

In the United States, many patients and specialists would find it offensive and counterproductive to consider irritable bowel syndrome a psychiatric condition. There is growing evidence that irritable bowel syndrome is a disturbance in the gut microbiota. It involves a gut-brain interaction, but the primary direction of influence is of the disturbance in the gut on the brain. Anxiety and depression symptoms are secondary manifestations, a product of activity in the gut influencing the nervous system.

Most of the patients in the sample had a diagnosis of fibromyalgia and over half of all patients in this study had a diagnosis of chronic fatigue syndrome.

Other patients had diagnosable anxiety and depressive disorders, which, particularly at the lower end of severity, are responsive to nonspecific treatments.

Undoubtedly many of these patients, perhaps most of them, are demoralized by not being able to get a diagnosis for what they have good basis to believe is a medical condition, aside from the discomfort, pain, and interference with their life that they are experiencing. They could be experiencing a demoralization secondary to physical illness.

These patients presented with pain, fatigue, general malaise, and demoralization. I have trouble imagining how their specific most pressing concerns could be addressed in group settings. These patients pose particular problems for making substantive clinical interpretation of outcomes that are highly general and subjective.

Conclusion: Diagnosing patients with multiple physical symptoms as having a psychiatric condition is highly controversial. Results will not generalize to countries and settings where the practice is not accepted. Many of the patients involved in the study had recognizable physical conditions, and yet they are being shunted to psychiatrists who focused only on their attitude towards the symptoms. They are being denied the specialist care and treatments that might conceivably reduce the impact of their conditions on their lives.

6. The “CBT” offered in this study is as part of a complex, multicomponent treatment that does not resemble cognitive behavior therapy as it is practiced in the United States.

As seen in figure 1 in the article, the multicomponent intervention is quite complex and consists of more than cognitive behavior therapy. Moreover, at least in the United States, CBT has distinctive elements of collaborative empiricism. Patients and therapist work together selecting issues on which to focus, developing strategies, with the patients reporting back on efforts to implement them. From the details available in the article, the treatment sounded much more like an exhortation or indoctrination, even arguing with the patients, if necessary. An English version available on the web of the educational material used in initial sessions confirmed a lot of condescending pseudoscience was presented to convince the patients that their problems were largely in their heads.

Without a clear application of learning theory, behavioral analysis, or cognitive science, the “CBT” treatment offered in this RCT has much more in common with the creative novation therapy offered by Hans Eysenck, which is now known to have been justified with fraudulent data. Indeed, comparing the educational materials for this study to what was offered in Eysenck’s study reveals striking similarities. Eysenck was advancing the claim that his intervention could prevent cardiovascular disease and cancer and overcome the iatrogenic effects. I know, this sounds really crazy, but see my careful documentation elsewhere.

Conclusion: The embedding of an unorthodox “CBT” in a multicomponent intervention in this study does not allow isolating any specific, active component of CBT that might be at work.

7. The investigators disclose having altered their scoring of their primary outcome years after the trial began, and probably after a lot of outcome data had been collected.

I found a casual disclosure in the method section of this article unsettling, particularly noting that the original trial registration was:

We found an unexpected moderate negative correlation of the physical and mental component summary measures, which are constructed as independent measures. According to the SF-36 manual, a low or zero correlation of the physical and mental components is a prerequisite of their use.23 Moreover, three SF-36 scales that contribute considerably to the PCS did not fulfil basic scaling assumptions.31 These findings, together with a recent report of problems with the PCS in patients with physical and mental comorbidity,32 made us concerned that the PCS would not reliably measure patients’ physical health in the study sample. We therefore decided before conducting the analysis not to use the PCS, but to use instead the aggregate score as outlined above as our primary outcome measure. This decision was made on 26 February 2009 and registered as a protocol change at clinicaltrials.gov on 11 March 2009. Only baseline data had been analysed when we made our decision and the follow-up data were still concealed.

Switching outcomes, particularly after some results are known, constitutes a serious violation of best research practices and leads to suspicion of the investigators refining their hypotheses after they had peeked at the data. See How researchers dupe the public with a sneaky practice called “outcome switching”

The authors had originally proposed a scoring consistent with a very large body of literature. Dropping the original scoring precludes any direct comparison with this body of research, including basic norms. They claim that they switched scoring because two key subscales were correlated in the opposite direction of what is reported in the larger literature. This is a troubling indication that something has gone terribly wrong in the authors’ recruitment of a sample. It should not be swept under the rug.

The authors claim that they switched outcomes based only on examination of baseline data from their study. However, one of the authors, Michael Sharpe, is also an author on the controversial PACE trial. A parallel switch was made to the scoring of the subjective self-reports in that trial. When the data were eventually re-analyzed using the original scoring, any positive findings for the trial were substantially reduced and arguably disappeared.

Even if the authors of the present RCT did not peek at their outcome data before deciding to switch scoring of the primary outcome, they certainly had strong indications from other sources that the original scoring would produce weak or null findings. In 2009, one of the authors, Michael Sharpe, had access to results of a relevant trial. What is called the FINE trial had null findings, which affected decisions to switch outcomes in the PACE trial. Is it just a coincidence that the scoring of the outcomes was then switched for the present RCT?

Conclusion: The outcome switching for the present trial  represents bad research practices. For the trial to have any credibility, the investigators should make their data publicly available so these data could be independently re-analyzed with the original scoring of primary outcomes.

The senior author’s clinic

I invite readers to take a virtual tour of the website for the senior author’s clinical services. Much of it is available in English. Recently, I blogged about dubious claims of a health care system in Detroit achieving a goal of “zero suicide.” I suggested that the evidence for this claim was weak, but that the claim was a powerful advertisement for the health care system. I think the present report of an RCT can similarly be seen as an infomercial for training and clinical services available in Denmark.

Conflict of interest

 No conflict of interest is declared for this RCT. Under somewhat similar circumstances, I formally complained about undeclared conflicts of interest in a series of papers published in PLOS One. A correction has been announced, but not yet posted.

Aside from the senior author’s need to declare a conflict of interest, the same can be said for one of the authors, Michael Sharpe.

Apart from the professional and reputational interest (his whole career has been built on making strong claims about such interventions), Sharpe works for insurance companies, and publishes on the subject. He declared a conflict of interest for the PACE trial.

MS has done voluntary and paid consultancy work for government and for legal and insurance companies, and has received royalties from Oxford University Press.

Here’s Sharpe’s report written for the social benefits reinsurance company UnumProvident.

If the results of this trial are accepted at face value, they will lend credibility to the claims that effective interventions are available to reduce social disability. It doesn’t matter that the intervention is not effective. Rather, persons receiving social disability payments can be disqualified because they are not enrolled in such treatment.

Effects on the credibility of Cochrane collaboration report

The switched outcomes of the trial were entered into a Cochrane systematic review, to which primary care health professionals look for guidance in dealing with a complex clinical situation. The review gives no indication of the host of problems that I exposed here. Furthermore, I have glanced at some of the other trials included and I see similar difficulties.

I have been unable to convince Cochrane to clean up conflicts of interest that are attached to switched outcomes being entered in reviews. Perhaps some of my readers will want to approach Cochrane to revisit this issue.

I think this post raises larger issues about whether Cochrane has any business conducting and disseminating reviews of such a bogus psychiatric diagnosis, medically unexplained symptoms. These reviews do patients no good, and may sidetrack them from getting the medical care they deserve. The reviews do serve special interests, including disability insurance companies.

Special thanks to John Peters and to Skeptical Cat for their assistance with my writing this blog. However, I have sole responsibility for any excesses or distortions.

 

Unintended consequences of universal mindfulness training for schoolchildren?

This is the first installment of what will be a series of occasional posts about the UK Mindfulness All Party Parliamentary Group report, Mindful Nation.

  • Mindful Nation is seriously deficient as a document supposedly arguing for policy based on evidence.
  • The professional and financial interests of lots of people involved in preparation of the document will benefit from implementation of its recommendations.
  • After an introduction, I focus on two studies singled out in Mindful Nation as offering support for the benefits of mindfulness training for school children.
  • Results of the group’s cherrypicked studies do not support implementation of mindfulness training in the schools, but inadvertently highlight some issues.
  • Investment in universal mindfulness training in the schools is unlikely to yield measurable, socially significant results, but will serve to divert resources from schoolchildren more urgently in need of effective intervention and support.
  • Mindful Nation is another example of delivery of low intensity services to mostly low risk persons to the detriment of those in greatest and most urgent need.

The launch event for the Mindful Nation report billed it as the “World’s first official report” on mindfulness.

Mindful Nation is a report written by the UK Mindfulness All-Party Parliamentary Group.

The Mindfulness All-Party Parliamentary Group (MAPPG)  was set up to:

  • review the scientific evidence and current best practice in mindfulness training
  • develop policy recommendations for government, based on these findings
  • provide a forum for discussion in Parliament for the role of mindfulness and its implementation in public policy.

The Mindfulness All-Party Parliamentary Group describes itself as

Impressed by the levels of both popular and scientific interest, and launched an inquiry to consider the potential relevance of mindfulness to a range of urgent policy challenges facing government.

Don’t get confused by this being a government-commissioned report. The report stands in sharp contrast to one commissioned by the US government in terms of the unbalanced constitution of the committee undertaking the review, the lack of transparency in the search for relevant literature, and the methodology for rating and interpreting the quality of available evidence.

Compare the claims of Mindful Nation to a comprehensive systematic review and meta-analysis prepared for the US Agency for Healthcare Research and Quality (AHRQ) that reviewed 18,753 citations, and found only 47 trials (about 0.25%) that included an active control treatment. The vast majority of studies available for inclusion had only a wait list or no-treatment control group and so exaggerated any estimate of the efficacy of mindfulness.

Although the US report was available to those preparing the UK Mindful Nation report, no mention is made of either the full contents of the report or a resulting publication in a peer-reviewed journal. Instead, the UK Mindful Nation report emphasized narrative and otherwise unsystematic reviews, and meta-analyses not adequately controlling for bias.

When the abridged version of the AHRQ report was published in JAMA: Internal Medicine, an accompanying commentary raised issues even more applicable to the Mindful Nation report:

The modest benefit found in the study by Goyal et al begs the question of why, in the absence of strong scientifically vetted evidence, meditation in particular and complementary measures in general have become so popular, especially among the influential and well educated…What role is being played by commercial interests? Are they taking advantage of the public’s anxieties to promote use of complementary measures that lack a base of scientific evidence? Do we need to require scientific evidence of efficacy and safety for these measures?

The members of the UK Mindfulness All-Party Parliamentary Group were selected for their positive attitude towards mindfulness. The collection of witnesses they called to hearings was saturated with advocates of mindfulness and those having professional and financial interests in arriving at a positive view. There is no transparency in terms of how studies or testimonials were selected, but the bias is notable. Many of the scientific studies were methodologically poor, if there was any methodology at all. Many were strongly stated, but weakly substantiated opinion pieces. Authors often included those having financial interests in obtaining positive results, but with no acknowledgment of conflict of interest. The glowing testimonials were accompanied by smiling photos and were unanimous in their praise of the transformative benefits of mindfulness.

As Mark B. Cope and David B. Allison concluded about obesity research, such a packing of the committee and a highly selective review of the literature leads to a “distortion of information in the service of what might be perceived to be righteous ends.” [I thank Tim Caulfield for calling this quote to my attention].

Mindfulness in the schools

The recommendations of Mindful Nation are

  1. The Department for Education (DfE) should designate, as a first step, three teaching schools116 to pioneer mindfulness teaching, co-ordinate and develop innovation, test models of replicability and scalability and disseminate best practice.
  2. Given the DfE’s interest in character and resilience (as demonstrated through the Character Education Grant programme and its Character Awards), we propose a comparable Challenge Fund of £1 million a year to which schools can bid for the costs of training teachers in mindfulness.
  3. The DfE and the Department of Health (DOH) should recommend that each school identifies a lead in schools and in local services to co-ordinate responses to wellbeing and mental health issues for children and young people117. Any joint training for these professional leads should include a basic training in mindfulness interventions.
  4. The DfE should work with voluntary organisations and private providers to fund a freely accessible, online programme aimed at supporting young people and those who work with them in developing basic mindfulness skills118.
Payoff of Mindful Nation to Oxford Mindfulness Centre will be huge.

Leading up to these recommendations, the report outlined an “alarming crisis” in the mental health of children and adolescents and proposed:

Given the scale of this mental health crisis, there is real urgency to innovate new approaches where there is good preliminary evidence. Mindfulness fits this criterion and we believe there is enough evidence of its potential benefits to warrant a significant scaling-up of its availability in schools.

Think of all the financial and professional opportunities that proponents of mindfulness involved in preparation of this report have garnered for themselves.

Mindfulness to promote executive functioning in children and adolescents

For the remainder of the blog post, I will focus on the two studies cited in support of the following statement:

What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.

The terms “executive control” and “emotional stability” were clarified:

Many argue that the most important prerequisites for child development are executive control (the management of cognitive processes such as memory, problem solving, reasoning and planning) and emotion regulation (the ability to understand and manage the emotions, including and especially impulse control). These main contributors to self-regulation underpin emotional wellbeing, effective learning and academic attainment. They also predict income, health and criminality in adulthood69. American psychologist, Daniel Goleman, is a prominent exponent of the research70 showing that these capabilities are the biggest single determinant of life outcomes. They contribute to the ability to cope with stress, to concentrate, and to use metacognition (thinking about thinking: a crucial skill for learning). They also support the cognitive flexibility required for effective decision-making and creativity.

Actually, Daniel Goleman is the former editor of the pop magazine Psychology Today and an author of numerous pop books.

The first cited paper.

73 Flook L, Smalley SL, Kitil MJ, Galla BM, Kaiser-Greenland S, Locke J, et al. Effects of mindful awareness practices on executive functions in elementary school children. Journal of Applied School Psychology. 2010;26(1):70-95.

Journal of Applied School Psychology is a Taylor & Francis journal, formerly known as Special Services in the Schools (1984 – 2002). Its Journal Impact Factor is 1.30.

One of the authors of the article, Susan Kaiser-Greenland, is a mindfulness entrepreneur, as seen in her website, which describes her as an author, public speaker, and educator on the subject of sharing secular mindfulness and meditation with children and families. Her books are The Mindful Child: How to Help Your Kid Manage Stress and Become Happier, Kinder, and More Compassionate; Mindful Games: Sharing Mindfulness and Meditation with Children, Teens, and Families; and the forthcoming The Mindful Games Deck: 50 Activities for Kids and Teens.

This article represents the main research available on Kaiser-Greenland’s Inner Kids program and figures prominently in her promotion of her products.

The sample consisted of 64 children assigned to either mindful awareness practices (MAPs; n = 32) or a control group consisting of a silent reading period (n = 32).

The MAPs training used in the current study is a curriculum developed by one of the authors (SKG). The program is modeled after classical mindfulness training for adults and uses secular and age appropriate exercises and games to promote (a) awareness of self through sensory awareness (auditory, kinesthetic, tactile, gustatory, visual), attentional regulation, and awareness of thoughts and feelings; (b) awareness of others (e.g., awareness of one’s own body placement in relation to other people and awareness of other people’s thoughts and feelings); and (c) awareness of the environment (e.g., awareness of relationships and connections between people, places, and things).

A majority of exercises involve interactions among students and between students and the instructor.

Outcomes.

The primary EF outcomes were the Metacognition Index (MI), Behavioral Regulation Index (BRI), and Global Executive Composite (GEC) as reported by teachers and parents.

Wikipedia presents the results of this study as:

The program was delivered for 30 minutes, twice per week, for 8 weeks. Teachers and parents completed questionnaires assessing children’s executive function immediately before and following the 8-week period. Multivariate analysis of covariance on teacher and parent reports of executive function (EF) indicated an interaction effect baseline EF score and group status on posttest EF. That is, children in the group that received mindful awareness training who were less well regulated showed greater improvement in EF compared with controls. Specifically, those children starting out with poor EF who went through the mindful awareness training showed gains in behavioral regulation, metacognition, and overall global executive control. These results indicate a stronger effect of mindful awareness training on children with executive function difficulties.

The finding that both teachers and parents reported changes suggests that improvements in children’s behavioral regulation generalized across settings. Future work is warranted using neurocognitive tasks of executive functions, behavioral observation, and multiple classroom samples to replicate and extend these preliminary findings.

What I discovered when I scrutinized the study.

This study is unblinded, with students and the teachers and parents providing the subjective ratings of the students all well aware of the group to which students were assigned. We are not given any correlations among or between their ratings, and so we don’t know whether there is just a global subjective factor (easy or difficult child, well-behaved or not) operating for either teachers or parents, or both.

It is unclear for what features of the mindfulness training the comparison reading group offers control or equivalence. The two groups differ in positive expectations and in attention and support, and these differences are likely to be reflected in the parent and teacher ratings. There’s a high likelihood that any differences in outcomes are nonspecific and not due to any active and distinct ingredient of mindfulness training. In any comparison with the students assigned to reading time, students assigned to mindfulness training have the benefit of any active ingredient it might have, as well as any nonspecific, placebo ingredients.

This is an exceedingly weak design, but one that dominates evaluations of mindfulness.

With only 32 students per group, note too that this is a seriously underpowered study. It has only about a 50% probability of detecting a moderate-sized effect if one is present. And because a larger effect size is needed to achieve statistical significance with such a small sample, any statistically significant effects will be large, even if they are unlikely to replicate in a larger sample. That is the paradox of low sample size we need to understand in these situations.
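
The arithmetic behind this paradox can be sketched with a normal approximation to a two-sided, two-group comparison at alpha = 0.05. The function names are illustrative, the effect size of 0.5 is the conventional benchmark for a "moderate" effect rather than a figure taken from the paper, and the approximation runs slightly higher than an exact t-test calculation would.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    d is the assumed true standardized mean difference (Cohen's d).
    Uses a normal approximation, not the exact noncentral t distribution.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def smallest_significant_d(n_per_group, alpha=0.05):
    """Smallest observed standardized difference that reaches p < alpha."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return z_crit * sqrt(2 / n_per_group)


if __name__ == "__main__":
    n = 32  # students per group in the study discussed above
    print(f"approximate power for d = 0.5: {approx_power(0.5, n):.2f}")
    print(f"smallest significant observed d: {smallest_significant_d(n):.2f}")
```

Under these assumptions, power for a moderate effect with 32 students per group comes out at roughly one half, and an observed difference has to be about half a standard deviation before it can reach significance at all. Any effect that does get reported as significant is therefore at least moderate in size, whatever the true effect is.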

Not surprisingly, there were no differences between the mindfulness and reading control groups on any outcome variable, whether rated by parents or teachers. Nonetheless, the authors rescued their claims for an effective intervention with:

However, as shown by the significance of interaction terms, baseline levels of EF (GEC reported by teachers) moderated improvement in posttest EF for those children in the MAPs group compared to children in the control group. That is, on the teacher BRIEF, children with poorer initial EF (higher scores on BRIEF) who went through MAPs training showed improved EF subsequent to the training (indicated by lower GEC scores at posttest) compared to controls.

Similar claims were made about parent ratings. But let’s look at figure 3 depicting post-test scores. These are from the teachers, but results for the parent ratings are essentially the same.

[Figure: teacher BRIEF quartiles]

Note the odd scaling of the X axis. The data are divided into four quartiles and then the middle half is collapsed so that there are three data points. I’m curious about what is being hidden. Even with the sleight-of-hand, it appears that scores for the intervention and control groups are identical except for the top quartile. It appears that just a couple of students in the control group are accounting for any appearance of a difference. But keep in mind that the upper quartile is only a matter of eight students in each group.

This scatter plot is further revealing:

[Figure: teacher BRIEF scatter plot]

It appears that the differences that are limited to the upper quartile are due to a couple of outlier control students. Without them, even the post-hoc differences that were found in the upper quartile between intervention and control groups would likely disappear.

Basically, what we are seeing is that most students do not show any benefit whatsoever from mindfulness training over being in a reading group. It’s not surprising that students who were not particularly elevated on the variables of interest do not register an effect. That’s a common ceiling effect in such universally delivered interventions in general population samples.

Essentially, if we focus on the designated outcome variables, we are wasting the students’ time as well as that of the staff. Think of what could be done if the same resources were applied in more effective ways. A couple of students in this study were outliers with low executive function. We don’t know how else they differ. Neither in the study nor in the validation of these measures is much attention given to their discriminant validity, i.e., what variables influence the ratings that shouldn’t. I suspect strongly that there are global, nonspecific aspects to both parent and teacher ratings, such that they are influenced by other aspects of these couple of students’ engagement with their classroom environment, and perhaps other environments.

I see little basis for the authors’ self-congratulatory conclusion:

The present findings suggest that mindfulness introduced in a general education setting is particularly beneficial for children with EF difficulties.

And

Introduction of these types of awareness practices in elementary education may prove to be a viable and cost-effective way to improve EF processes in general, and perhaps specifically in children with EF difficulties, and thus enhance young children’s socio-emotional, cognitive, and academic development.

Maybe the authors started with this conviction and it was unshaken by disappointing findings.

Or the statement made in Mindful Nation:

What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.

But we have another study that is cited for this statement.

74. Huppert FA, Johnson DM. A controlled trial of mindfulness training in schools: The importance of practice for an impact on wellbeing. The Journal of Positive Psychology. 2010; 5(4):264-274.

The first author, Felicia Huppert, is Founder and Director of the Well-being Institute and Emeritus Professor of Psychology at the University of Cambridge, as well as a member of the academic staff of the Institute for Positive Psychology and Education of the Australian Catholic University.

This study involved 173 14- and 15-year-old boys from a private Catholic school.

The Journal of Positive Psychology is not known for its high methodological standards. A look at its editorial board suggests a high likelihood that manuscripts submitted will be reviewed by sympathetic reviewers publishing their own methodologically flawed studies, often with results in support of undeclared conflicts of interest.

The mindfulness training was based on the program developed by Kabat-Zinn and colleagues at the University of Massachusetts Medical School (Kabat-Zinn, 2003). It comprised four 40 minute classes, one per week, which presented the principles and practice of mindfulness meditation. The mindfulness classes covered the concepts of awareness and acceptance, and the mindfulness practices included bodily awareness of contact points, mindfulness of breathing and finding an anchor point, awareness of sounds, understanding the transient nature of thoughts, and walking meditation. The mindfulness practices were built up progressively, with a new element being introduced each week. In some classes, a video clip was shown to highlight the practical value of mindful awareness (e.g. “The Last Samurai”, “Losing It”). Students in the mindfulness condition were also provided with a specially designed CD, containing three 8-minute audio files of mindfulness exercises to be used outside the classroom. These audio files reflected the progressive aspects of training which the students were receiving in class. Students were encouraged to undertake daily practice by listening to the appropriate audio files. During the 4-week training period, students in the control classes attended their normal religious studies lessons.

A total of 155 participants had complete data at baseline and 134 at follow-up (78 in the mindfulness and 56 in the control condition). Any student who had missing data at either time point was simply dropped from the analysis. The effects of this statistical decision are difficult to track in the paper. Regardless, there was no difference between the intervention and control groups on any of a host of outcome variables, with none designated as the primary outcome.
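
The scale of the loss can be put in rough numbers. The sketch below assumes that the 173 boys described as involved in the study is the number who started; the variable and function names are illustrative. Starting numbers per condition are not given here, so attrition cannot be broken down by group.

```python
def lost_fraction(started, analysed):
    """Fraction of starting participants missing from the final analysis."""
    return (started - analysed) / started


started = 173            # boys involved in the study (assumed to be all starters)
baseline_complete = 155  # complete data at baseline
analysed = 134           # complete data at both time points
mindfulness_n, control_n = 78, 56

if __name__ == "__main__":
    print(f"missing at baseline: {lost_fraction(started, baseline_complete):.1%}")
    print(f"missing from final analysis: {lost_fraction(started, analysed):.1%}")
    print(f"mindfulness share of analysed sample: {mindfulness_n / analysed:.1%}")
```

On that assumption, more than a fifth of the boys are absent from the analysis, and the analysed sample is noticeably unbalanced between conditions.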

Actual practicing of mindfulness by students was inconsistent.

One third of the group (33%) practised at least three times a week, 34.8% practised more than once but less than three times a week, and 32.7% practised once a week or less (of whom 7 respondents, 8.4%, reported no practice at all). Only two students reported practicing daily. The practice variable ranged from 0 to 28 (number of days of practice over four weeks). The practice variable was found to be highly skewed, with 79% of the sample obtaining a score of 14 or less (skewness = 0.68, standard error of skewness = 0.25).

The authors rescue their claim of a significant effect for the mindfulness intervention with highly complex multivariate analyses with multiple control variables, in which within-group outcomes for students assigned to mindfulness were related to the extent to which students actually practiced mindfulness. Even without controlling for the numerous (and post-hoc) multiple comparisons, results were still largely nonsignificant.
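
Why uncorrected multiple comparisons matter can be shown with a short calculation. The number of tests run in the paper is not stated here, so the counts of 10 and 20 below are purely hypothetical, and the formula assumes the tests are independent, which correlated outcome measures would not be.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    for n_tests in (1, 10, 20):  # hypothetical numbers of comparisons
        fwer = familywise_error_rate(n_tests)
        threshold = bonferroni_threshold(n_tests)
        print(f"{n_tests:>2} tests: familywise error {fwer:.2f}, "
              f"Bonferroni threshold {threshold:.4f}")
```

With ten independent tests at the usual 0.05 level, the chance of at least one spurious "significant" result is already about 40%, so an isolated significant finding among many comparisons is weak evidence.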

One simple conclusion that can be drawn is that despite a lot of encouragement, there was little actual practice of mindfulness by the relatively well-off students in a relatively highly resourced school setting. We could hardly expect results to improve with wider dissemination to schools with fewer resources and less privileged students.

The authors conclude:

The main finding of this study was a significant improvement on measures of mindfulness and psychological well-being related to the degree of individual practice undertaken outside the classroom.

Recall that Mindful Nation cited the study in the following context:

What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.

These are two methodologically weak studies with largely null findings. They are hardly the basis for launching a national policy implementing universal mindfulness in the schools.

As noted in the US AHRQ report, despite a huge number of studies of mindfulness having been conducted, few involved a test with an adequate control group, and so there’s little evidence that mindfulness has any advantage over any active treatment. Neither of these studies disturbed that conclusion, although they are spun both in the original studies and in the Mindful Nation report to be positive. Both papers were published in journals where the reviewers were likely to be overly sympathetic and not at all attentive to serious methodological and statistical problems.

The committee writing Mindful Nation arrived at conclusions consistent with their prior enthusiasm for mindfulness and their vested interest in it. They sorted through evidence to find what supported their pre-existing assumptions.

Like UK resilience programs, the recommendations of Mindful Nation put considerable resources into the delivery of services to a large population that is unlikely to have the threshold of need to register a socially or clinically significant effect. On a population level, results of the implementation are doomed to fall short of its claims. The many fewer students in need of more timely, intensive, and tailored services are left underserved. Their presence is ignored or, worse, invoked to justify the delivery of services to the larger group, with the needy students not benefiting.

In this blog post, I mainly focused on two methodologically poor studies. But for the selection of these particular studies, I depended on the search of the authors of Mindful Nation and the emphasis that was given to these two studies for some sweeping claims in the report. I will continue to write about the recommendations of Mindful Nation. I welcome reader feedback, particularly from readers whose enthusiasm for mindfulness is offended. But I urge them not simply to go to Google and cherry pick an isolated study and ask me to refute its claims.

Rather, we need to pay attention to the larger literature concerning mindfulness, its serious methodological problems, and the sociopolitical forces and vested interests that preserve a strong confirmation bias, both in the “scientific” literature and its echoing in documents like Mindful Nation.