“ACT: The best thing [for pain] since sliced bread or the Emperor’s new clothes?”

Reflections on the debate with David Gillanders about Acceptance and Commitment Therapy at the British Pain Society, Glasgow, September 15, 2017



David Gillanders and I held our debate “ACT: best thing since sliced bread or the Emperor’s new clothes?” at the British Pain Society meeting on Thursday, September 15, 2017 in Glasgow. We will eventually make our slides and a digital recording of the debate available.

I enjoyed hanging out with David Gillanders. He is a great guy who talks the talk, but also walks the walk. He lives ACT as a life philosophy. He was an ACT trainer speaking before a sympathetic audience, many of whom had been trained by him.

Some reflections from a few days later.

I was surprised how much Acceptance and Commitment Therapy (along with #mindfulness) has taken over UK pain services. A pre-debate poll showed most of the audience came convinced that, indeed, ACT was the best thing since sliced bread.

I was confident that my skepticism was firmly rooted in the evidence. I don’t think there is debate about that. David Gillanders agreed that higher quality studies were needed.

But in the end, even if I did not convert many, I came away quite pleased with the debate.

Standards for evaluating the evidence for ACT for pain

 I recently wrote that ACT may have moved into a post-evidence phase, with its chief proponents switching from citing evidence to making claims about love, suffering, and the meaning of life. Seriously.

Steve Hayes prompted me on Twitter to take a closer look at the most recent evidence for ACT. As reported in an earlier blog, I took a close look. I was not impressed: proponents of ACT are not making much progress in developing evidence anywhere near as strong as their claims. We need a lot less ACT research that adds no quality evidence despite being promoted enthusiastically as if it does. We need more sobriety from the promoters of ACT, particularly those in academia, like Steve Hayes and Kelly Wilson, who know something about how to evaluate evidence. They should not patronize workshop goers with fanciful claims.

David Gillanders talked a lot about the philosophy and values that are expressed in ACT, but he also made claims about its research base, echoing the claims made by Steve Hayes and other prominent ACT promoters.

Standards for evaluating research exist independent of any discussion of ACT

There are standards for interpreting clinical trials and integrating their results in meta-analyses that exist independent of the ACT literature. It is not a good idea to challenge these standards in the context of defending ACT against unfavorable evaluations, although that is exactly how Hayes and his colleagues often respond. I will get around to blogging about the most recent example of this:

Atkins PW, Ciarrochi J, Gaudiano BA, Bricker JB, Donald J, Rovner G, Smout M, Livheim F, Lundgren T, Hayes SC. Departing from the essential features of a high quality systematic review of psychotherapy: A response to Öst (2014) and recommendations for improvement. Behaviour Research and Therapy. 2017 May 29.

Within-group (pre-post) differences in outcome. David Gillanders echoed Hayes in using within-group effect sizes to describe the effectiveness of ACT. Results presented this way may look impressive, but they are exaggerated compared to results obtained between groups. I am not making that up. Changes within the group of patients who received ACT reflect the specific effects of ACT plus whatever nonspecific factors were operating. That is why we need an appropriate comparison-control group to examine between-group differences, which are always more modest than the within-group effects alone.
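To make the arithmetic concrete, here is a minimal simulation sketch in Python. The numbers are invented for illustration (not taken from any ACT trial): both arms improve substantially through nonspecific factors, the treatment adds only a small increment, and the within-group effect size comes out several times larger than the between-group one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # hypothetical patients per arm

# Invented numbers: both arms improve by about 0.8 SD through nonspecific
# factors (attention, expectations, regression to the mean); the treatment
# adds only about 0.2 SD on top of that.
pre_act   = rng.normal(0.0, 1.0, n)
post_act  = rng.normal(-1.0, 1.0, n)   # lower score = less pain
pre_ctrl  = rng.normal(0.0, 1.0, n)
post_ctrl = rng.normal(-0.8, 1.0, n)

def cohens_d(a, b):
    """Standardized mean difference using a pooled SD."""
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

print("Within-group d, treated arm (pre minus post):", round(cohens_d(pre_act, post_act), 2))
print("Between-group d at follow-up (control minus treated):", round(cohens_d(post_ctrl, post_act), 2))
# Typical output: within-group d near 1.0, between-group d near 0.2.
```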

Compared to what? Most randomized trials of ACT involve a wait list, no-treatment, or ill-described standard care (which often amounts to no treatment). Such comparisons are methodologically weak, especially when patients and providers know what is going on – what is called an unblinded trial – and when outcomes are subjective self-report measures.

A clever study in the New England Journal of Medicine showed that with such subjective self-report measures, one cannot distinguish between a proven effective inhaled medication for asthma, an inert substance simply inhaled, and sham acupuncture. In contrast, objective measures of breathing clearly distinguish the medication from the comparison-control conditions.

So, it is not an exaggeration to say that most evaluations of ACT are conducted under circumstances in which even sham acupuncture or homeopathy would look effective.

Not superior to other treatments. There are no trials comparing ACT to a credible active treatment in which ACT proves superior, either for pain or other clinical problems. So, we are left saying ACT is better than doing nothing, at least in trials where any nonspecific effects are concentrated among the patients receiving ACT.

Rampant investigator bias. A lot of trials of ACT are conducted by researchers having an investment in showing that ACT is effective. That is a conflict of interest. Sometimes it is called investigator allegiance, or a promoter or originator bias.

Regardless, when drugs are being evaluated in a clinical trial, it is recognized that there will be a bias toward the drug favored by the manufacturer conducting the trial. It is increasingly recognized that meta-analyses conducted by promoters should also be viewed with extra skepticism, and that trials conducted by researchers having such conflicts of interest should be considered separately to see if they produced exaggerated results.

ACT desperately needs randomized trials conducted by researchers who don’t have a dog in the fight, who lack the motivation to torture findings to give positive results when they are simply not present. There’s a strong confirmation bias in current ACT trials, with promoter/researchers embarrassing themselves in their maneuvers to show strong, positive effects when only weak or null findings are available. I have documented [1, 2] how this trend started with Steve Hayes dropping two patients from his study with Patricia Bach of the effects of brief ACT on re-hospitalization of inpatients. One patient had died by suicide and another was in jail; they couldn’t be rehospitalized, and so they were dropped from the analyses. The deed could only be noticed by comparing the published paper with Patricia Bach’s dissertation. It allowed an otherwise nonsignificant finding in a small trial to become significant.

Trials that are too small to matter. A lot of ACT trials have too few patients to produce a reliable, generalizable effect size. Lots of us in situations far removed from ACT trials have shown justification for the rule of thumb that we should dismiss effect sizes from trials having fewer than 35 patients per treatment or comparison cell. Even this standard is quite liberal. Even if a moderate effect would be significant in a larger trial, there is less than a 50% probability it would be detected in a trial this small. To be significant with such a small sample size, differences between treatments have to be large, and they are probably due either to chance or to something dodgy that the investigators did.

Many claims for the effectiveness of ACT for particular clinical problems come from trials too small to generate a reliable effect size. I invite readers to undertake the simple exercise of looking at the sample sizes in any study cited as support for the effectiveness of ACT. If you exclude such small studies, there is not much research left to talk about.
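The arithmetic behind this rule of thumb is easy to check with a standard power calculation. Here is a minimal sketch, assuming a two-arm trial, a conventional “moderate” effect of d = 0.5, and a two-sided test at p < .05, using the statsmodels library:

```python
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()
for n_per_arm in (20, 30, 35, 64):
    # Probability of detecting a true moderate effect (d = 0.5) with a
    # two-sided independent-samples t-test at alpha = .05.
    power = power_calc.power(effect_size=0.5, nobs1=n_per_arm, alpha=0.05, ratio=1.0)
    print(f"n = {n_per_arm:3d} per arm -> power to detect d = 0.5: {power:.2f}")
# Roughly: n=20 -> 0.34, n=30 -> 0.47, n=35 -> 0.54, n=64 -> 0.80
```

With 30 per arm, the chance of detecting a genuinely moderate effect is under 50%; only at around 64 per arm does power reach the conventional 80%.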

Too much flexibility in what researchers report in publications. Many trials of ACT involve researchers administering a whole battery of outcome measures and then emphasizing those that make ACT look best, while downplaying or never mentioning the rest. Similarly, many trials of ACT deemphasize whether the time x treatment interaction is significant and simply ignore it if it is not, focusing instead on the within-group differences. I know, we’re getting a bit technical here. But this is another way of saying that many trials of ACT give researchers too much latitude in choosing what variables to report and what statistics are used to evaluate them.

Under similar circumstances, researchers showed that listening to the Beatles song “When I’m 64” left undergraduates 18 months younger than when they listened to the song “Kalimba.” Of course, the researchers knew damn well that the Beatles song didn’t have this effect, but they indicated they were doing what lots of investigators do to get significant results, what they called p-hacking.

Many randomized trials of ACT are conducted with the same researcher flexibility that would allow a demonstration that listening to a Beatles song drops the age of undergraduates by 18 months.
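As a rough illustration of how far such flexibility alone can take you, here is a small simulation sketch (purely hypothetical, not modeled on any particular ACT trial): the treatment does nothing at all, yet reporting only the most favorable of ten outcome measures yields a “significant” finding far more often than the nominal 5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_per_arm, n_outcomes, n_trials = 30, 10, 2000

lucky_trials = 0
for _ in range(n_trials):
    # The treatment is truly inert: both arms are drawn from the same
    # distribution on every one of the 10 (independent) outcome measures.
    treat = rng.normal(0, 1, (n_per_arm, n_outcomes))
    control = rng.normal(0, 1, (n_per_arm, n_outcomes))
    p_values = [stats.ttest_ind(treat[:, j], control[:, j]).pvalue
                for j in range(n_outcomes)]
    if min(p_values) < 0.05:       # report only the most favorable outcome
        lucky_trials += 1

print("Share of null trials with something 'significant' to report:",
      round(lucky_trials / n_trials, 2))   # roughly 0.40, not 0.05
```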

Many of the problems with ACT research could be avoided if researchers were required to publish ahead of time their primary outcome variables and plans for analyzing them. Such preregistration is increasingly recognized as best research practice, including by NIMH. There is no excuse not to do it.

My take away message?

ACT gurus have been able to dodge the need to develop quality data to support their claims that their treatment is effective (and their occasional claim that it is more effective than other approaches). A number of them are university-based academics and have ample resources to develop better quality evidence.

Workshop and weekend retreat attendees are convinced that ACT works on the strength of experiential learning and a lot of theoretical mumbo jumbo.

But the ACT promoters also make a lot of dodgy claims that there is strong evidence that the specific ingredients of ACT, its techniques and values, account for its power. Some of the ACT gurus, Steve Hayes and Kelly Wilson at least, are academics and should limit their claims of being “evidence-based” to what is supported by strong, quality evidence. They don’t. I think they are being irresponsible in throwing “evidence-based” in with all the rest of the hype.

What should I do as an evidence-based skeptic wanting to improve the conversation about ACT?

Earlier in my career, I spent six years in live supervision with some world-renowned therapists behind the one-way mirror, including John Weakland, Paul Watzlawick, and Dick Fisch. I gave workshops worldwide on how to do brief strategic therapies with individuals, couples, and families. I chose not to continue because (1) I didn’t like the pressure for drama and exciting interventions when I interviewed patients in front of large groups; (2) even when there was a logic and appearance of effectiveness to what I did, I didn’t believe it could be manualized; and (3) my group didn’t have the resources to conduct proper outcome studies.

But I get it that workshop attendees like drama, exciting interventions, and emotional experiences. They go to trainings expecting to be entertained as much as informed. I don’t think I can change that.

Many therapists have not had the training to evaluate claims about research, even if they accept that being backed by research findings is important. They depend on presenters to tell them about research and tend to trust what they say. Even therapists who know something about research tend to suspend critical judgment when caught up in the emotionality provided by some training experiences. Experiential learning can be powerful, even when it is used to promote interventions that are not supported by evidence.

I can’t change the training of therapists nor the culture of workshops and training experiences. But I can reach out to therapists who want to develop skills to evaluate research for themselves. I think some of the things that I point out in this blog post are quite teachable as things to look for.

I hope I can connect with therapists who want to become citizen scientists who are skeptical about what they hear and want to become equipped to think for themselves and look for effective resources when they don’t know how to interpret claims.

This is certainly not all therapists and may only be a minority. But such opinion leaders can be champions for the others in facilitating intelligent discussions of research concerning the effectiveness of psychotherapies. And they can prepare their colleagues to appreciate that most change in psychotherapy is not as dramatic or immediate as seen in therapy workshops.

I will soon be offering e-books providing skeptical looks at positive psychology and mindfulness, as well as scientific writing courses on the web, as I have been doing face-to-face for almost a decade.

Sign up at my website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. Lots to see at CoyneoftheRealm.com.

 

Creating illusions of wondrous effects of yoga and meditation on health: A skeptic exposes tricks

The tour of the sausage factory is starting; here’s your brochure telling you what you’ll see.

 

A recent review has received a lot of attention, with it being used to claim that mind-body interventions have distinct molecular signatures that point to potentially dramatic health benefits for those who take up these practices.

What Is the Molecular Signature of Mind–Body Interventions? A Systematic Review of Gene Expression Changes Induced by Meditation and Related Practices.  Frontiers in Immunology. 2017;8.

Few who are tweeting about this review or its press coverage are likely to have read it or to understand it, if they read it. Most of the new agey coverage in social media does nothing more than echo or amplify the message of the review’s press release.  Lazy journalists and bloggers can simply pass on direct quotes from the lead author or even just the press release’s title, ‘Meditation and yoga can ‘reverse’ DNA reactions which cause stress, new study suggests’:

“These activities are leaving what we call a molecular signature in our cells, which reverses the effect that stress or anxiety would have on the body by changing how our genes are expressed.”

And

“Millions of people around the world already enjoy the health benefits of mind-body interventions like yoga or meditation, but what they perhaps don’t realise is that these benefits begin at a molecular level and can change the way our genetic code goes about its business.”

[The authors of this review actually identified some serious shortcomings to the studies they reviewed. I’ll be getting to some excellent points at the end of this post that run quite counter to the hype. But the lead author’s press release emphasized unwarranted positive conclusions about the health benefits of these practices. That is what is most popular in media coverage, especially from those who have stuff to sell.]

Interpretation of the press release and review authors’ claims requires going back to the original studies, which most enthusiasts are unlikely to do. If readers do go back, they will have trouble interpreting some of the deceptive claims that are made.

Yet, a lot is at stake. This review is being used to recommend mind-body interventions for people having or who are at risk of serious health problems. In particular, unfounded claims that yoga and mindfulness can increase the survival of cancer patients are sometimes hinted at, but occasionally made outright.

This blog post is written with the intent of protecting consumers from such false claims and providing tools so they can spot pseudoscience for themselves.

Discussion in the media of the review speaks broadly of alternative and complementary interventions. The coverage is aimed at inspiring confidence in this broad range of treatments and encouraging people who are facing health crises to invest time and money in outright quackery. Seemingly benign recommendations for yoga, tai chi, and mindfulness (after all, what’s the harm?) often become the entry point to more dubious and expensive treatments that substitute for established treatments. Once they are drawn to centers for integrative health care for classes, cancer patients are likely to spend hundreds or even thousands on other products and services that are unlikely to benefit them. One study reported:

More than 72 oral or topical, nutritional, botanical, fungal and bacterial-based medicines were prescribed to the cohort during their first year of IO care…Costs ranged from $1594/year for early-stage breast cancer to $6200/year for stage 4 breast cancer patients. Of the total amount billed for IO care for 1 year for breast cancer patients, 21% was out-of-pocket.

Coming up, I will take a skeptical look at the six randomized trials that were highlighted by this review.  But in this post, I will provide you with some tools and insights so that you do not have to make such an effort in order to make an informed decision.

Like many of the other studies cited in the review, these randomized trials were quite small and underpowered. But I will focus on the six because they are as good as it gets. Randomized trials are considered a higher form of evidence than simple observational studies or case reports. [It is too bad the authors of the review don’t even highlight which studies are randomized trials. They are lumped with the others as “longitudinal studies.”]

As a group, the six studies do not actually add any credibility to the claims that mind-body interventions – specifically yoga, tai chi, and mindfulness training or retreats – improve health by altering DNA. We can be no more confident with what the trials provide than we would be without them ever having been done.

I found the task of probing and interpreting the studies quite labor-intensive and ultimately unrewarding.

I had to get past poor reporting of what was actually done in the trials, to which patients, and with what results. My task often involved seeing through cover-ups, with authors exercising considerable flexibility in reporting what measures they actually collected and what analyses were attempted, before arriving at the best possible tale of the wondrous effects of these interventions.

Interpreting clinical trials should not be so hard, because they should be honestly and transparently reported, with a registered protocol that the investigators stick to. These reports of trials were sorely lacking. The full extent of the problems took some digging to uncover, but some things emerged before I even got to the methods and results.

The introductions of these studies consistently exaggerated the strength of existing evidence for the effects of these interventions on health, even while somehow coming to the conclusion that this particular study was urgently needed and might even be the “first ever.” The introductions to the six papers typically cross-referenced each other, without giving any indication of how poor the quality of the evidence from the other papers was. What a mutual admiration society these authors are.

One giveaway is how the introductions referred to the biggest, most badass, comprehensive, and well-done review, that of Goyal and colleagues.

That review clearly states that the evidence for the effects of mindfulness is poor quality because of the lack of comparisons with credible active treatments. The typical randomized trial of mindfulness involves a comparison with no-treatment, a waiting list, or patients remaining in routine care where the target problem is likely to be ignored. If we depend on the bulk of the existing literature, we cannot rule out the likelihood that any apparent benefits of mindfulness are due to having more positive expectations, attention, and support rather than simply getting nothing. Only a handful of hundreds of trials of mindfulness include appropriate, active treatment comparison/control groups. The results of those studies are not encouraging.

One of the first things I do in probing the introduction of a study claiming health benefits for mindfulness is see how they deal with the Goyal et al review. Did the study cite it, and if so, how accurately? How did the authors deal with its message, which undermines claims of the uniqueness or specificity of any benefits to practicing mindfulness?

For yoga, we cannot yet rule out that it is no better than regular exercising – in groups or alone – with relaxing routines. The literature concerning tai chi is even smaller and of poorer quality, but there is the same need to show that practicing tai chi has any benefits over exercising in groups with comparable positive expectations and support.

Even more than mindfulness, yoga and tai chi attract a lot of pseudoscientific mumbo jumbo about integrating Eastern wisdom and Western science. We need to look past that and insist on evidence.

Like their introductions, the discussion sections of these articles are quite prone to exaggerating how strong and consistent the evidence from existing studies is. The discussion sections cherry-pick positive findings in the existing literature, sometimes recklessly distorting them. The authors then discuss how their own positively spun findings fit with what is already known, while minimizing or outright neglecting discussion of any of their negative findings. I was not surprised to see one trial of mindfulness for cancer patients obtain no effects on depressive symptoms or perceived stress, but then go on to explain how mindfulness might powerfully affect the expression of DNA.

If you want to dig into the details of these studies, the going can get rough and the yield for doing a lot of mental labor is low. For instance, these studies involved drawing blood and analyzing gene expression. Readers will inevitably encounter passages like:

In response to KKM treatment, 68 genes were found to be differentially expressed (19 up-regulated, 49 down-regulated) after adjusting for potentially confounded differences in sex, illness burden, and BMI. Up-regulated genes included immunoglobulin-related transcripts. Down-regulated transcripts included pro-inflammatory cytokines and activation-related immediate-early genes. Transcript origin analyses identified plasmacytoid dendritic cells and B lymphocytes as the primary cellular context of these transcriptional alterations (both p < .001). Promoter-based bioinformatic analysis implicated reduced NF-κB signaling and increased activity of IRF1 in structuring those effects (both p < .05).

Intimidated? Before you defer to the “experts” doing these studies, I will show you some things I noticed in the six studies and how you can debunk the relevance of these studies for promoting health and dealing with illness. Actually, I will show that even if these 6 studies got the results that the authors claimed – and they did not – at best, the effects would be trivial and lost among the other things going on in patients’ lives.

Fortunately, there are lots of signs that you can dismiss such studies and go on to something more useful, if you know what to look for.

Some general rules:

  1. Don’t accept claims of efficacy/effectiveness based on underpowered randomized trials. Dismiss them. A reliable rule of thumb is to dismiss trials that have fewer than 35 patients in the smallest group. Over half the time, true moderate-sized effects will be missed in such studies, even if they are actually there.

Due to publication bias, most of the positive effects that are published from trials of this size will be false positives and won’t hold up in well-designed, larger trials.

When significant positive effects from such trials are reported in published papers, they have to be large to have reached significance. If not outright false, these effect sizes won’t be matched in larger trials. So, significant, positive effect sizes from small trials are likely to be false positives, exaggerated, and unlikely to replicate. For that reason, we can consider small studies to be pilot or feasibility studies, but not as providing estimates of how large an effect size we should expect from a larger study. Investigators do it all the time, but they should not: they do power calculations estimating how many patients they need for a larger trial from the results of such small studies. No, no, no!
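Here is a minimal simulation sketch of that inflation (the true effect size, sample size, and number of simulated trials are invented for illustration): when a modest true effect is studied with small samples, the few trials that happen to cross p < .05 report effect sizes far larger than the truth.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_d, n_per_arm, n_sims = 0.3, 20, 5000   # invented: a modest true effect

significant_ds = []
for _ in range(n_sims):
    treat = rng.normal(true_d, 1.0, n_per_arm)
    control = rng.normal(0.0, 1.0, n_per_arm)
    _, p = stats.ttest_ind(treat, control)
    if p < 0.05:
        # Record the observed effect size only when the trial "worked".
        pooled = np.sqrt((treat.var(ddof=1) + control.var(ddof=1)) / 2)
        significant_ds.append((treat.mean() - control.mean()) / pooled)

print("True effect size:", true_d)
print("Share of these small trials reaching p < .05:",
      round(len(significant_ds) / n_sims, 2))
print("Mean effect size among the 'significant' ones:",
      round(float(np.mean(significant_ds)), 2))   # roughly 0.8, far above 0.3
```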

Having spent decades examining clinical trials, I am generally comfortable dismissing effect sizes that come from trials with fewer than 35 patients in the smaller group. I agree with the suggestion that if two larger trials are available in a given literature, go with those and ignore the smaller studies. If there are not at least two larger studies, keep the jury out on whether there is a significant effect.

Applying the Rule of 35, 5 of the 6 trials can be dismissed and the sixth is ambiguous because of loss of patients to follow up.  If promoters of mind-body interventions want to convince us that they have beneficial effects on physical health by conducting trials like these, they have to do better. None of the individual trials should increase our confidence in their claims. Collectively, the trials collapse in a mess without providing a single credible estimate of effect size. This attests to the poor quality of evidence and disrespect for methodology that characterizes this literature.

  2. Don’t be taken in by titles of peer-reviewed articles that are themselves an announcement that these interventions work. Titles may not be telling the truth.

What I found extraordinary is that five of the six randomized trials had a title indicating that a positive effect was found. I suspect that most people encountering the title will not actually go on to read the study. So, they will be left with the false impression that positive results were indeed obtained. It’s quite a clever trick to make the title of an article, by which most people will remember it, into a false advertisement for what was actually found.

For a start, we can simply remind ourselves that with these underpowered studies, investigators should not even be making claims about efficacy/effectiveness. So, one trick of the developing skeptic is to confirm that the claims being made in the title don’t fit with the size of the study. However, actually going to the results section, one can find other evidence of discrepancies between what was found and what is being claimed.

I think it’s a general rule of thumb that we should be careful of titles for reports of randomized trials that declare results. Even when what is claimed in the title fits with the actual results, it often creates the illusion of a greater consistency with what already exists in the literature. Furthermore, even when future studies inevitably fail to replicate what is claimed in the title, the false claim lives on, because failing to replicate key findings is almost never a condition for retracting a paper.

  3. Check the institutional affiliations of the authors. These 6 trials serve as a depressing reminder that we can’t rely on researchers’ institutional affiliations or their having federal grants to reassure us of the validity of their claims. These authors are not from Quack-Quack University, and they get funding for their research.

In all cases, the investigators had excellent university affiliations, mostly in California. Most studies were conducted with some form of funding, often federal grants. A quick check of Google would reveal that at least one of the authors on a study, usually more, had federal funding.

  4. Check the conflicts of interest, but don’t expect the declarations to be informative, and be skeptical of what you find. It is disappointing that a check of the conflict of interest statements for these articles would be unlikely to arouse the suspicion that the results that were claimed might have been influenced by financial interests. One cannot readily see that the studies were generally done in settings promoting alternative, unproven treatments that would benefit from the publicity generated by the studies. One cannot see that some of the authors have lucrative book contracts and speaking tours that require making claims for dramatic effects of mind-body treatments that could not possibly be supported by transparent reporting of the results of these studies. As we will see, one of the studies was actually conducted in collaboration with Deepak Chopra and with money from his institution. That would definitely raise flags in the skeptic community. But the dubious tie might be missed by patients and their families vulnerable to unwarranted claims and unrealistic expectations of what can be obtained outside of conventional medicine, like chemotherapy, surgery, and pharmaceuticals.

Based on what I found probing these six trials, I can suggest some further rules of thumb. (1) Don’t assume for articles about health effects of alternative treatments that all relevant conflicts of interest are disclosed. Check the setting in which the study was conducted and whether an integrative [complementary and alternative, meaning mostly unproven] care setting was used for recruiting or running the trial. Not only would this represent potential bias on the part of the authors, it would represent selection bias in recruitment of patients and in their responsiveness to placebo effects consistent with the marketing themes of these settings. (2) Google authors and see if they have lucrative pop psychology book contracts, TED talks, or speaking gigs at positive psychology or complementary and alternative medicine gatherings. None of these lucrative activities are typically expected to be disclosed as conflicts of interest, but all require making strong claims that are not supported by available data. Such rewards are perverse incentives for authors to distort and exaggerate positive findings and to suppress negative findings in peer-reviewed reports of clinical trials. (3) Check and see if known quacks have prepared recruitment videos for the study, informing patients what will be found (seriously, I was tipped off to look, and I found exactly that).

  5. Look for the usual suspects. A surprisingly small, tight, interconnected group is generating this research. You can look the authors up on Google or Google Scholar, or browse through my previous blog posts and see what I have said about them. As I will point out in my next blog, one got withering criticism for her claim that drinking carbonated sodas, but not sweetened fruit drinks, shortened your telomeres, so that drinking soda was worse than smoking. My colleagues and I re-analyzed the data of another of the authors. We found that, contrary to what he claimed, pursuing meaning rather than pleasure in your life did not affect gene expression related to immune function. We also showed that substituting randomly generated data worked as well as what he got from blood samples in replicating his original results. I don’t think it is ad hominem to point out these authors’ history of making implausible claims. It speaks to source credibility.
  6. Check and see if there is a trial registration for a study, but don’t stop there. You can quickly check with PubMed whether a report of a randomized trial is registered. Trial registration is intended to ensure that investigators commit themselves ahead of time to a primary outcome or maybe two, so you can check whether that is what they emphasized in their paper. You can then check to see if what is said in the report of the trial fits with what was promised in the protocol. Unfortunately, I could find that only one of these trials was registered. The trial registration was vague on what outcome variables would be assessed and did not mention the outcome emphasized in the published paper (!). The registration also said the sample would be larger than what was reported in the published study. When researchers have difficulty in recruitment, their study is often compromised in other ways. I’ll show how this study was compromised.

Well, it looks like applying these generally useful rules of thumb is not always so easy with these studies. I think the small sample size across all of the studies would be enough to decide this research has yet to yield meaningful results and certainly does not support the claims that are being made.

But readers who are motivated to put in the time to probe deeper will come up with strong signs of p-hacking and questionable research practices.

  7. Check the report of the randomized trial and see if you can find any declaration of one or two primary outcomes and a limited number of secondary outcomes. What you will find instead is that the studies always have more outcome variables than patients receiving these interventions. The opportunities for cherry-picking positive findings and discarding the rest are huge, especially because it is so hard to assess what data were collected but not reported.
  8. Check and see if you can find tables of unadjusted primary and secondary outcomes. Honest and transparent reporting involves giving readers a look at simple statistics so they can decide whether results are meaningful. For instance, if effects on stress and depressive symptoms are claimed, are the results impressive and clinically relevant? In almost all cases, there is no peeking allowed. Instead, authors provide analyses and statistics with lots of adjustments made. They break lots of rules in doing so, especially with such small samples. These authors are virtually assured of getting results to crow about (see the quick calculation below).
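A back-of-the-envelope calculation shows why a long battery of outcomes, analyzed with this much latitude, virtually guarantees something to crow about: with k independent outcomes each tested at alpha = .05 and an intervention that does nothing, the chance that at least one comes out “significant” is 1 - 0.95^k. A minimal sketch (the outcome counts are arbitrary):

```python
# Probability of at least one nominally 'significant' result when the
# intervention truly does nothing, as a function of how many independent
# outcomes are each tested at alpha = .05.
alpha = 0.05
for k in (1, 5, 10, 20, 40):
    print(f"{k:2d} outcomes -> P(at least one p < .05) = {1 - (1 - alpha) ** k:.2f}")
# 1 -> 0.05, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64, 40 -> 0.87
```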

Famously, Joe Simmons and Leif Nelson hilariously published claims that briefly listening to the Beatles’ “When I’m 64” left students a year and a half younger than if they were assigned to listening to “Kalimba.” Simmons and Nelson knew this was nonsense, but their intent was to show what researchers can do if they have free rein over how they analyze their data and what they report. They revealed the tricks they used, but those tricks were minor league and amateurish compared to what the authors of these trials consistently did in claiming that yoga, tai chi, and mindfulness modified expression of DNA.

Stay tuned for my next blog post, where I go through the six studies. But consider this if you or a loved one have to make an immediate decision about whether to plunge into the world of woo-woo unproven medicine in hopes of altering DNA expression: I will show that the authors of these studies did not get the results they claimed. But who should care if they did? The effects were laughably trivial. As the authors of the review about which I have been complaining noted:

One other problem to consider are the various environmental and lifestyle factors that may change gene expression in similar ways to MBIs [Mind-Body Interventions]. For example, similar differences can be observed when analyzing gene expression from peripheral blood mononuclear cells (PBMCs) after exercise. Although at first there is an increase in the expression of pro-inflammatory genes due to regeneration of muscles after exercise, the long-term effects show a decrease in the expression of pro-inflammatory genes (55). In fact, 44% of interventions in this systematic review included a physical component, thus making it very difficult, if not impossible, to discern between the effects of MBIs from the effects of exercise. Similarly, food can contribute to inflammation. Diets rich in saturated fats are associated with pro-inflammatory gene expression profile, which is commonly observed in obese people (56). On the other hand, consuming some foods might reduce inflammatory gene expression, e.g., drinking 1 l of blueberry and grape juice daily for 4 weeks changes the expression of the genes related to apoptosis, immune response, cell adhesion, and lipid metabolism (57). Similarly, a diet rich in vegetables, fruits, fish, and unsaturated fats is associated with anti-inflammatory gene profile, while the opposite has been found for Western diet consisting of saturated fats, sugars, and refined food products (58). Similar changes have been observed in older adults after just one Mediterranean diet meal (59) or in healthy adults after consuming 250 ml of red wine (60) or 50 ml of olive oil (61). However, in spite of this literature, only two of the studies we reviewed tested if the MBIs had any influence on lifestyle (e.g., sleep, diet, and exercise) that may have explained gene expression changes.

How about taking tango lessons instead? You would at least learn dance steps, get exercise, and decrease any social isolation. And so what if there were no more benefits than from taking up these other activities?

 

 

Calling out pseudoscience, radically changing the conversation about Amy Cuddy’s power posing paper

Part 1: Reviewed as the clinical trial that it is, the power posing paper should never have been published.

Has too much already been written about Amy Cuddy’s power pose paper? The conversation should not be stopped until its focus shifts and we change our ways of talking about psychological science.

The dominant narrative is now that a junior scientist published an influential paper on power posing and was subject to harassment and shaming by critics, pointing to the need for greater civility in scientific discourse.

Attention has shifted away from the scientific quality of the paper and the dubious products the paper has been used to promote, and toward the behavior of its critics.

Amy Cuddy and powerful allies are given forums to attack and vilify critics, accusing them of damaging the environment in which science is done and discouraging prospective early career investigators from entering the field.

Meanwhile, Amy Cuddy commands large speaking fees and has a top-selling book claiming the original paper provides strong science for simple behavioral manipulations altering mind-body relations and producing socially significant behavior.

This misrepresentation of psychological science does potential harm to consumers and the reputation of psychology among lay persons.

This blog post is intended to restart the conversation with a reconsideration of the original paper as a clinical and health psychology randomized trial (RCT) and, on that basis, identifying the kinds of inferences that are warranted from it.

In the first of a two post series, I argue that:

The original power pose article in Psychological Science should never have been published.

-Basically, we have a therapeutic analog intervention delivered in 2 1-minute manipulations by unblinded experimenters who had flexibility in what they did, what they communicated to participants, and which data they chose to analyze and how.

-It’s unrealistic to expect that 2 1-minute behavioral manipulations would have robust and reliable effects on salivary cortisol or testosterone 17 minutes later.

-It’s absurd to assume that the hormones mediated changes in behavior in this context.

-If Amy Cuddy retreats to the idea that she is simply manipulating “felt power,” we are solidly in the realm of trivial nonspecific and placebo effects.

The original power posing paper

Carney DR, Cuddy AJ, Yap AJ. Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science. 2010 Oct 1;21(10):1363-8.

The Psychological Science article can be construed as a brief mind-body intervention consisting of 2 1-minute behavioral manipulations. Central to the attention that the paper attracted is the argument that this manipulation affected psychological state and social performance via its effects on the neuroendocrine system.

The original study is, in effect, a disguised randomized clinical trial (RCT) of a biobehavioral intervention. Once this is recognized, a host of standards come into play for reporting this study and interpreting the results.

CONSORT

All major journals and publishers, including the Association for Psychological Science, have adopted the Consolidated Standards of Reporting Trials (CONSORT). Any submission of a manuscript reporting a clinical trial is required to be accompanied by a checklist indicating that the article reports particular details of how the trial was conducted. Item 1 on the checklist specifies that both the title and abstract indicate the study was a randomized trial. This is important and intended to aid readers in evaluating the study, but also to allow the study to be picked up in systematic searches for reviews that depend on screening of titles and abstracts.

I can find no evidence that Psychological Science adheres to CONSORT. For instance, my colleagues and I provided a detailed critique of a widely promoted study of loving-kindness meditation that was published in Psychological Science the same year as Cuddy’s power pose study. We noted that it was actually a poorly reported null trial with switched outcomes. With that recognition, we went on to identify serious conceptual, methodological, and statistical problems. After overcoming considerable resistance, we were able to publish a muted version of our critique. Apparently the reviewers of the original paper had failed to evaluate it as an RCT.

The submission of the completed CONSORT checklist has become routine in most journals considering manuscripts for studies of clinical and health psychology interventions. Yet, additional CONSORT requirements that developed later about what should be included in abstracts are largely being ignored.

It would be unfair to single out Psychological Science and the Cuddy article for noncompliance to CONSORT for abstracts. However, the checklist can be a useful frame of reference for noting just how woefully inadequate the abstract was as a report of a scientific study.

CONSORT for abstracts

Hopewell S, Clarke M, Moher D, Wager E, Middleton P, Altman DG, Schulz KF, CONSORT Group. CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration. PLOS Medicine. 2008 Jan 22;5(1):e20.

Journal and conference abstracts should contain sufficient information about the trial to serve as an accurate record of its conduct and findings, providing optimal information about the trial within the space constraints of the abstract format. A properly constructed and well-written abstract should also help individuals to assess quickly the validity and applicability of the findings and, in the case of abstracts of journal articles, aid the retrieval of reports from electronic databases.

Even if CONSORT for abstracts did not exist, we could argue that readers, starting with the editor and reviewers, were faced with an abstract making extraordinary claims that required better substantiation. A lack of basic details disarmed them from evaluating these claims.

In effect, the abstract reduces the study to an experimercial for products about to be marketed in corporate talks and workshops, but let’s persist in evaluating it as the abstract of a scientific study.

Humans and other animals express power through open, expansive postures, and they express powerlessness through closed, contractive postures. But can these postures actually cause power? The results of this study confirmed our prediction that posing in high-power nonverbal displays (as opposed to low-power nonverbal displays) would cause neuroendocrine and behavioral changes for both male and female participants: High-power posers experienced elevations in testosterone, decreases in cortisol, and increased feelings of power and tolerance for risk; low-power posers exhibited the opposite pattern. In short, posing in displays of power caused advantaged and adaptive psychological, physiological, and behavioral changes, and these findings suggest that embodiment extends beyond mere thinking and feeling, to physiology and subsequent behavioral choices. That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.

I don’t believe I have ever encountered in an abstract the extravagant claims with which this abstract concludes. But readers are not provided any basis for evaluating the claim until the Methods section. Undoubtedly, many holding opinions about the paper did not read that far.

Namely:

Forty-two participants (26 females and 16 males) were randomly assigned to the high-power-pose or low-power-pose condition.

Testosterone levels were in the normal range at both Time 1 (M = 60.30 pg/ml, SD = 49.58) and Time 2 (M = 57.40 pg/ml, SD = 43.25). As would be suggested by appropriately taken and assayed samples (Schultheiss & Stanton, 2009), men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.2

Too small a study to provide an effect size

Hold on! First, only 42 participants (26 females and 16 males) would readily be recognized as insufficient for an RCT, particularly in an area of research without past RCTs.

After decades of witnessing the accumulation of strong effect sizes from underpowered studies, many of us have reacted by requiring 35 participants per group as the minimum acceptable level for a generalizable effect size. Actually, that could be an overly liberal criterion. Why?

Many RCTs are underpowered, yet a lack of enforcement of preregistration allows positive results by redefining the primary outcomes after results are known. A psychotherapy trial with 30 or fewer patients in the smallest cell has less than a 50% probability of detecting a moderate-sized significant effect, even if it is present (Coyne, Thombs, & Hagedoorn, 2010). Yet an examination of the studies mustered for treatments deemed evidence-supported by APA Division 12 indicates that many studies were too underpowered to be reliably counted as evidence of efficacy, but were included without comment about this problem. Taking an overview, it is striking the extent to which the literature continues to depend on small, methodologically flawed RCTs conducted by investigators with strong allegiances to one of the treatments being evaluated. Yet which treatment is preferred by investigators is a better predictor of the outcome of the trial than the specific treatment being evaluated (Luborsky et al., 2006).

Earlier, my colleagues and I had argued for the non-accumulative nature of evidence from small RCTs:

Kraemer, Gardner, Brooks, and Yesavage (1998) propose excluding small, underpowered studies from meta-analyses. The risk of including studies with inadequate sample size is not limited to clinical and pragmatic decisions being made on the basis of trials that cannot demonstrate effectiveness when it is indeed present. Rather, Kraemer et al. demonstrate that inclusion of small, underpowered trials in meta-analyses produces gross overestimates of effect size due to substantial, but unquantifiable confirmatory publication bias from non-representative small trials. Without being able to estimate the size or extent of such biases, it is impossible to control for them. Other authorities voice support for including small trials, but generally limit their argument to trials that are otherwise methodologically adequate (Sackett & Cook, 1993; Schulz & Grimes, 2005). Small trials are particularly susceptible to common methodological problems…such as lack of baseline equivalence of groups; undue influence of outliers on results; selective attrition and lack of intent-to-treat analyses; investigators being unblinded to patient allotment; and not having a pre-determined stopping point so investigators are able to stop a trial when a significant effect is present.

In the power posing paper, sex was controlled in all analyses because a peek at the data revealed baseline sex differences in testosterone dwarfing any other differences. What do we make of investigators conducting a study that depends on testosterone mediating a behavioral manipulation who did not anticipate large baseline sex differences in testosterone?

In a PubPeer comment leading up to this post, I noted:

We are then told “men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.”

The findings alluded to in the abstract should be recognizable as weird and uninterpretable. Most basically, how could the 16 males be distributed across the two groups so that the authors could confidently say that differences held for both males and females? Especially when all analyses control for sex? Sex is highly correlated with testosterone, and so an analysis that controlled for both of these variables, sex and testosterone, would probably not generalize to testosterone without such controls.

We are never given the basic statistics in the paper to independently assess what the authors are doing, not even the correlation between cortisol and testosterone, only differences in Time 2 cortisol controlling for Time 1 cortisol, Time 1 testosterone, and gender. These multivariate statistics are not very generalizable in a sample with 42 participants distributed across 2 groups. Certainly not for the 26 females and 16 males taken separately.

The behavioral manipulation

The original paper reports:

Participants’ bodies were posed by an experimenter into high-power or low-power poses. Each participant held two poses for 1 min each. Participants’ risk taking was measured with a gambling task; feelings of power were measured with self-reports. Saliva samples, which were used to test cortisol and testosterone levels, were taken before and approximately 17 min after the power-pose manipulation.

And then elaborates:

To configure the test participants into the poses, the experimenter placed an electrocardiography lead on the back of each participant’s calf and underbelly of the left arm and explained, “To test accuracy of physiological responses as a function of sensor placement relative to your heart, you are being put into a certain physical position.” The experimenter then manually configured participants’ bodies by lightly touching their arms and legs. As needed, the experimenter provided verbal instructions (e.g., “Keep your feet above heart level by putting them on the desk in front of you”). After manually configuring participants’ bodies into the two poses, the experimenter left the room. Participants were videotaped; all participants correctly made and held either two high-power or two low-power poses for 1 min each. While making and holding the poses, participants completed a filler task that consisted of viewing and forming impressions of nine faces.

The behavioral task and subjective self-report assessment

Measure of risk taking and powerful feelings. After they finished posing, participants were presented with the gambling task. They were endowed with $2 and told they could keep the money—the safe bet—or roll a die and risk losing the $2 for a payoff of $4 (a risky but rational bet; odds of winning were 50/50). Participants indicated how “powerful” and “in charge” they felt on a scale from 1 (not at all) to 4 (a lot).

An imagined bewildered review from someone accustomed to evaluating clinical trials

Although the authors don’t seem to know what they’re doing, we have an underpowered therapy analogue study with extraordinary claims. It’s unconvincing that the 2 1-minute behavioral manipulations would change subsequent psychological states and behavior with any extralaboratory implications.

The manipulation poses a puzzle to research participants, challenging them to figure out what is being asked of them. The $2 gambling task presumably is meant to simulate effects on real-world behavior. But the low stakes could mean that participants believed the task evaluated whether they “got” the purpose of the intervention and behaved accordingly. Within that perspective, the unvalidated subjective self-report rating scale would serve as a clue to the intentions of the experimenter and an opportunity to show that the participants were smart. The manipulation of putting participants into a low power pose is even more unconvincing as a contrasting active intervention or a control condition. Claims that this manipulation did anything but communicate experimenter expectancies are even less credible.

This is a very weak form of evidence: a therapy analogue study with such a brief, low-intensity behavioral manipulation followed by assessments of outcomes that might just inform participants of what they needed to do to look smart (i.e., demand characteristics). Add in that the experimenters were unblinded and undoubtedly had flexibility in how they delivered the intervention and what they said to participants. As a grossly underpowered trial, the study cannot make a contribution to the literature, and it certainly cannot contribute an effect size.

Furthermore, if the authors had even a basic understanding of gender differences in social status or sex differences in testosterone, they would have stratified the study with respect to participant gender, not attempted to obtain control by post hoc statistical manipulation.

I could comment on signs of p-hacking and widespread signs of inappropriate naming, use, and interpretation of statistics, but why bother? There are no vital signs of a publishable paper here.

Is power posing salvaged by fashionable hormonal measures?

Perhaps the skepticism of the editor and reviewers was overcome by the introduction of mind-body explanations of what some salivary measures supposedly showed. Otherwise, we would be left with a single subjective self-report measure and a behavioral task susceptible to demand characteristics and nonspecific effects.

We recognize that the free availability of powerful statistical packages risks people using them without any idea of the appropriateness of their use or interpretation. The same observation should be made of the ready availability of means of collecting spit samples from research participants to be sent off to outside laboratories for biochemical analysis.

The clinical health psychology literature is increasingly filled with studies incorporating easily collected saliva samples intended to establish that psychological interventions influence mind-body relations. These have become particularly common in attempts to demonstrate that mindfulness meditation and even tai chi can have beneficial effects on physical health and even cancer outcomes.

Often inaccurately described as “biomarkers,” rather than merely as biological measurements, such measures seldom add anything that is generalizable within participants or across studies.

Let’s start with salivary-based cortisol measures.

A comprehensive review suggests that:

  • A single measurement on a participant or a pre-post pair of assessments would not be informative.
  • Single measurements are unreliable, and large intra- and inter-individual differences not attributable to the intervention can be in play.
  • Minor variations in experimental procedures can have large, unwanted effects.
  • The current standard is the cortisol awakening response and the diurnal slope over more than one day, which would not make sense for the effects of 2 1-minute behavioral manipulations.
  • Even with sophisticated measurement strategies, there is low agreement across and even within studies and low agreement with behavioral and self-report data.
  • The idea that collecting saliva samples would serve the function the investigators intended is an unscientific but attractive illusion.

Another relevant comprehensive theoretical review and synthesis of cortisol reactivity was available at the time the power pose study was planned. The article identifies no basis for anticipating that experimenters putting participants into 1-minute expansive poses would lower cortisol. And certainly no basis for assuming that putting participants into a 1-minute slumped position would raise cortisol. Or what such findings could possibly mean.

But we are clutching at straws. The authors’ interpretations of their hormonal data depend on bizarre post hoc decisions about how to analyze their data in a small sample in which participant sex is treated in incomprehensible fashion. The process of trying to explain spurious results risks giving the results a credibility that the authors have not earned for them. And don’t even try to claim we are getting signals of hormonal mediation from this study.

Another system failure: The incumbent advantage given to a paper that should not have been published.

Even when publication is based on inadequate editorial oversight and review, any likelihood of correction is diminished by the published results having been blessed as “peer reviewed” and accorded an incumbent advantage over whatever follows.

A succession of editors has protected the power pose paper from post-publication peer review. Post-publication review has been relegated to other journals and social media, including PubPeer and blogs.

Soon after publication of the power pose paper, a critique was submitted to Psychological Science, but it was desk rejected. The editor informally communicated to the author that the critique read like a review and the original article had already been peer reviewed.

The critique by Steven J. Stanton nonetheless eventually appeared in Frontiers in Behavioral Neuroscience and is worth a read.

Stanton took seriously the science being invoked in the claims of the power pose paper.

A sampling:

Carney et al. (2010) collapsed over gender in all testosterone analyses. Testosterone conforms to a bimodal distribution when including both genders (see Figure 13; Sapienza et al., 2009). Raw testosterone cannot be considered a normally distributed dependent or independent variable when including both genders. Thus, Carney et al. (2010) violated a basic assumption of the statistical analyses that they reported, because they used raw testosterone from pre- and post-power posing as independent and dependent variables, respectively, with all subjects (male and female) included.

And

Mean cortisol levels for all participants were reported as 0.16 ng/mL pre-posing and 0.12 ng/mL post-posing, thus showing that for all participants there was an average decrease of 0.04 ng/mL from pre- to post-posing, regardless of condition. Yet, Figure 4 of Carney et al. (2010) shows that low-power posers had mean cortisol increases of roughly 0.025 ng/mL and high-power posers had mean cortisol decreases of roughly 0.03 ng/mL. It is unclear given the data in Figure 4 how the overall cortisol change for all participants could have been a decrease of 0.04 ng/mL.
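Stanton’s arithmetic point is easy to check for yourself. Assuming, purely for illustration, that the two posing groups were roughly equal in size, the overall change implied by Figure 4 is nowhere near the reported 0.04 ng/mL decrease:

```python
# Back-of-envelope check of the cortisol numbers quoted above.
# Assumption (mine, for illustration): the two posing groups were about equal in size.
low_power_change = +0.025     # ng/mL, read from Figure 4 of Carney et al. (2010)
high_power_change = -0.03     # ng/mL, read from Figure 4
implied_overall_change = (low_power_change + high_power_change) / 2
print(implied_overall_change)  # about -0.0025 ng/mL, an order of magnitude smaller
                               # than the reported overall decrease of 0.04 ng/mL
```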

Another editor of Psychological Science received a critical comment from Marcus Crede and Leigh A. Phillips. After the first round of reviews, Crede and Phillips removed references to changes in the published power pose paper from earlier drafts that they had received from the first author, Dana Carney. However, Crede and Phillips withdrew their critique when asked to respond to a review by Amy Cuddy in a second resubmission.

The critique is now forthcoming in Social Psychological and Personality Science:

Revisiting the Power Pose Effect: How Robust Are the Results Reported by Carney, Cuddy and Yap (2010) to Data Analytic Decisions

The article investigates how robust the original findings are to the data analytic choices that make p-hacking possible. An excerpt from the abstract:

In this paper we use multiverse analysis to examine whether the findings reported in the original paper by Carney, Cuddy, and Yap (2010) are robust to plausible alternative data analytic specifications: outlier identification strategy; the specification of the dependent variable; and the use of control variables. Our findings indicate that the inferences regarding the presence and size of an effect on testosterone and cortisol are highly sensitive to data analytic specifications. We encourage researchers to routinely explore the influence of data analytic choices on statistical inferences and also encourage editors and reviewers to require explicit examinations of the influence of alternative data analytic specifications on the inferences that are drawn from data.
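For readers unfamiliar with the term, a multiverse analysis simply reruns the same basic comparison under every plausible combination of analytic choices and reports how much the conclusion moves. Here is a minimal toy sketch of the idea in Python, my own illustration with simulated data, not Crede and Phillips’s code; their actual analysis also varied the control variables used.

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated stand-in for pre-to-post hormone change in two posing groups.
group = rng.integers(0, 2, size=42)                 # 0 = low-power pose, 1 = high-power pose
change = rng.normal(0, 1, size=42) + 0.1 * group    # small, noisy "effect"

outlier_rules = {
    "keep all":    lambda x: np.ones_like(x, dtype=bool),
    "drop > 2 SD": lambda x: np.abs(x - x.mean()) < 2 * x.std(),
    "drop > 3 SD": lambda x: np.abs(x - x.mean()) < 3 * x.std(),
}
dv_specs = {
    "raw change":  lambda x: x,
    "log-shifted": lambda x: np.log(x - x.min() + 1),
}

# One significance test per "universe" of analytic choices.
for (o_name, keep), (d_name, transform) in itertools.product(outlier_rules.items(), dv_specs.items()):
    mask = keep(change)
    y = transform(change[mask])
    t, p = stats.ttest_ind(y[group[mask] == 1], y[group[mask] == 0])
    print(f"{o_name:12s} | {d_name:12s} | p = {p:.3f}")
```

If a result survives only a minority of such defensible specifications, that is evidence of fragility, which is what the abstract above reports for the hormonal effects.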

Dana Carney, the first author of the original paper, has now posted an explanation of why she no longer believes the originally reported findings are genuine and why “the evidence against the existence of power poses is undeniable.” She discloses a number of important confounds and important “researcher degrees of freedom” in the analyses reported in the published paper.

Coming Up Next

A different view of Amy Cuddy’s TED talk in terms of its selling of pseudoscience to consumers and its acknowledgment of a strong debt to Cuddy’s adviser Susan Fiske.

A disclosure of some of the financial interests that distort discussion of the scientific flaws of the power pose.

How the reflexive response of the replicationados inadvertently reinforced the illusion that the original power pose study provided meaningful effect sizes.

How Amy Cuddy and her allies marshalled the resources of the Association for Psychological Science to vilify and intimidate critics of bad science and of the exploitation of consumers by psychological pseudoscience.

How journalists played into this vilification.

What needs to be done to avoid a future fiasco for psychology like the power pose phenomenon and protect reformers of the dissemination of science.

Note: Time to reiterate that all opinions expressed here are solely those of Coyne of the Realm and not necessarily of PLOS blogs, PLOS One or his other affiliations.

Jane Brody promoting the pseudoscience of Barbara Fredrickson in the New York Times

Journalists’ coverage of positive psychology and health is often shabby, even in prestigious outlets like The New York Times.

Jane Brody’s latest installment of the benefits of being positive on health relied heavily on the work of Barbara Fredrickson that my colleagues and I have thoroughly debunked.

All of us need to recognize that studies of the effects of positive psychology interventions are often disguised randomized controlled trials.

With that insight, we need to evaluate this research in terms of reporting standards like CONSORT and declarations of conflict of interests.

We need to be more skeptical about the ability of small changes in behavior to profoundly improve health.

When in doubt, assume that much of what we read in the media about positivity and health is false or at least exaggerated.

Jane Brody starts her article in The New York Times by describing how most mornings she is “grinning from ear to ear, uplifted not just by my own workout but even more so” by her interaction with toddlers on the way home from where she swims. When I read Brody’s “Turning Negative Thinkers Into Positive Ones,” I was not left grinning ear to ear. I was left profoundly bummed.

I thought real hard about what was so unsettling about Brody’s article. I now have some clarity.

I don’t mind suffering even pathologically cheerful people in the morning. But I do get bothered when they serve up pseudoscience as the real thing.

I had expected to be served up Brody’s usual recipe of positive psychology pseudoscience concocted to coerce readers into heeding her Barnum advice about how they should lead their lives. “Smile or die!” Apologies to my friend Barbara Ehrenreich for putting to use here the title her book was given outside North America. I invoke the phrase because Jane Brody makes the case that unless we do what she says, we risk hurting our health and shortening our lives. So we better listen up.

What bummed me most this time was that Brody was drawing on the pseudoscience of Barbara Fredrickson that my colleagues and I have worked so hard to debunk. We took the trouble of obtaining data sets for two of her key papers for reanalysis. We were dismayed by the quality of the data. To start with, we uncovered carelessness at the level of data entry that undermined her claims. But her basic analyses and interpretations did not hold up either.

Fredrickson publishes exaggerated claims about dramatic benefits of simple positive psychology exercises. Fredrickson is very effective in blocking or muting the publication of criticism and getting on with hawking her wares. My colleagues and I have talked to others who similarly met considerable resistance from editors in getting detailed critiques and re-analyses published. Fredrickson is also aided by uncritical journalists like Jane Brody, who promote her weak and inconsistent evidence as strong stuff. That sells a lot of positive psychology merchandise, like self-help books and workshops, to needy and vulnerable people.

If it is taken seriously, Fredrickson’s research concerns health effects of behavioral intervention. Yet, her findings are presented in a way that does not readily allow their integration with the rest of the health psychology literature. It would be difficult, for instance, to integrate Fredrickson’s randomized trials of loving-kindness meditation with other research, because she makes it almost impossible to isolate effect sizes in a way that would allow them to be combined with other studies in a meta-analysis. Moreover, Fredrickson has published contradictory claims from the same data set multiple times without acknowledging the duplicate publication. [Please read on. I will document all of these claims before the post ends.]

The need of self-help gurus to generate support for their dramatic claims in lucrative positive psychology self-help products is never acknowledged as a conflict of interest.  It should be.

Just imagine, if someone had a contract based on a book prospectus promising that the claims of their last pop psychology book would be surpassed. Such books inevitably paint life too simply, with simple changes in behavior having profound and lasting effects unlike anything obtained in the randomized trials of clinical and health psychology. Readers ought to be informed that such pressures to meet the demands of a lucrative book contract could generate a strong confirmation bias. Caveat emptor, or rather caveat auditor, but how about at least informing readers and letting them decide whether following the money influences their interpretation of what they read?

Psychology journals almost never require disclosures of conflicts of interest of this nature. I am campaigning to make that practice routine, with nondisclosure of such financial benefits treated as tantamount to scientific misconduct. I am calling for readers to take to social media when these disclosures do not appear in scientific journals where they should be featured prominently, and for editors to be held responsible for non-enforcement. I can cite Fredrickson’s work as a case in point, but there are many other examples, inside and outside of positive psychology.

Back to Jane Brody’s exaggerated claims for Fredrickson’s work.

I lived for half a century with a man who suffered from periodic bouts of depression, so I understand how challenging negativism can be. I wish I had known years ago about the work Barbara Fredrickson, a psychologist at the University of North Carolina, has done on fostering positive emotions, in particular her theory that accumulating “micro-moments of positivity,” like my daily interaction with children, can, over time, result in greater overall well-being.

The research that Dr. Fredrickson and others have done demonstrates that the extent to which we can generate positive emotions from even everyday activities can determine who flourishes and who doesn’t. More than a sudden bonanza of good fortune, repeated brief moments of positive feelings can provide a buffer against stress and depression and foster both physical and mental health, their studies show.

“Research…demonstrates” (?). Brody is feeding stupid-making pablum to readers. Fredrickson’s kind of research may produce evidence one way or the other, but it is too strong a claim, an outright illusion, to even begin suggesting that it “demonstrates” (proves) what follows in this passage.

Where, outside of tabloids and self-help products, do we encounter such immodest claims that one or a few poor quality studies “demonstrate” anything?

Negative feelings activate a region of the brain called the amygdala, which is involved in processing fear and anxiety and other emotions. Dr. Richard J. Davidson, a neuroscientist and founder of the Center for Healthy Minds at the University of Wisconsin — Madison, has shown that people in whom the amygdala recovers slowly from a threat are at greater risk for a variety of health problems than those in whom it recovers quickly.

Both he and Dr. Fredrickson and their colleagues have demonstrated that the brain is “plastic,” or capable of generating new cells and pathways, and it is possible to train the circuitry in the brain to promote more positive responses. That is, a person can learn to be more positive by practicing certain skills that foster positivity.

We are knee deep in neuro-nonsense. Try asking a serious neuroscientist about the claims that this duo have “demonstrated that the brain is ‘plastic,’” or that practicing certain positivity skills changes the brain with the health benefits they claim via Brody, or that they are studying ‘amygdala recovery’ associated with reduced health risk.

For example, Dr. Fredrickson’s team found that six weeks of training in a form of meditation focused on compassion and kindness resulted in an increase in positive emotions and social connectedness and improved function of one of the main nerves that helps to control heart rate. The result is a more variable heart rate that, she said in an interview, is associated with objective health benefits like better control of blood glucose, less inflammation and faster recovery from a heart attack.

I will dissect this key claim about loving-kindness meditation and vagal tone/heart rate variability shortly.

Dr. Davidson’s team showed that as little as two weeks’ training in compassion and kindness meditation generated changes in brain circuitry linked to an increase in positive social behaviors like generosity.

We will save discussing Richard Davidson for another time. But really, Jane, just two weeks to better health? Where is the generosity center in brain circuitry? I dare you to ask a serious neuroscientist and embarrass yourself.

“The results suggest that taking time to learn the skills to self-generate positive emotions can help us become healthier, more social, more resilient versions of ourselves,” Dr. Fredrickson reported in the National Institutes of Health monthly newsletter in 2015.

In other words, Dr. Davidson said, “well-being can be considered a life skill. If you practice, you can actually get better at it.” By learning and regularly practicing skills that promote positive emotions, you can become a happier and healthier person. Thus, there is hope for people like my friend’s parents should they choose to take steps to develop and reinforce positivity.

In her newest book, “Love 2.0,” Dr. Fredrickson reports that “shared positivity — having two people caught up in the same emotion — may have even a greater impact on health than something positive experienced by oneself.” Consider watching a funny play or movie or TV show with a friend of similar tastes, or sharing good news, a joke or amusing incidents with others. Dr. Fredrickson also teaches “loving-kindness meditation” focused on directing good-hearted wishes to others. This can result in people “feeling more in tune with other people at the end of the day,” she said.

Brody ends with 8 things Fredrickson and others endorse to foster positive emotions. (Why only 8 recommendations, why not come up with 10 and make them commandments?) These include “Do good things for other people” and “Appreciate the world around you.” Okay, but do Fredrickson and Davidson really show that engaging in these activities has immediate and dramatic effects on our health? I have examined their research and I doubt it. I think the larger problem, though, is the suggestion that physically ill people facing shortened lives risk being blamed for being bad people. They obviously did not do these 8 things, or else they would be healthy.

If Brody were selling herbal supplements or coffee enemas, we would readily label the quackery. We should do the same for advice about psychological practices that are promised to transform lives.

Brody’s sloppy links to support her claims: Love 2.0

Journalists who talk of “science”  and respect their readers will provide links to their actual sources in the peer-reviewed scientific literature. That way, readers who are motivated can independently review the evidence. Especially in an outlet as prestigious as The New York Times.

Jane Brody is outright promiscuous in the links that she provides, which are often to secondary or tertiary sources. The first link provided for her discussion of Fredrickson’s Love 2.0 is actually to a somewhat negative review of the book. https://www.scientificamerican.com/article/mind-reviews-love-how-emotion-afftects-everything-we-feel/

Fredrickson builds her case by expanding on research that shows how sharing a strong bond with another person alters our brain chemistry. She describes a study in which best friends’ brains nearly synchronize when exchanging stories, even to the point where the listener can anticipate what the storyteller will say next. Fredrickson takes the findings a step further, concluding that having positive feelings toward someone, even a stranger, can elicit similar neural bonding.

This leap, however, is not supported by the study and fails to bolster her argument. In fact, most of the evidence she uses to support her theory of love falls flat. She leans heavily on subjective reports of people who feel more connected with others after engaging in mental exercises such as meditation, rather than on more objective studies that measure brain activity associated with love.

I would go even further than the reviewer. Fredrickson builds her case by very selectively drawing on the literature, choosing only a few studies that fit.  Even then, the studies fit only with considerable exaggeration and distortion of their findings. She exaggerates the relevance and strength of her own findings. In other cases, she says things that have no basis in anyone’s research.

I came across Love 2.0: How Our Supreme Emotion Affects Everything We Feel, Think, Do, and Become (Unabridged) that sells for $17.95. The product description reads:

We all know love matters, but in this groundbreaking book positive emotions expert Barbara Fredrickson shows us how much. Even more than happiness and optimism, love holds the key to improving our mental and physical health as well as lengthening our lives. Using research from her own lab, Fredrickson redefines love not as a stable behemoth, but as micro-moments of connection between people – even strangers. She demonstrates that our capacity for experiencing love can be measured and strengthened in ways that improve our health and longevity. Finally, she introduces us to informal and formal practices to unlock love in our lives, generate compassion, and even self-soothe. Rare in its scope and ambitious in its message, Love 2.0 will reinvent how you look at and experience our most powerful emotion.

There is a mishmash of language games going on here. Fredrickson’s redefinition of love is not based on her research. Her claim that love is ‘really’ micro-moments of connection between people – even strangers – is a weird re-definition. Attempt to read her book, if you have time to waste.

You will quickly see that much of what she says makes no sense for a long-term relationship that is solid but beyond the honeymoon stage. Ask partners in long-term relationships and they will undoubtedly report lacking lots of such “micro-moments of connection.” I doubt it is adaptive for people seeking to build long-term relationships to adopt the yardstick that if lots of such micro-moments don’t keep coming all the time, the relationship is in trouble. But it is Fredrickson who is selling the strong claims, and the burden is on her to produce the evidence.

If you try to take Fredrickson’s work seriously, you wind up seeing she has a rather superficial view of close relationships and can’t seem to distinguish them from what goes on between strangers in drunken one-night stands. But that is supposed to be revolutionary science.

We should not confuse much of what Fredrickson emphatically states with testable hypotheses. Many statements sound more like marketing slogans – what Joachim Kruger and his student Thomas Mairunteregger identify as the McDonaldization of positive psychology. Like a Big Mac, Fredrickson’s Love 2.0 requires a lot of imagination to live up to its advertisement.

Fredrickson’s love the supreme emotion vs ‘Trane’s Love Supreme

Where Fredrickson’s selling of love as the supreme emotion is not simply an advertising slogan, it is a bad summary of the research on love and health. John Coltrane makes no empirical claim about love being supreme. But listening to him is effective self-soothing after taking Love 2.0 seriously and trying to figure it out. Simply enjoy, and don’t worry about what it does for your positivity ratio or micro-moments, shared or alone.

Fredrickson’s study of loving-kindness meditation

Jane Brody, like Fredrickson herself, depends heavily on a study of loving-kindness meditation in proclaiming the wondrous, transformative health benefits of being loving and kind. After obtaining Fredrickson’s data set and reanalyzing it, my colleagues – James Heathers, Nick Brown, and Harris Friedman – and I arrived at a very different interpretation of her study. As we first encountered it, the study was:

Kok, B. E., Coffey, K. A., Cohn, M. A., Catalino, L. I., Vacharkulksemsuk, T., Algoe, S. B., . . . Fredrickson, B. L. (2013). How positive emotions build physical health: Perceived positive social connections account for the upward spiral between positive emotions and vagal tone. Psychological Science, 24, 1123-1132.

Consolidated standards for reporting randomized trials (CONSORT) are widely accepted for at least two reasons. First, clinical trials should be clearly identified as such in order to ensure that the results are recognized and available in systematic searches to be integrated with other studies. CONSORT requires that RCTs be clearly identified in the titles and abstracts. Once RCTs are labeled as such, the CONSORT checklist becomes a handy tally of what needs to be reported.

It is only in supplementary material that the Kok and Fredrickson paper is identified as a clinical trial. Only in that supplement is the primary outcome identified, even in passing. No means are reported anywhere in the paper or supplement. Results are presented in terms of what Kok and Fredrickson term “a variant of a mediational, parallel process, latent-curve model.” Basic statistics needed for its evaluation are left to readers’ imagination. Figure 1 in the article depicts the awe-inspiring parallel-process mediational model that guided the analyses. We showed the figure to a number of statistical experts, including Andrew Gelman. While some elements were readily recognizable, the overall figure was not, especially the mysterious large dot (a causal pathway roundabout?) near the top.

So, not only might the study not be detected as an RCT, it also lacks the information that would be needed to calculate effect sizes.
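To make concrete what a systematic reviewer would need, here is the bare minimum of summary data, group means, standard deviations, and sample sizes, and the standard between-group effect size computed from them. The numbers below are placeholders I made up for illustration; none of them are reported by Kok and Fredrickson.

```python
import math

# Placeholder numbers for illustration only; Kok et al. (2013) report none of these.
m_tx, sd_tx, n_tx = 0.50, 1.0, 26   # mean change, SD, and n for the meditation group
m_ct, sd_ct, n_ct = 0.10, 1.0, 26   # mean change, SD, and n for the control group

pooled_sd = math.sqrt(((n_tx - 1) * sd_tx**2 + (n_ct - 1) * sd_ct**2) / (n_tx + n_ct - 2))
cohens_d = (m_tx - m_ct) / pooled_sd
print(round(cohens_d, 2))           # without reported means and SDs, a meta-analyst cannot get this far
```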

Furthermore, if studies are labeled as RCTs, we immediately seek protocols published ahead of time that specify the basic elements of design and analyses and primary outcomes. At Psychological Science, studies with protocols are unusual enough to get the authors awarded a badge. In the clinical and health psychology literature, protocols are increasingly common, like flushing a toilet after using a public restroom. No one runs up and thanks you, “Thank you for flushing/publishing your protocol.”

If Fredrickson and her colleagues are going to be using the study to make claims about the health benefits of loving kindness meditation, they have a responsibility to adhere to CONSORT and to publish their protocol. This is particularly the case because this research was federally funded and results need to be transparently reported for use by a full range of stakeholders who paid for the research.

We identified a number of other problems and submitted a manuscript based on a reanalysis of the data. Our manuscript was promptly rejected by Psychological Science. The associate editor, Batja Mesquita, noted that two of my co-authors, Nick Brown and Harris Friedman, had co-authored a paper resulting in a partial retraction of Fredrickson’s positivity ratio paper.

Brown NJ, Sokal AD, Friedman HL. The Complex Dynamics of Wishful Thinking: The Critical Positivity Ratio. American Psychologist. 2013 Jul 15.

I won’t go into the details, except to say that Nick and Harris along with Alan Sokal unambiguously established that Fredrickson’s positivity ratio of 2.9013 positive to negative experiences was a fake fact. Fredrickson had been promoting the number  as an “evidence-based guideline” of a ratio acting as a “tipping point beyond which the full impact of positive emotions becomes unleashed.” Once Brown and his co-authors overcame strong resistance to getting their critique published, their paper garnered a lot of attention in social and conventional media. There is a hilariously funny account available at Nick Brown Smelled Bull.

Batja Mesquita argued that the previously published critique discouraged her from accepting our manuscript. To do so, she would be participating in “a witch hunt” and

 The combatant tone of the letter of appeal does not re-assure me that a revised commentary would be useful.

Welcome to one-sided tone policing. We appealed her decision, but Editor Eric Eich indicated that there was no appeal process at Psychological Science, contrary to the requirements of the Committee on Publication Ethics (COPE).

Eich relented after I shared an email to my coauthors in which I threatened to take the whole issue into social media, where there would be no peer review in the traditional, outdated sense of the term. Numerous revisions of the manuscript were submitted, some of them in response to reviews by Fredrickson and Kok, who did not want the paper published. A year passed before our paper was accepted and appeared on the website of the journal. You can read our paper here. I think you will see that the fatal problems are obvious.

Heathers JA, Brown NJ, Coyne JC, Friedman HL. The elusory upward spiral: A reanalysis of Kok et al. (2013). Psychological Science. 2015 May 29:0956797615572908.

In addition to the original paper not adhering to CONSORT, we noted

  1. There was no effect of whether participants were assigned to the loving-kindness meditation vs. no-treatment control group on the key physiological variable, cardiac vagal tone. This is a thoroughly disguised null trial.
  2. Kok and Fredrickson claimed that there was an effect of meditation on cardiac vagal tone, but any appearance of an effect was due to reduced vagal tone in the control group, which cannot readily be explained.
  3. Kok and Fredrickson essentially interpreted changes in cardiac vagal tone as a surrogate outcome for more general changes in physical health. However, other researchers have noted that observed changes in cardiac vagal tone are not consistently related to changes in other health variables and are susceptible to variations in experimental conditions that have nothing to do with health.
  4. No attention was given to whether participants assigned to the loving-kindness meditation actually practiced it with any frequency or fidelity. The article nonetheless reported that such data had been collected.

Point 2 is worth elaborating. Participants in the control condition received no intervention. Their assessment of cardiac vagal tone/heart rate variability was essentially a test/retest reliability test of what should have been a stable physiological characteristic. Yet, participants assigned to this no-treatment condition showed as much change as the participants who were assigned to meditation, but in the opposite direction. Kok and Fredrickson ignored this and attributed all differences to meditation. Houston, we have a problem, a big one, with unreliability of measurement in this study.
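To see why change of this size in an untreated group is a red flag, consider a toy simulation, my own illustration rather than the study’s data: a genuinely stable trait measured twice with a noisy instrument will show sizable apparent “change” in either direction in any arm of a trial, treated or not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 52
true_trait = rng.normal(50, 5, size=n)            # a stable "vagal tone" value for each person
pre  = true_trait + rng.normal(0, 10, size=n)     # baseline measurement with large error
post = true_trait + rng.normal(0, 10, size=n)     # follow-up measurement with large error
change = post - pre

meditation, control = change[:26], change[26:]    # arbitrary split into two untreated "arms"
print(round(meditation.mean(), 2), round(control.mean(), 2))
# With measurement error this large, either arm can drift up or down purely by chance,
# so a decline confined to the no-treatment group is not evidence that meditation "worked."
```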

We could not squeeze all of our critique into our word limit, but James Heathers, who is an expert on cardiac vagal tone/heart rate variability, elaborated elsewhere:

  • The study was underpowered from the outset, but the sample size decreased further, from 65 to 52, due to missing data (see the power calculation sketched after this list).
  • Cardiac vagal tone is unreliable except in the context of careful control of the conditions in which measurements are obtained, multiple measurements on each participant, and a much larger sample size. None of these conditions were met.
  • There were numerous anomalies in the data, including some participants included without baseline data, improbable baseline or follow-up scores, and improbable changes. These alone would invalidate the results.
  • Despite not reporting basic statistics, the article was full of graphs, impressive to the uninformed, but useless to readers attempting to make sense of what was done and with what results.
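On the first point, a quick calculation shows just how underpowered a two-group comparison of 52 people is. The effect size below is an assumption on my part, a conventional “medium” standardized difference, not a figure taken from the paper.

```python
from statsmodels.stats.power import TTestIndPower

# Assumptions for illustration: two equal groups of 26, alpha = .05 (two-sided),
# and a "medium" standardized effect of d = 0.5 (not a value from Kok et al.).
power = TTestIndPower().power(effect_size=0.5, nobs1=26, alpha=0.05, ratio=1.0)
print(round(power, 2))   # roughly 0.4: well below the conventional 0.8 target,
                         # before any further loss to missing data and unreliable measurement
```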

We later learned that the same data had been used for another published paper. There was no cross-citation and the duplicate publication was difficult to detect.

Kok, B. E., & Fredrickson, B. L. (2010). Upward spirals of the heart: Autonomic flexibility, as indexed by vagal tone, reciprocally and prospectively predicts positive emotions and social connectedness. Biological Psychology, 85, 432–436. doi:10.1016/j.biopsycho.2010.09.005

Pity the poor systematic reviewer and meta analyst trying to make sense of this RCT and integrate it with the rest of the literature concerning loving-kindness meditation.

This was not our only experience of obtaining data for a paper crucial to Fredrickson’s claims and then having difficulty publishing our findings. We obtained data for claims that she and her colleagues had solved the classical philosophical problem of whether we should pursue pleasure or meaning in our lives. Pursuing pleasure, they argue, will adversely affect genomic transcription.

We found we could redo the extremely complicated analyses and replicate the original findings, but there were errors in the original data entry that, once corrected, entirely shifted the results. Furthermore, we could replicate the original findings when we substituted data from a random number generator for the data collected from study participants. After struggles similar to what we experienced with Psychological Science, we succeeded in getting our critique published.

The original paper

Fredrickson BL, Grewen KM, Coffey KA, Algoe SB, Firestine AM, Arevalo JM, Ma J, Cole SW. A functional genomic perspective on human well-being. Proceedings of the National Academy of Sciences. 2013 Aug 13;110(33):13684-9.

Our critique

Brown NJ, MacDonald DA, Samanta MP, Friedman HL, Coyne JC. A critical reanalysis of the relationship between genomics and well-being. Proceedings of the National Academy of Sciences. 2014 Sep 2;111(35):12705-9.

See also:

Nickerson CA. No Evidence for Differential Relations of Hedonic Well-Being and Eudaimonic Well-Being to Gene Expression: A Comment on Statistical Problems in Fredrickson et al.(2013). Collabra: Psychology. 2017 Apr 11;3(1).

A partial account of the reanalysis is available in:

Reanalysis: No health benefits found for pursuing meaning in life versus pleasure. PLOS Blogs Mind the Brain

Wrapping it up

Strong claims about health effects require strong evidence.

  • Evidence produced in randomized trials needs to be reported according to established conventions like CONSORT, with clear labeling of duplicate publications.
  • When research is conducted with public funds, these responsibilities are increased.

I have often identified health claims in high profile media like The New York Times and The Guardian. My MO has been to trace the claims back to the original sources in peer reviewed publications, and evaluate both the media reports and the quality of the primary sources.

I hope that I am arming citizen scientists for engaging in these activities independent of me and even to arrive at contradictory appraisals to what I offer.

  • I don’t think I can expect to get many people to ask for data and perform independent analyses and certainly not to overcome the barriers my colleagues and I have met in trying to publish our results. I share my account of some of those frustrations as a warning.
  • I still think I can offer some take away messages to citizen scientists interested in getting better quality, evidence-based information on the internet.
  • Assume most of the claims readers encounter about psychological states and behavior being simply changed and profoundly influencing physical health are false or exaggerated. When in doubt, disregard the claims and certainly don’t retweet or “like” them.
  • Ignore journalists who do not provide adequate links for their claims.
  • Learn to identify generally reliable sources and take journalists off the list when they have made extravagant or undocumented claims.
  • Appreciate the financial gains to be made by scientists who feed journalists false or exaggerated claims.

Advice to citizen scientists who are cultivating more advanced skills:

Some key studies that Brody invokes in support of her claims being science-based are poorly conducted and reported clinical trials that are not labeled as such. This is quite common in positive psychology, but you need to cultivate skills to even detect that is what is going on. Even prestigious psychology journals are often lax in labeling studies as RCTs and in enforcing reporting standards. Authors’ conflicts of interest are ignored.

It is up to you to

  • Identify when the claims you are being fed should have been evaluated in a clinical trial.
  • Be skeptical when the original research is not clearly identified as a clinical trial but nonetheless compares participants who received the intervention and those who did not.
  • Be skeptical when CONSORT is not followed and there is no published protocol.
  • Be skeptical of papers published in journals that do not enforce these requirements.

Disclaimer

I think I have provided enough details for readers to decide for themselves whether I am unduly influenced by my experiences with Barbara Fredrickson and her data. She and her colleagues have differing accounts of her research and of the events I have described in this blog.

As a disclosure, I receive money for writing these blog posts, less than $200 per post. I am also marketing a series of e-books,  including Coyne of the Realm Takes a Skeptical Look at Mindfulness and Coyne of the Realm Takes a Skeptical Look at Positive Psychology.

Maybe I am just making a fuss to attract attention to these enterprises. Maybe I am just monetizing what I have been doing for years virtually for free. Regardless, be skeptical. But to get more information and get on a mailing list for my other blogging, go to coyneoftherealm.com and sign up.

The Prescription Pain Pill Epidemic: A Conversation with Dr. Anna Lembke

My colleague, Dr. Anna Lembke is the Program Director for the Stanford University Addiction Medicine Fellowship, and Chief of the Stanford Addiction Medicine Dual Diagnosis Clinic. She is the author of a newly released book on the prescription pain pill epidemic: “Drug Dealer, MD: How Doctors Were Duped, Patients Got Hooked, and Why It’s So Hard to Stop” (Johns Hopkins University Press, October 2016).

I spoke with her recently about the scope of this public health tragedy, how we got here and what we need to do about it.

Dr. Jain: About 15-20 years ago American medicine underwent a radical cultural shift in its attitude towards pain, a shift that ultimately culminated in a public health tragedy. Can you comment on factors that contributed to that shift occurring in the first place?
Dr. Lembke: Sure. So the first thing that happened (and it was really more like the early 1980’s when this shift occurred) was that there were more people with daily pain. Overall, our population is getting healthier, but we also have more people with more pain conditions. No one really knows exactly the reason for that, but it probably involves people living longer with chronic illnesses, and more people getting surgical interventions for all types of condition. Any time you cut into the body, you cut across the nerves and you create the potential for some kind of neuropathic pain problem.
The other thing that happened in the 1980’s was the beginning of the hospice movement. This movement helped people at the very end of life (the last month to weeks to days of their lives) to transition to death in a more humane and peaceful way. There was growing recognition that we weren’t doing enough for people at the end of life. As part of this movement, many doctors began advocating for using opioids more liberally at the end of life.
There was also a broader cultural shift regarding the meaning of pain. Prior to 1900 people viewed pain as having positive value: “what does not kill you makes you stronger” or “after darkness comes the dawn”. There were spiritual and biblical connotations and positive meaning in enduring suffering. What arose, through the 20th century, was this idea that pain is actually something that you need to avoid because pain itself can lead to a psychic scar that contributes to future pain. Today, not only is pain painful, but pain begets future pain. By the 1990’s, pain was viewed as a very bad thing and something that had to be eliminated at all cost.
Growing numbers of people experiencing chronic pain, the influence of the hospice movement, and a shifting paradigm about the meaning and consequences of experiencing pain, led to increased pressures within medicine for doctors to prescribe more opioids. This shift was a departure from prior practice, when doctors were loath to prescribe opioids, for fear of creating addiction, except in cases of severe trauma, cases involving surgery, or cases at the very end of life.
Dr. Jain: The American Pain Society had introduced “pain as the 5th vital sign,” a term which suggested that physicians who were not taking their patients’ pain seriously were being neglectful. What are your thoughts about this term?
Dr. Lembke: “Pain is the 5th vital sign” is a slogan. It’s kind of an advertising campaign. We use slogans all the time in medicine, many times to good effect, to raise awareness both inside and outside the profession about a variety of medical issues. The reason that “pain is the 5th vital sign” went awry, however, has to do with the ways in which professional medical societies, like the American Pain Society, and so-called “academic thought leaders”, began to collaborate and cooperate with the pharmaceutical industry. That’s where “pain is the 5th vital sign” went from being an awareness campaign to being a brand for a product, namely prescription opioids.
So the good intentions in the early 1980’s turned into something really quite nefarious when it came to the way that we started treating patients. To really understand what happened, you have to understand the ways in which the pharmaceutical industry, particularly the makers of opioid analgesics, covertly collaborated with various institutions within what I’ll call Big Medicine, in order to promote opioid prescribing.
Dr. Jain: So by Big Medicine what do you mean?
Dr. Lembke: I mean the Federation of State Medical Boards, The Joint Commission (JCAHO), pain societies, academic thought leaders, and the Food and Drug Administration (FDA). These are the leading organizations within medicine whose job it is to guide and regulate medicine. None of these are pharmaceutical companies per se, but what happened around opioid pain pills was that Big Pharma infiltrated these various organizations in order to use false evidence to encourage physicians to prescribe more opioids. They used a Trojan Horse approach. They didn’t come out and say we want you to prescribe more opioids because we’re Big Pharma and we want to make more money; instead, what they said was we want you to prescribe more opioids because that’s what the scientific evidence supports.
The story of how they did that is really fascinating. Let’s take The Joint Commission (JCAHO) as an example. In 1996, when oxycontin was introduced to the market, JCAHO launched a nationwide pain management educational program where they sold educational materials to hospitals, which they acquired for free from Purdue Pharma. These materials included statements which we now know to be patently false. JCAHO sold the Purdue Pharma videos and literature on pain to hospitals.
These educational materials perpetuated four myths about opioid prescribing. The first myth was that opioids work for chronic pain. We have no evidence to support that. The second was that no dose is too high. So if your patient responds to opioids initially and then develops tolerance, just keep going up. And that’s how we got patients on astronomical amounts of opioids. The third myth was about pseudo addiction. If you have a patient who appears to be demonstrating drug seeking behavior, they’re not addicted. They just need more pain meds. The fourth and most insidious myth was that there is a halo effect when opioids are prescribed by a doctor, that is, they’re not addictive as long as they’re being used to treat pain.
So getting back to JCAHO, not only did they use material propagating myths about the use of opioids to treat pain, but they also did something that was very insidious and, ultimately, very bad for patients. They made pain a “quality measure”. By The Joint Commission’s own definition of a quality measure, it must be something that you can count. So what they did was they created this visual analog scale, also known as the “pain scale”. The scale consists of numbers from one to ten describing pain, with sad and happy faces to match. JCAHO told doctors they needed to use this pain scale in order to assess a patient’s pain. What we know today is that this pain scale has not led to improved treatment or functional outcomes for patients with pain. The only thing that it has been correlated with is increased opioid prescribing.
This sort of stealth maneuver by Big Pharma to use false evidence or pseudo-science to infiltrate academic medicine, regulatory agencies, and academic societies in order to promote more opioid prescribing: that’s an enduring theme throughout any analysis of this epidemic.
Dr. Jain: Can you comment specifically on the breadth and depth of the opioid epidemic in the US? What were the key factors involved?
Dr. Lembke: Drug overdose is now the leading cause of accidental death in this country, exceeding death due to motor vehicle accidents or firearms. Driving this statistic are opioid deaths, and driving opioid deaths are prescription opioid deaths, which in turn correlate with excessive opioid prescribing. There are more than 16,000 deaths per year due to prescription opioid overdoses.
What’s really important to understand is that an opioid overdose is not a suicide attempt. The vast majority of these people are not trying to kill themselves, and many of them are not even taking the medication in excess. They’re often taking it as prescribed, but over time are developing a low-grade hypoxia. They may get a minor cold, let’s say a pneumonia, then they’ll take the pills and they’ll fall asleep and won’t wake up again because their tolerance to the euphorigenic and pain-relieving effects of the opioids is very robust, but their tolerance to the respiratory suppressant effect doesn’t keep pace with that. You can feel like you need to take more in order to eliminate the pain, but at the same time the opioid is suppressing your respiratory drive, so you eventually become hypoxemic and can’t breathe anymore and just fall into a gradual sleep that way.
There are more than two million people today who are addicted to prescription opioids. So not only is there this horrible risk of accidental death, but there’s obviously the risk of addiction. We also have heroin overdose deaths and heroin addiction on the rise, most likely on the coattails of the prescription opioid epidemic, driven largely by young people who don’t have reservations about switching from pills to heroin.
Dr. Jain: I was curious about meds like oxycontin, vicodin, and percocet. Are they somehow more addictive than other opioid pills?
Dr. Lembke: All opioids are addictive, especially if you’re dealing with an opioid naive person. But it is certainly true that some of the opioids are more addictive than others because of pharmacology. Let’s consider oxycontin. The major ingredient in oxycontin is oxycodone. Oxycodone is a very potent synthetic opioid. When Purdue formulated it into oxycontin, what they wanted to create was a twice daily pain medication for cancer patients. So they put this hard shell on a huge 12 hours worth of oxycodone. That hard shell was intended to release oxycodone slowly over the course of the day. But what people discovered is that if they chewed the oxycontin and broke that hard shell, then they got a whole day’s worth of very potent oxycodone at once. With that came the typical rush that people who are addicted to opioids describe, as well as this long and powerful and sustained high. So that is why oxycontin was really at the center of the prescription opioid epidemic. It basically was more addictive because of the quantity and potency once that hard shell was cracked.
Dr. Jain: So has the epidemic plateaued? And if so, why?
Dr. Lembke: The last year for which we have CDC data is 2014, when there were more prescription opioid-related deaths, and more opioid prescriptions written by doctors, than in any year prior. This is remarkable when you think that by 2014, there was already wide-spread awareness of the problem. Yet doctors were not changing their prescribing habits, and patients were dying in record numbers.
I’m really looking forward to the next round of CDC data to come out and tell us what 2015 looked like. I do not believe we have reached the end or even the waning days of this epidemic. Doctors continue to write over 250 million opioid prescriptions annually; what was written three decades ago was a mere fraction of that.
Also, the millions of people who have been taking opioids for years are not easily weaned from opioids. They now have neuroadaptive changes in their brains which are very hard to undo. I can tell you from clinical experience that even when I see patients motivated to get off of their prescription opioids, it can take weeks, months, and even years to make that happen.
So I don’t think that the epidemic has plateaued, and this is one of the major points that I try to make in my book. The prescription drug epidemic is the canary in the coal mine. It speaks to deeper problems within medicine. Doctors get reimbursed for prescribing a pill or doing a procedure, but not for talking to our patients and educating them. That’s a problem. With the turmoil in the insurance system, we can’t even establish long-term relationships with our patients. So as a proxy for real healing and attachment, we prescribe opioids. Those kinds of endemic issues within medicine have not changed, and until they do, I believe this prescription drug problem will continue unabated.

No, JAMA Internal Medicine, acupuncture should not be considered an option for preventing migraines.

….And no further research is needed.

These 3 excellent articles provide some background for my blog, but their titles alone are worth leading with:

Acupuncture is astrology practiced with needles.

Acupuncture: 3000 studies and more research is not needed.

Acupuncture is theatrical placebo.

Each of these articles helps highlight an important distinction between an evidence-based medicine perspective and a science-based medicine perspective on acupuncture that will be discussed here.

A recent article in the prestigious JAMA Internal Medicine concluded:

“Acupuncture should be considered as one option for migraine prophylaxis in light of our findings.”

The currently freely accessible article can be found here.

A pay-walled editorial from Dr. Amy Gelfand can be found here.

The trial was registered long after patient recruitment had started, and the trial protocol can be found here.

[Aside: What is the value of registering a trial long after recruitment commenced? Do journals have a responsibility to acknowledge that a trial registration link they publish is for a registration made after the trial commenced? Is trial registration another ritual, like acupuncture?]

Uncritical reports of the results of the trial as interpreted by the authors echoed through both the lay and physician-aimed media.


Coverage by Reuters was somewhat more interesting than the rest. The trial authors’ claim that acupuncture for preventing migraines was ready for prime time was paired with some reservations expressed in the accompanying editorial.


“Placebo response is strong in migraine treatment studies, and it is possible that the Deqi sensation . . . that was elicited in the true acupuncture group could have led to a higher degree of placebo response because there was no attempt made to elicit the Deqi sensation in the sham acupuncture group,” Dr. Amy Gelfand writes in an accompanying editorial.

Come on, Dr. Gelfand, if you had checked the article, you would have seen that Deqi was not measured. If you had checked the literature, even proponents concede that Deqi remains a vague, highly subjective judgment, in this case being made by an unblinded acupuncturist. Basically, the acupuncturist persisted in whatever was being done until there was an indication that a sensation of soreness, numbness, distention, or radiating had been elicited from the patient. What part of a subjective response to acupuncture, with or without Deqi, would you consider NOT a placebo response?

Dr. Gelfand also revealed some reasons why she would bother to write an editorial for a treatment with an incoherent and implausible nonscientific rationale.

“When I’m a researcher, placebo response is kind of a troublesome thing, because it makes it difficult to separate signal from noise,” she said. But when she’s thinking as a doctor about the patient in front of her, placebo response is welcome, Gelfand said.

“You know, what I really want is my patient to feel better, and to be improved and not be in pain. So, as long as something is safe, even if it’s working through a placebo mechanism, it may still be something that some patients might want to use,” she said.

Let’s contemplate the implications of this. This editorial in JAMA Internal Medicine accompanies an article in which the trial authors suggest acupuncture is ready to become a standard treatment for migraine. There is nothing in the article which suggests that the unscientific basis of acupuncture had been addressed, only that it might have achieved a placebo response. Is Dr. Gelfand suggesting that would be sufficient, despite the problems in the trial? What if that became the standard for recommending medications and medical procedures?

With increasing success in getting acupuncture and other so-called “integrative medicine” approaches ensconced in cancer centers and reimbursed by insurance, we will be facing again and again some of the issues that started this blog post. Is acupuncture doing no obvious harm enough of a reason for reimbursing it? Trials like this one can be cited in support of reimbursement.

The JAMA Internal Medicine report of an RCT of acupuncture for preventing migraines

Participants were randomly assigned to one of three groups: true acupuncture, sham acupuncture, or a waiting-list control group.

Participants in the true acupuncture and sham acupuncture groups received treatment 5 days per week for 4 weeks for a total of 20 sessions.

Participants in the waiting-list group did not receive acupuncture but were informed that 20 sessions of acupuncture would be provided free of charge at the end of the trial.

As the editorial comment noted, this is incredibly intensive treatment that burdens patients with coming in five days a week for four weeks. Yet the effects were quite modest in terms of number of migraine attacks, even if statistically significant:

The mean (SD) change in frequency of migraine attacks differed significantly among the 3 groups at 16 weeks after randomization (P < .001); the mean (SD) frequency of attacks decreased in the true acupuncture group by 3.2 (2.1), in the sham acupuncture group by 2.1 (2.5), and the waiting-list group by 1.4 (2.5); a greater reduction was observed in the true acupuncture than in the sham acupuncture group (difference of 1.1 attacks; 95%CI, 0.4-1.9; P = .002) and in the true acupuncture vs waiting-list group (difference of 1.8 attacks; 95%CI, 1.1-2.5; P < .001). Sham acupuncture was not statistically different from the waiting-list group (difference of 0.7 attacks; 95%CI, −0.1 to 1.4; P = .07).
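As a rough back-of-envelope illustration, and nothing more, here is a standardized difference computed from the change scores quoted above. The calculation is mine, assumes roughly equal group sizes, and ignores whatever adjustments the authors’ own model made.

```python
import math

# Change in monthly attacks quoted above: true acupuncture -3.2 (SD 2.1), sham -2.1 (SD 2.5).
difference = 3.2 - 2.1                              # 1.1 fewer attacks than sham acupuncture
crude_pooled_sd = math.sqrt((2.1**2 + 2.5**2) / 2)  # crude pooling, assuming equal group sizes
print(round(difference / crude_pooled_sd, 2))       # about 0.48: a modest standardized difference,
                                                    # bought with 20 clinic visits over 4 weeks
```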

There were no group by time differences in use of medication for migraine. Receiving “true” versus sham acupuncture did not matter.

Four acupoints were used per treatment. All patients received acupuncture on 2 obligatory points, including GB20 and GB8. The 2 other points were chosen according to the syndrome differentiation of meridians in the headache region. The potential acupoints included SJ5, GB34, BL60, SI3, LI4, ST44, LR3, and GB40.20. The use of additional acupoints other than the prescribed ones was not allowed.We chose the prescriptions as a result of a systematic review of ancient and modern literature,22,23 consensus meetings with clinical experts, and experience from our previous study.

Note that the “headache region” is not simply the region of the head where the headaches occur, and there is no scientific basis for the selection of points. Since when does such a stir fry of ancient and contemporary wisdom, consensus meetings with experts, and the clinical experience of the investigators become an acceptable basis for the mechanism being tested in a clinical trial published in a prestigious American medical journal?

What was sham about the sham acupuncture (SA) treatment?

The number of needles, electric stimulation, and duration of treatment in the SA group were identical in the TA group except that an attempt was not made to induce the Deqi sensation. Four nonpoints were chosen according to our previous studies.

From the trial protocol, we learn that the effort to induce the Deqi sensation involves the acupuncturist twirling and rotating the needles.

In a manner that can easily escape notice, the authors indicate that the acupuncture was administered with electrostimulation.

In the methods section, they abruptly state:

Electrostimulation generates an analgesic effect, as manual acupuncture does.21

I wonder if the reviewers or the editorialist checked this reference. It is to an article that provides the insight that “meridians” (the 365 designated acupuncture points) are identified on a particular patient by

feeling for 12 organ-specific pulses located on the wrists and with cosmological interpretations including a representation of five elements: wood, water, metal, earth, and fire.

The authors further state that they undertook a program of research to counter the perception in the United States in the 1970s that acupuncture was quackery and even “Oriental hypnosis.” Their article describes some of the experiments they conducted, including one in which the benefits of finger-pressure acupuncture received by one rabbit were transferred to another via a transfusion of cerebrospinal fluid.

In discussing the results of the present study in JAMA Internal Medicine, the authors again comment in passing:

We added electrostimulation to manual acupuncture because manual acupuncture requires more time until it reaches a similar analgesic effect as electrical stimulation.27 Previous studies have reported that electrostimulation is better than manual acupuncture in relieving pain27-30 and could induce a longer lasting effect.28

The citations are to methodologically poor laboratory studies in which dramatic results are often obtained with very small cell size (n= 10).

Can we dispense with the myth that the acupuncture provided in this study is an extension of traditional Chinese needle therapy?

It is high time that we dispense with the notion that acupuncture applied to migraines and other ailments represents a traditional Chinese medicine that is therefore not subject to any effort to critique its plausibility and status as a science-based treatment. If we dispense with that idea, we still have to  confront how unscientific and nonsensical the rationale is for the highly ritualized treatment provided in this study.

An excellent article by Ben Kavoussi offers a carefully documented debunking of:

 reformed and “sanitized” acupuncture and the makeshift theoretical framework of Maoist China that have flourished in the West as “Traditional,” “Chinese,” “Oriental,” and most recently as “Asian” medicine.

Kavoussi, who studied to become an acupuncturist, notes that:

Traditional theories for selecting points and means of stimulation are not based on an empirical rationale, but on ancient cosmology, astrology and mythology. These theories significantly resemble those that underlined European and Islamic astrological medicine and bloodletting in the Middle-Ages. In addition, the alleged predominance of acupuncture amongst the scholarly medical traditions of China is not supported by evidence, given that for most of China’s long medical history, needling, bloodletting and cautery were largely practiced by itinerant and illiterate folk-healers, and frowned upon by the learned physicians who favored the use of pharmacopoeia.

In the early 1930s a Chinese pediatrician by the name of Cheng Dan’an (承淡安, 1899-1957) proposed that needling therapy should be resurrected because its actions could potentially be explained by neurology. He therefore repositioned the points towards nerve pathways and away from blood vessels-where they were previously used for bloodletting. His reform also included replacing coarse needles with the filiform ones in use today.38 Reformed acupuncture gained further interest through the revolutionary committees in the People’s Republic of China in the 1950s and 1960s along with a careful selection of other traditional, folkloric and empirical modalities that were added to scientific medicine to create a makeshift medical system that could meet the dire public health and political needs of Maoist China while fitting the principles of Marxist dialectics. In deconstructing the events of that period, Kim Taylor in her remarkable book on Chinese medicine in early communist China, explains that this makeshift system has achieved the scale of promotion it did because it fitted in, sometimes in an almost accidental fashion, with the ideals of the Communist Revolution. As a result, by the 1960s acupuncture had passed from a marginal practice to an essential and high-profile part of the national health-care system under the Chinese Communist Party, who, as Kim Taylor argues, had laid the foundation for the institutionalized and standardized format of modern Chinese medicine and acupuncture found in China and abroad today.39 This modern construct was also a part of the training of the “barefoot doctors,” meaning peasants with an intensive three- to six-month medical and paramedical training, who worked in rural areas during the nationwide healthcare disarray of the Cultural Revolution era.40 They provided basic health care, immunizations, birth control and health education, and organized sanitation campaigns. Chairman Mao believed, however, that ancient natural philosophies that underlined these therapies represented a spontaneous and naive dialectical worldview based on social and historical conditions of their time and should be replaced by modern science.41 It is also reported that he did not use acupuncture and Chinese medicine for his own ailments.42

What is a suitable comparison/control group for a theatrical administration of a placebo?

A randomized double-blind crossover pilot study published in NEJM highlights some of the problems arising from poorly chosen control groups. The study compared an inhaled albuterol bronchodilator to one of three control conditions: a placebo inhaler, sham acupuncture, or no intervention. Subjective self-report measures of perceived improvement in asthma symptoms and perceived credibility of the treatments revealed only that the no-intervention condition was inferior to the active treatment (inhaled albuterol) and the two placebo conditions; no difference was found between the active treatment and the placebo conditions. However, strong differences were found between the active treatment and the three comparison/control conditions on an objective measure of physiological response: improvement in forced expiratory volume (FEV1), measured with spirometry.

One take-away lesson is that we should be careful about accepting subjective self-report measures when objective measures are available. One objective measure in the present study was the use of medication for migraines, and there were no differences between groups. This point is missed in both the target article in JAMA Internal Medicine and the accompanying editorial.

The editorial does comment on the acupuncturists being unblinded – they clearly knew when they were providing the preferred “true” acupuncture and when they were providing sham. They had instructions to avoid creating a Deqi sensation in the sham group, but some latitude to keep working until it was achieved in the “true” group. Unblinded treatment providers are always a serious risk of bias in clinical trials, but here we have a trial where the primary outcomes are subjective, the scientific status of Deqi is dubious, and the providers might be seen as highly motivated to favor the “true” treatment.

I’m not sure why the editorialist was not stopped in her tracks by the unblinded acupuncturists – or, for that matter, why the journal published this article. But let’s ponder a bit the difficulties in coming up with a suitable comparison/control group for what is – until proven otherwise – a theatrical and highly ritualized placebo. If a treatment has no scientifically valid active ingredient, how do we construct a comparison/control group that differs only in the absence of that ingredient but is otherwise equivalent?

There is a long history of futile efforts to apply sham acupuncture, defined as needling at what practitioners consider the inappropriate meridians. An accumulation of failures to distinguish such sham from “true” acupuncture in clinical trials has led to arguments that the distinction may not be valid: the efficacy of acupuncture may depend only on the procedure, not on the choice of a correct meridian. Other studies would seem to show some advantage to the active or “true” treatment, but these are generally clinical trials with a high risk of bias, especially the inability to blind practitioners as to which treatment they are providing.

There have been some clever efforts to develop sham acupuncture techniques that can fool even experienced practitioners. A recent PLOS One article tested needles that collapse into themselves.

Up to 68% of patients and 83% of acupuncturists correctly identified the treatment, but for patients the distribution was not far from 50/50. Also, there was a significant interaction between actual or perceived treatment and the experience of de qi (p = 0.027), suggesting that the experience of de qi and possible non-verbal clues contributed to correct identification of the treatment. Yet, of the patients who perceived the treatment as active or placebo, 50% and 23%, respectively, reported de qi. Patients’ acute pain levels did not influence the perceived treatment. In conclusion, acupuncture treatment was not fully double-blinded which is similar to observations in pharmacological studies. Still, the non-penetrating needle is the only needle that allows some degree of practitioner blinding. The study raises questions about alternatives to double-blind randomized clinical trials in the assessment of acupuncture treatment.

This PLOS One study is supplemented by a recent review in PLOS One, Placebo Devices as Effective Control Methods in Acupuncture Clinical Trials:

Thirty-six studies were included for qualitative analysis while 14 were in the meta-analysis. The meta-analysis does not support the notion of either the Streitberger or the Park Device being inert control interventions while none of the studies involving the Takakura Device was included in the meta-analysis. Sixteen studies reported the occurrence of adverse events, with no significant difference between verum and placebo acupuncture. Author-reported blinding credibility showed that participant blinding was successful in most cases; however, when blinding index was calculated, only one study, which utilised the Park Device, seemed to have an ideal blinding scenario. Although the blinding index could not be calculated for the Takakura Device, it was the only device reported to enable practitioner blinding. There are limitations with each of the placebo devices and more rigorous studies are needed to further evaluate their effects and blinding credibility.
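For readers unfamiliar with the blinding index mentioned in the quoted review, here is a minimal sketch of one commonly used version, Bang’s blinding index, computed within a single treatment arm. The counts in the example are invented for illustration; they are not taken from any of the studies discussed here.

```python
# Minimal sketch of Bang's blinding index, computed within a single treatment arm.
# The counts below are invented for illustration, not taken from any study discussed here.

def bang_blinding_index(n_correct: int, n_incorrect: int, n_dont_know: int) -> float:
    """Proportion of correct minus incorrect guesses about treatment assignment.

    0 is consistent with random guessing (blinding preserved), +1 means everyone
    guessed their assignment correctly (unblinded), -1 means everyone guessed the opposite.
    """
    n_total = n_correct + n_incorrect + n_dont_know
    return (n_correct - n_incorrect) / n_total

# Hypothetical arm of 50 patients randomized to a placebo needle:
print(bang_blinding_index(n_correct=20, n_incorrect=18, n_dont_know=12))  # 0.04, close to ideal blinding
```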

Really, must we await better technology that more successfully fools acupuncturists and their patients about whether the needles are actually penetrating the skin?

Results of the present study in JAMA Internal Medicine are seemingly contradicted by the results of a large German trial that found:

Results Between baseline and weeks 9 to 12, the mean (SD) number of days with headache of moderate or severe intensity decreased by 2.2 (2.7) days from a baseline of 5.2 (2.5) days in the acupuncture group, compared with a decrease of 2.2 (2.7) days from a baseline of 5.0 (2.4) days in the sham acupuncture group, and by 0.8 (2.0) days from a baseline of 5.4 (3.0) days in the waiting list group. No difference was detected between the acupuncture and the sham acupuncture groups (0.0 days, 95% confidence interval, −0.7 to 0.7 days; P = .96) while there was a difference between the acupuncture group compared with the waiting list group (1.4 days; 95% confidence interval, 0.8-2.1 days; P<.001). The proportion of responders (reduction in headache days by at least 50%) was 51% in the acupuncture group, 53% in the sham acupuncture group, and 15% in the waiting list group.

Conclusion Acupuncture was no more effective than sham acupuncture in reducing migraine headaches although both interventions were more effective than a waiting list control.
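To make the pattern in the quoted results explicit, here is the simple arithmetic behind the between-group comparisons. The mean reductions are taken directly from the passage above, not recomputed from raw data.

```python
# The arithmetic behind the quoted comparisons: mean reductions in days with headache
# of moderate or severe intensity (baseline to weeks 9-12), taken from the passage above.
reduction_in_headache_days = {
    "acupuncture": 2.2,
    "sham_acupuncture": 2.2,
    "waiting_list": 0.8,
}

# Between-group differences in change from baseline:
print(reduction_in_headache_days["acupuncture"] - reduction_in_headache_days["sham_acupuncture"])
# 0.0 -> no advantage over sham
print(round(reduction_in_headache_days["acupuncture"] - reduction_in_headache_days["waiting_list"], 1))
# 1.4 -> an advantage only over doing nothing (the waiting list)
```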

I welcome someone with more time on their hands to compare and contrast the results of these two studies and decide which one has more credibility.

Maybe we should step back and ask why anyone cares about such questions when there is such doubt that a plausible scientific mechanism is in play.

Time for JAMA Internal Medicine to come clean

The JAMA Internal Medicine article on acupuncture for prophylaxis of migraines is yet another example of a publication where revelation of earlier drafts, reviewer critiques, and author responses would be enlightening. Just what standard are the authors being held to? What issues were raised in the review process? Beyond resolving crucial limitations like the blinding of acupuncturists, under what conditions would the journal conclude that studies of acupuncture in general are too scientifically unsound and medically irrelevant to warrant publication in a prestigious JAMA journal?

Alternatively, is the journal willing to go on record that it is sufficient to establish that patients are satisfied with a pain treatment in terms of self-reported subjective experiences? Could we then simply close the issue of whether there is a plausible scientific mechanism involved, where the existence of one can be seriously doubted? If so, why stop with evaluations in which subjective pain or days without pain serve as the primary outcome?

We must question the wisdom of JAMA Internal Medicine in inviting Dr. Amy Gelfand to provide editorial comment. She is apparently willing to allow that demonstration of a placebo response is sufficient for a treatment to be accepted by clinicians. She is also attached to the University of California, San Francisco Headache Center, which offers “alternative medicine, such as acupuncture, herbs, massage and meditation for treating headaches.” Endorsement of acupuncture as effective in a prestigious journal becomes part of the evidence considered for its reimbursement. I think there are enough editorial commentators out there without such conflicts of interest.

 

I will soon be offering scientific writing courses on the web as I have been doing face-to-face for almost a decade. Sign up at my new website to get notified about these courses, as well as upcoming blog posts at this and other blog sites. Get advance notice of forthcoming e-books and web courses. Lots to see at CoyneoftheRealm.com.

Danish RCT of cognitive behavior therapy for whatever ails your physician about you

I was asked by a Danish journalist to examine a randomized controlled trial (RCT) of cognitive behavior therapy (CBT) for functional somatic symptoms. I had not previously given the study a close look.

I was dismayed by how highly problematic the study was in so many ways.

I doubted that the results of the study showed any benefit to the patients or had any relevance to healthcare.

I then searched and found the website for the senior author’s clinical offerings. I suspected that the study was a mere experimercial or marketing effort for the services he offered.

Overall, I think what I found hiding in plain sight has broader relevance to scrutinizing other studies claiming to evaluate the efficacy of CBT for what are primarily physical illnesses, not psychiatric disorders. Look at the other RCTs. I am confident you will find similar problems. But then there is the bigger picture…

[A controversial assessment ahead? You can stop here and read the full text of the RCT and its trial registration before continuing with my analysis.]

Schröder A, Rehfeld E, Ørnbøl E, Sharpe M, Licht RW, Fink P. Cognitive–behavioural group treatment for a range of functional somatic syndromes: randomised trial. The British Journal of Psychiatry. Published online April 13, 2012.

A summary overview of what I found:

 The RCT:

  • Was unblinded to patients, interventionists, and to the physicians continuing to provide routine care.
  • Had a grossly unmatched, inadequate control/comparison group that leads to any benefit from nonspecific (placebo) factors in the trial counting toward the estimated efficacy of the intervention.
  • Relied on subjective self-report measures for primary outcomes.
  • With such a familiar trio of design flaws, even an inert homeopathic treatment would be found effective, if it were provided with the same positive expectations and support as the CBT in this RCT. [This may seem a flippant comment that reflects on my credibility, not the study. But please keep reading to my detailed analysis where I back it up.]
  • The study showed an inexplicably high rate of deterioration in both treatment and control group. Apparent improvement in the treatment group might only reflect less deterioration than in the control group.
  • The study focused on unvalidated psychiatric diagnoses being applied to patients with multiple somatic complaints, some of whom may not yet have had a medical diagnosis, but most of whom clearly had confirmed physical illnesses.

But wait, there is more!

  • It’s not CBT that was evaluated, but a complex multicomponent intervention in which what was called CBT is embedded in a way that its contribution cannot be evaluated.

The “CBT” did not map well onto international understandings of the assumptions and delivery of CBT. The complex intervention included weeks of indoctrinating the patient with an understanding of their physical problems that incorporated simplistic pseudoscience before any CBT was delivered. It focused on goals imposed by a psychiatrist that didn’t necessarily fit with patients’ sense of their most pressing problems and the solutions.

And the kicker:

  • The authors switched primary outcomes – reconfiguring the scoring of their subjective self-report measures years into the trial, based on a peeking at the results with the original scoring.

The investigators have a website marketing their services. Rather than a quality contribution to the literature, this study can be seen as an experimercial doomed to bad science and questionable results from before the first patient was enrolled. An undeclared conflict of interest in play? There is also another serious undeclared conflict of interest for one of the authors.

For the uninformed and gullible, the study handsomely succeeds as an advertisement for the investigators’ services to professionals and patients.

Personally, I would be indignant if a primary care physician tried to refer me or a friend or family member to this trial. In the absence of overwhelming evidence to the contrary, I assume that people around me who complain of physical symptoms have legitimate physical concerns. If they do not yet have a confirmed diagnosis, it serves little purpose to stop the probing and refer them to psychiatrists. This trial operates with an anachronistic, Victorian definition of a psychosomatic condition.

But why should we care about a patently badly conducted trial with switched outcomes? Is it only a matter of something being rotten in the state of Denmark? Aside from the general impact on the existing literature concerning CBT for somatic conditions, results of this trial were entered into a Cochrane review of nonpharmacological interventions for medically unexplained symptoms. I previously complained about one of the authors of this RCT also being listed as an author on another Cochrane review protocol. Prior to that, I complained to Cochrane about this author’s larger research group influencing a decision to include switched outcomes in another Cochrane review. A lot of us rightfully depend heavily on the verdict of Cochrane reviews for deciding best evidence. That trust is being put in jeopardy.

Detailed analysis

1. This is an unblinded trial, a particularly weak methodology for examining whether a treatment works.

The letter that alerted physicians to the trial had essentially encouraged them to refer patients they were having difficulty managing.

‘Patients with a long-term illness course due to medically unexplained or functional somatic symptoms who may have received diagnoses like fibromyalgia, chronic fatigue syndrome, whiplash associated disorder, or somatoform disorder.

Patients and the physicians who referred them subsequently got feedback about which group patients were assigned to, either routine care or what was labeled as CBT. This information could have had a strong influence on the outcomes that were reported, particularly for the patients left in routine care.

Patients’ learning that they had not been assigned to the intervention group was undoubtedly disappointing and demoralizing. The information probably did nothing to improve the positive expectations and support available to patients in routine care; it could have had a nocebo effect. The feedback may have contributed to the otherwise inexplicably high rates of subjective deterioration [noted below] reported by patients left in the routine care condition. In contrast, the disclosure that patients had been assigned to the intervention group undoubtedly boosted the morale of both patients and physicians and also increased the gratitude of the patients. This would be reflected in their responses to the subjective outcome measures.

The gold standard alternative to an unblinded trial is a double-blind, placebo-controlled trial in which neither providers, nor patients, nor even the assessors rating outcomes know to which group particular patients were assigned. Of course, this is difficult to achieve in a psychotherapy trial. Yet a fair alternative is a psychotherapy trial in which patients and those who refer them are blind to the nature of the different treatments, and in which an effort is made to communicate credible positive expectations about the comparison control group.

Conclusion: A lack of blinding seriously biases this study toward finding a positive effect for the intervention, regardless of whether the intervention has any active, effective component.

2. A claim that this is a randomized controlled trial depends on the adequacy of the control offered by the comparison group, enhanced routine care. Just what is being controlled by the comparison? In evaluating a psychological treatment, it’s important that the comparison/control group offers the same frequency and intensity of contact, positive expectations, attention and support. This trial decidedly did not.

There were large differences between the intervention and control conditions in the amount of contact time. Patients assigned to the cognitive therapy condition received an additional nine group sessions with a psychiatrist, each lasting 3.5 hours, plus the option of even more consultations. The more than 30 hours of contact time with a psychiatrist would be very attractive to patients who wanted such attention and could not otherwise obtain it. For some, it undoubtedly represented an opportunity to have someone listen to their complaints of pain and suffering in a way that had not previously happened. This is also far more than the intensity of psychotherapy typically offered in clinical trials, which is closer to 10 to 15 fifty-minute sessions.
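The contact-time imbalance is easy to quantify. A minimal sketch, using the session counts stated above and assuming the conventional 50-minute session length for typical psychotherapy trials:

```python
# Contact-time arithmetic, using the session counts stated above and assuming a
# conventional 50-minute session length for comparison psychotherapy trials.
intervention_hours = 9 * 3.5                           # nine 3.5-hour group sessions = 31.5 hours
typical_trial_hours = [n * 50 / 60 for n in (10, 15)]  # 10 to 15 fifty-minute sessions

print(intervention_hours)                          # 31.5
print([round(h, 1) for h in typical_trial_hours])  # [8.3, 12.5] -- roughly a quarter to two-fifths
                                                   # of the contact time offered in this trial
```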

The intervention group thus received substantially more support and contact time, which was delivered with more positive expectations. This wealth of nonspecific factors favoring the intervention group compromises any effort to disentangle the specific effects of an active ingredient in the CBT intervention package. From what has been said so far, the trial’s providing a fair and generalizable evaluation of the CBT intervention is nigh impossible.

Conclusion: This is a methodologically poor choice of control groups with the dice loaded to obtain a positive effect for CBT.

3. The primary outcomes, both as originally scored and after switching, are subjective self-report measures that are highly responsive to nonspecific treatment effects and to alleviation of mild depressive symptoms and demoralization. They are not consistently related to objective changes in functioning. They are particularly problematic when used as outcome measures in an unblinded clinical trial with an inadequate control group.

There have been consistent demonstrations that assigning patients to inert treatments and measuring the outcomes with subjective measures may register improvements that will not correspond to what would be found with objective measures.

For instance, a provocative New England Journal of Medicine study showed that sham acupuncture was as effective as an established medical treatment (an albuterol inhaler) for asthma when judged with subjective measures, but the established medical treatment showed a large superiority on objective measures.

There have been a number of demonstrations that treatments such as the one offered in the present study to patient populations similar to those in the study produce changes in subjective self-report that are not reflected in objective measures.

Much of the improvement in primary outcomes occurred before the first assessment after baseline and not very much afterwards. The early response is consistent with a placebo response.

The study actually included one largely unnoticed objective measure: utilization of routine care. Presumably, if the CBT were as effective as claimed, it would have produced a significant reduction in healthcare utilization. After all, isn’t the point of this trial to demonstrate that CBT can reduce the healthcare utilization associated with (as yet) medically unexplained symptoms? Curiously, utilization of routine care did not differ between groups.

The combination of the choice of subjective outcomes, unblinded nature of the trial, and poorly chosen control group bring together features that are highly likely to produce the appearance of positive effects, without any substantial benefit to the functioning and well-being of the patients.

Conclusion: Evidence for the efficacy of a CBT package for somatic complaints that depends solely on subjective self-report measures is unreliable, and unlikely to generalize to more objective measures of meaningful impact on patients’ lives.

4. We need to take into account the inexplicably high rates of deterioration in both groups, but particularly in the control group receiving enhanced care.

There was unexplained deterioration in 50% of the control group and 25% of the intervention group. Rates of deterioration are given only a one-sentence mention in the article, but they deserve much more attention. These rates need to qualify and dampen any generalizable clinical interpretation of other claims about outcomes attributed to the CBT. We need to keep in mind that clinical trials cannot determine how effective treatments are, only how different a treatment is from a control group. So an effect claimed for a treatment can largely or entirely come from deterioration in the control group, not from anything the treatment offers. The claim of success for CBT probably depends largely on the deterioration in the control group.

One interpretation of this trial is that spending an extraordinary 30 hours with a psychiatrist leads to only half the deterioration experienced with nothing more than routine care. But this begs the question of why so many of the patients left in routine care were deteriorating. What could possibly be going on?
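A back-of-the-envelope illustration of that point, using the deterioration rates reported above: the entire between-group “effect” can be generated by the control group doing badly, even though a quarter of the intervention group also deteriorated.

```python
# Deterioration rates reported in the text above, as proportions.
deterioration_control = 0.50
deterioration_intervention = 0.25

# The apparent between-group "effect" on deterioration is just the risk difference:
risk_difference = deterioration_control - deterioration_intervention
print(risk_difference)  # 0.25 in favor of the intervention

# Note: the intervention group still deteriorated at a substantial rate (25%); a
# between-group difference of this kind says nothing about whether anyone improved.
```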

Conclusion: Unexplained deterioration in the control group may explain apparent effects of the treatment, but both groups are doing badly.

5. The diagnosis of “functional somatic symptoms” or, as the authors prefer, Severe Bodily Distress Syndromes, is considered by the authors to be a psychiatric diagnosis. It is not accepted as a valid diagnosis internationally. Its validation is limited to work done almost entirely within the author group, which is explicitly labeled as “preliminary.” This biased sample of patients is quite heterogeneous, beyond their physicians having difficulty managing them. They have a full range of subjective complaints and documented physical conditions. Many of these patients would not be considered as primarily having a psychiatric disorder internationally, and certainly not within the US, except where they had major depression or an anxiety disorder. Such psychiatric disorders were not exclusion criteria.

Once sent on the pathway to a psychiatric diagnosis by their physicians’ making a referral to the study, patients had to meet additional criteria:

To be eligible for participation individuals had to have a chronic (i.e. of at least 2 years duration) bodily distress syndrome of the severe multi-organ type, which requires functional somatic symptoms from at least three of four bodily systems, and moderate to severe impairment in daily living.

The condition identified in the title of the article is not validated as a psychiatric diagnosis. The two papers the authors cite in support are their own studies (1, 2), drawn from a single sample. The title of one of these papers makes a rather immodest claim:

Fink P, Schröder A. One single diagnosis, bodily distress syndrome, succeeded to capture 10 diagnostic categories of functional somatic syndromes and somatoform disorders. Journal of Psychosomatic Research. 2010 May 31;68(5):415-26.

In neither the two papers nor the present RCT is there sufficient effort to rule out a physical basis for the complaints qualifying these patients for a psychiatric diagnosis. There is also a lack of follow-up to see if physical diagnoses were later applied.

Citation patterns of these papers strongly suggest that the authors have not gotten much traction internationally. The criterion of symptoms from three out of four bodily systems is arbitrary and unvalidated. Many patients with known physical conditions would meet these criteria without any psychiatric diagnosis being warranted.

The authors relate what is essentially their homegrown diagnosis to functional somatic syndromes, diagnoses which are themselves subject to serious criticism. See, for instance, the work of Allen Frances, M.D., who had been the chair of the American Psychiatric Association’s Diagnostic and Statistical Manual (DSM-IV) Task Force. He became a harsh critic of its shortcomings and of the APA’s failure to correct the coverage of functional somatic syndromes in the next DSM.

Mislabeling Medical Illness As Mental Disorder

Unless DSM-5 changes these incredibly over inclusive criteria, it will greatly increase the rates of diagnosis of mental disorders in the medically ill – whether they have established diseases (like diabetes, coronary disease or cancer) or have unexplained medical conditions that so far have presented with somatic symptoms of unclear etiology.

And:

The diagnosis of mental disorder will be based solely on the clinician’s subjective and fallible judgment that the patient’s life has become ‘subsumed’ with health concerns and preoccupations, or that the response to distressing somatic symptoms is ‘excessive’ or ‘disproportionate,’ or that the coping strategies to deal with the symptom are ‘maladaptive’.

And:

These are inherently unreliable and untrustworthy judgments that will open the floodgates to the overdiagnosis of mental disorder and promote the missed diagnosis of medical disorder.

The DSM-5 Task Force refused to adopt the changes proposed by Dr. Frances.

Bad News: DSM 5 Refuses to Correct Somatic Symptom Disorder

Leading Frances to apologize to patients:

My heart goes out to all those who will be mislabeled with this misbegotten diagnosis. And I regret and apologize for my failure to be more effective.

The chair of the DSM-5 Somatic Symptom Disorder work group has delivered a scathing critique of the very concept of medically unexplained symptoms.

Dimsdale JE. Medically unexplained symptoms: a treacherous foundation for somatoform disorders? Psychiatric Clinics of North America. 2011;34(3):511-3.

Dimsdale noted that applying this psychiatric diagnosis sidesteps the quality of medical examination that led up to it. Furthermore:

Many illnesses present initially with nonspecific signs such as fatigue, long before the disease progresses to the point where laboratory and physical findings can establish a diagnosis.

And such diagnoses may encompass far too varied a group of patients for any intervention to make sense:

One needs to acknowledge that diseases are very heterogeneous. That heterogeneity may account for the variance in response to intervention. Histologically, similar tumors have different surface receptors, which affect response to chemotherapy. Particularly in chronic disease presentations such as irritable bowel syndrome or chronic fatigue syndrome, the heterogeneity of the illness makes it perilous to diagnose all such patients as having MUS and an underlying somatoform disorder.

I tried making sense of a table of the additional diagnoses that the patients in this study had been given. A considerable proportion of patients had physical conditions that would not be considered psychiatric problems in the United States. Many patients could be suffering from multiple symptoms arising not only from these conditions, but also from side effects of the medications they were prescribed. It is very difficult to manage the multiple medications required by multiple comorbidities. Community physicians found these patients taxing of their competence and of the time they could spend with them.

table of functional somatic symptoms

Most patients had a diagnosis of “functional headaches.” It’s not clear what this designation means, but conceivably it could include migraine headaches, which are accompanied by multiple physical complaints. CBT is not an evidence-based treatment of choice for functional headaches, much less migraines.

Over a third of the patients had irritable bowel syndrome (IBS). A systematic review of the comorbidity of irritable bowel syndrome concluded that physical comorbidity is the norm in IBS:

The nongastrointestinal nonpsychiatric disorders with the best-documented association are fibromyalgia (median of 49% have IBS), chronic fatigue syndrome (51%), temporomandibular joint disorder (64%), and chronic pelvic pain (50%).

In the United States, many patients and specialists would consider treating irritable bowel syndrome as a psychiatric condition offensive and counterproductive. There is growing evidence that irritable bowel syndrome involves a disturbance in the gut microbiota. It involves a gut-brain interaction, but the primary direction of influence is from the disturbance in the gut to the brain. Anxiety and depression symptoms are secondary manifestations, a product of activity in the gut influencing the nervous system.

Most of the patients in the sample had a diagnosis of fibromyalgia and over half of all patients in this study had a diagnosis of chronic fatigue syndrome.

Other patients had diagnosable anxiety and depressive disorders, which, particularly at the lower end of severity, are responsive to nonspecific treatments.

Undoubtedly many of these patients, perhaps most of them, are demoralized by not being able to get a diagnosis for what they have good reason to believe is a medical condition, on top of the discomfort, pain, and interference with their lives that they are experiencing. They could be experiencing demoralization secondary to physical illness.

These patients presented with pain, fatigue, general malaise, and demoralization. I have trouble imagining how their specific most pressing concerns could be addressed in group settings. These patients pose particular problems for making substantive clinical interpretation of outcomes that are highly general and subjective.

Conclusion: Diagnosing patients with multiple physical symptoms as having a psychiatric condition is highly controversial. Results will not generalize to countries and settings where the practice is not accepted. Many of the patients involved in the study had recognizable physical conditions, and yet they are being shunted to psychiatrists who focus only on their attitudes toward their symptoms. They are being denied the specialist care and treatments that might conceivably reduce the impact of their conditions on their lives.

6. The “CBT” offered in this study is part of a complex, multicomponent treatment that does not resemble cognitive behavior therapy as it is practiced in the United States.

As seen in Figure 1 of the article, the multicomponent intervention is quite complex and consists of more than cognitive behavior therapy. Moreover, at least in the United States, CBT has distinctive elements of collaborative empiricism: patients and therapist work together selecting issues on which to focus and developing strategies, with the patients reporting back on their efforts to implement them. From the details available in the article, the treatment sounded much more like exhortation or indoctrination, even arguing with the patients if necessary. An English version of the educational material used in the initial sessions, available on the web, confirms that a lot of condescending pseudoscience was presented to convince the patients that their problems were largely in their heads.

Without a clear application of learning theory, behavioral analysis, or cognitive science, the “CBT” treatment offered in this RCT has much more in common with the creative novation therapy offered by Hans Eysenck, which is now known to have been justified with fraudulent data. Indeed, comparing the educational materials for this study with what was offered in Eysenck’s studies reveals striking similarities. Eysenck advanced the claim that his intervention could prevent cardiovascular disease and cancer and overcome iatrogenic effects. I know, this sounds really crazy, but see my careful documentation elsewhere.

Conclusion: The embedding of an unorthodox “CBT” in a multicomponent intervention in this study does not allow isolating any specific, active component of CBT that might be at work.

7. The investigators disclose having altered their scoring of their primary outcome years after the trial began, and probably after a lot of outcome data had been collected.

I found a casual disclosure in the method section of this article unsettling, particularly when it is set against the original trial registration:

We found an unexpected moderate negative correlation of the physical and mental component summary measures, which are constructed as independent measures. According to the SF-36 manual, a low or zero correlation of the physical and mental components is a prerequisite of their use.23 Moreover, three SF-36 scales that contribute considerably to the PCS did not fulfil basic scaling assumptions.31 These findings, together with a recent report of problems with the PCS in patients with physical and mental comorbidity,32 made us concerned that the PCS would not reliably measure patients’ physical health in the study sample. We therefore decided before conducting the analysis not to use the PCS, but to use instead the aggregate score as outlined above as our primary outcome measure. This decision was made on 26 February 2009 and registered as a protocol change at clinicaltrials.gov on 11 March 2009. Only baseline data had been analysed when we made our decision and the follow-up data were still concealed.

Switching outcomes, particularly after some results are known, constitutes a serious violation of best research practices and leads to suspicion of the investigators refining their hypotheses after they had peeked at the data. See How researchers dupe the public with a sneaky practice called “outcome switching”.

The authors had originally proposed a scoring consistent with a very large body of literature. Dropping the original scoring precludes any direct comparison with that body of research, including basic norms. They claim that they switched scoring because two key subscales were correlated in the opposite direction from what is reported in the larger literature. That is a troubling indication that something went wrong in the recruitment of the sample. It should not be swept under the rug.
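For what it is worth, the kind of check the authors describe is trivial to run. A minimal sketch, using hypothetical baseline scores rather than the trial’s data:

```python
import numpy as np

# Sketch of the kind of check the authors describe: correlating SF-36 physical (PCS)
# and mental (MCS) component summary scores at baseline. The scores below are
# hypothetical stand-ins, deliberately constructed to mimic the negative correlation
# the authors report; they are not the trial's data.
pcs = np.array([32.1, 28.4, 41.0, 35.6, 30.2, 38.9, 27.5, 33.3])
mcs = np.array([48.2, 52.7, 39.5, 44.1, 50.3, 41.8, 53.6, 46.9])

r = np.corrcoef(pcs, mcs)[0, 1]
print(round(r, 2))  # prints a clearly negative value for these made-up scores

# PCS and MCS are constructed to be roughly uncorrelated in population norms; a moderate
# negative correlation in a trial sample flags a problem with the sample, not a license
# to rescore the primary outcome after the trial is underway.
```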

The authors claim that they switched outcomes based only on an examination of baseline data from their study. However, one of the authors, Michael Sharpe, is also an author on the controversial PACE trial. A parallel switch was made to the scoring of the subjective self-reports in that trial. When the data were eventually re-analyzed using the original scoring, any positive findings for the trial were substantially reduced and arguably disappeared.

Even if the authors of the present RCT did not peek at their outcome data before deciding to switch scoring of the primary outcome, they certainly had strong indications from other sources that the original scoring would produce weak or null findings. In 2009, one of the authors, Michael Sharpe, had access to results of a relevant trial: what is called the FINE trial had null findings, which affected decisions to switch outcomes in the PACE trial. Is it just a coincidence that the scoring of the outcomes was then switched for the present RCT?

Conclusion: The outcome switching in the present trial represents bad research practice. For the trial to have any credibility, the investigators should make their data publicly available so that they can be independently re-analyzed with the original scoring of the primary outcomes.

The senior author’s clinic

I invite readers to take a virtual tour of the website for the senior author’s clinical services. Much of it is available in English. Recently, I blogged about dubious claims of a health care system in Detroit achieving a goal of “zero suicide.” I suggested that the evidence for this claim was quite dubious, but that it was a powerful advertisement for the health care system. I think the present report of an RCT can similarly be seen as an infomercial for training and clinical services available in Denmark.

Conflict of interest

No conflict of interest is declared for this RCT. Under somewhat similar circumstances, I formally complained about undeclared conflicts of interest in a series of papers published in PLOS One. A correction has been announced, but not yet posted.

Aside from the senior author’s need to declare a conflict of interest, the same can be said for one of the authors, Michael Sharpe.

Apart from his professional and reputational interest (his whole career has been built on making strong claims about such interventions), Sharpe works for insurance companies and publishes on the subject. He declared a conflict of interest for the PACE trial:

MS has done voluntary and paid consultancy work for government and for legal and insurance companies, and has received royalties from Oxford University Press.

Here’s Sharpe’s report written for the social benefits reinsurance company UnumProvident.

If the results of this trial are accepted at face value, they will lend credibility to claims that effective interventions are available to reduce social disability. It doesn’t matter whether the intervention is actually effective; persons receiving social disability payments can be disqualified because they are not enrolled in such treatment.

Effects on the credibility of Cochrane Collaboration reviews

The switched outcomes of the trial were entered into a Cochrane systematic review, to which primary care health professionals look for guidance in dealing with a complex clinical situation. The review gives no indication of the host of problems that I exposed here. Furthermore, I have glanced at some of the other trials included and I see similar difficulties.

I have been unable to convince Cochrane to clean up conflicts of interest attached to switched outcomes being entered into reviews. Perhaps some of my readers will want to approach Cochrane to revisit this issue.

I think this post raises larger issues about whether Cochrane has any business conducting and disseminating reviews of such a bogus psychiatric diagnosis, medically unexplained symptoms. These reviews do patients no good, and may sidetrack them from getting the medical care they deserve. They do, however, serve special interests, including disability insurance companies.

Special thanks to John Peters and to Skeptical Cat for their assistance with my writing this blog. However, I have sole responsibility for any excesses or distortions.