Calling out pseudoscience, radically changing the conversation about Amy Cuddy’s power posing paper

Part 1: Reviewed as the clinical trial that it is, the power posing paper should never have been published.

Has too much already been written about Amy Cuddy’s power pose paper? The conversation should not be stopped until its focus shifts and we change our ways of talking about psychological science.

The dominant narrative is now that a junior scientist published an influential paper on power posing and was subjected to harassment and shaming by critics, pointing to the need for greater civility in scientific discourse.

Attention has shifted away from the scientific quality of the paper and the dubious products the paper has been used to promote, and toward the behavior of its critics.

Amy Cuddy and powerful allies are given forums to attack and vilify critics, accusing them of damaging the environment in which science is done and discouraging prospective early career investigators from entering the field.

Meanwhile, Amy Cuddy commands large speaking fees and has a top-selling book claiming that the original paper provides strong scientific evidence that simple behavioral manipulations can alter mind-body relations and produce socially significant changes in behavior.

This misrepresentation of psychological science does potential harm to consumers and the reputation of psychology among lay persons.

This blog post is intended to restart the conversation by reconsidering the original paper as a clinical and health psychology randomized controlled trial (RCT) and, on that basis, identifying the kinds of inferences that are warranted from it.

In the first of a two post series, I argue that:

The original power pose article in Psychological Science should never have been published.

-Basically, we have a therapeutic analogue intervention delivered in two 1-minute manipulations by unblinded experimenters who had flexibility in what they did, what they communicated to participants, and which data they chose to analyze and how.

-It’s unrealistic to expect that two 1-minute behavioral manipulations would have robust and reliable effects on salivary cortisol or testosterone 17 minutes later.

-It’s absurd to assume that the hormones mediated changes in behavior in this context.

-If Amy Cuddy retreats to the idea that she is simply manipulating “felt power,” we are solidly in the realm of trivial nonspecific and placebo effects.

The original power posing paper

Carney DR, Cuddy AJ, Yap AJ. Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science. 2010;21(10):1363-1368.

The Psychological Science article can be construed as evaluating a brief mind-body intervention consisting of two 1-minute behavioral manipulations. Central to the attention that the paper attracted is the argument that this manipulation affected psychological state and social performance via its effects on the neuroendocrine system.

The original study is, in effect, a disguised randomized controlled trial (RCT) of a biobehavioral intervention. Once this is recognized, a host of standards come into play for reporting the study and interpreting its results.

CONSORT

All major journals and publishers, including the Association for Psychological Science, have adopted the Consolidated Standards of Reporting Trials (CONSORT). Any submission of a manuscript reporting a clinical trial must be accompanied by a checklist indicating where the article reports particular details of how the trial was conducted. Item 1 on the checklist specifies that both the title and abstract indicate the study was a randomized trial. This is important not only to aid readers in evaluating the study, but also so the study can be picked up in systematic searches for reviews that depend on screening of titles and abstracts.

I can find no evidence that Psychological Science adheres to CONSORT. For instance, my colleagues and I provided a detailed critique of a widely promoted study of loving-kindness meditation that was published in Psychological Science the same year as Cuddy’s power pose study. We noted that it was actually a poorly reported null trial with switched outcomes. With that recognition, we went on to identify serious conceptual, methodological, and statistical problems. After overcoming considerable resistance, we were able to publish a muted version of our critique. Apparently the reviewers of the original paper had failed to evaluate it as an RCT.

The submission of a completed CONSORT checklist has become routine at most journals considering manuscripts reporting studies of clinical and health psychology interventions. Yet additional CONSORT requirements, developed later, concerning what should be included in abstracts are largely ignored.

It would be unfair to single out Psychological Science and the Cuddy article for noncompliance with CONSORT for abstracts. However, the checklist can be a useful frame of reference for noting just how woefully inadequate the abstract was as the report of a scientific study.

CONSORT for abstracts

Hopewell S, Clarke M, Moher D, Wager E, Middleton P, Altman DG, Schulz KF, CONSORT Group. CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration. PLOS Medicine. 2008 Jan 22;5(1):e20.

Journal and conference abstracts should contain sufficient information about the trial to serve as an accurate record of its conduct and findings, providing optimal information about the trial within the space constraints of the abstract format. A properly constructed and well-written abstract should also help individuals to assess quickly the validity and applicability of the findings and, in the case of abstracts of journal articles, aid the retrieval of reports from electronic databases.

Even if CONSORT for abstracts did not exist, we could argue that readers, starting with the editor and reviewers, were faced with an abstract making extraordinary claims that required better substantiation. A lack of basic details disarmed them from evaluating these claims.

In effect, the abstract reduces the study to an experimercial for products about to be marketed in corporate talks and workshops, but let’s persist in evaluating it as the abstract of a scientific study.

Humans and other animals express power through open, expansive postures, and they express powerlessness through closed, contractive postures. But can these postures actually cause power? The results of this study confirmed our prediction that posing in high-power nonverbal displays (as opposed to low-power nonverbal displays) would cause neuroendocrine and behavioral changes for both male and female participants: High-power posers experienced elevations in testosterone, decreases in cortisol, and increased feelings of power and tolerance for risk; low-power posers exhibited the opposite pattern. In short, posing in displays of power caused advantaged and adaptive psychological, physiological, and behavioral changes, and these findings suggest that embodiment extends beyond mere thinking and feeling, to physiology and subsequent behavioral choices. That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.

I don’t believe I have ever encountered in an abstract claims as extravagant as those with which this abstract concludes. Yet readers are given no basis for evaluating the claims until the Methods section. Undoubtedly, many holding opinions about the paper never read that far.

Namely:

Forty-two participants (26 females and 16 males) were randomly assigned to the high-power-pose or low-power-pose condition.

Testosterone levels were in the normal range at both Time 1 (M = 60.30 pg/ml, SD = 49.58) and Time 2 (M = 57.40 pg/ml, SD = 43.25). As would be suggested by appropriately taken and assayed samples (Schultheiss & Stanton, 2009), men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.2

Too small a study to provide an effect size

Hold on! First, only 42 participants (26 females and 16 males) should readily be recognized as insufficient for an RCT, particularly in an area of research without past RCTs.

After decades of witnessing the accumulation of strong effect sizes from underpowered studies, many of us have reacted by requiring 35 participants per group as the minimum acceptable for a generalizable effect size. Actually, even that may be an overly liberal criterion. Why?

Many RCTs are underpowered, yet lax enforcement of preregistration allows positive results to be obtained by redefining the primary outcomes after results are known. A psychotherapy trial with 30 or fewer patients in the smallest cell has less than a 50% probability of detecting a moderate-sized significant effect, even if one is present (Coyne, Thombs, & Hagedoorn, 2010). Yet an examination of the studies mustered for treatments designated as evidence-supported by APA Division 12 indicates that many were too underpowered to be reliably counted as evidence of efficacy, but were included without comment on this problem. Taking an overview, it is striking the extent to which the literature continues to depend on small, methodologically flawed RCTs conducted by investigators with strong allegiances to one of the treatments being evaluated. Indeed, which treatment the investigators prefer is a better predictor of a trial’s outcome than the specific treatment being evaluated (Luborsky et al., 2006).
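
To make the arithmetic concrete, here is a minimal sketch of the power calculation behind that claim, in Python with statsmodels, assuming a conventional two-sided two-sample t-test at α = .05 and Cohen’s d = 0.5 as the “moderate” effect:

```python
# Power of a two-sample t-test to detect a moderate effect (Cohen's d = 0.5)
# with 30 patients per cell, alpha = .05, two-sided.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
power = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05, ratio=1.0)
print(f"Power with n=30 per group: {power:.2f}")  # ~0.48, i.e., under 50%

# Sample size per group needed for the conventional 80% power:
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"n per group for 80% power: {n_needed:.0f}")  # ~64
```

In other words, even 35 per group is generous: detecting a moderate effect reliably requires roughly double that.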

Earlier my colleagues and I had argued for the non-accumulative nature of evidence from small RCTs:

Kraemer, Gardner, Brooks, and Yesavage (1998) propose excluding small, underpowered studies from meta-analyses. The risk of including studies with inadequate sample size is not limited to clinical and pragmatic decisions being made on the basis of trials that cannot demonstrate effectiveness when it is indeed present. Rather, Kraemer et al. demonstrate that inclusion of small, underpowered trials in meta-analyses produces gross overestimates of effect size due to substantial, but unquantifiable confirmatory publication bias from non-representative small trials. Without being able to estimate the size or extent of such biases, it is impossible to control for them. Other authorities voice support for including small trials, but generally limit their argument to trials that are otherwise methodologically adequate (Sackett & Cook, 1993; Schulz & Grimes, 2005). Small trials are particularly susceptible to common methodological problems…such as lack of baseline equivalence of groups; undue influence of outliers on results; selective attrition and lack of intent-to-treat analyses; investigators being unblinded to patient allotment; and not having a pre-determined stopping point so investigators are able to stop a trial when a significant effect is present.

In the power posing paper, sex was controlled in all analyses, presumably because a peek at the data revealed baseline sex differences in testosterone dwarfing any other differences. What do we make of investigators who conduct a study that depends on testosterone mediating a behavioral manipulation, yet who did not anticipate large baseline sex differences in testosterone?

In a PubPeer comment leading up to this post, I noted:

We are then told “men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.”

The findings alluded to in the abstract should be recognizable as weird and uninterpretable. Most basically, how could the 16 males be distributed across the two groups so that the authors could confidently say that differences held for both males and females? Especially when all analyses control for sex? Sex is highly correlated with testosterone, so an analysis that controlled for both variables, sex and testosterone, would probably not generalize to testosterone without such controls.

We are never given the basic statistics needed to independently assess what the authors are doing, not even the correlation between cortisol and testosterone, only differences in Time 2 cortisol controlling for Time 1 cortisol, Time 1 testosterone, and sex. Such multivariate statistics are not very generalizable in a sample of 42 participants distributed across two groups, and certainly not for the 26 females and 16 males taken separately.
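
To see what such an analysis entails, here is a minimal sketch of the kind of covariate-laden ANCOVA the paper’s description implies, fit to simulated data with the study’s cell sizes. The column names and data are my own invention, not the authors’ analysis script:

```python
# Sketch of the covariate-laden ANCOVA implied by the paper's description,
# on simulated data with the study's cell sizes (26 F, 16 M, no true effect).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 42
df = pd.DataFrame({
    "sex": ["F"] * 26 + ["M"] * 16,
    "condition": rng.permutation(["high"] * 21 + ["low"] * 21),
})
# Simulated hormones: males higher on baseline testosterone, no condition effect
df["t1_testosterone"] = rng.normal(45, 30, n) + np.where(df["sex"] == "M", 40, 0)
df["t1_cortisol"] = rng.normal(0.16, 0.05, n)
df["t2_testosterone"] = df["t1_testosterone"] + rng.normal(0, 15, n)

# Time 2 hormone, controlling for Time 1, sex, and the other hormone
model = smf.ols(
    "t2_testosterone ~ t1_testosterone + t1_cortisol + C(sex) + C(condition)",
    data=df,
).fit()
print(model.summary())  # note how few residual degrees of freedom remain,
# and how strongly sex and baseline testosterone overlap as predictors
```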

The behavioral manipulation

The original paper reports:

Participants’ bodies were posed by an experimenter into high-power or low-power poses. Each participant held two poses for 1 min each. Participants’ risk taking was measured with a gambling task; feelings of power were measured with self-reports. Saliva samples, which were used to test cortisol and testosterone levels, were taken before and approximately 17 min after the power-pose manipulation.

And then elaborates:

To configure the test participants into the poses, the experimenter placed an electrocardiography lead on the back of each participant’s calf and underbelly of the left arm and explained, “To test accuracy of physiological responses as a function of sensor placement relative to your heart, you are being put into a certain physical position.” The experimenter then manually configured participants’ bodies by lightly touching their arms and legs. As needed, the experimenter provided verbal instructions (e.g., “Keep your feet above heart level by putting them on the desk in front of you”). After manually configuring participants’ bodies into the two poses, the experimenter left the room. Participants were videotaped; all participants correctly made and held either two high-power or two low-power poses for 1 min each. While making and holding the poses, participants completed a filler task that consisted of viewing and forming impressions of nine faces.

The behavioral task and subjective self-report assessment

Measure of risk taking and powerful feelings. After they finished posing, participants were presented with the gambling task. They were endowed with $2 and told they could keep the money—the safe bet—or roll a die and risk losing the $2 for a payoff of $4 (a risky but rational bet; odds of winning were 50/50). Participants indicated how “powerful” and “in charge” they felt on a scale from 1 (not at all) to 4 (a lot).

An imagined bewildered review from someone accustomed to evaluating clinical trials

Although the authors don’t seem to know what they’re doing, we have an underpowered therapy analogue study making extraordinary claims. It is unconvincing that two 1-minute behavioral manipulations would change subsequent psychological states and behavior in any way with extralaboratory implications.

The manipulation poses a puzzle to research participants, challenging them to figure out what is being asked of them. The $2 gambling task presumably is meant to simulate effects on real-world behavior. But the low stakes could mean that participants believed the task evaluated whether they “got” the purpose of the intervention and behaved accordingly. From that perspective, the unvalidated subjective self-report rating scale would serve as a clue to the intentions of the experimenter and an opportunity for participants to show they were smart. The manipulation of putting participants into a low-power pose is even more unconvincing as a contrasting active intervention or a control condition. Claims that this manipulation did anything but communicate experimenter expectancies are even less credible.

This is a very weak form of evidence: a therapy analogue study with a brief, low-intensity behavioral manipulation followed by assessments of outcomes that might simply have informed participants of what they needed to do to look smart (i.e., demand characteristics). Add in that the experimenters were unblinded and undoubtedly had flexibility in how they delivered the intervention and what they said to participants. As a grossly underpowered trial, the study cannot make a contribution to the literature, and certainly cannot contribute an effect size.

Furthermore, if the authors had even a basic understanding of gender differences in social status or sex differences in testosterone, they would have stratified the study by participant gender, not attempted to obtain control by post hoc statistical manipulation.

I could comment on signs of p-hacking and widespread signs of inappropriate naming, use, and interpretation of statistics, but why bother? There are no vital signs of a publishable paper here.

Is power posing salvaged by fashionable hormonal measures?

Perhaps the skepticism of the editor and reviewers was overcome by the introduction of mind-body explanations of what some salivary measures supposedly showed. Otherwise, we would be left with a single subjective self-report measure and a behavioral task susceptible to demand characteristics and nonspecific effects.

We recognize that the free availability of powerful statistical packages risks their being used by people with no idea of whether the use or interpretation is appropriate. The same observation applies to the ready availability of means of collecting spit samples from research participants to be sent off to outside laboratories for biochemical analysis.

The clinical health psychology literature is increasingly filled with studies incorporating easily collected saliva samples intended to establish that psychological interventions influence mind-body relations. Such measures have been applied particularly in attempts to demonstrate that mindfulness meditation and even tai chi can have beneficial effects on physical health and even cancer outcomes.

Often inaccurately described as “biomarkers,” rather than merely as biological measurements, such measures seldom yield anything generalizable within participants or across studies.

Let’s start with salivary-based cortisol measures.

A comprehensive review suggests that:

  • A single measurement on a participant, or a pre-post pair of assessments, is not informative.
  • Single measurements are unreliable, and large intra- and inter-individual differences not attributable to the intervention can be in play.
  • Minor variations in experimental procedures can have large, unwanted effects.
  • The current standard is the cortisol awakening response and the diurnal slope assessed over more than one day, which would make no sense for the effects of two 1-minute behavioral manipulations.
  • Even with sophisticated measurement strategies, there is low agreement across and even within studies, and low agreement with behavioral and self-report data.
  • The idea that collecting saliva samples would serve the function the investigators intended is an unscientific but attractive illusion.

Another relevant comprehensive theoretical review and synthesis of cortisol reactivity was available at the time the power pose study was planned. The article identifies no basis for anticipating that experimenters putting participants into 1-minute expansive poses would lower cortisol, and certainly no basis for assuming that putting participants into a 1-minute slumped position would raise cortisol, or for saying what such findings could possibly mean.

But we are clutching at straws. The authors’ interpretations of their hormonal data depend on bizarre post hoc decisions about how to analyze a small sample in which participant sex is treated in incomprehensible fashion. The process of trying to explain spurious results risks giving them a credibility the authors have not earned. And don’t even try to claim we are getting signals of hormonal mediation from this study.

Another system failure: The incumbent advantage given to a paper that should not have been published.

Even when publication is based on inadequate editorial oversight and review, any likelihood of correction is diminished once published results have been blessed as “peer reviewed” and accorded an incumbent advantage over whatever follows.

A succession of editors has protected the power pose paper from post-publication peer review, which has been relegated to other journals and social media, including PubPeer and blogs.

Soon after publication of the power pose paper, a critique was submitted to Psychological Science, but it was desk rejected. The editor informally communicated to the author that the critique read like a review, and the original article had already been peer reviewed.

The critique by Steven J. Stanton nonetheless eventually appeared in Frontiers in Behavioral Neuroscience and is worth a read.

Stanton took seriously the science being invoked in the claims of the power pose paper.

A sampling:

Carney et al. (2010) collapsed over gender in all testosterone analyses. Testosterone conforms to a bimodal distribution when including both genders (see Figure 13; Sapienza et al., 2009). Raw testosterone cannot be considered a normally distributed dependent or independent variable when including both genders. Thus, Carney et al. (2010) violated a basic assumption of the statistical analyses that they reported, because they used raw testosterone from pre- and post-power posing as independent and dependent variables, respectively, with all subjects (male and female) included.

And

Mean cortisol levels for all participants were reported as 0.16 ng/mL pre-posing and 0.12 ng/mL post-posing, thus showing that for all participants there was an average decrease of 0.04 ng/mL from pre- to post-posing, regardless of condition. Yet, Figure 4 of Carney et al. (2010) shows that low-power posers had mean cortisol increases of roughly 0.025 ng/mL and high-power posers had mean cortisol decreases of roughly 0.03 ng/mL. It is unclear given the data in Figure 4 how the overall cortisol change for all participants could have been a decrease of 0.04 ng/mL.

Another editor of Psychological Science received a critical comment from Marcus Crede and Leigh A. Phillips. After the first round of reviews, Crede and Phillips removed references to the changes between the published power pose paper and earlier drafts that they had received from the first author, Dana Carney. However, Crede and Phillips withdrew their critique when they were asked to respond to a review by Amy Cuddy on a second resubmission.

The critique is now forthcoming in Social Psychological and Personality Science:

Revisiting the Power Pose Effect: How Robust Are the Results Reported by Carney, Cuddy and Yap (2010) to Data Analytic Decisions

The article investigates whether the findings of the original paper survive the data analytic choices that were available to its authors. An excerpt from the abstract:

In this paper we use multiverse analysis to examine whether the findings reported in the original paper by Carney, Cuddy, and Yap (2010) are robust to plausible alternative data analytic specifications: outlier identification strategy; the specification of the dependent variable; and the use of control variables. Our findings indicate that the inferences regarding the presence and size of an effect on testosterone and cortisol are  highly sensitive to data analytic specifications. We encourage researchers to routinely explore the influence of data analytic choices on statistical inferences and also encourage editors and  reviewers to require explicit examinations of the influence of alternative data analytic  specifications on the inferences that are drawn from data.
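
For readers unfamiliar with the approach, here is a minimal sketch of the logic of a multiverse analysis, run on simulated data with no true effect. The specifications are illustrative stand-ins, not the ones Crede and Phillips examined:

```python
# Sketch of a multiverse analysis: rerun the same basic comparison under every
# combination of plausible data analytic choices and record the p-values.
# (Simulated data, no true effect; illustrative specifications only.)
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 42
group = np.repeat([0, 1], n // 2)       # two conditions
t1 = rng.lognormal(3.5, 0.6, n)         # simulated Time 1 hormone level
t2 = t1 * rng.lognormal(0.0, 0.2, n)    # simulated Time 2; no condition effect

dv_specs = {                            # choice 1: how to define the outcome
    "raw change": t2 - t1,
    "percent change": (t2 - t1) / t1,
    "log ratio": np.log(t2 / t1),
}
outlier_cutoffs = {"none": np.inf, "|z|<2": 2.0, "|z|<3": 3.0}  # choice 2

for (dv_name, dv), (rule, cutoff) in itertools.product(
    dv_specs.items(), outlier_cutoffs.items()
):
    keep = np.abs(stats.zscore(dv)) < cutoff
    _, p = stats.ttest_ind(dv[keep & (group == 0)], dv[keep & (group == 1)])
    print(f"{dv_name:15s} outliers={rule:5s} p = {p:.3f}")
# A robust effect survives all nine specifications; a fragile one flickers
# in and out of "significance" as the choices change.
```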

Dana Carney, the first author of the paper, has now posted an explanation of why she no longer believes the originally reported findings are genuine and why “the evidence against the existence of power poses is undeniable.” She discloses a number of important confounds and important “researcher degrees of freedom” in the analyses reported in the published paper.

Coming Up Next

A different view of Amy Cuddy’s TED talk in terms of its selling of pseudoscience to consumers and its acknowledgment of a strong debt to Cuddy’s adviser Susan Fiske.

A disclosure of some of the financial interests that distort discussion of the scientific flaws of the power pose.

How the reflexive response of the replicationados inadvertently reinforced the illusion that the original power pose study provided meaningful effect sizes.

How Amy Cuddy and her allies marshalled the resources of the Association for Psychological Science to vilify and intimidate critics of bad science and of the exploitation of consumers by psychological pseudoscience.

How journalists played into this vilification.

What needs to be done to avoid a future fiasco for psychology like the power pose phenomenon and protect reformers of the dissemination of science.

Note: Time to reiterate that all opinions expressed here are solely those of Coyne of the Realm and not necessarily of PLOS blogs, PLOS One or his other affiliations.

Creating the illusion that mindfulness improves the survival of cancer patients

  • A demonstration of just how unreliable investigators’ reports of mindfulness studies can be.
  • Exaggerations of efficacy combined with self-contradiction in the mindfulness literature pose problems for any sense being made of the available evidence by patients, clinicians, and those having responsibility for clinical and public policy decisions.

Despite thousands of studies, mindfulness-based stress reduction (MBSR) and related meditation approaches have not yet been shown to be more efficacious than other active treatments for reducing stress. Nonetheless, many cancer patients seek MBSR or mindfulness-based cancer recovery (MBCR) believing that they are improving their immune systems and are on their way to a better outcome in “fighting” their cancer.


This unproven claim leads many cancer patients to integrative cancer centers. Once patients begin receiving treatment at these centers, they are offered a variety of other services that can be expensive, despite being unproven or having been proven ineffective. Services provided by integrative cancer centers can discourage patients from seeking conventional treatments that are more effective, but that come with serious side effects and disfigurement. Moreover, integrative treatments give false hope to patients who would otherwise accept the limits of treatments for cancer and come to terms with their own mortality. And integrative treatments can lead patients to blame themselves when they do not benefit.

Mindfulness studies that cultivate these illusions in vulnerable cancer patients keep being added to the literature, often in quality journals. This psychoneuroimmunology (PNI) literature is self-perpetuating in its false claims, exaggerations, and spin. It ignores some basic findings:

  1. Psychotherapy and support groups have not been shown to improve the survival of cancer patients.
  2. The contribution of stress to the onset, progression, and outcome of cancer is likely to be minimal, if present at all.
  3. Effects of psychological interventions like MBSR/MBCR on the immune system are weak or nonexistent, and the clinical significance of any effects is not established.

Evidence-based oncologists and endocrinologists would not take seriously the claims regularly appearing in the PNI literature. Such clinician-scientists would find bizarre many of the supposed mechanisms by which MBCR supposedly affects cancer. Yet, investigators create the illusion of accumulating evidence, undaunted by negative findings and the lack of plausible mechanisms by which MBCR could conceivably influence basic disease processes in cancer.

This blog post debunks a study by one of the leading proponents of MBCR for cancer patients, showing how exaggerated and outright false claims are created and amplified across publications.

Responsible scientists and health care providers should dispel myths that patients may have about the effectiveness of psychosocial treatments in extending life. But in the absence of responsible professionals speaking out, patients can be intimidated by how these studies are headlined in the popular media, particularly when they believe that they are dealing with expert opinion based on peer-reviewed studies.

Mindfulness-based cancer recovery (MBCR)

The primary report for the study was published in the prestigious Journal of Clinical Oncology and is available as a downloadable PDF.

Carlson LE, Doll R, Stephen J, Faris P, Tamagawa R, Drysdale E, Speca M. Randomized controlled trial of mindfulness-based cancer recovery versus supportive expressive group therapy for distressed survivors of breast cancer (MINDSET). Journal of Clinical Oncology. 2013;31(25):3119-3126.

The authors compared the efficacy of what they describe as “two empirically supported group interventions to help distressed survivors of breast cancer”: mindfulness-based cancer recovery (MBCR) and supportive-expressive group therapy (SET). Each of these active treatments was delivered in eight weekly 90-minute sessions plus a six-hour workshop. A six-hour, one-day didactic seminar served as the comparison/control condition.

The 271 participants were Stage I, II, or III breast cancer patients who had completed all cancer treatment a mean of two years earlier. Patients also had to meet a minimal level of distress and not have a psychiatric diagnosis. Use of psychotropic medication was not an exclusion, because of the high prevalence of antidepressant and anxiolytic use in this population.

One hundred thirteen patients were randomized to MBCR, 104 to SET, and 54 to the didactic seminar control group.

A full range of self-report measures was collected, along with saliva samples at four times (awakening, noon, 5 PM, and bedtime) over three days. The trial registration for this study is basically useless; it lacks basic detail. Rather than declaring one or maybe two outcomes as primary, the authors specify broad classes: mood, stress, post-traumatic growth, social support, quality of life, spirituality, and cortisol levels (a stress hormone).

Yet a later report states that “The sample size estimate was based on the primary outcome measure (POM TMD)”, the Profile of Mood States Total Mood Disturbance score. The saliva collection was geared to assessing cortisol, although in such studies saliva can yield a full range of biological variables, including markers of immune function.

Why bring up the lack of registration and multiple outcome measures?

The combination of a vague trial registration and multiple outcome measures allows investigators considerable flexibility in which outcome they pick. They can wait to make a choice until after results are known, but that is considered a questionable research practice. The collection of saliva was obviously geared to assessing salivary cortisol. However, a recent comprehensive review of salivary diurnal cortisol as an outcome measure identifies at least three parameters (the cortisol awakening response, the diurnal slope, and the area under the curve), each reflecting different aspects of hypothalamic-pituitary-adrenal (HPA) axis function.
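
As a concrete illustration, here is a minimal sketch of how two of those parameters would be computed from a single participant’s four daily samples; the times and values are illustrative, not data from this trial:

```python
# Sketch of two of the review's three diurnal cortisol parameters, computed
# from one participant's four daily samples (illustrative values only).
import numpy as np

times = np.array([7.0, 12.0, 17.0, 22.0])      # awakening, noon, 5 PM, bedtime (hours)
cortisol = np.array([0.45, 0.20, 0.12, 0.05])  # illustrative concentrations

# Diurnal slope: linear regression of cortisol on time of day
# (more negative = steeper decline, generally read as "healthier")
slope = np.polyfit(times, cortisol, 1)[0]

# Area under the curve (total daily output), trapezoidal rule
auc = np.trapz(cortisol, times)

# The third parameter, the cortisol awakening response, requires samples at 0
# and ~30 minutes after waking, which this trial's protocol did not collect.
print(f"slope = {slope:.4f} per hour, AUC = {auc:.2f}")
```

Each parameter indexes a different facet of HPA function, which is exactly what gives flexible investigators so many outcomes from which to choose.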

So, the authors have a lot of options from which to choose data points and analyses best suggesting that MBCR is effective.

Results

Modest effects on the POMS TMD disappeared in corrected pairwise comparisons between MBCR and SET. So, according to the evidence presented, mood was not improved.

Baseline cortisol data were available for only 242 patients, and only 172 had data for post-intervention slopes. Uncorrected group differences in cortisol slope across the day are not reported. However, when cancer severity, number of cigarettes smoked per day, and sleep quality were entered as control variables, a group × time difference was found (p < .009).

We should beware of studies that do not present uncorrected group differences, but depend on only data adjusted for covariates, the appropriateness of which is not established.

But going further, there was no difference between MBCR and SET. Actually, any difference between these two groups and the control was due to an unexpected increase in the control group’s slope, while slopes in the MBCR and SET groups remained unchanged. I can’t see how this would have been predicted. The assumption guiding the study had been that cortisol slopes would decrease in one or both of the active intervention groups.

The authors searched for more positive findings from cortisol and found:

There were no significant group x time interaction effects for cortisol concentrations at any single collection point, but a time x group contrast between MBCR and SMS was significant for bedtime cortisol concentrations (P = .044; Table 3), which were elevated after SMS (mean change, 0.11) but slightly decreased after MBCR (mean change, −0.02; Fig 2D).

These are weak findings revealed by a post hoc search across a number of different cortisol measures. Aside from the analysis being post hoc, I would not place much confidence in a cherry-picked p = .044.
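
As a rough illustration of why one p = .044 found by searching across cortisol endpoints means little, consider the familywise error rate under a simplifying independence assumption:

```python
# Chance of at least one p < .05 among k independent tests of true nulls:
# 1 - (1 - 0.05)**k. The cortisol endpoints here are correlated, so this is
# only a rough illustration, but the direction of the problem is clear.
for k in (1, 4, 8):
    print(f"{k} tests: P(at least one 'significant') = {1 - 0.95**k:.2f}")
# 1 test: 0.05; 4 tests: 0.19; 8 tests: 0.34
```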

How the authors discuss the results

Ignoring the null results for the primary measure, the Profile of Mood States Total Mood Disturbance score, the authors jump to secondary outcomes to proclaim the greater effectiveness of MBCR:

As predicted, MBCR emerged as superior for decreasing symptoms of stress and also for improving overall quality of life and social support in these women, even though we hypothesized that SET might be superior on social support. Improvements were clinically meaningful and similar to those reported in our previous work with mixed groups of patients with cancer.

Keep in mind the disappointing result for cortisol profiles when reading their closing claims for “significantly altered” cortisol:

Cortisol profiles were significantly altered after program completion. Participants in both MBCR and SET maintained the initial steepness of cortisol slopes, whereas SMS participants evidenced increasingly flatter diurnal cortisol slopes, with a medium between-group effect size. Hence, the two interventions buffered unfavorable biologic changes that may occur without active psychosocial intervention. Because abnormal or flattened cortisol profiles have been related to both poorer psychological functioning and shorter survival time in breast,16,17,45,46 lung,47 and renal cell48 carcinoma, this finding may point to the potential for these psychosocial interventions to improve biologic processes related to both patient-reported outcomes and more objective indices. More work is needed to fully understand the clinical meaning of these parameters in primary breast cancer.

The authors set out to demonstrate that these psychological interventions decreased cortisol slopes and found no evidence that they did. However, they seized on the finding of increasingly flatter cortisol slopes in the control group. But all of these breast cancer patients received MBCR or SET about two years after their cancer treatment ended. For most patients, distress levels have receded by then to what they were before cancer was detected. One has to ask the authors: if they take this continuing flattening of cortisol slopes seriously, where are cortisol levels heading? And when did the decline start?

I attach no credibility to the authors’ claims unless they provide us with an understanding of how these changes occurred. Do the authors assume they have an odd group of patients who had been declining since diagnosis, or maybe since the end of active cancer treatment, but who somehow ended up at the same level of cortisol as the other patients in the sample? There was, you know, random assignment, and there were no baseline differences at the start of this study.

The attempt to relate their findings to shorter survival time in a variety of cancers is dodgy and irresponsible. Their overview of the literature is highly selective, depends on small samples, and there is no evidence that the alleged flattened cortisol profiles are causes rather than being an effect of disease parameters associated with shorter survival.

The authors have not demonstrated an effect of their psychological interventions on survival. No previous study ever has.

Interestingly, a classic small study by Spiegel prompted a whole line of research in which an effect of psychological intervention on survival was sought. However, a careful look at the graphs in his original study reveals that the survival curves for the patients receiving the intervention approximated those of other patients with advanced breast cancer in the larger community who received no intervention. Compared to the larger population from which they were drawn, the patients receiving the intervention in Spiegel’s study were no better off.

In contrast, there were unexplained deaths in Spiegel’s control group that generated the illusion that his intervention was increasing survival. Given how small his control group was (39 patients at the outset), it took only the sudden deaths of four patients in the control group to create an effect where previously there was none. So it is not that psychotherapy extended survival, but that a small cluster of patients in the control group died suddenly, years after randomization. Go figure, but keep in mind that the study was never designed to test the effects of psychological intervention on survival. That hypothesis was generated after data were available, and Spiegel claimed surprise that the findings were positive.

Spiegel himself has never been able to replicate this finding. You can read more about this study here.


The present authors did not identify survival as a primary outcome for the trial, nor did they assess it. They are essentially depending on spun data that treats cortisol slope not just as a biological variable, but as a surrogate for survival. See Hilda Bastian’s Statistically Funny post, Biomarkers Unlimited: Accept Only OUR Substitutes!, for an explanation of why this is sheer folly. Too many promising medical treatments for cancer have been accepted as efficacious on the basis of surrogate outcomes, only to be later shown to have no effect on survival. And these psychological treatments are not even in the running.

This is the kind of nonsense that encourages cancer patients to continue with the false hope that mindfulness-based treatment will extend their lives.

The fish gets bigger with each telling.

A follow-up paper makes stronger claims and adds new claims about telomere length, the clinical implications of which the authors ultimately concede they do not understand.

Carlson LE, Beattie TL, Giese‐Davis J, Faris P, Tamagawa R, Fick LJ, Degelman ES, Speca M. Mindfulness‐based cancer recovery and supportive‐expressive therapy maintain telomere length relative to controls in distressed breast cancer survivors. Cancer. 2015 Feb 1;121(3):476-84.

The authors’ opening summary of the previously reported results we have been discussing:

We recently reported primary outcomes of the MINDSET trial, which compared 2 empirically supported psychosocial group interventions, mindfulness-based cancer recovery (MBCR) and supportive-expressive group therapy (SET), with a minimal-intervention control condition on mood, stress symptoms, quality of life, social support, and diurnal salivary cortisol in distressed breast cancer survivors.[4] Although MBCR participation resulted in the most psychosocial benefit, including improvements across a range of psychosocial outcomes, both MBCR and SET resulted in healthier cortisol profiles over time compared with the control condition.

Endocrinologists would scratch their heads and laugh at the claim that the interventions resulted in “healthier cortisol profiles.” There is a wide range of cortisol values in the general population, and these values are well within the normal range. The idea that some of them are “healthier” is as bogus as claims made for superfoods and supplements. You have to ask: “healthier” in what sense?

In this secondary analysis of MINDSET trial data, we collected and stored blood samples taken from a subset of women to further investigate the effects of these interventions on potentially important biomarkers. Telomeres are specialized nucleoprotein complexes that form the protective ends of linear chromosomes and provide genomic stability through several mechanisms.

The authors justify the study with speculations that stop just short of claiming their intervention increased survival:

Telomere dysfunction and the loss of telomere integrity may result in DNA damage or cell death; when a critically short telomere length (TL) is reached, cells enter senescence and have reduced viability, and chromosomal fusions appear.[6] Shorter TL has been implicated in several disease states, including cardiovascular disease, diabetes, dyskeritosis congenita, aplastic anemia, and idiopathic pulmonary fibrosis.[7] Shorter TL also was found to be predictive of earlier mortality in patients with chronic lymphocytic leukemia,[8] promyelocytic leukemia,[9] and breast cancer.[10-12] However, the relationships between TL and the clinical or pathological features of tumors are still not clearly understood.[13].

They waffle some more and then acknowledge that few relevant data concerning cancer exist: “the relationships between TL and the clinical or pathological features of tumors are still not clearly understood.”

Too small a sample to find anything clinically significant and generalizable

Correlational studies of telomere length and disease require very large samples. These epidemiologic findings in no way encourage anticipating effects in a modest-sized trial of a psychological intervention. Moreover, significant results from smaller studies exaggerate associations, because effects must be larger to reach statistical significance in a small sample. They should not be expected to replicate in a larger study. And the authors’ sample had shrunk considerably from recruitment and randomization down to the subset of women who provided two blood samples, among whom they hoped to find differences among two interventions and one control group.

Due to the availability of resources, blood samples were only collected in Calgary. Of the 128 women in Calgary, 5 declined to donate their blood. Thirty-one women provided their blood only at the preintervention time period; therefore, the current study included 92 women who donated a blood sample before and after the intervention.

Not surprisingly, no differences between groups were found, but that inspires some creativity in analysis.

The results of ANCOVA demonstrated no statistical evidence of differences in postintervention TL between the MBCR and SET interventions after adjusting the impact of the preintervention log10 T/S ratios. The mean difference was −0.12 (95% confidence interval [95% CI], −0.74 to 0.50). Because the 2 interventions shared similar nonspecific components and no significant differences emerged in their baseline-adjusted postintervention T/S ratios, the 2 intervention groups were subsequently combined to allow greater power for detecting any effects on TL related to participation in a psychosocial intervention compared with the control condition.

The authors initially claimed that MBCR and SET were so different that an expensive large-scale RCT was justified. Earlier in the present paper, they claimed MBCR was superior. But now they claim there is so little difference between the treatments that a post hoc combining is justified, to see if null findings can be overturned.

Their tortured post hoc analyses revealed a tiny effect that they fail to acknowledge was nonsignificant; the confidence interval (−0.01 to 1.35) includes 0:

After adjustment for the baseline log10 T/S ratio, there was a statistical trend toward a difference in posttreatment log10 T/S ratios between treatment and control subjects (statistics shown in Table 2). The adjusted mean difference was 0.67 (95% CI, -0.01 to 1.35). The effect size of η2 was 0.043 (small to medium).
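
A quick back-of-envelope check, assuming a normal approximation to the reported confidence interval, shows just how marginal this “statistical trend” is:

```python
# Back-of-envelope: recover the approximate p-value from the reported mean
# difference (0.67) and its 95% CI (-0.01 to 1.35), normal approximation.
from scipy import stats

mean_diff = 0.67
ci_low, ci_high = -0.01, 1.35
se = (ci_high - ci_low) / (2 * 1.96)  # ~0.347
z = mean_diff / se                    # ~1.93
p = 2 * (1 - stats.norm.cdf(z))       # ~0.053, just above .05
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```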

There was no association between psychological outcomes and telomere length. Yet such associations would be expected if interventions targeting psychological variables somehow influenced telomere length.

Nonetheless, the authors concluded they had a pattern in the results of the primary and secondary studies encouraging more research:

Together, these changes suggest an effect of the interventions on potentially important biomarkers of psychosocial stress. Given the increasingly well-documented association between TL and cancer initiation46 and survival,47 this finding adds to the literature supporting the potential for stress-reducing interventions to impact important disease-regulating processes and ultimately disease outcome.

They end with a call for bigger, more expensive studies, even if they cannot understand what is going on (or for that matter, whether anything of interest occurred in their study):

Future investigators should power studies of intervention effects on TL and telomerase as primary outcomes, and follow participants over time to better understand the clinical implications of group differences. The interpretation of any changes in TL in patients with breast cancer is difficult. One study that analyzed TL in breast tumor tissue found no relations between TL and any clinical or pathological features or disease or survival outcomes,13 whereas other studies have shown that TL was related to breast cancer risk46,51 and survival.10,46,47 Although interpretation remains difficult, the results of the current study nonetheless provide provocative new data that suggest it is possible to influence TL in cancer survivors through the use of psychosocial interventions involving group support, emotional expression, stress reduction, and mindfulness meditation.

This is not serious research. At the outset, the authors had to know that the sample was much too small, and that there had been too much nonrandom attrition, to permit robust and generalizable conclusions concerning effects on telomere length. And the authors knew ahead of time that they had no idea how they would interpret such effects. But they didn’t find them anyway. They delivered an intervention, administered questionnaires, and took spit and blood samples, but this is not “research” in which they were willing to concede that hypotheses were disconfirmed; it is an experimercial for mindfulness programs.

But the power of MBCR gets even greater with yet another telling

A recent review:

Carlson LE. Mindfulness‐based interventions for coping with cancer. Annals of the New York Academy of Sciences. 2016 Mar.

One of the authors of the two articles we have been discussing uses them as the main basis for even stronger claims about MBCR specifically.

Our adaptation, mindfulness-based cancer recovery (MBCR), has resulted in improvements across a range of psychological and biological outcomes, including cortisol slopes, blood pressure, and telomere length, in various groups of cancer survivors.

Wow! Specifically,

Overall, women in the MBCR group showed more improvement on stress symptoms compared with women in both the SET and control groups, on QOL compared with the control group, and in social support compared with the SET group,[28] but both active-intervention groups’ cortisol slopes (a marker of stress responding) were maintained over time relative to the control group, whose cortisol slopes became flatter. Steeper slopes are generally considered to be healthier. The two intervention groups also maintained their telomere length, a potentially important marker of cell aging, over time compared to controls,

But wait! The superiority of MBCR gets even better with a follow-up study.

The publication of long-term follow-up data became the occasion for describing the superiority of MBCR over SET as greater than ever.

Carlson LE, Tamagawa R, Stephen J, Drysdale E, Zhong L, Speca M. Randomized‐controlled trial of mindfulness‐based cancer recovery versus supportive expressive group therapy among distressed breast cancer survivors (MINDSET): long‐term follow‐up results. Psycho‐Oncology. 2016 Jan 1.

 The abstract describes the outcomes at the end of the intervention:

Immediately following the intervention, women in MBCR reported greater reduction in mood disturbance (primarily fatigue, anxiety and confusion) and stress symptoms including tension, sympathetic arousal and cognitive symptoms than those in SET. They also reported increased emotional and functional quality of life, emotional, affective and positive social support, spirituality (feelings of peace and meaning in life) and post-traumatic growth (appreciation for life and ability to see new possibilities) relative to those in SET, who also improved to a lesser degree on many outcomes.

A search for “cortisol” in this report finds it is never mentioned.

The methods section clarifies that the 54 women in the seminar control group were offered randomization to the two active treatments; 35 accepted, with 21 going to MBCR and 14 to SET. However, 8 of the women newly assigned to MBCR and 9 of the women newly assigned to SET did not provide post-intervention data. The authors nonetheless used two-level piecewise hierarchical linear modeling (HLM) with random intercepts for intent-to-treat analyses of the full sample. The authors acknowledge a high attrition rate, with over half of the patients lost to follow-up, but argue that these hierarchical analyses were a solution. While this is often done, the analyses assume attrition is random, and their validity is vulnerable to such high rates of attrition. I don’t know why a reviewer did not object to the analyses or to the strong conclusions drawn from them.

Recognize what is being done here: the authors are including a small amount of new data in the analyses, but with so much attrition by the end of treatment that the analyses depend on extrapolating from the minority of patients with available data to what the authors claim would have been obtained had the full sample been retained. This is statistically dodgy, but apparently acceptable to the stats editor of this journal. What the authors did is not considered fraud, but it is making up data.
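
To make concrete what such a model does with missing data, here is a minimal sketch of a two-level random-intercept analysis fit to simulated trial data with heavy dropout at the final wave, using statsmodels’ MixedLM. The data, column names, and dropout mechanism are invented for illustration:

```python
# Sketch of a two-level random-intercept model fit to simulated trial data
# with >50% dropout at the final wave (invented data and dropout mechanism).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n, waves = 271, 3
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), waves),
    "time": np.tile(np.arange(waves), n),
    "group": np.repeat(rng.integers(0, 2, n), waves),
})
# Outcome: person-level random intercept plus noise, no true group effect
df["y"] = 50 + np.repeat(rng.normal(0, 5, n), waves) + rng.normal(0, 3, len(df))

# Over half the patients contribute no final-wave data
lost = np.repeat(rng.random(n) < 0.55, waves)
df = df[~(lost & (df["time"] == waves - 1))]

model = smf.mixedlm("y ~ time * group", df, groups=df["id"]).fit()
print(model.summary())
# The model estimates full-sample trajectories from whoever remains. That is
# unbiased only if dropout is unrelated to the unobserved outcomes (missing
# at random), which is precisely the assumption heavy, nonrandom attrition
# undermines.
```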

The follow up study concludes:

In sum, these results represent the first demonstration in a comparative effectiveness approach that MBCR is superior to another active intervention, SET, which also showed lesser benefit to distressed survivors of breast cancer. Our previous report also showed that MBCR was superior to a minimal intervention control condition pre-intervention to post-intervention. Benefits were accrued across outcomes measuring stress, mood, quality of life and PTG, painting a picture of women who were more able to cope with cancer survivorship and to fully embrace and enjoy life.

I pity the poor detached investigator attempting to use these data in a meta-analysis. Do they go with the original, essentially null results, or do they rely on these voodoo statistics that post hoc give a better picture? They would have to write to the authors anyway, because only corrected results are presented in the paper.

This is not science, it is promotion of a treatment by enthusiastic proponents who are strongly committed to demonstrating that the treatment is superior to alternatives, in defiance of contradictory data they have generated.

Terribly disappointing, but this effort is actually better than most of the studies of mindfulness for cancer patients. It is a randomized trial, and it started with a reasonably large sample, even if it suffered substantial attrition, with most patients lost to follow-up.

For those of you who have actually read this longread blog post from start to finish: would you have expected this kind of background if you’d only stumbled upon the authors’ glowing praise of their own work in the prestigious Annals of the New York Academy of Sciences? I don’t think so.

Dammit! It shouldn’t be so hard to figure out what went on in these studies. We should be able to depend on authors to provide more transparent, consistent reports of the results they obtain. While mindfulness research has no monopoly on such practices, it is exceptionally rich in exaggerated and even false claims and in suppression of contrary evidence. Consumers should be very skeptical of what they read!

Let’s get more independent re-evaluations of the claims made by promoters of mindfulness by those who don’t profit professionally or financially from exaggerating benefits. And please, clinicians, start dispelling the myths of cancer patients who think that they are obtaining effects on their disease from practicing mindfulness.

For further discussion, see Mindfulness-based stress reduction for improving sleep among cancer patients: A disappointing look.