Power pose: II. Could early career investigators participating in replication initiatives hurt their advancement?

Participation in attempts to replicate seriously flawed studies might be seen as bad judgment, when there are many more opportunities to demonstrate independent, critical thinking.

This is the second blog post concerning the special issue of Comprehensive Results in Social Psychology devoted to replicating Amy Cuddy’s original power pose study in Psychological Science.

Some things for early career investigators to think about.

I have long argued that there should be better incentives for early career investigators (ECRs), as well as more senior ones, to participate in efforts to improve the trustworthiness of science.

ECRs should be encouraged, and expected, to engage in post-publication peer review at PubPeer and PubMed Commons, and we should develop ways in which such activity can be listed on the CV.

The Pottery Barn rule should be extended so that ECRs can publish critical commentaries in the journals that publish the original flawed papers. Retraction notices should indicate whose complaints led to the retraction.

Rather than being pressured to publish more underpowered, under-resourced studies, ECRs should be rewarded for research parasite activity. They should be assisted in obtaining data sets from already published studies. With that data, they should conduct exploratory, secondary analyses aimed at understanding what went wrong in larger-scale studies that left them methodologically compromised and with shortfalls in recruitment.

But I wonder if we should counsel ECRs that participating in a multisite replication initiative like the one directed at the power pose effect might not contribute to their career advancement and may even hurt it.

I’ve been critical of the value of replication initiatives as the primary means of addressing the untrustworthiness of psychology, particularly in areas with claims of clinical and public health relevance. To add to my other reservations, I can point out that the necessary economy and efficiency of relying on MTurk and other massive administrations of experimental manipulations can force efforts to improve the trustworthiness of psychology into less socially significant, and perhaps less representative, areas.

I certainly wouldn’t penalize an early career investigator for involvement in a multisite replication. I appreciate there is room for disagreement with my skepticism about the value of such initiatives. I would recognize the valuing of better research practices that such involvement would represent.

But I think early career investigators need to consider that some senior investigators and members of hiring and promotion committees (HPCs) might give a low rating to publications coming from such initiatives in judging the candidate’s potential for original, creative, risk-taking research. That might be so even if these committee members appreciate the need to improve the trustworthiness of psychology.

Here are some conceivable comments that could be made in such a committee’s deliberations.

“Why did this candidate get involved in a modest scale study so focused on two saliva assessments of cortisol? Even if it is not their area of expertise, shouldn’t they have consulted the literature and seen how uninformative a pair of assessments of cortisol are, given the well-known problems with cortisol of intra-individual and inter-individual variation in sensitivity to uncontrolled contextual variables?…They should have powered their study to find cortisol differences amidst all the noise.”

“Were they unaware that testosterone levels differ between men and women by a factor of five or six? How do they expect that discontinuity in distributions to be overcome in any statistical analyses combining men and women? What basis was there in the literature to suggest that a brief, seemingly trivial manipulation of posture would have such enduring effects on hormones? Why did they specifically anticipate that differences would be registered in women? Overall, their involvement in this initiative demonstrates a willingness to commit considerable time and resources to ideas that could have been ruled out by a search of the relevant literature.”

“There seems to be a lemming quality to this large group of researchers pursuing some bad hypotheses with inappropriate methods. Why didn’t this investigator have the independence of mind to object? Can we expect a similar going with the herd after fashionable topics in research over the next few years?”

“While I appreciate the motivation of this investigator, I believe there was a violation of the basic principle of ‘stop and think before you undertake a study’ that does not bode well for how they will spend their time when faced with the demands of teaching and administration as well as doing research.”

Readers may think that these comments represent horrible, cruel sentiments and that it would be a great injustice if they influenced hiring and promotion decisions. But anyone who has ever been on a hiring and promotion committee knows that they are full of such horrible comments and that such processes are not fair or just or even rational.

 

 

 

Power pose: I. Demonstrating that replication initiatives won’t salvage the trustworthiness of psychology

An ambitious multisite initiative showcases how inefficient and ineffective replication is in correcting bad science.

 

Bad publication practices keep good scientists unnecessarily busy, as in replicability projects. - Bjoern Brembs

An ambitious multisite initiative showcases how inefficient and ineffective replication is in correcting bad science. Psychologists need to reconsider the pitfalls of an exclusive reliance on this strategy to improve laypersons’ trust in their field.

Despite the consistency of null findings across seven attempted replications of the original power pose study, editorial commentaries in Comprehensive Results in Social Psychology left some claims intact and called for further research.

Editorial commentaries on the seven null studies set the stage for continued marketing of self-help products, mainly to women, grounded in junk psychological pseudoscience.

Watch for repackaging and rebranding in next year’s new and improved model. Marketing campaigns will undoubtedly include direct quotes from the commentaries as endorsements.

We need to re-examine basic assumptions behind replication initiatives. Currently, these efforts suffer from prioritizing the reputations and egos of those misusing psychological science to market junk and quack claims over protecting the consumers whom these gurus target.

In the absence of a critical response from within the profession to these persons prominently identifying themselves as psychologists, it is inevitable that the void will be filled by those outside the field who have no investment in preserving the image of psychology research.

In the case of power posing, watchdog critics might be recruited from:

Consumer advocates concerned about just another effort to defraud consumers.

Science-based skeptics who see in the marketing of power posing familiar quackery, in the same category as hawkers using pseudoscience to promote homeopathy, acupuncture, and detox supplements.

Feminists who decry the message that women need to get some balls (testosterone) if they want to compete with men and overcome gender disparities in pay. Feminists should be further outraged by the marketing of junk science to vulnerable women with an ugly message of self-blame: It is so easy to meet and overcome social inequalities that they have only themselves to blame if they do not do so by power posing.

As reported in Comprehensive Results in Social Psychology, a coordinated effort to examine the replicability of results reported in Psychological Science concerning power posing left the phenomenon a candidate for future research.

I will be blogging more about that later, but for now let’s look at a commentary from three of the over 20 authors that reveals an inherent limitation to such ambitious initiatives in tackling the untrustworthiness of psychology.

Cesario J, Jonas KJ, Carney DR. CRSP special issue on power poses: what was the point and what did we learn? Comprehensive Results in Social Psychology. 2017.

 

Let’s start with the wrap up:

The very costly expense (in terms of time, money, and effort) required to chip away at published effects, needed to attain a “critical mass” of evidence given current publishing and statistical standards, is a highly inefficient use of resources in psychological science. Of course, science is to advance incrementally, but it should do so efficiently if possible. One cannot help but wonder whether the field would look different today had peer-reviewed preregistration been widely implemented a decade ago.

We should consider the first sentence with some recognition of just how much untrustworthy psychological science is out there. Must we mobilize similar resources in every instance or can we develop some criteria to decide what is worthy of replication? As I have argued previously, there are excellent reasons for deciding that the original power pose study could not contribute a credible effect size to the literature. There is no there to replicate.

The authors assume preregistration of the power pose study would have solved problems. In clinical and health psychology, long-standing recommendations to preregister trials are acquiring new urgency. But the record is that motivated researchers routinely ignore requirements to preregister and ignore the primary outcomes and analytic plans to which they have committed themselves. Editors and journals let them get away with it.

What measures do the replicationados have to ensure the same things are not being said about bad psychological science a decade from now? Rather than urging uniform adoption and enforcement of preregistration, replicationados urged the gentle nudge of badges for studies which are preregistered.

Just prior to the last passage:

Moreover, it is obvious that the researchers contributing to this special issue framed their research as a productive and generative enterprise, not one designed to destroy or undermine past research. We are compelled to make this point given the tendency for researchers to react to failed replications by maligning the intentions or integrity of those researchers who fail to support past research, as though the desires of the researchers are fully responsible for the outcome of the research.

There are multiple reasons not to give the authors of the power pose paper such a break. There is abundant evidence of undeclared conflicts of interest in the huge financial rewards for publishing false and outrageous claims. Psychological Science allowed the abstract of the original paper to leave out any embarrassing details of the study design and results and end with a marketing slogan:

That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.

 Then the Association for Psychological Science gave a boost to the marketing of this junk science with a Rising Star Award to two of the authors of this paper for having “already made great advancements in science.”

As seen in this special issue of Comprehensive Results in Social Psychology, the replicationados share responsibility with Psychological Science and APS for keeping this system of perverse incentives intact. At least they are guaranteeing plenty of junk science in the pipeline to replicate.

But in the next installment on power posing I will raise the question of whether early career researchers are hurting their prospects for advancement by getting involved in such efforts.

How many replicationados does it take to change a lightbulb? Who knows, but a multisite initiative can be combined with a Bayesian meta-analysis to give a tentative and unsatisfying answer.

Coyne JC. Replication initiatives will not salvage the trustworthiness of psychology. BMC Psychology. 2016 May 31;4(1):28.

The following can be interpreted as a declaration of financial interests or a sales pitch:

I will soon be offering e-books providing skeptical looks at positive psychology and mindfulness, as well as scientific writing courses on the web as I have been doing face-to-face for almost a decade.

Sign up at my website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. Lots to see at CoyneoftheRealm.com.

 

Power Poseur: The lure of lucrative pseudoscience and the crisis of untrustworthiness of psychology

This is the second of two segments of Mind the Brain aimed at redirecting the conversation concerning power posing to the importance of conflicts of interest in promoting and protecting its scientific status. 

The market value of many lines of products offered to consumers depends on their claims of being “science-based”. Products from psychologists that invoke wondrous mind-body or brain-behavior connections are particularly attractive. My colleagues and I have repeatedly scrutinized such claims, sometimes reanalyzing the original data, and consistently find the claims false or premature and exaggerated.

There is so little risk and so much money and fame to be gained in promoting questionable and even junk psychological science to lay audiences. Professional organizations confer celebrity status on psychologists who succeed, provide them with forums and free publicity that enhance their credibility, and protect their claims of being “science-based” from critics.

How much money academics make from popular books, corporate talks, and workshops and how much media attention they garner serve as alternative criteria for a successful career, sometimes seeming to be valued more than the traditional ones of quality and quantity of publications and the amount of grant funding obtained.

Efforts to improve the trustworthiness of what psychologists publish in peer-reviewed journals have no parallel in any efforts to improve the accuracy of what psychologists say to the public outside of the scientific literature.

By the following reasoning, there may be limits to how much the former efforts at reform can succeed without the latter. In the hypercompetitive marketplace, only the most dramatic claims gain attention. Seldom are the results of rigorously done, transparently reported scientific work sufficiently strong and unambiguous to back up the claims with the broadest appeal, especially in psychology. Psychologists who remain in academic settings but want to market their merchandise to consumers face a dilemma: How much do they have to hype and distort their findings in peer-reviewed journals to fit with what they say to the public?

It is important for readers of scientific articles to know that authors are engaged in these outside activities and are under pressure to obtain particular results. The temptation of being able to make bold claims clashes with the requirements to conduct solid science and report results transparently and completely. Let readers decide if this matters for their receptivity to what authors say in peer-reviewed articles by having information available to them. But almost never is a conflict of interest declared. Just search articles in Psychological Science and see if you can find a single declaration of a COI, even when the authors have booking agents and give high priced corporate talks and seminars.

The discussion of the quality of science backing power posing should have been shorter.

Up until now, much attention to power posing in academic circles has been devoted to the quality of the science behind it, whether results can be independently replicated, and whether critics have behaved badly. The last segment of Mind the Brain examined the faulty science of the original power posing paper in Psychological Science and showed why it could not contribute a credible effect size to the literature.

The discussion of the science behind power posing should have been much shorter and should have reached a definitive conclusion: the original power posing paper should never have been published in Psychological Science. Once the paper had been published, a succession of editors failed in their expanded Pottery-Barn responsibility to publish critiques by Steven J. Stanton and by Marcus Crede and Leigh A. Phillips that were quite reasonable in their substance and tone. As is almost always the case, bad science was accorded an incumbent advantage once it was published. Any disparagement or criticism of this paper would be held by editors to strict and even impossibly high standards if it were to be published. Let’s review the bad science uncovered in the last blog. Readers who are familiar with that post can skip to the next section.

A brief unvarnished summary of the bad science of the original power posing paper as a biobehavioral intervention study

Reviewers of the original paper should have balked at the uninformative and inaccurate abstract. Minimally, readers need to know at the outset that there were only 42 participants (26 females and 16 males) in the study comparing high-power versus low-power poses. Studies with so few participants cannot be expected to provide reproducible effect sizes. Furthermore, there is no basis for claiming that results held for both men and women because that claim depended on analyses with even smaller numbers. Note the 16 males were distributed in some unknown way across the two conditions. If power is fixed by the smaller cell size, even the optimal 8 males per cell is far too few to contribute a credible effect size. Any apparent significant effects in this study are likely to be meaning imposed on noise.
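To put rough numbers on the sample size problem, here is a minimal sketch in Python. It uses a normal approximation to the power of a two-sided, two-group comparison. The even split of the 42 participants into 21 per condition, the standardized effect sizes, and the function name `approx_power` are illustrative assumptions of mine, not values taken from the original paper.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    d is a hypothetical standardized mean difference (Cohen's d).
    This is a normal approximation: it ignores the heavier tails of the
    t distribution, so it is somewhat optimistic for very small groups.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = d * sqrt(n_per_group / 2)      # expected test statistic under d
    # Probability of landing in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Assumed scenarios: 21 per cell (42 split evenly) and 8 per cell (16 males split evenly)
for n in (21, 8):
    for d in (0.2, 0.5, 0.8):
        print(f"n per cell = {n:2d}, d = {d}: approximate power = {approx_power(d, n):.2f}")
```

Under these assumptions, none of the combinations reaches the conventional 80% power threshold, and the 8-per-cell case falls well short of it even for an effect conventionally labeled large. An exact calculation based on the noncentral t distribution would give slightly lower figures still.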

The end sentence in the abstract is an outrageously untrue statement of results. Yet, as we will see, it served as the basis of a product launch worth in the seven-figure range that was already taking shape:

That a person can, by assuming two simple 1-minute poses, embody power and instantly become more powerful has real-world, actionable implications.

Aside from the small sample size, as an author, editor and critic in clinical and health psychology for over 40 years, I greet a claim of ‘real-world actionable implications’ from two one-minute manipulations of participants’ posture with extreme skepticism. My skepticism grows as we delve into the details of the study.

Investigators’ collecting a single pair of pre-post assessments of salivary cortisol is at best a meaningless ritual, and can contribute nothing to understanding what is going on in the study at a hormonal level.

Men in the age range of participants in this study have six times more testosterone than women. Statistical “control” of testosterone by controlling for gender is a meaningless gesture producing uninterpretable results. Controlling for baseline testosterone in analyses of cortisol and vice versa eliminates any faint signal in the loud noise of the hormonal data.

Although they were intended as a manipulation check (and subsequently claimed as evidence of the effect of power posing on feelings), the crude subjective self-report ratings of how “powerful” and “in charge” participants felt on a 1-4 scale could simply communicate the experimenters’ expectancies to participants. Endorsing whether they felt more powerful indicated how smart participants were and whether they were willing to go along with the purpose of the study. Inferences beyond that uninteresting finding require external validation.

In clinical and health psychology trials, we are quite wary of simple subjective self-report analogue scales, particularly when there is poor control of the unblinded experimenters’ behavior and what they communicate to participants.

The gambling task lacks external validation. Low stakes could simply reduce it to another communication of experimenters’ expectancies. Note that the saliva assessments were obtained after completion of the task and if there is any confidence left in the assessments of hormones, this is an important confound.

The unblinded experimenters’ physically placing participants in either two 1-minute high-power or two 1-minute low-power poses is a weird, unvalidated experimental manipulation that could not have the anticipated effects on hormonal levels. Neither high- nor low-power poses are credible, but the hypothesis particularly strains credibility that the low-power pose would actually raise cortisol, if cortisol assessments in the study had any meaning at all.

Analyses were not accurately described, and statistical controls of any kind with such a small sample are likely to add to spurious findings. The statistical controls in this study were particularly inappropriate and there is evidence of the investigators choosing the analyses to present after the results were known.

There is no there there: The original power pose paper did not introduce a credible effect size into the literature.

The published paper cannot introduce a credible effect size into the scientific literature. Power posing may be an interesting and important idea that deserves careful scientific study, but any future study of the idea would be a “first ever,” not a replication of the Psychological Science article. The two commentaries that were blocked from publication in Psychological Science but published elsewhere amplify any dismissal of the paper, but we are already well over the top. But then there is the extraordinary repudiation of the paper by the first author and her exposure of the exploitation of investigator degrees of freedom and outright p-hacking. How many stakes do you have to plunge into the heart of a vampire idea?

Product launch

Even before the power posing article appeared in Psychological Science, Amy Cuddy was promoting it at Harvard, first in Power Posing: Fake It Until You Make It in Harvard Business School’s Working Knowledge: Business Research for Business Leaders. Shortly afterwards came the redundant but elaborated article in Harvard Magazine, subtitled Amy Cuddy probes snap judgments, warm feelings, and how to become an “alpha dog.”

Amy Cuddy is the middle author on the actual Psychological Science article, between first author Dana Carney and third author Andy J Yap, Carney’s graduate student. Yet, the Harvard Magazine article lists Cuddy first. The Harvard Magazine article is also noteworthy in unveiling what would grow into Cuddy’s redemptive self narrative, although Susan Fiske’s role as the “attachment figure” who nurtures Cuddy’s realization of her inner potential was only hinted at.

QUITE LITERALLY BY ACCIDENT, Cuddy became a psychologist. In high school and in college at the University of Colorado at Boulder, she was a serious ballet dancer who worked as a roller-skating waitress at the celebrated L.A. Diner. But one night, she was riding in a car whose driver fell asleep at 4:00 A.M. while doing 90 miles per hour in Wyoming; the accident landed Cuddy in the hospital with severe head trauma and “diffuse axonal injury,” she says. “It’s hard to predict the outcome after that type of injury, and there’s not much they can do for you.”

Cuddy had to take years off from school and “relearn how to learn,” she explains. “I knew I was gifted–I knew my IQ, and didn’t think it could change. But it went down by two standard deviations after the injury. I worked hard to recover those abilities and studied circles around everyone. I listened to Mozart–I was willing to try anything!” Two years later her IQ was back. And she could dance again.

Yup, leading up to promoting the idea that overcoming circumstances and getting what you want is as simple as adopting these 2 minutes of behavioral manipulation.

The last line of the Psychological Science abstract was easily fashioned into the pseudoscientific basis for this ease of changing behavior and outcomes, which now include the success of venture-capital pitches:

 

“Tiny changes that people can make can lead to some pretty dramatic outcomes,” Cuddy reports. This is true because changing one’s own mindset sets up a positive feedback loop with the neuroendocrine secretions, and also changes the mindset of others. The success of venture-capital pitches to investors apparently turns, in fact, on nonverbal factors like “how comfortable and charismatic you are.”

Soon, The New York Times columnist David Brooks placed power posing solidly within the positive thinking product line of positive psychology, even if Cuddy had no need to go out on that circuit: “If you act powerfully, you will begin to think powerfully.”

In 2011, both first author Dana Carney and Amy Cuddy received the Rising Star Award from the Association for Psychological Science (APS) for having “already made great advancements in science.” Carney cited her power posing paper as one that she liked. Cuddy didn’t nominate the paper, but reported that her recent work examined “how brief nonverbal expressions of competence/power and warmth/connection actually alter the neuroendocrine levels, expressions, and behaviors of the people making the expressions, even when the expressions are ‘posed.’”

The same year, Cuddy also appeared at PopTech, which is a “global community of innovators, working together to expand the edge of change,” with tickets selling for $2,000. According to an article in The Chronicle of Higher Education:

When her turn came, Cuddy stood on stage in front of a jumbo screen showing Lynda Carter as Wonder Woman while that TV show’s triumphant theme song announced the professor’s arrival (“All the world is waiting for you! And the power you possess!”). After the music stopped, Cuddy proceeded to explain the science of power poses to a room filled with would-be innovators eager to expand the edge of change.

But that performance was just a warm-up for Cuddy’s TedGlobal Talk, which has now received almost 42 million views.

A Ted Global talk that can serve as a model for all Ted talks: Your body language may shape who you are  

This link takes you not only to Amy Cuddy’s Ted Global talk but to a transcript in 49 different languages

Amy Cuddy’s TedGlobal Talk is brilliantly crafted and masterfully delivered. It has two key threads. The first thread is what Dan McAdams has described as an obligatory personal narrative of a redeemed self. McAdams summarizes the basic structure:

As I move forward in life, many bad things come my way—sin, sickness, abuse, addiction, injustice, poverty, stagnation. But bad things often lead to good outcomes—my suffering is redeemed. Redemption comes to me in the form of atonement, recovery, emancipation, enlightenment, upward social mobility, and/or the actualization of my good inner self. As the plot unfolds, I continue to grow and progress. I bear fruit; I give back; I offer a unique contribution.

This is interwoven with a second thread, the claims of the strong science of power pose derived from the Psychological Science article. Without the science thread, the talk is reduced to a motivational talk of the genre of Oprah Winfrey or Navy SEAL Admiral William McRaven sharing reasons you should make your bed every day.

It is not clear that we should hold the redeemed self of a Ted Talk to the criteria of historical truth. Does it really matter whether Amy Cuddy’s IQ temporarily fell two standard deviations after an auto accident (13:22)? That Cuddy’s “angel adviser” Susan Fiske saved her from feeling like an imposter with the pep talk that inspired the “fake it until you make it” theme of power posing (17:03)? That Cuddy similarly transformed the life of her graduate student (18:47) with:

So I was like, “Yes, you are! You are supposed to be here! And tomorrow you’re going to fake it, you’re going to make yourself powerful, and, you know –

This last segment of the Ted talk is best viewed, rather than read in the transcript. It brings Cuddy to tears and the cheering, clapping audience to their feet. And Cuddy wraps up with her takeaway message:

The last thing I’m going to leave you with is this. Tiny tweaks can lead to big changes. So, this is two minutes. Two minutes, two minutes, two minutes. Before you go into the next stressful evaluative situation, for two minutes, try doing this, in the elevator, in a bathroom stall, at your desk behind closed doors. That’s what you want to do. Configure your brain to cope the best in that situation. Get your testosterone up. Get your cortisol down. Don’t leave that situation feeling like, oh, I didn’t show them who I am. Leave that situation feeling like, I really feel like I got to say who I am and show who I am.

So I want to ask you first, you know, both to try power posing, and also I want to ask you to share the science, because this is simple. I don’t have ego involved in this. (Laughter) Give it away. Share it with people, because the people who can use it the most are the ones with no resources and no technology and no status and no power. Give it to them because they can do it in private. They need their bodies, privacy and two minutes, and it can significantly change the outcomes of their life.

Who cares if the story is literal historical truth? Maybe we should not. But I think psychologists should care about the misrepresentation of the study. For that matter, anyone concerned with truth in advertising to consumers. Anyone who believes that consumers have the right to fair and accurate portrayal of science in being offered products, whether anti-aging cream, acupuncture, or self-help merchandise:

Here’s what we find on testosterone. From their baseline when they come in, high-power people experience about a 20-percent increase, and low-power people experience about a 10-percent decrease. So again, two minutes, and you get these changes. Here’s what you get on cortisol. High-power people experience about a 25-percent decrease, and the low-power people experience about a 15-percent increase. So two minutes lead to these hormonal changes that configure your brain to basically be either assertive, confident and comfortable, or really stress-reactive, and feeling sort of shut down. And we’ve all had the feeling, right? So it seems that our nonverbals do govern how we think and feel about ourselves, so it’s not just others, but it’s also ourselves. Also, our bodies change our minds.

Why should we care? Buying into such simple solutions prepares consumers to accept other outrageous claims. It can be a gateway drug for other quack treatments like Harvard psychologist Ellen Langer’s claims that changing mindset can overcome advanced cancer.

Unwarranted claims break down the barriers between evidence-based recommendations and nonsense. Such claims discourage consumers from accepting more deliverable promises that evidence-based interventions like psychotherapy can indeed make a difference, but they take work and effort, and effects can be modest. Who would invest time and money in cognitive behavior therapy, when two one-minute self-manipulations can transform lives? Like all unrealistic promises of redemption, such advice may ultimately lead people to blame themselves when they don’t overcome adversity. After all, it is so simple and just a matter of taking charge of your life. Their predicament indicates that they did not take charge or that they are simply losers.

But some consumers can be turned cynical about psychology. Here is a Harvard professor trying to sell them crap advice. Psychology sucks, it is crap.

Conflict of interest: Nothing to declare?

“I don’t care if some people view this research as stupid,” Amy Cuddy said in an interview with The New York Times. “I feel like it’s my duty to share it.”

Amy Cuddy may have been giving her power pose advice away for free in her TED talk, but she had already given it away at the $2,000-a-ticket PopTech talk. The book contract for Presence: Bringing Your Boldest Self to Your Biggest Challenges was reportedly for around a million dollars. And of course, like many academics who leave psychology for schools of management, Cuddy had a booking agency soliciting corporate talks and workshops. With the TED talk, she could command $40,000-$100,000.

Does this discredit the science of power posing? Not necessarily, but readers should be informed and free to decide for themselves. Certainly, all this money in play might make Cuddy more likely to respond defensively to criticism of her work. If she repudiated this work the way that first author Dana Carney did, would there be a halt to her speaking gigs, a product recall, or refunds issued by Amazon for Presence?

I think it is fair to suggest that there is too much money in play for Cuddy to respond to academic debate.  Maybe things are outside that realm because of these stakes.

The replicationados attempt replications: Was it counterproductive?

Faced with overwhelming evidence of the untrustworthiness of the psychological literature, some psychologists have organized replication initiatives and accumulated considerable resources for multisite replications. But replication initiatives are insufficient to remedy the untrustworthiness of many areas of psychology, particularly clinical and health psychology intervention studies, and may inadvertently dampen more direct attacks on bad science. Many of those who promote replication initiatives are silent when investigators refuse to share data for studies with important clinical and public health implications. They are also silent when journals like Psychological Science fail to publish criticism of papers with blatantly faulty science.

Replication initiatives take time, and results are often, but not always, ultimately published outside of the journals where the flawed original work was published. But an important unintended consequence is that they lend credibility to effect sizes that had no validity whatsoever when they occurred in the original papers. In debates attempting to resolve discrepancies between original studies and large-scale replications, the original underpowered studies are often granted an entrenched incumbent advantage.

It should be no surprise that in a large-scale attempted replication, Ranehill, Dreber, Johannesson, Leiberg, Sul, and Weber failed to replicate the key, nontrivial findings of the original power pose study.

Consistent with the findings of Carney et  al., our results showed a significant effect of power posing on self-reported feelings of power. However, we found no significant effect of power posing on hormonal levels or in any of the three behavioral tasks.

It is also not surprising that Cuddy invoked her I-said-it-first-and-I-was-peer-reviewed incumbent advantage, reasserting her original claim along with a review of 33 studies that included the attempted replication:

The work of Ranehill et al. joins a body of research that includes 33 independent experiments published with a total of 2,521 research participants. Together, these results may help specify when nonverbal expansiveness will and will not cause embodied psychological changes.

Cuddy asserted that methodological differences between the original study and the attempted Ranehill replication may have moderated the effects of posing. But no study has shown that putting participants into a power pose affects hormones.

Joe Simmons and Uri Simonsohn performed a meta-analysis of the studies nominated by Cuddy, which was ultimately published in Psychological Science. Their blog Data Colada succinctly summarized the results:

Consistent with the replication motivating this post, p-curve indicates that either power-posing overall has no effect, or the effect is too small for the existing samples to have meaningfully studied it. Note that there are perfectly benign explanations for this: e.g., labs that run studies that worked wrote them up, labs that run studies that didn’t, didn’t. [5]

While the simplest explanation is that all studied effects are zero, it may be that one or two of them are real (any more and we would see a right-skewed p-curve). However, at this point the evidence for the basic effect seems too fragile to search for moderators or to advocate for people to engage in power posing to better their lives.
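The logic behind the "right-skewed p-curve" remark can be sketched numerically. What follows is only an illustration under assumptions of my own, not the procedure Simmons and Simonsohn used: each study is treated as a two-sided z-test whose test statistic is normally distributed around a hypothetical `shift` (zero when there is no effect). If the effect is zero, significant p-values are spread evenly, so half of them fall below .025. If there is a real effect, more than half do, and that pile-up of very small p-values is the right skew.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def prob_two_sided_p_below(alpha: float, shift: float) -> float:
    """P(two-sided p < alpha) when the z statistic is distributed N(shift, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def share_of_significant_below_025(shift: float) -> float:
    """Among results with p < .05, the expected share that also have p < .025."""
    return prob_two_sided_p_below(0.025, shift) / prob_two_sided_p_below(0.05, shift)


# shift = 0 is the no-effect case; larger shifts stand for larger true effects.
for shift in (0.0, 1.0, 2.0, 3.0):
    share = share_of_significant_below_025(shift)
    print(f"shift = {shift:.1f}: share of significant p-values below .025 = {share:.2f}")
```

A set of studies whose significant p-values split roughly evenly around .025 therefore looks like the no-effect row, which is the pattern the quoted passage describes.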

Come on, guys, there was never a there there. Don’t invent one and then keep trying to explain it.

It is interesting that none of these three follow up articles in Psychological Science have abstracts, especially in contrast to the original power pose paper that effectively delivered its misleading message in the abstract.

Just as this blog post was being polished, a special issue of Comprehensive Results in Social Psychology (CRSP) on Power Poses was released.

  1. No preregistered tests showed positive effects of expansive poses on any behavioral or hormonal measures. This includes direct replications and extensions.
  2. Surprise: A Bayesian meta-analysis across the studies reveals a credible effect of expansive poses on felt power. (Note that this is described as a ‘manipulation check’ by Cuddy in 2015.) Whether this is anything beyond a demand characteristic and whether it has any positive downstream behavioral effects is unknown.

No, not a surprise, just an uninteresting artifact. But stay tuned for the next model of power pose dropping the tainted name and focusing on “felt power.” Like rust, commercialization of bad psychological science never really sleeps, it only takes power naps.

Meantime, professional psychological organizations, with their flagship journals and publicity machines, need to:

  • Lose their fascination with psychologists whose celebrity status depends on TED talks and the marketing of dubious advice products grounded in pseudoscience.
  • Embrace and adhere to an expanded Pottery Barn rule that covers not only direct replications, but corrections to bad science that has been published.
  • Make the protection of consumers from false and exaggerated claims a priority equivalent to the vulnerable reputations of academic psychologists in efforts to improve the trustworthiness of psychology.
  • Require detailed conflicts of interest statements for talks and articles.

All opinions expressed here are solely those of Coyne of the Realm and not necessarily of PLOS blogs, PLOS One or his other affiliations.

Disclosure:

I receive money for writing these blog posts, less than $200 per post. I am also marketing a series of e-books,  including Coyne of the Realm Takes a Skeptical Look at Mindfulness and Coyne of the Realm Takes a Skeptical Look at Positive Psychology.

Maybe I am just making a fuss to attract attention to these enterprises. Maybe I am just monetizing what I have been doing for years virtually for free. Regardless, be skeptical. But to get more information and get on a mailing list for my other blogging, go to coyneoftherealm.com and sign up.

Calling out pseudoscience, radically changing the conversation about Amy Cuddy’s power posing paper

Part 1: Reviewed as the clinical trial that it is, the power posing paper should never have been published.

Has too much already been written about Amy Cuddy’s power pose paper? The conversation should not be stopped until its focus shifts and we change our ways of talking about psychological science.

The dominant narrative is now that a junior scientist published an influential paper on power posing and was subject to harassment and shaming by critics, pointing to the need for greater civility in scientific discourse.

Attention has shifted away from the scientific quality of the paper and the dubious products the paper has been used to promote, and onto the behavior of its critics.

Amy Cuddy and powerful allies are given forums to attack and vilify critics, accusing them of damaging the environment in which science is done and discouraging prospective early career investigators from entering the field.

Meanwhile, Amy Cuddy commands large speaking fees and has a top-selling book claiming the original paper provides strong science for simple behavioral manipulations altering mind-body relations and producing socially significant behavior.

This misrepresentation of psychological science does potential harm to consumers and the reputation of psychology among lay persons.

This blog post is intended to restart the conversation with a reconsideration of the original paper as a clinical and health psychology randomized trial (RCT) and, on that basis, identifying the kinds of inferences that are warranted from it.

In the first of a two-post series, I argue that:

The original power pose article in Psychological Science should never have been published.

-Basically, we have a therapeutic analog intervention delivered in two 1-minute manipulations by unblinded experimenters who had flexibility in what they did, what they communicated to participants, and which data they chose to analyze and how.

-It’s unrealistic to expect that two 1-minute behavioral manipulations would have robust and reliable effects on salivary cortisol or testosterone 17 minutes later.

-It’s absurd to assume that the hormones mediated changes in behavior in this context.

-If Amy Cuddy retreats to the idea that she is simply manipulating “felt power,” we are solidly in the realm of trivial nonspecific and placebo effects.

The original power posing paper

Carney DR, Cuddy AJ, Yap AJ. Power posing: brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science. 2010 Oct 1;21(10):1363-8.

The Psychological Science article can be construed as a brief mind-body intervention consisting of two 1-minute behavioral manipulations. Central to the attention that the paper attracted is the argument that this manipulation affected psychological state and social performance via its effects on the neuroendocrine system.

The original study is, in effect, a disguised randomized clinical trial (RCT) of a biobehavioral intervention. Once this is recognized, a host of standards can come into play for reporting this study and interpreting the results.

CONSORT

All major journals and publishers, including the Association for Psychological Science, have adopted the Consolidated Standards of Reporting Trials (CONSORT). Any submission of a manuscript reporting a clinical trial is required to be accompanied by a checklist indicating that the article reports particular details of how the trial was conducted. Item 1 on the checklist specifies that both the title and abstract indicate the study was a randomized trial. This is important and intended to aid readers in evaluating the study, but also to allow the study to be picked up in systematic searches for reviews that depend on screening of titles and abstracts.

I can find no evidence that Psychological Science adheres to CONSORT. For instance, my colleagues and I provided a detailed critique of a widely promoted study of loving-kindness meditation that was published in Psychological Science the same year as Cuddy’s power pose study. We noted that it was actually a poorly reported null trial with switched outcomes. With that recognition, we went on to identify serious conceptual, methodological and statistical problems. After overcoming considerable resistance, we were able  to publish a muted version of our critique. Apparently reviewers of the original paper had failed to evaluate it in terms of it being an RCT.

The submission of the completed CONSORT checklist has become routine in most journals considering manuscripts for studies of clinical and health psychology interventions. Yet, additional CONSORT requirements that developed later about what should be included in abstracts are largely being ignored.

It would be unfair to single out Psychological Science and the Cuddy article for noncompliance with CONSORT for abstracts. However, the checklist can be a useful frame of reference for noting just how woefully inadequate the abstract was as a report of a scientific study.

CONSORT for abstracts

Hopewell S, Clarke M, Moher D, Wager E, Middleton P, Altman DG, Schulz KF, CONSORT Group. CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration. PLOS Medicine. 2008 Jan 22;5(1):e20.

Journal and conference abstracts should contain sufficient information about the trial to serve as an accurate record of its conduct and findings, providing optimal information about the trial within the space constraints of the abstract format. A properly constructed and well-written abstract should also help individuals to assess quickly the validity and applicability of the findings and, in the case of abstracts of journal articles, aid the retrieval of reports from electronic databases.

Even if CONSORT for abstracts did not exist, we could argue that readers, starting with the editor and reviewers, were faced with an abstract with extraordinary claims that required better substantiation. The lack of basic details left them unable to evaluate these claims.

In effect, the abstract reduces the study to an experimercial for products about to be marketed in corporate talks and workshops, but let’s persist in evaluating it as the abstract of a scientific study.

Humans and other animals express power through open, expansive postures, and they express powerlessness through closed, contractive postures. But can these postures actually cause power? The results of this study confirmed our prediction that posing in high-power nonverbal displays (as opposed to low-power nonverbal displays) would cause neuroendocrine and behavioral changes for both male and female participants: High-power posers experienced elevations in testosterone, decreases in cortisol, and increased feelings of power and tolerance for risk; low-power posers exhibited the opposite pattern. In short, posing in displays of power caused advantaged and adaptive psychological, physiological, and behavioral changes, and these findings suggest that embodiment extends beyond mere thinking and feeling, to physiology and subsequent behavioral choices. That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.

I don’t believe I have ever encountered in an abstract the extravagant claims with which this abstract concludes. But readers are not provided any basis for evaluating the claim until the Methods section. Undoubtedly, many holding opinions about the paper did not read that far.

Namely:

Forty-two participants (26 females and 16 males) were randomly assigned to the high-power-pose or low-power-pose condition.

Testosterone levels were in the normal range at both Time 1 (M = 60.30 pg/ml, SD = 49.58) and Time 2 (M = 57.40 pg/ml, SD = 43.25). As would be suggested by appropriately taken and assayed samples (Schultheiss & Stanton, 2009), men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.

Too small a study to provide an effect size

Hold on! First, only 42 participants (26 females and 16 males) would readily be recognized as insufficient for an RCT, particularly in an area of research without past RCTs.

After decades of witnessing the accumulation of strong effect sizes from underpowered studies, many of us have reacted by requiring 35 participants per group as the minimum acceptable level for a generalizable effect size. Actually, that could be an overly liberal criterion. Why?

Many RCTs are underpowered, yet a lack of enforcement of preregistration allows positive results to be claimed by redefining the primary outcomes after results are known. A psychotherapy trial with 30 or fewer patients in the smallest cell has less than a 50% probability of detecting a moderate-sized significant effect, even if it is present (Coyne, Thombs, & Hagedoorn, 2010). Yet an examination of the studies mustered for treatments being designated as evidence-supported by APA Division 12 indicates that many studies were too underpowered to be reliably counted as evidence of efficacy, but were included without comment about this problem. Taking an overview, it is striking the extent to which the literature continues to depend on small, methodologically flawed RCTs conducted by investigators with strong allegiances to one of the treatments being evaluated. Yet, which treatment is preferred by investigators is a better predictor of the outcome of the trial than the specific treatment being evaluated (Luborsky et al., 2006).
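The "less than a 50% probability" figure can be checked with a rough calculation. This is a minimal sketch resting on assumptions that are mine rather than the cited paper's: a two-group comparison of means with equal cell sizes, a two-sided alpha of .05, a "moderate" effect taken as Cohen's d = 0.5, and a normal approximation in place of the exact t-test (which makes power look slightly better than it is). The 21-per-group row assumes the 42 power pose participants were split evenly, which the paper's description does not confirm.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def approx_power(n_per_group: int, effect_size_d: float, alpha: float = 0.05) -> float:
    """Normal-approximation power for a two-sided comparison of two group means."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true standardized effect is d.
    noncentrality = effect_size_d * sqrt(n_per_group / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - noncentrality)
    lower_tail = STD_NORMAL.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


# 21 per group is an ASSUMED even split of 42 participants;
# 30 and 35 per group are the cell sizes discussed in the text.
for n in (21, 30, 35):
    print(f"n per group = {n}: approximate power for d = 0.5 is {approx_power(n, 0.5):.2f}")
```

Under these assumptions, 30 per cell comes out a little under one half, consistent with the claim above, and a cell size near 21 does noticeably worse.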

Earlier my colleagues and I had argued for the non-accumulative  nature of evidence from small RCTs:

Kraemer, Gardner, Brooks, and Yesavage (1998) propose excluding small, underpowered studies from meta-analyses. The risk of including studies with inadequate sample size is not limited to clinical and pragmatic decisions being made on the basis of trials that cannot demonstrate effectiveness when it is indeed present. Rather, Kraemer et al. demonstrate that inclusion of small, underpowered trials in meta-analyses produces gross overestimates of effect size due to substantial, but unquantifiable confirmatory publication bias from non-representative small trials. Without being able to estimate the size or extent of such biases, it is impossible to control for them. Other authorities voice support for including small trials, but generally limit their argument to trials that are otherwise methodologically adequate (Sackett & Cook, 1993; Schulz & Grimes, 2005). Small trials are particularly susceptible to common methodological problems…such as lack of baseline equivalence of groups; undue influence of outliers on results; selective attrition and lack of intent-to-treat analyses; investigators being unblinded to patient allotment; and not having a pre-determined stopping point so investigators are able to stop a trial when a significant effect is present.

In the power posing paper, there was the control for sex in all analyses because a peek at the data revealed baseline sex differences in testosterone dwarfing any other differences. What do we make of investigators conducting a study depending on testosterone mediating a behavioral manipulation who did not anticipate large baseline sex differences in testosterone?

In a PubPeer comment leading up to this post, I noted:

We are then told “men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.”

The findings alluded to in the abstract should be recognizable as weird and uninterpretable. Most basically, how could the 16 males be distributed across the two groups so that the authors could confidently say that differences held for both males and females? Especially when all analyses control for sex? Sex is highly correlated with testosterone and so an analysis that controlled for both the variables, sex and testosterone would probably not generalize to testosterone without such controls.

We are never given the basic statistics in the paper to independently assess what the authors are doing, not the correlation between cortisol and testosterone, only differences in time 2 cortisol controlling for time 1 cortisol, time 1 testosterone and gender. These multivariate statistics are not  very generalizable in a sample with 42 participants distributed across 2 groups. Certainly not for the 26 females and 16  males taken separately.

The behavioral manipulation

The original paper reports:

Participants’ bodies were posed by an experimenter into high-power or low-power poses. Each participant held two poses for 1 min each. Participants’ risk taking was measured with a gambling task; feelings of power were measured with self-reports. Saliva samples, which were used to test cortisol and testosterone levels, were taken before and approximately 17 min after the power-pose manipulation.

And then elaborates:

To configure the test participants into the poses, the experimenter placed an electrocardiography lead on the back of each participant’s calf and underbelly of the left arm and explained, “To test accuracy of physiological responses as a function of sensor placement relative to your heart, you are being put into a certain physical position.” The experimenter then manually configured participants’ bodies by lightly touching their arms and legs. As needed, the experimenter provided verbal instructions (e.g., “Keep your feet above heart level by putting them on the desk in front of you”). After manually configuring participants’ bodies into the two poses, the experimenter left the room. Participants were videotaped; all participants correctly made and held either two high-power or two low-power poses for 1 min each. While making and holding the poses, participants completed a filler task that consisted of viewing and forming impressions of nine faces.

The behavioral task and subjective self-report assessment

Measure of risk taking and powerful feelings. After they finished posing, participants were presented with the gambling task. They were endowed with $2 and told they could keep the money—the safe bet—or roll a die and risk losing the $2 for a payoff of $4 (a risky but rational bet; odds of winning were 50/50). Participants indicated how “powerful” and “in charge” they felt on a scale from 1 (not at all) to 4 (a lot).

An imagined bewildered review from someone accustomed to evaluating clinical trials

Although the authors don’t seem to know what they’re doing, we have an underpowered therapy analogue study with extraordinary claims. It’s unconvincing that the two 1-minute behavioral manipulations would change subsequent psychological states and behavior with any extralaboratory implications.

The manipulation poses a puzzle to research participants, challenging them to figure out what is being asked of them. The $2 gambling task presumably is meant to simulate effects on real-world behavior. But the low stakes could mean that participants believed the task evaluated whether they “got” the purpose of the intervention and behaved accordingly. Within that perspective, the unvalidated subjective self-report rating scale would serve as a clue to the intentions of the experimenter and an opportunity to show the participants were smart. The  manipulation of putting participants  into a low power pose is even more unconvincing as a contrasting active intervention or a control condition.  Claims that this manipulation did anything but communicate experimenter expectancies are even less credible.

This is a very weak form of evidence: A therapy analogue study with such a brief, low intensity behavioral manipulation followed by assessments of outcomes that might just inform participants of what they needed to do to look smart (i.e., demand characteristics). Add in that the experimenters were unblinded and undoubtedly had flexibility in how they delivered the intervention and what they said to participants. As a grossly underpowered trial, the study cannot make a contribution to the literature and certainly not an effect size.

Furthermore, if the authors had even a basic understanding of gender differences in social status or sex differences in testosterone, they would have stratified the study with respect to participant gender, not attempted to obtain control by post hoc statistical manipulation.

I could comment on signs of p-hacking and widespread signs of inappropriate naming, use, and interpretation of statistics, but why bother? There are no vital signs of a publishable paper here.

Is power posing salvaged by fashionable hormonal measures?

 Perhaps the skepticism of the editor and reviewers was overcome by the introduction of mind-body explanations  of what some salivary measures supposedly showed. Otherwise, we would be left with a single subjective self-report measure and a behavioral task susceptible to demand characteristics and nonspecific effects.

We recognize that the free availability of powerful statistical packages risks people using them without any idea of the appropriateness of their use or interpretation. The same observation should be made of the ready availability of means of collecting spit samples from research participants to be sent off to outside laboratories for biochemical analysis.

The clinical health psychology literature is increasingly filled with studies incorporating easily collected saliva samples intended to establish that psychological interventions influence mind-body relations. These have become particularly applied in attempts to demonstrate that mindfulness meditation and even tai chi can have beneficial effects on physical health and even cancer outcomes.

These measures are often inaccurately described as “biomarkers” rather than merely as biological measurements, and there is seldom much learned from their inclusion that is generalizable within participants or across studies.

Let’s start with salivary-based cortisol measures.

A comprehensive review  suggests that:

  • A single measurement on a participant  or a pre-post pair of assessments would not be informative.
  • Single measurements are unreliable and large intra-and inter-individual differences not attributable to intervention can be in play.
  • Minor variations in experimental procedures can have large, unwanted effects.
  • The current standard is cortisol awakening response in the diurnal slope over more than one day, which would not make sense for the effects of 2 1-minute behavioral manipulations.
  • Even with sophisticated measurement strategies there is low agreement across and even within studies and low agreement with behavioral and self-report data.
  • The idea that collecting saliva samples would serve the function the investigators intended is an unscientific, but attractive illusion.

Another relevant comprehensive theoretical review and synthesis of cortisol reactivity was available at the time the power pose study was planned. The article identifies no basis for anticipating that experimenters putting participants into 1-minute expansive poses would lower cortisol. And certainly no basis for assuming that putting participants into a 1-minute slumped position would raise cortisol. Or what such findings could possibly mean.

But we are clutching at straws. The authors’ interpretations of their hormonal data depend on bizarre post hoc decisions about how to analyze their data in a small sample in which participant sex is treated in incomprehensible  fashion. The process of trying to explain spurious results risks giving the results a credibility that authors have not earned for them. And don’t even try to claim we are getting signals of hormonal mediation from this study.

Another system failure: The incumbent advantage given to a paper that should not have been published.

Even when publication is based on inadequate editorial oversight and review, any likelihood of correction is diminished by published results having been blessed as “peer reviewed” and accorded an incumbent advantage over whatever follows.

A succession of editors have protected the power pose paper from post-publication peer review. Postpublication review has been relegated to other journals and social media, including PubPeer and blogs.

Soon after publication of the power pose paper, a critique was submitted to Psychological Science, but it was desk rejected. The editor informally communicated to the author that the critique read like a review and the original article had already been peer reviewed.

The critique by Steven J. Stanton nonetheless eventually appeared in Frontiers in Behavioral Neuroscience and is worth a read.

Stanton took seriously the science being invoked in the claims of the power pose paper.

A sampling:

Carney et al. (2010) collapsed over gender in all testosterone analyses. Testosterone conforms to a bimodal distribution when including both genders (see Figure 13; Sapienza et al., 2009). Raw testosterone cannot be considered a normally distributed dependent or independent variable when including both genders. Thus, Carney et al. (2010) violated a basic assumption of the statistical analyses that they reported, because they used raw testosterone from pre- and post-power posing as independent and dependent variables, respectively, with all subjects (male and female) included.

And

Mean cortisol levels for all participants were reported as 0.16 ng/mL pre-posing and 0.12 ng/mL post-posing, thus showing that for all participants there was an average decrease of 0.04 ng/mL from pre- to post-posing, regardless of condition. Yet, Figure 4 of Carney et al. (2010) shows that low-power posers had mean cortisol increases of roughly 0.025 ng/mL and high-power posers had mean cortisol decreases of roughly 0.03 ng/mL. It is unclear given the data in Figure 4 how the overall cortisol change for all participants could have been a decrease of 0.04 ng/mL.
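Stanton's puzzlement here comes down to simple arithmetic: the overall mean change has to be a weighted average of the two group mean changes, so it cannot fall outside them. The sketch below runs through every possible split of 42 participants, using the approximate values quoted above. It assumes that Figure 4 shows raw, unadjusted mean changes and that all 42 participants contributed cortisol data. Neither assumption is confirmed by the paper, and the function name is mine.

```python
def overall_mean_change(change_high: float, n_high: int,
                        change_low: float, n_low: int) -> float:
    """Overall mean change as the size-weighted average of two group mean changes."""
    return (change_high * n_high + change_low * n_low) / (n_high + n_low)


TOTAL_PARTICIPANTS = 42      # from the Methods section quoted earlier
CHANGE_HIGH_POWER = -0.03    # ng/mL, approximate value read from Figure 4 (per Stanton)
CHANGE_LOW_POWER = 0.025     # ng/mL, approximate value read from Figure 4 (per Stanton)
REPORTED_OVERALL = -0.04     # ng/mL, implied by the reported pre and post means

# Try every possible allocation of participants to the two conditions.
possible_overall_changes = [
    overall_mean_change(CHANGE_HIGH_POWER, n_high,
                        CHANGE_LOW_POWER, TOTAL_PARTICIPANTS - n_high)
    for n_high in range(1, TOTAL_PARTICIPANTS)
]

print(f"most negative overall change possible: {min(possible_overall_changes):.4f} ng/mL")
print(f"reported overall change: {REPORTED_OVERALL:.4f} ng/mL")
print("reported value reachable:", min(possible_overall_changes) <= REPORTED_OVERALL)
```

No allocation of participants to conditions gets the overall change down to the reported figure, which is the inconsistency Stanton flags. If the figure instead plots covariate-adjusted means, the comparison is less direct, but then the paper should have said so.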

Another editor of Psychological Science received a critical comment from Marcus Crede and Leigh A. Phillips. After the first round of reviews, Crede and Phillips removed references to changes in the published power pose paper from earlier drafts that they had received from the first author, Dana Carney. However, Crede and Phillips withdrew their critique when asked to respond to a review by Amy Cuddy in a second resubmission.

The critique is now forthcoming in Social Psychological and Personality Science:

Revisiting the Power Pose Effect: How Robust Are the Results Reported by Carney, Cuddy and Yap (2010) to Data Analytic Decisions

The article investigates effects of choices made in p-hacking in the original paper. An excerpt from the abstract:

In this paper we use multiverse analysis to examine whether the findings reported in the original paper by Carney, Cuddy, and Yap (2010) are robust to plausible alternative data analytic specifications: outlier identification strategy; the specification of the dependent variable; and the use of control variables. Our findings indicate that the inferences regarding the presence and size of an effect on testosterone and cortisol are  highly sensitive to data analytic specifications. We encourage researchers to routinely explore the influence of data analytic choices on statistical inferences and also encourage editors and  reviewers to require explicit examinations of the influence of alternative data analytic  specifications on the inferences that are drawn from data.

Dana Carney, the first author of the paper, has now posted an explanation of why she no longer believes the originally reported findings are genuine and why “the evidence against the existence of power poses is undeniable.” She discloses a number of important confounds and important “researcher degrees of freedom” in the analyses reported in the published paper.

Coming Up Next

A different view of Amy Cuddy’s TED talk in terms of its selling of pseudoscience to consumers and its acknowledgment of a strong debt to Cuddy’s adviser Susan Fiske.

A disclosure of some of the financial interests that distort discussion of the scientific flaws of the power pose.

How the reflexive response of the replicationados inadvertently reinforced the illusion that the original pose study provided meaningful effect sizes.

How Amy Cuddy and her allies marshalled the resources of the Association for Psychological Science to vilify and intimidate critics of bad science and of the exploitation of consumers by psychological pseudoscience.

How journalists played into this vilification.

What needs to be done to avoid a future fiasco for psychology like the power pose phenomenon and protect reformers of the dissemination of science.

Note: Time to reiterate that all opinions expressed here are solely those of Coyne of the Realm and not necessarily of PLOS blogs, PLOS One or his other affiliations.