Power Poseur: The lure of lucrative pseudoscience and the crisis of untrustworthiness in psychology

This is the second of two segments of Mind the Brain aimed at redirecting the conversation concerning power posing to the importance of conflicts of interest in promoting and protecting its scientific status. 

The market value of many lines of products offered to consumers depends on their claims of being “science-based”. Products from psychologists that invoke wondrous mind-body or brain-behavior connections are particularly attractive. My colleagues and I have repeatedly scrutinized such claims, sometimes reanalyzing the original data, and consistently find the claims false or premature and exaggerated.

There is so little risk and so much money and fame to be gained in promoting questionable and even junk psychological science to lay audiences. Professional organizations confer celebrity status on psychologists who succeed, provide them with forums and free publicity that enhance their credibility, and protect their claims of being “science-based” from critics.

How much money academics make from popular books, corporate talks, and workshops and how much media attention they garner serve as alternative criteria for a successful career, sometimes seeming to be valued more than the traditional ones of quality and quantity of publications and the amount of grant funding obtained.

Efforts to improve the trustworthiness of what psychologists publish in peer-reviewed journals have no parallel in efforts to improve the accuracy of what psychologists say to the public outside of the scientific literature.

By the following reasoning, there may be limits to how much the former efforts at reform can succeed without the latter. In the hypercompetitive marketplace, only the most dramatic claims gain attention. Seldom are the results of rigorously done, transparently reported scientific work strong and unambiguous enough to back up the claims with the broadest appeal, especially in psychology. Psychologists who remain in academic settings but want to market their merchandise to consumers face a dilemma: How much do they have to hype and distort their findings in peer-reviewed journals to fit with what they say to the public?

It is important for readers of scientific articles to know that authors are engaged in these outside activities and are under pressure to obtain particular results. The temptation to make bold claims clashes with the requirements to conduct solid science and report results transparently and completely. Let readers decide whether this matters for their receptivity to what authors say in peer-reviewed articles by making that information available to them. But almost never is a conflict of interest declared. Just search articles in Psychological Science and see if you can find a single declaration of a COI, even when the authors have booking agents and give high-priced corporate talks and seminars.

The discussion of the quality of science backing power posing should have been shorter.

Up until now, much attention to power posing in academic circles has been devoted to the quality of the science behind it, whether results can be independently replicated, and whether critics have behaved badly. The last segment of Mind the Brain examined the faulty science of the original power posing paper in Psychological Science and showed why it could not contribute a credible effect size to the literature.

The discussion of the science behind power posing should have been much shorter and should have reached a definitive conclusion: the original power posing paper should never have been published in Psychological Science. Once the paper had been published, a succession of editors failed in their expanded Pottery-Barn responsibility to publish critiques by Steven J. Stanton  and by Marcus Crede and Leigh A. Phillips that were quite reasonable in their substance and tone. As is almost always the case, bad science was accorded an incumbent advantage once it was published. Any disparagement or criticism of this paper would be held by editors to strict and even impossibly high standards if it were to be published. Let’s review the bad science uncovered in the last blog. Readers who are familiar with that post can skip to the next section.

A brief unvarnished summary of the bad science of the original power posing paper as a biobehavioral intervention study

Reviewers of the original paper should have balked at the uninformative and inaccurate abstract. Minimally, readers need to know at the outset that there were only 42 participants (26 females and 16 males) in the study comparing high-power versus low-power poses. Studies with so few participants cannot be expected to provide reproducible effect sizes. Furthermore, there is no basis for claiming that results held for both men and women, because that claim depended on analyses with even smaller numbers. Note that the 16 males were distributed in some unknown way across the two conditions. If power is fixed by the smaller cell size, even the optimal 8 males per cell is well below what would be needed to contribute a credible effect size. Any apparently significant effects in this study are likely to be meaning imposed on noise.
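To make the sample-size problem concrete, here is a minimal power calculation, a sketch that assumes a two-sided between-groups t-test and a conventional “medium” true effect of d = 0.5 (my illustrative assumptions, not anything the original authors specified):

```python
# Rough power check under assumed conditions: a two-sided two-sample t-test
# and a true effect of d = 0.5.
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()

# 42 participants split across two conditions: at best 21 per cell
print(power_calc.solve_power(effect_size=0.5, nobs1=21, alpha=0.05))   # ~0.35

# the male subgroup, optimistically assuming 8 per cell
print(power_calc.solve_power(effect_size=0.5, nobs1=8, alpha=0.05))    # ~0.14

# per-cell sample size needed for the conventional 80% power
print(power_calc.solve_power(effect_size=0.5, power=0.8, alpha=0.05))  # ~64
```

Even on these charitable assumptions, the full sample has roughly a one-in-three chance of detecting a real medium-sized effect, and the male subgroup analyses are hopeless.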

The end sentence of the abstract is an outrageously untrue statement of results. Yet, as we will see, it served as the basis of a product launch in the seven-figure range that was already taking shape:

That a person can, by assuming two simple 1-minute poses, embody power and instantly become more powerful has real-world, actionable implications.

Aside from the small sample size, as an author, editor, and critic in clinical and health psychology for over 40 years, I greet a claim of “real-world, actionable implications” from two one-minute manipulations of participants’ posture with extreme skepticism. My skepticism grows as we delve into the details of the study.

Investigators’ collecting a single pair of pre-post assessments of salivary cortisol is at best a meaningless ritual, and can contribute nothing to understanding what is going on in the study at a hormonal level.

Men in the age range of the participants in this study have six times more testosterone than women. Statistical “control” of testosterone by controlling for gender is a meaningless gesture producing uninterpretable results. Controlling for baseline testosterone in analyses of cortisol, and vice versa, eliminates any faint signal in the loud noise of the hormonal data.

Although it was intended as a manipulation check (and subsequently claimed as evidence of the effect of power posing on feelings), the crude subjective self-report ratings of feeling “powerful” and “in charge” on a 1-4 scale could simply communicate the experimenters’ expectancies to participants. Endorsing feeling more powerful indicated how smart participants were and whether they were willing to go along with the purpose of the study. Inferences beyond that uninteresting finding require external validation.

In clinical and health psychology trials, we are quite wary of simple subjective self-report analogue scales, particularly when there is poor control of the unblinded experimenters’ behavior and what they communicate to participants.

The gambling task lacks external validation. Low stakes could simply reduce it to another communication of the experimenters’ expectancies. Note that the saliva assessments were obtained after completion of the task; if there is any confidence left in the assessments of hormones, this is an important confound.

The unblinded experimenters’ physically placing participants in either two one-minute high-power or two one-minute low-power poses is a weird, unvalidated experimental manipulation that could not have the anticipated effects on hormonal levels. Neither high- nor low-power poses are credible, but the hypothesis that the low-power pose would actually raise cortisol particularly strains credibility, if the cortisol assessments in the study had any meaning at all.

Analyses were not accurately described, and statistical controls of any kind with such a small sample are likely to add to spurious findings. The statistical controls in this study were particularly inappropriate, and there is evidence of the investigators choosing the analyses to present after the results were known.

There is no there there: The original power pose paper did not introduce a credible effect size into the literature.

The published paper cannot introduce a credible effect size into the scientific literature. Power posing may be an interesting and important idea that deserves careful scientific study, but any future study of the idea would be a “first ever,” not a replication of the Psychological Science article. The two commentaries that were blocked from publication in Psychological Science but published elsewhere amplify any dismissal of the paper, but we are already well over the top. And then there is the extraordinary repudiation of the paper by the first author and her exposure of the exploitation of investigator degrees of freedom and outright p-hacking. How many stakes do you have to plunge into the heart of a vampire idea?

Product launch

Even before the power posing article appeared in Psychological Science, Amy Cuddy was promoting it at Harvard, first in Power Posing: Fake It Until You Make It in Harvard Business School’s Working Knowledge: Business Research for Business Leaders. Shortly afterward came a redundant but elaborated article in Harvard Magazine, subtitled Amy Cuddy probes snap judgments, warm feelings, and how to become an “alpha dog.”

Amy Cuddy is the middle author on the actual Psychological Science article, between first author Dana Carney and third author Andy J. Yap, Dana Carney’s graduate student. Yet the Harvard Magazine article lists Cuddy first. The Harvard Magazine article is also noteworthy in unveiling what would grow into Cuddy’s redemptive self-narrative, although Susan Fiske’s role as the “attachment figure” who nurtured Cuddy’s realization of her inner potential was only hinted at.

QUITE LITERALLY BY ACCIDENT, Cuddy became a psychologist. In high school and in college at the University of Colorado at Boulder, she was a serious ballet dancer who worked as a roller-skating waitress at the celebrated L.A. Diner. But one night, she was riding in a car whose driver fell asleep at 4:00 A.M. while doing 90 miles per hour in Wyoming; the accident landed Cuddy in the hospital with severe head trauma and “diffuse axonal injury,” she says. “It’s hard to predict the outcome after that type of injury, and there’s not much they can do for you.”

Cuddy had to take years off from school and “relearn how to learn,” she explains. “I knew I was gifted–I knew my IQ, and didn’t think it could change. But it went down by two standard deviations after the injury. I worked hard to recover those abilities and studied circles around everyone. I listened to Mozart–I was willing to try anything!” Two years later her IQ was back. And she could dance again.

Yup, all leading up to promoting the idea that overcoming circumstances and getting what you want is as simple as adopting two minutes of behavioral manipulation.

The last line of the Psychological Science abstract was easily fashioned into the pseudoscientific basis for this ease of changing behavior and outcomes, which now include the success of venture-capital pitches:


“Tiny changes that people can make can lead to some pretty dramatic outcomes,” Cuddy reports. This is true because changing one’s own mindset sets up a positive feedback loop with the neuroendocrine secretions, and also changes the mindset of others. The success of venture-capital pitches to investors apparently turns, in fact, on nonverbal factors like “how comfortable and charismatic you are.”

Soon, The New York Times columnist David Brooks   placed power posing solidly within the positive thinking product line of positive psychology, even if Cuddy had no need to go out on that circuit: “If you act powerfully, you will begin to think powerfully.”

In 2011, both first author Dana Carney and Amy Cuddy received the Rising Star Award from the Association for Psychological Science (APS) for having “already made great advancements in science.” Carney cited her power posing paper as one that she liked. Cuddy didn’t nominate the paper, but reported that her recent work examined “how brief nonverbal expressions of competence/power and warmth/connection actually alter the neuroendocrine levels, expressions, and behaviors of the people making the expressions, even when the expressions are ‘posed.’”

The same year, Cuddy also appeared at PopTech, a “global community of innovators, working together to expand the edge of change,” with tickets selling for $2,000. According to an article in The Chronicle of Higher Education:

When her turn came, Cuddy stood on stage in front of a jumbo screen showing Lynda Carter as Wonder Woman while that TV show’s triumphant theme song announced the professor’s arrival (“All the world is waiting for you! And the power you possess!”). After the music stopped, Cuddy proceeded to explain the science of power poses to a room filled with would-be innovators eager to expand the edge of change.

But that performance was just a warm-up for Cuddy’s TEDGlobal talk, which has now received almost 42 million views.

A Ted Global talk that can serve as a model for all Ted talks: Your body language may shape who you are  

This link takes you not only to Amy Cuddy’s TEDGlobal talk but to a transcript in 49 different languages.

Amy Cuddy’s TEDGlobal talk is brilliantly crafted and masterfully delivered. It has two key threads. The first thread is what Dan P. McAdams has described as an obligatory personal narrative of a redeemed self. McAdams summarizes the basic structure:

As I move forward in life, many bad things come my way—sin, sickness, abuse, addiction, injustice, poverty, stagnation. But bad things often lead to good outcomes—my suffering is redeemed. Redemption comes to me in the form of atonement, recovery, emancipation, enlightenment, upward social mobility, and/or the actualization of my good inner self. As the plot unfolds, I continue to grow and progress. I bear fruit; I give back; I offer a unique contribution.

This is interwoven with a second thread, the claims of strong science behind the power pose derived from the Psychological Science article. Without the science thread, the talk is reduced to a motivational talk in the genre of Oprah Winfrey or Navy SEAL Admiral William McRaven sharing reasons you should make your bed every day.

It is not clear that we should hold the redeemed self of a Ted talk to the criteria of historical truth. Does it really matter whether Amy Cuddy’s IQ temporarily fell two standard deviations after an auto accident (13:22)? Whether Cuddy’s “angel adviser” Susan Fiske saved her from feeling like an imposter with the pep talk that inspired the “fake it until you make it” theme of power posing (17:03)? Whether Cuddy similarly transformed the life of her graduate student (18:47) with:

So I was like, “Yes, you are! You are supposed to be here! And tomorrow you’re going to fake it, you’re going to make yourself powerful, and, you know –

This last segment of the Ted talk is best viewed, rather than read in the transcript. It brings Cuddy to tears and the cheering, clapping audience to their feet. And Cuddy wraps up with her takeaway message:

The last thing I’m going to leave you with is this. Tiny tweaks can lead to big changes. So, this is two minutes. Two minutes, two minutes, two minutes. Before you go into the next stressful evaluative situation, for two minutes, try doing this, in the elevator, in a bathroom stall, at your desk behind closed doors. That’s what you want to do. Configure your brain to cope the best in that situation. Get your testosterone up. Get your cortisol down. Don’t leave that situation feeling like, oh, I didn’t show them who I am. Leave that situation feeling like, I really feel like I got to say who I am and show who I am.

So I want to ask you first, you know, both to try power posing, and also I want to ask you to share the science, because this is simple. I don’t have ego involved in this. (Laughter) Give it away. Share it with people, because the people who can use it the most are the ones with no resources and no technology and no status and no power. Give it to them because they can do it in private. They need their bodies, privacy and two minutes, and it can significantly change the outcomes of their life.

Who cares if the story is literal historical truth? Maybe we should not. But I think psychologists should care about the misrepresentation of the study, and so should anyone concerned with truth in advertising to consumers, anyone who believes that consumers have the right to a fair and accurate portrayal of science when they are being offered products, whether anti-aging cream, acupuncture, or self-help merchandise:

Here’s what we find on testosterone. From their baseline when they come in, high-power people experience about a 20-percent increase, and low-power people experience about a 10-percent decrease. So again, two minutes, and you get these changes. Here’s what you get on cortisol. High-power people experience about a 25-percent decrease, and the low-power people experience about a 15-percent increase. So two minutes lead to these hormonal changes that configure your brain to basically be either assertive, confident and comfortable, or really stress-reactive, and feeling sort of shut down. And we’ve all had the feeling, right? So it seems that our nonverbals do govern how we think and feel about ourselves, so it’s not just others, but it’s also ourselves. Also, our bodies change our minds.

Why should we care? Buying into such simple solutions prepares consumers to accept other outrageous claims. It can be a gateway drug for other quack treatments like Harvard psychologist Ellen Langer’s claims that changing mindset can overcome advanced cancer.

Unwarranted claims break down the barriers between evidence-based recommendations and nonsense. Such claims discourage consumers from accepting the more deliverable promise that evidence-based interventions like psychotherapy can indeed make a difference, but that they take work and effort, and effects can be modest. Who would invest time and money in cognitive behavior therapy when two one-minute self-manipulations can transform lives? Like all unrealistic promises of redemption, such advice may ultimately lead people to blame themselves when they don’t overcome adversity: after all, it is so simple, just a matter of taking charge of your life. Their predicament indicates that they did not take charge or that they are simply losers.

But some consumers can be turned cynical about psychology. Here is a Harvard professor trying to sell them crap advice. Psychology sucks; it is crap.

Conflict of interest: Nothing to declare?

In an interview with The New York Times, Amy Cuddy said: “I don’t care if some people view this research as stupid. I feel like it’s my duty to share it.”

Amy Cuddy may have been giving her power pose advice away for free in her Ted talk, but she had already sold it at the $2,000-a-ticket PopTech talk. The book contract for Presence: Bringing Your Boldest Self to Your Biggest Challenges was reportedly for around a million dollars. And of course, like many academics who leave psychology for schools of management, Cuddy had a booking agency soliciting corporate talks and workshops. With the Ted talk, she could command $40,000 to $100,000 per appearance.

Does this discredit the science of power posing? Not necessarily, but readers should be informed and free to decide for themselves. Certainly, all this money in play might make Cuddy more likely to respond defensively to criticism of her work. If she repudiated this work the way that first author Dana Carney did, would there be a halt to her speaking gigs, a product recall, or refunds issued by Amazon for Presence?

I think it is fair to suggest that there is too much money in play for Cuddy to respond to academic debate. Maybe, because of these stakes, things are now outside that realm altogether.

The replicationados attempt replications: Was it counterproductive?

Faced with overwhelming evidence of the untrustworthiness of the psychological literature, some psychologists have organized replication initiatives and accumulated considerable resources for multisite replications. But replication initiatives are insufficient to remedy the untrustworthiness of many areas of psychology, particularly clinical and health psychology intervention studies, and may inadvertently dampen more direct attacks on bad science. Many of those who promote replication initiatives are silent when investigators refuse to share data for studies with important clinical and public health implications. They are also silent when journals like Psychological Science fail to publish criticism of papers with blatantly faulty science.

Replication initiatives take time, and results are often, but not always, ultimately published outside of the journals where the flawed original work appeared. But an important unintended consequence is that they lend credibility to effect sizes that had no validity whatsoever when they occurred in the original papers. In debates attempting to resolve discrepancies between original studies and large-scale replications, the original underpowered studies are often granted a more entrenched incumbent advantage.

It should be no surprise that in a large-scale attempted replication, Ranehill, Dreber, Johannesson, Leiberg, Sul, and Weber failed to replicate the key, nontrivial findings of the original power pose study.

Consistent with the findings of Carney et  al., our results showed a significant effect of power posing on self-reported feelings of power. However, we found no significant effect of power posing on hormonal levels or in any of the three behavioral tasks.

It is also not surprising that Cuddy invoked her I-said-it-first-and-I-was-peer-reviewed incumbent advantage, reasserting her original claim, along with a review of 33 studies including the attempted replication:

The work of Ranehill et al. joins a body of research that includes 33 independent experiments published with a total of 2,521 research participants. Together, these results may help specify when nonverbal expansiveness will and will not cause embodied psychological changes.

Cuddy asserted that methodological differences between the original study and the attempted Ranehill replication may have moderated the effects of posing. But no study has shown that putting participants into a power pose affects hormones.

Joe Simmons and Uri Simonsohn performed a p-curve meta-analysis of the studies nominated by Cuddy, ultimately published in Psychological Science. Their blog Data Colada succinctly summarized the results:

Consistent with the replication motivating this post, p-curve indicates that either power-posing overall has no effect, or the effect is too small for the existing samples to have meaningfully studied it. Note that there are perfectly benign explanations for this: e.g., labs that run studies that worked wrote them up, labs that run studies that didn’t, didn’t. [5]

While the simplest explanation is that all studied effects are zero, it may be that one or two of them are real (any more and we would see a right-skewed p-curve). However, at this point the evidence for the basic effect seems too fragile to search for moderators or to advocate for people to engage in power posing to better their lives.
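For readers unfamiliar with p-curve, the logic is simple: if an effect is real, the p-values of significant studies pile up near zero (a right-skewed curve); if the null is true, significant p-values are distributed uniformly between 0 and .05 (a flat curve). A hypothetical simulation makes the point; the cell size of 21 echoes the original study, and none of this is the Simmons and Simonsohn analysis itself:

```python
# Toy illustration of p-curve logic: among results with p < .05, check the
# share with p < .025. A share near 0.50 means a flat curve (no true effect);
# a share well above 0.50 means a right-skewed curve (a real effect).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def significant_pvalues(true_d, n_per_cell=21, n_studies=10_000):
    """Simulate many two-group studies; keep the p-values that cross .05."""
    kept = []
    for _ in range(n_studies):
        a = rng.normal(0.0, 1.0, n_per_cell)
        b = rng.normal(true_d, 1.0, n_per_cell)
        p = stats.ttest_ind(a, b).pvalue
        if p < 0.05:
            kept.append(p)
    return np.array(kept)

for d in (0.0, 0.5):
    p = significant_pvalues(d)
    print(f"d = {d}: share of significant p-values below .025 = {(p < 0.025).mean():.2f}")
# d = 0.0 yields ~0.50 (flat); d = 0.5 yields ~0.70 (right-skewed).
# A flat curve is what the power posing literature produced.
```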

Come on, guys, there was never a there there. Don’t invent one by continuing to try to explain it.

It is interesting that none of these three follow-up articles in Psychological Science has an abstract, especially in contrast to the original power pose paper, which effectively delivered its misleading message in the abstract.

Just as this blog post was being polished, a special issue of Comprehensive Results in Social Psychology (CRSP) on Power Poses was released.

  1. No preregistered tests showed positive effects of expansive poses on any behavioral or hormonal measures. This includes direct replications and extensions.
  2. Surprise: A Bayesian meta-analysis across the studies reveals a credible effect of expansive poses on felt power. (Note that this is described as a ‘manipulation check’ by Cuddy in 2015.) Whether this is anything beyond a demand characteristic and whether it has any positive downstream behavioral effects is unknown.

No, not a surprise, just an uninteresting artifact. But stay tuned for the next model of the power pose product, dropping the tainted name and focusing on “felt power.” Like rust, commercialization of bad psychological science never really sleeps; it only takes power naps.

Meantime, professional psychological organizations, with their flagship journals and publicity machines, need to:

  • Lose their fascination with psychologists whose celebrity status depends on Ted talks and the marketing of dubious advice products grounded in pseudoscience.
  • Embrace and adhere to an expanded Pottery Barn rule that covers not only direct replications, but corrections to bad science that has been published.
  • Make the protection of consumers from false and exaggerated claims a priority equivalent to protecting the vulnerable reputations of academic psychologists in efforts to improve the trustworthiness of psychology.
  • Require detailed conflicts of interest statements for talks and articles.

All opinions expressed here are solely those of Coyne of the Realm and not necessarily of PLOS blogs, PLOS One or his other affiliations.

Disclosure:

I receive money for writing these blog posts, less than $200 per post. I am also marketing a series of e-books,  including Coyne of the Realm Takes a Skeptical Look at Mindfulness and Coyne of the Realm Takes a Skeptical Look at Positive Psychology.

Maybe I am just making a fuss to attract attention to these enterprises. Maybe I am just monetizing what I have been doing for years virtually for free. Regardless, be skeptical. But to get more information and get on a mailing list for my other blogging, go to coyneoftherealm.com and sign up.


Unmasking Jane Brody’s “A Positive Outlook May Be Good for Your Health” in The New York Times

A recipe for coercing ill people with positive psychology pseudoscience in the New York Times

  • Judging by the play she gets in social media and the hundreds of comments on her articles in The New York Times, Jane Brody has a successful recipe for using positive psychology pseudoscience to bolster down-home advice you might have gotten from your grandmother.
  • Her recipe might seem harmless enough, but her articles are directed at people struggling with chronic and catastrophic physical illnesses. She offers them advice.
  • The message is that persons with physical illness should engage in self-discipline, practice positive psychology exercises – or else they are threatening their health and shortening their lives.
  • People struggling with physical illness have enough to do already. The admonition that they individually and collectively should do more, that they should become more self-disciplined, is condescending and presumptuous.
  • Jane Brody’s carrot is basically a stick. The implied threat is simply coercive: that people with chronic illness are not doing what they can to improve their physical health unless they engage in these exercises.
  • It takes a careful examination of Jane Brody’s sources to discover that the “scientific basis” for this positive psychology advice is quite weak. In many instances it is patently junk, pseudoscience.
  • The health benefits claimed for positivity are unfounded.
  • People with chronic illness are often desperate or simply vulnerable to suggestions that they can and should do more. They are being misled by this kind of article in what is supposed to be a trusted, quality news outlet, The New York Times, not The Daily News.
  • There is a sneaky, ill-concealed message that persons with chronic illness will obtain wondrous benefits by just adopting a positive attitude – even a hint that cancer patients will live longer.

In my blog post about positive psychology and health, I try to provide  tools so that consumers can probe for themselves the usually false and certainly exaggerated claims that are being showered on them.

However, in the case of Jane Brody’s articles, we will see that the task is difficult because she draws on a selective sampling of the literature in which researchers generate junk self-promotional claims.

That’s a general problem with the positive psychology “science” literature, but the solution for journalists like Jane Brody is to seek independent evaluation of claims from outside the positive psychology community. Journalists, did you hear that message?

The article, along with its 100s of comments from readers, is available here:

A Positive Outlook May Be Good for Your Health by Jane E. Brody

The article starts with some clichéd advice about being positive. Brody seems to be on the side of the autonomy of her readers. She makes seemingly derogatory comments that the advice is “cockeyed optimism.” [Don’t you love that turn of phrase? I’m sure to borrow it in the future.]

“Look on the sunny side of life.”

“Turn your face toward the sun, and the shadows will fall behind you.”

“Every day may not be good, but there is something good in every day.”

“See the glass as half-full, not half-empty.”

Researchers are finding that thoughts like these, the hallmarks of people sometimes called “cockeyed optimists,” can do far more than raise one’s spirits. They may actually improve health and extend life.

See?  The clever putdown of this advice was just a rhetorical device, just a set up for what follows. Very soon Brody is delivering some coercive pseudoscientific advice, backed by the claim that “there is no longer any doubt” and that the links between positive thinking and health benefits are “indisputable.”

There is no longer any doubt that what happens in the brain influences what happens in the body. When facing a health crisis, actively cultivating positive emotions can boost the immune system and counter depression. Studies have shown an indisputable link between having a positive outlook and health benefits like lower blood pressure, less heart disease, better weight control [Emphasis added.].

I found the following passage particularly sneaky and undermining of people with cancer.

Even when faced with an incurable illness, positive feelings and thoughts can greatly improve one’s quality of life. Dr. Wendy Schlessel Harpham, a Dallas-based author of several books for people facing cancer, including “Happiness in a Storm,” was a practicing internist when she learned she had non-Hodgkin’s lymphoma, a cancer of the immune system, 27 years ago. During the next 15 years of treatments for eight relapses of her cancer, she set the stage for happiness and hope, she says, by such measures as surrounding herself with people who lift her spirits, keeping a daily gratitude journal, doing something good for someone else, and watching funny, uplifting movies. Her cancer has been in remission now for 12 years.

“Fostering positive emotions helped make my life the best it could be,” Dr. Harpham said. “They made the tough times easier, even though they didn’t make any difference in my cancer cells.”

Sure, Jane Brody is careful to avoid the explicit claim that a positive attitude is somehow connected to the cancer being in remission for 12 years, but the implication is there. Brody pushes the advice with a hint of the transformation available to cancer patients, if only they follow the advice.

After all, Jane Brody had just asserted that a positive attitude affects the immune system, and this well-chosen example happens to be a cancer of the immune system.

Jane Brody immediately launches into a description of a line of research conducted by a positive psychology group at Northwestern University and the University of California, San Francisco.

Taking her cue from the investigators, Brody blurs the distinction between findings based on correlational studies and the results of intervention studies in which patients actually practiced positive psychology exercises.

People with new diagnoses of H.I.V. infection who practiced these skills carried a lower load of the virus, were more likely to take their medication correctly, and were less likely to need antidepressants to help them cope with their illness.

But Brody’s sins as a journalist are worse than that. With a great deal of difficulty, I have chased her claims back into the literature. I found some made-up facts.

In my literature search, I could find only one study from these investigators that seemed directly related to these claims. The mediocre retrospective correlational study was mainly focused on use of psychostimulants, but it included a crude 6-item summary measure  of positive states of mind.

The authors didn’t present the results in a way that allows direct, independent examination of whether positive affect is indeed related to other outcomes in any simple fashion. They did not report the simple correlations needed to determine whether their measure was anything more than a measure of depressive symptoms turned on its head. They certainly had the data, but did not report it. Instead, they presented some multivariate analyses that do not show impressive links. Any direct links to viral load are not shown and presumably are not there, although the investigators tested statistically for them. Technically speaking, I would write off the findings to measurement and specification error, certainly not worthy of reporting in The New York Times.

Less technically speaking, Brody is leading up to using HIV as an exemplar illness where cultivating positivity can do so much. But if this study is worth anything at all, it is to illustrate that even correlationally, positive affect is not related to much, other than – no surprise – alternative measures of positive affect.

Brody then goes on to describe in detail an intervention study. You’d never know from her description that her source of information is not a report of the results of the intervention study, but a promissory protocol that supposedly describes how the intervention study was going to be done.

I previously blogged about this protocol. At first, I thought it was praiseworthy that a study of a positive psychology intervention for health had even complied with the requirement that studies be preregistered and have a protocol available. Most such studies do not, but they are supposed to do that. In plain English, protocols are supposed to declare ahead of time what researchers are going to do and precisely how they are going to evaluate whether an intervention works. That is because, notoriously, researchers are inclined to say later they were really trying to do something else and to pick another outcome that makes the intervention look best.

But then I got corrected by James Heathers on Facebook. Duh, he had looked at the date the protocol was published.

He pointed out that this protocol was actually published years after collection of data had begun. The researchers already had a lot to peek at. Rather than identifying just a couple of variables on which the investigators were prepared to stake their claim that the intervention was effective, the protocol listed 25 variables that would be examined as outcomes (!) in order to pick one or two.

So I updated what I said in my earlier blog. I pointed out that the published protocol was misleading. It was posted after the researchers were already able to see how their study was unfolding and to change their plans accordingly. The vagueness of the protocol gave the authors lots of wiggle room for selectively reporting and hyping their findings with confirmation bias. They would later take advantage of this when they actually published the results of their study.
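The arithmetic of that wiggle room is worth spelling out. If analysts are free to feature whichever of 25 outcomes “works,” then even when every true effect is zero, a nominally significant result is more likely than not. A back-of-the-envelope calculation, generously assuming independent outcomes:

```python
# Chance of at least one p < .05 among 25 truly null outcomes (assumed
# independent for simplicity).
alpha, n_outcomes = 0.05, 25
print(1 - (1 - alpha) ** n_outcomes)  # ~0.72: a "finding" is the likely result
```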

The researchers studied 159 people who had recently learned they had H.I.V. and randomly assigned them to either a five-session positive emotions training course or five sessions of general support. Fifteen months past their H.I.V. diagnosis, those trained in the eight skills maintained higher levels of positive feelings and fewer negative thoughts related to their infection.

Brody is not being accurate here. When the authors finally got around to publishing the results, they told a very different story, if you probe carefully. Even with the investigators doing a lot of spinning, they showed null results, no effects for the intervention. Appearances to the contrary were created by the investigators’ ignoring what they actually reported in their tables. If you go to my earlier blog post, I point this out in detail, so you can see for yourself.

Brody goes on to describe the regimen that the published study failed to validate as effective.

An important goal of the training is to help people feel happy, calm and satisfied in the midst of a health crisis. Improvements in their health and longevity are a bonus. Each participant is encouraged to learn at least three of the eight skills and practice one or more each day. The eight skills are:

■ Recognize a positive event each day.

■ Savor that event and log it in a journal or tell someone about it.

■ Start a daily gratitude journal.

■ List a personal strength and note how you used it.

■ Set an attainable goal and note your progress.

■ Report a relatively minor stress and list ways to reappraise the event positively.

■ Recognize and practice small acts of kindness daily.

■ Practice mindfulness, focusing on the here and now rather than the past or future.

For chrissakes, this is a warmed-over version of Émile Coué de la Châtaigneraie’s autosuggestion: “Every day, in every way, I’m getting better and better.” Surely contemporary positive psychology’s science of health can do better than that. To Coué’s credit, he gave his advice away for free. He did not charge for his coaching, even if he was giving away something he had no evidence would improve people’s physical health.

Dr. Moskowitz said she was inspired by observations that people with AIDS, Type 2 diabetes and other chronic illnesses lived longer if they demonstrated positive emotions. She explained, “The next step was to see if teaching people skills that foster positive emotions can have an impact on how well they cope with stress and their physical health down the line.”

She listed as the goals improving patients’ quality of life, enhancing adherence to medication, fostering healthy behaviors, and building personal resources that result in increased social support and broader attention to the good things in life.

Let me explain why I am offended here. None of these activities have been shown to improve the health of persons with newly diagnosed HIV. It’s reasonable to assume that newly diagnosed persons have a lot with which to contend. It’s a bad time to give them advice to clutter their life with activities that will not make a difference in their health.

The published study was able to recruit and retain a sample of persons with newly diagnosed HIV because it paid them well to keep coming. I’ve worked with this population before, in a study aimed at helping them solve specific practical problems that they said got in the way of their adherence.

Many persons with newly diagnosed HIV are low income and are unemployed or marginally employed. They will enroll in studies to get the participant fees. When I lived in the San Francisco Bay area, I recall one patient telling a recruiter from UCSF that he was too busy and unable to make a regular visit to the medical center for the intervention, but he would be willing to accept being in the study if he was assigned to the control group. It did not involve attending intervention sessions and would give him a little cash.

Based on my clinical and research experience, I don’t believe that such patients would regularly show up for this kind of useless positive psychology treatment without getting paid. Particularly if they were informed of the actual results of this misrepresented study.

Gregg De Meza, a 56-year-old architect in San Francisco who learned he was infected with H.I.V. four years ago, told me that learning “positivity” skills turned his life around. He said he felt “stupid and careless” about becoming infected and had initially kept his diagnosis a secret.

“When I entered the study, I felt like my entire world was completely unraveling,” he said. “The training reminded me to rely on my social network, and I decided to be honest with my friends. I realized that to show your real strength is to show your weakness. No pun intended, it made me more positive, more compassionate, and I’m now healthier than I’ve ever been.”

I object to this argument by quotes-from-an-unrepresentative-patient. The intervention did not have the intended effect, and it is misleading to showcase somebody who claims it turned his life around.

Jane Brody proceeds with some more fake facts.

In another study among 49 patients with Type 2 diabetes, an online version of the positive emotions skills training course was effective in enhancing positivity and reducing negative emotions and feelings of stress. Prior studies showed that, for people with diabetes, positive feelings were associated with better control of blood sugar, an increase in physical activity and healthy eating, less use of tobacco and a lower risk of dying.

The study was so small and underpowered, aside from being methodologically flawed, that even if such effects were actually present, most of the time they would be missed because the study did not have enough patients to achieve significance.
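A quick sketch of what “underpowered” means here, assuming two arms of roughly 24 patients each and a two-sided t-test (my assumptions for illustration; the study’s actual design and analysis may have differed):

```python
# What can a 49-person, two-arm trial reliably detect?
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()

# smallest effect detectable with 80% power at ~24 patients per arm
print(power_calc.solve_power(nobs1=24, power=0.8, alpha=0.05))        # d ~0.83

# power to detect a modest effect of d = 0.3 with this sample
print(power_calc.solve_power(effect_size=0.3, nobs1=24, alpha=0.05))  # ~0.17
```

Only quite large effects would have had a decent chance of reaching significance in a sample this size.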

In a pilot study of 39 women with advanced breast cancer, Dr. Moskowitz said an online version of the skills training decreased depression among them. The same was true with caregivers of dementia patients.

“None of this is rocket science,” Dr. Moskowitz said. “I’m just putting these skills together and testing them in a scientific fashion.”

It’s not rocket science, it’s misleading hogwash.

In a related study of more than 4,000 people 50 and older published last year in the Journal of Gerontology, Becca Levy and Avni Bavishi at the Yale School of Public Health demonstrated that having a positive view of aging can have a beneficial influence on health outcomes and longevity. Dr. Levy said two possible mechanisms account for the findings. Psychologically, a positive view can enhance belief in one’s abilities, decrease perceived stress and foster healthful behaviors. Physiologically, people with positive views of aging had lower levels of C-reactive protein, a marker of stress-related inflammation associated with heart disease and other illnesses, even after accounting for possible influences like age, health status, sex, race and education than those with a negative outlook. They also lived significantly longer.

This is even deeper into the woo. Give me a break, Jane Brody. Stop misleading people with chronic illness with false claims and fake facts. Adopting these attitudes will not prevent dementia.

Don’t believe me? I previously debunked these patently false claims in detail. You can see my critique here.

Here is what the original investigators claimed about Alzheimer’s:

“We believe it is the stress generated by the negative beliefs about aging that individuals sometimes internalize from society that can result in pathological brain changes,” said Levy. “Although the findings are concerning, it is encouraging to realize that these negative beliefs about aging can be mitigated and positive beliefs about aging can be reinforced, so that the adverse impact is not inevitable.”

I exposed the voodoo statistics on which this claim is based. I concluded:

The authors develop their case that stress is a significant cause of Alzheimer’s disease with reference to some largely irrelevant studies by others, but depend on a preponderance of studies that they themselves have done with the same dubious small samples and dubious statistical techniques. Whether you do a casual search with Google scholar or a more systematic review of the literature, you won’t find stress processes of the kind the authors invoke among the usual explanations of the development of the disease.

Basically, the authors are arguing that if you hold views of aging like “Old people are absent-minded” or “Old people cannot concentrate well,” you will experience more stress as you age, and this will accelerate development of Alzheimer’s disease. They then go on to argue that because these attitudes are modifiable, you can take control of your risk for Alzheimer’s by adopting a more positive view of aging and aging people.

Nonsense, utter nonsense.

Let chronically ill people and those facing cancer adopt whatever attitude is comfortable or natural for them. It’s a bad time to ask for change, particularly when there isn’t any promised benefit in improved health or prolonged life.

Rather than Jane Brody’s recipe for positive psychology improving your health, I strongly prefer Lila Downs’s La Cumbia del Mole.

It is great on chicken. If it does not extend your life, it will give you some moments of happiness, though you will have to adjust the spices to your personal taste.

I will soon be offering e-books providing skeptical looks at positive psychology, as well as mindfulness. As in this blog post, I will take claims I find in the media and trace them back to the scientific studies on which they are based. I will show you what I see so you can see it too.

Sign up at my new website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. You can even advance-order one or all of the e-books.

 Lots to see at CoyneoftheRealm.com. Come see…

Were any interventions to prevent teen suicide effective in the SEYLE trial?

Disclaimer: I’ve worked closely with some of the SEYLE investigators on other projects. I have great respect for their work. Saving and Empowering Young Lives in Europe was a complex, multisite suicide prevention project of historical size and scale that was exceptionally well implemented.

However, I don’t believe that The Lancet article reported primary outcomes in a way that allows their clinical and public health significance to be fully and accurately appreciated. Some seemingly positive results were reported with a confirmation bias. Important negative findings were reported in ways that make them likely to be ignored, losing important lessons for the future.

I don’t think we benefit from minimizing the great difficulty of showing that any intervention works to prevent death by suicide, particularly in a relatively low-risk group like teens. We don’t benefit from exaggerating the strength of evidence for particular approaches.

The issue of strength of evidence is compounded by the first author, Danuta Wasserman, also being among the authors of a systematic review.

Zalsman G, Hawton K, Wasserman D, van Heeringen K, Arensman E, Sarchiapone M, Carli V, Höschl C, Barzilay R, Balazs J, Purebl G. Suicide prevention strategies revisited: 10-year systematic review. The Lancet Psychiatry. 2016 Jul 31;3(7):646-59.

In a post at Mental Elf, psychiatrist and suicidology expert Stanley Kutcher pointed to a passage in the abstract of the systematic review:

The review’s abstract notes that YAM (one of the study arms) “was associated with a significant reduction of incident suicide attempts (odds ratios [OR] 0.45, 95% CI 0.24 to 0.85; p=0.014) and severe suicidal ideation (0.50, 0.27 to 0.92; p=0.025)”. If this analysis seems familiar to the reader that is because this is the information also provided in the Zalsman abstract! This analysis refers to the SELYE study ONLY! However, the way in which the Zalsman abstract is written suggests this analysis refers to all school based suicide awareness programs the reviewers evaluated. Misleading at best. Conclusion supporting, not at all.

[Another reminder that authors of major studies should not also be authors on systematic reviews and meta-analyses that evaluate their work. But tell that to the Cochrane Collaboration, which now has a policy of inviting authors of the studies from which individual patient data are needed. But that is for another blog post.]

The article reporting the trial is currently available open access here.

Wasserman D, Hoven CW, Wasserman C, Wall M, Eisenberg R, Hadlaczky G, Kelleher I, Sarchiapone M, Apter A, Balazs J, Bobes J. School-based suicide prevention programmes: the SEYLE cluster-randomised, controlled trial. The Lancet. 2015 Apr 24;385(9977):1536-44.

The trial protocol is available here.

Wasserman D, Carli V, Wasserman C, et al. Saving and empowering young lives in Europe (SEYLE): a randomized controlled trial. BMC Public Health 2010; 10: 192.


From the abstract of the Lancet paper:

Methods. The Saving and Empowering Young Lives in Europe (SEYLE) study is a multicentre, cluster-randomised controlled trial. The SEYLE sample consisted of 11 110 adolescent pupils, median age 15 years (IQR 14–15), recruited from 168 schools in ten European Union countries. We randomly assigned the schools to one of three interventions or a control group. The interventions were: (1) Question, Persuade, and Refer (QPR), a gatekeeper training module targeting teachers and other school personnel, (2) the Youth Aware of Mental Health Programme (YAM) targeting pupils, and (3) screening by professionals (ProfScreen) with referral of at-risk pupils. Each school was randomly assigned by random number generator to participate in one intervention (or control) group only and was unaware of the interventions undertaken in the other three trial groups. The primary outcome measure was the number of suicide attempt(s) made by 3 month and 12 month follow-up…

No significant differences between intervention groups and the control group were recorded at the 3 month follow-up. At the 12 month follow-up, YAM was associated with a significant reduction of incident suicide attempts (odds ratios [OR] 0·45, 95% CI 0·24–0·85; p=0·014) and severe suicidal ideation (0·50, 0·27–0·92; p=0·025), compared with the control group. 14 pupils (0·70%) reported incident suicide attempts at the 12 month follow-up in the YAM versus 34 (1·51%) in the control group, and 15 pupils (0·75%) reported incident severe suicidal ideation in the YAM group versus 31 (1·37%) in the control group. No participants completed suicide during the study period.

What can be noticed right away: (1) this is a four-armed study in which three interventions are compared to a control group; (2) apparently there were no effects observed at three months; (3) 12-month results are reported for only one of the three interventions versus the control group; (4) the differences between that intervention group and the control group were numerically small; and (5) despite enrollment of over 11,000 students, no suicides were observed in any of the groups.

[A curious thing about the abstract, to be discussed later in the post: what is identified as the statistical effect of YAM on self-reported suicide attempts is expressed as an odds ratio and a significance level. Yet effects on suicidal ideation are expressed in absolute numbers, with a small number of students identified as having severe ideation and a small absolute difference between YAM and the control group. Presumably, there were fewer suicide attempts than students with severe ideation. Like me, are you wondering how many self-reported attempts we are talking about?]
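For what it’s worth, the abstract’s own numbers let us reconstruct the crude arithmetic behind the headline odds ratio. A back-of-the-envelope check, with group sizes back-calculated from the reported percentages (so approximate; the published OR of 0.45 comes from an adjusted multilevel model, not from this raw 2 × 2 table):

```python
# Crude odds ratio from the abstract's counts: 14 attempts (0.70%) in YAM
# versus 34 attempts (1.51%) in the control group.
events_yam, pct_yam = 14, 0.0070
events_ctl, pct_ctl = 34, 0.0151

n_yam = round(events_yam / pct_yam)  # ~2000 pupils
n_ctl = round(events_ctl / pct_ctl)  # ~2252 pupils

odds_yam = events_yam / (n_yam - events_yam)
odds_ctl = events_ctl / (n_ctl - events_ctl)
print(odds_yam / odds_ctl)  # ~0.46, close to the reported 0.45
```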

This study did not target actual suicides. That decision is appropriate, because even with 11,000 students there were no suicides. The significance of the lack of suicides: even with this many students followed for a year, one might not observe a single suicide, and so one cannot expect to observe an actual decrease in suicides, certainly not a statistically significant decrease.
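To put a number on that: teen suicide rates are on the order of 5 to 10 per 100,000 person-years (a rough, assumed figure that varies by country), so the expected count in a trial of this size is about one death across all four arms combined:

```python
# Expected suicides in a one-year study of ~11,110 pupils, under assumed
# population rates of 5-10 per 100,000 person-years.
n_pupils, years = 11_110, 1.0
for rate_per_100k in (5, 10):
    expected = n_pupils * years * rate_per_100k / 100_000
    print(f"rate {rate_per_100k}/100k -> {expected:.1f} expected suicides")
# ~0.6 to ~1.1 expected deaths: observing zero suicides is unremarkable.
```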

We should keep this in mind the next time we encounter claims about an epidemic of teen suicides, or the expectation that an intervention in a particular community will lead to an observable reduction in teen suicides.

We should also keep this in mind when we see in the future that a community implemented suicide prevention programs after some spike in suicides. It’s very likely that a reduction in suicides will be observed, but that’s simply regression to the mean: the community returning to its more typical rate of suicide.

Rather than actual suicides, the study specified suicidal ideation and self-reported suicidal acts as outcomes. We have to be cautious about inferring changes in suicide from changes in these surrogate outcomes. Changes in surrogate outcomes don’t necessarily translate into changes in the outcomes that we are most interested in but, for whatever reason, are not measuring. In this study, the investigators were convinced that even with such a large sample, a reduction in suicides would not be observed. That is hardly a reason to argue that whatever reduction in surrogate outcomes is observed would translate into a reduction in deaths.

Let’s temporarily put aside the issue of suicidal acts being self-reported and subject both to unreliability and to likely overestimation of life-threatening acts. I would estimate from other studies that one would have to prevent a hundred documented attempts at suicide in order to prevent one actual suicide.

But these are self-report measures.

Pupils were identified as having severe suicidal ideation if they answered “sometimes, often, very often or always” to the question: “during the past 2 weeks, have you reached the point where you seriously considered taking your life, or perhaps made plans how you would go about doing it?”

So endorsement of any of these categories was lumped together as “severe ideation.” We might not agree with that designation, but without this lumping, a sample of 11,000 students does not yield differences in occurrences of “severe suicidal ideation.”

Readers are not given a breakdown of the endorsements of suicidality across categories, but I think we can reasonably make some extrapolations about the skewness of the distribution from a study I blogged about, of the screening of 10,000 postpartum women with a single-item question:

In the sample of 10 000 women who underwent screening, 319 (3.2%) had thoughts of self-harm, including 8 who endorsed “yes, quite often”; 65, “sometimes”; and 246, “hardly ever.”

We can be confident that most instances of “severe suicidal ideation” in the SEYLE study did not indicate a strong likelihood of a teen making a suicide attempt. Such self-report measures are more related to other depressive symptoms than to attempted suicide.

This is all yet another reminder of the difficulty of targeting suicide as a public health outcome. It’s very difficult to show an effect.

The abstract of the article prominently features a claim that one of the three interventions differed from the control group in severe suicidal ideation and suicide attempts at 12 months, but not at three months.

We should be left pondering what happened at 12 months with respect to two of the three interventions. The interventions were carefully selected and we have the opportunity to examine what effect they had. After all, we may not get another opportunity to evaluate such interventions in such a large sample in the near future. We might simply assume these interventions had no effect at 12 months, but the abstract is written to distract from that potentially important finding that has significance for future trials.

But there is another problem in the reporting of outcomes. The results section states:

Analyses of the interaction between intervention groups and time (3 months and 12 months) showed no significant effect on incident suicide attempts in the three intervention groups, compared with the control group at the 3 month follow-up.

And

After analyses of the interaction between intervention groups and time (3 months and 12 months), we noted the following results for severe suicidal ideation: at the 3 month follow-up, there were no significant effects of QPR, YAM, or ProfScreen compared with the control group.

It’s not appropriate to focus on the difference between one of the interventions and the control group without taking into account the context of its being a four-armed trial, a 4 (condition) × 2 (3- or 12-month follow-up) design.

In the absence of a clearly specified a priori hypothesis, we should first look to the condition x time interaction effect. If we can reject the null hypothesis of no interaction effect having occurred, we should then examine where the effect occurred, more confident that there is something to be explained. However, if we do what was done in the abstract, we need to appreciate the high likelihood of spurious effects when we single out one difference between one of the intervention groups and the control group at one of the two times.
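As a sketch of that logic, here is how the omnibus interaction test would be run before any single contrast. The data are simulated, and a plain logistic model stands in for the trial’s repeated-measures GLMMs:

```python
# Sketch of interaction-first testing in a 4 (condition) x 2 (time) design,
# using simulated data in place of the SEYLE trial's.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
rows = []
for cond in ["control", "QPR", "YAM", "ProfScreen"]:
    for time in ["3m", "12m"]:
        for y in rng.binomial(1, 0.05, size=500):   # simulated attempt indicator
            rows.append({"cond": cond, "time": time, "y": y})
df = pd.DataFrame(rows)

full = smf.logit("y ~ C(cond) * C(time)", data=df).fit(disp=False)
reduced = smf.logit("y ~ C(cond) + C(time)", data=df).fit(disp=False)
lr = 2 * (full.llf - reduced.llf)
p = stats.chi2.sf(lr, df=3)   # 3 df for the condition x time interaction
print(f"Omnibus interaction: LR = {lr:.2f}, p = {p:.3f}")
# Only if this omnibus test rejects should we ask which single cell
# (e.g., YAM at 12 months) drives the effect.
```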

Let’s delve into a table of results for suicide attempts:

[Table: self-reported suicide attempts by group]

These results demonstrate that we should not make too much of YAM being statistically significant compared to the two other active intervention groups.

We’re talking about a difference of only a few suicide attempts between students assigned to YAM and students in the other two active intervention groups.

On the basis of these differences, are we willing to say that YAM represents best practices, an empirically based approach to preventing suicides in schools, whereas the other two interventions are ineffective?

Note that even the difference between YAM and the control group has a broad confidence interval around a difference significant at the level of p < .014.

It gets worse. Note that these are not differences in actual attempts but results obtained with an imputation:

A multiple imputation procedure35 (50 imputations with full conditional specification for dichotomous variables)36 was used to manage missing values of individual characteristics (<1% missing for each individual characteristic), so that all pupils with an outcome at 3 months or 12 months were included in the GLMMs. Additional models, including sex-by-intervention group interactions, and age-by-intervention group interactions were tested for differential intervention effects by sex and age. To assess the robustness of the findings, tests for intervention group differences were redone including only the subset of pupils with complete outcome data at both 3 months and 12 months.

Overall, we are dealing with small numbers of events that were likely assessed with considerable error of measurement, overlaid with multiple imputation procedures that carry the possibility of specification error based on false assumptions that cannot be tested with such a small number of events. Then, we have the broad overlapping confidence intervals for the three interventions. Finally, there is the problem of not taking into account the multiple pairwise comparisons that were possible in this design (three intervention-vs-control contrasts at each of two follow-ups) in which the critical overall treatment x time interaction was not significant.
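The multiplicity problem is easy to quantify. With three intervention-vs-control contrasts at each of two follow-ups, and no significant omnibus interaction to license them:

```python
# Familywise error across six intervention-vs-control contrasts
# (3 interventions x 2 follow-up points), assuming independent tests.
alpha, k = 0.05, 6
print(f"P(at least one spurious 'significant' contrast): {1 - (1 - alpha) ** k:.2f}")  # ~0.26
print(f"Bonferroni-adjusted threshold: {alpha / k:.4f}")                               # 0.0083
```

Note that the p < .014 reported for YAM versus the control group would not survive even this crude correction.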

Misclassification of just a couple of events, or a recovery of data that were thought to be lost and therefore had to be estimated with imputation, could alter significance levels – as if they really matter in such a large trial, anyway.

Let’s return to the issue of the systematic review in which the senior author of the SEYLE trial participated. The text in its abstract, borrowed without attribution from the abstract of the SEYLE study, reflects overenthusiasm, or at least premature enthusiasm, for the senior author’s own results.

Let’s look at the interventions that were actually evaluated. The three active interventions:

The Screening by Professionals programme (ProfScreen)…is a selective or indicated intervention based on responses to the SEYLE baseline questionnaire. When pupils had completed the baseline assessment, health professionals reviewed their answers and pupils who screened at or above pre-established cutoff points were invited to participate in a professional mental health clinical assessment and subsequently referred to clinical services, if needed.3

Question, Persuade, and Refer (QPR) is a manualized gatekeeper programme, developed in the USA.28 In SEYLE, QPR was used to train teachers and other school personnel to recognise the risk of suicidal behaviour in pupils and to enhance their communication skills to motivate and help pupils at risk of suicide to seek professional care. QPR training materials included standard power point presentations and a 34-page booklet distributed to all trainees.

Teachers were also given cards with local health-care contact information for distribution to pupils identified by them as being at risk. Although QPR targeted all school staff, it was, in effect, a selective approach, because only pupils recognised as being at suicidal risk were approached by the gatekeepers (trained school personnel).

YAM

The Youth Aware of Mental Health Programme (YAM) was developed for the SEYLE study29 and is a manualised, universal intervention targeting all pupils, which includes 3 h of role-play sessions with interactive workshops combined with a 32-page booklet that pupils could take home, six educational posters displayed in each participating classroom and two 1 h interactive lectures about mental health at the beginning and end of the intervention. YAM aimed to raise mental health awareness about risk and protective factors associated with suicide, including knowledge about depression and anxiety, and to enhance the skills needed to deal with adverse life events, stress, and suicidal behaviours.

This programme was implemented at each site by instructors trained in the methodology through a detailed 31 page instruction manual.

I could, of course, be criticized for offering my predictions about the effects of these interventions after the results are known. Nonetheless, I think my skepticism is well known, and the criticisms I have of these interventions might have been anticipated.

ProfScreen is basically a screening and referral effort. Its vulnerability is the lack of evidence that screening instruments have adequate positive predictive value. None of the available screening measures proved useful in a recent large-scale study. Armed with screening instruments that don’t work particularly well, health professionals are going to refer a lot of students for further evaluation and treatment, with a lot of false positives. I would anticipate that it is already difficult to get a timely appointment for adolescent mental health treatment. These referrals could only further clog the system. Given the performance of the instruments, it is not clear that students who screen positive should be given priority over other adolescents with known serious mental health problems.
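The arithmetic of positive predictive value makes the problem concrete. The operating characteristics below are assumptions chosen to be generous, not numbers from any particular screener:

```python
# Why low positive predictive value clogs referral systems: a hypothetical
# screener with respectable-looking sensitivity and specificity, applied
# where the base rate of imminent suicidal behaviour is low.
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumed, illustrative operating characteristics -- not from the SEYLE paper.
print(f"PPV: {ppv(0.80, 0.90, 0.01):.2f}")  # ~0.07: about 12 false positives per true positive
```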

I am sure a lot of activists and advocates for reducing teen suicide were rooting for screening and referral efforts. A clearer statement of the lack of any evidence in this large-scale study for the effectiveness of such an approach would be invaluable and might prevent misdirection of resources.

The effectiveness of QPR would depend on raising the awareness of a school gatekeeper so that the gatekeeper would be in a position, at a rare but decisive moment, to intervene with a student otherwise inclined to life-threatening self-harm and prevent the progression to self-harm from occurring.

Observing such a sequence and being able to intervene is going to be an infrequent occurrence. Of course, there’s the further doubtful assumption that suicidality is going to be so obvious that it can be recognized.

The YAM intervention is the only one that actually involves live interaction with students, but it is only 3 hours of role playing, added to lectures and posters. Nice, but I would not think that would have prevented suicide attempts, although maybe it would affect self-reports.

I recall way back when I was asked by NIMH program officers to apply for funding for a study of a suicide prevention intervention targeting primary care physicians serving older adults. That focus was specifically being required by the then Senate Majority Leader Harry Reid (Democrat, Nevada), whose father had died from suicide after an encounter with a primary care physician in which the father’s being at risk was not uncovered. Senator Reid was demanding that NIMH conduct a clinical trial showing that such suicides could be averted. I told the program officers that I was sorry for the loss of Senator Reid’s father, but that given the rate of suicide even in a relatively high-risk group of elderly men, a primary care physician would have a relevant encounter with an elderly, potentially suicidal patient only about once every 18 months. It was difficult to conceive of an intervention that could demonstrate effectiveness in reducing suicide under those circumstances. I didn’t believe that suicidal ideation was a suitable surrogate, but the trial that got funded focused on reducing suicidal ideation as its primary outcome. The entire large, multisite trial had only one suicide during the trial and follow-up period, and that happened to be someone in the intervention group. Not much can be inferred from that.

What can we learn from SEYLE, given that it cannot define best practices for preventing teen suicide?

Do we undertake a bigger trial and hope the stars align so that one intervention is shown to be better than others? If we don’t get that result, do we resort to hocus pocus multiple imputation methods and insist the result is really there, we just can’t see it?

Of course, some will say we have to do something, we just can’t let more teens die by suicide. So, do we proceed without the benefit  of strong evidence?

I will soon be offering e-books providing skeptical looks at mindfulness and positive psychology, as well as scientific writing courses on the web as I have been doing face-to-face for almost a decade.

Sign up at my new website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. Lots to see at CoyneoftheRealm.com.

 

1 billion views! Why we should be concerned about PR campaign for 2 RCTs of psilocybin for cancer patients

According to the website of an advocacy foundation, coverage of two recent clinical trials published in the Journal of Psychopharmacology evaluating psilocybin for distress among cancer patients garnered over 1 billion views on social media. To put that in context, the advocacy group claimed that this is one sixth of the attention that the Super Bowl received.

In this blog post I’ll review the second of the two clinical trials. Then, I will discuss some reasons why we should be concerned about the success of this public relations campaign in terms of what it means for both the integrity of scientific publishing, as well as health and science journalism.

The issue is not doubt that cancer patients will find benefit from ingesting psychedelic mushrooms in a safe environment, nor that the sale and ingestion of psilocybin are currently criminalized (Schedule 1, the same classification as heroin).

We can appreciate the futility of the war on drugs, and the absurdity of the criminalization of psilocybin, but still object to how we were strategically and effectively manipulated by this PR campaign.

Even if we approve of a cause, we need to be careful about subordinating the peer-review process and independent press coverage to the intended message of advocates.

Tolerating causes being promoted in this fashion undermines the trustworthiness of peer review and of independent press coverage of scientific papers.

To contradict a line from the 1964 acceptance speech of Republican presidential candidate Barry Goldwater: “Extremism in pursuit of virtue is no [a] vice.”

In this PR campaign, we witnessed the breakdown of the expected buffer of checks and balances between:

  • An advocacy group versus reporting of clinical trials in a scientific journal evaluating its claims.
  • Investigators’ exaggerated self-promotional claims versus editorial review and peer commentary.
  • Materials from the publicity campaign versus supposedly independent evaluation by journalists.

What if the next time the object of promotion is pharmaceuticals or medical devices, promoted by authors with conflicts of interest? But wait! Isn’t that what we’ve seen in JAMA Network journals on a smaller scale? Such as dubious claims about the wondrous effects of deep brain stimulation in JAMA Psychiatry by promoters who “disappeared” failed trials? And claims in JAMA itself that suicides were eliminated at a behavioral health organization outside Detroit?

Is this part of a larger trend, where advocacy and marketing shape supposedly peer-reviewed publications in prestigious medical journals?

The public relations campaign for the psilocybin RCTs also left in tatters the credibility of altmetrics as an alternative to journal impact factors. The orchestrating of 1 billion views is a dramatic demonstration of how readily altmetrics can be gamed. Articles published in a journal with a modest impact factor scored spectacularly, as seen in the altmetrics graphics the Journal of Psychopharmacology posted.

I reviewed in detail one of the clinical trials in my last blog post and will review the second in this one. They are both mediocre, poorly designed clinical trials that were lavishly praised as being of the highest quality by an impressive panel of commentators. I’ll suggest that the second trial in particular is best seen as what Barney Carroll has labeled an experimercial: a clinical trial aimed at generating enthusiasm for a product, rather than a dispassionate evaluation undertaken with some possibility of not being able to reject the null hypothesis. If this sounds harsh, please indulge me and read on; I think you will be entertained and persuaded that this was not a clinical trial but an elaborate ritual, complete with psychobabble woo that has no place in a discussion of the safety and effectiveness of medicine.

After skeptically scrutinizing the second trial, I’ll consider the commentaries and media coverage of the two trials.

I’ll end with a complaint that this PR effort is only aimed at securing the right of wealthy people with cancer to obtain psilocybin under supervision of a psychiatrist and in the context of woo psychotherapy. The risk of other people in other circumstances ingesting psilocybin is deliberately exaggerated. If psilocybin is as safe and beneficial as claimed by these articles, why should use remain criminalized for persons who don’t have cancer or don’t want to get a phony diagnosis from a psychiatrist or don’t want to submit to woo psychotherapy?

The normally paywalled Journal of Psychopharmacology granted free access to the two articles, along with most but not all of the commentaries. However, extensive uncritical coverage in Medscape Medical News provides a fairly accurate summary, complete with direct quotes of the lavish self-praise distributed by the advocacy-affiliated investigators and echoed in seemingly tightly coordinated commentaries.

Here is the praise one of the two senior authors heaped upon the two studies, as captured in Medscape Medical News and echoed elsewhere:

The new findings have “the potential to transform the care of cancer patients with psychological and existential distress, but beyond that, it potentially provides a completely new model in psychiatry of a medication that works rapidly as both an antidepressant and anxiolytic and has sustained benefit for months,” Stephen Ross, MD, director of Substance Abuse Services, Department of Psychiatry, New York University (NYU), Langone Medical Center, told Medscape Medical News.

And:

“That is potentially earth shattering and a big paradigm shift within psychiatry,” Dr Ross told Medscape Medical News.

The Hopkins Study

Griffiths RR, Johnson MW, Carducci MA, Umbricht A, Richards WA, Richards BD, Cosimano MP, Klinedinst MA. Psilocybin produces substantial and sustained decreases in depression and anxiety in patients with life-threatening cancer: A randomized double-blind trial. Journal of Psychopharmacology. 2016 Dec 1;30(12):1181-97.

The trial’s registration at ClinicalTrials.gov is available here.

The trial’s website is rather drab and typical for clinical trials. It contrasts sharply with the slick PR of the website for the NYU trial. The latter includes a gushy, emotional video from a clinical psychologist participating as a patient in the study. She delivers a passionate pitch for the “wonderful ritual” of the transformative experimental session. You can also get a sense of how the session monitor structured the session and cultivated positive expectations. You also get a sense of the psilocybin experience being slickly marketed to appeal to the same well-heeled patients who pay out-of-pocket for complementary and alternative medicine at integrative medicine centers.

Conflict of interest

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Roland Griffiths is on the Board of Directors of the Heffter Research Institute.

Heffter Research Institute is listed as one of the funders of the study.

The introduction.

The Hopkins study starts with some familiar claims from psycho-oncology that portray cancer as a mental health issue. The exaggerated estimate of 40% of cancer patients experiencing a mood disorder is arrived at by lumping adjustment reactions with a smaller proportion of diagnoses of generalized anxiety and major depression.

The introduction contradicts a large body of literature that suggests that the prevalence of mental disorder in cancer patients is no greater than in other chronic health conditions and may approximate what is found in primary care waiting rooms. There is also a fundamental confusion of the psychological distress associated with a diagnosis of cancer with psychiatric disorder in need of treatment. Much of the initial psychological distress in cancer patients resolves in a short time, making it difficult to demonstrate benefits of treatment beyond this natural trajectory of decline. Prescription of an antidepressant would be ineffective and inappropriate.

The introduction ends with a strong claim to the rigor and experimental control exercised in the clinical trial:

The present study provides the most rigorous evaluation to date of the efficacy of a classic hallucinogen for treatment of depressed mood and anxiety in psychologically distressed cancer patients. The study evaluated a range of clinically relevant measures using a double-blind cross-over design to compare a very low psilocybin dose (intended as a placebo) to a moderately high psilocybin dose in 51 patients under conditions that minimized expectancy effects.

The methods and results

In a nutshell: Despite claims to the contrary, this study cannot be considered a blinded study. At the six-month follow-up, which is the outcome assessment point of greatest interest, it could no longer meaningfully be considered a randomized trial. All benefits of randomization were lost. In addition, the effects of psilocybin were confounded with a woo psychotherapy in which positive expectations and support were provided and reinforced in a way that likely influenced assessments of outcome. Outcomes at six months also reflected changes in distress which would have occurred in the absence of treatment. The sample is inappropriate for generalizations about the treatment of major depression and generalized anxiety. The characterization of patients as facing impending death is inaccurate.

The study involved a crossover design, which provides a lower level of evidence than a placebo-controlled comparison study. The study compared a high psilocybin dose (22 or 30 mg/70 kg) with a low dose (1 or 3 mg/70 kg) administered in identically appearing capsules. While the low dose might not be homeopathic, it can readily be distinguished from the larger dosage soon after administration. The second drug administration occurred approximately 5 weeks later. Not surprisingly, given the large difference in dosage, session monitors who were supposedly blinded readily identified the group to which the participant they were observing had been assigned.

Within a crossover design, the six-month follow-up data basically attributed any naturalistic decline in distress to the drug treatments. As David Colquhoun would argue, any estimate of the effects of the drug was inflated by including regression to the mean and get-better-anyway effects. Furthermore, the focus on outcomes at six months meant patients assigned to either group in the crossover design had received high-dosage psilocybin by at least five weeks into the study. Any benefits of randomization were lost.
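A toy simulation, with all numbers assumed, shows what the six-month within-group change conflates:

```python
# Toy simulation (assumed numbers) of how a crossover with no untreated arm
# at six months folds natural decline in distress into the "drug effect".
import numpy as np

rng = np.random.default_rng(1)
n = 51
baseline = rng.normal(20, 5, size=n)    # distress scores at study entry
natural_decline = 5.0                   # assumed get-better-anyway effect
followup = baseline - natural_decline + rng.normal(0, 3, size=n)

print(f"Mean within-group improvement: {np.mean(baseline - followup):.1f} points")
# Every participant had received the high dose by week 5, so there is no
# concurrent untreated group against which to subtract this natural decline.
```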

Like the NYU study, the Johns Hopkins study involved selecting a small, unrepresentative sample from a larger group responding to a mixed recruitment strategy utilizing flyers, the internet, and physician referral.

  • Less than 10% of the cancer patients calling in were randomized.
  • Almost half of the final sample were currently using marijuana and, similarly, almost half had used hallucinogens in the past.
  • The sample is relatively young for cancer patients and well educated. More than half had postgraduate education, almost all were white, but there were two black people.
  • The sample is quite heterogeneous with respect to psychiatric diagnoses, with almost half having an adjustment disorder, and the rest anxiety and mood disorders.
  • In terms of cancer diagnoses and staging, it was also a select and heterogeneous group, with only about a quarter having recurrent/metastatic disease with less than two years of expected survival. This suggests that the adjective “life-threatening” in the title is misleading.

Any mental health effects of psilocybin as a drug are inseparable from the effects of the accompanying psychotherapy designed by a clinical psychologist “with extensive experience in studies of classic hallucinogens.” Participants met with that “session monitor” several times before the session in which the psilocybin was ingested, and the monitor guided and aided in the interpretation of the drug experience. Aside from providing therapy, the session monitor instructed the patient to have positive expectations before the ingestion of the drug and to work to maintain these expectations throughout the experience.

I found this psychotherapeutic aspect of the trial strikingly similar to one that was included in a trial of homeopathy in Germany that I accepted for publication in PLOS One. [See here for my rationale for accepting the trial and the ensuing controversy.] Trials of alternative therapies notoriously have such an imbalance of nonspecific placebo factors favoring the intervention group.

The clinical trial registration indicates that the primary outcome was the Pahnke-Richards Mystical Experience Questionnaire. This measure is included among 20 participant questionnaires listed in Table 3 of the article as completed seven hours after administration of psilocybin. Although I haven’t reviewed all of these measures, I’m skeptical about their psychometric development, intercorrelation, and validation beyond face validity. What possibly could be learned from administering such a battery?

The authors make unsubstantiated assumptions in suggesting that these measures either individually or collectively capture mediation of later response assessed by mental health measures. A commentary echoed this:

Mediation analysis indicates that the mystical experience was a significant mediator of the effects of psilocybin dose on therapeutic outcomes.

But one of the authors of the commentary later walked that back with a statement to Medscape Medical News:

As for the mystical experiences that some patients reported, it is not clear whether these are “a cause, consequence or corollary of the anxiolytic effect or unconstrained cognition.”

Clinical outcomes at six months are discussed in terms of multiple measures derived from the unblinded, clinician-rated Hamilton scales. However, there are repeated references to box scores of the number of significant findings from at least 17 clinical measures (for instance, significant effects for 11 of the 17 measures), in addition to other subjective patient and significant-other measures. It is unclear why the authors would choose to administer so many measures that are highly likely to be intercorrelated.

There were no adverse events attributed to administration of psilocybin, and while there were a number of adverse psychological effects during the session with the psilocybin, none were deemed serious.

My summary evaluation

The clinical trial registration indicates broad inclusion criteria, which may suggest the authors anticipated difficulty in recruiting patients who had a significant psychiatric disorder for which psychotropic medication would be appropriate, as well as difficulty obtaining cancer patients who actually had poorer prognoses. Regardless, descriptions of the study as focusing on anxiety and depression and on “life-threatening” cancer seem to be marketing. You typically do not see a mixed sample with a large proportion of adjustment reactions characterized in the title of a psychiatric journal article as treatment of “anxiety” and “depression.” You typically do not see the adjective “life-threatening” in the title of an oncology article with such a mixed sample of cancer patients.

The authors could readily have anticipated that at the six-month assessment point of interest they would no longer have a comparison that could be described as a rigorous double-blind, randomized trial. They should have thought through exactly what was being controlled by a comparison group receiving a minimal dose of psilocybin. They should have been clearer that they were not simply evaluating psilocybin, but psilocybin administered in the context of a psychotherapy and an induction of strong positive expectations and promises of psychological support.

The finding of a lack of adverse events is consistent with a large literature, but is contradicted in the way the study is described to the media.

The accompanying editorial and commentary

Medscape Medical News reports that the numerous commentaries accompanying these two clinical trials were hastily assembled. Many of the commentaries read that way, with the authors uncritically passing on the psilocybin authors’ lavish self-praise of their work, after a lot of redundant recounting of the chemical nature of psilocybin and its history in psychiatry. When I repeatedly encountered claims that these trials represented rigorous, double-blind clinical trials, or suggestions that the cancer was in a terminal phase, I assumed that the authors had not read the studies, only the publicity material, or had simply suspended all commitment to truth.

I have great admiration for David Nutt and respect his intellectual courage in campaigning for the decriminalization of recreational drugs, even when he knew that it would lead to his dismissal as chairman of the UK’s Advisory Council on the Misuse of Drugs (ACMD). He has repeatedly countered irrationality and prejudice with solid evidence. His graph depicting the harms of various substances to users and others deserves the wide distribution that it has received.

He ends his editorial with praise for the two trials as “the most rigorous double-blind placebo-controlled trials of a psychedelic drug in the past 50 years.” I’ll give him a break and assume that this reflects his dismal assessment of the quality of the other trials. I applaud his declaration, found nowhere else in the commentaries, that:

There was no evidence of psilocybin being harmful enough to be controlled when it was banned, and since then, it has continued to be used safely by millions of young people worldwide with a very low incidence of problems. In a number of countries, it has remained legal, for example in Mexico where all plant products are legal, and in Holland where the underground bodies of the mushrooms (so-called truffles) were exempted from control.

His description of the other commentaries accompanying the two trials is apt:

The honours list of the commentators reads like a ‘who’s who’ of American and European psychiatry, and should reassure any waverers that this use of psilocybin is well within the accepted scope of modern psychiatry. They include two past presidents of the American Psychiatric Association (Lieberman and Summergrad) and the past-president of the European College of Neuropsychopharmacology (Goodwin), a previous deputy director of the Office of USA National Drug Control Policy (Kleber) and a previous head of the UK Medicines and Healthcare Regulatory Authority (Breckenridge). In addition, we have input from experienced psychiatric clinical trialists, leading pharmacologists and cancer-care specialists. They all essentially say the same thing..

The other commentaries. I do not find many of the commentaries worthy of further comment. However, one by Guy M Goodwin, Psilocybin: Psychotherapy or drug?, is unusual in offering even mild skepticism about the way the investigators are marketing their claims:

The authors consider this mediating effect as ‘mystical’, and show that treatment effects correlate with a subjective scale to measure such experience. The Oxford English Dictionary defines mysticism as ‘belief that union with or absorption into the Deity or the absolute, or the spiritual apprehension of knowledge inaccessible to the intellect, may be attained through contemplation and self-surrender’. Perhaps a scale really can measure a relevant kind of experience, but it raises the caution that the investigation of hallucinogens as treatments may be endangered by grandiose descriptions of their effects and unquestioning acceptance of their value.

The commentary by former president of the American Psychiatric Association Paul Summergrad, Psilocybin in end of life care: Implications for further research, shamelessly echoes the psychobabble and self-promotion of the authors of the trials:

The experiences of salience, meaningfulness, and healing that accompanied these powerful spiritual experiences and that were found to be mediators of clinical response in both of these carefully performed studies are also important to understand in their own right and are worthy of further study and contemplation. None of us are immune from the transitory nature of human life, which can bring fear and apprehension or conversely a real sense of meaning and preciousness if we carefully number our days. Understanding where these experiences fit in healing, well-being, and our understanding of consciousness may challenge many aspects of how we think about mental health or other matters, but these well-designed studies build upon a recent body of work that confronts us squarely with that task.

Coverage of the two studies in the media

The website for the Heffter Research Institute provides a handy set of links to some of the press coverage the studies have received. There’s a remarkable sameness to the portrayal of the studies in the media, suggesting that journalists stayed close to the press releases, except occasionally supplementing them with direct quotes from the authors. The appearance of independent evaluation of the trials depended almost entirely on the commentaries published with the two articles.

There’s a lot of slick marketing by the two studies’ authors. In addition to what I noted earlier in the blog, there are recurring unscientific statements marketing the psilocybin experience:

“They are defined by a sense of oneness – people feel that their separation between the personal ego and the outside world is sort of dissolved and they feel that they are part of some continuous energy or consciousness in the universe. Patients can feel sort of transported to a different dimension of reality, sort of like a waking dream.

There are also recurring distinct efforts to keep the psilocybin experience under the control of psychiatrists and woo clinical psychologists:

The new studies, however, suggest psilocybin be used only in a medical setting, said Dr. George Greer, co-founder, medical director and secretary at the Heffter Research Institute in Santa Fe, New Mexico, which funded both studies.

“Our focus is scientific, and we’re focused on medical use by medical doctors,” Greer said at the news conference. “This is a special type of treatment, a special type of medicine. Its use can be highly controlled in clinics with specially trained people.”

He added he doubts the drug would ever be distributed to patients to take home.

There are only rare admissions from an author of one of the studies that:

The results were similar to those they had found in earlier studies in healthy volunteers. “In spite of their unique vulnerability and the mood disruption that the illness and contemplation of their death has prompted, these participants have the same kind of experiences, that are deeply meaningful, spiritually significant and producing enduring positive changes in life and mood and behaviour,” he said.

If psilocybin is so safe and pleasant to ingest…

I think the promotion of these studies puts ingestion of psilocybin on the path to being allowed in nicely furnished integrative cancer centers. In that sense, psilocybin could become a gateway drug to quack services such as acupuncture, reiki, and the energy therapy known as therapeutic touch.

I’m not sure that demand would be great except among previous users of psychedelics and current users of cannabis.

But should psilocybin remain criminalized outside of cancer centers where wealthy patients can purchase a diagnosis of adjustment reaction from a psychiatrist? Cancer is not especially traumatic, and PTSD is almost as common in the waiting rooms of primary care physicians. Why not extend to primary care physicians the option of prescribing psilocybin to their patients? What would be accomplished is that purity could be assured. But why should psilocybin use be limited to mental health conditions, once we accept that a diagnosis of adjustment reaction is such a distorted extension of the term? Should we exclude patients who are atheists and only want a satisfying experience, not a spiritual one?

Experience in other countries suggests that psilocybin can safely be ingested in a supportive, psychologically safe environment. Why not allow cancer patients and others to obtain psilocybin with assured purity and dosage? They could then ingest it in the comfort of friends and intimate partners who have been briefed on how the experience needs to be managed. The patients in the studies were mostly not facing immediate death from terminal cancer. But should we require that persons need to be dying in order to have a psilocybin experience without the risk of criminal penalties? Why not allow psilocybin to be ingested in the presence of pastoral counselors or priests whose religious beliefs are more congruent with the persons seeking such experiences than are New York City psychiatrists?


Unintended consequences of universal mindfulness training for schoolchildren?

This is the first installment of what will be a series of occasional posts about the UK Mindfulness All-Party Parliamentary Group report, Mindful Nation.

  • Mindful Nation is seriously deficient as a document supposedly arguing for policy based on evidence.
  • The professional and financial interests of lots of people involved in preparation of the document will benefit from implementation of its recommendations.
  • After an introduction, I focus on two studies singled out in Mindful Nation as offering support for the benefits of mindfulness training for school children.
  • Results of the group’s cherrypicked studies do not support implementation of mindfulness training in the schools, but inadvertently highlight some issues.
  • Investment in universal mindfulness training in the schools is unlikely to yield measurable, socially significant results, but will serve to divert resources from schoolchildren more urgently in need of effective intervention and support.
  • Mindful Nation is another example of delivery of low-intensity services to mostly low-risk persons to the detriment of those in greatest and most urgent need.

The launch event for the Mindful Nation report billed it as the “World’s first official report” on mindfulness.

Mindful Nation is a report written by the UK Mindfulness All-Party Parliamentary Group.

The Mindfulness All-Party Parliamentary Group (MAPPG)  was set up to:

  • review the scientific evidence and current best practice in mindfulness training
  • develop policy recommendations for government, based on these findings
  • provide a forum for discussion in Parliament for the role of mindfulness and its implementation in public policy.

The Mindfulness All-Party Parliamentary Group describes itself as “[i]mpressed by the levels of both popular and scientific interest” and says it “launched an inquiry to consider the potential relevance of mindfulness to a range of urgent policy challenges facing government.”

Don’t get confused by this being a government-commissioned report. The report stands in sharp contrast to one commissioned by the US government in terms of the unbalanced constitution of the committee undertaking the review, the lack of transparency in the search for relevant literature, and the methodology for rating and interpreting the quality of available evidence.

Compare the claims of Mindful Nation to a comprehensive systematic review and meta-analysis prepared for the US Agency for Healthcare Research and Quality (AHRQ), which reviewed 18,753 citations and found only 47 trials (3%) that included an active control treatment. The vast majority of studies available for inclusion had only a wait-list or no-treatment control group and so exaggerated any estimate of the efficacy of mindfulness.

Although the US report was available to those preparing the UK Mindful Nation report, no mention is made of either the full contents of the report or the resulting publication in a peer-reviewed journal. Instead, the UK Mindful Nation report emphasized narrative and otherwise unsystematic reviews, and meta-analyses not adequately controlling for bias.

When the abridged version of the AHRQ report was published in JAMA Internal Medicine, an accompanying commentary raised issues even more applicable to the Mindful Nation report:

The modest benefit found in the study by Goyal et al begs the question of why, in the absence of strong scientifically vetted evidence, meditation in particular and complementary measures in general have become so popular, especially among the influential and well educated…What role is being played by commercial interests? Are they taking advantage of the public’s anxieties to promote use of complementary measures that lack a base of scientific evidence? Do we need to require scientific evidence of efficacy and safety for these measures?

The members of the UK Mindfulness All-Party Parliamentary Group were selected for their positive attitude towards mindfulness. The collection of witnesses they called to hearings was saturated with advocates of mindfulness and those having professional and financial interests in arriving at a positive view. There is no transparency in terms of how studies or testimonials were selected, but the bias is notable. Many of the scientific studies were methodologically poor, if there was any methodology at all. Many were strongly stated, but weakly substantiated, opinion pieces. Authors often included those having financial interests in obtaining positive results, but with no acknowledgment of conflict of interest. The glowing testimonials were accompanied by smiling photos and were unanimous in their praise of the transformative benefits of mindfulness.

As Mark B. Cope and David B. Allison concluded about obesity research, such a packing of the committee and a highly selective review of the literature leads to a “distortion of information in the service of what might be perceived to be righteous ends.” [I thank Tim Caulfield for calling this quote to my attention.]

Mindfulness in the schools

The recommendations of Mindful Nation are:

  1. The Department for Education (DfE) should designate, as a first step, three teaching schools116 to pioneer mindfulness teaching, co-ordinate and develop innovation, test models of replicability and scalability and disseminate best practice.
  2. Given the DfE’s interest in character and resilience (as demonstrated through the Character Education Grant programme and its Character Awards), we propose a comparable Challenge Fund of £1 million a year to which schools can bid for the costs of training teachers in mindfulness.
  3. The DfE and the Department of Health (DOH) should recommend that each school identifies a lead in schools and in local services to co-ordinate responses to wellbeing and mental health issues for children and young people117. Any joint training for these professional leads should include a basic training in mindfulness interventions.
  4. The DfE should work with voluntary organisations and private providers to fund a freely accessible, online programme aimed at supporting young people and those who work with them in developing basic mindfulness skills118.
[Photo caption: Payoff of Mindful Nation to Oxford Mindfulness Centre will be huge.]

Leading up to these recommendations, the report outlined an “alarming crisis” in the mental health of children and adolescents and proposed:

Given the scale of this mental health crisis, there is real urgency to innovate new approaches where there is good preliminary evidence. Mindfulness fits this criterion and we believe there is enough evidence of its potential benefits to warrant a significant scaling-up of its availability in schools.

Think of all the financial and professional opportunities that proponents of mindfulness involved in preparation of this report have garnered for themselves.

Mindfulness to promote executive functioning in children and adolescents

For the remainder of the blog post, I will focus on the two studies cited in support of the following statement:

What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.

The terms “executive control” and “emotional stability” were clarified:

Many argue that the most important prerequisites for child development are executive control (the management of cognitive processes such as memory, problem solving, reasoning and planning) and emotion regulation (the ability to understand and manage the emotions, including and especially impulse control). These main contributors to self-regulation underpin emotional wellbeing, effective learning and academic attainment. They also predict income, health and criminality in adulthood69. American psychologist, Daniel Goleman, is a prominent exponent of the research70 showing that these capabilities are the biggest single determinant of life outcomes. They contribute to the ability to cope with stress, to concentrate, and to use metacognition (thinking about thinking: a crucial skill for learning). They also support the cognitive flexibility required for effective decision-making and creativity.

Actually, Daniel Goleman is the former editor of the pop magazine Psychology Today and an author of numerous pop books.

The first cited paper.

73 Flook L, Smalley SL, Kitil MJ, Galla BM, Kaiser-Greenland S, Locke J, et al. Effects of mindful  awareness practices on executive functions in elementary school children. Journal of Applied School Psychology. 2010;26(1):70-95.

Journal of Applied School Psychology is a Taylor & Francis journal, formerly known as Special Services in the Schools (1984–2002). Its Journal Impact Factor is 1.30.

One of the authors of the article, Susan Kaiser-Greenland, is a mindfulness entrepreneur, as seen on her website describing her as an author, public speaker, and educator on the subject of sharing secular mindfulness and meditation with children and families. Her books are The Mindful Child: How to Help Your Kid Manage Stress and Become Happier, Kinder, and More Compassionate and Mindful Games: Sharing Mindfulness and Meditation with Children, Teens, and Families and the forthcoming The Mindful Games Deck: 50 Activities for Kids and Teens.

This article represents the main research available on Kaiser-Greenland’s Inner Kids program and figures prominently in her promotion of her products.

The sample consisted of 64 children assigned to either mindful awareness practices (MAPs; n = 32) or a control group consisting of a silent reading period (n = 32).

The MAPs training used in the current study is a curriculum developed by one of the authors (SKG). The program is modeled after classical mindfulness training for adults and uses secular and age appropriate exercises and games to promote (a) awareness of self through sensory awareness (auditory, kinesthetic, tactile, gustatory, visual), attentional regulation, and awareness of thoughts and feelings; (b) awareness of others (e.g., awareness of one’s own body placement in relation to other people and awareness of other people’s thoughts and feelings); and (c) awareness of the environment (e.g., awareness of relationships and connections between people, places, and things).

A majority of exercises involve interactions among students and between students and the instructor.

Outcomes.

The primary EF outcomes were the Metacognition Index (MI), Behavioral Regulation Index (BRI), and Global Executive Composite (GEC), as reported by teachers and parents.

Wikipedia presents the results of this study as:

The program was delivered for 30 minutes, twice per week, for 8 weeks. Teachers and parents completed questionnaires assessing children’s executive function immediately before and following the 8-week period. Multivariate analysis of covariance on teacher and parent reports of executive function (EF) indicated an interaction effect baseline EF score and group status on posttest EF. That is, children in the group that received mindful awareness training who were less well regulated showed greater improvement in EF compared with controls. Specifically, those children starting out with poor EF who went through the mindful awareness training showed gains in behavioral regulation, metacognition, and overall global executive control. These results indicate a stronger effect of mindful awareness training on children with executive function difficulties.

The finding that both teachers and parents reported changes suggests that improvements in children’s behavioral regulation generalized across settings. Future work is warranted using neurocognitive tasks of executive functions, behavioral observation, and multiple classroom samples to replicate and extend these preliminary findings.

What I discovered when I scrutinized the study.

This study was unblinded: the students, teachers, and parents providing the subjective ratings of the students were well aware of the group to which students had been assigned. We are not given any correlations among or between their ratings, and so we don’t know whether there is just a global subjective factor (easy or difficult child, well-behaved or not) operating for either teachers or parents, or both.

It is unclear for what features of the mindfulness training the comparison reading group offers control or equivalence. The two groups differed in positive expectations and in the attention and support that are likely to be reflected in the parent and teacher ratings. There’s a high likelihood that any differences in outcomes were nonspecific, rather than due to some active and distinct ingredient of mindfulness training. In any comparison with the students assigned to reading time, students assigned to mindfulness training had the benefit of any active ingredient it might have, as well as any nonspecific, placebo ingredients.

This is an exceedingly weak design, but one that dominates evaluations of mindfulness.

Note too that with only 32 students per group, this is a seriously underpowered study. It has less than a 50% probability of detecting a moderate-sized effect if one is present. And because of the larger effect size needed to achieve statistical significance with such a small sample, any statistically significant effects will be large, even if unlikely to replicate in a larger sample. That is the paradox of low sample size we need to understand in these situations.
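The power claim is easy to verify (a minimal sketch using statsmodels):

```python
# Power of a two-arm comparison with 32 students per group, two-sided
# alpha = .05, for a moderate effect (Cohen's d = 0.5).
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.5, nobs1=32, ratio=1.0, alpha=0.05)
print(f"Power: {power:.2f}")   # ~0.50 -- a coin flip for a real moderate effect
```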

Not surprisingly, there were no differences between the mindfulness and reading control groups on any outcome variable, whether rated by parents or teachers. Nonetheless, the authors rescued their claims for an effective intervention with:

However, as shown by the significance of interaction terms, baseline levels of EF (GEC reported by teachers) moderated improvement in posttest EF for those children in the MAPs group compared to children in the control group. That is, on the teacher BRIEF, children with poorer initial EF (higher scores on BRIEF) who went through MAPs training showed improved EF subsequent to the training (indicated by lower GEC scores at posttest) compared to controls.

Similar claims were made about parent ratings. But let’s look at figure 3 depicting post-test scores. These are from the teachers, but results for the parent ratings are essentially the same.

[Figure 3: teacher BRIEF post-test scores by baseline quartile]

Note the odd scaling of the X axis. The data are divided into four quartiles and then the middle half is collapsed so that there are three data points. I’m curious about what is being hidden. Even with the sleight-of-hand, it appears that scores for the intervention and control groups are identical except for the top quartile. It appears that just a couple of students in the control group are accounting for any appearance of a difference. But keep in mind that the upper quartile is only a matter of eight students in each group.

This scatter plot is further revealing:

[Scatter plot: teacher BRIEF ratings]

It appears that the differences that are limited to the upper quartile are due to a couple of outlier control students. Without them, even the post-hoc differences that were found in the upper quartile between intervention control groups would likely disappear.

Basically, what we are seeing is that most students did not show any benefit whatsoever from mindfulness training over being in a reading group. It’s not surprising that students who were not particularly elevated on the variables of interest do not register an effect. That’s a common ceiling effect in such universally delivered interventions in general population samples.

Essentially, if we focus on the designated outcome variables, we are wasting the students’ time as well as that of the staff. Think of what could be done if the same resources were applied in more effective ways. There were a couple of students in this study who were outliers with low executive function. We don’t know how else they otherwise differ. Neither in the study, nor in the validation of these measures, is much attention given to their discriminant validity, i.e., what variables influence the ratings that shouldn’t. I suspect strongly that there are global, nonspecific aspects to both parent and teacher ratings, such that they are influenced by other aspects of these couple of students’ engagement with their classroom environment, and perhaps other environments.

I see little basis for the authors’ self-congratulatory conclusion:

The present findings suggest that mindfulness introduced in a general  education setting is particularly beneficial for children with EF difficulties.

And

Introduction of these types of awareness practices in elementary education may prove to be a viable and cost-effective way to improve EF processes in general, and perhaps specifically in children with EF difficulties, and thus enhance young children’s socio-emotional, cognitive, and academic development.

Maybe the authors started with this conviction and it was unshaken by disappointing findings.

Or the statement made in Mindfulness Nation:

What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.

But we have another study that is cited for this statement.

74. Huppert FA, Johnson DM. A controlled trial of mindfulness training in schools: The importance of practice for an impact on wellbeing. The Journal of Positive Psychology. 2010; 5(4):264-274.

The first author, Felicia Huppert, is founder and director of the Well-being Institute and Emeritus Professor of Psychology at the University of Cambridge, as well as a member of the academic staff of the Institute for Positive Psychology and Education at the Australian Catholic University.

This study involved 173 14- and 15-year-old boys from a private Catholic school.

The Journal of Positive Psychology is not known for its high methodological standards. A look at its editorial board suggests a high likelihood that manuscripts submitted will be reviewed by sympathetic reviewers publishing their own methodologically flawed studies, often with results in support of undeclared conflicts of interest.

The mindfulness training was based on the program developed by Kabat-Zinn and colleagues at the University of Massachusetts Medical School (Kabat-Zinn, 2003). It comprised four 40 minute classes, one per week, which presented the principles and practice of mindfulness meditation. The mindfulness classes covered the concepts of awareness and acceptance, and the mindfulness practices included bodily awareness of contact points, mindfulness of breathing and finding an anchor point, awareness of sounds, understanding the transient nature of thoughts, and walking meditation. The mindfulness practices were built up progressively, with a new element being introduced each week. In some classes, a video clip was shown to highlight the practical value of mindful awareness (e.g. “The Last Samurai”, “Losing It”). Students in the mindfulness condition were also provided with a specially designed CD, containing three 8-minute audio files of mindfulness exercises to be used outside the classroom. These audio files reflected the progressive aspects of training which the students were receiving in class. Students were encouraged to undertake daily practice by listening to the appropriate audio files. During the 4-week training period, students in the control classes attended their normal religious studies lessons.

A total of 155 participants had complete data at baseline and 134 at follow-up (78 in the mindfulness and 56 in the control condition). Any student who had missing data at either time point was simply dropped from the analysis. The effects of this statistical decision are difficult to track in the paper. Regardless, there was a lack of any difference between the intervention and control groups on any of a host of outcome variables, with none designated as a primary outcome.

Actual practicing of mindfulness by students was inconsistent.

One third of the group (33%) practised at least three times a week, 34.8% practised more than once but less than three times a week, and 32.7% practised once a week or less (of whom 7 respondents, 8.4%, reported no practice at all). Only two students reported practicing daily. The practice variable ranged from 0 to 28 (number of days of practice over four weeks). The practice variable was found to be highly skewed, with 79% of the sample obtaining a score of 14 or less (skewness = 0.68, standard error of skewness = 0.25).

The authors rescue their claim of a significant effect for the mindfulness intervention with highly complex multivariate analyses with multiple control variables, in which within-group outcomes for students assigned to mindfulness were related to the extent to which students actually practiced mindfulness. Without controlling for the numerous (and post hoc) multiple comparisons, results were still largely nonsignificant.

One simple conclusion that can be drawn is that despite a lot of encouragement, there was little actual practice of mindfulness by the relatively well-off students in a relatively highly resourced school setting. Could we expect results to improve with wider dissemination to schools with fewer resources and less privileged students?

The authors conclude:

The main finding of this study was a significant improvement on measures of mindfulness and psychological well-being related to the degree of individual practice undertaken outside the classroom.

Recall that Mindful Nation cited the study in the following context:

What is of particular interest is that those with the lowest levels of executive control73 and emotional stability74 are likely to benefit most from mindfulness training.

These are two methodologically weak studies with largely null findings. They are hardly the basis for launching a national policy implementing universal mindfulness in the schools.

As noted in the US AHRQ report, despite a huge number of studies of mindfulness having been conducted, few involved a test with an adequate control group, and so there is little evidence that mindfulness has any advantage over any active treatment. Neither of these studies disturbed that conclusion, although both are spun, in the original articles and in the Mindful Nation report, as positive. Both papers were published in journals where the reviewers were likely to be overly sympathetic and not attentive to serious methodological and statistical problems.

The committee writing Mindful Nation arrived at conclusions consistent with their prior enthusiasm for mindfulness and their vested interest in it. They sorted through evidence to find what supported their pre-existing assumptions.

Like UK resilience programs, the recommendations of Mindful Nation put considerable resources into the delivery of services to a large population unlikely to have the threshold of need required to register a socially and clinically significant effect. On a population level, results of the implementation are doomed to fall short of its claims. The many fewer students who need more timely, intensive, and tailored services are left underserved. Their presence is ignored or, worse, invoked to justify the delivery of services to the larger group, with the needy students not benefiting.

In this blog post, I mainly focused on two methodologically poor studies. But for the selection of these particular studies, I depended on the search conducted by the authors of Mindful Nation and the emphasis that was given to these two studies for some sweeping claims in the report. I will continue writing about the recommendations of Mindful Nation. I welcome reader feedback, particularly from readers whose enthusiasm for mindfulness has been offended. But I urge them not simply to go to Google, cherry-pick an isolated study, and ask me to refute its claims.

Rather, we need to pay attention to the larger literature concerning mindfulness, its serious methodological problems, and the sociopolitical forces and vested interests that preserve a strong confirmation bias, both in the “scientific” literature and its echoing in documents like Mindful Nation.

Why PhD students should not evaluate a psychotherapy for their dissertation project

  • Things some clinical and health psychology students wish they had known before they committed themselves to evaluating a psychotherapy for their dissertation study.
  • A well designed pilot study addressing feasibility and acceptability issues in conducting and evaluating psychotherapies is preferable to an underpowered study which won’t provide a valid estimate of the efficacy of the intervention.
  • PhD students would often be better off as research parasites – making use of existing published data – rather than attempting to organize their own original psychotherapy study, if their goal is to contribute meaningfully to the literature and patient care.
  • Reading this blog, you will encounter a link to free, downloadable software that allows you to make quick determinations of the number of patients needed for an adequately powered psychotherapy trial.

I so relish the extra boost of enthusiasm that many clinical and health psychology students bring to their PhD projects. They not only want to complete a thesis of which they can be proud, they want their results to be directly applicable to improving the lives of their patients.

Many students are particularly excited about a new psychotherapy about which extravagant claims are being made that it’s better than its rivals.

I have seen lots of fads and fashions come and go: third wave, new wave, and no wave therapies. When I was a PhD student, progressive relaxation was in. Then it died, mainly because it was so boring for the therapists who had to mechanically provide it. Client-centered therapy was fading amid doubts that anyone else could achieve the results of Carl Rogers or that his three facilitative conditions of unconditional positive regard, accurate empathy, and genuineness were actually distinguishable enough to study. Gestalt therapy was supercool because of the charisma of Fritz Perls, who distracted us with his showmanship from the utter lack of evidence for its efficacy.

I hate to see PhD students demoralized when their grand plans prove unrealistic. Inevitably, circumstances force them to compromise in ways that limit the usefulness of their project, and maybe even threaten their finishing within a reasonable time period. Overly ambitious plans are the formidable enemy of the completed dissertation.

The numbers are stacked against a PhD student conducting an adequately powered evaluation of a new psychotherapy.

This blog post argues against PhD students taking on the evaluation of a new therapy in comparison to an existing one, if they expect to complete their projects and make a meaningful contribution to the literature and to patient care.

I’ll be drawing on some straightforward analysis done by Pim Cuijpers to identify what PhD students are up against when trying to demonstrate that any therapy is better than treatments that are already available.

Pim has literally done dozens of meta-analyses, mostly of treatments for depression and anxiety. He commands a particular credibility, given the quality of this work. The way Pim and his colleagues present a meta-analysis is so straightforward and transparent that you can readily examine the basis of what he says.

Disclosure: I collaborated with Pim and a group of other authors in conducting a meta-analysis as to whether psychotherapy was better than a pill placebo. We drew on all the trials allowing a head-to-head comparison, even though nobody ever really set out to pit the two conditions against each other as their first agenda.

Pim tells me that the brief and relatively obscure letter on which I will draw, New Psychotherapies for Mood and Anxiety Disorders: Necessary Innovation or Waste of Resources?, is among his most unpopular pieces of work. Lots of people don’t like its inescapable message. But I think that if PhD students pay attention, they might avoid a lot of pain and disappointment.

But first…

Note how many psychotherapies have been claimed to be effective for depression and anxiety. Anyone trying to make sense of this literature has to contend with claims based on a lot of underpowered trials – too small in sample size to reasonably be expected to detect the effects that investigators claim – that are otherwise compromised by methodological limitations.

Some investigators were simply naïve about clinical trial methodology and the difficulties of doing research with clinical populations. They may not have understood statistical power.

But many psychotherapy studies end up in bad shape because the investigators were unrealistic about the feasibility of what they were undertaking and the low likelihood that they could recruit patients in the numbers they had planned in the time they had allotted. After launching the trial, they had to change strategies for recruitment, relax their selection criteria, or even change the treatment so it was less demanding of patients’ time. And they had to make difficult judgments about which features of the trial to drop when resources ran out.

Declaring a psychotherapy trial to be a “preliminary” or a “pilot study” after things go awry

The titles of more than a few articles reporting psychotherapy trials contain an apologetic qualifier after a colon: “a preliminary study” or “a pilot study”. But the studies weren’t intended at the outset to be preliminary or pilot studies. The investigators are making excuses post hoc – after the fact – for not having been able to recruit sufficient numbers of patients and for having had to compromise their design from what they had originally planned. The best they can hope for is that the paper will somehow be useful in promoting further research.

Too many studies from which effect sizes are entered into meta-analyses should have been left as pilot studies and not considered tests of the efficacy of treatments. The rampant problem in the psychotherapy literature is that almost no one treats small-scale trials as mere pilot studies. In a recent blog post, I provided readers with some simple screening rules to identify meta-analyses of psychotherapy studies that they could dismiss from further consideration. One was whether there were sufficient numbers of adequately powered studies. Often there are not.

Readers take the inflated claims from small studies seriously, when these estimates should be seen as unrealistic and unlikely to be replicated, given the studies’ sample sizes. The large effect sizes that are claimed are likely the product of p-hacking and the confirmation bias required to get published. With enough alternative outcome variables to choose from and enough flexibility in analyzing and interpreting data, almost any intervention can be made to look good.
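Selective publication alone is enough to produce this inflation. Here is a small simulation of my own – the true effect of d = 0.20 and trials of 20 patients per group are illustrative assumptions, not figures from any actual literature:

```python
# Winner's-curse sketch: with a true effect of d = 0.20 and only 20 patients
# per group, the trials that happen to reach p < .05 report inflated effects.
# Purely illustrative; no real trial data are used.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n, sims = 0.20, 20, 10_000
published = []
for _ in range(sims):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_d, 1.0, n)
    t, p = stats.ttest_ind(treated, control)
    if p < 0.05 and t > 0:  # only "significant" trials get written up
        pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
        published.append((treated.mean() - control.mean()) / pooled_sd)

print(f"'significant' trials: {len(published) / sims:.0%}")  # roughly 9%
print(f"mean published d: {np.mean(published):.2f}")         # ~0.8, about four-fold inflation
```

Conditioning on statistical significance roughly quadruples the effect size that gets reported, and that is the inflation that then propagates into meta-analyses.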

The problem is readily seen in the extravagant claims about acceptance and commitment therapy (ACT), which are so heavily dependent on small, under-resourced studies supervised by promoters of ACT that should not have been used to generate effect sizes.

Back to Pim Cuijpers’ brief letter. He argues, based on his numerous meta-analyses, that it is unlikely that a new treatment will be substantially more effective than an existing credible, active treatment. There are some exceptions, like relaxation training versus cognitive behavior therapy for some anxiety disorders, but mostly only small differences of no more than d = .20 are found between two active, credible treatments. If you search the broader literature, you can find occasional exceptions like CBT versus psychoanalysis for bulimia, but most of those you find prove to be false positives, usually based on investigator bias in conducting and interpreting a small, underpowered study.

You can see this for yourself by downloading the free G*Power program and plugging in d = 0.20 to calculate the number of patients needed for a study. To be safe, add more patients to allow for the expectable 25% dropout rate that has occurred across trials. The number you get would require a larger study than has ever been done in the past, including the well-financed NIMH Collaborative trial.

[Figure: G*Power sample size analyses]
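For readers who prefer code to a point-and-click calculator, here is a minimal sketch of the same calculation in Python’s statsmodels; the 25% dropout inflation follows the rule of thumb above, and any standard power calculator will give essentially the same numbers:

```python
# Patients needed to detect d = 0.20 between two active treatments
# (two-sided alpha = .05, 80% power), mirroring the G*Power calculation.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.20, alpha=0.05, power=0.80)
print(round(n_per_group))         # ~393 patients per group

# Inflate recruitment for the expectable 25% dropout.
print(round(n_per_group / 0.75))  # ~525 per group, i.e., over 1,000 patients in all
```

For scale, the well-financed NIMH Collaborative trial randomized about 250 patients in total.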

Even more patients would be needed for the ideal situation in which a third comparison group allowed the investigator to show that the active comparison treatment had actually performed better than a nonspecific treatment – that is, that it was delivered with the same effectiveness it had shown in earlier trials. Otherwise, a defender of the established therapy might argue that the older treatment had not been properly implemented.

So, unless warned off, the PhD student plans a study to show not only that the null hypothesis can be rejected – that the new treatment is no better than the existing one – but also that in the same study the existing treatment is better than a waitlist control. Oh my, just try to find an adequately powered, properly analyzed example of a comparison of two active treatments plus a control comparison group in the existing published literature. The few examples of three-group designs in which a new psychotherapy came out better than an effectively implemented existing treatment are grossly underpowered.

These calculations so far have all been based on what would be needed to reject the null hypothesis of no difference between the new treatment and a more established one. But if the claim is that the new treatment is superior to the existing treatment, our PhD student now needs to conduct a superiority trial in which some criterion is pre-set (such as greater than a moderate difference, d = .30) and the null hypothesis is that the advantage of the new treatment is less. We are now way out into the fantasyland of breakthrough but uncompleted dissertation studies.
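To put rough numbers on that fantasyland, note that power in a superiority trial is approximately driven by the gap between the true effect and the pre-set margin. The sketch below is my own illustration, not a calculation from Cuijpers’ letter, and it grants the new treatment a generously optimistic true advantage of d = .50:

```python
# Rough superiority-trial sketch (all numbers are illustrative assumptions).
# H0: advantage <= 0.30 SD. Suppose the true advantage is an optimistic 0.50 SD;
# power is then roughly that of detecting d = 0.50 - 0.30 = 0.20, one-sided.
from statsmodels.stats.power import TTestIndPower

margin, true_d = 0.30, 0.50
n_per_group = TTestIndPower().solve_power(
    effect_size=true_d - margin,  # effective detectable difference
    alpha=0.05,
    power=0.80,
    alternative="larger",
)
print(round(n_per_group))  # ~310 per group, even under these generous assumptions
```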

Two take away messages

The first take away message is that we should be skeptical of claims that a new treatment is better than past ones, except when the claim arises in a well-designed study with some assurance that it is free of investigator bias. But the claim also has to arise in a trial larger than almost any psychotherapy study that has ever been done. Yup, most comparative psychotherapy studies are underpowered, and we cannot expect claims that one treatment is superior to another to be robust.

But for PhD students doing a dissertation project, the second take away message is that they should not attempt to show that one treatment is superior to another, because they almost certainly lack the resources to do so.

The psychotherapy literature does not need another study with too few patients to support its likely exaggerated claims.

An argument can be made that it is unfair and even unethical to enroll patients in a psychotherapy RCT with an insufficient sample size. Some of the patients will be randomized to a control condition that is not what attracted them to the trial. All of the patients will be denied the experience of having been in a trial that makes a meaningful contribution to the literature and to better care for patients like themselves.

What should the clinical or health psychology PhD student do, besides maybe curb their enthusiasm? One opportunity to make a meaningful contribution to the literature is to conduct small studies testing hypotheses that can lead to improvements in the feasibility or acceptability of treatments to be tested in studies with more resources.

Think of what would’ve been accomplished if PhD students had determined in modest studies that it is tough to recruit and retain patients in an Internet therapy study without some communication to the patients that they are involved in a human relationship – without what Pim Cuijpers calls supportive accountability. Patients may stay involved with Internet treatment when it proves frustrating only because they have support from, and accountability to, someone beyond their encounter with an impersonal computer. Somewhere out there is a human being who supports them in sticking it out with the Internet psychotherapy and who will be disappointed if they don’t.

A lot of resources have been wasted on Internet therapy studies in which patients have not been convinced that what they’re doing is meaningful or that they have the support of a human being. They drop out or fail to diligently do any homework expected of them.

Similarly, mindfulness studies are routinely conducted without anyone establishing that patients actually practice mindfulness in everyday life, or what they would need to do so more consistently. The assumption is that patients assigned to mindfulness diligently practice it daily. A PhD student could make a valuable contribution to the literature by examining the rates at which patients actually practice mindfulness when they have been assigned to it in a psychotherapy study, along with the barriers and facilitators of their doing so. A discovery that patients are not consistently practicing mindfulness might explain weaker findings than anticipated. One could even suggest that any apparent effects of practicing mindfulness were actually nonspecific – patients getting caught up in the enthusiasm of being offered a treatment they had sought, without actually practicing mindfulness.

An unintended example: How not to recruit cancer patients for a psychological intervention trial

Sometimes PhD students just can’t be dissuaded from undertaking an evaluation of a psychotherapy. I was a member of a PhD committee of a student who at least produced a valuable paper concerning how not to recruit cancer patients for a trial evaluating problem-solving therapy, even though the project fell far short of conducting an adequately powered study.

The PhD student was aware that claims of the effectiveness of problem-solving therapy reported in the prestigious Journal of Consulting and Clinical Psychology were exaggerated. The developer of problem-solving therapy for cancer patients (and current JCCP Editor) claimed a huge effect size – 3.8 if only the patient were involved in treatment and an even better 4.4 if the patient had an opportunity to involve a relative or friend as well. Effect sizes from this trial have subsequently had to be excluded from at least four meta-analyses as an extreme outlier (1, 2, 3, 4).

The student adopted the much more conservative assumption that a moderate effect size of .6 would be obtained in comparison with a waitlist control. You can use G*Power to see that 50 patients would be needed per group, 60 if allowance is made for dropouts.
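The same one-line calculation reproduces her numbers, assuming she powered the study at roughly 85% (my guess from the figures; at the more conventional 80% power, about 45 per group would suffice):

```python
# Waitlist-controlled comparison powered for a moderate effect of d = 0.6.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.6, alpha=0.05, power=0.85)
print(round(n_per_group))  # ~50 per group; recruiting 60 allows ~1 in 6 to drop out
```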

Such a basically inert control group, of course, has a greater likelihood of seeming to demonstrate that a treatment is effective than when the comparison is another active treatment. Of course, such a control group also does not allow a determination of whether it was the active ingredient of the treatment that made the difference, or just the attention, positive expectations, and support that were not available in the waitlist control group.

But PhD students should have the same option as their advisors to contribute another comparison between an active treatment and a waitlist control to the literature, even if it does not advance our knowledge of psychotherapy. They can take the same low road to a successful career that so many others have traveled.

This particular student was determined to make a different contribution to the literature. Notoriously, studies of psychotherapy with cancer patients often fail to recruit samples that are distressed enough to register any effect. The typical breast cancer patient, for instance, who seeks to enroll in a psychotherapy or support group trial does not have clinically significant distress. The prevalence of positive effects claimed in the literature for interventions with cancer patients in published studies likely represents a confirmation bias.

The student wanted to address this issue by limiting the patients whom she enrolled in the study to those with clinically significant distress. Enlisting colleagues, she set up screening of consecutive cancer patients in the oncology units of local hospitals. Patients were first screened for self-reported distress and, if they were distressed, for whether they were interested in services. Those who met both criteria were then re-contacted to see if they would be willing to participate in a psychological intervention study, without the intervention being identified. As I reported in the previous blog post:

  • Combining results of  the two screenings, 423 of 970 patients reported distress, of whom 215 patients indicated need for services.
  • Only 36 (4% of 970) patients consented to trial participation.
  • We calculated that 27 patients needed to be screened to recruit a single patient, with 17 hours of time required for each patient recruited.
  • 41% (n = 87) of the 215 distressed patients who had initially indicated a need for services later reported no need for psychosocial services, mainly because they felt better or thought that their problems would disappear naturally.
  • Finally, 36 patients were eligible and willing to be randomized, representing 17% of 215 distressed patients with a need for services.
  • This represents 8% of all 423 distressed patients, and 4% of 970 screened patients.
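The arithmetic of that recruitment funnel is worth checking for yourself (the 17 hours per recruited patient comes from the paper’s own time accounting and cannot be recomputed from these counts):

```python
# Recruitment funnel from the screening study, using the counts quoted above.
screened, distressed, need_services, consented = 970, 423, 215, 36

print(f"{screened / consented:.1f} patients screened per patient recruited")  # 26.9
print(f"{consented / need_services:.1%} of distressed patients with a need")  # 16.7%
print(f"{consented / distressed:.1%} of all distressed patients")             # 8.5%
print(f"{consented / screened:.1%} of all patients screened")                 # 3.7%
```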

So, the PhD student’s heroic effort did not yield the sample size that she anticipated. But she ended up making a valuable contribution to the literature, one that challenges a basic assumption that was being made about cancer patients in psychotherapy research – that all or most are distressed. She also ended up producing some valuable evidence that the minority of cancer patients who report psychological distress are not necessarily interested in psychological interventions.

Fortunately, she had been prepared to collect systematic data about these research questions, not just scramble within a collapsing effort at a clinical trial.

Becoming a research parasite as an alternative to PhD students attempting an under-resourced study of their own

Psychotherapy trials represent an enormous investment of resources, not only the public funding that is often provided for them, but the time, inconvenience, and exposure to ineffective treatments experienced by patients who participate in the trials. Increasingly, funding agencies require that investigators who get money to do a psychotherapy study at some point make their data available for others to use. The 14 prestigious medical journals whose editors make up the International Committee of Medical Journal Editors (ICMJE) each published earlier in 2016 a declaration that:

there is an ethical obligation to responsibly share data generated by interventional clinical trials because participants have put themselves at risk.

These statements proposed that as a condition for publishing a clinical trial, investigators would be required to share with others appropriately de-identified data not later than six months after publication. Further, the statements proposed that investigators describe their plans for sharing data in the registration of trials.

Of course, a proposal is exactly that, only a proposal, and these requirements were intended to take effect only after the document had been circulated and ratified. The incomplete and inconsistent adoption of previous proposals for registering trials in advance and for investigators declaring conflicts of interest does not encourage a lot of enthusiasm that we will see uniform implementation of this bold proposal anytime soon.

Some editors of medical journals are already expressing alarm over the prospect of data sharing becoming required. The editors of the New England Journal of Medicine were lambasted in social media for raising worries about “research parasites” exploiting the availability of data:

a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Richard Lehman’s Journal Review at the BMJ’s blog delivered a brilliantly sarcastic response to these concerns that concludes:

I think we need all the data parasites we can get, as well as symbionts and all sorts of other creatures which this ill-chosen metaphor can’t encompass. What this piece really shows, in my opinion, is how far the authors are from understanding and supporting the true opportunities of clinical data sharing.

However, lost in all the outrage that The New England Journal of Medicine editorial generated was a more conciliatory proposal at the end:

How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

The PLOS family of journals has gone on record as requiring that all data for papers published in their journals be publicly available without restriction. A February 24, 2014 announcement, PLOS’ New Data Policy: Public Access to Data, declared:

In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.

Many of us are aware of the difficulties in achieving this lofty goal. I am holding my breath and turning blue, waiting for some specific data.

The BMJ has expanded its previous requirements for data availability:

Loder E, Groves T. The BMJ requires data sharing on request for all trials. BMJ. 2015 May 7;350:h2373.

The movement to make data from clinical trials widely accessible has achieved enormous success, and it is now time for medical journals to play their part. From 1 July The BMJ will extend its requirements for data sharing to apply to all submitted clinical trials, not just those that test drugs or devices. The data transparency revolution is gathering pace.

I am no longer heading dissertation committees after one that I am currently supervising is completed. But if any PhD students asked my advice about a dissertation project concerning psychotherapy, I would strongly encourage them to enlist their advisor to identify and help them negotiate access to a data set appropriate to the research questions they want to investigate.

Most well-resourced psychotherapy trials have unpublished data concerning how they were implemented, with what biases, and with which patient groups ending up underrepresented or inadequately exposed to the intensity of treatment presumed to be needed for benefit. A story awaits to be told. The data available from a published trial are usually much more adequate than anything a graduate student could collect with the limited resources available for a dissertation project.

I look forward to the day when such data is put into a repository where anyone can access it.

In this blog post I have argued that PhD students should not take on responsibility for developing and testing a new psychotherapy for their dissertation project. I think that using data from existing published trials is a much better alternative. PhD students may currently find it difficult, but certainly not impossible, to get appropriate data sets. I certainly am not recruiting them to be front-line infantry in advancing the cause of routine data sharing. But they can make an effort to obtain such data, and they deserve all the support they can get from their dissertation committees, both in obtaining data sets and in recognizing when, realistically, data are not being made available – even when the data were promised as a condition for publication. Advisors, please request the data from published trials for your PhD students and protect them from the heartache of trying to collect such data themselves.

 

An open-minded, skeptical look at the success of “zero suicides”: Any evidence beyond the rhetoric?

  • Claims are spreading across social media that a goal of zero suicides can be achieved by radically re-organizing resources in health systems and communities. Extraordinary claims require extraordinary evidence.
  • I thoroughly searched for evidence backing claims of “zero suicides” being achieved.
  • The claims came up short, after expectations were initially raised by some statistics and a provocative graph. Any persuasiveness in these details quickly dissipated when they were scrutinized. Lesson: abstract numbers and graphs are not necessarily quality evidence, and dazzling ones can obscure a lack of evidence.
  • The goal of “zero suicides” has attracted the support of Pharma and generated programs around the world, with little fidelity to the original concept developed in the Henry Ford Health System in Detroit. In many contexts in which it is now being invoked, “zero suicides” is a vacuous buzz term, not a coherent organizational strategy.
  • Preventing suicide is a noble goal to which a lot of emotion gets attached. It also creates lucrative financial opportunities and attracts vested interests which often simply repackage existing programs for resale.
  • How can anyone oppose the idea that we should eliminate suicide? Clever sloganeering can stifle criticism and suppress embarrassing evidence to the contrary.
  • Yet we should not be bullied or distracted by slogans from our usual skeptical insistence that those who make strong claims bear the burden of providing strong evidence.
  • Deaths by suicide are statistically infrequent, poorly predicted events that occur in troubled contexts of interpersonal and institutional breakdown. These aspects can frustrate efforts to eliminate suicide entirely – or even accurately track these deaths.
  • Eliminating deaths by suicide is only very loosely analogous to wiping out polio, and lots of pitfalls await those who get confused by a false equivalence.
  • Pursuit of the goal of “zero suicides,” particularly in under-resourced and poorly organized community settings, can have unintended, negative consequences.
  • “Zero suicides” is likely a fad, to be replaced by next year’s fashion or maybe a few years after.
  • We need to step back and learn from the rise and fall of slogans and the unintended impact on distribution of scarce resources and the costs to human well-being.
  • My take away message is that increasingly sophisticated and even coercive communications about clinical and public health policies often harness the branding of prestigious medical journals. Interpreting these claims requires a matching skepticism, critical thinking skills, and renewed demands for evidence.

Beginning the search for evidence for the slogan “Zero Suicide.”

Numerous gushy tweets about achieving “zero suicides” drew me into a search for more information. I easily traced the origins of the campaign to a program at the Henry Ford Health System, a Detroit-based HMO, but the concept has now gone thoroughly international. My first Google Scholar search did not yield quality evidence from any program evaluations, but a subsequent Google search produced exceptionally laudatory and often self-congratulatory statements.

I briefly diverted my efforts to contacting authorities whom I expected might comment about “zero suicides.” Some indicated that a lack of familiarity prevented them from commenting, but others were as evasive as establishment Republicans asked about Donald Trump. One expert, however, was forthcoming with an interesting article, which proved to have just the right tone. I recommend:

Kutcher S, Wei Y, Behzadi P. School-and Community-Based Youth Suicide Prevention Interventions Hot Idea, Hot Air, or Sham?. The Canadian Journal of Psychiatry. 2016 Jul 12:0706743716659245.

Continuing my search, I found numerous links to other articles, including a laudatory Medical News and Perspectives opinion piece in JAMA behind a readily circumvented paywall. There was also a more accessible source with branding by the New England Journal of Medicine.

Clicking on these links, I found editorial and even blatantly promotional material, not randomized trials or other quality evidence.

This kind of non-evidence-based publicity in highly visible medical journals is extraordinary in itself, although not unprecedented. Increasingly, the brand of particular medical journals is sold and harnessed to bestow special credibility on political and financial interests, as seen in 1 and 2.

NEJM Catalyst: How We Dramatically Reduced Suicide.

 NEJM Catalyst is described as bringing

Health care executives, clinician leaders, and clinicians together to share innovative ideas and practical applications for enhancing the value of health care delivery.

[Figure: zero suicide takeaways, from NEJM Catalyst]

The claim of “zero suicides” originated in the Perfect Depression Care initiative of a division of the Henry Ford Health System.

The audacious goal of zero suicides was part of the Behavioral Health Services division’s larger goal to develop a system of perfect care for depression. Our roadmap for transformation was the Quality Chasm report, which defined six dimensions of perfect care: safety, timeliness, effectiveness, efficiency, equity, and patient-centeredness. We set perfection goals and metrics for each dimension, with zero suicides being the perfection goal for effectiveness. Very quickly, however, our team seized on zero suicides as the overarching goal for our entire transformation.

The strategies:

We used three key strategies to achieve this goal. The first two — improving access to care and restricting access to lethal means of suicide — are evidence-based interventions to reduce suicide risk. While we had pursued these strategies in the past, setting the target at zero suicides injected our team with gumption. To improve access to care, we developed, implemented, and tested new models of care, such as drop-in group visits, same-day evaluations by a psychiatrist, and department-wide certification in cognitive behavior therapy. This work, once messy and arduous for the PDC team, became creative, fun, and focused. To reduce access to lethal means of suicide, we partnered with patients and families to develop new protocols for weapons removal. We also redesigned the structure and content of patient encounters to reflect the assumption that every patient with a mental illness, even if that illness is in remission, is at increased risk of suicide. Therefore, we eliminated suicide screens and risk stratification tools that yielded non-actionable results, freeing up valuable time. Eventually, each of these approaches was incorporated into the electronic health record as decision support.

The third strategy:

…The pursuit of perfection was not possible without a just culture for our internal team. Ultimately, we found this the most important strategy in achieving zero suicides. Since our goal was to achieve radical transformation, not just to tweak the margins, PDC staff couldn’t justly be punished if they came up short on these lofty goals. We adopted a root cause analysis process that treated suicide events equally as tragedies and learning opportunities.

Process of patient care described in JAMA

What happens to a patient being treated in the context of Perfect Depression Care is described in the JAMA  piece:

Each patient seen through the BHS is first assessed and stratified on the basis of suicide risk: acute, moderate, or low. “Everyone is at risk. It’s just a matter of whether it’s acute or whether it requires attention but isn’t emergent,” said Coffey. A patient considered to be at high risk undergoes a psychiatric evaluation the same day. A patient at low risk is evaluated within 7 days. Group sessions for patients also allow individuals to connect and offer support to one another, not unlike the supportive relationships between sponsors and “sponsees” in 12-step programs

The claim of Zero Suicides, in numbers and a graph

…A dramatic and statistically significant 80% reduction in suicide that has been maintained for over a decade, including one year (2009) when we actually achieved the perfection goal of zero suicides (see the figure below). During the PDC initiative, the annual HMO network membership ranged from 182,183 to 293,228, of which approximately 60% received care through Behavioral Health Services. From 1999 to 2010, there were 160 suicides among HMO members. In 1999, as we launched PDC, the mean annual suicide rate for these mental health patients was 110.3 per 100,000. During the 11 years of the initiative, the mean annual suicide rate dropped to 36.21 per 100,000. This decrease is statistically significant and, moreover, took place while the suicide rate actually increased among non–mental health patients and among the general population of the state of Michigan.

[Figure: Improved suicide rates among Henry Ford Medical Group HMO members]

[This graph conflicts a bit with a graph in NEJM Catalyst, which indicates that suicides in the health care system were at zero for 2008 and that this continued through the first quarter of 2010.]

It is clear that rates of suicide fluctuated greatly from year to year in the health system. It also appears from the graph that for most years during the program, rates of suicide among patients in the Henry Ford Health System were substantially greater than those of the general population in Michigan, which were relatively flat. Any comparisons between the program and the general statistics for the state of Michigan are not particularly informative. Michigan is a state of enormous health care disparities, with many uninsured residents during this period. Demographics differ greatly, but patients receiving care within an HMO were a substantially more privileged group than the general population of Michigan. There was also a lot of annual movement in and out of the Henry Ford Health System. At any one time, only 60% of the patients within the health system were enrolled in the behavioral health system in which the depression program occurred.

A substantial proportion of suicides occur among individuals not previously known to health systems. Such persons are more heavily represented in the statistics for the state of Michigan. Another substantial proportion of suicides occur in individuals with weakened or recently broken contact with health systems. We don’t know how the statistics reported for the health system accommodated biased departures from the system or simply missing data. We don’t know whether behavior related to risk of suicide affected migration into the health care system or into the smaller group receiving behavioral health care through it. For instance, what became of patients with a psychiatric disorder and a comorbid substance use disorder? Those who were incarcerated?

Basically, the success of the program is not discernible within the noisy fluctuation of suicides in the Henry Ford Health System or the smaller behavioral health program. We cannot control for basic confounding factors, for selective enrollment and disenrollment in the health care system, or even for the expulsion from the behavioral health system of persons at risk.
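How noisy are counts like these? A toy simulation, entirely my own illustration, assumes only the long-run average implied by the quoted passage (160 suicides over the twelve years 1999–2010, about 13 per year) and simulates no program effect at all:

```python
# Year-to-year Poisson noise in annual suicide counts when the long-run
# expectation is ~13 per year (160 deaths over 12 years, per the quote above).
# No intervention effect is simulated; all variation is chance.
import numpy as np

rng = np.random.default_rng(0)
annual_counts = rng.poisson(lam=160 / 12, size=12)
print(annual_counts)
# Chance alone typically produces two- to three-fold swings between the
# "best" and "worst" years, which are easily mistaken for program effects.
```

When the underlying counts swing this much by chance, and the denominator itself shifts with enrollment and disenrollment, dramatic-looking before-and-after differences come cheap.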

 “Zero suicides” as a literal and serious goal?

The NEJM Catalyst article gave the originator of the program free rein for self-praise.

The most unexpected hurdles were skepticism that perfection goals like zero suicides were reasonable or feasible (some objected that it was “setting us up for failure”), and disbelief in the dramatic improvements obtained (we heard comments like “results from quality improvement projects aren’t scientifically rigorous”). We addressed these concerns by ensuring the transparency of our results and lessons, by collaborating with others to continually improve our methodological issues, and by supporting teams across the world who wish to pursue similar initiatives.

Our team challenged this assumption and asked, If zero is not the right goal for suicide occurrence, then what number is? Two? Twelve? Which twelve? In spite of its radicalism — indeed because of it — the goal of zero suicides became the galvanizing force behind an effort that achieved one of the most dramatic and sustained reductions in suicide in the clinical literature.

Will the Henry Ford program prove sustainable?

Edward Coffey moved on to become President, CEO, and Chief of Staff at the Menninger Clinic 18 months before his article appeared in NEJM Catalyst. I am curious as to what aspects of his Zero Suicides/Perfect Depression Care program are still maintained at Henry Ford. As it is described, the program was designed with admirably short waiting times for referral to behavioral health care. If the program persists as originally described, many professionals are kept vigilant and engaged in activities to reduce suicide without any statistical likelihood of having the opportunity to actually prevent one.

In decades of work within health systems, I have found that once demonstration projects have run their initial course, their goals are replaced by new organizational  ones and resources are redistributed. Sooner or later, competing demands for scarce resources  are promoted by new slogans.

What if Perfect Depression Care has to compete for scarce resources with Perfect Diabetes Care or alleviation of gross ethnic disparities in cardiovascular outcomes?

A lot of well-meant slogans ultimately have unintended, negative consequences. “Make pain the 5th vital sign” led to more attention being paid to previously ignored and poorly managed pain. This was followed by mandated routine assessment and intervention, which led to unnecessary procedures and an unprecedented epidemic of addiction and death from prescribed opioids. “Stamp out distress” has led to mandated screening and intervention programs for psychological distress in cancer care, with high rates of antidepressant prescription without proper diagnosis or follow-up.

If taken literally and seriously, a lofty but abstract goal like Zero Suicide becomes a threat to any “just culture” in a healthcare organization. If the slogan is taken seriously as resources are inevitably withdrawn, a culture of blame will emerge, along with pressures to distort easily manipulated statistics. Patients posing threats to the goal of zero suicide will be excluded from the system, with unknown but negative consequences for their morbidity and mortality.

 Bottom line – we can’t have slogan-driven healthcare policies that will likely have negative implications and conflict with evidence.

 Enter Big Pharma

Not unexpectedly, Big Pharma is getting involved in promoting Zero Suicides:

Eli Lilly and Company Foundation donates $250,000 to expand Community Health Network’s Zero Suicides prevention initiative,

Major gift will save Hoosier lives through a suicide prevention network that responds to a critical Indiana healthcare issue.

 According to press coverage, the funds will go to:

The Lilly Foundation donation also provides resources needed to build a Central Indiana crisis network that will include Indiana’s schools, foster care system, juvenile justice program, primary and specialty healthcare providers, policy makers and suicide survivors. These partners will be trained to identify people at risk of attempting suicide, provide timely intervention and quickly connect them with Community’s crisis providers. Indiana’s state government is a key partner in building the statewide crisis network.

I’m sure this effort is good for  the profits of Pharma. Dissemination of screening programs into settings that are not directly connected to quality depression care is inevitably ineffective. The main healthcare consequences are an increase in antidepressant prescriptions without appropriate diagnoses, patient education, and follow-up. Substantial overtreatment results from people being identified without proper diagnosis who otherwise would not be seeking treatment. Care for depression in the community is hardly Perfect Depression Care.

It is great publicity for Eli Lilly and the community receiving the gift will surely be grateful.

Launching Zero Suicides in English communities and elsewhere

My academic colleagues in the UK assure me that we can simply dismiss an official UK government press release from Nick Clegg about the goal of zero suicides. It has been rendered obsolete by subsequent political events. A number of them commented that they never took it seriously, regardless.

Nick Clegg calls for new ambition for zero suicides across the NHS

The claims in the press release stand in stark contrast to long waiting times for mental health services and important gaps in responses to serious mental health crises, including lethal suicide attempts. However, another web link is to an announcement:

Centre for Mental Health was commissioned by the East of England Strategic Clinical Networks to evaluate activity taking place in four local areas in the region through a pilot programme to extend suicide prevention into communities.

The ‘zero suicide’ initiative is based on an approach developed by Dr Ed Coffey in Detroit, Michigan. The approach aims to prevent suicides by creating a more open environment for people to talk about suicidal thoughts and enabling others to help them. It particularly aims to reach people who have not been reached through previous initiatives and to address gaps in existing provision.

Four local areas in the East of England (Bedfordshire, Cambridgeshire & Peterborough, Essex and Hertfordshire) were selected in 2013 as pathfinder sites to develop new approaches to suicide prevention. Centre for Mental Health evaluated the work of the sites during 2015.

The evaluation found an impressive range of activities that had taken suicide prevention activities out into local communities. They included:

• Training key public service staff such as GPs, police officers, teachers and housing officers
• Training others who may encounter someone at risk of taking their own life, such as pub landlords, coroners, private security staff, faith groups and gym workers
• Creating ‘community champions’ to put local people in control of activities
• Putting in place practical suicide prevention measures in ‘hot spots’ such as bridges and railways
• Working with local newspapers, radio and social media to raise awareness in the wider community
• Supporting safety planning for people at risk of suicide, involving families and carers throughout the process
• Linking with local crisis services to ensure people get speedy access to evidence-based treatments.

The report noted that some of the people who received the training had already saved lives:

“I saved a man’s life using the skills you taught us on the course. I cannot find words to properly express the gratitude I have for that. Without the training I would have been in bits. It was a very public place, packed with people – but, to onlookers, we just looked like two blokes sitting on a bench talking.”

“Déjà vu all over again”, as Yogi Berra would say. This effort also recalls Bill Murray in the movie Groundhog Day, where he is trapped into repeating the same day over and over again.

A few years ago I was a scientific advisor for a European Union-funded project to disseminate multilevel suicide prevention programs across Europe. One UK site was among those targeted in this report. Implementation of the EU program had already collapsed before the plate of snacks was removed from a poorly attended event. The effort failed quickly because it could not attract the support of local GPs.

Years later, I recognize many of the elements of what we tried to implement, described in language almost identical to ours. There is no mention of the training materials we left behind or of the quick failure of our attempt at implementation.

Many of the proposed measures in the UK plan serve to generate publicity, with no evidence that they reduce suicides. For instance, training people in the community who might conceivably come in contact with a suicidal person accomplishes little other than producing good publicity. Uptake of such training is abysmally low and is not likely to affect the probability that a person in a suicidal crisis will encounter anyone who can make a difference.

Broad efforts to increase uptake of mental health services in the UK strain a system already suffering from unacceptably long waiting times. People with any likelihood of attempting suicide, however poorly predicted, are likely to be lost among persons seeking services for less serious or pressing needs.

Thoughts I have accumulated from years of evaluating depression screening programs and suicide intervention efforts

Staying mobilized around preventing suicide is difficult because it is an infrequent event and most activations of resources will prove to be false positives.

It can be tedious and annoying for both staff and patients to keep focused on an infrequent event, particularly for the vast majority of patients who rightfully believe they are not at risk for suicide.
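A back-of-the-envelope positive-predictive-value calculation shows why. Every number below is an assumption of mine for illustration – an annual risk of 20 per 100,000 and a screening instrument far more accurate than any that actually exists – and the alarms still swamp the true cases:

```python
# Base-rate arithmetic: even an implausibly good screen for a rare event
# mostly raises false alarms. All figures are assumptions for illustration.
annual_risk = 20 / 100_000  # assumed incidence in the screened population
sensitivity = 0.80          # assumed: screen flags 80% of true cases
specificity = 0.90          # assumed: screen clears 90% of non-cases

true_alarms = annual_risk * sensitivity
false_alarms = (1 - annual_risk) * (1 - specificity)
ppv = true_alarms / (true_alarms + false_alarms)
print(f"PPV = {ppv:.2%}")   # ~0.16%: roughly 625 alarms for every true case
```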

Resources can be drained away from the less frequent but higher-risk situations that require sustained intensity of response, pragmatic innovation, and flexibility of rules.

Heightened efforts to detect mental health problems increase access for people already successfully accessing services and decrease resources for those needing special efforts. The net result can be an increase in disparities.

Suicide data are easily manipulated by ignoring selective loss to follow-up. Many suicides occur at breaks in the system, where getting follow-up data is also problematic.

Finally, death by suicide is a health outcome that is multiply determined. It does not lend itself to targeted public health approaches like eliminating polio, tempting though invoking the analogy may be.

Postscript

It is likely that I have exposed anyone reaching this postscript to a new and disconcerting perspective. What I have been saying is discrepant with the publicity about “zero suicides” available in the media. The portrayal of “zero suicides” is quite persuasive because it is sophisticated and well-crafted. Its dissemination is well resourced and often financed by individuals and institutions with barely discernible – if discernible at all – financial and political conflicts of interest. Just try to find any dissenters or skeptical assessments.

My takeaway message: it is best to process claims about suicide prevention with a high level of skepticism, an insistent demand for evidence, and a preparedness to discover that seemingly well-trusted sources are not without agendas. They are usually providing propaganda rather than evidence-based arguments.