Creating TED talks from peer-reviewed growth mindset research papers with colored brain pictures

The TED talk fallacy – When you confuse what presenters say about a peer-reviewed article – the breathtaking, ‘breakthrough’ strength of findings demanded for a TED talk – with what a transparent, straightforward analysis and reporting of relevant findings would reveal. 


A reminder that consumers, policymakers, and other stakeholders should not rely on TED talks for their views of what constitutes solid “science” or “best evidence,” even when presenters are established scientists.

The authors of this modest but overhyped paper do not give TED talks. But this article became the basis for a number of TED and TED-related talks by a psychologist who integrated a story of its findings with stories about her own publications. She has a booking agent for expensive talks and a line of self-help products. This raises the question: Should such information routinely be reported as a conflict of interest in publications?

We will contrast the message of the paper under discussion in this post, along with the TED talk, with a new pair of comprehensive meta-analyses. The meta-analyses show that the association between growth mindset and academic achievement is weak and that interventions to improve mindset are ineffectual.

The study

 Moser JS, Schroder HS, Heeter C, Moran TP, Lee YH. Mind your errors: Evidence for a neural mechanism linking growth mind-set to adaptive posterror adjustments. Psychological Science. 2011 Dec;22(12):1484-9.

 Key issues with the study.

The abstract is uninformative as a guide to what was done and what was found in this study. It ends with a rousing promotion of growth mind set as a way of understanding and improving academic achievement.

A study with N = 25 is grossly underpowered for most purposes and should not be used to generate estimates of associations.

Key details of methods and results needed for independent evaluation are not available in the article.

The colored brain graphics in the article were labeled “for illustrative purposes only.”

Where would you find such images of the brain, not tied to the data, in a credible neuroscience journal? Articles in such journals are increasingly being retracted after discovery of suspected pasted-in or altered brain graphics.

The discussion has a strong confirmation bias, ignoring relevant literature and overselling the use of event-related potentials for monitoring and evaluating the determinants of academic achievement.

The press release issued by the Association for Psychological Science.

How Your Brain Reacts To Mistakes Depends On Your Mindset

Concludes:

The research shows that these people are different on a fundamental level, Moser says. “This might help us understand why exactly the two types of individuals show different behaviors after mistakes.” People who think they can learn from their mistakes have brains that are tuned to pay more attention to mistakes, he says. This research could help in training people to believe that they can work harder and learn more, by showing how their brain is reacting to mistakes.

The abstract.

The abstract does not report basic details of methods and results, except what is consistent with the authors’ intended message. The crucial final sentence is quote-worthy and headed for clickbait. When we look at what was done and what was found in this study, this conclusion is grossly overstated.

How well people bounce back from mistakes depends on their beliefs about learning and intelligence. For individuals with a growth mind-set, who believe intelligence develops through effort, mistakes are seen as opportunities to learn and improve. For individuals with a fixed mind-set, who believe intelligence is a stable characteristic, mistakes indicate lack of ability. We examined performance-monitoring event-related potentials (ERPs) to probe the neural mechanisms underlying these different reactions to mistakes. Findings revealed that a growth mind-set was associated with enhancement of the error positivity component (Pe), which reflects awareness of and allocation of attention to mistakes. More growth-minded individuals also showed superior accuracy after mistakes compared with individuals endorsing a more fixed mind-set. It is critical to note that Pe amplitude mediated the relationship between mind-set and posterror accuracy. These results suggest that neural mechanisms indexing on-line awareness of and attention to mistakes are intimately involved in growth-minded individuals’ ability to rebound from mistakes.

The introduction.

The introduction opens with:

Decades of research by Dweck and her colleagues indicate that academic and occupational success depend not only on cognitive ability, but also on beliefs about learning and intelligence (e.g., Dweck, 2006).

This sentence echoes the Amazon blurb for the pop psychology book  that is being cited:

After decades of research, world-renowned Stanford University psychologist Carol S. Dweck, Ph.D., discovered a simple but groundbreaking idea: the power of mindset. In this brilliant book, she shows how success in school, work, sports, the arts, and almost every area of human endeavor can be dramatically influenced by how we think about our talents and abilities.

Nowhere in the introduction are there balancing references to studies investigating Carol Dweck’s theory independently, from outside her group, nor any citing of inconsistent findings. This is a selective, strongly confirmation-driven review of the relevant literature. (Contrast this view with an independent assessment from a recent comprehensive meta-analysis at the end of this post.)

The method.

Twenty-five native-English-speaking undergraduates (20 female, 5 male; mean age = 20.25 years) participated for course credit.

There is no discussion of why a sample of only 25 participants was chosen or any mention of a power analysis.

If we stick to simple bivariate correlations with the full sample of N = 25:

r = .40, p < .05 (p = .0475)

r = .51, p < .01 (p = .0092)

N = 25 does not allow reliable detection of a small to moderate sized,  statistically significant relationship where one exists.

Any significant findings will of necessity be large, r >.40 for p<.05 and  r> .51 for p<.01.
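For readers who want to check this arithmetic, here is a minimal sketch (assuming Python with SciPy; not part of the original paper) that converts the critical t value for N − 2 degrees of freedom into the smallest correlation that reaches two-tailed significance:

```python
from scipy import stats

def critical_r(n, alpha):
    """Smallest |r| reaching two-tailed significance at `alpha` with n paired observations."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # critical t for the two-tailed test
    return t_crit / (t_crit**2 + df) ** 0.5   # invert t = r * sqrt(df) / sqrt(1 - r^2)

for alpha in (0.05, 0.01):
    print(f"N = 25, alpha = {alpha}: minimum significant r = {critical_r(25, alpha):.2f}")
# Prints roughly .40 and .51, the thresholds quoted above.
```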

As has been noted elsewhere:

In systematic studies of psychological and biomedical effect sizes (e.g., Meyer et al., 2001)  one rarely encounters correlations greater than .4.

How growth mindset scores were calculated is crucially important, but the information that is presented about the measure is inadequate. There is no reference to an established scale with psychometric data and cross validation. Rather:

Following the flanker task [a noise-letter version of the Eriksen flanker task (Eriksen & Eriksen, 1974)], participants completed a TOI scale that asked respondents to rate the extent to which they agreed with four fixed-mind-set statements on a 6-point Likert-type scale (1 = strongly disagree, 6 = strongly agree). These statements (e.g., “You have a certain amount of intelligence and you really cannot do much to change it”) were drawn from previous studies measuring TOI (e.g., Hong, Chiu, Dweck, Lin, & Wan, 1999). TOI items were reverse-scored so that higher scores indicated more endorsement of a growth mind-set, and lower scores indicated more of a fixed mind-set.

Details in the referenced Hong et al (1999) study are difficult to follow, but the paper lays out the following requirement:

Those participants who believe that intelligence is fixed (entity theorists) should consistently endorse responses at the lower (agree) end of the scale (yielding a mean score of 3.0 or lower), whereas participants who believe that intelligence is malleable (incremental theorists) should consistently endorse responses at the upper (disagree) end of the scale (yielding a mean score of 4.0 or above).

If this distribution occurred naturally, it would be an extraordinary set of questions. In the Hong et al (1999) study, this distribution was achieved by throwing away data in the middle of the distribution that didn’t fit the investigators’ preconceived notion.

Excluding the middle third of a distribution of scores with only N = 25 compounds the errors that the practice would introduce even with a larger sample. With the small number of scores reduced to N = 17, the influence of a single outlier participant is increased. Any generalization to the larger population becomes even more problematic. We cannot readily evaluate whether scores in the present sample were neatly and naturally bimodal. We are not provided the basic data, not even the means and standard deviations, in text or table. However, as we will see, one graphic representation leaves some doubts.
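A small simulation makes the point. This is only a sketch under an assumed true correlation of .20; it is not a reanalysis of the study’s data, which are not available:

```python
import numpy as np

rng = np.random.default_rng(0)
true_r, n, sims = 0.20, 25, 5000          # assumed modest true correlation, sample of 25
full_r, extreme_r = [], []
for _ in range(sims):
    x = rng.standard_normal(n)            # stand-in for TOI scores
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.standard_normal(n)
    full_r.append(np.corrcoef(x, y)[0, 1])
    order = np.argsort(x)
    keep = np.r_[order[:9], order[-8:]]   # drop the middle third of scores, leaving N = 17
    extreme_r.append(np.corrcoef(x[keep], y[keep])[0, 1])
print(f"mean r, full sample of 25:      {np.mean(full_r):.2f}")
print(f"mean r, middle third discarded: {np.mean(extreme_r):.2f}")
# The extreme-group correlation runs systematically higher than the full-range
# correlation, and individual estimates swing widely with so few cases.
```

Discarding the middle of the distribution stretches the variance of the predictor, so the apparent correlation is inflated relative to what would be seen across the full range of scores.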

Overview of data analyses.

Repeated measures analyses of variance (ANOVAs) were first conducted on behavioral and ERP measures without regard to individual differences in TOIs in order to establish baseline experimental effects. ANOVAs conducted on behavioral measures and the ERN included one 2-level factor: accuracy (error vs. correct response). The Pe [error positivity component] was analyzed using a 2 (accuracy: error vs. correct response) × 2 (time window: 150–350 ms vs. 350–550 ms) ANOVA. Subsequently, TOI scores were entered into ANOVAs as covariates to assess the main and interactive effects of mind-set on behavioral and ERP measures. When significant effects of TOI score were detected, we conducted follow-up correlational analyses to aid in the interpretation of results.

Thus, multiple post hoc analyses examined effects of growth mindset (TOI), contingent on whether significant main or interaction effects were obtained in other analyses, and these were in turn followed up with correlational analyses.

Highlights of the results.

Only a few of the numerous analyses produced significant results for TOI. Given the sample size and multiple tests without correction, we probably should not attach substantive interpretations to them.
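A back-of-the-envelope calculation illustrates the multiplicity problem. The exact number of TOI tests is not reported, so the figure of eight below is an assumption for illustration only:

```python
# Family-wise risk of at least one false positive when k independent tests
# are each run at alpha = .05 with no correction. k = 8 is an assumed count
# of the TOI covariate tests across the behavioral and ERP analyses.
alpha, k = 0.05, 8
fwer = 1 - (1 - alpha) ** k
print(f"Chance of at least one spurious 'significant' TOI effect: {fwer:.0%}")  # about 34%
```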

Behavioral data.

Overall accuracy was not correlated with TOI (r = .06, p > .79).

[Speed on error vs. correct trials] When TOI was entered into the ANOVA as a covariate, there were no significant effects (Fs < 1.78, ps > .19, ηp²s < .08) [where ‘ps’ and ‘no significant effects’ refer to either main or interaction effects].

[Posterror adjustments] When TOI was entered into the ANOVA as a covariate, there were no significant effects (Fs < 1.15, ps > .29, ηp²s < .05).

When entered into the ANOVA as a covariate, however, TOI scores interacted with postresponse accuracy, F(1, 23) = 5.22, p < .05, ηp² = .19. Correlational analysis showed that as TOI scores increased, indicating a growth mind-set, so did accuracy on trials immediately following errors relative to accuracy on trials immediately following correct responses (i.e., posterror accuracy – postcorrect-response accuracy; r = .43, p < .05).

ERPs (event-related potentials).

As expected, the ANOVA confirmed greater ERP negativity on error trials (M = –3.43 μV, SD = 4.76 μV) relative to correct trials (M = –0.23 μV, SD = 4.20 μV), F(1, 24) = 24.05, p < .001, ηp² = .50, in the 0- to 100-ms postresponse time window. This result is consistent with the presence of an ERN. There were no significant effects involving TOI (Fs < 1.24, ps > .27, ηp²s < .06).

When entered as a covariate, TOI showed a significant interaction with accuracy, F(1, 23) = 8.64, p < .01, ηp² = .27. Correlational analysis demonstrated that as TOI scores increased so did positivity on error trials relative to correct trials averaged across both time windows (i.e., error activity – correct-response activity; r = .52, p < .01).

Mediation analysis.

As Figure 2 illustrates, controlling for Pe amplitude significantly attenuated the relationship between TOI scores and posterror accuracy. The 95% confidence intervals derived from the bootstrapping test did not include zero (.01–.04), and thus indicated significant mediation.

So, the a priori condition for testing for significant mediation was met because a statistical test barely excluded zero (.01–.04), with no correction for the many tests of TOI in the study. But what are we doing exploring mediation with N = 25?
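For readers unfamiliar with the procedure being criticized, here is a minimal sketch of a percentile-bootstrap test of an indirect effect. It uses hypothetical arrays standing in for TOI, Pe amplitude, and posterror accuracy; it is not the authors’ code:

```python
import numpy as np

def bootstrap_indirect(x, m, y, n_boot=5000, seed=0):
    """Percentile bootstrap CI for the indirect effect a*b in a simple x -> m -> y mediation model."""
    rng = np.random.default_rng(seed)
    n, est = len(x), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # resample participants with replacement
        xb, mb, yb = x[idx], m[idx], y[idx]
        a = np.polyfit(xb, mb, 1)[0]             # path a: x -> m
        b = np.linalg.lstsq(np.c_[mb, xb, np.ones(n)], yb, rcond=None)[0][0]  # path b: m -> y, controlling x
        est.append(a * b)
    return np.percentile(est, [2.5, 97.5])       # 95% percentile CI for the indirect effect
```

With only 25 participants to resample, every bootstrap draw leans heavily on the same few influential cases, which is exactly why a confidence interval that barely excludes zero should not inspire confidence.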

Distribution of TOI [growth mindset] scores.

Let’s look at the distribution of TOI scores in a graph available as the x-axis in Figure 1.

[Image: graph of the distribution of TOI scores, with an outlier]

Any dichotomization of these continuous scores would be arbitrary. Close scores clustered around different sides of the median would  be considered  different, but  diverging  scores on the same side of the median  would be treated as the same.  Any association between TOI and ERPs (event-related potentials) could be due to one or a few interindividual differences in brains or intraindividual variability of ERP over occasions. These are not the kind of data from which generalizable estimates of effects can be obtained.

The depiction of brains with fixed versus growth mind sets.

The one picture of brains in the main body of this article supposedly contrasts fixed versus growth mindsets. The differences appear dramatic, in sharply contrasting colors. But in the article itself, no such dichotomization is discussed. Nor should it be. Furthermore, the simulation is based on isolating one of the few significant effects of TOI. Readers are cautioned that the picture is “for illustrative purposes only.”

[Image: fixed vs. growth mind-set brain graphic from the article]

The discussion.

Similar to the introduction, there is selective citation of the literature with a strong confirmation bias. There is no reference to weak or null findings or to any controversy concerning growth mindset that might have accumulated over a decade of research. There is no acknowledgment of the folly of making substantive interpretations of significant findings from such a small, underpowered study. Results of the mediation analysis are confidently presented, with no indication of doubt about whether it should even have been conducted, or acknowledgment that, even under the best of circumstances, such mediational analyses remain correlational and provide only weak evidence of causal mechanisms. Event-related potentials are proposed as biomarkers and as surrogate outcomes in implementations of growth mindset interventions. A lot of misunderstanding and neurononsense is crammed into a few sentences. There is no mention of any limitations of the study.

The APS Observer press release revisited.

Why was this article recognized with a special press release by the APS? The press release is tied much more to the authors’ claims about their study than to their actual methods and results. The press release provides an opportunity to publicize the study with further exaggeration of what it accomplished.

This is an unfortunate message to authors about what they need to do to be promoted by APS. Your intended message can override your actual results if you strategically emphasize the message and downplay any discrepancy with the results. Don’t mention any limitations of your study.

The TED talks.

A number of TED and TED-related talks incorporate a discussion of the study, with its picture of fixed versus growth mindset brains. There is remarkable overlap among these talks. I have chosen the TEDxNorrkoping talk, “The power of believing that you can improve,” because it had a handy transcript available.

[Image: the same fixed vs. growth mind-set brain graphic, as shown in the TED talk]

On the left, you see the fixed-mindset students. There’s hardly any activity. They run from the error. They don’t engage with it. But on the right, you have the students with the growth mindset, the idea that abilities can be developed. They engage deeply. Their brain is on fire with yet. They engage deeply. They process the error. They learn from it and they correct it.

“On fire”? The presenter exploits the arbitrary red color chosen for the for-illustrative-purposes-only picture.

The brain graphic is reduced to a cartoon in a comic-book-level account of action heroes engaging their errors deeply, learning from them, and correcting their next response, while ordinary mortals run away like cowards.

The presenter soon introduces another cartoon for her comic book depiction of the effects of growth mindset on the brain. But first, here is an overview of how this talk fits the predictable structure of a TED talk.

The TED talk begins with a personal testimony concerning “a critical event early in my career, a real turning point.” It is recognizable to TED talk devotees as an epiphany (an “epiphimony,” if you like) through which the speaker shares a personal journey of insight and realisation, its triumphs and tribulations. In telling the story, the presenter introduces an epic struggle between the children of the darkness (the “now” of a fixed mindset) versus the children of the light (the “yet” or “not yet” of a growth mindset).

There is much more of a sense of a televangelist than of an academic presenting an accurate summary of her research to a lay audience. Sure, the live audience and the millions of viewers of this and related talks were not seeking a colloquium or even a Cafe Scientifique. The audience came to be entertained with a good story. But how much license can be taken with the background science? After all, the information being discussed is relevant to their personal decisions as parents and as citizens and communities making important choices about how to improve academic performance. The issue becomes more serious when the presenter gets to claims of dramatic transformations of impoverished students in economically deprived school settings.

The presenter cites one of her studies for an account of what students “gripped with the tyranny of now” did in difficult learning experiences:

So what do they do next? I’ll tell you what they do next. In one study, they told us they would probably cheat the next time instead of studying more if they failed a test. In another study, after a failure, they looked for someone who did worse than they did so they could feel really good about themselves.

[Image: “cheat vs. study” graphic from the TED talk]

We are encouraged to think ‘Students with a fixed mind set cheat instead of studying more. How horrible!’ But I looked up the study:

Blackwell LS, Trzesniewski KH, Dweck CS. Implicit Theories of Intelligence Predict Achievement Across an Adolescent Transition: A Longitudinal Study and an Intervention. Child Development. 2007 Jan 1;78(1):246-63.

I searched for “cheat” and found one mention:

Students rated how likely they would be to engage in positive, effort-based strategies (e.g., ‘‘I would work harder in this class from now on’’ ‘‘I would spend more time studying for tests’’) or negative, effort-avoidant strategies (e.g., ‘‘I would try not to take this subject ever again’’ ‘‘I would spend less time on this subject from now on’’ ‘‘I would try to cheat on the next test’’). Positive and negative items were combined to form a mean Positive Strategies score.

All subsequent reporting of results was in terms of this composite Positive Strategies score. So, I was unable to evaluate how commonly “I would try to cheat…” was endorsed.

Three minutes into the talk, the speaker introduces an element of moral panic about a threat to Western civilization as we know it:

How are we raising our children? Are we raising them for now instead of yet? Are we raising kids who are obsessed with getting As? Are we raising kids who don’t know how to dream big dreams? Their biggest goal is getting the next A, or the next test score? And are they carrying this need for constant validation with them into their future lives? Maybe, because employers are coming to me and saying, “We have already raised a generation of young workers who can’t get through the day without an award.”

Less than a minute later, the presenter gets ready to roll out her solution.

So what can we do? How can we build that bridge to yet?

Praising performance in terms of fixed characteristics like IQ or ability is ridiculed. However, great promises are made for praising process, regardless of outcome.

Here are some things we can do. First of all, we can praise wisely, not praising intelligence or talent. That has failed. Don’t do that anymore. But praising the process that kids engage in, their effort, their strategies, their focus, their perseverance, their improvement. This process praise creates kids who are hardy and resilient.

“Yet” or “not yet” becomes a magical incantation. The presenter builds on her comic book science of the effects of growth mindset by introducing a cartoon of a synapse (mislabeled as a neuron), linked to her own research only by some wild speculation.

[Image: cartoon synapse captioned “build stronger connections”]

Just the words “yet” or “not yet,” we’re finding, give kids greater confidence, give them a path into the future that creates greater persistence. And we can actually change students’ mindsets. In one study, we taught them that every time they push out of their comfort zone to learn something new and difficult, the neurons in their brain can form new, stronger connections, and over time, they can get smarter.

I found no relevant measurements of brain activity in Dweck’s studies, but let’s not ruin a good story.

Look what happened: In this study, students who were not taught this growth mindset continued to show declining grades over this difficult school transition, but those who were taught this lesson showed a sharp rebound in their grades. We have shown this now, this kind of improvement, with thousands and thousands of kids, especially struggling students.

Up until now, we have had disappointingly hyped and inaccurate accounts of how to foster academic achievement. But the story turns into a cruel hoax when claims are made about improving the performance of underprivileged children in under-resourced settings.

So let’s talk about equality. In our country, there are groups of students who chronically underperform, for example, children in inner cities, or children on Native American reservations. And they’ve done so poorly for so long that many people think it’s inevitable. But when educators create growth mindset classrooms steeped in yet, equality happens. And here are just a few examples. In one year, a kindergarten class in Harlem, New York scored in the 95th percentile on the national achievement test. Many of those kids could not hold a pencil when they arrived at school. In one year, fourth-grade students in the South Bronx, way behind, became the number one fourth-grade class in the state of New York on the state math test. In a year, to a year and a half, Native American students in a school on a reservation went from the bottom of their district to the top, and that district included affluent sections of Seattle. So the Native kids outdid the Microsoft kids.

This happened because the meaning of effort and difficulty were transformed. Before, effort and difficulty made them feel dumb, made them feel like giving up, but now, effort and difficulty, that’s when their neurons are making new connections, stronger connections. That’s when they’re getting smarter.

“So the Native kids outdid the Microsoft kids.” There is some kind of poetic license being taken here in describing the results of an intervention. The message is that subjective mindset can trump entrenched structural inequalities and accumulated deficits in skills and knowledge, as well as limits on ability. All school staff and parents need to do is wave the magic wand and recite the incantation “Not yet.” How reassuring to those in politics who control resources but don’t want to adequately fund these school settings. They just need to exhort anyone who wants to improve outcomes to recite the magic.

And what do we say when we don’t witness dramatic improvements? Who is to blame when such failures need to be explained? The cruel irony is that school boards will blame principals, who will blame teachers, and parents will blame schools and their children. All will be held to unrealistic expectations.

But it gets worse. The presenter ends with a call to action, arguing that not buying into her program would violate the human rights of vulnerable children.

Let’s not waste any more lives, because once we know that abilities are capable of such growth, it becomes a basic human right for children, all children, to live in places that create that growth, to live in places filled with “yet”.

Paradox: Do poor kids with a growth mindset suffer negative consequences?

Maybe so, suggests some recent research concerning the longer term outcomes of disadvantaged African American children.

A newly published study in the peer-reviewed journal Child Development …finds traditionally marginalized youth who grew up believing in the American ideal that hard work and perseverance naturally lead to success show a decline in self-esteem and an increase in risky behaviors during their middle-school years. The research is considered the first evidence linking preteens’ emotional and behavioral outcomes to their belief in meritocracy, the widely held assertion that individual merit is always rewarded.

“If you’re in an advantaged position in society, believing the system is fair and that everyone could just get ahead if they just tried hard enough doesn’t create any conflict for you … [you] can feel good about how [you] made it,” said Erin Godfrey, the study’s lead author and an assistant professor of applied psychology at New York University’s Steinhardt School. But for those marginalized by the system—economically, racially, and ethnically—believing the system is fair puts them in conflict with themselves and can have negative consequences.

We know surprisingly little about the adverse events associated with growth mindset interventions or their negative unintended consequences for children and school systems. Cost/benefit analyses of mindset interventions should compare them against academic interventions known to be effective when delivered with equivalent resources, not against no treatment.

Overall associations of growth mind set with academic achievement are weak and interventions are not effective.

Sisk VF, Burgoyne AP, Sun J, Butler JL, Macnamara BN. To What Extent and Under Which Circumstances Are Growth Mind-Sets Important to Academic Achievement? Two Meta-Analyses. Psychological Science. 2018 Mar 1:0956797617739704.

This newly published article in Psychological Science starts by noting the influence of growth mind-set.

These ideas have led to the establishment of nonprofit organizations (e.g., Project for Education Research that Scales [PERTS]), for-profit entities (e.g., Mindset Works, Inc.), schools purchasing mind-set intervention programs (e.g., Brainology), and millions of dollars in funding to individual researchers, nonprofit organizations, and for-profit companies (e.g., Bill and Melinda Gates Foundation,1 Department of Education,2 Institute of Educational Sciences3).

In our first meta-analysis (k = 273, N = 365,915), we examined the strength of the relationship between mind-set and academic achievement and potential moderating factors. In our second meta-analysis (k = 43, N = 57,155), we examined the effectiveness of mind-set interventions on academic achievement and potential moderating factors. Overall effects were weak for both meta-analyses.

The first meta-analysis integrated 273 effect sizes. The overall effect was very weak by conventional standards, hardly consistent with the TED talks.

The meta-analytic average correlation (i.e., the average of various population effects) between growth mind-set and academic achievement is r̄ = .10, 95% confidence interval (CI) = [.08, .13], p < .001.

The second meta-analysis, of the effects of growth mindset interventions, integrated 43 effect sizes; 37 of the 43 effect sizes (86%) were not significantly different from zero.

The authors conclude:

Some researchers have claimed that mind-set interventions can “lead to large gains in student achievement” and have “striking effects on educational achievement” (Yeager & Walton, 2011, pp. 267 and 268, respectively). Overall, our results do not support these claims. Mind-set interventions on academic achievement were nonsignificant for adolescents, typical students, and students facing situational challenges (transitioning to a new school, experiencing stereotype threat). However, our results support claims that academically high-risk students and economically disadvantaged students may benefit from growth-mind-set interventions (see Paunesku et al., 2015; Raizada & Kishiyama, 2010), although these results should be interpreted with caution because (a) few effect sizes contributed to these results, (b) high-risk students did not differ significantly from non-high-risk students, and (c) relatively small sample sizes contributed to the low-SES group.

Part of the reshaping effort has been to make funding mind-set research a “national education priority” (Rattan et al., 2015, p. 723) because mind-sets have “profound effects” on school achievement (Dweck, 2008, para. 2). Our meta-analyses do not support this claim.

And

From a practical perspective, resources might be better allocated elsewhere than mind-set interventions. Across a range of treatment types, Hattie, Biggs, and Purdie (1996) [https://www.teachertoolkit.co.uk/wp-content/uploads/2014/04/effect-of-learning-skills.pdf ] found that the meta-analytic average effect size for a typical educational intervention on academic performance is 0.57. All meta-analytic effects of mind-set interventions on academic performance were < 0.35, and most were null. The evidence suggests that the “mindset revolution” might not be the best avenue to reshape our education system.

The presenter’s speaker fees.

Presenters of TED talks are not paid, but a successful talk can lead to lucrative speaking engagements. It is informative to Google the speaking fees of the presenters of highly accessed TED talks. In the case of Carol Dweck, I found the booking agency, All American Speakers.

[Images: All American Speakers listing for Carol Dweck, including her fee range]

Mindsetonline provides products for sale as well as success stories about people and organizations adopting a growth mindset.

[Images: Mindsetonline product pages – buy the book, buy the software, business and leadership]

There is even a 4-item measure of mindset you can complete online. Each of the items is some paraphrase of ‘you can’t change your intelligence very much,’ either stated straightforwardly or reversed as ‘you can.’

Consumers beware! TED talks are not a reliable means of disseminating best evidence.

TED talks are to best evidence as historical fiction is to history.

Even TED talks by eminent psychologists are often little more than infomercials for self-help products, lucrative speaking engagements, and workshops.

Academics are under increasing pressure to demonstrate that there is more to the impact of their work than citations of publications in prestigious journals. Social impact is being used to balance journal impact factors.

It is also being recognized that outreach involves equipping lay audiences to grasp initially difficult or confusing concepts.

But pictures of colored brains can be used to dumb down consumers and to disarm their intuitive skepticism about behavioral science working magic and miracles. Even PhD psychologists are inclined to be overly impressed when references to neuroscience and pictures of colored brains are introduced into the discussion. The vulnerability of lay audiences to neurononsense or neurobollocks is even greater.

False and exaggerated claims about academic interventions harm school systems, teachers, and ultimately, students. In communicating with lay audiences, psychologists need to be sensitive to the misunderstandings they may be reinforcing. They have an ethical responsibility to do their best to strengthen the critical thinking skills of their audiences, not damage them.

TED talks and declarations of potential conflicts of interest.

Personally, I have found that calling out the pseudoscience behind claims for unproven medicine like acupuncture or homeopathy does not produce much blowback, except from proponents of these treatments. Similarly, campaigning for better disclosure of potential conflicts of interest does not meet much resistance when the focus is on pharmaceutical companies.

However, it’s a whole different matter to call out the pseudoscience behind self-help and the exaggerated and outright false claims about behavioral science being able to work miracles and magic. There seems to be a double standard in psychology by which it is inappropriate to exaggerate the strength of findings when communicating with other professionals, but, in communicating with lay audiences, it’s perfectly okay.

We need to think about TED talks more like we think about talks by opinion leaders with ties to the pharmaceutical industry. Presenters should start with a standard slide disclosing financial interests that may influence opinions offered about specific products mentioned in the talk. Given the pressure to get findings that will fit into the next TED talk, presenters should routinely disclose in their peer-reviewed articles that they give TED talks or have a booking agent.

 

Science Media Centre concedes negative reaction from scientific community to coverage of Esther Crawley’s SMILE trial.

“It was the criticism from within the scientific community that we had not anticipated.”


Editorial from the Science Media Centre, September 28, 2017:

Inconvenient truths

http://www.sciencemediacentre.org/inconvenient-truths/

 

“It was the criticism from within the scientific community that we had not anticipated.”

“This time the SMC also came under fire from our friends in science…Quack buster extraordinaire David Colquhoun tweeted, ‘More reasons to be concerned about @SMC_London?’

Other friends wrote to us expressing concern about the unintended consequences of SMC briefings – with one saying that policy makers were furious at having to deal with the fallout from our climate briefing and others worried that the briefing on the CFS/ME trial would allow the only private company offering the treatment to profit by over-egging preliminary findings.

Those of us who are accustomed to the Science Media Centre UK’s (SMC) highly slanted coverage of select topics can detect a familiar defensive, yet self-congratulatory tone in an editorial put out by the SMC in reaction to its broad coverage of Esther Crawley’s SMILE trial of the quack treatment, Phil Parker’s Lightning Process. Once again, critics, both patients and professionals, of ineffectual treatments being offered for chronic fatigue syndrome/myalgic encephalomyelitis are lumped with climate change deniers. Ho-hum, this comparison is getting so clichéd.

Perhaps even better, the SMC editorial’s concession of poor coverage of the SMILE trial drew sharp amplifications from commentators that the SMC had botched the job.

Here are some comments below, with emphases added. But let’s not be lulled by SMC into assuming that these intelligent, highly articulate comments necessarily come from the professional community. I wouldn’t be surprised if hiding behind the pseudonyms are some of the excellent citizen scientists that the patient community has had to grow in the face of vilification and stigmatization led by SMC.

I actually think I recognize a spokesperson from the patient community writing under the pseudonym ‘Scary vocal critic.’

Scary vocal critic says:

September 29, 2017 at 5:59 am

The way that this blog glosses over important details in order to promote a simplistic narrative is just another illustration of why so many are concerned by Fiona Fox’s work, and the impact [of] the Science Media Centre.

Let’ s look in a bit more detail at the SMILE trial, from Esther Crawley at Bristol University. This trial was intended to assess the efficacy of Phil Parker’s Lightning Process©. Phil Parker has a history of outlandish medical claims about his ability to heal others, selling training in “the use of divination medicine cards and tarot as a way of making predictions” and providing a biography which claimed: “Phil Parker is already known to many as an inspirational teacher, therapist, healer and author. His personal healing journey began when, whilst working with his patients as an osteopath. He discovered that their bodies would suddenly tell him important bits of information about them and their past, which to his surprise turned out to be factually correct! He further developed this ability to step into other people’s bodies over the years to assist them in their healing with amazing results. After working as a healer for 20 years, Phil Parker has developed a powerful and magical program to help you unlock your natural healing abilities. If you feel drawn to these courses then you are probably ready to join.” https://web.archive.org/web/20070615014926/http://www.healinghawk.com/prospectushealing.htm

While much of the teaching materials for the Lightning Process are not available for public scrutiny (LP being copyrighted and controlled by Phil Parker), it sells itself as being founded on neurolinguistic programming and osteopathy, which are themselves forms of quackery. Those who have been on the course have described a combination of strange rituals, intensive positive affirmations, and pseudoscientific neuro-babble; all adding up to promote the view that an individual’s ill-health can be controlled if only they are sufficiently committed to the Lightning Programme. Bristol University appears to have embraced the neurobabble, and in their press release about the SMILE results they describe LP thus: “It is a three-day training programme run by registered practitioners and designed to teach individuals a new set of techniques for improving life and health, through consciously switching on health promoting neurological pathways.”

https://www.bristol.ac.uk/news/2017/september/lightning-process.html

Unsurprisingly, many patients have complained about paying for LP and receiving manipulative quackery. This can have unpredictable consequences. This article reports a child attempting to kill themselves after going on the Lightning Process:  Before conducting a trial, the researchers involved had a responsibility to examine the course and training materials and remove all pseudo-science, yet this was not done. Instead, those patient groups raising concerns about the trial were smeared, and presented as being opposed to science.

The SMILE trial was always an unethical use of research funding, but if it had followed its original protocol, it would have been less likely to generate misleading results and headlines. The Skeptics Dictionary’s page on the Lightning Process features a contribution which explains that: “the Lightning Process RCT being carried out by Esther Crawley changed its primary outcome measure from school attendance to scores on a self-report questionnaire. Given that LP involves making claims to patients about their own ability to control symptoms in exactly the sort of way likely to lead to response bias, it seems very likely that this trial will now find LP to be ‘effective’. One of the problems with EBM is that it is often difficult to reliably measure the outcomes that are important to patients and account for the biases that occur in non-blinded trials, allowing for exaggerated claims of efficacy to be made to patients.”

The SMILE trial was a nonblinded, A vs A+B design, testing a ‘treatment’ which included positive affirmations, and then used subjective self-report questionnaires as a primary outcome. This is not a sensible way of conducting a trial, as anyone who has looked at how junk-science can be used to promote quackery will be aware.

You can see the original protocol for the SMILE trial here (although this protocol refers to merely a feasibility study, this is the same research, with the same ethical review code, the feasibility study having seemingly been converted to a full trial a year into the research):

The protocol states that: “The primary outcome measure for the interventions will be school attendance/home tuition at 6 months.” It is worth noting that the new SMILE paper reported that there was no significant difference between groups for what was the trial’s primary outcome. There was a significant difference at 12 months, but by this point data on school attendance was missing for one third of the participants of the LP arm. The SMC failed to inform journalists of this outcome switching, instead presenting Prof Crawley as a critic converted by a rigorous examination of the evidence, despite her having told the ethics review board in 2010 that “she has worked before with the Bath [LP] practitioner who is good”. https://meagenda.wordpress.com/2011/01/06/letter-issued-by-nres-following-scrutiny-of-complaints-in-relation-to-smile-lighting-process-pilot-study/

Also, while the original protocol, and a later analysis plan, refer to verifying self-reported school attendance with school records, I could see no mention of this in the final paper, so it may be that even this more objective outcome measure has been rendered less useful and more prone to problems with response bias.

Back to Fiona Fox’s blog: “If you had only read the headlines for the CFS/ME story you may conclude that the treatment tested at Bristol might be worth a try if you are blighted by the illness, when in truth the author said repeatedly that the findings would first have to be replicated in a bigger trial.”

How terrible of sloppy headline writers to misrepresent research findings. This is from the abstract of Esther Crawley’s paper: “Conclusion The LP is effective and is probably cost-effective when provided in addition to SMC for mild/moderately affected adolescents with CFS/ME.” http://adc.bmj.com/content/early/2017/09/20/archdischild-2017-313375

Fox complains of “vocal critics of research” in the CFS and climate change fields. There has been a prolonged campaign from the SMC to smear those patients and academics who have been pointing out the problems with poor-quality UK research into CFS, attempting to lump them with climate change deniers, anti-vaccinationists and animal rights extremists. The SMC used this campaign as an example of when they had “engineered the coverage” by “seizing the agenda”:

http://www.sciencemediacentre.org/wp-content/uploads/2013/03/Review-of-the-first-three-years-of-the-mental-health-research-function-at-the-Science-Media-Centre.pdf

Despite dramatic claims of a fearsome group of dangerous extremists (“It’s safer to insult the Prophet Mohammed than to contradict the armed wing of the ME brigade”), a Freedom of Information request helped us gain some valuable information about exactly what behaviour most concerned victimised researchers such as Esther Crawley:

“Minutes from a 2013 meeting held at the Science Media Centre, an organisation that played an important role in promoting misleading claims about the PACE trial to the UK media, show these CFS researchers deciding that “harassment is most damaging in the form of vexatious FOIs [Freedom of Information requests]”.[13,16, 27-31] The other two examples of harassment provided were “complaints” and “House of Lords debates”.[13] It is questionable whether such acts should be considered forms of harassment.

http://www.centreforwelfarereform.org/news/major-breaktn-pace-trial/00296.html

[A full copy of the minutes is included at the above address.]

Since then, a seriously ill patient managed to win a legal battle against researchers attempting to release key trial data, picking apart the prejudices that were promoted and left the Judge to state that “assessment of activist behaviour was, in our view, grossly exaggerated and the only actual evidence was that an individual at a seminar had heckled Professor Chalder.” http://www.informationtribunal.gov.uk/DBFiles/Decision/i1854/Queen%20Mary%20University%20of%20London%20EA-2015-0269%20(12-8-16).PDF

So why would there be an attempt to present request for information, complaints, and mere debate, as forms of harassment? Rather embarrassingly for Fiona and the SMC, it has since become clear. Following the release of (still only some of) the data from the £5 million PACE trial it is now increasingly recognised within the academic community that patients were right to be concerned about the quality of these researchers’ work, and the way in which people had been misled about the trial’s rsults. The New York Times reported on calls for the retraction of a key PACE paper (Robin Murray, the journal’s editor and a close friend of Simon Wessely’s, does not seem keen to discuss and debate the problems with this work): https://www.nytimes.com/2017/03/18/opinion/sunday/getting-it-wrong-on-chronic-fatigue-syndrome.html The Journal of Health Psychology has published as special issue devoted to the PACE trial debacle: http://journals.sagepub.com/doi/full/10.1177/1359105317722370 The CDC has dropped promotion of CBT and GET: https://www.statnews.com/2017/09/25/chronic-fatigue-syndrome-cdc/ And NICE has decided to a full review of its guidelines for CFS is necessary, citing concerns about research such as PACE as one of the key reasons for this: https://www.nice.org.uk/guidance/cg53/resources/surveillance-report-2017-chronic-fatigue-syndromemyalgic-encephalomyelitis-or-encephalopathy-diagnosis-and-management-2007-nice-guideline-cg53-4602203537/chapter/how-we-made-the-decision https://www.thetimes.co.uk/edition/news/mutiny-by-me-sufferers-forces-a-climbdown-on-exercise-treatment-npj0spq0w

The SMC’s response to this has not been impressive.

Fox writes: “Both briefings fitted the usual mould: top quality scientists explaining their work to smart science journalists and making technical and complex studies accessible to readers.”

I’d be interested to know how it was Fox decided that Crawley was a top quality scientist. Also, it is worrying that the culture of UK science journalism seems to assume that making technical and complex studies (like SMILE?!) accessible for readers is their highest goal. It is not a surprise that it is foreign journalists who have produced more careful and accurate coverage of the PACE trial scandal.

Unlike the SMC and some CFS researchers, I do not consider complaints or debate to be a form of harassment, and would be quite happy to respond to anyone who disagrees with the concerns I have laid out here. I have had to simplify things, but believe that I have not done so in a way which favours my case. It seems that there are few people willing to try to publicly defend the PACE trial anymore, and I have never seen anyone from the SMC attempt to respond to anything other than a straw-man representation of their critics. Lets see what response these inconvenient truths receive.


Michael Emmans-Dean says:

October 2, 2017 at 8:22 am

The only point I would add to this excellent post is to ask why on earth the SMC decided to feature such a small, poorly-designed trial as SMILE. The most likely explanation is that it was intended as a smokescreen for an inconvenient truth. NICE’s retrieval of their CFS guideline from the long grass (the “static list”) is a far bigger story and it was announced in the same week that SMILE was published.


Fiona Roberts says:

September 29, 2017 at 9:03 am

Hear hear!

Creating illusions of wondrous effects of yoga and meditation on health: A skeptic exposes tricks

The tour of the sausage factory is starting; here’s your brochure telling you what you’ll see.

 

A recent review has received a lot of attention and is being used to support claims that mind-body interventions have distinct molecular signatures that point to potentially dramatic health benefits for those who take up these practices.

What Is the Molecular Signature of Mind–Body Interventions? A Systematic Review of Gene Expression Changes Induced by Meditation and Related Practices.  Frontiers in Immunology. 2017;8.

Few who are tweeting about this review or its press coverage are likely to have read it or to understand it, if they read it. Most of the new agey coverage in social media does nothing more than echo or amplify the message of the review’s press release.  Lazy journalists and bloggers can simply pass on direct quotes from the lead author or even just the press release’s title, ‘Meditation and yoga can ‘reverse’ DNA reactions which cause stress, new study suggests’:

“These activities are leaving what we call a molecular signature in our cells, which reverses the effect that stress or anxiety would have on the body by changing how our genes are expressed.”

And

“Millions of people around the world already enjoy the health benefits of mind-body interventions like yoga or meditation, but what they perhaps don’t realise is that these benefits begin at a molecular level and can change the way our genetic code goes about its business.”

[The authors of this review actually identified some serious shortcomings to the studies they reviewed. I’ll be getting to some excellent points at the end of this post that run quite counter to the hype. But the lead author’s press release emphasized unwarranted positive conclusions about the health benefits of these practices. That is what is most popular in media coverage, especially from those who have stuff to sell.]

Interpretation of the press release and review authors’ claims requires going back to the original studies, which most enthusiasts are unlikely to do. If readers do go back, they will have trouble interpreting some of the deceptive claims that are made.

Yet, a lot is at stake. This review is being used to recommend mind-body interventions for people having or who are at risk of serious health problems. In particular, unfounded claims that yoga and mindfulness can increase the survival of cancer patients are sometimes hinted at, but occasionally made outright.

This blog post is written with the intent of protecting consumers from such false claims and providing tools so they can spot pseudoscience for themselves.

Discussion of the review in the media speaks broadly of alternative and complementary interventions. The coverage is aimed at inspiring confidence in this broad range of treatments and at encouraging people who are facing health crises to invest time and money in outright quackery. Seemingly benign recommendations for yoga, tai chi, and mindfulness (after all, what’s the harm?) often become the entry point to more dubious and expensive treatments that substitute for established treatments. Once they are drawn to centers for integrative health care for classes, cancer patients are likely to spend hundreds or even thousands of dollars on other products and services that are unlikely to benefit them. One study reported:

More than 72 oral or topical, nutritional, botanical, fungal and bacterial-based medicines were prescribed to the cohort during their first year of IO care…Costs ranged from $1594/year for early-stage breast cancer to $6200/year for stage 4 breast cancer patients. Of the total amount billed for IO care for 1 year for breast cancer patients, 21% was out-of-pocket.

Coming up, I will take a skeptical look at the six randomized trials that were highlighted by this review.  But in this post, I will provide you with some tools and insights so that you do not have to make such an effort in order to make an informed decision.

Like many of the other studies cited in the review, these randomized trials were quite small and underpowered. But I will focus on the six because they are as good as it gets. Randomized trials are considered a higher form of evidence than simple observational studies or case reports. [It is too bad the authors of the review don’t even highlight which studies are randomized trials. They are lumped with the others as “longitudinal studies.”]

As a group, the six studies do not actually add any credibility to the claims that mind-body interventions – specifically yoga, tai chi, and mindfulness training or retreats – improve health by altering DNA. We can be no more confident with what the trials provide than we would be without them ever having been done.

I found the task of probing and interpreting the studies quite labor-intensive and ultimately unrewarding.

I had to get past poor reporting of what was actually done in the trials, to which patients, and with what results. My task often involved seeing through cover-ups, with authors exercising considerable flexibility in reporting what measures they actually collected and what analyses were attempted, before arriving at the best possible tale of the wondrous effects of these interventions.

Interpreting clinical trials should not be so hard, because they should be honestly and transparently reported, with a registered protocol that is adhered to. These reports of trials were sorely lacking. The full extent of the problems took some digging to uncover, but some things emerged before I got to the methods and results.

The introductions of these studies consistently exaggerated the strength of existing evidence for the effects of these interventions on health, even while somehow coming to the conclusion that this particular study was urgently needed and it might even be the “first ever”. The introductions to the six papers typically cross-referenced each other, without giving any indication of how poor quality the evidence was from the other papers. What a mutual admiration society these authors are.

One giveaway is how the introductions  referred to the biggest, most badass, comprehensive and well-done review, that of Goyal and colleagues.

That review clearly states that the evidence for the effects of mindfulness is poor quality because of the lack of comparisons with credible active treatments. The typical randomized trial of mindfulness involves a comparison with no-treatment, a waiting list, or patients remaining in routine care where the target problem is likely to be ignored.  If we depend on the bulk of the existing literature, we cannot rule out the likelihood that any apparent benefits of mindfulness are due to having more positive expectations, attention, and support over simply getting nothing.  Only a handful  of hundreds of trials of mindfulness include appropriate, active treatment comparison/control groups. The results of those studies are not encouraging.

One of the first things I do in probing the introduction of a study claiming health benefits for mindfulness is see how they deal with the Goyal et al review. Did the study cite it, and if so, how accurately? How did the authors deal with its message, which undermines claims of the uniqueness or specificity of any benefits to practicing mindfulness?

For yoga, we cannot yet rule out that any apparent benefits are simply those of regular exercise – in groups or alone – with relaxing routines. The literature concerning tai chi is even smaller and poorer in quality, but there is the same need to show that practicing tai chi has any benefits over exercising in groups with comparable positive expectations and support.

Even more than mindfulness, yoga and tai chi attract a lot of pseudoscientific mumbo jumbo about integrating Eastern wisdom and Western science. We need to look past that and insist on evidence.

Like their introductions, the discussion sections of these articles are quite prone to exaggerating how strong and consistent the evidence is from existing studies. The discussion sections cherry-pick positive findings in the existing literature, sometimes recklessly distorting them. The authors then discuss how their own positively spun findings fit with what is already known, while minimizing or outright neglecting discussion of any of their negative findings. I was not surprised to see one trial of mindfulness for cancer patients obtain no effects on depressive symptoms or perceived stress, but then go on to explain that mindfulness might powerfully affect the expression of DNA.

If you want to dig into the details of these studies, the going can get rough and the yield for doing a lot of mental labor is low. For instance, these studies involved drawing blood and analyzing gene expression. Readers will inevitably encounter passages like:

In response to KKM treatment, 68 genes were found to be differentially expressed (19 up-regulated, 49 down-regulated) after adjusting for potentially confounded differences in sex, illness burden, and BMI. Up-regulated genes included immunoglobulin-related transcripts. Down-regulated transcripts included pro-inflammatory cytokines and activation-related immediate-early genes. Transcript origin analyses identified plasmacytoid dendritic cells and B lymphocytes as the primary cellular context of these transcriptional alterations (both p < .001). Promoter-based bioinformatic analysis implicated reduced NF-κB signaling and increased activity of IRF1 in structuring those effects (both p < .05).

Intimidated? Before you defer to the “experts” doing these studies, I will show you some things I noticed in the six studies and how you can debunk the relevance of these studies for promoting health and dealing with illness. Actually, I will show that even if these 6 studies got the results that the authors claimed – and they did not – at best the effects would be trivial and lost among the other things going on in patients’ lives.

Fortunately, there are lots of signs that you can dismiss such studies and go on to something more useful, if you know what to look for.

Some general rules:

  1. Don’t accept claims of efficacy/effectiveness based on underpowered randomized trials. Dismiss them. A reliable rule of thumb is to dismiss trials that have fewer than 35 patients in the smallest group. Over half the time, true moderate-sized effects will be missed in such studies, even when they are actually there.

Due to publication bias, most of the positive effects that are published from trials of this size will be false positives and won’t hold up in well-designed, larger trials.

When significant positive effects from such trials are reported in published papers, they have to be large to have reached significance. If not outright false, these effect sizes won’t be matched in larger trials. So, significant positive effect sizes from small trials are likely to be false positives or exaggerated, and probably won’t replicate. For that reason, we can consider small studies to be pilot or feasibility studies, but not as providing estimates of how large an effect we should expect from a larger study. Investigators do it all the time, but they should not: they do power calculations estimating how many patients they need for a larger trial from the results of such small studies. No, no, no!
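This inflation is easy to demonstrate with a small simulation. The sketch below is my own illustration in Python; the true effect (d = 0.3) and the 20 patients per arm are hypothetical, not taken from the six studies. It shows that the minority of small trials that do reach significance report effects far larger than the truth.

```python
# Sketch of the "winner's curse" in small trials: when only significant
# results get published, the published effect sizes are inflated.
# The true effect (d = 0.3) and group size (20 per arm) are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_d, n_per_arm, n_sims = 0.30, 20, 20000

observed_d, significant = [], []
for _ in range(n_sims):
    treated = rng.normal(true_d, 1.0, n_per_arm)
    control = rng.normal(0.0, 1.0, n_per_arm)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    observed_d.append((treated.mean() - control.mean()) / pooled_sd)
    significant.append(stats.ttest_ind(treated, control).pvalue < 0.05)

observed_d = np.array(observed_d)
significant = np.array(significant)
print(f"True effect: d = {true_d}")
print(f"Share of trials reaching p < .05: {significant.mean():.0%}")
print(f"Mean observed d among the 'significant' trials: "
      f"{observed_d[significant].mean():.2f}")
```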

Having spent decades examining clinical trials, I am generally comfortable dismissing effect sizes that come from trials with fewer than 35 patients in the smaller group. I agree with the suggestion that if two larger trials are available in a given literature, go with those and ignore the smaller studies. If there are not at least two larger studies, keep the jury out on whether there is a significant effect.
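For readers who want to check the arithmetic behind this Rule of 35, here is a minimal sketch using standard power formulas via the statsmodels package; the group sizes are illustrative, not taken from the six trials.

```python
# Illustrative power calculations for a two-arm trial and a moderate effect
# (Cohen's d = 0.5). Standard formulas only; group sizes are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_group in (15, 25, 35):
    power = analysis.power(effect_size=0.5, nobs1=n_per_group,
                           alpha=0.05, ratio=1.0, alternative='two-sided')
    print(f"n = {n_per_group} per group: power = {power:.2f}, "
          f"chance of missing a real moderate effect = {1 - power:.2f}")

# Patients needed per group for the conventional 80% power at d = 0.5:
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05,
                                ratio=1.0, alternative='two-sided')
print(f"Roughly {n_needed:.0f} patients per group are needed for 80% power.")
```

Running it shows that below about 35 patients per group, the chance of missing a genuine moderate effect approaches or exceeds one in two, which is the point of the rule.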

Applying the Rule of 35, 5 of the 6 trials can be dismissed, and the sixth is ambiguous because of loss of patients to follow-up. If promoters of mind-body interventions want to convince us that they have beneficial effects on physical health by conducting trials like these, they have to do better. None of the individual trials should increase our confidence in their claims. Collectively, the trials collapse in a mess without providing a single credible estimate of effect size. This attests to the poor quality of evidence and the disrespect for methodology that characterize this literature.

  2. Don’t be taken in by titles of peer-reviewed articles that are themselves an announcement that these interventions work. Titles may not be telling the truth.

What I found extraordinary is that five of the six randomized trials had a title indicating that a positive effect was found. I suspect that most people encountering the title will not actually go on to read the study. So, they will be left with the false impression that positive results were indeed obtained. It’s quite a clever trick to make the title of an article, by which most people will remember it, into a false advertisement for what was actually found.

For a start, we can simply remind ourselves that with these underpowered studies, investigators should not even be making claims about efficacy/effectiveness. So, one trick of the developing skeptic is to check whether the claims being made in the title fit with the size of the study. Actually going to the results section, one can find further evidence of discrepancies between what was found and what is being claimed.

I think it is a general rule of thumb that we should be careful of titles for reports of randomized trials that declare results. Even when what is claimed in the title fits the actual results, it often creates the illusion of a greater consistency with what already exists in the literature. Furthermore, even when future studies inevitably fail to replicate what is claimed in the title, the false claim lives on, because failing to replicate key findings is almost never grounds for retracting a paper.

  3. Check the institutional affiliations of the authors. These 6 trials serve as a depressing reminder that we cannot rely on researchers’ institutional affiliations or federal grants to reassure us of the validity of their claims. These authors are not from Quack-Quack University, and they get funding for their research.

In all cases, the investigators had excellent university affiliations, mostly in California. Most studies were conducted with some form of funding, often federal grants. A quick check of Google would reveal that at least one of the authors on a study, usually more, had federal funding.

  4. Check the conflicts of interest, but don’t expect the declarations to be informative, and be skeptical of what you do find. It is also disappointing that a check of the conflict of interest statements for these articles would be unlikely to arouse suspicion that the results being claimed might have been influenced by financial interests. One cannot readily see that the studies were generally done in settings promoting alternative, unproven treatments that would benefit from the publicity generated by the studies. One cannot see that some of the authors have lucrative book contracts and speaking tours that require claims for dramatic effects of mind-body treatments that could not possibly be supported by transparent reporting of the results of these studies. As we will see, one of the studies was actually conducted in collaboration with Deepak Chopra and with money from his institution. That would definitely raise flags in the skeptic community. But the dubious tie might be missed by patients and their families vulnerable to unwarranted claims and unrealistic expectations of what can be obtained outside of conventional medicine, like chemotherapy, surgery, and pharmaceuticals.

Based on what I found probing these six trials, I can suggest some further rules of thumb. (1) Don’t assume, for articles about health effects of alternative treatments, that all relevant conflicts of interest are disclosed. Check the setting in which the study was conducted and whether an integrative [complementary and alternative, meaning mostly unproven] care setting was used for recruiting or running the trial. Not only would this represent potential bias on the part of the authors, it would represent selection bias in the recruitment of patients and in their responsiveness to placebo effects consistent with the marketing themes of these settings. (2) Google the authors and see if they have lucrative pop psychology book contracts, TED talks, or speaking gigs at positive psychology or complementary and alternative medicine gatherings. None of these lucrative activities is typically expected to be disclosed as a conflict of interest, but all require making strong claims that are not supported by available data. Such rewards are perverse incentives for authors to distort and exaggerate positive findings and to suppress negative findings in peer-reviewed reports of clinical trials. (3) Check and see if known quacks have prepared recruitment videos for the study, informing patients what will be found (seriously, I was tipped off to look and that is what I found).

  5. Look for the usual suspects. A surprisingly small, tight, interconnected group is generating this research. You could look the authors up on Google or Google Scholar, or browse through my previous blog posts and see what I have said about them. As I will point out in my next blog post, one got withering criticism for her claim that drinking carbonated sodas, but not sweetened fruit drinks, shortened your telomeres, so that drinking soda was worse than smoking. My colleagues and I re-analyzed the data of another of the authors. We found that, contrary to what he claimed, pursuing meaning rather than pleasure in your life did not affect gene expression related to immune function. We also showed that substituting randomly generated data worked as well as what he got from blood samples in replicating his original results. I don’t think it is ad hominem to point out a history for both of these authors of making implausible claims. It speaks to source credibility.
  6. Check and see if there is a trial registration for a study, but don’t stop there. You can quickly check with PubMed whether a report of a randomized trial is registered. Trial registration is intended to ensure that investigators commit themselves in advance to a primary outcome or maybe two, so that readers can check whether that is what they emphasized in their paper. You can then check to see if what is said in the report of the trial fits with what was promised in the protocol. Unfortunately, I could find a registration for only one of these studies. That trial registration was vague on what outcome variables would be assessed and did not mention the outcome emphasized in the published paper (!). The registration also said the sample would be larger than what was reported in the published study. When researchers have difficulty in recruitment, their study is often compromised in other ways. I’ll show how this study was compromised.

Well, it looks like applying these generally useful rules of thumb is not always so easy with these studies. I think the small sample sizes across all of the studies would be enough to decide that this research has yet to yield meaningful results and certainly does not support the claims that are being made.

But readers who are motivated to put in the time to probe deeper will come up with strong signs of p-hacking and questionable research practices.

  7. Check the report of the randomized trial and see if you can find any declaration of one or two primary outcomes and a limited number of secondary outcomes. What you will find instead is that these studies always have more outcome variables than patients receiving the interventions. The opportunities for cherry-picking positive findings and discarding the rest are huge, especially because it is so hard to assess what data were collected but not reported.
  8. Check and see if you can find tables of unadjusted primary and secondary outcomes. Honest and transparent reporting involves giving readers a look at simple statistics so they can decide whether the results are meaningful. For instance, if effects on stress and depressive symptoms are claimed, are the results impressive and clinically relevant? In almost all cases, there is no peeking allowed. Instead, the authors provide analyses and statistics with lots of adjustments made. They break lots of rules in doing so, especially with such small samples. These authors are virtually assured of getting results to crow about.

Famously, Joe Simmons and Leif Nelson hilariously published claims that briefly listening to the Beatles’ “When I’m 64” left students a year and a half younger than if they were assigned to listening to “Kalimba.” Simmons and Nelson knew this was nonsense, but their intent was to show what researchers can do if they have free rein in how they analyze their data and what they report. They revealed the tricks they used, but those tricks were minor league and amateurish compared to what the authors of these trials consistently did in claiming that yoga, tai chi, and mindfulness modified expression of DNA.
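To see how easily trials with more outcome variables than patients can generate something to crow about, here is a minimal Monte Carlo sketch of my own. The numbers (20 outcomes, 12 patients per arm, no true effect whatsoever) are hypothetical, chosen only to resemble the scale of the trials discussed here.

```python
# Monte Carlo sketch: how often does a null trial with many outcomes and few
# patients yield at least one p < .05 "finding" by luck alone?
# The numbers (20 outcomes, 12 patients per arm) are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_outcomes, n_per_arm = 5000, 20, 12

hits = 0
for _ in range(n_sims):
    treated = rng.normal(size=(n_per_arm, n_outcomes))   # no true effect
    control = rng.normal(size=(n_per_arm, n_outcomes))
    result = stats.ttest_ind(treated, control, axis=0)
    hits += bool((result.pvalue < 0.05).any())

print(f"Chance of at least one 'significant' outcome under the null: "
      f"{hits / n_sims:.0%}")   # roughly 1 - 0.95**20, i.e., about 64%
```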

Stay tuned for my next blog post, where I go through the six studies. But consider this if you or a loved one have to make an immediate decision about whether to plunge into the world of woo-woo unproven medicine in hopes of altering DNA expression: I will show that the authors of these studies did not get the results they claimed. And who should care if they did? The effects were laughably trivial. As the authors of the review about which I have been complaining noted:

One other problem to consider are the various environmental and lifestyle factors that may change gene expression in similar ways to MBIs [Mind-Body Interventions]. For example, similar differences can be observed when analyzing gene expression from peripheral blood mononuclear cells (PBMCs) after exercise. Although at first there is an increase in the expression of pro-inflammatory genes due to regeneration of muscles after exercise, the long-term effects show a decrease in the expression of pro-inflammatory genes (55). In fact, 44% of interventions in this systematic review included a physical component, thus making it very difficult, if not impossible, to discern between the effects of MBIs from the effects of exercise. Similarly, food can contribute to inflammation. Diets rich in saturated fats are associated with pro-inflammatory gene expression profile, which is commonly observed in obese people (56). On the other hand, consuming some foods might reduce inflammatory gene expression, e.g., drinking 1 l of blueberry and grape juice daily for 4 weeks changes the expression of the genes related to apoptosis, immune response, cell adhesion, and lipid metabolism (57). Similarly, a diet rich in vegetables, fruits, fish, and unsaturated fats is associated with anti-inflammatory gene profile, while the opposite has been found for Western diet consisting of saturated fats, sugars, and refined food products (58). Similar changes have been observed in older adults after just one Mediterranean diet meal (59) or in healthy adults after consuming 250 ml of red wine (60) or 50 ml of olive oil (61). However, in spite of this literature, only two of the studies we reviewed tested if the MBIs had any influence on lifestyle (e.g., sleep, diet, and exercise) that may have explained gene expression changes.

How about taking tango lessons instead? You would at least learn dance steps, get exercise, and decrease any social isolation. And would it really matter if the mind-body interventions offered any benefits beyond those of activities like these?


1 billion views! Why we should be concerned about PR campaign for 2 RCTs of psilocybin for cancer patients

According to the website of an advocacy foundation, coverage of two recent clinical trials published in the Journal of Psychopharmacology evaluating psilocybin for distress among cancer patients garnered over 1 billion views in social media. To put that in context, the advocacy group claimed that this is one sixth of the attention that the Super Bowl received.

In this blog post I’ll review the second of the two clinical trials. Then, I will discuss some reasons why we should be concerned about the success of this public relations campaign in terms of what it means for both the integrity of scientific publishing, as well as health and science journalism.

The issue is not whether cancer patients will find benefit from ingesting psychedelic mushrooms in a safe environment. Nor is it that the sale and ingestion of psilocybin are currently criminalized (Schedule 1, classified the same as heroin).

We can appreciate the futility of the war on drugs, and the absurdity of the criminalization of psilocybin, but still object to how we were strategically and effectively manipulated by this PR campaign.

Even if we approve of a cause, we need to be careful about subordinating the peer-review process and independent press coverage to the intended message of advocates.

Tolerating causes being promoted in this fashion undermines the trustworthiness of peer review and of independent press coverage of scientific papers.

To contradict a line from the 1964 acceptance speech of Republican Presidential Candidate Barry Goldwater: “Extremism in pursuit of virtue is no [a] vice.”

In this PR campaign –

We witnessed the breakdown of the expected buffer of checks and balances between:

  • An advocacy group versus reporting of clinical trials in a scientific journal evaluating its claims.
  • Investigators’ exaggerated self-promotional claims versus editorial review and peer commentary.
  • Materials from the publicity campaign versus supposedly independent evaluation by journalists.

What if the next time the object of promotion is a pharmaceutical or medical device, promoted by authors with conflicts of interest? But wait! Isn’t that what we’ve seen in JAMA Network journals on a smaller scale? Such as dubious claims about the wondrous effects of deep brain stimulation in JAMA: Psychiatry by promoters who “disappeared” failed trials? And claims in JAMA itself that suicides were eliminated at a behavioral health organization outside Detroit?

Is this part of a larger trend, where advocacy and marketing shape supposedly peer-reviewed publications in prestigious medical journals?

The public relations campaign for the psilocybin RCTs also left in tatters the credibility of altmetrics as an alternative to journal impact factors. The orchestration of 1 billion views is a dramatic demonstration of how altmetrics can be readily gamed. Articles published in a journal with a modest impact factor scored spectacularly, as seen in the altmetrics graphics the Journal of Psychopharmacology posted.

I reviewed in detail one of the clinical trials in my last blog post and will review the second in this one. They are both mediocre, poorly designed clinical trials that got lavishly praised as being of the highest quality by an impressive panel of commentators. I’ll suggest that the second trial in particular is best seen as what Barney Carroll has labeled an experimercial – a clinical trial aimed at generating enthusiasm for a product, rather than a dispassionate evaluation undertaken with some possibility of not being able to reject the null hypothesis. If this sounds harsh, please indulge me and read on; I think you will be entertained and persuaded that this was not so much a clinical trial as an elaborate ritual, complete with psychobabble woo that has no place in a discussion of the safety and effectiveness of a medicine.

After skeptically scrutinizing the second trial, I’ll consider the commentaries and media coverage of the two trials.

I’ll end with a complaint that this PR effort is aimed only at securing the right of wealthy people with cancer to obtain psilocybin under the supervision of a psychiatrist and in the context of woo psychotherapy. The risk of other people in other circumstances ingesting psilocybin is deliberately exaggerated. If psilocybin is as safe and beneficial as these articles claim, why should its use remain criminalized for persons who don’t have cancer, or don’t want to get a phony diagnosis from a psychiatrist, or don’t want to submit to woo psychotherapy?

The normally paywalled Journal of Psychopharmacology granted free access to the two articles, along with most but not all of the commentaries. Meanwhile, extensive uncritical coverage in Medscape Medical News provides a fairly accurate summary, complete with direct quotes of the lavish self-praise distributed by the advocacy-affiliated investigators and echoed in seemingly tightly coordinated commentaries.

Here is the praise one of the two senior authors heaped upon the two studies, as captured in Medscape Medical News and echoed elsewhere:

The new findings have “the potential to transform the care of cancer patients with psychological and existential distress, but beyond that, it potentially provides a completely new model in psychiatry of a medication that works rapidly as both an antidepressant and anxiolytic and has sustained benefit for months,” Stephen Ross, MD, director of Substance Abuse Services, Department of Psychiatry, New York University (NYU), Langone Medical Center, told Medscape Medical News.

And:

“That is potentially earth shattering and a big paradigm shift within psychiatry,” Dr Ross told Medscape Medical News.

The Hopkins Study

Griffiths RR, Johnson MW, Carducci MA, Umbricht A, Richards WA, Richards BD, Cosimano MP, Klinedinst MA. Psilocybin produces substantial and sustained decreases in depression and anxiety in patients with life-threatening cancer: A randomized double-blind trial. Journal of Psychopharmacology. 2016 Dec 1;30(12):1181-97.

The trial’s registration at ClinicalTrials.gov is available here.

The trial’s website is rather drab and typical for clinical trials. It contrasts sharply with the slick PR of the website for the NYU trial. The latter includes a gushy, emotional video from a clinical psychologist participating as a patient in the study. She delivers a passionate pitch for the “wonderful ritual” of the transformative experimental session. You can also get a sense of how the session monitors structured the session and cultivated positive expectations. You also get a sense of the psilocybin experience being slickly marketed to appeal to the same well-heeled patients who pay out-of-pocket for complementary and alternative medicine at integrative medicine centers.

Conflict of interest

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Roland Griffiths is on the Board of Directors of the Heffter Research Institute.

Heffter Research Institute is listed as one of the funders of the study.

The introduction.

The Hopkins study starts with some familiar claims from psycho-oncology that portray cancer as a mental health issue. The exaggerated estimate of 40% of cancer patients experiencing a mood disorder is arrived at by lumping adjustment reactions together with a smaller proportion of diagnoses of generalized anxiety and major depression.

The introduction contradicts a large body of literature suggesting that the prevalence of mental disorder in cancer patients is no greater than in other chronic health conditions and may approximate what is found in primary care waiting rooms. There is also a fundamental confusion of the psychological distress associated with a diagnosis of cancer with psychiatric disorder in need of treatment. Much of the initial psychological distress in cancer patients resolves in a short time, making it difficult to demonstrate benefits of treatment beyond this natural trajectory of decline. Prescription of an antidepressant would be ineffective and inappropriate.

The introduction ends with a strong claim to the rigor and experimental control exercised in the clinical trial:

The present study provides the most rigorous evaluation to date of the efficacy of a classic hallucinogen for treatment of depressed mood and anxiety in psychologically distressed cancer patients. The study evaluated a range of clinically relevant measures using a double-blind cross-over design to compare a very low psilocybin dose (intended as a placebo) to a moderately high psilocybin dose in 51 patients under conditions that minimized expectancy effects.

The methods and results

In a nutshell: Despite claims to the contrary, this study cannot be considered a blinded study. At the six-month follow-up, which is the outcome assessment point of greatest interest, it could no longer be meaningfully considered a randomized trial. All benefits of randomization were lost. In addition, the effects of psilocybin were confounded with a woo psychotherapy in which positive expectations and support were provided and reinforced in a way that likely influenced assessments of outcome. Outcomes at six months also reflected changes in distress that would have occurred in the absence of treatment. The sample is inappropriate for generalizations about the treatment of major depression and generalized anxiety. The characterization of patients as facing impending death is inaccurate.

The study involved a crossover design, which provides a lower level of evidence than a placebo-controlled comparison study. The study compared a high psilocybin dose (22 or 30 mg/70 kg) with a low dose (1 or 3 mg/70 kg) administered in identically appearing capsules. While the low dose might not be homeopathic, it can be readily distinguished from the larger dose soon after administration. The second drug administration occurred approximately 5 weeks later. Not surprisingly, given the large difference in dosage, session monitors who were supposedly blinded readily identified the group to which the participant they were observing had been assigned.

Within a crossover design, the six-month follow-up data basically attribute any naturalistic decline in distress to the drug treatments. As David Colquhoun would argue, any estimate of the effects of the drug was inflated by including regression to the mean and get-better-anyway effects. Furthermore, the focus on outcomes at six months meant that patients assigned to either group in the crossover design had received high-dose psilocybin by at least five weeks into the study. Any benefits of randomization were lost.
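For readers unfamiliar with regression to the mean and get-better-anyway effects, a minimal simulation makes the point. Everything here is made up for illustration and no treatment is modeled at all, yet patients selected for high distress at baseline look substantially “improved” months later.

```python
# Sketch of regression to the mean: patients selected for high distress at
# baseline look "improved" months later even with no treatment at all.
# All numbers are made up for illustration; nothing comes from the trial.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
typical_distress = rng.normal(50, 8, n)            # each person's usual level
baseline = typical_distress + rng.normal(0, 6, n)  # fluctuation at screening
followup = typical_distress + rng.normal(0, 6, n)  # six months later, untreated

enrolled = baseline > 60                           # only the distressed enroll
print(f"Enrolled mean at baseline:  {baseline[enrolled].mean():.1f}")
print(f"Enrolled mean at follow-up: {followup[enrolled].mean():.1f}")
# The apparent improvement is pure selection plus noise: no drug, no therapy.
```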

Like the NYU study, the Johns Hopkins study involved selecting a small, unrepresentative sample from a larger group responding to a mixed recruitment strategy utilizing flyers, the internet, and physician referral.

  • Less than 10% of the cancer patients calling in were randomized.
  • Almost half of the final sample were currently using marijuana and, similarly, almost half had used hallucinogens in the past.
  • The sample is relatively young for cancer patients and well educated. More than half had postgraduate education; almost all were white, and only two participants were black.
  • The sample is quite heterogeneous with respect to psychiatric diagnoses, with almost half having an adjustment disorder, and the rest anxiety and mood disorders.
  • In terms of cancer diagnoses and staging, it was also a select and heterogeneous group, with only about a quarter having recurrent/metastatic disease with less than two years of expected survival. This suggests the adjective “life-threatening” in the title is misleading.

Any mental health effects of psilocybin as a drug are inseparable from the effects of the accompanying psychotherapy designed by a clinical psychologist “with extensive experience in studies of classic hallucinogens.” Participants met with that “session monitor” several times before the session in which the psilocybin was ingested, and the monitor guided and aided the interpretation of the drug experience. Aside from providing therapy, the session monitor instructed the patient to have positive expectations before ingestion of the drug and to work to maintain these expectations throughout the experience.

I found this psychotherapeutic aspect of the trial strikingly similar to one that was included in a trial of homeopathy in Germany that I accepted for publication in PLOS One. [See here for my rationale for accepting the trial and the ensuing controversy.] Trials of alternative therapies notoriously have such an imbalance of nonspecific placebo factors favoring the intervention group.

The clinical trial registration indicates that the primary outcome was the Pahnke-Richards Mystical Experience Questionnaire. This measure is included among 20 participant questionnaires listed in Table 3 of the article as completed seven hours after administration of psilocybin. Although I haven’t reviewed all of these measures, I’m skeptical about their psychometric development, intercorrelation, and validation beyond face validity. What possibly could be learned from administering such a battery?

The authors make unsubstantiated assumptions in suggesting that these measures either individually or collectively capture mediation of later response assessed by mental health measures. A commentary echoed this:

Mediation analysis indicates that the mystical experience was a significant mediator of the effects of psilocybin dose on therapeutic outcomes.

But one of the authors of the commentary later walked that back with a statement to Medscape Medical News:

As for the mystical experiences that some patients reported, it is not clear whether these are “a cause, consequence or corollary of the anxiolytic effect or unconstrained cognition.”

Clinical outcomes at six months are discussed in terms of multiple measures derived from the unblinded, clinician-rated Hamilton scales. However, there are repeated references to box scores of the number of significant findings from at least 17 clinical measures (for instance, significant effects for 11 of the 17 measures), in addition to other subjective patient and significant-other measures. It is unclear why the authors would choose to administer so many measures that are very likely highly intercorrelated.

There were no adverse events attributed to administration of psilocybin, and while there were a number of adverse psychological effects during the session with the psilocybin, none were deemed serious.

My summary evaluation

The clinical trial registration indicates broad inclusion criteria, which may suggest the authors anticipated difficulty in recruiting patients who had a significant psychiatric disorder for which psychotropic medication would be appropriate, as well as difficulty obtaining cancer patients who actually had poorer prognoses. Regardless, the descriptions of the study as focusing on anxiety and depression and on “life-threatening” cancer seem to be marketing. You typically do not see a mixed sample with a large proportion of adjustment reactions characterized in the title of a psychiatric journal article as treatment of “anxiety” and “depression”. You typically do not see the adjective “life-threatening” in the title of an oncology article with such a mixed sample of cancer patients.

The authors could readily have anticipated that at the six-month assessment point of interest they would no longer have a comparison that could be described as a rigorous, double-blind, randomized trial. They should have thought through exactly what was being controlled by a comparison group receiving a minimal dose of psilocybin. They should have been clearer that they were not simply evaluating psilocybin, but psilocybin administered in the context of a psychotherapy, with an induction of strong positive expectations and a promise of psychological support.

The finding of a lack of adverse events is consistent with a large literature, but it is at odds with the way the risks of psilocybin are portrayed when the study is described to the media.

The accompanying editorial and commentary

Medscape Medical News reports that the numerous commentaries accompanying these two clinical trials were hastily assembled. Many of the commentaries read that way, with the authors uncritically passing on the psilocybin authors’ lavish self-praise of their work, after a lot of redundant recounting of the chemical nature of psilocybin and its history in psychiatry. When I repeatedly encountered claims that these trials represented rigorous, double-blinded clinical trials, or suggestions that the cancer was in a terminal phase, I assumed that the authors had not read the studies, only the publicity material, or had simply suspended all commitment to truth.

I have great admiration for David Nutt and respect his intellectual courage in campaigning for the decriminalization of recreational drugs, even when he knew that it would lead to his dismissal as chairman of the UK’s Advisory Council on the Misuse of Drugs (ACMD). He has repeatedly countered irrationality and prejudice with solid evidence. His graph depicting the harms of various substances to users and others deserves the wide distribution that it has received.

He ends his editorial with praise for the two trials as “the most rigorous double-blind placebo-controlled trials of a psychedelic drug in the past 50 years.” I’ll give him a break and assume that this reflects his dismal assessment of the quality of the other trials. I applaud his declaration, found nowhere else in the commentaries, that:

There was no evidence of psilocybin being harmful enough to be controlled when it was banned, and since then, it has continued to be used safely by millions of young people worldwide with a very low incidence of problems. In a number of countries, it has remained legal, for example in Mexico where all plant products are legal, and in Holland where the underground bodies of the mushrooms (so-called truffles) were exempted from control.

His description of the other commentaries accompanying the two trials is apt:

The honours list of the commentators reads like a ‘who’s who’ of American and European psychiatry, and should reassure any waverers that this use of psilocybin is well within the accepted scope of modern psychiatry. They include two past presidents of the American Psychiatric Association (Lieberman and Summergrad) and the past-president of the European College of Neuropsychopharmacology (Goodwin), a previous deputy director of the Office of USA National Drug Control Policy (Kleber) and a previous head of the UK Medicines and Healthcare Regulatory Authority (Breckenridge). In addition, we have input from experienced psychiatric clinical trialists, leading pharmacologists and cancer-care specialists. They all essentially say the same thing..

The other commentaries. I do not find many of the commentaries worthy of further comment. However, one by Guy M Goodwin, Psilocybin: Psychotherapy or drug?, is unusual in offering even mild skepticism about the way the investigators are marketing their claims:

The authors consider this mediating effect as ‘mystical’, and show that treatment effects correlate with a subjective scale to measure such experience. The Oxford English Dictionary defines mysticism as ‘belief that union with or absorption into the Deity or the absolute, or the spiritual apprehension of knowledge inaccessible to the intellect, may be attained through contemplation and self-surrender’. Perhaps a scale really can measure a relevant kind of experience, but it raises the caution that the investigation of hallucinogens as treatments may be endangered by grandiose descriptions of their effects and unquestioning acceptance of their value.

The commentary by former president of the American Psychiatric Association Paul Summergrad, Psilocybin in end of life care: Implications for further research, shamelessly echoes the psychobabble and self-promotion of the authors of the trials:

The experiences of salience, meaningfulness, and healing that accompanied these powerful spiritual experiences and that were found to be mediators of clinical response in both of these carefully performed studies are also important to understand in their own right and are worthy of further study and contemplation. None of us are immune from the transitory nature of human life, which can bring fear and apprehension or conversely a real sense of meaning and preciousness if we carefully number our days. Understanding where these experiences fit in healing, well-being, and our understanding of consciousness may challenge many aspects of how we think about mental health or other matters, but these well-designed studies build upon a recent body of work that confronts us squarely with that task.

Coverage of the two studies in the media

The website of the Heffter Research Institute provides a handy set of links to some of the press coverage the studies have received. There is a remarkable sameness to the portrayal of the studies in the media, suggesting that journalists stuck closely to the press releases, except occasionally supplementing them with direct quotes from the authors. The appearance of independent evaluation of the trials depended almost entirely on the commentaries published with the two articles.

There’s a lot of slick marketing by the two studies’ authors. In addition to what I noted earlier in the blog post, there are recurring unscientific statements marketing the psilocybin experience:

“They are defined by a sense of oneness – people feel that their separation between the personal ego and the outside world is sort of dissolved and they feel that they are part of some continuous energy or consciousness in the universe. Patients can feel sort of transported to a different dimension of reality, sort of like a waking dream.

There are also recurring distinct efforts to keep the psilocybin experience under the control of psychiatrists and woo clinical psychologists:

The new studies, however, suggest psilocybin be used only in a medical setting, said Dr. George Greer, co-founder, medical director and secretary at the Heffter Research Institute in Santa Fe, New Mexico, which funded both studies.

“Our focus is scientific, and we’re focused on medical use by medical doctors,” Greer said at the news conference. “This is a special type of treatment, a special type of medicine. Its use can be highly controlled in clinics with specially trained people.”

He added he doubts the drug would ever be distributed to patients to take home.

There are only rare admissions from an author of one of the studies that:

The results were similar to those they had found in earlier studies in healthy volunteers. “In spite of their unique vulnerability and the mood disruption that the illness and contemplation of their death has prompted, these participants have the same kind of experiences, that are deeply meaningful, spiritually significant and producing enduring positive changes in life and mood and behaviour,” he said.

If psilocybin is so safe and pleasant to ingest…

I think the promotion of these studies puts ingestion of psilocybin on the path to being allowed in nicely furnished integrative cancer centers. In that sense, psilocybin could become a gateway drug to quack services such as acupuncture, reiki, and energy therapies like therapeutic touch.

I’m not sure that demand would be great except among previous users of psychedelics and current users of cannabis.

But should psilocybin remain criminalized outside of cancer centers where wealthy patients can purchase a diagnosis of adjustment reaction from a psychiatrist? Cancer is not especially traumatic, and PTSD is almost as common in the waiting rooms of primary care physicians. Why not extend to primary care physicians the option of prescribing psilocybin to their patients? At least the purity could be assured. But why should psilocybin use be limited to mental health conditions, once we accept that a diagnosis of adjustment reaction is such a distorted extension of the term? Should we exclude patients who are atheists and only want a satisfying experience, not a spiritual one?

Experience in other countries suggests that psilocybin can be safely ingested in a supportive, psychologically safe environment. Why not allow cancer patients and others to obtain psilocybin of assured purity and dosage? They could then ingest it in the company of friends and intimate partners who have been briefed on how the experience needs to be managed. The patients in the studies were mostly not facing immediate death from terminal cancer. But should we require that persons be dying in order to have a psilocybin experience without the risk of criminal penalties? Why not allow psilocybin to be ingested in the presence of pastoral counselors or priests whose religious beliefs are more congruent with those of the persons seeking such experiences than are those of New York City psychiatrists?


Deep Brain Stimulation: Unproven treatment promoted with a conflict of interest in JAMA: Psychiatry [again]

“Even with our noisy ways and cattle prods in the brain, we have to take care of sick people, now,” – Helen Mayberg

“All of us—researchers, journalists, patients and their loved ones–are desperate for genuine progress in treatments for severe mental illness. But if the history of such treatments teaches us anything, it is that we must view claims of dramatic progress with skepticism, or we will fall prey to false hopes.” – John Horgan

An email alert announced the early release of an article in JAMA: Psychiatry reporting the effects of deep brain stimulation (DBS) for depression. The article was accompanied by an editorial commentary.

Oh no! Is an unproven treatment once again being promoted by one of the most prestigious psychiatry journals with an editorial commentary reeking of vested interests?

Indeed it is, but we can use the article and commentary as a way of honing our skepticism about such editorial practices and to learn better where to look to confirm or dispel our suspicions when they arise.

Like many readers of this blog, there was a time when I would turn to a trusted, prestigious source like JAMA: Psychiatry with great expectations. Not being an expert in a particular area like DBS, I would be inclined to accept uncritically what I read. But then I noticed how much of what I read conflicted with what I already knew about research design and basic statistics. Time and time again, this knowledge proved sufficient to detect serious hype, exaggeration, and simply false claims.

The problem was no longer simply one of the authors adopting questionable research practices. It expanded to journals and professional organizations adopting questionable publication practices that fit with financial, political, and other, not strictly scientific agendas.

What is found in the most prestigious biomedical journals is not necessarily the most robust and trustworthy of scientific findings. Rather, content is picked in terms of its ability to be portrayed as innovative, breakthrough medicine. But beyond that, the content is consistent with prevailing campaigns to promote particular viewpoints and themes. There is apparently no restriction against those who might most personally profit from a paper being selected to write the accompanying commentary.

We need to recognize that editorial commentaries often receive weak or no peer review. Invitations from editors to provide commentaries are often a matter of a shared nonscientific agenda and simple cronyism.

Coming to these conclusions, I have been on a mission to learn better how to detect hype and hokum and I have invited readers of my blog posts to come along.

This installment builds on my recent discussion of an article claiming remission of suicidal ideation with magnetic seizure therapy. Like the editorial commentary accompanying that previous JAMA: Psychiatry article, the commentary discussed here had an impressive conflict of interest disclosure. The disclosure alone probably would not have prompted me to search the Internet for other material about one of the authors. Yet a search revealed some information that is quite relevant to our interpretation of the new article and its commentary. We can ponder whether this information should have been withheld from readers. I think it should have been disclosed.

The lesson I learned is that a higher level of vigilance is needed to interpret highly touted article-commentary combos in prestigious journals, unless we are going to simply dismiss them as advertisements or propaganda rather than as a highlighting of solid biomedical science.

Sadly, though, this exercise convinced me that efforts to scrutinize claims by turning to seemingly trustworthy supplementary sources can provide a misleading picture.

The article under discussion is:

Bergfeld IO, Mantione M, Hoogendoorn MC, et al. Deep Brain Stimulation of the Ventral Anterior Limb of the Internal Capsule for Treatment-Resistant Depression: A Randomized Clinical Trial. JAMA Psychiatry. Published online April 06, 2016. doi:10.1001/jamapsychiatry.2016.0152.

The commentary is:

Mayberg HS, Riva-Posse P, Crowell AL. Deep Brain Stimulation for Depression: Keeping an Eye on a Moving Target. JAMA Psychiatry. Published online April 06, 2016. doi:10.1001/jamapsychiatry.2016.0173.

The trial registration is

Deep Brain Stimulation in Treatment-refractory patients with Major Depressive Disorder.

Pursuing my skepticism by searching on the Internet, I immediately discovered a series of earlier blog posts about DBS by Neurocritic [1] [2] [3] that saved me a lot of time and directed me to still other useful sources. I refer to what I learned from Neurocritic in this blog post. But as always, all opinions are entirely my responsibility, along with misstatements and any inaccuracies.

But what I learned immediately from Neurocritic is that DBS is a hot area of research, even if it continues to produce disappointing outcomes.

DBS had a commitment of $70 million from President Obama’s Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative. Premised on the causes of psychopathology lying in precise, isolated neural circuitry, it is the poster child of the Research Domain Criteria (RDoC) of former NIMH director Thomas Insel. Neurocritic points to Insel’s promotion of “electroceuticals” like DBS in his NIMH Director’s Blog 10 Best of 2013:

The key concept: if mental disorders are brain circuit disorders, then successful treatments need to tune circuits with precision. Chemicals may be less precise than electrical or cognitive interventions that target specific circuits.

The randomized trial of deep brain stimulation for depression.

The objective of the trial was:

To assess the efficacy of DBS of the ventral anterior limb of the internal capsule (vALIC), controlling for placebo effects with active and sham stimulation phases.

Inclusion criteria were a diagnosis of major depressive disorder designated as being treatment resistant (TRD) on the basis of

A failure of at least 2 different classes of second-generation antidepressants (eg, selective serotonin reuptake inhibitor), 1 trial of a tricyclic antidepressant, 1 trial of a tricyclic antidepressant with lithium augmentation, 1 trial of a monoamine oxidase inhibitor, and 6 or more sessions of bilateral electroconvulsive therapy.

Twenty-five patients with TRD from 2 Dutch hospitals first received surgery that implanted four contact electrodes deep within their brains. The electrodes were attached to tiny wires leading to a battery-powered pulse generator implanted under their collar bones.

The standardized DBS treatment started after a three-week recovery from the surgery. Brain stimulation was continuous one week after surgery, but at three weeks, patients began visits with psychiatrists or psychologists on what was at first a biweekly basis, but later less frequently.

At the visits, level of depression was assessed and adjustments were made to various parameters of the DBS, such as the specific site targeted in the brain, voltage, and pulse frequency and amplitude. Treatment continued until optimization – either four weeks of sustained improvement on depression rating scales or the end of the 52-week period. In the original protocol, this phase of the study was limited to six months, but it was extended after experience with a few patients. Six patients went even longer than the 52 weeks to achieve optimization.

Once optimization was achieved, patients were randomized to a crossover phase in which they received two blocks of six weeks of either continued active or sham treatment that involved simply turning off the stimulation. Outcomes were classified in terms of investigator-rated changes in the 17-item Hamilton Depression Rating Scale.

The outcome of the open-label phase of the study was the change of the investigator-rated HAM-D-17 score (range, 0-52) from baseline to T2. In addition, we classified patients as responders (≥50% reduction of HAM-D-17 score at T2 compared with baseline) or nonresponders (<50% reduction of HAM-D-17 score at T2 compared with baseline). Remission was defined as a HAM-D-17 score of 7 or less at T2. The primary outcome measure of the randomized, double-blind crossover trial was the difference in HAM-D-17 scores between the active and sham stimulation phases. In a post hoc analysis, we tested whether a subset of nonresponders showed a partial response (≥25% but <50% reduction of HAM-D-17 score at T2 compared with baseline).
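If these outcome definitions feel dense, a toy sketch may help. The thresholds below come from the passage just quoted; the function and the example HAM-D-17 scores are hypothetical, purely for illustration.

```python
# Toy helper applying the quoted outcome definitions to HAM-D-17 scores.
# Thresholds come from the passage above; the scores are hypothetical.
def classify(baseline: float, t2: float) -> str:
    reduction = (baseline - t2) / baseline
    if t2 <= 7:
        return "remission"
    if reduction >= 0.50:
        return "responder"
    if reduction >= 0.25:
        return "partial response"
    return "nonresponder"

print(classify(22, 6))    # remission (score of 7 or less at T2)
print(classify(22, 10))   # responder (>= 50% reduction from baseline)
print(classify(22, 15))   # partial response (>= 25% but < 50% reduction)
print(classify(22, 20))   # nonresponder
```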

Results

Clinical outcomes. The mean time to first response in responders was 53.6 (50.6) days (range, 6-154 days) after the start of treatment optimization. The mean HAM-D-17 scores decreased from 22.2 (95%CI, 20.3-24.1) at baseline to 15.9 (95% CI, 12.3-19.5) at T2.

An already small sample shrank further from the initial assessment of eligibility to retention at the end of the crossover study. Of the 52 patients assessed for eligibility, 23 were ineligible and four refused. Once the optimization phase of the trial started, four patients withdrew for lack of effect. Another five could not be randomized in the crossover phase: three because of an unstable psychiatric status, one because of fear of worsening symptoms, and one because of physical health. So, the randomized phase of the trial consisted of nine patients randomized to the active treatment and then the sham, and another seven randomized to the sham and then active treatment.

The crossover to sham treatment did not go as planned. Of the nine (three responders and six nonresponders) randomized to the active-then-sham condition, all had to be crossed over early – one because the patient requested a crossover, two because of a gradual increase in symptoms, and three because of logistics. Of the seven patients assigned to sham-first (four responders and three nonresponders), all had to be crossed over within a day because of increases in symptoms.

I don’t want to get lost in the details here. But we are getting into small numbers with nonrandom attrition, imbalanced assignment of responders versus nonresponders in the randomization, and the breakdown of the planned sham treatment. From what I’ve read elsewhere about DBS, I don’t think that providers or patients were blinded to the sham treatment. Patients should be able to feel the shutting off of the stimulator.

Adverse events. DBS has safety issues. Serious adverse events included severe nausea during surgery (1 patient), suicide attempt (4 patients), and suicidal ideation (2 patients). Two nonresponders died several weeks after they withdrew from the study and DBS had been stopped (1 suicide, 1 euthanasia). Two patients developed full blown mania during treatment and another patient became hypomanic.

The article’s Discussion claims

We found a significant reduction of depressive symptoms following vALIC DBS, resulting in response in 10 patients (40%) and partial response in 6 (24%) patients with TRD.

Remission was achieved in 5 (20%) patients. The randomized active-sham phase study design indicates that reduction of depressive symptoms cannot be attributed to placebo effects…

Conclusions

This trial shows efficacy of DBS in patients with TRD and supports the possible benefits of DBS despite a previous disappointing randomized clinical trial. Further specification of targets and the most accurate setting optimization as well as larger randomized clinical trials are necessary.

A clinical trial starting with 25 patients does not have much potential to shift our confidence in the efficacy of DBS. Any hope of doing so was further dashed when the sample was reduced to the 16 patients who remained for the investigators’ attempted randomization to an active treatment versus sham comparison (seven responders and nine nonresponders). Then the sham condition could not be maintained as planned in the protocol for any of the patients.

The authors interpreted the immediate effects of shifting to sham treatment as ruling out any placebo effect. However, it is likely that shutting off the stimulator was noticeable to the patients, and the immediacy of the effects speaks to the likelihood of an effect due to the strong expectations of patients with intolerable depression having their hope taken away. Some of the immediate response could have been a nocebo response.

Helen Mayberg and colleagues’ invited commentary

The commentary attempted to discourage a pessimistic assessment of DBS based on the difficulties implementing the original plans for the study as described in the protocol.

A cynical reading of the study by Bergfeld et al1 might lead to the conclusion that the labor-intensive and expert-driven tuning of the DBS device required for treatment response makes this a nonviable clinical intervention for TRD. On the contrary, we see a tremendous opportunity to retrospectively characterize the various features that best define patients who responded well to this treatment. New studies could test these variables prospectively.

The substantial deviation from protocol that occurred after only two patients were entered into the trial was praised in terms of the authors’ “tenacious attempts to establish a stable response”:

We appreciate the reality of planning a protocol with seemingly conservative time points based on the initial patients, only to find these time points ultimately to be insufficient. The authors’ tenacious attempts to establish a stable response by extending the optimization period from the initial protocol using 3 to 6 months to a full year is commendable and provides critical information for future trials.

Maybe, but I think the need for this important change, along with the other difficulties encountered in implementing the study, speaks to a randomized controlled trial of DBS being premature.

Conflict of Interest Disclosures: Dr Mayberg has a paid consulting agreement with St Jude Medical Inc, which licensed her intellectual property to develop deep brain stimulation for the treatment of severe depression (US 2005/0033379A1). The terms of this agreement have been reviewed and approved by Emory University in accordance with their conflict of interest policies. No other disclosures were reported.

Helen Mayberg’s declaration of interest clearly identifies her as someone who is not a detached observer, but who would benefit financially and professionally from any strengthening of the claims for the efficacy of DBS. We are alerted by this declaration, but I think there were some things not mentioned in the article or editorial about Helen Mayberg’s work that would influence her credibility even more if they were known.

Helen Mayberg’s anecdotes and statistics about the success of DBS

Mayberg has been attracting attention for over a decade with her contagious exuberance for DBS. A 2006 article in the New York Times by David Dobbs started with a compelling anecdote of one of Mayberg’s patients being able to resume a normal life after previous ineffective treatments for severe depression. The story reported success with 8 of 12 patients treated with DBS:

They’ve re-engaged their families, resumed jobs and friendships, started businesses, taken up hobbies old and new, replanted dying gardens. They’ve regained the resilience that distinguishes the healthy from the depressed.

Director of NIMH Tom Insel chimed in:

“People often ask me about the significance of small first studies like this,” says Dr. Thomas Insel, who as director of the National Institute of Mental Health enjoys an unparalleled view of the discipline. “I usually tell them: ‘Don’t bother. We don’t know enough.’ But this is different. Here we know enough to say this is something significant. I really do believe this is the beginning of a new way of understanding depression.”

A 2015 press release from Emory University, Targeting depression with deep brain stimulation, gives another anecdote of a dramatic treatment success.

Okay, we know to be skeptical about university press releases, but then there are the dramatic anecdotes and even numbers in a news article in Science, Short-Circuiting Depression, that borders on an infomercial for Mayberg’s work.


Since 2003, Mayberg and others have used DBS in area 25 to treat depression in more than 100 patients. Between 30% and 40% of patients do “extremely well”—getting married, going back to work, and reclaiming their lives, says Sidney Kennedy, a psychiatrist at Toronto General Hospital in Canada who is now running a DBS study sponsored by the medical device company St. Jude Medical. Another 30% show modest improvement but still experience residual depression. Between 20% and 25% do not experience any benefit, he says. People contemplating brain surgery might want better odds, but patients with extreme, relentless depression often feel they have little to lose. “For me, it was a last resort,” Patterson says.

By making minute adjustments in the positions of the electrodes, Mayberg says, her team has gradually raised its long-term response rates to 75% to 80% in 24 patients now being treated at Emory University.

A chronically depressed person, or someone who cares for someone who is depressed, might be motivated to go on the Internet and try to find more information about Mayberg’s trial. A website for Mayberg’s BROADEN (BROdmann Area 25 DEep brain Neuromodulation) study once provided a description of the study, answers to frequently asked questions, and an opportunity to register for screening for the study. However, it’s no longer accessible through Google or other search engines. You can reach an archived version of the website with a link provided by Neurocritic, but the links on it are no longer functional.

Neurocritic’s blog posts about Mayberg and DBS

If you are lucky, a Google search for Mayberg deep brain stimulation might bring you to any of three blog posts by Neurocritic [1] [2] [3] that have rich links and provide a very different story of Mayberg and DBS.

One link takes you to the trial registration for Mayberg’s BROADEN study: A Clinical Evaluation of Subcallosal Cingulate Gyrus Deep Brain Stimulation for Treatment-Resistant Depression. The updated registration record indicates that the study will end in September 2017, and that the study is ongoing but not recruiting participants.

This information should have been updated, as should other publicity about Mayberg’s BROADEN study. Namely, as Neurocritic documents, the company attempting to commercialize DBS by funding the study, St. Jude Medical, terminated the trial after futility analyses indicated that further enrollment of patients had only a 17% probability of achieving a significant effect. At the point of termination, 125 patients had been enrolled.

Neurocritic also provides a link to an excellent, open access review paper:

Morishita T, Fayad SM, Higuchi MA, Nestor KA, Foote KD. Deep brain stimulation for treatment-resistant depression: systematic review of clinical outcomes. Neurotherapeutics. 2014 Jul 1;11(3):475-84.

The article reveals that although there are 22 published studies of DBS for treatment-resistant depression, only three are randomized trials, one of which was completed with null results. Two – including Mayberg’s BROADEN trial – were discontinued because futility analyses indicated that a finding of efficacy for the treatment was unlikely.

Finally, Neurocritic also provides a link to a Neurotech Business Report, Depressing Innovation:

The news that St. Jude Medical failed a futility analysis of its BROADEN trial of DBS for treatment of depression cast a pall over an otherwise upbeat attendance at the 2013 NANS meeting [see Conference Report, p7]. Once again, the industry is left to pick up the pieces as a promising new technology gets set back by what could be many years.

It’s too early to assess blame for this failure. It’s tempting to wonder if St. Jude management was too eager to commence this trial, since that has been a culprit in other trial failures. But there’s clearly more involved here, not least the complexity of specifying the precise brain circuits involved with major depression. Indeed, Helen Mayberg’s own thinking on DBS targeting has evolved over the years since the seminal paper she and colleague Andres Lozano published in Neuron in 2005, which implicated Cg25 as a lucrative target for depression. Mayberg now believes that neuronal tracts emanating from Cg25 toward medial frontal areas may be more relevant [NBR Nov13 p1]. Research that she, Cameron McIntyre, and others are conducting on probabilistic tractography to identify the patient-specific brain regions most relevant to the particular form of depression the patient is suffering from will likely prove to be very fruitful in the years ahead.

So, we have a heavily hyped unproven treatment for which the only clinical trials have either been null or terminated following a futility analysis. Helen Mayberg, a patent holder associated with one of these trials, was an inappropriate choice to be recruited for commentary on another, more modestly sized trial that also ran into numerous difficulties suggesting it was premature. Moreover, I find it outrageous that so little effort has been made to correct the record concerning her BROADEN trial or even to acknowledge its closing in the JAMA: Psychiatry commentary.

Untold numbers of depressed patients who don’t get expected benefits from available treatments are being misled with false hope from anecdotes and statistics from a trial that was ultimately terminated.

I find troubling what my exercise showed might happen when someone motivated by skepticism goes to the Internet and tries to get additional information about the JAMA: Psychiatry paper. They could be careful to rely only on seemingly credible sources – a trial registration and a Science article. The Science article is not peer-reviewed but nonetheless carries the credibility conveyed by appearing in the premier and respected Science. The trial registration has not been updated with crucial information, and the Science article gives no indication of how it is contradicted by better-quality evidence. So, they would be misled.

 

 

Is risk of Alzheimer’s Disease reduced by taking a more positive attitude toward aging?

Unwarranted claims that “modifiable” negative beliefs cause Alzheimer’s disease lead to blaming persons who develop Alzheimer’s disease for not having been more positive.

Lesson: A source’s impressive credentials are no substitute for independent critical appraisal of what sounds like junk science and is.

More lessons on how to protect yourself from dodgy claims in press releases of prestigious universities promoting their research.

If you judge the credibility of health-related information based on the credentials of the source, this article is a clear winner:

Levy BR, Ferrucci L, Zonderman AB, Slade MD, Troncoso J, Resnick SM. A Culture–Brain Link: Negative Age Stereotypes Predict Alzheimer’s Disease Biomarkers. Psychology and Aging. Dec 7, 2015, No Pagination Specified. http://dx.doi.org/10.1037/pag0000062


As noted in the press release from Yale University, two of the authors are from Yale School of Medicine, another is a neurologist at Johns Hopkins School of Medicine, and the remaining three authors are from the US National Institute on Aging (NIA), including NIA’s Scientific Director.

The press release Negative beliefs about aging predict Alzheimer’s disease in Yale-led study declared:

“Newly published research led by the Yale School of Public Health demonstrates that individuals who hold negative beliefs about aging are more likely to have brain changes associated with Alzheimer’s disease.

“The study suggests that combatting negative beliefs about aging, such as elderly people are decrepit, could potentially offer a way to reduce the rapidly rising rate of Alzheimer’s disease, a devastating neurodegenerative disorder that causes dementia in more than 5 million Americans.

The press release posited a novel mechanism:

“We believe it is the stress generated by the negative beliefs about aging that individuals sometimes internalize from society that can result in pathological brain changes,” said Levy. “Although the findings are concerning, it is encouraging to realize that these negative beliefs about aging can be mitigated and positive beliefs about aging can be reinforced, so that the adverse impact is not inevitable.”

A Google search reveals over 40 stories about the study in the media. Provocative titles of the media coverage suggest a children’s game of telephone or Chinese whispers in which distortions accumulate with each retelling.

Negative beliefs about aging tied to Alzheimer’s (Waltonian)

Distain for the elderly could increase your risk of Alzheimer’s (FinancialSpots)

Lack of respect for elderly may be fueling Alzheimer’s epidemic (Telegraph)

Negative thoughts speed up onset of Alzheimer’s disease (Tech Times)

Karma bites back: Hating on the elderly may put you at risk of Alzheimer’s (LA Times)

How you feel about your grandfather may affect your brain health later in life (Men’s Health News)

Young people pessimistic about aging more likely to develop Alzheimer’s later on (Health.com)

Looking forward to old age can save you from Alzheimer’s (Canonplace News)

If you don’t like old people, you are at higher risk of Alzheimer’s, study says (RedOrbit)

If you think elderly people are icky, you’re more likely to get Alzheimer’s (HealthLine)

In defense of the authors of this article as well as journalists, it is likely that editors added the provocative titles without obtaining approval of the authors or even the journalists writing the articles. So, let’s suspend judgment and write off sometimes absurd titles to editors’ need to establish they are offering distinctive coverage, when they are not necessarily doing so. That’s a lesson for the future: if we’re going to criticize media coverage, better focus on the content of the coverage, not the titles.

However, a number of these stories have direct quotes from the study’s first author. Unless the media coverage is misattributing direct quotes to her, she must have been making herself available to the media.

Was the article such an important breakthrough offering new ways in which consumers could take control of their risk of Alzheimer’s by changing beliefs about aging?

No, not at all. In the following analysis, I’ll show that judging the credibility of claims based on the credentials of the sources can be seriously misleading.

What is troubling about this article and its well-organized publicity effort is that information is being disseminated that is misleading and potentially harmful, with the prestige of Yale and NIA attached.

Before we go any further, you can take your own look at a copy of the article in the American Psychological Association journal Psychology and Aging here, the Yale University press release here, and a fascinating post-publication peer review at PubPeer that I initiated as peer 1.

Ask yourself: if you encountered coverage of this article in the media, would you have been skeptical? If so, what were the clues?

Spoiler ahead. The article is yet another example of trusted authorities exploiting entrenched cultural beliefs about the mind-body connection being able to be harnessed in some mysterious way to combat or prevent physical illness. As Ann Harrington details in her wonderful book, The Cure Within, this psychosomatic hypothesis has a long and checkered history, and gets continually reinvented and misapplied.

We see an example of this in claims that attitude can conquer cancer. What’s the harm of such illusions? If people can be led to believe they have such control, they are set up for blame from themselves and from those around them when they fail to fend off and control the outcome of disease by sheer mental power.

The myth of “fighting spirit” overcoming cancer has survived despite the accumulation of excellent contradictory evidence. Cancer patients are vulnerable to blaming themselves or to being blamed by loved ones when they do not “win” the fight against cancer. They are also subject to unfair exhortations to fight harder as their health situation deteriorates.

[Image: composite from the satirical Onion]

 What I saw when I skimmed the press release and the article

  • The first alarm went off when I saw that causal claims were being made from a modest sized correlational study. This should set off anyone’s alarms.
  • The press release and the discussion section of the article refer to this as a “first ever” study. One does not seek nor expect to find robust “first ever” discoveries in such a small data set.
  • The authors do not provide evidence that their key measure of “negative stereotypes” is a valid measure of either stereotyping or likelihood of experiencing stress. They don’t even show it is related to concurrent reports of stress.
  • Like a lot of measures with a negative tone to items, this one is affected by what Paul Meehl calls the crud factor. Whatever is being measured in this study cannot be distinguished from a full range of confounds that are not even assessed in this study.
  • The mechanism by which effects of this self-report measure somehow get manifested in changes in the brain lacks evidence and is highly dubious.
  • There was no presentation of actual data or basic statistics. Instead, there were only multivariate statistics that require at least some access to basic statistics for independent evaluation.
  • The authors resorted to cheap statistical strategies to indulge their confirmation bias and fool readers: reliance on one-tailed rather than two-tailed tests of significance; use of a discredited backwards elimination method for choosing control variables; and exploring too many control/covariate variables, given their modest sample size.
  • The analyses that are reported do not accurately depict what is in the data set, nor generalize to other data sets.

The article

The authors develop their case that stress is a significant cause of Alzheimer’s disease with reference to some largely irrelevant studies by others, but depend on a preponderance of studies that they themselves have done with the same dubious small samples and dubious statistical techniques. Whether you do a casual search with Google scholar or a more systematic review of the literature, you won’t find stress processes of the kind the authors invoke among the usual explanations of the development of the disease.

Basically, the authors are arguing that if you hold views of aging like “Old people are absent-minded” or “Old people cannot concentrate well,” you will experience more stress as you age, and this will accelerate development of Alzheimer’s disease. They then go on to argue that because these attitudes are modifiable, you can take control of your risk for Alzheimer’s by adopting a more positive view of aging and aging people.

The authors used their measure of negative aging stereotypes in other studies, but do not provide the usual evidence of convergent and discriminant validity needed to establish that the measure assesses what is intended. Basically, we should expect authors to show that a measure they have developed is related to existing measures in ways one would expect (convergent validity), but not related to existing measures with which it should have no association (discriminant validity).

Psychology has a long history of researchers claiming that their “new” self-report measures containing negatively toned items assess distinct concepts, despite high correlations with other measures of negative emotion as well as lots of confounds. I poked fun at this unproductive tradition in a presentation, Negative emotions and health: why do we keep stalking bears, when we only find scat in the woods?

The article reported two studies. The first tested whether participants holding more negative age stereotypes would have significantly greater loss of hippocampal volume over time. The study involved 52 individuals selected from a larger cohort enrolled in the brain-neuroimaging program of the Baltimore Longitudinal Study of Aging.

Readers are given none of the basic statistics that would be needed to interpret the complex multivariate analyses. Ideally, we would be given an opportunity to see how the independent variable, negative age stereotypes, is related to other data available on the subjects, so that we could get some sense of whether we are starting with basic, meaningful associations.

Instead the authors present the association between negative age stereotyping and hippocampal volume only in the presence of multiple control variables:

Covariates consisted of demographics (i.e., age, sex, and education) and health at time of baseline-age-stereotype assessment, (number of chronic conditions on the basis of medical records; well-being as measured by a subset of the Chicago Attitude Inventory); self-rated health, neuroticism, and cognitive performance, measured by the Benton Visual Retention Test (BVRT; Benton, 1974).

Readers cannot tell why these variables and not others were chosen. Adding or dropping a few variables could produce radically different results. But there are just too many variables being considered: with only 52 research participants, spurious findings that do not generalize to other samples are highly likely.

I was astonished when the authors announced that they were relying on one-tailed statistical tests. This is widely condemned as unnecessary and misleading.

Basically, every time the authors report a significance level in this article, you need to double the number to get what would be obtained with a more conventional two-tailed test. So, if they proudly declare that results are significant at p = .046, then the results are actually (non)significant, p = .092. I know, we should not make such a fuss about significance levels, but journals do. We’re being set up to be persuaded the results are significant, when they are not by conventional standards.
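To make the doubling concrete, here is a minimal sketch (mine, not the authors’ analysis or code) using the t statistic and degrees of freedom that the article reports for its second study (t = 1.71, df = 59):

```python
# A minimal sketch of why a one-tailed p-value is half the conventional
# two-tailed one. The t statistic and degrees of freedom are simply the
# values reported in the article; nothing here comes from the authors'
# own code or data.
from scipy import stats

t_stat, df = 1.71, 59

p_one_tailed = stats.t.sf(t_stat, df)            # area in one tail only
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)   # conventional two-tailed test

print(f"one-tailed p = {p_one_tailed:.3f}")      # ~0.046, 'significant'
print(f"two-tailed p = {p_two_tailed:.3f}")      # ~0.092, not significant at .05
```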

So, the authors’ accumulating sins against proper statistical techniques and transparent reporting: no presentation of basic associations; reporting one-tailed tests; use of multivariate statistics inappropriate for a sample that is so small. Now let’s add another one: in their multivariate regressions, the authors relied on a potentially deceptive backward elimination:

Backward elimination, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) that improves the model the most by being deleted, and repeating this process until no further improvement is possible.

The authors assembled their candidate control/covariate variables and used a procedure that checks them statistically and drops some from consideration, based on whether they fail to add to the significance of the overall equation. This procedure is condemned because the variables that are retained in the equation capitalize on chance. Particular variables that could be theoretically relevant are eliminated simply because they fail to add anything statistically in the context of the other variables being considered. In the context of a different set of variables, these same discarded variables would have been retained.

The final regression equation had fewer control/covariates than when the authors started. Statistical significance will be calculated on the basis of the small number of variables remaining, not the number that were picked over, and so results will artificially appear stronger. Again, this is potentially quite misleading to the unwary reader.
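To see why this capitalizes on chance, here is a minimal simulation sketch (mine, not the authors’ code and not their data): the sample size of 52 mirrors the first study, all candidate covariates are pure noise, and yet backward elimination still regularly retains some of them and reports them as “significant” in the final model.

```python
# A minimal simulation sketch of backward elimination capitalizing on chance.
# All covariates are pure noise, yet the winnowed final model frequently
# contains a covariate with p < .05. Illustrative only; not the authors' analysis.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, k, n_sims = 52, 8, 500                      # n mirrors the first study's sample size

def backward_eliminate(X, y, drop_if_p_above=0.10):
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = fit.pvalues[1:]                # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= drop_if_p_above:    # nothing left worth deleting
            return cols, pvals
        cols.pop(worst)                        # drop the least significant covariate
    return [], np.array([])

false_hits = 0
for _ in range(n_sims):
    X = rng.normal(size=(n, k))                # candidate covariates: pure noise
    y = rng.normal(size=n)                     # outcome unrelated to any of them
    cols, pvals = backward_eliminate(X, y)
    if len(pvals) and pvals.min() < 0.05:      # a noise covariate looks 'significant'
        false_hits += 1

print(f"simulations where a pure-noise covariate survives as 'significant': "
      f"{false_hits / n_sims:.0%}")
```

The rate printed at the end runs far above the nominal 5%, which is the whole problem: the survivors of the winnowing look much stronger than they are.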

The authors nonetheless concluded:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had a significantly steeper decline in hippocampal volume

The second study:

examined whether participants holding more negative age stereotypes would have significantly greater accumulation of amyloid plaques and neurofibrillary tangles.

The outcome was a composite-plaques-and-tangles score and the predictor was the same negative age stereotypes measure from the first study. These measurements were obtained from 74 research participants upon death and autopsy. The same covariates were used in stepwise regression with backward elimination. Once again, the statistical test was one-tailed.

Results were:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had significantly higher composite-plaques-and-tangles scores, t(1,59) = 1.71 p = .046, d = 0.45, adjusting for age, sex, education, self-rated health, well-being, and number of chronic conditions.

Aha! Now we see why the authors committed themselves to a one-tailed test. With a conventional two-tailed test, these results would not be significant. Given a prevailing confirmation bias, aversion to null findings, and obsession with significance levels, this article probably would not have been published without the one-tailed test.

The authors’ stirring overall conclusion from the two studies:

By expanding the boundaries of known environmental influences on amyloid plaques, neurofibrillary tangles, and hippocampal volume, our results suggest a new pathway to identifying mechanisms and potential interventions related to Alzheimer’s disease

PubPeer discussion of this paper [https://pubpeer.com/publications/16E68DE9879757585EDD8719338DCD]

Comments accumulated for a couple of days on PubPeer after I posted some concerns about the first study. All of the comments were quite smart; some directly validated points that I had been thinking about, but others took the discussion in new directions, either statistically or because the commentators knew more about neuroscience.

Using a mechanism available at PubPeer, I sent emails to the first author of the paper, the statistician, and one of the NIA personnel inviting them to make comments also. None have responded so far.

Tom Johnstone, a commentator who exercised the option of identifying himself, noted the reliance on inferential statistics in the absence of reporting basic relationships. He also noted that the criterion used to drop covariates was lax. Apparently familiar with neuroscience, he expressed doubts that the results had any clinical significance or relevance to the functioning of the research participants.

Another commentator complained of the small sample size, use of one-tailed statistical tests without justification, the “convoluted list of covariates,” and the “taboo” strategy for selecting covariates to be retained in the regression equation. This commentator also noted that the authors had examined the effect of outliers, conducting analyses both with and without the inclusion of the most extreme case. While it didn’t affect the overall results, exclusion dramatically changed the significance level, highlighting the susceptibility of such a small sample to chance variation or sampling error.
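That susceptibility is easy to demonstrate for yourself. Here is a minimal simulation sketch (mine, not the commentator’s actual reanalysis): the sample size is in the range of the second study, a modest true association is built in, and you can watch how much the p-value moves when the single most extreme case is dropped. Run it with different seeds to see the instability.

```python
# A minimal sketch of how fragile p-values are in small samples:
# drop the single most extreme case and see how much the correlation
# and its significance level move. Illustrative only; simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 74                                        # roughly the second study's sample size
x = rng.normal(size=n)
y = 0.25 * x + rng.normal(size=n)             # a modest true association

r_all, p_all = stats.pearsonr(x, y)

extreme = int(np.argmax(np.abs(stats.zscore(x))))   # most extreme predictor value
keep = np.delete(np.arange(n), extreme)
r_drop, p_drop = stats.pearsonr(x[keep], y[keep])

print(f"all cases:            r = {r_all:.2f}, p = {p_all:.3f}")
print(f"most extreme removed: r = {r_drop:.2f}, p = {p_drop:.3f}")
```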

Who gets the blame for misleading claims in this article?

There’s a lot of blame to go around. By exaggerating the size and significance of any effects, the first author increases the chance of publication and also of further funding to pursue what is seen as a “tantalizing” association. But it’s the job of editors and peer reviewers to protect the readership from such exaggerations, and maybe to protect the author from herself. They failed, maybe because exaggerated findings are consistent with the journal’s agenda of increasing citations by publishing newsworthy rather than trustworthy findings. The study statistician, Martin Slade, obviously knew that misleading, less than optimal statistics were used; why didn’t he object? Finally, I think the NIA staff, particularly Luigi Ferrucci, the Scientific Director of NIA, should be singled out for the irresponsibility of attaching their names to such misleading claims. Why did they do so? Did they not read the manuscript? I will regularly present instances of NIH staff endorsing dubious claims, such as here. The mind-over-disease, psychosomatic hypothesis gets a lot of support not warranted by the evidence. Perhaps NIH officials in general see this as a way of attracting research monies from Congress. Regardless, I think NIH officials have the responsibility to see that consumers are not misled by junk science.

This article at least provided the opportunity for an exercise that should raise skepticism and convince consumers at all levels – other researchers, clinicians, policymakers, those who suffer from Alzheimer’s disease, and those who care for them – that we just cannot sit back and let trusted sources do our thinking for us.

 

Should have seen it coming: Once high-flying Psychological Science article lies in pieces on the ground

Life is too short for wasting time probing every instance of professional organizations promoting bad science when they have an established record of doing just that.

There were lots of indicators that that’s what we were dealing with in the Association for Psychological Science’s (APS) recent campaign for the now discredited and retracted ‘sadness prevents us from seeing blue’ article.

A quick assessment of the press release should have led us to dismiss the claims being presented and convinced us to move on.

Readers can skip my introductory material by jumping down this blog post to [*] to see my analysis of the APS press release.

Readers can also still access the original press release, which has now disappeared from the web, here. Some may want to read the press release and form their own opinions before proceeding into this blog post.

What, I’ve stopped talking about the PACE trial? Yup, at least at Mind the Brain, for now. But you can go here for the latest in my continued discussion of the PACE trial of CBT for chronic fatigue syndrome, in which I moved from critical observer to activist a while ago.

Before we were so rudely interrupted by the bad science and bad media coverage of the PACE trial, I was focusing on how readers can learn to make quick assessments of hyped media coverage of dubious scientific studies.

In “Sex and the single amygdala”  I asked:

Can skeptics who are not specialists, but who are science-minded and have some basic skills, learn to quickly screen and detect questionable science in the journals and its media coverage?

The counter argument of course is Chris Mooney telling us “You Have No Business Challenging Scientific Experts”. He cites

“Jenny McCarthy, who once remarked that she began her autism research at the “University of Google.”

But while we are on the topic of autism, how about the counter example of The Lancet’s coverage of the link between vaccines and autism? This nonsense continues to take its toll on American children whose parents – often higher income and more educated than the rest – refused to vaccinate them on the basis of a story that started in The Lancet. Editor Richard Horton had to concede

[Image: Richard Horton’s concession regarding The Lancet’s handling of the retracted autism paper]

If we accept Chris Mooney‘s position, we are left at the mercy of press releases cranked out by professional organizations like the Association for Psychological Science (APS) that repeatedly demand that we revise our thinking about human nature and behavior, as well as change our behavior if we want to extend our lives and live happier, all on the basis of a single “breakthrough” study. Rarely do APS press releases have any follow-up as to the fate of a study they promoted. One has to hope that PubPeer or PubMed Commons picks up on the article touted in the press release and sees what a jury of post-publication peers decides.

As we have seen in my past Mind the Brain posts, there are constant demands on our attention from press releases generated from professional organizations, university press officers, and even NIH alerting us to supposed breakthroughs in psychological and brain science. Few such breakthroughs hold up over time.

Are there no alternatives?

Are there no alternatives to our simply deferring to the expertise being offered or taking the time to investigate for ourselves claims that are likely to prove exaggerated or simply false?

We should approach press releases from the APS – or from its rival American Psychological Association – using prior probabilities to set our expectations. The Open Science Collaboration: Psychology (OSC) article  in Science presented results of a systematic attempt to replicate 100 findings from prestigious psychological journals, including APS’ s Psychological Science and APA’s Journal of Personality and Social Psychology. Less than half of the findings were replicated. Findings from the APS and APA journals fared worse than the others.

So, our prior probabilities are that declarations of newsworthy, breakthrough findings trumpeted in press releases from psychological organizations are likely to be false or exaggerated – unless we assume that the publicity machines prefer the trustworthy over the exciting and newsworthy in the articles they select to promote.

I will guide readers through a quick assessment of the APS press release, which I had started on for this post before getting swept up in the PACE controversy. However, in the intervening time there have been some extraordinary developments, which I will then briefly discuss. We can use these developments to validate my evaluation – and yours – of the press release that was available earlier. Surprisingly, there is little overlap between the issues I note in the press release and what concerned post-publication commentators.

*A running commentary based on screening the press release

What once was a link to the “feeling blue and seeing blue” article now takes one only to

[Image: the retraction notice that replaced the original press release]

Fortunately, the original press release can still be reached here. The original article is preserved here.

My skepticism was already high after I read the opening two paragraphs of the press release:

The world might seem a little grayer than usual when we’re down in the dumps and we often talk about “feeling blue” — new research suggests that the associations we make between emotion and color go beyond mere metaphor. The results of two studies indicate that feeling sadness may actually change how we perceive color. Specifically, researchers found that participants who were induced to feel sad were less accurate in identifying colors on the blue-yellow axis than those who were led to feel amused or emotionally neutral.

“Our results show that mood and emotion can affect how we see the world around us,” says psychology researcher Christopher Thorstenson of the University of Rochester, first author on the research. “Our work advances the study of perception by showing that sadness specifically impairs basic visual processes that are involved in perceiving color.”

What Anglocentric nonsense. First, blue as a metaphor for sad does not occur across most languages other than English and Serbian. In German, to call someone blue is suggesting the person is drunk. In Russian, you are suggesting that the person is gay. In Arabic, if you say you are having a blue day, it is a bad one. But if you say in Portuguese that “everything is blue”, it suggests everything is fine.

In Indian culture, blue is more associated with happiness than sadness, probably traceable to the blue-blooded Krishna being associated with divine and human love in Hinduism. In Catholicism, the Virgin Mary is often wearing blue and so the color has come to be associated with calmness and truth.

We are off to a bad start. Going to the authors’ description of their first of two studies, we learn:

In one study, the researchers had 127 undergraduate participants watch an emotional film clip and then complete a visual judgment task. The participants were randomly assigned to watch an animated film clip intended to induce sadness or a standup comedy clip intended to induce amusement. The emotional effects of the two clips had been validated in previous studies and the researchers confirmed that they produced the intended emotions for participants in this study.

Oh no! This is not a study of clinical depression, but another study of normal college students “made sad” with a mood induction.

So-called mood induction tasks don’t necessarily change actual mood state, but they do convey to research participants what is expected of them and how they are supposed to act. In one of the earliest studies I ever did, we described a mood induction procedure to subjects without actually having them experience it. We then asked them to respond as if they had received it. Their responses were indistinguishable from those of participants who had actually undergone the induction. We concluded that we could not rule out that what were considered effects of a mood induction task were simply demand characteristics, what research participants perceive as instructions as to how they should behave.

It was fashionable way back then for psychology researchers who were isolated in departments that did not have access to clinically depressed patients to claim that they were nonetheless conducting analog studies of depression. Subjecting students to an unsolvable anagram task or uncontrollable loud noises was seen as inducing learned helplessness in them, thereby allowing investigators an analog study of depression. We demonstrated a problem with that idea. If students believed that the next task that they were administered was part of the same experiment, they performed poorly, as if they were in a state of learned helplessness or depression. However, if they believed that the second task was unrelated to the first, they would show no such deficits. Their negative state of helplessness or depression was confined to their performance in what they thought was the same setting in which the induction had occurred. Shortly after our experiments, Marty Seligman wisely stopped doing studies “inducing” learned helplessness in humans, but he continued to make the same claims about the studies he had done.

Analog studies of depression disappeared for a while, but I guess they have come back into fashion.

But the sad/blue experiment could also be seen as a priming experiment. The research participants were primed by the film clip and their response to a color naming task was then examined.

It is fascinating that neither the press release nor the article itself ever mentioned the word priming. It was only a few years ago that APS press releases were crowing about priming studies. For instance, a 2011 press release entitled “Life is one big priming experiment…” declared:

One of the most robust ideas to come out of cognitive psychology in recent years is priming. Scientists have shown again and again that they can very subtly cue people’s unconscious minds to think and act certain ways. These cues might be concepts—like cold or fast or elderly—or they might be goals like professional success; either way, these signals shape our behavior, often without any awareness that we are being manipulated.

Whoever wrote that press release should be embarrassed today. In the interim, priming effects have not proven robust. Priming studies that cannot be replicated have figured heavily in the assessment that the psychological literature is untrustworthy. Priming studies also figure heavily in the 56 retracted studies of fraudster psychologist Diederik Stapel. He claims that he turned to inventing data when his experiments failed to demonstrate priming effects that he knew were there. Yet, once he resorted to publishing studies with fabricated data, others claimed to replicate his work.

I made up research, and wrote papers about it. My peers and the journal editors cast a critical eye over it, and it was published. I would often discover, a few months or years later, that another team of researchers, in another city or another country, had done more or less the same experiment, and found the same effects.  My fantasy research had been replicated. What seemed logical was true, once I’d faked it.

So, we have an APS press release reporting a study that assumes the association between sadness and the color blue is so hardwired and culturally universal that it is reflected in basic visual processes. Yet the study does not involve clinical depression, only an analog mood induction, and a closer look reveals that once again APS is pushing a priming study. I think it’s time to move on. But let’s read on:

The results cannot be explained by differences in participants’ level of effort, attention, or engagement with the task, as color perception was only impaired on the blue-yellow axis.

“We were surprised by how specific the effect was, that color was only impaired along the blue-yellow axis,” says Thorstenson. “We did not predict this specific finding, although it might give us a clue to the reason for the effect in neurotransmitter functioning.”

The researchers note that previous work has specifically linked color perception on the blue-yellow axis with the neurotransmitter dopamine.

The press release tells us that the finding is very specific, occurring only on the blue-yellow axis, not the red-green axis, and that differences were not found in level of effort, attention, or engagement with the task. The researchers did not expect such a specific finding; they were surprised.

The press release wants to convince us of an exciting story of novelty and breakthrough. A skeptic sees it differently: this is an isolated finding, unanticipated by the researchers, getting all dressed up. See, we should’ve moved on.

The press release wants to persuade us the evidence is exciting because it is specific and novel. The researchers are celebrating the specificity of their finding, but the blue-yellow axis finding may be the only statistically significant one precisely because it is due to chance or an artifact.

And bringing up unmeasured “neurotransmitter functioning” is pretentious and unwise. I challenge the researchers to show that the effects of watching a brief movie clip register as measurable changes in neurotransmitters. I’m skeptical even about whether depressed persons drawn from the community or outpatient samples reliably differ from non-depressed persons in measures of the neurotransmitter dopamine.

“This is new work and we need to take time to determine the robustness and generalizability of this phenomenon before making links to application,” he concludes.

Claims in APS press releases are not known for their “robustness and generalizability.” I don’t think this particular claim should prompt an effort at independent replication when scientists have so many more useful things to keep them busy.

Maybe, these investigators should have checked robustness and generalizability before rushing into print. Maybe APS should stop pestering us with findings that surprise researchers and that have not yet been replicated.

A flying machine in pieces on the ground

Sadness impairs color perception was sent soaring high, lifted by an APS press release now removed from the web, but that is still available here. The press release was initially echoed uncritically, usually cut-and-pasted or outright churnalled, in over two dozen media mentions.

But, alas, Sadness impairs color perception is now a flying machine in pieces on the ground 

Notice of the article’s problems seems to have started with some chatter among skeptically-minded individuals on Twitter, which led to comments at PubPeer where the article was torn to pieces. What unfolded was a wonderful demonstration of crowdsourced post-publication peer review in action. Lesson: PubPeer rocks and can overcome the failures of pre-publication peer review to keep bad stuff out of the literature.

You can follow the thread of comments at PubPeer.

  • An anonymous skeptic started off by pointing out an apparent lack of a significant statistical effect where one was claimed.
  • There was an immediate call for a retraction, but it seemed premature.
  • Soon re-analyses of the data from the paper were being reported, confirming the lack of a significant statistical effect when analyses were done appropriately and reported transparently.
  • The data set for the article was mysteriously changed after it had been uploaded.
  • Doubts were expressed about the integrity of the data – had they been tinkered with?
  • The data disappeared.
  • There was an announcement of a retraction.

The retraction notice  indicated that the researchers were still convinced of the validity of their hypothesis, despite deciding to retract their paper.

We remain confident in the proposition that sadness impairs color perception, but would like to acquire clearer evidence before making this conclusion in a journal the caliber of Psychological Science.

The retraction note also carries a curious Editor’s note:

Although I believe it is already clear, I would like to add an explicit statement that this retraction is entirely due to honest mistakes on the part of the authors.

Since then, doubts have been expressed about whether retraction was a sufficient response or whether something more is needed. Some of the participants in the PubPeer discussion drafted a letter to the editor incorporating their reanalyses and prepared to submit it to Psychological Science. Unfortunately, having succeeded in getting the bad science retracted, these authors reduced the likelihood of their reanalysis being accepted by Psychological Science. As of this date, their fascinating account remains unpublished but available on the web.

Postscript

Next time you see an APS or APA press release, what will be your starting probabilities about the trustworthiness of the article being promoted? Do you agree with Chris Mooney that you should simply defer to the expertise of the professional organization?

Why would professional organizations risk embarrassment with these kinds of press releases? Apparently they are worth the risk. Such press releases can echo through the conventional and social media and attract early attention to an article. The game is increasing the journal impact factor (JIF).

Although it is unclear precisely how journal impact factors are calculated, the number reflects the average number of citations an article obtains within two years of publication. However, if press releases promote “early releases” of articles, the journal can acquire citations before the clock starts ticking for the two years. APS and APA are in intense competition for the prestige of their journals and for membership. It matters greatly to them which organization can claim the most prestigious journals, as demonstrated by their JIFs.
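For reference, the standard published definition is roughly the following (a simplified statement of the formula, not an account of how the citation counts are actually audited):

\[
\mathrm{JIF}_{Y} \;=\; \frac{\text{citations received in year } Y \text{ to items published in years } Y\!-\!1 \text{ and } Y\!-\!2}{\text{number of citable items published in years } Y\!-\!1 \text{ and } Y\!-\!2}
\]

This is why early attention matters: citations to an “early release” that accrue before the two-year window opens effectively pad the numerator.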

So, press releases are important for garnering early attention. Apparently breakthroughs, innovations, and “first ever” findings matter more than trustworthiness. The professional organizations hope we won’t remember the fate of past claims.