Calling out pseudoscience, radically changing the conversation about Amy Cuddy’s power posing paper

Part 1: Reviewed as the clinical trial that it is, the power posing paper should never have been published.

Has too much already been written about Amy Cuddy’s power pose paper? The conversation should not be stopped until its focus shifts and we change our ways of talking about psychological science.

The dominant narrative is now that a junior scientist published an influential paper on power posing and was subject to harassment and shaming by critics, pointing to the need for greater civility in scientific discourse.

Attention has shifted away from the scientific quality of the paper and the dubious products the paper has been used to promote, and onto the behavior of its critics.

Amy Cuddy and powerful allies are given forums to attack and vilify critics, accusing them of damaging the environment in which science is done and discouraging prospective early career investigators from entering the field.

Meanwhile, Amy Cuddy commands large speaking fees and has a top-selling book claiming the original paper provides strong scientific evidence that simple behavioral manipulations alter mind-body relations and produce socially significant changes in behavior.

This misrepresentation of psychological science does potential harm to consumers and the reputation of psychology among lay persons.

This blog post is intended to restart the conversation by reconsidering the original paper as a clinical and health psychology randomized controlled trial (RCT) and, on that basis, identifying the kinds of inferences that are warranted from it.

In the first of a two post series, I argue that:

The original power pose article in Psychological Science should never have been published.

-Basically, we have a therapeutic analog intervention delivered in two 1-minute manipulations by unblinded experimenters who had flexibility in what they did, what they communicated to participants, and which data they chose to analyze and how.

-It’s unrealistic to expect that two 1-minute behavioral manipulations would have robust and reliable effects on salivary cortisol or testosterone 17 minutes later.

-It’s absurd to assume that the hormones mediated changes in behavior in this context.

-If Amy Cuddy retreats to the idea that she is simply manipulating “felt power,” we are solidly in the realm of trivial nonspecific and placebo effects.

The original power posing paper

Carney DR, Cuddy AJ, Yap AJ. Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science. 2010 Oct 1;21(10):1363-8.

The Psychological Science article can be construed as reporting a brief mind-body intervention consisting of two 1-minute behavioral manipulations. Central to the attention that the paper attracted is the argument that this manipulation affected psychological state and social performance via its effects on the neuroendocrine system.

The original study is, in effect, a disguised randomized clinical trial (RCT) of a biobehavioral intervention. Once this is recognized, a host of standards come into play for reporting the study and interpreting its results.

CONSORT

All major journals and publishers, including the Association for Psychological Science, have adopted the Consolidated Standards of Reporting Trials (CONSORT). Any submission of a manuscript reporting a clinical trial must be accompanied by a checklist indicating where the article reports particular details of how the trial was conducted. Item 1 on the checklist specifies that both the title and abstract indicate the study was a randomized trial. This is important not only to aid readers in evaluating the study, but also so the study is picked up in systematic searches for reviews, which depend on screening of titles and abstracts.

I can find no evidence that Psychological Science adheres to CONSORT. For instance, my colleagues and I provided a detailed critique of a widely promoted study of loving-kindness meditation published in Psychological Science the same year as Cuddy’s power pose study. We noted that it was actually a poorly reported null trial with switched outcomes. With that recognition, we went on to identify serious conceptual, methodological, and statistical problems. After overcoming considerable resistance, we were able to publish a muted version of our critique. Apparently, reviewers of the original paper had failed to evaluate it as an RCT.

Submission of a completed CONSORT checklist has become routine at most journals considering manuscripts reporting studies of clinical and health psychology interventions. Yet additional CONSORT requirements, developed later, concerning what should be included in abstracts are largely ignored.

It would be unfair to single out Psychological Science and the Cuddy article for noncompliance with CONSORT for abstracts. However, the checklist is a useful frame of reference for noting just how woefully inadequate the abstract was as the report of a scientific study.

CONSORT for abstracts

Hopewell S, Clarke M, Moher D, Wager E, Middleton P, Altman DG, Schulz KF, CONSORT Group. CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration. PLOS Medicine. 2008 Jan 22;5(1):e20.

Journal and conference abstracts should contain sufficient information about the trial to serve as an accurate record of its conduct and findings, providing optimal information about the trial within the space constraints of the abstract format. A properly constructed and well-written abstract should also help individuals to assess quickly the validity and applicability of the findings and, in the case of abstracts of journal articles, aid the retrieval of reports from electronic databases.

Even if CONSORT for abstracts did not exist, we could argue that readers, starting with the editor and reviewers, were faced with an abstract making extraordinary claims that required better substantiation. A lack of basic details disarmed them from evaluating those claims.

In effect, the abstract reduces the study to an experimercial for products about to be marketed in corporate talks and workshops, but let’s persist in evaluating it as the abstract of a scientific study.

Humans and other animals express power through open, expansive postures, and they express powerlessness through closed, contractive postures. But can these postures actually cause power? The results of this study confirmed our prediction that posing in high-power nonverbal displays (as opposed to low-power nonverbal displays) would cause neuroendocrine and behavioral changes for both male and female participants: High-power posers experienced elevations in testosterone, decreases in cortisol, and increased feelings of power and tolerance for risk; low-power posers exhibited the opposite pattern. In short, posing in displays of power caused advantaged and adaptive psychological, physiological, and behavioral changes, and these findings suggest that embodiment extends beyond mere thinking and feeling, to physiology and subsequent behavioral choices. That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.

I don’t believe I have ever encountered in an abstract the extravagant claims with which this abstract concludes. But readers are not provided any basis for evaluating the claim until the Methods section. Undoubtedly, many holding opinions about the paper did not read that far.

Namely:

Forty-two participants (26 females and 16 males) were randomly assigned to the high-power-pose or low-power-pose condition.

Testosterone levels were in the normal range at both Time 1 (M = 60.30 pg/ml, SD = 49.58) and Time 2 (M = 57.40 pg/ml, SD = 43.25). As would be suggested by appropriately taken and assayed samples (Schultheiss & Stanton, 2009), men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.2

Too small a study to provide an effect size

Hold on! First, only 42 participants (26 females and 16 males) would readily be recognized as insufficient for an RCT, particularly in an area of research without past RCTs.

After decades of witnessing the accumulation of strong effect sizes from underpowered studies, many of us have reacted by requiring 35 participants per group as the minimum acceptable level for a generalizable effect size. Actually, that could be an overly liberal criterion. Why?

Many RCTs are underpowered, yet a lack of enforcement of preregistration allows positive results to be produced by redefining the primary outcomes after results are known. A psychotherapy trial with 30 or fewer patients in the smallest cell has less than a 50% probability of detecting a moderate-sized significant effect, even if one is present (Coyne, Thombs, & Hagedoorn, 2010). Yet an examination of the studies mustered for treatments deemed evidence-supported by APA Division 12 indicates that many were too underpowered to be reliably counted as evidence of efficacy, but were included without comment on this problem. Taking an overview, it is striking the extent to which the literature continues to depend on small, methodologically flawed RCTs conducted by investigators with strong allegiances to one of the treatments being evaluated. Indeed, which treatment is preferred by the investigators is a better predictor of a trial's outcome than the specific treatment being evaluated (Luborsky et al., 2006).
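The arithmetic behind these power figures is easy to check. Here is a minimal sketch using a normal approximation to the two-sample t-test; the function name and the choice of d = 0.5 as a "moderate" effect are mine, not the cited authors':

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample t-test.

    Uses the normal approximation, which slightly overstates power for
    small samples (the t-distribution has heavier tails).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    return 1 - NormalDist().cdf(z_alpha - noncentrality)

# Carney et al.'s cells: roughly 21 per group
print(round(approx_power(0.5, 21), 2))  # ≈ 0.37, well under a coin flip
# Even the 35-per-group rule of thumb is marginal for d = 0.5
print(round(approx_power(0.5, 35), 2))  # ≈ 0.55
```

With about 21 participants per cell, the chance of detecting a genuine moderate effect is roughly one in three, which is consistent with the claim above that even 35 per group may be an overly liberal criterion.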

Earlier, my colleagues and I had argued for the non-accumulative nature of evidence from small RCTs:

Kraemer, Gardner, Brooks, and Yesavage (1998) propose excluding small, underpowered studies from meta-analyses. The risk of including studies with inadequate sample size is not limited to clinical and pragmatic decisions being made on the basis of trials that cannot demonstrate effectiveness when it is indeed present. Rather, Kraemer et al. demonstrate that inclusion of small, underpowered trials in meta-analyses produces gross overestimates of effect size due to substantial, but unquantifiable confirmatory publication bias from non-representative small trials. Without being able to estimate the size or extent of such biases, it is impossible to control for them. Other authorities voice support for including small trials, but generally limit their argument to trials that are otherwise methodologically adequate (Sackett & Cook, 1993; Schulz & Grimes, 2005). Small trials are particularly susceptible to common methodological problems…such as lack of baseline equivalence of groups; undue influence of outliers on results; selective attrition and lack of intent-to-treat analyses; investigators being unblinded to patient allotment; and not having a pre-determined stopping point so investigators are able to stop a trial when a significant effect is present.

In the power posing paper, sex was controlled in all analyses because a peek at the data revealed baseline sex differences in testosterone dwarfing any other differences. What do we make of investigators conducting a study that depends on testosterone mediating a behavioral manipulation who did not anticipate large baseline sex differences in testosterone?

In a PubPeer comment leading up to this post, I noted:

We are then told “men were higher than women on testosterone at both Time 1, F(1, 41) = 17.40, p < .001, r = .55, and Time 2, F(1, 41) = 22.55, p < .001, r = .60. To control for sex differences in testosterone, we used participant’s sex as a covariate in all analyses. All hormone analyses examined changes in hormones observed at Time 2, controlling for Time 1. Analyses with cortisol controlled for testosterone, and vice versa.”

The findings alluded to in the abstract should be recognizable as weird and uninterpretable. Most basically, how could the 16 males be distributed across the two groups so that the authors could confidently say the differences held for both males and females? Especially when all analyses controlled for sex? Sex is highly correlated with testosterone, so an analysis that controlled for both variables, sex and testosterone, would probably not generalize to testosterone without such controls.

We are never given the basic statistics in the paper needed to independently assess what the authors are doing: not the correlation between cortisol and testosterone, only differences in Time 2 cortisol controlling for Time 1 cortisol, Time 1 testosterone, and gender. Such multivariate statistics are not very generalizable in a sample of 42 participants distributed across two groups, and certainly not for the 26 females and 16 males taken separately.
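To make the subgroup problem concrete: even under simple randomization, 16 males split unevenly between two conditions surprisingly often. A quick check (stdlib only; the independent 50/50 assignment model is my idealized assumption, since the paper does not report how males were actually distributed):

```python
from math import comb

def p_at_most(n, k):
    """P(X <= k) for X ~ Binomial(n, 0.5): chance one named condition
    receives k or fewer of the n males under simple randomization."""
    return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

# Chance of an 11-5 male split, or worse, in either direction
p_unbalanced = 2 * p_at_most(16, 5)
print(round(p_unbalanced, 2))  # ≈ 0.21, about one trial in five
```

Even in the best case, the male subgroup amounts to roughly eight participants per cell, far too few to support confident claims that hormonal effects "held for both male and female participants."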

The behavioral manipulation

The original paper reports:

Participants’ bodies were posed by an experimenter into high-power or low-power poses. Each participant held two poses for 1 min each. Participants’ risk taking was measured with a gambling task; feelings of power were measured with self-reports. Saliva samples, which were used to test cortisol and testosterone levels, were taken before and approximately 17 min after the power-pose manipulation.

And then elaborates:

To configure the test participants into the poses, the experimenter placed an electrocardiography lead on the back of each participant’s calf and underbelly of the left arm and explained, “To test accuracy of physiological responses as a function of sensor placement relative to your heart, you are being put into a certain physical position.” The experimenter then manually configured participants’ bodies by lightly touching their arms and legs. As needed, the experimenter provided verbal instructions (e.g., “Keep your feet above heart level by putting them on the desk in front of you”). After manually configuring participants’ bodies into the two poses, the experimenter left the room. Participants were videotaped; all participants correctly made and held either two high-power or two low-power poses for 1 min each. While making and holding the poses, participants completed a filler task that consisted of viewing and forming impressions of nine faces.

The behavioral task and subjective self-report assessment

Measure of risk taking and powerful feelings. After they finished posing, participants were presented with the gambling task. They were endowed with $2 and told they could keep the money—the safe bet—or roll a die and risk losing the $2 for a payoff of $4 (a risky but rational bet; odds of winning were 50/50). Participants indicated how “powerful” and “in charge” they felt on a scale from 1 (not at all) to 4 (a lot).

An imagined bewildered review from someone accustomed to evaluating clinical trials

Although the authors don’t seem to know what they’re doing, we have an underpowered therapy analogue study making extraordinary claims. It is unconvincing that two 1-minute behavioral manipulations would change subsequent psychological states and behavior with any extralaboratory implications.

The manipulation poses a puzzle to research participants, challenging them to figure out what is being asked of them. The $2 gambling task presumably is meant to simulate effects on real-world behavior. But the low stakes could mean that participants believed the task evaluated whether they “got” the purpose of the intervention and behaved accordingly. Within that perspective, the unvalidated subjective self-report rating scale would serve as a clue to the intentions of the experimenter and an opportunity for participants to show they were smart. The manipulation of putting participants into a low-power pose is even more unconvincing as a contrasting active intervention or a control condition. Claims that this manipulation did anything but communicate experimenter expectancies are even less credible.

This is a very weak form of evidence: a therapy analogue study with a brief, low-intensity behavioral manipulation followed by assessments of outcomes that might simply inform participants of what they needed to do to look smart (i.e., demand characteristics). Add in that the experimenters were unblinded and undoubtedly had flexibility in how they delivered the intervention and what they said to participants. As a grossly underpowered trial, the study cannot make a contribution to the literature, and certainly not a generalizable effect size.

Furthermore, if the authors had even a basic understanding of gender differences in social status or sex differences in testosterone, they would have stratified the study by participant gender rather than attempting to obtain control by post hoc statistical manipulation.

I could comment on signs of p-hacking and widespread signs of inappropriate naming, use, and interpretation of statistics, but why bother? There are no vital signs of a publishable paper here.

Is power posing salvaged by fashionable hormonal measures?

Perhaps the skepticism of the editor and reviewers was overcome by the introduction of mind-body explanations of what some salivary measures supposedly showed. Otherwise, we would be left with a single subjective self-report measure and a behavioral task susceptible to demand characteristics and nonspecific effects.

We recognize that the free availability of powerful statistical packages risks people using them without any idea of the appropriateness of their use or interpretation. The same observation should be made of the ready availability of means of collecting spit samples from research participants to be sent off to outside laboratories for biochemical analysis.

The clinical health psychology literature is increasingly filled with studies incorporating easily collected saliva samples intended to establish that psychological interventions influence mind-body relations. Such measures have been applied particularly in attempts to demonstrate that mindfulness meditation and even tai chi can have beneficial effects on physical health and even cancer outcomes.

Often inaccurately described as “biomarkers” rather than merely as biological measurements, such measures seldom teach us anything generalizable within participants or across studies.

Let’s start with salivary-based cortisol measures.

A comprehensive review  suggests that:

  • A single measurement on a participant, or a pre-post pair of assessments, is not informative.
  • Single measurements are unreliable, and large intra- and inter-individual differences not attributable to the intervention can be in play.
  • Minor variations in experimental procedures can have large, unwanted effects.
  • The current standard is the cortisol awakening response and the diurnal slope assessed over more than one day, which would not make sense for evaluating the effects of two 1-minute behavioral manipulations.
  • Even with sophisticated measurement strategies, there is low agreement across and even within studies, and low agreement with behavioral and self-report data.
  • The idea that collecting saliva samples would serve the function the investigators intended is an unscientific but attractive illusion.

Another relevant comprehensive theoretical review and synthesis of cortisol reactivity was available at the time the power pose study was planned. The article identifies no basis for anticipating that experimenters putting participants into 1-minute expansive poses would lower cortisol, and certainly no basis for assuming that putting participants into a 1-minute slumped position would raise cortisol. Nor is it clear what such findings could possibly mean.

But we are clutching at straws. The authors’ interpretations of their hormonal data depend on bizarre post hoc decisions about how to analyze their data in a small sample in which participant sex is treated in incomprehensible fashion. The process of trying to explain spurious results risks giving them a credibility the authors have not earned. And don’t even try to claim we are getting signals of hormonal mediation from this study.

Another system failure: The incumbent advantage given to a paper that should not have been published

Even when publication is based on inadequate editorial oversight and review, any likelihood of correction is diminished by the published results having been blessed as “peer reviewed” and accorded an incumbent advantage over whatever follows.

A succession of editors have protected the power pose paper from post-publication peer review, which has been relegated to other journals and social media, including PubPeer and blogs.

Soon after publication of the power pose paper, a critique was submitted to Psychological Science, but it was desk rejected. The editor informally communicated to the author that the critique read like a review and that the original article had already been peer reviewed.

The critique by Steven J. Stanton nonetheless eventually appeared in Frontiers in Behavioral Neuroscience and is worth a read.

Stanton took seriously the science being invoked in the claims of the power pose paper.

A sampling:

Carney et al. (2010) collapsed over gender in all testosterone analyses. Testosterone conforms to a bimodal distribution when including both genders (see Figure 13; Sapienza et al., 2009). Raw testosterone cannot be considered a normally distributed dependent or independent variable when including both genders. Thus, Carney et al. (2010) violated a basic assumption of the statistical analyses that they reported, because they used raw testosterone from pre- and post-power posing as independent and dependent variables, respectively, with all subjects (male and female) included.

And

Mean cortisol levels for all participants were reported as 0.16 ng/mL pre-posing and 0.12 ng/mL post-posing, thus showing that for all participants there was an average decrease of 0.04 ng/mL from pre- to post-posing, regardless of condition. Yet, Figure 4 of Carney et al. (2010) shows that low-power posers had mean cortisol increases of roughly 0.025 ng/mL and high-power posers had mean cortisol decreases of roughly 0.03 ng/mL. It is unclear given the data in Figure 4 how the overall cortisol change for all participants could have been a decrease of 0.04 ng/mL.
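Stanton’s arithmetic is easy to verify. A minimal check, using the values he reads off Figure 4 and assuming roughly equal group sizes (the paper does not report exact cell sizes, so the simple average is my approximation):

```python
# Mean cortisol changes read off Figure 4 of Carney et al. (2010),
# as reported in Stanton's critique
low_power_change = +0.025    # ng/mL, mean increase for low-power posers
high_power_change = -0.030   # ng/mL, mean decrease for high-power posers

# With roughly equal groups, the overall change is about the simple average
overall_change = (low_power_change + high_power_change) / 2
print(overall_change)  # -0.0025 ng/mL, an order of magnitude short of
                       # the reported overall decrease of 0.04 ng/mL
```

No plausible imbalance in group sizes could stretch an average of these two changes to the reported overall decrease of 0.04 ng/mL, which is Stanton's point.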

Another editor of Psychological Science received a critical comment from Marcus Crede and Leigh A. Phillips. After the first round of reviews, Crede and Phillips removed references to changes in the published power pose paper from earlier drafts that they had received from the first author, Dana Carney. However, Crede and Phillips withdrew their critique when asked to respond to a review by Amy Cuddy in a second resubmission.

The critique is now forthcoming in Social Psychological and Personality Science:

Revisiting the Power Pose Effect: How Robust Are the Results Reported by Carney, Cuddy, and Yap (2010) to Data Analytic Decisions?

The article investigates the effects of the data analytic choices, i.e., the opportunities for p-hacking, in the original paper. An excerpt from the abstract:

In this paper we use multiverse analysis to examine whether the findings reported in the original paper by Carney, Cuddy, and Yap (2010) are robust to plausible alternative data analytic specifications: outlier identification strategy; the specification of the dependent variable; and the use of control variables. Our findings indicate that the inferences regarding the presence and size of an effect on testosterone and cortisol are highly sensitive to data analytic specifications. We encourage researchers to routinely explore the influence of data analytic choices on statistical inferences and also encourage editors and reviewers to require explicit examinations of the influence of alternative data analytic specifications on the inferences that are drawn from data.
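The logic of a multiverse analysis can be sketched in a few lines: rerun the same group comparison under every plausible combination of data analytic choices and inspect how the p-values scatter. A minimal illustration with simulated stand-in data (the numbers, group sizes, outlier cutoffs, and the normal-approximation Welch test are all my own simplifications, not Crede and Phillips’s actual procedure):

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(42)
# Simulated stand-in data: 21 "high-power" and 21 "low-power" hormone changes
high = [random.gauss(0.05, 0.10) for _ in range(21)]
low = [random.gauss(0.00, 0.10) for _ in range(21)]

def welch_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def drop_outliers(xs, k):
    """One analytic choice: exclude points more than k SDs from the mean."""
    if k is None:
        return xs
    m, s = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) <= k * s]

# The "multiverse": one p-value per plausible outlier rule
pvals = {k: welch_p(drop_outliers(high, k), drop_outliers(low, k))
         for k in (None, 2.0, 2.5, 3.0)}

for k, p in pvals.items():
    print(f"outlier cutoff {k}: p = {p:.3f}")
```

In a real multiverse analysis the grid would also cross the outlier rules with alternative dependent-variable specifications and control-variable sets; the conclusion Crede and Phillips draw is that the original inferences change markedly across such equally defensible choices.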

Dana Carney, the first author of the paper, has now posted an explanation of why she no longer believes the originally reported findings are genuine and why “the evidence against the existence of power poses is undeniable.” She discloses a number of important confounds and important “researcher degrees of freedom” in the analyses reported in the published paper.

Coming Up Next

A different view of Amy Cuddy’s TED talk in terms of its selling of pseudoscience to consumers and its acknowledgment of a strong debt to Cuddy’s adviser Susan Fiske.

A disclosure of some of the financial interests that distort discussion of the scientific flaws of the power pose.

How the reflexive response of the replicationados inadvertently reinforced the illusion that the original power pose study provided meaningful effect sizes.

How Amy Cuddy and her allies marshalled the resources of the Association for Psychological Science to vilify and intimidate critics of bad science and of the exploitation of consumers by psychological pseudoscience.

How journalists played into this vilification.

What needs to be done to avoid a future fiasco for psychology like the power pose phenomenon and to protect those working to reform the dissemination of science.

Note: Time to reiterate that all opinions expressed here are solely those of Coyne of the Realm and not necessarily of PLOS blogs, PLOS One or his other affiliations.

Jane Brody promoting the pseudoscience of Barbara Fredrickson in the New York Times

Journalists’ coverage of positive psychology and health is often shabby, even in prestigious outlets like The New York Times.

Jane Brody’s latest installment of the benefits of being positive on health relied heavily on the work of Barbara Fredrickson that my colleagues and I have thoroughly debunked.

All of us need to recognize that studies of the effects of positive psychology interventions are often disguised randomized controlled trials.

With that insight, we need to evaluate this research in terms of reporting standards like CONSORT and declarations of conflict of interests.

We need to be more skeptical about the ability of small changes in behavior to profoundly improve health.

When in doubt, assume that much of what we read in the media about positivity and health is false or at least exaggerated.

Jane Brody starts her article in The New York Times by describing how most mornings she is “grinning from ear to ear, uplifted not just by my own workout but even more so” by her interaction with toddlers on the way home from where she swims. When I read Brody’s “Turning Negative Thinkers Into Positive Ones,” I was not left grinning ear to ear. I was left profoundly bummed.

I thought real hard about what was so unsettling about Brody’s article. I now have some clarity.

I don’t mind suffering even pathologically cheerful people in the morning. But I do get bothered when they serve up pseudoscience as the real thing.

I had expected to be served up Brody’s usual recipe of positive psychology pseudoscience concocted to coerce readers into heeding her Barnum advice about how they should lead their lives. “Smile or die!” Apologies to my friend Barbara Ehrenreich for putting to use here the title under which her book was published outside North America. I invoke the phrase because Jane Brody makes the case that unless we do what she says, we risk hurting our health and shortening our lives. So we better listen up.

What bummed me most this time was that Brody was drawing on the pseudoscience of Barbara Fredrickson that my colleagues and I have worked so hard to debunk. We took the trouble of obtaining data sets for two of her key papers for reanalysis. We were dismayed by the quality of the data. To start with, we uncovered carelessness at the level of data entry that undermined her claims. But her basic analyses and interpretations did not hold up either.

Fredrickson publishes exaggerated claims about dramatic benefits of simple positive psychology exercises. She is very effective in blocking or muting the publication of criticism and getting on with hawking her wares. My colleagues and I have talked to others who similarly met considerable resistance from editors in getting detailed critiques and re-analyses published. Fredrickson is also aided by uncritical people like Jane Brody, who promote her weak and inconsistent evidence as strong stuff. It sells a lot of positive psychology merchandise to needy and vulnerable people, like self-help books and workshops.

If taken seriously, Fredrickson’s research concerns the health effects of a behavioral intervention. Yet her findings are presented in a way that does not readily allow their integration with the rest of the health psychology literature. It would be difficult, for instance, to integrate Fredrickson’s randomized trials of loving-kindness meditation with other research, because she makes it almost impossible to isolate effect sizes in a form that could be combined with other studies in a meta-analysis. Moreover, Fredrickson has published contradictory claims from the same data set multiple times without acknowledging the duplicate publication. [Please read on. I will document all of these claims before the post ends.]

The need of self-help gurus to generate support for the dramatic claims in their lucrative positive psychology self-help products is never acknowledged as a conflict of interest. It should be.

Just imagine if someone had a contract based on a book prospectus promising that the claims of their last pop psychology book would be surpassed. Such books inevitably paint life too simply, with simple changes in behavior having profound and lasting effects unlike anything obtained in the randomized trials of clinical and health psychology. Readers ought to be informed that the pressure to meet the demands of a lucrative book contract could generate a strong confirmation bias. Caveat emptor, caveat auditor, but how about at least informing readers and letting them decide whether following the money influences their interpretation of what they read?

Psychology journals almost never require disclosure of conflicts of interest of this nature. I am campaigning to make that practice routine, with nondisclosure of such financial benefits treated as tantamount to scientific misconduct. I am calling for readers to take to social media when these disclosures do not appear in scientific journals, where they should be featured prominently, and to hold editors responsible for non-enforcement. I can cite Fredrickson’s work as a case in point, but there are many other examples, inside and outside of positive psychology.

Back to Jane Brody’s exaggerated claims for Fredrickson’s work.

I lived for half a century with a man who suffered from periodic bouts of depression, so I understand how challenging negativism can be. I wish I had known years ago about the work Barbara Fredrickson, a psychologist at the University of North Carolina, has done on fostering positive emotions, in particular her theory that accumulating “micro-moments of positivity,” like my daily interaction with children, can, over time, result in greater overall well-being.

The research that Dr. Fredrickson and others have done demonstrates that the extent to which we can generate positive emotions from even everyday activities can determine who flourishes and who doesn’t. More than a sudden bonanza of good fortune, repeated brief moments of positive feelings can provide a buffer against stress and depression and foster both physical and mental health, their studies show.

“Research…demonstrates” (?). Brody is feeding stupid-making pablum to readers. Fredrickson’s kind of research may produce evidence one way or the other, but it is too strong a claim, an outright illusion, to even begin suggesting that it “demonstrates” (proves) what follows in this passage.

Where, outside of tabloids and self-help products, does one find the immodest claim that one or a few poor-quality studies “demonstrate” anything?

Negative feelings activate a region of the brain called the amygdala, which is involved in processing fear and anxiety and other emotions. Dr. Richard J. Davidson, a neuroscientist and founder of the Center for Healthy Minds at the University of Wisconsin — Madison, has shown that people in whom the amygdala recovers slowly from a threat are at greater risk for a variety of health problems than those in whom it recovers quickly.

Both he and Dr. Fredrickson and their colleagues have demonstrated that the brain is “plastic,” or capable of generating new cells and pathways, and it is possible to train the circuitry in the brain to promote more positive responses. That is, a person can learn to be more positive by practicing certain skills that foster positivity.

We are knee deep in neuro-nonsense. Try asking a serious neuroscientist about the claims that this duo have “demonstrated” that the brain is “plastic,” that practicing certain positivity skills changes the brain with the health benefits they claim via Brody, or that they are studying “amygdala recovery” associated with reduced health risk.

For example, Dr. Fredrickson’s team found that six weeks of training in a form of meditation focused on compassion and kindness resulted in an increase in positive emotions and social connectedness and improved function of one of the main nerves that helps to control heart rate. The result is a more variable heart rate that, she said in an interview, is associated with objective health benefits like better control of blood glucose, less inflammation and faster recovery from a heart attack.

I will dissect this key claim about loving-kindness meditation and vagal tone/heart rate variability shortly.

Dr. Davidson’s team showed that as little as two weeks’ training in compassion and kindness meditation generated changes in brain circuitry linked to an increase in positive social behaviors like generosity.

We will save discussing Richard Davidson for another time. But really, Jane, just two weeks to better health? Where is the generosity center in brain circuitry? I dare you to ask a serious neuroscientist and embarrass yourself.

“The results suggest that taking time to learn the skills to self-generate positive emotions can help us become healthier, more social, more resilient versions of ourselves,” Dr. Fredrickson reported in the National Institutes of Health monthly newsletter in 2015.

In other words, Dr. Davidson said, “well-being can be considered a life skill. If you practice, you can actually get better at it.” By learning and regularly practicing skills that promote positive emotions, you can become a happier and healthier person. Thus, there is hope for people like my friend’s parents should they choose to take steps to develop and reinforce positivity.

In her newest book, “Love 2.0,” Dr. Fredrickson reports that “shared positivity — having two people caught up in the same emotion — may have even a greater impact on health than something positive experienced by oneself.” Consider watching a funny play or movie or TV show with a friend of similar tastes, or sharing good news, a joke or amusing incidents with others. Dr. Fredrickson also teaches “loving-kindness meditation” focused on directing good-hearted wishes to others. This can result in people “feeling more in tune with other people at the end of the day,” she said.

Brody ends with 8 things Fredrickson and others endorse to foster positive emotions. (Why only 8 recommendations? Why not come up with 10 and make them commandments?) These include “Do good things for other people” and “Appreciate the world around you.” Okay, but do Fredrickson and Davidson really show that engaging in these activities has immediate and dramatic effects on our health? I have examined their research and I doubt it. I think the larger problem, though, is the suggestion that physically ill people facing shortened lives risk being blamed for being bad people: they obviously did not do these 8 things, or else they would be healthy.

If Brody were selling herbal supplements or coffee enemas, we would readily label the quackery. We should do the same for advice about psychological practices that are promised to transform lives.

Brody’s sloppy links to support her claims: Love 2.0

Journalists who talk of “science” and respect their readers will provide links to their actual sources in the peer-reviewed scientific literature. That way, readers who are motivated can independently review the evidence. This matters especially in an outlet as prestigious as The New York Times.

Jane Brody is outright promiscuous in the links that she provides, which are often to secondary or tertiary sources. The first link provided for her discussion of Fredrickson’s Love 2.0 is actually to a somewhat negative review of the book. https://www.scientificamerican.com/article/mind-reviews-love-how-emotion-afftects-everything-we-feel/

Fredrickson builds her case by expanding on research that shows how sharing a strong bond with another person alters our brain chemistry. She describes a study in which best friends’ brains nearly synchronize when exchanging stories, even to the point where the listener can anticipate what the storyteller will say next. Fredrickson takes the findings a step further, concluding that having positive feelings toward someone, even a stranger, can elicit similar neural bonding.

This leap, however, is not supported by the study and fails to bolster her argument. In fact, most of the evidence she uses to support her theory of love falls flat. She leans heavily on subjective reports of people who feel more connected with others after engaging in mental exercises such as meditation, rather than on more objective studies that measure brain activity associated with love.

I would go even further than the reviewer. Fredrickson builds her case by very selectively drawing on the literature, choosing only a few studies that fit.  Even then, the studies fit only with considerable exaggeration and distortion of their findings. She exaggerates the relevance and strength of her own findings. In other cases, she says things that have no basis in anyone’s research.

I came across Love 2.0: How Our Supreme Emotion Affects Everything We Feel, Think, Do, and Become (Unabridged) that sells for $17.95. The product description reads:

We all know love matters, but in this groundbreaking book positive emotions expert Barbara Fredrickson shows us how much. Even more than happiness and optimism, love holds the key to improving our mental and physical health as well as lengthening our lives. Using research from her own lab, Fredrickson redefines love not as a stable behemoth, but as micro-moments of connection between people – even strangers. She demonstrates that our capacity for experiencing love can be measured and strengthened in ways that improve our health and longevity. Finally, she introduces us to informal and formal practices to unlock love in our lives, generate compassion, and even self-soothe. Rare in its scope and ambitious in its message, Love 2.0 will reinvent how you look at and experience our most powerful emotion.

There is a mishmash of language games going on here. Fredrickson’s redefinition of love is not based on her research. Her claim that love is ‘really’ micro-moments of connection between people  – even strangers is a weird re-definition. Attempt to read her book, if you have time to waste.

You will quickly see that much of what she says makes no sense for long-term relationships that are solid but beyond the honeymoon stage. Ask partners in long-term relationships, and they will undoubtedly report lacking lots of such “micro-moments of connection”. I doubt it is adaptive for people seeking to build long-term relationships to apply the yardstick that the relationship is in trouble unless lots of such micro-moments keep coming all the time. But it is Fredrickson who is selling the strong claims, and the burden is on her to produce the evidence.

If you try to take Fredrickson’s work seriously, you wind up seeing that she has a rather superficial view of close relationships and can’t seem to distinguish them from what goes on between strangers in drunken one-night stands. But this is supposed to be revolutionary science.

We should not confuse much of what Fredrickson emphatically states with testable hypotheses. Many statements sound more like marketing slogans – what Joachim Kruger and his student Thomas Mairunteregger identify as the McDonaldization of positive psychology. Like a Big Mac, Fredrickson’s Love 2.0 requires a lot of imagination to live up to its advertisement.

Fredrickson’s love the supreme emotion vs ‘Trane’s Love Supreme

Where Fredrickson’s selling of love as the supreme emotion is not simply an advertising slogan, it is a bad summary of the research on love and health. John Coltrane makes no empirical claim about love being supreme. But listening to him is effective self-soothing after taking Love 2.0 seriously and trying to figure it out. Simply enjoy, and don’t worry about what it does for your positivity ratio or micro-moments, shared or alone.

Fredrickson’s study of loving-kindness meditation

Jane Brody, like Fredrickson herself, depends heavily on a study of loving-kindness meditation in proclaiming the wondrous, transformative health benefits of being loving and kind. After obtaining Fredrickson’s data set and reanalyzing it, my colleagues – James Heathers, Nick Brown, and Harris Friedman – and I arrived at a very different interpretation of her study. As we first encountered it, the study was:

Kok, B. E., Coffey, K. A., Cohn, M. A., Catalino, L. I., Vacharkulksemsuk, T., Algoe, S. B., . . . Fredrickson, B. L. (2013). How positive emotions build physical health: Perceived positive social connections account for the upward spiral between positive emotions and vagal tone. Psychological Science, 24, 1123-1132.

Consolidated Standards of Reporting Trials (CONSORT) are widely accepted for at least two reasons. First, clinical trials should be clearly identified as such in order to ensure that the results are recognized and available in systematic searches to be integrated with other studies; CONSORT requires that RCTs be clearly identified in titles and abstracts. Second, once RCTs are labeled as such, the CONSORT checklist becomes a handy tally of what needs to be reported.

It is only in the supplementary material that the Kok and Fredrickson paper is identified as a clinical trial. Only in that supplement is the primary outcome identified, and even then only in passing. No means are reported anywhere in the paper or supplement. Results are presented in terms of what Kok and Fredrickson term “a variant of a mediational, parallel process, latent-curve model.” Basic statistics needed for its evaluation are left to readers’ imagination. Figure 1 in the article depicts the awe-inspiring parallel-process mediational model that guided the analyses. We showed the figure to a number of statistical experts, including Andrew Gelman. While some elements were readily recognizable, the overall figure was not, especially the mysterious large dot (a causal pathway roundabout?) near the top.

So not only might the study not be detected as an RCT; it also lacks the relevant information that could be used for calculating effect sizes.
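That missing information is mundane stuff. With group means, standard deviations, and sample sizes, anyone can compute a standardized effect size in a few lines. A minimal sketch with entirely hypothetical numbers (none of these values appear in the paper, which is exactly the problem):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Hypothetical change scores in vagal tone: meditation vs. no-treatment control
d = cohens_d(0.30, 0.90, 26, -0.25, 0.95, 26)
print(round(d, 2))  # → 0.59
```

Without the means and SDs, a systematic reviewer cannot even run this three-line calculation, let alone weight the study in a meta-analysis.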

Furthermore, when studies are labeled as RCTs, we immediately seek protocols published ahead of time that specify the basic elements of design, analyses, and primary outcomes. At Psychological Science, studies with protocols are unusual enough that the authors are awarded a badge. In the clinical and health psychology literature, protocols are increasingly routine, like flushing a toilet after using a public restroom. No one runs up and thanks you: “Thank you for flushing/publishing your protocol.”

If Fredrickson and her colleagues are going to be using the study to make claims about the health benefits of loving kindness meditation, they have a responsibility to adhere to CONSORT and to publish their protocol. This is particularly the case because this research was federally funded and results need to be transparently reported for use by a full range of stakeholders who paid for the research.

We identified a number of other problems and submitted a manuscript based on a reanalysis of the data. Our manuscript was promptly rejected by Psychological Science. The associate editor, Batja Mesquita, noted that two of my co-authors, Nick Brown and Harris Friedman, had co-authored a paper resulting in a partial retraction of Fredrickson’s positivity ratio paper.

Brown NJ, Sokal AD, Friedman HL. The Complex Dynamics of Wishful Thinking: The Critical Positivity Ratio. American Psychologist. 2013 Jul 15.

I won’t go into the details, except to say that Nick and Harris, along with Alan Sokal, unambiguously established that Fredrickson’s positivity ratio of 2.9013 positive to negative experiences was a fake fact. Fredrickson had been promoting the number as an “evidence-based guideline,” a ratio acting as a “tipping point beyond which the full impact of positive emotions becomes unleashed.” Once Brown and his co-authors overcame strong resistance to getting their critique published, their paper garnered a lot of attention in social and conventional media. There is a hilariously funny account available at Nick Brown Smelled Bull.
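For readers curious how a number as precise-looking as 2.9013 could be fake: as Brown, Sokal, and Friedman showed, it was read off the Lorenz equations, borrowed from fluid dynamics, whose behavior flips from settling on a fixed point to wandering chaotically at a critical parameter value. A sketch of that system (parameter values are the standard textbook ones, nothing here is estimated from emotion data):

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma, beta, rho):
    """The Lorenz system from fluid dynamics -- nothing about emotions in it."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

def run(rho):
    sol = solve_ivp(lorenz, (0, 100), [1.0, 1.0, 1.0],
                    args=(10.0, 8.0 / 3.0, rho),
                    rtol=1e-6, atol=1e-9, max_step=0.05)
    return sol.y

# Below the critical value (rho ~ 24.74) trajectories settle onto a fixed
# point; above it they wander chaotically. Any "tipping point" here is a
# property of these equations, not of human flourishing.
stable = run(20.0)
chaotic = run(28.0)
print(np.std(stable[2, -200:]), np.std(chaotic[2, -200:]))
```

The transition is real mathematics; applying it to counts of positive and negative emotions is where the fakery came in.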

Batja Mesquita argued that the previously published critique discouraged her from accepting our manuscript. To do so, she would be participating in “a witch hunt” and

 The combatant tone of the letter of appeal does not re-assure me that a revised commentary would be useful.

Welcome to one-sided tone policing. We appealed her decision, but Editor Eric Eich indicated there was no appeal process at Psychological Science, contrary to the requirements of the Committee on Publication Ethics (COPE).

Eich relented after I shared an email to my coauthors in which I threatened to take the whole issue to social media, where there would be no peer review in the traditional, outdated sense of the term. Numerous revisions of the manuscript were submitted, some of them in response to reviews by Fredrickson and Kok, who did not want the paper published. A year passed before our paper was accepted and appeared on the website of the journal. You can read our paper here. I think you can see that the fatal problems are obvious.

Heathers JA, Brown NJ, Coyne JC, Friedman HL. The elusory upward spiral: A reanalysis of Kok et al. (2013). Psychological Science. 2015 May 29:0956797615572908.

In addition to the original paper not adhering to CONSORT, we noted:

  1. There was no effect of assignment to the loving-kindness meditation vs. no-treatment control group on the key physiological variable, cardiac vagal tone. This is a thoroughly disguised null trial.
  2. Kok and Frederickson claimed that there was an effect of meditation on cardiac vagal tone, but any appearance of an effect was due to reduced vagal tone in the control group, which cannot readily be explained.
  3. Kok and Frederickson essentially interpreted changes in cardiac vagal tone as a surrogate outcome for more general changes in physical health. However, other researchers have noted that observed changes in cardiac vagal tone are not consistently related to changes in other health variables and are susceptible to variations in experimental conditions that have nothing to do with health.
  4. No attention was given to whether participants assigned to the loving kindness meditation actually practiced it with any frequency or fidelity. The article nonetheless reported that such data had been collected.

Point 2 is worth elaborating. Participants in the control condition received no intervention. Their assessments of cardiac vagal tone/heart rate variability amounted to a test-retest reliability check of what should have been a stable physiological characteristic. Yet participants assigned to this no-treatment condition showed as much change as the participants assigned to meditation, only in the opposite direction. Kok and Fredrickson ignored this and attributed all differences to meditation. Houston, we have a problem, and a big one: unreliability of measurement in this study.
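The point about unreliability can be made concrete with a toy simulation: when a stable trait is measured with enough noise, a no-treatment group will show sizeable spurious “change” from measurement error alone. All numbers below are hypothetical, chosen only to illustrate the mechanism:

```python
import random

random.seed(1)

def simulate_control_change(n=26, trait_sd=1.0, error_sd=1.0):
    """Test/retest of a stable trait: any 'change' is pure measurement error."""
    changes = []
    for _ in range(n):
        true_score = random.gauss(0, trait_sd)        # stable characteristic
        baseline = true_score + random.gauss(0, error_sd)
        followup = true_score + random.gauss(0, error_sd)
        changes.append(followup - baseline)
    return changes

changes = simulate_control_change()
mean_change = sum(changes) / len(changes)
# With error_sd equal to trait_sd, follow-up minus baseline has SD of about
# error_sd * sqrt(2), so individual "changes" of a full SD are routine even
# though nothing whatsoever changed in any participant.
print(round(mean_change, 2), round(max(abs(c) for c in changes), 2))
```

A group of 26 such participants can easily drift up or down between assessments, which is why interpreting a shift in an untreated control group as meaningful requires evidence the measure is reliable under the study’s conditions.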

We could not squeeze all of our critique into our word limit, but James Heathers, who is an expert on cardiac vagal tone/heart rate variability, elaborated elsewhere:

  • The study was underpowered from the outset, and the sample size decreased further, from 65 to 52, due to missing data.
  • Cardiac vagal tone is unreliable except with careful control of the conditions in which measurements are obtained, multiple measurements on each participant, and a much larger sample size. None of these conditions was met.
  • There were numerous anomalies in the data, including some participants included without baseline data, improbable baseline or follow up scores, and improbable changes. These alone would invalidate the results.
  • Despite not reporting basic statistics, the article was full of graphs, impressive to the uninformed but useless to readers attempting to make sense of what was done and with what results.
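The “underpowered from the outset” point is easy to verify for yourself with standard power-analysis tools. A sketch, assuming 26 per arm after attrition and a medium standardized effect (d = 0.5) as the target, neither of which was specified by the authors:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-arm trial with 26 per arm to detect a medium effect
# (d = 0.5) with a two-sided t test at alpha = .05
power = analysis.power(effect_size=0.5, nobs1=26, alpha=0.05, ratio=1.0)

# Per-group n needed to reach the conventional 80% power for the same effect
n_needed = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)

print(round(power, 2), round(n_needed))
```

Under these assumptions, power comes out well under one-half, far below the conventional 80%, and roughly 64 participants per group would have been needed, more than double the sample actually analyzed.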

We later learned that the same data had been used for another published paper. There was no cross-citation and the duplicate publication was difficult to detect.

Kok, B. E., & Fredrickson, B. L. (2010). Upward spirals of the heart: Autonomic flexibility, as indexed by vagal tone, reciprocally and prospectively predicts positive emotions and social connectedness. Biological Psychology, 85, 432–436. doi:10.1016/j.biopsycho.2010.09.005

Pity the poor systematic reviewer or meta-analyst trying to make sense of this RCT and integrate it with the rest of the literature concerning loving-kindness meditation.

This was not our only experience of obtaining data for a paper crucial to Fredrickson’s claims and then having difficulty publishing our findings. We obtained data for her claims that she and her colleagues had solved the classic philosophical problem of whether we should pursue pleasure or meaning in our lives. Pursuing pleasure, they argued, will adversely affect genomic transcription.

We found that we could redo the extremely complicated analyses and replicate the original findings, but there were errors in the original data entry that entirely shifted the results when corrected. Furthermore, we could replicate the original findings when we substituted data from a random number generator for the data collected from study participants. After struggles similar to those we experienced with Psychological Science, we succeeded in getting our critique published.
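The random-number check is a generally useful diagnostic: feed pure noise through an analysis pipeline and see whether “findings” still appear. A toy version of the idea (the variable counts are illustrative only, not Fredrickson’s actual genomic pipeline):

```python
import random

random.seed(42)

def count_significant(n_subjects=80, n_outcomes=52, alpha_t=1.99):
    """Regress many pure-noise outcomes on a pure-noise predictor and count
    nominally 'significant' slopes. About 5% of outcomes will 'hit' by chance."""
    hits = 0
    for _ in range(n_outcomes):
        x = [random.gauss(0, 1) for _ in range(n_subjects)]
        y = [random.gauss(0, 1) for _ in range(n_subjects)]
        # ordinary least squares slope and its t statistic, computed by hand
        mx, my = sum(x) / n_subjects, sum(y) / n_subjects
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx
        resid = [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]
        s2 = sum(r ** 2 for r in resid) / (n_subjects - 2)
        t = b / (s2 / sxx) ** 0.5
        if abs(t) > alpha_t:  # ~critical t for alpha = .05, df around 78
            hits += 1
    return hits

print(count_significant())
```

With dozens of outcome variables and a flexible modeling strategy, some nominally significant results are guaranteed even from noise; an analysis that returns the same substantive conclusions from random numbers is not detecting anything in the data.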

The original paper

Fredrickson BL, Grewen KM, Coffey KA, Algoe SB, Firestine AM, Arevalo JM, Ma J, Cole SW. A functional genomic perspective on human well-being. Proceedings of the National Academy of Sciences. 2013 Aug 13;110(33):13684-9.

Our critique

Brown NJ, MacDonald DA, Samanta MP, Friedman HL, Coyne JC. A critical reanalysis of the relationship between genomics and well-being. Proceedings of the National Academy of Sciences. 2014 Sep 2;111(35):12705-9.

See also:

Nickerson CA. No Evidence for Differential Relations of Hedonic Well-Being and Eudaimonic Well-Being to Gene Expression: A Comment on Statistical Problems in Fredrickson et al.(2013). Collabra: Psychology. 2017 Apr 11;3(1).

A partial account of the reanalysis is available in:

Reanalysis: No health benefits found for pursuing meaning in life versus pleasure. PLOS Blogs Mind the Brain

Wrapping it up

Strong claims about health effects require strong evidence.

  • Evidence produced in randomized trials needs to be reported according to established conventions like CONSORT, with clear labeling of duplicate publications.
  • When research is conducted with public funds, these responsibilities are increased.

I have often identified health claims in high profile media like The New York Times and The Guardian. My MO has been to trace the claims back to the original sources in peer reviewed publications, and evaluate both the media reports and the quality of the primary sources.

I hope that I am arming citizen scientists to engage in these activities independent of me, even if they arrive at appraisals contradicting my own.

  • I don’t expect to get many people to ask for data and perform independent analyses, and certainly not to overcome the barriers my colleagues and I have met in trying to publish our results. I share my account of some of those frustrations as a warning.
  • I still think I can offer some take away messages to citizen scientists interested in getting better quality, evidence-based information on the internet.
  • Assume most of the claims readers encounter about psychological states and behavior being simply changed and profoundly influencing physical health are false or exaggerated. When in doubt, disregard the claims and certainly don’t retweet or “like” them.
  • Ignore journalists who do not provide adequate links for their claims.
  • Learn to identify generally reliable sources and take journalists off the list when they have made extravagant or undocumented claims.
  • Appreciate the financial gains to be made by scientists who feed journalists false or exaggerated claims.

Advice to citizen scientists who are cultivating more advanced skills:

Some key studies that Brody invokes in support of her claims being science-based are poorly conducted and reported clinical trials that are not labeled as such. This is quite common in positive psychology, but you need to cultivate skills even to detect that this is what is going on. Even prestigious psychology journals are often lax in labeling studies as RCTs and in enforcing reporting standards. Authors’ conflicts of interest are ignored.

It is up to you to

  • Identify when the claims you are being fed should have been evaluated in a clinical trial.
  • Be skeptical when the original research is not clearly identified as a clinical trial but nonetheless compares participants who received an intervention with those who did not.
  • Be skeptical when CONSORT is not followed and there is no published protocol.
  • Be skeptical of papers published in journals that do not enforce these requirements.

Disclaimer

I think I have provided enough details for readers to decide for themselves whether I am unduly influenced by my experiences with Barbara Fredrickson and her data. She and her colleagues have differing accounts of her research and of the events I have described in this blog.

As a disclosure, I receive money for writing these blog posts, less than $200 per post. I am also marketing a series of e-books,  including Coyne of the Realm Takes a Skeptical Look at Mindfulness and Coyne of the Realm Takes a Skeptical Look at Positive Psychology.

Maybe I am just making a fuss to attract attention to these enterprises. Maybe I am just monetizing what I have been doing for years virtually for free. Regardless, be skeptical. But to get more information and get on a mailing list for my other blogging, go to coyneoftherealm.com and sign up.

Trusted source? The Conversation tells migraine sufferers that child abuse may be at the root of their problems

Patients and family members face a challenge obtaining credible, evidence-based information about health conditions from the web.

Migraine sufferers have a particularly acute need because their condition is often inadequately self-managed without access to best available treatment approaches. Demoralized by the failure of past efforts to get relief, some sufferers may give up consulting professionals and desperately seek solutions on Internet.

A lot of both naïve and exploitative quackery awaits them.

Even well-educated patients cannot always distinguish the credible from the ridiculous.

One search strategy is to rely on websites that have proven themselves as trusted sources.

The Conversation has promoted itself as such a trusted source, but its brand is tarnished by recent nonsense we will review concerning the role of child abuse in migraines.

Despite some excellent material that has appeared in other articles in The Conversation, I’m issuing a reader’s advisory:

The Conversation cannot be trusted because this article shamelessly misinforms migraine sufferers that child abuse could be at the root of their problems.

The Conversation article concludes with a non sequitur that shifts sufferers and their primary care physicians away from getting consultation with the medical specialists who are most able to improve management of a complex condition.

 

The Conversation article tells us:

Within a migraine clinic population, clinicians should pay special attention to those who have been subjected to maltreatment in childhood, as they are at increased risk of being victims of domestic abuse and intimate partner violence as adults.

That’s why clinicians should screen migraine patients, and particularly women, for current abuse.

This blog post identifies clickbait, manipulation, misapplied buzz terms, and misinformation in The Conversation article.

Perhaps the larger message of this blog post is that persons with complex medical conditions, and those who provide formal and informal care for them, should not rely solely on what they find on the Internet. This exercise, focusing specifically on The Conversation article, serves to demonstrate why.

Hopefully, The Conversation will issue a correction, as they promise to do at the website when errors are found.

We are committed to responsible and ethical journalism, with a strict Editorial Charter and codes of conduct. Errors are corrected promptly.

The Conversation article –

Why emotional abuse in childhood may lead to migraines in adulthood

A clickbait title offered a seductive integration of a trending, emotionally laden social issue – child abuse – with a serious medical condition – migraines – for which management is often not optimal. A widely circulated estimate is that 60% of migraine sufferers do not get appropriate medical attention, in large part because they do not understand the treatment options available and may actually stop consulting physicians.

Some quick background about migraine from another, more credible source:

Migraines are different from other headaches. People who suffer migraines experience other debilitating symptoms:

  • visual disturbances (flashing lights, blind spots in the vision, zig zag patterns etc).
  • nausea and / or vomiting.
  • sensitivity to light (photophobia).
  • sensitivity to noise (phonophobia).
  • sensitivity to smells (osmophobia).
  • tingling / pins and needles / weakness / numbness in the limbs.

Persons with migraines differ greatly among themselves in terms of the frequency, intensity, and chronicity of their symptoms, as well as their triggers for attacks.

Migraine is triggered by an enormous variety of factors – not just cheese, chocolate and red wine! For most people there is not just one trigger but a combination of factors which individually can be tolerated. When these triggers occur altogether, a threshold is passed and a migraine is triggered. The best way to find your triggers is to keep a migraine diary. Download your free diary now!

Into The Conversation article: What is the link between emotional abuse and migraines?

Without immediately providing a clicklink so that readers can check sources themselves, The Conversation authors say they are drawing on “previous research, including our own…” to declare there is indeed an association between past abuse and migraines.

Previous research, including our own, has found a link between experiencing migraine headaches in adulthood and experiencing emotional abuse in childhood. So how strong is the link? What is it about childhood emotional abuse that could lead to a physical problem, like migraines, in adulthood?

In invoking the horror of childhood emotional abuse, the authors imply that they are talking about something infrequent – outside the realm of most people’s experience.  If “childhood emotional abuse” is commonplace, how could  it be horrible and devastating?

In their pursuit of clickbait sensationalism, the authors have only succeeded in trivializing a serious issue.

A minority of people endorsing items concerning past childhood emotional abuse actually currently meet criteria for a diagnosis of posttraumatic stress disorder. Their needs are not met by throwing them into a larger pool of people who do not meet these criteria and making recommendations based on evidence derived from the combined group.

The Conversation authors employ a manipulative puffer fish strategy [1 and 2]. They take what is presumably an infrequent condition and attach horror to it. But they then wildly inflate the presumed prevalence by switching to a definition that arises in a very different context:

Any act or series of acts of commission or omission by a parent or other caregiver that results in harm, potential for harm, or threat of harm to a child.

So we are now talking about “any act or series of acts” that results in “harm, potential for harm, or threat of harm”? The authors then assert that yes, whatever they are talking about is indeed that common. But the clicklink offered in support of this claim takes the reader behind a paywall, where a consumer can’t venture without access to a university library account.

Most readers are left with the authors’ assertion as an authority they can’t check. I have access to a med school library, and I checked. The link is to a secondary source. It is not a systematic review of the full range of available evidence. Instead, it is a selective search for evidence favoring particular speculations. Disconfirming evidence is mostly ignored. Yet this article actually contradicts other assertions of The Conversation authors. For instance, the paywalled article says there is actually little evidence that cognitive behavior therapy is effective for people whose only indication for therapy is that they reported abuse in early childhood.

Even if you can’t check The Conversation authors’ claims, know that adults’ retrospective reports of childhood adversity are not particularly reliable or valid, especially in studies relying on checklist responses of adults to broad categories, as this research does.

When we are dealing with claims that depend on adult retrospective reports of childhood adversity, we are dealing with a literature with serious deficiencies. This literature grossly overinterprets common endorsement of particular childhood experiences as strong evidence of exposure to horrific conditions. It has a strong confirmation bias: positive findings are highlighted, negative findings do not get cited much, and serious limitations in methodology and inconsistencies in findings are generally ignored.

[This condemnation is worthy of a blog post or two itself. But ahead I will provide some documentation.]

The Conversation authors explain the discrepancy between estimates based on administrative data (one in eight children suffering abuse or neglect before age 18) and much higher estimates from retrospective adult reports by positing that much abuse goes unreported.

The discrepancy may be because so many cases of childhood abuse, particularly cases of emotional or psychological abuse, are unreported. This specific type of abuse may occur within a family over the course of years without recognition or detection.

This could certainly be true, but let’s see the evidence. A lack of reporting could also indicate a lack of experiences reaching the threshold that prompts reporting. I’m willing to be convinced otherwise, but let’s see the evidence.

The link between emotional abuse and migraines

The Conversation authors provide links only to their own research for their claim:

While all forms of childhood maltreatment have been shown to be linked to migraines, the strongest and most significant link is with emotional abuse. Two studies using nationally representative samples of older Americans (the mean ages were 50 and 56 years old, respectively) have found a link.

The first link is to an article that is paywalled except for its abstract. The abstract shows  the study does not involve a nationally representative sample of adults. The study compared patients with tension headaches to patients with migraines, without a no-headache control group. There is thus no opportunity to examine whether persons with migraines recall more emotional abuse than persons who do not suffer headaches.  Any significant associations in a huge sample disappeared after controlling for self-reported depression and anxiety.

My interpretation: there is nothing robust here. Results could be due to crude measurement and confounding of retrospective self-report by current self-reported anxious or depressive symptoms. We can’t say much without a no-headache control group.

The second of the authors’ studies is also paywalled, but we can see from the abstract:

We used data from the Adverse Childhood Experiences (ACE) study, which included 17,337 adult members of the Kaiser Health Plan in San Diego, CA who were undergoing a comprehensive preventive medical evaluation. The study assessed 8 ACEs including abuse (emotional, physical, sexual), witnessing domestic violence, growing up with mentally ill, substance abusing, or criminal household members, and parental separation or divorce. Our measure of headaches came from the medical review of systems using the question: “Are you troubled by frequent headaches?” We used the number of ACEs (ACE score) as a measure of cumulative childhood stress and hypothesized a “dose–response” relationship of the ACE score to the prevalence and risk of frequent headaches.

Results — Each of the ACEs was associated with an increased prevalence and risk of frequent headaches. As the ACE score increased the prevalence and risk of frequent headaches increased in a “dose–response” fashion. The risk of frequent headaches increased more than 2-fold (odds ratio 2.1, 95% confidence interval 1.8-2.4) in persons with an ACE score ≥5, compared to persons with and ACE score of 0. The dose–response relationship of the ACE score to frequent headaches was seen for both men and women.

The Conversation authors misrepresent this study. It is about self-reported headaches, not the subgroup of these patients reporting migraines. But in the first of their own studies they just cited, the authors contrast tension headaches with migraine headaches, with no controls.

So the data did not allow examination of the association between adult retrospective reports of childhood emotional abuse and migraines. There is no mention of self-reported depression and anxiety, which wiped out any relationship between childhood adversity and headaches in the first study; I would expect a survey of ACEs to include such self-report. And the ACE score equates parental divorce and separation (two aspects of the same common situation, which likely occur together and so are counted twice) with sexual abuse in calculating an overall score.

The authors make a big deal of the “dose-response” they found. But this dose-response could simply represent uncontrolled confounding: the more ACEs endorsed, the greater the likelihood that respondents faced other social, personal, economic, and neighborhood deprivations. The higher the ACE score, the greater the likelihood that other background characteristics are coming into play.
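A quick simulation makes the point concrete. This is a hypothetical sketch, not a reanalysis of the authors’ data: a single unmeasured confounder (call it “deprivation”) raises both the number of ACE items endorsed and the chance of frequent headaches, while ACEs have exactly zero causal effect on headaches. A “dose-response” gradient appears anyway.

```python
import random

random.seed(0)

def simulate(n=50_000):
    """Spurious dose-response from one unmeasured confounder (illustrative only)."""
    counts = {}  # ace_score -> (headache_cases, total)
    for _ in range(n):
        deprivation = random.random()                     # unmeasured confounder
        ace_score = sum(random.random() < 0.15 + 0.5 * deprivation
                        for _ in range(8))                # 8 checklist items
        headache = random.random() < 0.05 + 0.3 * deprivation  # no ACE term at all
        cases, total = counts.get(ace_score, (0, 0))
        counts[ace_score] = (cases + headache, total + 1)
    return {k: cases / total
            for k, (cases, total) in sorted(counts.items()) if total > 100}

rates = simulate()
# Headache prevalence climbs with ACE score even though the simulated
# causal effect of ACEs on headaches is exactly zero.
print(rates)
```

The gradient here is produced entirely by the confounder, which is why a dose-response pattern alone cannot establish causation.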

The only other evidence the authors cite is again another one of their papers, available only as a conference abstract. But the abstract states:

Results: About 14.2% (n = 2,061) of the sample reported a migraine diagnosis. Childhood abuse was recalled by 60.5% (n =1,246) of the migraine sample and 49% (n = 6,088) of the non-migraine sample. Childhood abuse increased the chances of a migraine diagnosis by 55% (OR: 1.55; 95% CI 1.35 – 1.77). Of the three types of abuse, emotional abuse had a stronger effect on migraine (OR: 1.52; 95% CI 1.34 – 1.73) when compared to physical and sexual abuse. When controlled for depression and anxiety, the effect of childhood abuse on migraine (OR: 1.32; 95% CI 1.15 – 1.51) attenuated but remained significant. Similarly, the effect of emotional abuse on migraine decreased but remained significant (OR: 1.33; 95% CI 1.16 – 1.52), when controlled for depression and anxiety.

The rates of childhood abuse seem curiously high for both the migraine and non-migraine samples. If you dig a bit on the web for details of the National Longitudinal Study of Adolescent Health, you can find how crude the measurement is.  The broad question assessing emotional abuse covers the full range of normal to abnormal situations without distinguishing among them.

How often did a parent or other adult caregiver say things that really hurt your feelings or made you feel like you were not wanted or loved? How old were you the first time this happened? (Emotional abuse).

An odds ratio of 1.33 is not going to attract much attention from an epidemiologist, particularly when it is obtained from such messy data.
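The crude odds ratio can be roughly recovered from the counts quoted in the abstract above. A minimal sanity check follows; note that the non-migraine denominator is inferred from the reported 49%, so it is approximate:

```python
# Counts taken from the conference abstract quoted above.
migraine_abused, migraine_total = 1246, 2061
control_abused = 6088
control_total = round(6088 / 0.49)   # inferred from "49% of the non-migraine sample"

odds_migraine = migraine_abused / (migraine_total - migraine_abused)
odds_control = control_abused / (control_total - control_abused)
crude_or = odds_migraine / odds_control
print(round(crude_or, 2))  # roughly 1.6, in the ballpark of the reported 1.55
```

The adjusted estimate of 1.33 is smaller still, which is the point: a modest odds ratio from crude retrospective measurement, attenuated further by controlling for depression and anxiety.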

I conclude that the authors have made only a weak case for the following statement: While all forms of childhood maltreatment have been shown to be linked to migraines, the strongest and most significant link is with emotional abuse.

Oddly, if we jump ahead to the closing section of The Conversation article, the authors concede:

Childhood maltreatment probably contributes to only a small portion of the number of people with migraine.

But, as we will  see, they make recommendations that assume a strong link has been established.

Why would emotional abuse in childhood lead to migraines in adulthood?

This section throws out a number of trending buzz terms and strings them together in a way that impresses and intimidates consumers rather than allowing them an independent evaluation of what is being said.


The section also comes below a stock blue picture of the brain. In web searches, the picture is associated with social media posts where the brain is superficially brought into discussions where neuroscience is not relevant.

An Australian neuroscientist commented on Facebook:

[Facebook screenshot: Deborah on blowing brains]

The section starts out:

The fact that the risk goes up in response to increased exposure is what indicates that abuse may cause biological changes that can lead to migraine later in life. While the exact mechanism between migraine and childhood maltreatment is not yet established, research has deepened our understanding of what might be going on in the body and brain.

We could get lost in a quagmire trying to figure out the evidence for the loose associations packed into a five-paragraph section. Instead, I’ll make some observations that interested readers can follow up.

The authors acknowledge that no mechanism has been established linking migraines and child maltreatment. The link for this statement takes the reader to the authors’ own paywalled article, which is explicitly labeled an “Opinion Statement.”

The authors ignore a huge literature that acknowledges great heterogeneity among sufferers of migraines but points to some rather strong evidence for treatments based on particular mechanisms identified among carefully selected patients. For instance, a paper published in The New England Journal of Medicine with well over 1,500 citations:

Goadsby PJ, Lipton RB, Ferrari MD. Migraine—current understanding and treatment. New England Journal of Medicine. 2002 Jan 24;346(4):257-70.

Speculations concerning the connections between childhood adversity, migraines, and the HPA axis are loose. The Conversation authors treat their obviousness as a given; these speculations need to be better documented with evidence.

For instance, if we try to link “childhood adversity” to the HPA axis, we need to consider the lack of specificity of “childhood adversity” as defined by retrospective endorsement of Adverse Childhood Experiences (ACEs). Suppose we rely on individual checklist items or cumulative scores based on the number of endorsements. We can’t be sure that we are dealing with actual rather than assumed exposure to traumatic events, or that there will be any consistent correlates in current measures derived from the HPA axis.

Any non-biological factor defined so vaguely is not going to be a candidate for mapping into causal processes or biological measurements.

An excellent recent Mind the Brain article by my colleague blogger Shaili Jain interviews Dr. Rachel Yehuda, who had a key role in researching HPA axis in stress. Dr. Yehuda says endocrinologists would cringe at the kind of misrepresentations that are being made in The Conversation article.

A recent systematic review concludes that the evidence for specific links between child maltreatment and inflammatory markers is limited and of poor quality.

Coelho R, Viola TW, Walss‐Bass C, Brietzke E, Grassi‐Oliveira R. Childhood maltreatment and inflammatory markers: a systematic review. Acta Psychiatrica Scandinavica. 2014 Mar 1;129(3):180-92.

The Conversation article misrepresents grossly inconsistent evidence of biological correlates as establishing biomarkers. There are as yet no biomarkers for migraines in the sense of a biological measurement that reliably distinguishes persons with migraines from other patient populations with whom they may be confused. See an excellent, funny blog post by Hilda Bastian.

Notice the rhetorical trick in The Conversation authors’ assertion that

Migraine is considered to be a hereditary condition. But, except in a small minority of cases, the genes responsible have not been identified.

Genetic denialists like Oliver James or Richard Bentall commonly phrase questions in this manner, as a matter of hereditary versus non-hereditary. But complex traits like height, intelligence, or migraines involve combinations of variations in a number of genes, not a single gene or even a few genes. For an example of the kind of insights that sophisticated genetic studies of migraines are yielding, see:

Yang Y, Ligthart L, Terwindt GM, Boomsma DI, Rodriguez-Acevedo AJ, Nyholt DR. Genetic epidemiology of migraine and depression. Cephalalgia. 2016 Mar 9:0333102416638520.

The Conversation article ends with some signature nonsense speculation about epigenetics:

However, stress early in life induces alterations in gene expression without altering the DNA sequence. These are called epigenetic changes, and they are long-lasting and may even be passed on to offspring.

Interested readers can find these claims demolished in Epigenetics Ain’t Magic by PZ Myers, a biologist who attempts to rescue an extremely important developmental concept from its misuse.

Or Carl Zimmer’s Growing Pains for Field of Epigenetics as Some Call for Overhaul.

What does this mean for doctors treating migraine patients?

The Conversation authors startle readers with an acknowledgment that contradicts what they have been saying earlier in their article:

Childhood maltreatment probably contributes to only a small portion of the number of people with migraine.

It is therefore puzzling when they next say:

But because research indicates that there is a strong link between the two, clinicians may want to bear that in mind when evaluating patients.

Cognitive behavior therapy is misrepresented as an established effective treatment for migraines. A recent systematic review and meta-analysis had to combine migraines with other chronic headaches in order to find ten studies to consider.

The conclusion of this meta-analysis:

Methodology inadequacies in the evidence base make it difficult to draw any meaningful conclusions or to make any recommendations.

The Conversation article notes that the FDA has approved anti-epileptic drugs such as valproate and topiramate for treatment of migraines. However, the article’s claim that the efficacy of these drugs is due to their effects on epigenetics is quite inconsistent with what is said in the larger literature.

Clinicians specializing in treating fibromyalgia or irritable bowel syndrome would be troubled by the authors’ lumping these conditions with migraines and suggesting that a psychiatric consultation is the most appropriate referral for patients who are having difficulty achieving satisfactory management.

See for instance the links contained in my blog post, No, irritable bowel syndrome is not all in your head.

The Conversation article closes with:

Within a migraine clinic population, clinicians should pay special attention to those who have been subjected to maltreatment in childhood, as they are at increased risk of being victims of domestic abuse and intimate partner violence as adults.

That’s why clinicians should screen migraine patients, and particularly women, for current abuse.

It’s difficult to see how this recommendation is relevant to what has preceded it. Routine screening is not evidence-based.

The authors should know that the World Health Organization formerly recommended screening primary care women for intimate partner abuse but withdrew the recommendation because of a lack of evidence that screening improved outcomes for women facing abuse and a lack of evidence that no harm was being done.

I am sharing this blog post with the authors of The Conversation article. I am requesting a correction from The Conversation. Let’s see what they have to say.

Meanwhile, patients seeking health information are advised to avoid The Conversation.

Remission of suicidal ideation by magnetic seizure therapy? Neuro-nonsense in JAMA: Psychiatry

A recent article in JAMA: Psychiatry:

Sun Y, Farzan F, Mulsant BH, Rajji TK, Fitzgerald PB, Barr MS, Downar J, Wong W, Blumberger DM, Daskalakis ZJ. Indicators for remission of suicidal ideation following magnetic seizure therapy in patients with treatment-resistant depression. JAMA Psychiatry. 2016 Mar 16.

was accompanied by an editorial commentary:

Camprodon JA, Pascual-Leone A. Multimodal Applications of Transcranial Magnetic Stimulation for Circuit-Based Psychiatry. JAMA: Psychiatry. 2016 Mar 16.

Together, the article and commentary can be studied as:

  • An effort by the authors and the journal itself to promote prematurely a treatment for reducing suicide.
  • A payback to sources of financial support for the authors. Both groups have industry ties that provide them with consulting fees, equipment, grants, and other unspecified rewards. One author has a patent that should increase in value as a result of this article and commentary.
  • A bid for successful applications to new grant initiatives with a pledge of allegiance to the NIMH Research Domain Criteria (RDoC).

After considering just how bad the science and its reporting are, we have sufficient reason to ask: How did this promotional campaign come about? Why was this article accepted by JAMA: Psychiatry? Why was it deemed worthy of comment?

I think a skeptical look at this article would lead to a warning label:

Warning: Results reported in this article are neither robust nor trustworthy, but considerable effort has gone into promoting them as innovative and even breakthrough. Skepticism warranted.

As we will see, the article is seriously flawed as a contribution to neuroscience, identification of biomarkers, treatment development, and suicidology, but we can nonetheless learn a lot from it in terms of how to detect such flaws when they are more subtle. If nothing else, your skepticism will be raised about articles accompanied by commentaries in prestigious journals and you will learn tools for probing such pairs of articles.

 

This article involves intimidating technical details and awe-inspiring figures.

[Figure 1 from the article, panels one and two]

Yet, as in some past blog posts concerning neuroscience and the NIMH RDoC, we will gloss over some technical details that would be readily interpreted by experts. I would welcome comments and critiques from experts.

I nonetheless expect readers to agree when they have finished this blog post that I have demonstrated that you don’t have to be an expert to detect neurononsense and crass publishing of articles that fit vested interests.

The larger trial from which these patients were drawn is registered as:

ClinicalTrials.gov. Magnetic Seizure Therapy (MST) for Treatment Resistant Depression, Schizophrenia, and Obsessive Compulsive Disorder. NCT01596608.

Because this article is strikingly lacking in crucial details or details in places where we would expect to find them, it will be useful at times to refer to the trial registration.

The title and abstract of the article

As we will soon see, the title, Indicators for remission of suicidal ideation following MST in patients with treatment-resistant depression, is misleading. The article has too small a sample and too inappropriate a design to establish anything as a reproducible “indicator.”

That the article is going to fail to deliver is already apparent in the abstract.

The abstract states:

 Objective  To identify a biomarker that may serve as an indicator of remission of suicidal ideation following a course of MST by using cortical inhibition measures from interleaved transcranial magnetic stimulation and electroencephalography (TMS-EEG).

Design, Setting, and Participants  Thirty-three patients with TRD were part of an open-label clinical trial of MST treatment. Data from 27 patients (82%) were available for analysis in this study. Baseline TMS-EEG measures were assessed within 1 week before the initiation of MST treatment using the TMS-EEG measures of cortical inhibition (ie, N100 and long-interval cortical inhibition [LICI]) from the left dorsolateral prefrontal cortex and the left motor cortex, with the latter acting as a control site.

Interventions The MST treatments were administered under general anesthesia, and a stimulator coil consisting of 2 individual cone-shaped coils was used.

Main Outcomes and Measures Suicidal ideation was evaluated before initiation and after completion of MST using the Scale for Suicide Ideation (SSI). Measures of cortical inhibition (ie, N100 and LICI) from the left dorsolateral prefrontal cortex were selected. N100 was quantified as the amplitude of the negative peak around 100 milliseconds in the TMS-evoked potential (TEP) after a single TMS pulse. LICI was quantified as the amount of suppression in the double-pulse TEP relative to the single-pulse TEP.

Results  Of the 27 patients included in the analyses, 15 (56%) were women; mean (SD) age of the sample was 46.0 (15.3) years. At baseline, patients had a mean SSI score of 9.0 (6.8), with 8 of 27 patients (30%) having a score of 0. After completion of MST, patients had a mean SSI score of 4.2 (6.3) (pre-post treatment mean difference, 4.8 [6.7]; paired t26 = 3.72; P = .001), and 18 of 27 individuals (67%) had a score of 0 for a remission rate of 53%. The N100 and LICI in the frontal cortex—but not in the motor cortex—were indicators of remission of suicidal ideation with 89% accuracy, 90% sensitivity, and 89% specificity (area under the curve, 0.90; P = .003).

Conclusions and Relevance  These results suggest that cortical inhibition may be used to identify patients with TRD who are most likely to experience remission of suicidal ideation following a course of MST. Stronger inhibitory neurotransmission at baseline may reflect the integrity of transsynaptic networks that are targeted by MST for optimal therapeutic response.

Even viewing the abstract alone, we can see this article is in trouble. It claims to identify a biomarker following a course of magnetic seizure therapy (MST). That is an extraordinary claim when the study started with only 33 patients, of whom only 27 remain for analysis. Furthermore, at the initial assessment of suicidal ideation, eight of the 27 patients did not have any and so could show no benefit of treatment.
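The abstract’s own remission arithmetic illustrates how thin the base is. With 8 of 27 patients already at an SSI score of 0 at baseline, the reported 53% remission rate rests on only 19 patients who could possibly remit. A quick check of the reported figures (assuming, as the rate implies, that no baseline zeros relapsed):

```python
n_analyzed = 27
zero_at_baseline = 8    # SSI = 0 before treatment; cannot improve on this measure
zero_after = 18         # SSI = 0 after MST

eligible = n_analyzed - zero_at_baseline        # 19 patients could remit
new_remissions = zero_after - zero_at_baseline  # at most 10 new zeros
remission_rate = new_remissions / eligible
print(f"{remission_rate:.0%}")  # 53%, matching the abstract, from just 10 of 19 patients
```

A claimed biomarker with 90% sensitivity and 89% specificity is being validated against an outcome observed in only ten people.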

Any results could be substantially changed by any of the six excluded patients being recovered for analysis, or by any of the 27 included patients being dropped from analyses as an outlier. Statistical controls for potential confounds will produce spurious results because of overfitted equations, even with a single confound. We also know that in situations requiring control of possible confounding factors, control of only one is rarely sufficient and often produces worse results than leaving variables unadjusted.

Identification of any biomarkers is unlikely to be reproducible in larger more representative samples. Any claims of performance characteristics of the biomarkers (accuracy, sensitivity, specificity, area under the curve) are likely to capitalize on sampling and chance in ways that are unlikely to be reproducible.
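A hedged sketch of why such performance figures deserve skepticism: if we generate purely random candidate “biomarkers” for 27 patients and keep whichever classifies best (flipping the sign when that helps, as exploratory analyses implicitly do), impressive-looking AUCs appear by chance alone. This is a simulation under stated assumptions, not a reanalysis of the study’s data.

```python
import random

random.seed(1)

def auc(scores, labels):
    """Rank-based AUC: probability a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

n_patients = 27
labels = [1] * 14 + [0] * 13             # arbitrary remitter/non-remitter split

best_auc = 0.5
for _ in range(20):                      # 20 candidate "biomarkers", all pure noise
    marker = [random.gauss(0, 1) for _ in range(n_patients)]
    a = auc(marker, labels)
    best_auc = max(best_auc, a, 1 - a)   # sign-flipping, as post hoc selection allows

print(round(best_auc, 2))  # typically well above chance despite zero real signal
```

The point is not that the study’s markers are noise, but that at this sample size, selection among candidate measures can produce AUCs of this magnitude with no real signal at all, which is why replication in a larger, independent sample is essential.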

Nonetheless, the accompanying figures are dazzling, even if not readily interpretable or representative of what would be found in another sample.

Comparison of the article to the trial registration.

According to the trial registration, the study started in February 2012 and the registration was received in May 2012. There were unspecified changes as recently as this month (March 2016), and final collection of primary outcome data is expected in December 2016.

Primary outcome

The registration indicates that patients will have been diagnosed with severe major depression, schizophrenia or obsessive compulsive disorder. The primary outcome will depend on diagnosis. For depression it is the Hamilton Rating Scale for Depression.

There is no mention of suicidal ideation as either a primary or secondary outcome.

Secondary outcomes

According to the registration, outcomes include (1) cognitive functioning as measured by episodic memory and non-memory cognitive functions; (2) changes in neuroimaging measures of brain structure and activity derived from fMRI and MRI from baseline to 24th treatment or 12 weeks, whichever comes sooner.

Comparison to the article suggests some important neuroimaging assessments proposed in the registration were compromised: (1) only baseline measures were obtained, and without MRI or fMRI; and (2) the article states

Although magnetic resonance imaging (MRI)–guided TMS-EEG is more accurate than non–MRI-guided methods, the added step of obtaining an MRI for every participant would have significantly slowed recruitment for this study owing to the pressing need to begin treatment in acutely ill patients, many of whom were experiencing suicidal ideation. As such, we proceeded with non–MRI-guided TMS-EEG using EEG-guided methods according to a previously published study.

Treatment

The article provides some details of the magnetic seizure treatment:

The MST treatments were administered under general anesthesia using a stimulator machine (MagPro MST; MagVenture) with a twin coil. Methohexital sodium (n = 14), methohexital with remifentanil hydrochloride (n = 18), and ketamine hydrochloride (n = 1) were used as the anesthetic agents. Succinylcholine chloride was used as the neuromuscular blocker. Patients had a mean (SD) seizure duration of 45.1 (21.4) seconds. The twin coil consists of 2 individual cone-shaped coils. Stimulation was delivered over the frontal cortex at the midline position directly over the electrode Fz according to the international 10-20 system.36 Placing the twin coil symmetrically over electrode Fz results in the centers of the 2 coils being over F3 and F4. Based on finite element modeling, this configuration produces a maximum induced electric field between the 2 coils, which is over electrode Fz in this case.37 Patients were treated for 24 sessions or until remission of depressive symptoms based on the 24-item Hamilton Rating Scale for Depression (HRSD) (defined as an HRSD-24 score ≤10 and 60% reduction in symptoms for at least 2 days after the last treatment).38 These remission criteria were standardized from previous ECT depression trials.39,40 Further details of the treatment protocol are available,30 and comprehensive clinical and neurophysiologic trial results will be reported separately.

The article apparently intended to refer the reader to the trial registration for further description of the treatment, but the superscript citation in the article is inaccurate. Regardless, given other deviations from the registration, readers can’t tell whether there were any deviations from what was proposed. In the registration, seizure therapy was described as involving:

100% machine output at between 25 and 100 Hz, with coil directed over frontal brain regions, until adequate seizure achieved. Six treatment sessions, at a frequency of two or three times per week will be administered. If subjects fail to achieve the pre-defined criteria of remission at that point, the dose will be increased to the maximal stimulator output and 3 additional treatment sessions will be provided. This will be repeated a total of 5 times (i.e., maximum treatment number is 24). 24 treatments is typically longer than a conventional ECT treatment course.

One important implication concerns this treatment being proposed as resolving suicidal ideation. It takes place over a considerable period of time, and patients who die by suicide notoriously break contact before doing so. A required 24 treatments delivered on an outpatient basis would seem to provide ample opportunities for breaks in contact (including through demoralization because so many treatments are needed in some cases) and therefore for death by suicide.

But a protocol that involves continuing treatment until a prespecified reduction in the Hamilton Depression Rating Scale is achieved virtually assures that there will be a drop in suicidal ideation: the interview-based Hamilton depression ratings and suicidal ideation are highly correlated.

There is no randomization, nor even an adequate description of patient accrual in terms of the population from which the patients came. There is no control group and therefore no control for nonspecific factors. In terms of nonspecific effects, the treatment involves patients in an elaborate, intrusive ritual, starting with electroencephalographic (EEG) assessment.

The ritual will undoubtedly have strong nonspecific factors associated with it, instilling positive expectations and providing considerable personal attention.

The article’s discussion of results

The discussion opens with some strong claims, unjustified by the modesty of the study and the likelihood that its specific results are not reproducible:

We found that TMS-EEG measures of cortical inhibition (ie, the N100 and LICI) in the frontal cortex, but not in the motor cortex, were strongly correlated with changes in suicidal ideation in patients with TRD who were treated with MST. These findings suggest that patients who benefitted the most from MST demonstrated the greatest cortical inhibition at baseline. More important, when patients were divided into remitters and nonremitters based on their SSI score, our results show that these measures can indicate remission of suicidal ideation from a course of MST with 90% sensitivity and 89% specificity.

The discussion contains a Pledge of Allegiance to the research domain criteria approach that is not actually a reflection of the results at hand. Among the many things we knew before the study was done (and that were not shown by the study) is that suicidal ideation is so closely linked to hopelessness, negative affect, and attentional biases that in such a study it is best seen as a surrogate measure of depression rather than a marker for risk of suicidal acts or death by suicide.


Wave that RDoC flag and maybe you will attract money from NIMH.

Our results also support the research domain criteria approach, that is, that suicidal ideation represents a homogeneous symptom construct in TRD that is targeted by MST. Suicidal ideation has been shown to be linked to hopelessness, negative affect, and attentional biases. These maladaptive behaviors all fall under the domain of negative valence systems and are associated with the specific constructs of loss, sustained threat, and frustrative nonreward. Suicidal ideation may represent a better phenotype through which to understand the neurobiologic features of mental illnesses. In this case, variations in GABAergic-mediated inhibition before MST treatment explained much of the variance for improvements in suicidal ideation across individuals with TRD.

Debunking ‘a better phenotype through which to understand the neurobiologic features of mental illnesses.’

  • Suicide is not a disorder or a symptom, but an infrequent, difficult to predict and complex act that varies greatly in nature and circumstances.
  • While some features of a brain or brain functioning may be correlated with eventual death by suicide, most identifications of persons at risk of eventually dying by suicide that they provide will be false positives.
  • In the United States, access to a firearm is a reliable proximal cause of suicide and is likely to be more so than anything in the brain. However, this basic observation is not consistent with American politics and can lead to grant applications not being funded.

In an important sense,

  • It’s not what’s going on in the brain, but what’s going in the interpersonal context of the brain, in terms of modifiable risk for death by suicide.

The editorial commentary

On the JAMA: Psychiatry website, both the article and the editorial commentary contain sidebar links to each other. It is only in the last two paragraphs of a 14-paragraph commentary that the target article is mentioned. However, the commentary ends with a resounding celebration of the innovation this article represents [emphasis added]:

Sun and colleagues10 report that 2 different EEG measures of cortical inhibition (a negative evoked potential in the EEG that happens approximately 100 milliseconds after a stimulus or event of interest and long-interval cortical inhibition) evoked by TMS to the left dorsolateral prefrontal cortex, but not to the left motor cortex, predicted remission of suicidal ideation with great sensitivity and specificity. This study10 illustrates the potential of multimodal TMS to study physiological properties of relevant circuits in neuropsychiatric populations. Significantly, it also highlights the anatomical specificity of these measures because the predictive value was exclusive to the inhibitory properties of prefrontal circuits but not motor systems.

Multimodal TMS applications allow us to study the physiology of human brain circuitry noninvasively and with causal resolution, expanding previous motor applications to cognitive, behavioral, and affective systems. These innovations can significantly affect psychiatry at multiple levels, by studying disease-relevant circuits to further develop systems for neuroscience models of disease and by developing tools that could be integrated into clinical practice, as they are in clinical neurophysiology clinics, to inform decision making, the differential diagnosis, or treatment planning.

Disclosures of conflicts of interest

The article’s disclosure of conflicts of interest statement is longer than the abstract.

conflict of interest disclosure

The disclosure for the conflicts of interest for the editorial commentary is much shorter but nonetheless impressive:

editorial commentary disclosures

How did this article get into JAMA: Psychiatry with an editorial comment?

Editorial commentaries are often provided by reviewers who simply check the box on the reviewers’ form indicating their willingness to provide a comment. For reviewers who already have a conflict of interest, this provides an additional one: a non-peer-reviewed paper in which they can promote their interests.

Alternatively, commentators are simply picked by an editor who judges an article to be worthy of special recognition. It’s noteworthy that at least one of the associate editors of JAMA: Psychiatry is actively campaigning for a particular direction to suicide research funded by NIMH, as seen in an editorial comment of his own that I recently discussed. One of the authors of the paper currently under discussion was until recently a senior member of this associate editor’s department, before departing to become Chair of the Department of Psychiatry at the University of Toronto.

Essentially, the authors of the paper and the authors of the commentary are providing carefully constructed advertisements for themselves and their agenda. They have the opportunity to do so because their message is consistent with the agenda of at least one of the editors, if not the journal itself.

The Committee on Publication Ethics (COPE) requires that non-peer-reviewed material in ostensibly peer-reviewed journals be labeled as such. This requirement is seldom met.

The journal further promoted this article by providing 10 free continuing medical education credits for reading it.

I could go on much longer identifying other flaws in this paper and its editorial commentary. I could raise other objections to the article being published in JAMA: Psychiatry. But out of mercy for the authors, the editor, and my readers, I’ll stop here.

I would welcome comments about other flaws.

Special thanks to Bernard “Barney” Carroll for his helpful comments and encouragement, but all opinions expressed and all factual errors are my own responsibility.

Is risk of Alzheimer’s Disease reduced by taking a more positive attitude toward aging?

Unwarranted claims that “modifiable” negative beliefs cause Alzheimer’s disease lead to blaming persons who develop Alzheimer’s disease for not having been more positive.

Lesson: A source’s impressive credentials are no substitute for independent critical appraisal of what sounds like junk science and is.

More lessons on how to protect yourself from dodgy claims in press releases of prestigious universities promoting their research.

If you judge the credibility of health-related information based on the credentials of the source, this article is a clear winner:

Levy BR, Ferrucci L, Zonderman AB, Slade MD, Troncoso J, Resnick SM. A Culture–Brain Link: Negative Age Stereotypes Predict Alzheimer’s Disease Biomarkers. Psychology and Aging. Dec 7 , 2015, No Pagination Specified. http://dx.doi.org/10.1037/pag0000062


As noted in the press release from Yale University, two of the authors are from Yale School of Medicine, another is a neurologist at Johns Hopkins School of Medicine, and the remaining three authors are from the US National Institute on Aging (NIA), including NIA’s Scientific Director.

The press release Negative beliefs about aging predict Alzheimer’s disease in Yale-led study declared:

“Newly published research led by the Yale School of Public Health demonstrates that individuals who hold negative beliefs about aging are more likely to have brain changes associated with Alzheimer’s disease.

“The study suggests that combatting negative beliefs about aging, such as elderly people are decrepit, could potentially offer a way to reduce the rapidly rising rate of Alzheimer’s disease, a devastating neurodegenerative disorder that causes dementia in more than 5 million Americans.

The press release posited a novel mechanism:

“We believe it is the stress generated by the negative beliefs about aging that individuals sometimes internalize from society that can result in pathological brain changes,” said Levy. “Although the findings are concerning, it is encouraging to realize that these negative beliefs about aging can be mitigated and positive beliefs about aging can be reinforced, so that the adverse impact is not inevitable.”

A Google search reveals over 40 stories about the study in the media. Provocative titles of the media coverage suggest a children’s game of telephone or Chinese whispers in which distortions accumulate with each retelling.

Negative beliefs about aging tied to Alzheimer’s (Waltonian)

Distain for the elderly could increase your risk of Alzheimer’s (FinancialSpots)

Lack of respect for elderly may be fueling Alzheimer’s epidemic (Telegraph)

Negative thoughts speed up onset of Alzheimer’s disease (Tech Times)

Karma bites back: Hating on the elderly may put you at risk of Alzheimer’s (LA Times)

How you feel about your grandfather may affect your brain health later in life (Men’s Health News)

Young people pessimistic about aging more likely to develop Alzheimer’s later on (Health.com)

Looking forward to old age can save you from Alzheimer’s (Canonplace News)

If you don’t like old people, you are at higher risk of Alzheimer’s, study says (RedOrbit)

If you think elderly people are icky, you’re more likely to get Alzheimer’s (HealthLine)

In defense of the authors of this article as well as the journalists, it is likely that editors added the provocative titles without obtaining the approval of the authors or even of the journalists writing the articles. So, let’s suspend judgment and write off the sometimes absurd titles to editors’ need to establish that they are offering distinctive coverage, even when they are not. That’s a lesson for the future: if we’re going to criticize media coverage, better to focus on the content of the coverage, not the titles.

However, a number of these stories have direct quotes from the study’s first author. Unless the media coverage is misattributing direct quotes to her, she must have been making herself available to the media.

Was the article such an important breakthrough offering new ways in which consumers could take control of their risk of Alzheimer’s by changing beliefs about aging?

No, not at all. In the following analysis, I’ll show that judging the credibility of claims based on the credentials of the sources can be seriously misleading.

What is troubling about this article and its well-organized publicity effort is that information is being disseminated that is misleading and potentially harmful, with the prestige of Yale and NIA attached.

Before we go any further, you can take your own look at a copy of the article in the American Psychological Association journal Psychology and Aging here, the Yale University press release here, and a fascinating post-publication peer review at PubPeer that I initiated as peer 1.

Ask yourself: if you encountered coverage of this article in the media, would you have been skeptical? If so what were the clues?

The article is yet another example of trusted authorities exploiting entrenched cultural beliefs that the mind-body connection can be harnessed in some mysterious way to combat or prevent physical illness. As Ann Harrington details in her wonderful book, The Cure Within, this psychosomatic hypothesis has a long and checkered history, and it gets continually reinvented and misapplied.

We see an example of this in claims that attitude can conquer cancer. What’s the harm of such illusions? If people can be led to believe they have such control, they are set up for blame from themselves and from those around them when they fail to fend off and control the outcome of disease by sheer mental power.

The myth of “fighting spirit” overcoming cancer has survived despite the accumulation of excellent contradictory evidence. Cancer patients are vulnerable to blaming themselves or being blamed by loved ones when they do not “win” the fight against cancer. They are also subject to unfair exhortations to fight harder as their health situation deteriorates.

                                                        From the satirical Onion

 What I saw when I skimmed the press release and the article

  • The first alarm went off when I saw that causal claims were being made from a modest sized correlational study. This should set off anyone’s alarms.
  • The press release and the discussion section of the article refer to this as a “first ever” study. One does not seek nor expect to find robust “first ever” discoveries in such a small data set.
  • The authors do not provide evidence that their key measure of “negative stereotypes” is a valid measure of either stereotyping or likelihood of experiencing stress. They don’t even show it is related to concurrent reports of stress.
  • Like a lot of measures with a negative tone to their items, this one is affected by what Paul Meehl called the crud factor. Whatever is being measured in this study cannot be distinguished from a full range of confounds that are not even assessed in this study.
  • The mechanism by which effects of this self-report measure somehow get manifested in changes in the brain lacks evidence and is highly dubious.
  • There was no presentation of actual data or basic statistics. Instead, there were only multivariate statistics that require at least some access to basic statistics for independent evaluation.
  • The authors resorted to cheap statistical strategies to fool readers with their confirmation bias: reliance on one tailed rather than two-tailed tests of significance; use of a discredited backwards elimination method for choosing control variables; and exploring too many control/covariate variables, given their modest sample size.
  • The analyses that are reported do not accurately depict what is in the data set, nor generalize to other data sets.

The article

The authors develop their case that stress is a significant cause of Alzheimer’s disease with reference to some largely irrelevant studies by others, but depend on a preponderance of studies that they themselves have done with the same dubious small samples and dubious statistical techniques. Whether you do a casual search with Google scholar or a more systematic review of the literature, you won’t find stress processes of the kind the authors invoke among the usual explanations of the development of the disease.

Basically, the authors are arguing that if you hold views of aging like “Old people are absent-minded” or “Old people cannot concentrate well,” you will experience more stress as you age, and this will accelerate development of Alzheimer’s disease. They then go on to argue that because these attitudes are modifiable, you can take control of your risk for Alzheimer’s by adopting a more positive view of aging and aging people.

The authors have used their measure of negative aging stereotypes in other studies, but do not provide the usual evidence of convergent and discriminant validity needed to establish that the measure assesses what is intended. Basically, we should expect authors to show that a measure they have developed is related in expected ways to existing measures (convergent validity), but not related to existing measures from which it should be distinct (discriminant validity).

Psychology has a long history of researchers claiming that their “new” self-report measures containing negatively toned items assess distinct concepts, despite high correlations with other measures of negative emotion as well as lots of confounds. I poked fun at this unproductive tradition in a presentation, Negative emotions and health: why do we keep stalking bears, when we only find scat in the woods?

The article reported two studies. The first tested whether participants holding more negative age stereotypes would have significantly greater loss of hippocampal volume over time. The study involved 52 individuals selected from a larger cohort enrolled in the brain-neuroimaging program of the Baltimore Longitudinal Study of Aging.

Readers are given none of the basic statistics that would be needed to interpret the complex multivariate analyses. Ideally, we would be given an opportunity to see how the independent variable, negative age stereotypes, is related to other data available on the subjects, so that we could get some sense of whether we are starting with basic, meaningful associations.

Instead the authors present the association between negative age stereotyping and hippocampal volume only in the presence of multiple control variables:

Covariates consisted of demographics (i.e., age, sex, and education) and health at time of baseline-age-stereotype assessment, (number of chronic conditions on the basis of medical records; well-being as measured by a subset of the Chicago Attitude Inventory); self-rated health, neuroticism, and cognitive performance, measured by the Benton Visual Retention Test (BVRT; Benton, 1974).

Readers cannot tell why these variables and not others were chosen. Adding or dropping a few variables could produce radically different results. But there are just too many variables being considered. With only 52 research participants, spurious findings that do not generalize to other samples are highly likely.

I was astonished when the authors announced that they were relying on one-tailed statistical tests. This is widely condemned as unnecessary and misleading.

Basically, every time the authors report a significance level in this article, you need to double the number to get what would be obtained with a more conventional two-tailed test. So, if they proudly declare that results are significant, p = .046, then the results are actually (non)significant, p = .092. I know, we should not make such a fuss about significance levels, but journals do. We’re being set up to be persuaded that the results are significant, when they are not by conventional standards.
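The doubling is simple arithmetic, and the one-tailed p can even be roughly recovered from the t value the paper reports later for the second study (t = 1.71, df = 59). A minimal sketch, using a normal approximation to the t distribution (my own simplification, adequate at df = 59):

```python
import math

# Reported in the second study: t = 1.71 with 59 degrees of freedom,
# one-tailed p = .046 (values quoted from the article).
t = 1.71

# One-tailed p via the normal approximation to the t distribution:
# P(Z > t) = 0.5 * erfc(t / sqrt(2)).
p_one_tailed = 0.5 * math.erfc(t / math.sqrt(2))

# The conventional two-tailed p simply doubles the one-tailed value.
p_two_tailed = 2 * p_one_tailed

print(f"one-tailed p ~ {p_one_tailed:.3f}")  # about 0.044, close to the reported .046
print(f"two-tailed p ~ {p_two_tailed:.3f}")  # about 0.09 -- not significant at .05
```

The point survives the approximation: the reported .046 becomes .092 under a two-tailed test, above the conventional .05 threshold.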

So add to the authors’ accumulating sins against proper statistical practice and transparent reporting: no presentation of basic associations, reliance on one-tailed tests, and use of multivariate statistics inappropriate for such a small sample. Now let’s add another one: in their multivariate regressions, the authors relied on a potentially deceptive backwards elimination:

Backward elimination, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) that improves the model the most by being deleted, and repeating this process until no further improvement is possible.

The authors assembled their candidate control/covariate variables and used a procedure that checks them statistically and drops some from consideration, based on whether they fail to add to the significance of the overall equation. This procedure is condemned because the variables retained in the equation capitalize on chance. Particular variables that could be theoretically relevant are eliminated simply because they fail to add anything statistically in the context of the other variables being considered. In the context of a different set of candidate variables, these same discarded variables might have been retained.

The final regression equation had fewer control/covariates than when the authors started. Statistical significance is then calculated on the basis of the small number of variables remaining, not the number that were picked over, so the results artificially appear stronger. Again, this is potentially quite misleading to the unwary reader.
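To see how backward elimination capitalizes on chance, here is a minimal simulation. It is a sketch under my own assumptions, not the authors’ actual procedure: the sample size of 52 matches the first study, but the nine pure-noise covariates, the p < .15 retention criterion, and the normal-approximation p-values are illustrative choices.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

n, n_noise = 52, 9  # n from the first study; the covariate count is illustrative
# Column 0 is the "predictor of interest"; the rest are pure-noise covariates.
X = rng.standard_normal((n, 1 + n_noise))
y = rng.standard_normal(n)  # outcome unrelated to anything

def p_values(X, y):
    """OLS t-test p-values for each column (normal approximation), with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    z = np.abs(beta / se)
    return np.array([math.erfc(zi / math.sqrt(2)) for zi in z])[1:]  # drop intercept

# Backward elimination: repeatedly drop the least significant noise covariate
# (never the predictor itself) until all survivors pass a lax p < .15 criterion.
keep = list(range(1 + n_noise))
while True:
    p = p_values(X[:, keep], y)
    candidates = [(pv, col) for pv, col in zip(p, keep) if col != 0]
    if not candidates:
        break
    worst_p, worst_col = max(candidates)
    if worst_p < 0.15:
        break
    keep.remove(worst_col)

print(f"noise covariates retained: {len(keep) - 1} of {n_noise}")
```

Even with everything pure noise, some covariates typically survive the criterion, and re-running with a different seed typically retains a different subset: the "final model" is partly an accident of the particular sample, which is exactly the objection to this procedure.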

The authors nonetheless concluded:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had a significantly steeper decline in hippocampal volume

The second study:

examined whether participants holding more negative age stereotypes would have significantly greater accumulation of amyloid plaques and neurofibrillary tangles.

The outcome was a composite-plaques-and-tangles score and the predictor was the same negative age stereotypes measure from the first study. These measurements were obtained from 74 research participants upon death and autopsy. The same covariates were used in stepwise regression with backward elimination. Once again, the statistical test was one tailed.

Results were:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had significantly higher composite-plaques-and-tangles scores, t(1,59) = 1.71 p = .046, d = 0.45, adjusting for age, sex, education, self-rated health, well-being, and number of chronic conditions.

Aha! Now we see why the authors commit themselves to a one tailed test. With a conventional two-tailed test, these results would not be significant. Given a prevailing confirmation bias, aversion to null findings, and obsession with significance levels, this article probably would not have been published without the one tailed test.

The authors’ stirring overall conclusion from the two studies:

By expanding the boundaries of known environmental influences on amyloid plaques, neurofibrillary tangles, and hippocampal volume, our results suggest a new pathway to identifying mechanisms and potential interventions related to Alzheimer’s disease

PubPeer discussion of this paper [https://pubpeer.com/publications/16E68DE9879757585EDD8719338DCD]

Comments accumulated for a couple of days on PubPeer after I posted some concerns about the first study. All of the comments were quite smart; some directly validated points that I had been thinking about, while others took the discussion in new directions, either statistically or because the commentators knew more about neuroscience.

Using a mechanism available at PubPeer, I sent emails to the first author of the paper, the statistician, and one of the NIA personnel inviting them to make comments also. None have responded so far.

Tom Johnstone, a commentator who exercised the option of identifying himself, noted the reliance on inferential statistics in the absence of reporting basic relationships. He also noted that the criterion used to drop covariates was lax. Apparently familiar with neuroscience, he expressed doubts that the results had any clinical significance or relevance to the functioning of the research participants.

Another commentator complained of the small sample size, the use of one-tailed statistical tests without justification, the “convoluted list of covariates,” and the “taboo” strategy for selecting covariates to be retained in the regression equation. This commentator also noted that the authors had examined the effect of outliers, conducting analyses both with and without the most extreme case. While exclusion did not affect the overall results, it dramatically changed the significance level, highlighting the susceptibility of such a small sample to chance variation or sampling error.
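That sensitivity to a single extreme case is easy to demonstrate. In this toy example (entirely my own construction, not the study’s data), twenty points built to have exactly zero correlation acquire a respectable-looking correlation once one extreme case is added:

```python
import math
import numpy as np

# Twenty points constructed so the correlation is exactly zero,
# plus one extreme case that happens to fall on the diagonal.
x = np.array([-2, -1, 0, 1, 2] * 4, dtype=float)
y = np.array([1, -2, 0, 2, -1] * 4, dtype=float)
x_all = np.append(x, 5.0)
y_all = np.append(y, 5.0)

def pearson_r(a, b):
    """Pearson correlation from centered dot products."""
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / math.sqrt((a @ a) * (b @ b))

print(f"r without the extreme case: {pearson_r(x, y):.2f}")        # 0.00
print(f"r with the extreme case:    {pearson_r(x_all, y_all):.2f}")  # about 0.37
```

One point out of twenty-one moves the correlation from zero to roughly .4; with samples of 52 or 74, a handful of extreme cases can make or break “significance.”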

Who gets the blame for misleading claims in this article?

There’s a lot of blame to go around. By exaggerating the size and significance of any effects, the first author increases the chances of publication and of further funding to pursue what is seen as a “tantalizing” association. But it’s the job of editors and peer reviewers to protect the readership from such exaggerations, and maybe to protect the author from herself. They failed, maybe because exaggerated findings are consistent with the journal‘s agenda of increasing citations by publishing newsworthy rather than trustworthy findings. The study statistician, Martin Slade, obviously knew that misleading, less than optimal statistics were used; why didn’t he object? Finally, I think the NIA staff, particularly Luigi Ferrucci, the Scientific Director of NIA, should be singled out for the irresponsibility of attaching their names to such misleading claims. Why did they do so? Did they not read the manuscript? I will regularly present instances of NIH staff endorsing dubious claims, such as here. The mind-over-disease, psychosomatic hypothesis gets a lot of support not warranted by the evidence. Perhaps NIH officials in general see this as a way of attracting research monies from Congress. Regardless, I think NIH officials have a responsibility to see that consumers are not misled by junk science.

This article at least provided the opportunity for an exercise that should raise skepticism and convince consumers at all levels – other researchers, clinicians, policymakers, those who suffer from Alzheimer’s disease, and those who care for them – that we just cannot sit back and let trusted sources do our thinking for us.

 

Biomarker Porn: From Bad Science to Press Release to Praise by NIMH Director

Concluding installment of NIMH biomarker porn: Depression, daughters, and telomeres

Pioneer HPA-axis researcher Bernard “Barney” Carroll’s comment left no doubt about what he thought of the Molecular Psychiatry article I discussed in my last issue of Mind the Brain:

Where is the HPA axis dysregulation? It is mainly in the minds of the authors, in service of their desired narrative. Were basal cortisol levels increased? No. Were peak cortisol levels increased? They didn’t say. Was the cortisol increment increased? Only if we accept a p value of 0.042 with no correction for multiple comparisons. Most importantly, was the termination of the stress cortisol response impaired? No, it wasn’t (Table 3). That variable is a feature of allostasis, about which co-author Wolkowitz is well informed. Termination of the stress response is a crucial component of HPA axis regulation (see PubMed #18282566), and it was no different between the two groups. So, where’s the beef? The weakness of this report tells us not only about the authors’ standards but also about the level of editorial tradecraft on display in Molecular Psychiatry. [Hyperlink added]

You also can see my response to Professor Carroll in the comments.

I transferred another comment to the blog from my Facebook wall. It gave me an opportunity to elaborate on why

we shouldn’t depend on small convenience samples to attempt to understand phenomena that must be examined in larger samples followed prospectively.

I explained

There are lots of unanswered questions about the authors’ sampling of adolescents. We don’t know what they are like when their mothers are not depressed. The young girls could also simply be reacting to environmental conditions contributing to their mother’s depression, not to their mother’s depression per se. We don’t know how representative this convenience sample is of other daughters of depressed mothers. Is it unusual or common that daughters of this age are not depressed concurrent with their mothers’ depression? What factors about the daughters, the mothers, or their circumstances determine that the mother and daughter depression does not occur at the same time? What about differences with the daughters of mothers who are prone to depression, but are not currently depressed? We need to keep in mind that most biomarkers associated with depression are state dependent, not trait dependent. And these daughters were chosen because they are not depressed…

But with no differences in cortisol response, what are we explaining anyway?

The Molecular Psychiatry article provides an excellent opportunity to learn to spot bad science. I encourage interested readers to map what is said in the article onto the chart at http://www.compoundchem.com/2014/04/02/a-rough-guide-to-spotting-bad-science/

This second installment of my two-part blog examines how the exaggerations and distortions of the article reverberate through a press release and then coverage in NIMH Director Thomas Insel’s personal blog.

The Stanford University press release headline is worthy of the trashy newspapers we find at supermarket checkouts:

Girls under stress age more rapidly, new Stanford study reveals

The press release says things that did not appear in the article, but echoes the distorted literature review of the article’s introduction in claiming well-established links between shortened telomeres and more frequent infections, chronic diseases, and death, links that just are not there.

The girls also had telomeres that were shorter by the equivalent of six years in adults. Telomeres are caps on the ends of chromosomes. Every time a cell divides the telomeres get a little shorter. Telomere length is like a biological clock corresponding to age. Telomeres also shorten as a result of exposure to stress. Scientists have uncovered links in adults between shorter telomeres and premature death, more frequent infections and chronic diseases.

From http://news.stanford.edu/news/2014/october/telomeres-depression-girls-10-28-2014.html

And the claim of “the equivalent of six years” comes from a direct quote obtained from senior author Professor Ian Gotlib.

“It’s the equivalent in adults of six years of biological aging,” Gotlib said, but “it’s not at all clear that that makes them 18, because no one has done this measurement in children.”

Dr. Gotlib seems confused himself about what he means by the 10- to 14-year-old girls having aged an additional six years. Does he really think that they are now 18? If so, in what way? What could he possibly mean? Do they look six years older than age-matched controls? That would be really strange if they did.

I hope he lets us know when he figures out what he was saying, but he shouldn’t have given the statement to the Stanford press officer unless he was clear about what he meant.

The press release noted that Dr. Gotlib had already moved on to intervention studies designed to prevent telomere shortening in these girls.

In other studies, Gotlib and his team are examining the effectiveness of stress reduction techniques for girls. Neurofeedback and attention bias training (redirecting attention toward the positive) seem promising. Other investigators are studying techniques based on mindfulness training.

That’s a move based on speculation, if not outright science fiction. Neurofeedback has some very preliminary evidence of effectiveness in treating current depression, but I would like to see evidence that it has any benefit for preventing depression in young persons who have never been depressed.

Gotlib’s claims play right into popular fantasies about rigging people up with some sort of apparatus that changes their brain. But everything changes the brain, even reading this blog post. I don’t think that reading this blog post has any less evidence for preventing later depression than neurofeedback does. Nonetheless, I’m hoping that my blogging implants a healthy dose of skepticism in readers’ brains, so that they are immunized against further confusion from exposure to such press releases. For an intelligent, consumer-oriented discussion of neurofeedback, see Christian Jarrett’s

Read this before paying $100s for neurofeedback therapy

Attention bias training is a curious choice. It is almost as trendy as neurofeedback, but would it work? We have the benefit of a systematic review and recent meta-analysis suggesting a lack of evidence for attention bias training in treating depression, and no evidence for preventing it. If it is ineffectual in treating depression, how could we possibly expect it to prevent depression? Evidence, please!

Let’s speculate about the implications if the authors found the cortisol differences between the daughters of the depressed mothers and daughters of controls that they had hypothesized but did not find. What then could have been done for these young girls? Note that the daughters of depressed mothers were chosen because they were functioning well, not currently depressed themselves. Just because they were different from the control girls would not necessarily indicate that any cortisol variables were in the abnormal range. Cortisol levels are not like blood pressure – we cannot specify a level below which cortisol levels have to be brought down for better health and functioning.

Note also that these daughters were selected on the basis of their mothers being depressed, and that could mean the daughters themselves were facing a difficult situation. We can’t make the mother-bashing assumption that their mothers’ depression was inflicting stress on them. Maybe any psychobiological stress response that was evident was due to the circumstances that led to the depression of their mothers. We don’t know enough to specify what levels of cortisol variables would be optimal and consistent with good coping with the situation; we cannot even specify what is normal. And we don’t know how the daughters would recover from any abnormalities without formal treatment when their circumstances changed.

The bottom line is that these investigators did not get the results they hypothesized. Even if they had, the results would not necessarily lead to clinical applications.

Nonetheless, the director of NIMH saw fit to single this paper out or maybe he was just picking up on the press release.

Thomas Insel’s Personal Blog: Depression, Daughters, and Telomeres.

Thomas Insel’s Director’s Blog starts by acknowledging that there are no genetic or imaging markers predicting risk for depression, but research by Stanford Psychology Professor Ian Gotlib and colleagues in Molecular Psychiatry is “worth watching.”

Insel describes Gotlib’s “longitudinal” research as following depressed mothers’ early adolescent daughters.

The young girls have not yet developed depression, but 60 percent will become depressed by the age of 18.

I can find no basis in the article for Insel’s claim that Gotlib has found that 60 percent of these girls will be depressed by age 18. The estimate seems exaggerated, particularly given the case mix of the mothers of these girls. It appears that some or most of the mothers were drawn from the community. We cannot expect the severe course and biological correlates of depression that we would expect from an inpatient sample.

Searching the papers coming out of this lab, I could only find one study involving a 30 month follow-up of 22 daughters of depressed mothers in the same age range as the sample in the Molecular Psychiatry article. That’s hardly a basis for the strong claim of 60% becoming depressed by 18.

Insel embellishes the importance of the differences in telomere length. He perpetuates the illusion that we can be confident that differences in telomere length mean these girls are experiencing accelerated aging and will be at high risk for disease when they reach middle and late life. Without the backing of data from the paper or the existing literature, Insel zeroes in on a

Troubling early sign of risk for premature biological aging and possibly age-related chronic diseases, such as cardiovascular disease. Investigating the cause and timing of decreased telomere length—to what extent it may result from abnormalities in stress responses or is genetically influenced, for example—will be important for understanding the relationship between cellular aging, depression, and other medical conditions.

Insel ponders how such young, healthy girls could possibly show signs of aging. According to him the answer is not clear, but it might be tied to the increased stress reactivity these girls show in performing laboratory tasks.

But as Professor Carroll noted, the study just does not show much evidence of “increased stress reactivity.”

Nonetheless, Insel indicates that Gotlib’s next step is

Using neurofeedback to help these girls retrain their brain circuits and hopefully their stress responses. It will be a few years before we will know how much this intervention reduces risk for depression, but anything that prevents or slows the telomere shortening may be an early indication of success.

It’s interesting that Insel sidestepped the claim in the press release that Gotlib was trying out a cognitive behavioral intervention to affect stress reactivity. Instead he presents a fanciful notion that neurofeedback will somehow retrain these girls’ brain circuits, reduce their stress response throughout their time at home, and prevent them from getting depressed by their mothers’ depression.

Oh, if that were only so: Insel would be vindicated in requiring, as a condition of funding, that researchers get down to basic mechanisms and simply bypass existing diagnoses, which have limited reliability but at least some ties to patients’ verbal reports of why they are seeking treatment. In his world of science fiction, patients, or at least these young girls, come in to have their brains retrained to forestall the telomere shortening that threatens them not only with becoming depressed later, but with chronic diseases in middle and late life and early death.

So, let’s retrace what was said in the original Molecular Psychiatry article to what was claimed in the Stanford University press release and what was disseminated in Dr. Insel’s personal blog. Authors spin bad science in a peer-reviewed article. They collaborate with their university’s press relations department by providing even more exaggerated claims. And Dr. Insel’s purpose is served by simply passing them on in social media.

There’s a lot in Dr. Insel’s Personal Blog to disappoint and even outrage

  • Researchers seeking guidance for funding priorities.
  • Clinicians in the trenches needing to do something now to deal with the symptoms and simple misery that are being presented to them.
  • Consumers looking for guidance from the Director of NIMH as to whether they should be concerned about their daughters and what they should do about it.

A lot of bad science and science fiction is being served to back up false promises about anything likely to occur in our lifetimes, if ever.

Taxpayers need to appreciate where Dr. Insel is taking the funding of mental health research. He will no longer fund grants that explore different psychotherapeutic strategies for common mental health problems as they are currently understood – you know, diagnoses tied to what patients complain about. Instead he is offering a futuristic vision in which we no longer have to pay for primary care physicians or mental health clinicians spending time talking to patients about the problems in their lives. Rather, patients can bring in a saliva sample to assess their telomere length. They can then be rigged up to a videogame providing a social stress challenge. They will then be given neurofeedback and asked to provide another saliva sample. If the cortisol levels aren’t where they are supposed to be, they will come back and get some more neurofeedback and videogames.

But wait! We don’t even need to wait until people develop problems in their lives. We can start collecting spit samples when they are preteens and head off any problems developing in their life with neural feedback.

Presumably all this could be done by technicians who don’t need to be taught communication skills. And if the technicians are having problems, we can collect spit samples from them and maybe give them some neurofeedback.

Sure, mild to moderate depression in the community is a large and mixed grouping. The diagnostic category major depression loses some of its already limited reliability and validity when applied to this level of severity. But I still have a lot more confidence in this diagnosis than in relying on some unproven notions about treating telomere length and cortisol parameters in people who do not currently complain about their mental health or their circumstances. And lamer still is the notion that this can be done without any empathy or understanding.

It’s instructive to compare what Insel says in this blog post to what he recently said in another post.

He acknowledged some of the serious barriers to the development of valid, clinically useful biomarkers:

Patients with mental disorders show many biological abnormalities which distinguish them from normal volunteers; however, few of these have led to tests with clinical utility. Several reasons contribute to this delay: lack of a biological ‘gold standard’ definition of psychiatric illnesses; a profusion of statistically significant, but minimally differentiating, biological findings;‘approximate replications’ of these findings in a way that neither confirms nor refutes them; and a focus on comparing prototypical patients to healthy controls which generates differentiations with limited clinical applicability. Overcoming these hurdles will require a new approach. Rather than seek biomedical tests that can ‘diagnose’ DSM-defined disorders, the field should focus on identifying biologically homogenous subtypes that cut across phenotypic diagnosis—thereby sidestepping the issue of a gold standard.

All but the last sentence could have been part of a negative review of the Molecular Psychiatry article or the grant that provided funding for it. But the last sentence is the kind of nonsense that a director of NIMH can lay on the research community and expect to see reflected in their grant applications.

But just what was the theme of this other blog post from Dr. Insel? P-hacking and the crisis concerning results of biomedical research not being consistently reproducible.

The relentless quest for a significant “P” value is only one of the many problems with data analysis that could contribute to the reproducibility problem. Many mistakenly believe that “P” values convey information about the size of the difference between two groups. P values are actually only a way of estimating the likelihood that the difference you observe could have occurred by chance. In science, “significance” usually means a P value of less than 0.05 or 1 in 20, but this does not mean that the difference observed between two groups is functionally important. Perhaps the biggest problem is the tendency for scientists to report data that have been heavily processed rather than showing or explaining the details. This suggests one of the solutions for P-hacking and other problems in data analysis: provide the details, including what comparisons were planned prior to running the experiment.

Maybe because Insel is Director of NIMH, he doesn’t expect anybody to call him on the contradictions in what he is requesting. In the p-hacking blog post, he endorsed a call to action to address the problem of a lot of federal money being wasted on research that can’t lead to improvements in the health and well-being of the population because the research is simply unreliable and depends on “heavily processed” data for which investigators don’t provide the details. Yet in the Depression, Daughters, and Telomeres post he grabs an outrageous example of this being done and tells the research community he wants to see more of it.
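
The arithmetic behind that call to action is worth making concrete. Here is a back-of-the-envelope sketch (hypothetical numbers, nothing from the study itself) of how quickly the chance of at least one “significant” finding grows when several outcomes are each tested at p < .05 and any hit gets reported:

```python
# Family-wise false-positive rate: if k independent outcomes are each
# tested at alpha = .05 under the null, the chance that at least one
# comes up "significant" is 1 - (1 - alpha)^k.
alpha = 0.05

for k in (1, 3, 5, 10):
    family_rate = 1 - (1 - alpha) ** k
    print(f"{k:2d} outcomes tested -> P(at least one p < .05) = {family_rate:.1%}")
```

With five outcomes, the chance of a spurious hit is already over 20 percent, which is why reporting the comparisons planned before the experiment matters so much.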

 


NIMH Biomarker Porn: Depression, Daughters, and Telomeres Part 1

Does having to cope with their mother’s depression REALLY inflict irreversible damage on daughters’ psychobiology and shorten their lives?

Telomere

A recent BMJ article revived discussion of responsibility for hyped and distorted coverage of scientific work in the media. The usual suspects, self-promoting researchers, are passed over, and their universities’ press releases are implicated instead.

But university press releases are not distributed without authors’ approval.  Exaggerated statements in press releases are often direct quotes from authors. And don’t forget the churnaling journalists and bloggers who uncritically pass on press releases without getting second opinions.  Gary Schwitzer remarked:

Don’t let news-release-copying journalists off the hook so easily. It’s journalism, not stenography.

In this two-part blog post, I’ll document this process of amplification of the distortion of science from article to press release to subsequent coverage. In the first installment, I’ll provide a walkthrough commentary and critique of a flawed small study of telomere length among daughters of depressed women published in the prestigious Nature Publishing Group journal, Molecular Psychiatry. In the second, I will compare the article and press release to media coverage, specifically the personal blog of NIMH Director Thomas Insel.

I warn the squeamish that I will whack some bad science and outrageous assumptions with demands for evidence and pelt the study, its press release, and Insel’s interpretation with contradictory evidence.

I’m devoting a two-part blog to this effort. Bad science with misogynist, mother bashing assumptions is being touted by the  Director of NIMH as an example to be followed. When he speaks, others pay attention because he sets funding priorities. Okay, Dr. Insel, we will listen up, but we will do so skeptically.

A paper that shares an author with the Molecular Psychiatry paper was criticized by Daniel Engber for delivering

A mishmash of suspect stats and overbroad conclusions, marshaled to advance a theory that’s both unsupported by the data and somewhat at odds with existing research in the field.

The criticism applies to this paper as well.

But first, we need to understand some things about telomere length…

What is a Telomere?

Telomeres are caps on the ends of every chromosome. They protect the chromosome from losing important genes or sticking to other chromosomes. They become shorter every time the cell divides.

I have assembled some resources in an issue of Science-Based Medicine:

Skeptic’s Guide to Debunking Claims about Telomeres in the Scientific and Pseudoscientific Literature

As I say in that blog, there are many exaggerated and outright pseudoscientific claims about telomere length as a measure of “cellular aging” and therefore how long we’re going to live.

I explain the concepts of biomarker and surrogate endpoint, which are needed to understand the current fuss about telomeres. I show why the evidence is against routinely accepting telomere length as a biomarker or surrogate endpoint for accelerated aging and other health outcomes.

I note

  • A recent article in American Journal of Public Health claimed that drinking 20 ounces of carbonated (but not noncarbonated) sugar-sweetened drinks was associated with shortened telomere length “equivalent to an approximately 4.6 additional years of aging.” So, the effect of drinking soda on life expectancy is supposedly equivalent to what we know about smoking’s effect.
  • Rubbish. Just ignore the telomere length data and directly compare the effects of drinking 20 ounces of soda to the effects of smoking on life expectancy. There is no equivalence. The authors confused differences in what they thought was a biomarker with differences in health outcomes and relied on some dubious statistics. The American Journal of Public Health soda study was appropriately skewered in a wonderful Slate article, which I strongly recommend.
  • Claims are made for telomere length as a marker for effects of chronic stress and risk of chronic disease. Telomere length has a large genetic component and is correlated with age. When appropriate controls are introduced, correlations among telomere length, stress, and health outcomes tend to disappear or are sharply reduced.
  • A 30-year birth cohort study did not find an association between exposure to stress and telomere length.
  • Articles from a small group of investigators claim findings about telomere lengths that do not typically get reproduced in larger, more transparently reported studies by independent groups. This group of investigators tends to have or have had conflicts of interest in marketing of telomere diagnostic services, as well as promotion of herbal products to slow or reverse the shortening of telomere length.
  • Generally speaking, reproducible findings concerning telomere length require large samples with well-defined phenotypes, i.e., individuals having well-defined clinical presentations of particular characteristics, and we can expect associations to be small.

Based on what I have learned about the literature concerning telomere length, I would suggest

  • Beware of small studies claiming strong associations between telomere length and characteristics other than age, race, and gender.
  • Beware of studies claiming differences in telomere length arising in cross-sectional research or in the short term if they are not reproduced in longitudinal, prospective studies.
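
To see why small studies claiming strong associations should raise eyebrows, consider the approximate sample size needed to detect a correlation with 80% power, using the standard Fisher z approximation. This is my own illustrative sketch; the function name and numbers are not from any of the papers discussed:

```python
import math

def n_for_correlation(r, z_alpha=1.959964, z_beta=0.841621):
    """Approximate N to detect correlation r with two-sided alpha = .05
    and power = .80, via the Fisher z transformation."""
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

for r in (0.1, 0.2, 0.3):
    print(f"r = {r}: need roughly N = {n_for_correlation(r)}")
```

An association of r = .10, the size we can expect for reproducible telomere findings, needs on the order of 800 participants. A study of 90 girls can reliably detect only effects far larger than the literature supports, so any “hit” it reports deserves suspicion.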

A walk-through commentary and critique of the actual article

Gotlib, I. H., LeMoult, J., Colich, N. L., Foland-Ross, L. C., Hallmayer, J., Joormann, J., … & Wolkowitz, O. M. (2014). Telomere length and cortisol reactivity in children of depressed mothers. Molecular Psychiatry.

Molecular Psychiatry is a pay-walled journal, but a downloadable version of the article is available here.

Conflict of Interest Statement

The authors report no conflict of interest. However, in the soda article published December 2014, one of the authors of the present paper, Jun Lin disclosed being a shareholder in Telomere Diagnostics, Inc., a telomere measurement company. Links at my previous blog post take you to “Telomeres and Your Health: Get the Facts” at the website of that company. You find claims that herbal products based on traditional Chinese medicine can reduce the shortening of telomeres.

Jun Lin has a record of outrageous claims. For instance, another article claimed that normal women whose minds wander may be losing four years of life, based on the association between self-reported mind wandering and telomere length. So, if we pit this claim against what is known about the effects of smoking on life expectancy, women can extend their lives almost as much by paying better attention as by quitting smoking.

Hmm, I don’t know if we have undeclared conflict of interest here, but we certainly have a credibility problem.

The Abstract

Past research shows distorted and exaggerated media portrayals of studies are often already evident in abstracts of journal articles. Authors engage in a lot of cherry picking and spin results to strengthen the case their work is innovative and significant.

The opening sentence of the abstract to this article is a mashup of wild claims about telomere length in depression and risk for physical illnesses. But I will leave commenting until we reach the introduction, where the identical statement appears with elaboration and a single reference to one of the author’s work.

The abstract goes on to state

Both MDD and telomere length have been associated independently with high levels of stress, implicating dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and anomalous levels of cortisol secretion in this relation.

When I showed this to a pioneer in the study of the HPA axis, he remarked:

If you can find coherence in this from the Abstract you are smarter than I am…The phrase dysregulation of the HPA axis has been used to support more hand waving than substance.

The abstract ends with

This study is the first to demonstrate that children at familial risk of developing MDD are characterized by accelerated biological aging, operationalized as shortened telomere length, before they had experienced an onset of depression; this may predispose them to develop not only MDD but also other age-related medical illnesses. It is critical, therefore, that we attempt to identify and distinguish genetic and environmental mechanisms that contribute to telomere shortening.

This breathless editorializing about the urgency of pursuing this line of research is not tied to the actual methods and results of the study. “Accelerated biological aging” and “predispose them to develop… other age-related medical illnesses” are not summaries of the findings of the study, but only dubious assumptions.

Actually, the evidence for telomere length as a biomarker for aging is equivocal and does not meet American Federation of Aging Research criteria.  A large scale prospective study did not find that telomere length predicted onset of diabetes or cardiovascular disease.

And wait until we examine whether the study had reproducible results concerning either shorter telomeres and depression or telomeres being related to cortisol reactivity.

The introduction

The 6-paragraph introduction packs in a lot of questionable assumptions backed by a highly selective citation of the literature.

A growing body of research demonstrates that individuals diagnosed with major depressive disorder (MDD) are characterized by shortened telomere length, which has been posited to underlie the association between depression and increased rates of medical illness, including cardiovascular disease, diabetes, metabolic syndrome, osteoporosis and dementia (see Wolkowitz et al.1 for a review).

Really? A study co-authored by Wolkowitz and cited later in the introduction actually concluded

telomere shortening does not antedate depression and is not an intrinsic feature. Rather, telomere shortening may progress in proportion to lifetime depression exposure.

“Exposure” = personal experience being depressed. This would seem to undercut the rationale for examining telomere shortening in young girls who have not yet become depressed.

But more importantly, neither the Molecular Psychiatry article nor the Wolkowitz review acknowledges the weakness of evidence for

  • Depression being characterized by shortened telomere length.
  • The association of depression and medical illness in older persons representing a causal role for depression that can be modified by prevention or treatment of depression in young people.
  • Telomere length observed in the young underlying any association between depression and medical illnesses when they get old.

Wolkowitz’s “review” is a narrative, nonsystematic review. The article assumes at the outset that depression represents “accelerated aging” and offers a highly selective consideration of the available literature.

In neither it nor the Molecular Psychiatry article are we told

  • Some large scale studies with well-defined phenotypes fail to find associations between telomeres and depressive disorder or depressive symptoms. One large-scale study co-authored by Wolkowitz found associations between depression and telomere length too weak to be detected in the present small sample. Any apparent association may well be spurious.
  • The American Heart Association does not consider depression as a (causal) risk factor for cardiovascular disease, but as a risk marker because of a lack of the evidence needed to meet formal criteria for causality. Depression after a heart attack predicts another heart attack. However, our JAMA systematic review revealed a lack of evidence that screening cardiac patients for depression and offering treatment reduces their likelihood of having another heart attack or improves their survival. An updated review confirmed our conclusions.
  • The association between recent depressive symptoms and subsequent dementia is evident at very low levels of symptoms, suggesting that it reflects residual confounding and reverse causation of depressive symptoms with other risk factors, including poor health and functioning. I published a commentary in British Medical Journal that criticized a claim that we should begin intervening for even low symptoms of depression in order to prevent dementia. I suggested that we would be treating a confound and that it would be unlikely to make a difference in outcomes.

I could go on. Depression causally linked to diabetes via differences in telomere length? Causing osteoporosis? You gotta be kidding. I demand quality evidence. The burden of evidence is on anyone who makes such wild claims.

Sure, there is lots of evidence that if people have been depressed in the past, they are more likely to get depressed again when they have a chronic illness. And their episodes of depression will last longer.

In general, there are associations between depression and the onset and outcome of chronic illness. But the simple, unadjusted association is typically seen at low levels of symptoms, and it increases with age, accumulation of other risk factors, and other physical co-morbidities. People who are older, already showing signs of illness, or who have poor health-related behaviors tend to get sicker and die. Statistical control for these factors reduces or eliminates the apparent association of depressive symptoms with illness outcomes. So, we are probably not dealing with depression per se. If you are interested in further discussion, see my slide presentation

Negative emotion and health: why do we keep stalking bears, when we only find scat in the woods?

I explain risk factors (like bears) versus risk markers (like scat) and why shooting scat does not eliminate the health risk posed by bears.

I doubt that many people familiar with the literature believe that associations among telomeres and depression, depression and the onset of chronic illness, and telomeres and chronic illness are such that a case could be made for telomere length in young girls being importantly related to physical disease in their mid and late life. This is science fiction being falsely presented as evidence-based.

The authors of the Molecular Psychiatry paper are similarly unreliable when discussing “dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and anomalous levels of cortisol secretion.” You would think that they are referring to established biomarkers for risk of depression. Actually, most biological correlates of depression are modest, nonspecific to depression, and state, not trait-related – limited to when people are actually depressed.

Consider what a meta-analysis of depression and cortisol secretion actually found:

MDD and ND [nondepressed] individuals exhibited similar baseline and stress cortisol levels, but MDD patients had much higher cortisol levels during the recovery period than their ND counterparts.

We did not find the expected main effects of maternal depression on children’s cortisol  reactivity.

  • They misrepresent a directly relevant study that examined cortisol secretion in the saliva of adolescents as a predictor of subsequent development of depression. It actually found that no baseline cortisol measure predicted development of depression except the cortisol awakening response.

In general, cortisol secretion is more related to stress than to clinical depression. One study concluded

The hypothalamic—pituitary—adrenal axis is sensitive to social stress but does not mediate vulnerability to depression.

What is most outrageous about the introduction, however, is the specification of the pathway between having a depressed mother and shortened telomere length:

The chronic exposure of these children to this stress as a function of living with mothers who have experienced recurrent episodes of depression could represent a mechanism of accelerated biologic aging, operationalized as having shorter telomere length.

Recognize the argument that is being set up: having to deal with the mother’s depression is a chronic stressor for the daughters, which sets up irreversible processes before the daughters even become depressed themselves, leading to accelerated aging, chronic illness, and early death. We can ignore all the characteristics, including common social factors, that the daughters share with their mothers and that might be the source of any of the daughters’ problems.

This article is a dream paper for the lawyers of men seeking custody of their children in a divorce: “Your honor, sole custody for my client is the children’s only hope, if it is not already too late. His wife’s depression is irreversibly damaging the children, causing later sickness and early death. I introduce into evidence an article by Ian Gotlib that was endorsed by the Director of the National Institute of Mental Health…

Geraldine Downey and I warned about this trap in a classic review, Children of Depressed Parents, cited 2300 times according to Google Scholar and still going strong. We noted that depressed mothers and their children share a lot of uncharted biological, psychological, and environmental factors. But we also found that among the strongest risk factors for maternal depression are marital conflict, other life events generated by the marriage and husband, and a lack of marital support. These same factors could contribute to any problems in the children. Actually, the husband could be a source of child problems. Ignoring these possibilities constitutes a “consistent, if unintentional, ‘mother-bashing’ in the literature.”

The authors have asked readers to buy into a reductionist delusion. They assume some biological factors in depression are so clearly established that they can serve as biomarkers.  The transmission of any risk for depression associated with having a depressed mother is by way of irreversible damage to telomeres. We can forget about any other complex social and psychological processes going on, except that the mothers’ depression is stressing the daughters and we can single out a couple of biological variables to examine this.

Methods

The Methods section lacks the basic details necessary to evaluate the appropriateness of what was done and the conclusions drawn from any results. Nonetheless, there is good reason to believe that we are dealing with a poorly selected sample of daughters from poorly selected mothers.

We’re not told much about the mothers except that they have experienced recurrent depression during the childhood of the daughters. We have to look to other papers coming out of this research group to discover how these mothers were probably identified. What we see is that they are a mixed group, in part drawn from outpatient settings and in part from advertisements in the community.

Recall that identification of biological factors associated with depression requires well-defined phenotypes. The optimal group to study would be patients with severe depression. We know that depression is highly heterogeneous and that “depressed” people in the community who are not in specialty treatment are likely to just barely meet criteria. We are dealing with milder disorder that is less likely to be characterized by any of the biological features of more severe disorder. Social factors likely play more of a role in their misery. In many countries, medication would not be the first line of treatment.

Depression is a chronic, remitting, recurrent disorder with varying degrees of severity of overall course and in particular episodes. It has its onset in adolescence or early adulthood. By the time women have daughters who are 10 to 14 years old, they are likely to have had multiple episodes. But in a sample selected from the community, these episodes may have been mild and not necessarily treated, nor even noticeable by the daughters. The bottom line is we should not be too impressed with the label “recurrent depression” without better documentation of the length, severity, and associated impairment of functioning.

Presumably the depressed mothers in the study were selected because they were currently depressed. That makes it difficult to separate out enduring factors in the mothers and their social context from those that are tied to the women currently being depressed. And because we know that most biological factors associated with depression are state dependent, we may be getting a picture of the biology of these women – and of their daughters, for that matter – that is skewed relative to what we would see at other times.

Basically, we are dealing with a poorly selected sample of daughters from a poorly selected sample of mothers with depression. The authors are not telling us crucial details that we need to understand any results they get. Apparently they are not measuring relevant variables, and they have too small a sample to apply statistical controls anyway. As I said about another small study making claims for a blood test for depression, these authors are

Looking for love biomarkers in all the wrong places.

Recall that I also said that results from small samples like this one often conflict with results from epidemiologic studies with larger samples and better defined phenotypes. I think we can see the reasons why developing here. The small sample consists only of daughters who have a depressed mother but who have not yet become depressed themselves and have low scores on a child depression checklist. Just how representative is the sample? What proportion of daughters this age of depressed women would meet these criteria? How are they similar to or different from daughters who have already become depressed? Do the differences lie in their mothers or in the daughters or both? We can’t address any of these questions, but they are highly relevant. That’s why we need more large clinical epidemiologic studies and fewer small studies of poorly defined samples. Who knows what selection biases are operating?

Searching the literature for what this lab group was doing in other studies in terms of mother and daughter recruitment, I came across a number of small studies of various psychological and psychobiological characteristics of the daughters. We have no idea whether the samples are overlapping or distinct. We have no idea about how the results of these other modest studies confirm or contradict results of the present one. But integrating their results with the results of the present study could have been a start in better understanding it.

As noted in my post at Science Based Medicine, the methods section of the Molecular Psychiatry article gives us a sense of the unreliability of single assessments of telomeres. Read the description of the assay of telomere length in the article to see that the authors had to rely on multiple measurements, precisely because any single assessment is unreliable. Look at the paragraph beginning

To control for interassay variability…

This description reflects more general problems in the comparability of assessments of telomeres across individuals, samples, and laboratories, problems that preclude recommending telomere length as a biomarker or surrogate outcome with any precision.
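
A rough way to see why assay unreliability matters: classical measurement theory says an observed correlation is the true correlation shrunk by the square root of the product of the two measures’ reliabilities. The numbers below are hypothetical, chosen only to illustrate the size of the shrinkage:

```python
import math

def attenuated_r(true_r, rel_x, rel_y):
    # Classical attenuation formula: r_observed = r_true * sqrt(rel_x * rel_y)
    return true_r * math.sqrt(rel_x * rel_y)

# Even a genuine r = .30 shrinks to about .19 when each measure's
# reliability is only in the .60-.70 range.
print(round(attenuated_r(0.30, 0.60, 0.70), 3))
```

Unreliable assays thus stack the deck against reproducible findings in a sample of 90, on top of all the other problems noted above.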

Results and Interpretation

As in the methods, the authors fail to supply basic details of the results and leave us having to trust them. There is a striking lack of simple descriptive statistics and bivariate relations, i.e., simple correlations. But we can see signs of unruly, difficult to tame data and spun statistics. And in the end, there are real doubts that there is any connection in these data between telomeres and cortisol.

The authors report a significant difference in telomere length between the daughters of depressed women and the daughters in the control group. Given how the data had to be preprocessed, I would really like to see a scatter plot and examine the effects of outliers before coming to a firm conclusion. With only 50 daughters of depressed mothers and 40 controls, differences could have arisen from the influence of one or two outliers.

We are told that the two groups of young girls did not differ in Tanner scores, i.e., self-reported signs of puberty. If the daughters of depressed women had indeed endured “accelerated aging,” wouldn’t it be reflected in Tanner scores? The authors, and for that matter Insel, seem to take quite literally this accelerated aging thing.

I think we have another seemingly large difference coming from a small sample – one that, given past findings, is statistically improbable. I could be convinced by these data of group differences in telomere length, but only if the findings were replicated in an independent, adequately sized sample. And I still would not know what to make of them.

The authors fuss about anticipating a “dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and anomalous levels of cortisol secretion.” They indicate that the cortisol data were highly skewed and had to be tamed by winsorizing, i.e., substituting arbitrary values for outliers. We are not told for how many subjects this was done or from which group they came. The authors then engaged in some fancy multivariate statistics, “a piecewise linear growth model to fit the quadratic nature of the [winsorized] data.” We need to keep in mind that multilevel modeling is not a magic wand to transform messy data. Rather, it involves assumptions that need to be tested, not assumed. We get no evidence of the assumptions being tested, and the sample size is such that they could not be reliably tested.
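
For readers unfamiliar with the term, winsorizing replaces extreme values with the nearest retained value rather than dropping them. A minimal sketch with invented cortisol numbers shows both what it does and why it is a researcher degree of freedom (how many values to clip, and from which tail):

```python
from statistics import mean

def winsorize(values, k=1):
    """Clip the k smallest and k largest values to the nearest
    remaining value (simple symmetric winsorization)."""
    s = sorted(values)
    lo, hi = s[k], s[-k - 1]
    return [min(max(v, lo), hi) for v in values]

cortisol = [3.1, 3.3, 3.4, 3.6, 3.8, 19.5]  # one wild value
print(round(mean(cortisol), 2))              # mean dominated by the outlier
print(round(mean(winsorize(cortisol)), 2))   # mean after clipping both tails
```

Whether and how much to clip is an analytic choice, which is why the authors owe readers a report of how many values were winsorized and in which group.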

The authors found no differences in baseline cortisol secretion. Moreover, they found no differences in stress recovery for telomere length, group (depressed versus nondepressed mother), or the group by telomere interaction. For cortisol reactivity, they found no effect for group or the group by telomere interaction, but they did find a just-significant (p< .042) main effect for telomere length. This would not seem to offer much support for a dysregulation of the HPA axis or anomalous levels of cortisol secretion associated with group membership (having a depressed versus nondepressed mother). If we are guided by the meta-analysis of depression and cortisol secretion, the authors should have obtained a group difference in recovery, which they didn’t. I really doubt this is reproducible in a larger, independent sample, with transparently reported statistics.

Recognize what we have here: prestigious journals like Molecular Psychiatry have a strong publication bias in requiring statistical significance. Authors therefore must chase and obtain statistical significance. There is a minuscule difference between p<.042 and p<.06 – or p<.07, for that matter – particularly in the context of multivariate statistics being applied to skewed and winsorized data. The difference is well within the error of messy measurements. Yet if the authors had obtained p<.06 or p<.07, we probably wouldn’t get to read their story, at least in Molecular Psychiatry.*
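
To put numbers on how little separates these p-values: on a normal approximation, p = .042 and p = .06 correspond to test statistics roughly 0.15 apart, far less than the noise in messy, winsorized data. A quick standard-library check (an illustration, not a reanalysis of the study):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

# Two-sided critical values implied by each p-value.
z_042 = z(1 - 0.042 / 2)
z_06 = z(1 - 0.06 / 2)
print(round(z_042, 2), round(z_06, 2), round(z_042 - z_06, 2))
```

A gap that small in the test statistic can easily be the product of one winsorizing decision or one outlier, yet it decides whether a paper gets published.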

Stay tuned for my next installment in which I compare results of this study to the press release and coverage in Insel’s personal blog.  I particularly welcome feedback before then.

*For a discussion of whether “The number of p-values in the psychology literature that barely meet the criterion for statistical significance (i.e., that fall just below .05) is unusually large,” see Masicampo and LaLande (2012) and Lakens (2015).