Study: Switching from antidepressants to mindfulness meditation increases relapse

  • A well-designed recent study found that patients with depression in remission who switched from maintenance antidepressants to mindfulness meditation, without continuing medication, had an increase in relapses.
  • The study is better designed and more transparently reported than a recent British study, but will get none of the British study’s attention.
  • The well-orchestrated promotion of mindfulness raises issues about the lack of checks and balances between investigators’ vested interest, supposedly independent evaluation, and the making of policy.

The study

Huijbers MJ, Spinhoven P, Spijker J, Ruhé HG, van Schaik DJ, van Oppen P, Nolen WA, Ormel J, Kuyken W, van der Wilt GJ, Blom MB. Discontinuation of antidepressant medication after mindfulness-based cognitive therapy for recurrent depression: randomised controlled non-inferiority trial. The British Journal of Psychiatry. 2016 Feb 18 [Epub ahead of print].

The study is currently behind a paywall and does not appear to have a press release. Neither factor will help it get the attention it deserves.

But the protocol for the study is available here.

Huijbers MJ, Spijker J, Donders AR, van Schaik DJ, van Oppen P, Ruhé HG, Blom MB, Nolen WA, Ormel J, van der Wilt GJ, Kuyken W. Preventing relapse in recurrent depression using mindfulness-based cognitive therapy, antidepressant medication or the combination: trial design and protocol of the MOMENT study. BMC Psychiatry. 2012 Aug 27;12(1):1.

And the trial registration is here.

Mindfulness Based Cognitive Therapy and Antidepressant Medication in Recurrent Depression. ClinicalTrials.gov: NCT00928980

The abstract

Background

Mindfulness-based cognitive therapy (MBCT) and maintenance antidepressant medication (mADM) both reduce the risk of relapse in recurrent depression, but their combination has not been studied.

Aims

To investigate whether MBCT with discontinuation of mADM is non-inferior to MBCT+mADM.

Method

A multicentre randomised controlled non-inferiority trial (ClinicalTrials.gov: NCT00928980). Adults with recurrent depression in remission, using mADM for 6 months or longer (n = 249), were randomly allocated to either discontinue (n = 128) or continue (n = 121) mADM after MBCT. The primary outcome was depressive relapse/recurrence within 15 months. A confidence interval approach with a margin of 25% was used to test non-inferiority. Key secondary outcomes were time to relapse/recurrence and depression severity.

Results

The difference in relapse/recurrence rates exceeded the non-inferiority margin and time to relapse/recurrence was significantly shorter after discontinuation of mADM. There were only minor differences in depression severity.

Conclusions

Our findings suggest an increased risk of relapse/recurrence in patients withdrawing from mADM after MBCT.

Translation?


A comment by Deborah Apthorp suggested that the original title Switching from antidepressants to mindfulness meditation increases relapse was incorrect. Checking it, I realized that the abstract of the article was confusing, but the study did indeed show that mindfulness alone led to more relapses than continued medication plus mindfulness.

Here is what is said in the actual introduction to the article:

The main aim of this multicentre, noninferiority effectiveness trial was to examine whether patients who receive MBCT for recurrent depression in remission could safely withdraw from mADM, i.e. without increased relapse/recurrence risk, compared with the combination of these interventions. Patients were randomly allocated to MBCT followed by discontinuation of mADM or MBCT+mADM. The study had a follow-up of 15 months. Our primary hypothesis was that discontinuing mADM after MBCT would be non-inferior, i.e. would not lead to an unacceptably higher risk of relapse/ recurrence, compared with the combination of MBCT+mADM.

Here is what is said in the discussion:

The findings of this effectiveness study reflect an increased risk of relapse/recurrence for patients withdrawing from mADM after having participated in MBCT for recurrent depression.

So, to be clear, the sequence was that patients were randomized either to MBCT without antidepressants or to MBCT with continued antidepressants. Patients were then followed up for 15 months. Patients who received MBCT without antidepressants had significantly more relapses/recurrences in the follow-up period than those who received MBCT with antidepressants.

The study addresses whether patients with remitted depression on maintenance antidepressants who were randomized to discontinue medication after mindfulness-based cognitive therapy (MBCT) have poorer outcomes than those randomized to remain on their antidepressants.

The study found that poorer outcomes – more relapses – were experienced by patients switching to MBCT alone versus those remaining on antidepressants plus MBCT.

Strengths of the study

The patients were carefully assessed with validated semi-structured interviews to verify that they had recurrent past depression, were in current remission, and were taking their antidepressants. This assessment is an advantage over past studies that depended on less reliable primary-care physicians’ records to ascertain eligibility. There’s ample evidence that primary-care physicians often do not make systematic assessments when deciding whether or not to keep patients on antidepressants.

The control group. The comparison/control group continued on antidepressants after they were assessed by a psychiatrist who made specific recommendations.

Power analysis. Calculation of sample size for this study was based on a noninferiority design. That means the investigators wanted to establish whether, within a particular margin (25%), switching to MBCT produced poorer outcomes.

A conventional clinical trial is designed to see if the null hypothesis of no difference between intervention and control groups can be rejected. As a noninferiority trial, this study instead tested whether shifting patients to MBCT would result in an unacceptable rise in relapses and recurrences, with the margin set at 25%. Noninferiority trials are explained here.
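To make the logic concrete, here is a minimal sketch of a noninferiority comparison with a 25% margin. The relapse counts are hypothetical numbers I made up for illustration (only the group sizes match the trial); the point is that non-inferiority is declared only if the upper bound of the confidence interval for the difference in relapse rates stays below the margin.

```python
from math import sqrt

def noninferiority_check(events_new, n_new, events_ctrl, n_ctrl, margin=0.25):
    """Wald 95% CI for the difference in relapse rates (new minus control).
    Non-inferiority holds only if the upper CI bound is below the margin."""
    p1, p2 = events_new / n_new, events_ctrl / n_ctrl
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n_new + p2 * (1 - p2) / n_ctrl)
    upper = diff + 1.96 * se
    return diff, upper, upper < margin

# Hypothetical relapse counts for illustration only:
diff, upper, noninferior = noninferiority_check(60, 128, 40, 121)
```

Note that even if the observed difference (here about 14 percentage points) is below the margin, the upper confidence bound can still cross it, and non-inferiority then cannot be claimed.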

Change in plans for the study

The protocol for the study originally proposed a more complex design. Patients would be randomized to one of three conditions: (1) continuing antidepressants alone; (2) continuing antidepressants, but with MBCT; or (3) MBCT alone. The problem the investigators encountered was that many patients had a strong preference and did not want to be randomized. So, they conducted two separate randomized trials.

This change in plans was appropriately noted in a modification in the trial registration.

The companion study examined whether adding MBCT to maintenance antidepressants reduced relapses. That study was published first:

Huijbers MJ, Spinhoven P, Spijker J, Ruhé HG, van Schaik DJ, van Oppen P, Nolen WA, Ormel J, Kuyken W, van der Wilt GJ, Blom MB. Adding mindfulness-based cognitive therapy to maintenance antidepressant medication for prevention of relapse/recurrence in major depressive disorder: Randomised controlled trial. Journal of Affective Disorders. 2015 Nov 15;187:54-61.

A copy can be obtained from this depository.

It was a smaller study – 35 patients randomized to antidepressants alone and 33 patients randomized to a combination of MBCT and continued antidepressants. There were no differences in relapse/recurrence over 15 months.

An important limitation on generalizability

The patients were recruited from university-based mental health settings. The minority of patients who move from treatment of depression in primary care to specialty mental health settings proportionately include more with moderate to severe depression and a more defined history of past depression. In contrast, patients being treated for depression in primary care include more whose depression is mild to moderate and whose current depression and past history have not been systematically assessed. There is evidence that primary-care physicians do not make diagnoses of depression based on a structured assessment. Many patients deemed depressed and in need of treatment will have milder depression and only meet the vaguer, less validated diagnosis of Depression Not Otherwise Specified.

Declaration of interest

The authors indicated no conflicts of interest to declare for either study.

Added February 29: This may be a true statement for the core Dutch researchers who led the conduct of the study. However, it is certainly not true for the British collaborator, who may have served as a consultant and got authorship as a result. He has extensive conflicts of interest and gains a lot personally and professionally from the promotion of mindfulness in the UK. Read on.

The previous British study in The Lancet

Kuyken W, Hayes R, Barrett B, Byng R, Dalgleish T, Kessler D, Lewis G, Watkins E, Brejcha C, Cardy J, Causley A. Effectiveness and cost-effectiveness of mindfulness-based cognitive therapy compared with maintenance antidepressant treatment in the prevention of depressive relapse or recurrence (PREVENT): a randomised controlled trial. The Lancet. 2015 Jul 10;386(9988):63-73.

I provided my extended critique of this study in a previous blog post:

Is mindfulness-based therapy ready for rollout to prevent relapse and recurrence in depression?

The study protocol claimed it was designed as a superiority trial, but the authors did not obtain the added sample size needed to demonstrate superiority. And they spun null findings, starting in their abstract:

However, when considered in the context of the totality of randomised controlled data, we found evidence from this trial to support MBCT-TS as an alternative to maintenance antidepressants for prevention of depressive relapse or recurrence at similar costs.

What is wrong here? They are discussing null findings as if they had conducted a noninferiority trial with sufficient power to rule out differences of a particular size. Lots of psychotherapy trials are underpowered, but null findings from them should not be used to declare that treatments can be substituted for each other.

Contrasting features of the previous study versus the present one

Spinning of null findings. According to the trial registration, the previous study was designed to show that MBCT was superior to maintenance antidepressant treatment in preventing relapse and recurrence. A superiority trial tests the hypothesis that an intervention is better than a control condition by a pre-set margin. For a very cool slideshow comparing superiority to noninferiority trials, see here.

Rather than demonstrating that MBCT was superior to routine care with maintenance antidepressant treatment, The Lancet study failed to find significant differences between the two conditions. In an amazing feat of spin, the authors took to publicizing this as a success – that MBCT was equivalent to maintenance antidepressants. Equivalence is a stricter criterion that requires more than null findings: any differences must fall within pre-set (registered) margins. Many null findings reflect low power to find significant differences, not equivalence.

Patient selection. Patients were recruited from primary care on the basis of records indicating they had been prescribed antidepressants two years ago. There was no ascertainment of whether the patients were currently adhering to the antidepressants or whether they were getting effective monitoring with feedback.

Poorly matched, nonequivalent comparison/control group. The guidelines that patients with recurrent depression should remain on antidepressants for two years were developed based on studies in tertiary care. It’s likely that many of these patients were never systematically assessed for the appropriateness of treatment with antidepressants, that follow-up was spotty, and that many patients were not even continuing to take their antidepressants with any regularity.

So, MBCT was being compared to an ill-defined, unknown condition in which some proportion of patients did not need to be taking antidepressants and were not taking them. This routine care also lacked the intensity, positive expectations, attention, and support of the MBCT condition. If an advantage for MBCT had been found – and it was not – it might only have meant that there was nothing specific about MBCT, only the benefit of providing nonspecific elements that were lacking in routine care.

The unknowns. There was no assessment of whether the patients actually practiced mindfulness, so there is further doubt that anything specific to MBCT was relevant. But then again, in the absence of any differences between groups, we may not have anything to explain.

  • Given we don’t know what proportion of patients were taking an adequate maintenance dose of antidepressants, we don’t know whether any further treatment was needed for them – or for what proportion.
  • We don’t know whether it would have been more cost-effective simply to have a depression care manager recontact patients and determine whether they were still taking their antidepressants and whether they were interested in a supervised tapering.
  • We’re not even given an answer as to what extent primary-care patients provided with MBCT actually practiced it.

A well-orchestrated publicity campaign to misrepresent the findings. Rather than offering an independent critical evaluation of The Lancet study, press coverage offered the investigators’ preferred spin. As I noted in a previous blog post:

The headline of a Guardian column written by one of the Lancet article’s first author’s colleagues at Oxford misleadingly proclaimed what the study showed.

And that misrepresentation was echoed in the Mental Health Foundation’s call for mindfulness to be offered through the UK National Health Service.


The Mental Health Foundation is offering a 10-session online course for £60 and is undoubtedly prepared for an expanded market.

Declaration of interests

WK [the first author] and AE are co-directors of the Mindfulness Network Community Interest Company and teach nationally and internationally on MBCT. The other authors declare no competing interests.

Like most declarations of conflicts of interest, this one alerts us to something we might be concerned about but does not adequately inform us.

We are not told, for instance, something the authors were likely to know: soon after all the hoopla about the study, the Oxford Mindfulness Centre, which is directed by the first author but not mentioned in the declaration of interest, publicized a massive effort funded by the Wellcome Trust to roll out its Mindfulness in the Schools project, which provides mindfulness training to children, teachers, and parents.

A recent headline in The Times (US & America section) says it all.


Confirmation bias in subsequent citing

It is generally understood that much of what we read in the scientific literature is false or exaggerated due to various Questionable Research Practices (QRPs) leading to confirmation bias in what is reported in the literature. But there is another kind of confirmation bias, associated with the creation of false authority through citation distortion. It’s well-documented that proponents of a particular view selectively cite papers according to whether the conclusions support their position. Not only are positive findings from original reports exaggerated as they progress through citations; negative findings receive less attention or are simply lost.

Huijbers et al. transparently reported that switching to MBCT leads to more relapses in patients who have recovered from depression. I confidently predict that these findings will be cited less often than the poorer-quality Lancet study, which was spun to create the appearance that MBCT had outcomes equivalent to remaining on antidepressants. I also predict that the Huijbers et al. MBCT study will often be misrepresented when it is cited.

Added February 29: For whatever reason, perhaps because he served as a consultant, the author of The Lancet study is also an author on this paper, which describes a study conducted entirely in the Netherlands. Note, however, that when it comes to the British Lancet study, this article cites it as replicating past work when it was a null trial. This is an example of creating false authority by distorted citation in action. I can’t judge whether the Dutch authors simply accepted the conclusions offered in the abstract and press coverage of The Lancet study, or whether The Lancet author influenced their interpretation of it.

I would be very curious whether, in his outpouring of subsequent papers on MBCT, the author of The Lancet paper cites this paper, and whether he cites it accurately. Skeptics, join me in watching.

What do I think is going on in the study?

I think it is apparent that the authors have selected a group of patients who have remitted from their depression, but who are at risk for relapse and recurrence if they go without treatment. With such chronic, recurring depression, there is evidence that psychotherapy adds little to medication, particularly when patients are showing a clinical response to the antidepressants. However, psychotherapy benefits from antidepressants being added.

But a final point is important – MBCT was never designed as a primary cognitive behavioral therapy for depression. It was intended as a means for patients to pay attention to cues suggesting they are sliding back into depression and to take appropriate action. It’s unfortunate that it has been oversold as something more than this.

 

Is risk of Alzheimer’s Disease reduced by taking a more positive attitude toward aging?

Unwarranted claims that “modifiable” negative beliefs cause Alzheimer’s disease lead to blaming persons who develop Alzheimer’s disease for not having been more positive.

Lesson: A source’s impressive credentials are no substitute for independent critical appraisal of what sounds like junk science and is.

More lessons on how to protect yourself from dodgy claims in press releases of prestigious universities promoting their research.

If you judge the credibility of health-related information based on the credentials of the source, this article is a clear winner:

Levy BR, Ferrucci L, Zonderman AB, Slade MD, Troncoso J, Resnick SM. A Culture–Brain Link: Negative Age Stereotypes Predict Alzheimer’s Disease Biomarkers. Psychology and Aging. Dec 7 , 2015, No Pagination Specified. http://dx.doi.org/10.1037/pag0000062


As noted in the press release from Yale University, two of the authors are from Yale School of Medicine, another is a neurologist at Johns Hopkins School of Medicine, and the remaining three authors are from the US National Institute on Aging (NIA), including NIA’s Scientific Director.

The press release Negative beliefs about aging predict Alzheimer’s disease in Yale-led study declared:

“Newly published research led by the Yale School of Public Health demonstrates that individuals who hold negative beliefs about aging are more likely to have brain changes associated with Alzheimer’s disease.

“The study suggests that combatting negative beliefs about aging, such as elderly people are decrepit, could potentially offer a way to reduce the rapidly rising rate of Alzheimer’s disease, a devastating neurodegenerative disorder that causes dementia in more than 5 million Americans.

The press release posited a novel mechanism:

“We believe it is the stress generated by the negative beliefs about aging that individuals sometimes internalize from society that can result in pathological brain changes,” said Levy. “Although the findings are concerning, it is encouraging to realize that these negative beliefs about aging can be mitigated and positive beliefs about aging can be reinforced, so that the adverse impact is not inevitable.”

A Google search reveals over 40 stories about the study in the media. Provocative titles of the media coverage suggest a children’s game of telephone or Chinese whispers in which distortions accumulate with each retelling.

Negative beliefs about aging tied to Alzheimer’s (Waltonian)

Distain for the elderly could increase your risk of Alzheimer’s (FinancialSpots)

Lack of respect for elderly may be fueling Alzheimer’s epidemic (Telegraph)

Negative thoughts speed up onset of Alzheimer’s disease (Tech Times)

Karma bites back: Hating on the elderly may put you at risk of Alzheimer’s (LA Times)

How you feel about your grandfather may affect your brain health later in life (Men’s Health News)

Young people pessimistic about aging more likely to develop Alzheimer’s later on (Health.com)

Looking forward to old age can save you from Alzheimer’s (Canonplace News)

If you don’t like old people, you are at higher risk of Alzheimer’s, study says (RedOrbit)

If you think elderly people are icky, you’re more likely to get Alzheimer’s (HealthLine)

In defense of the authors of this article as well as journalists, it is likely that editors added the provocative titles without obtaining approval of the authors or even the journalists writing the articles. So, let’s suspend judgment and write off sometimes absurd titles to editors’ need to establish they are offering distinctive coverage, when they are not necessarily doing so. That’s a lesson for the future: if we’re going to criticize media coverage, better focus on the content of the coverage, not the titles.

However, a number of these stories have direct quotes from the study’s first author. Unless the media coverage is misattributing direct quotes to her, she must have been making herself available to the media.

Was the article such an important breakthrough offering new ways in which consumers could take control of their risk of Alzheimer’s by changing beliefs about aging?

No, not at all. In the following analysis, I’ll show that judging the credibility of claims based on the credentials of the sources can be seriously misleading.

What is troubling about this article and its well-organized publicity effort is that information is being disseminated that is misleading and potentially harmful, with the prestige of Yale and NIA attached.

Before we go any further, you can take your own look at a copy of the article in the American Psychological Association journal Psychology and Aging here, the Yale University press release here, and a fascinating post-publication peer review at PubPeer that I initiated as peer 1.

Ask yourself: if you encountered coverage of this article in the media, would you have been skeptical? If so, what were the clues?

The article is yet another example of trusted authorities exploiting entrenched cultural beliefs that the mind-body connection can be harnessed in some mysterious way to combat or prevent physical illness. As Anne Harrington details in her wonderful book, The Cure Within, this psychosomatic hypothesis has a long and checkered history, and gets continually reinvented and misapplied.

We see an example of this in claims that attitude can conquer cancer. What’s the harm of such illusions? If people can be led to believe they have such control, they are set up for blame from themselves and from those around them when they fail to fend off and control the outcome of disease by sheer mental power.

The myth of “fighting spirit” overcoming cancer has survived despite the accumulation of excellent contradictory evidence. Cancer patients are vulnerable to blaming themselves, or being blamed by loved ones, when they do not “win” the fight against cancer. They are also subject to unfair exhortations to fight harder as their health situation deteriorates.

From the satirical Onion

What I saw when I skimmed the press release and the article

  • The first alarm went off when I saw that causal claims were being made from a modest sized correlational study. This should set off anyone’s alarms.
  • The press release and the discussion section of the article refer to this as a “first ever” study. One does not seek nor expect to find robust “first ever” discoveries in such a small data set.
  • The authors do not provide evidence that their key measure of “negative stereotypes” is a valid measure of either stereotyping or likelihood of experiencing stress. They don’t even show it is related to concurrent reports of stress.
  • Like a lot of measures with a negative tone to items, this one is affected by what Paul Meehl called the crud factor. Whatever is being measured in this study cannot be distinguished from a full range of confounds that are not even assessed in the study.
  • The mechanism by which effects of this self-report measure somehow get manifested in changes in the brain lacks evidence and is highly dubious.
  • There was no presentation of actual data or basic statistics. Instead, there were only multivariate statistics that require at least some access to basic statistics for independent evaluation.
  • The authors resorted to cheap statistical strategies that play to readers’ confirmation bias: reliance on one-tailed rather than two-tailed tests of significance; use of a discredited backward elimination method for choosing control variables; and exploration of too many control/covariate variables given their modest sample size.
  • The analyses that are reported do not accurately depict what is in the data set, nor generalize to other data sets.

The article

The authors develop their case that stress is a significant cause of Alzheimer’s disease with reference to some largely irrelevant studies by others, but depend on a preponderance of studies that they themselves have done with the same dubious small samples and dubious statistical techniques. Whether you do a casual search with Google scholar or a more systematic review of the literature, you won’t find stress processes of the kind the authors invoke among the usual explanations of the development of the disease.

Basically, the authors are arguing that if you hold views of aging like “Old people are absent-minded” or “Old people cannot concentrate well,” you will experience more stress as you age, and this will accelerate development of Alzheimer’s disease. They then go on to argue that because these attitudes are modifiable, you can take control of your risk for Alzheimer’s by adopting a more positive view of aging and aging people.

The authors used their measure of negative aging stereotypes in other studies, but do not provide the usual evidence of convergent and discriminant validity needed to establish that the measure assesses what is intended. Basically, we should expect authors to show that a measure they have developed is related in expected ways to existing measures (convergent validity), but unrelated to existing measures from which it should be distinct (discriminant validity).

Psychology has a long history of researchers claiming that their “new” self-report measures containing negatively toned items assess distinct concepts, despite high correlations with other measures of negative emotion as well as lots of confounds. I poked fun at this unproductive tradition in a presentation, Negative emotions and health: why do we keep stalking bears, when we only find scat in the woods?

The article reported two studies. The first tested whether participants holding more negative age stereotypes would have significantly greater loss of hippocampal volume over time. The study involved 52 individuals selected from a larger cohort enrolled in the brain-neuroimaging program of the Baltimore Longitudinal Study of Aging.

Readers are given none of the basic statistics that would be needed to interpret the complex multivariate analyses. Ideally, we would be given an opportunity to see how the independent variable, negative age stereotypes, is related to other data available on the subjects, and so we could get some sense if we are starting with some basic, meaningful associations.

Instead the authors present the association between negative age stereotyping and hippocampal volume only in the presence of multiple control variables:

Covariates consisted of demographics (i.e., age, sex, and education) and health at time of baseline-age-stereotype assessment, (number of chronic conditions on the basis of medical records; well-being as measured by a subset of the Chicago Attitude Inventory); self-rated health, neuroticism, and cognitive performance, measured by the Benton Visual Retention Test (BVRT; Benton, 1974).

Readers cannot tell why these variables and not others were chosen. Adding or dropping a few variables could produce radically different results. And there are just too many variables being considered: with only 52 research participants, spurious findings that do not generalize to other samples are highly likely.

I was astonished when the authors announced that they were relying on one-tailed statistical tests. This is widely condemned as unnecessary and misleading.

Basically, every time the authors report a significance level in this article, you need to double the number to get what would be obtained with a more conventional two-tailed test. So, if they proudly declare that results are significant at p = .046, the results are actually (non)significant at p = .092. I know, we should not make such a fuss about significance levels, but journals do. We’re being set up to be persuaded the results are significant when they are not by conventional standards.
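The doubling is easy to check for yourself. A small sketch, using a normal approximation to the t distribution (reasonable at these degrees of freedom) rather than the authors’ exact computation:

```python
from statistics import NormalDist

# One-tailed p for the t = 1.71 reported in the article, and its
# two-tailed counterpart, which is exactly double.
t_stat = 1.71
p_one = NormalDist().cdf(-t_stat)   # about .044: looks "significant"
p_two = 2 * p_one                   # about .087: no longer below .05
```

The one-tailed value squeaks under .05; doubling it for the conventional two-tailed test puts the same result comfortably on the nonsignificant side.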

So, the authors’ accumulating sins against proper statistical techniques and transparent reporting: no presentation of basic associations; reporting of one-tailed tests; use of multivariate statistics inappropriate for a sample that is so small. Now let’s add another one: in their multivariate regressions, the authors relied on a potentially deceptive backward elimination:

Backward elimination, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) that improves the model the most by being deleted, and repeating this process until no further improvement is possible.

The authors assembled their candidate control/covariate variables and used a procedure that checks them statistically and drops some from consideration based on whether they fail to add to the significance of the overall equation. This procedure is condemned because the variables that are retained in the equation capitalize on chance. Particular variables that could be theoretically relevant are eliminated simply because they fail to add anything statistically in the context of the other variables being considered. In the context of a different set of variables, these same discarded variables would have been retained.

The final regression equation had fewer control/covariates than when the authors started. Statistical significance is then calculated on the basis of the small number of variables remaining, not the number that were picked over, so results will artificially appear stronger. Again, this is potentially quite misleading to the unwary reader.

The authors nonetheless concluded:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had a significantly steeper decline in hippocampal volume

The second study:

examined whether participants holding more negative age stereotypes would have significantly greater accumulation of amyloid plaques and neurofibrillary tangles.

The outcome was a composite-plaques-and-tangles score and the predictor was the same negative age stereotypes measure from the first study. These measurements were obtained from 74 research participants upon death and autopsy. The same covariates were used in stepwise regression with backward elimination. Once again, the statistical test was one-tailed.

Results were:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had significantly higher composite-plaques-and-tangles scores, t(1,59) = 1.71 p = .046, d = 0.45, adjusting for age, sex, education, self-rated health, well-being, and number of chronic conditions.

Aha! Now we see why the authors committed themselves to a one-tailed test. With a conventional two-tailed test, these results would not be significant. Given a prevailing confirmation bias, aversion to null findings, and obsession with significance levels, this article probably would not have been published without the one-tailed test.
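The arithmetic is easy to check: the one-tailed p-value for the reported t(59) = 1.71 just clears .05, while the conventional two-tailed value does not.

```python
# Checking the reported statistic: t = 1.71 with 59 degrees of freedom.
from scipy.stats import t

t_value, df = 1.71, 59
p_one_tailed = t.sf(t_value, df)   # upper-tail probability
p_two_tailed = 2 * p_one_tailed

print(f"one-tailed p = {p_one_tailed:.3f}")   # ≈ .046, as reported
print(f"two-tailed p = {p_two_tailed:.3f}")   # ≈ .092, not significant
```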

The authors’ stirring overall conclusion from the two studies:

By expanding the boundaries of known environmental influences on amyloid plaques, neurofibrillary tangles, and hippocampal volume, our results suggest a new pathway to identifying mechanisms and potential interventions related to Alzheimer’s disease

PubPeer discussion of this paper [https://pubpeer.com/publications/16E68DE9879757585EDD8719338DCD]

Comments accumulated for a couple of days on PubPeer after I posted some concerns about the first study. All of the comments were quite smart; some directly validated points that I had been thinking about, while others took the discussion in new directions, either statistically or because the commentators knew more about neuroscience.

Using a mechanism available at PubPeer, I sent emails to the first author of the paper, the statistician, and one of the NIA personnel inviting them to make comments also. None have responded so far.

Tom Johnstone, a commentator who exercised the option of identifying himself, noted the reliance on inferential statistics in the absence of reporting basic relationships. He also noted that the criterion used to drop covariates was lax. Apparently familiar with neuroscience, he expressed doubts that the results had any clinical significance or relevance to the functioning of the research participants.

Another commentator complained of the small sample size, the use of one-tailed statistical tests without justification, the “convoluted list of covariates,” and the “taboo” strategy for selecting covariates to be retained in the regression equation. This commentator also noted that the authors had examined the effect of outliers, conducting analyses both with and without the inclusion of the most extreme case. While it didn’t affect the overall results, exclusion dramatically changed the significance level, highlighting the susceptibility of such a small sample to chance variation or sampling error.
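That susceptibility is worth seeing concretely. Here is an illustration with invented data (not a reanalysis of theirs): in a sample of 30, a single extreme case can carry a “significant” correlation that disappears when it is excluded.

```python
# Invented data: one extreme case in a small sample can drive "significance".
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=29)            # 29 unrelated observations
y = rng.normal(size=29)
x_out = np.append(x, 8.0)          # one extreme case, high on both measures
y_out = np.append(y, 8.0)

r_without, p_without = stats.pearsonr(x, y)
r_with, p_with = stats.pearsonr(x_out, y_out)

print(f"without the extreme case: r = {r_without:+.2f}, p = {p_without:.3f}")
print(f"with the extreme case:    r = {r_with:+.2f}, p = {p_with:.3f}")
```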

Who gets the blame for misleading claims in this article?

There’s a lot of blame to go around. By exaggerating the size and significance of any effects, the first author increases the chances of publication and of further funding to pursue what is seen as a “tantalizing” association. But it’s the job of editors and peer reviewers to protect the readership from such exaggerations, and maybe to protect the author from herself. They failed, maybe because exaggerated findings are consistent with the journal’s agenda of increasing citations by publishing newsworthy rather than trustworthy findings. The study statistician, Martin Slade, obviously knew that misleading, less than optimal statistics were used; why didn’t he object? Finally, I think the NIA staff, particularly Luigi Ferrucci, the Scientific Director of NIA, should be singled out for the irresponsibility of attaching their names to such misleading claims. Why did they do so? Did they not read the manuscript? I will regularly present instances of NIH staff endorsing dubious claims, such as here. The mind-over-disease, psychosomatic hypothesis gets a lot of support not warranted by the evidence. Perhaps NIH officials in general see this as a way of attracting research monies from Congress. Regardless, I think NIH officials have the responsibility to see that consumers are not misled by junk science.

This article at least provided the opportunity for an exercise that should raise skepticism and convince consumers at all levels – other researchers, clinicians, policymakers, those who suffer from Alzheimer’s disease, and those who care for them – that we just cannot sit back and let trusted sources do our thinking for us.

 

Should have seen it coming: A once high-flying Psychological Science article lies in pieces on the ground

Life is too short for wasting time probing every instance of professional organizations promoting bad science when they have an established record of doing just that.

There were lots of indicators that that’s what we were dealing with in the Association for Psychological Science’s (APS) recent campaign for the now discredited and retracted ‘sadness prevents us from seeing blue’ article.

A quick assessment of the press release should have led us to dismiss the claims being presented and convinced us to move on.

Readers can skip my introductory material by jumping down this blog post to [*] to see my analysis of the APS press release.

Readers can also still access the original press release, which has now disappeared from the web, here. Some may want to read the press release and form their own opinions before proceeding into this blog post.

What, I’ve stopped talking about the PACE trial? Yup, at least at Mind the Brain, for now. But you can go here for the latest in my continued discussion of the PACE trial of CBT for chronic fatigue syndrome, in which I moved from critical observer to activist a while ago.

Before we were so rudely interrupted by the bad science and bad media coverage of the PACE trial, I was focusing on how readers can learn to make quick assessments of hyped media coverage of dubious scientific studies.

In “Sex and the single amygdala”  I asked:

Can skeptics who are not specialists, but who are science-minded and have some basic skills, learn to quickly screen and detect questionable science in the journals and its media coverage?

The counter argument of course is Chris Mooney telling us “You Have No Business Challenging Scientific Experts”. He cites

“Jenny McCarthy, who once remarked that she began her autism research at the “University of Google.”

But while we are on the topic of autism, how about the counter example of The Lancet’s coverage of the link between vaccines and autism? This nonsense continues to take its toll on American children whose parents – often higher income and more educated than the rest – refused to vaccinate them on the basis of a story that started in The Lancet. Editor Richard Horton had to concede

[Screenshot: Richard Horton conceding The Lancet’s failure over the autism-vaccine paper]

If we accept Chris Mooney‘s position, we are left at the mercy of press releases cranked out by professional organizations like the Association for Psychological Science (APS) that repeatedly demand we revise our thinking about human nature and behavior, as well as change our behavior if we want to extend our lives and live happier, all on the basis of a single “breakthrough” study. Rarely do APS press releases have any follow-up as to the fate of a study they promoted. One has to hope that PubPeer or PubMed Commons picks up on the article touted in the press release and sees what a jury of post-publication peers decides.

As we have seen in my past Mind the Brain posts, there are constant demands on our attention from press releases generated from professional organizations, university press officers, and even NIH alerting us to supposed breakthroughs in psychological and brain science. Few such breakthroughs hold up over time.

Are there no alternatives?

Are there no alternatives to our simply deferring to the expertise being offered or taking the time to investigate for ourselves claims that are likely to prove exaggerated or simply false?

We should approach press releases from the APS – or from its rival, the American Psychological Association – using prior probabilities to set our expectations. The Open Science Collaboration: Psychology (OSC) article in Science presented results of a systematic attempt to replicate 100 findings from prestigious psychological journals, including APS’s Psychological Science and APA’s Journal of Personality and Social Psychology. Less than half of the findings were replicated. Findings from the APS and APA journals fared worse than the others.

So, our prior probabilities are that declarations of newsworthy, breakthrough findings trumpeted in press releases from psychological organizations are likely to be false or exaggerated – unless we assume that the publicity machines prefer the trustworthy over the exciting and newsworthy in the articles they select to promote.

I will guide readers through a quick assessment of the APS press release, which I started on this post before getting swept up in the PACE controversy. However, in the intervening time, there have been some extraordinary developments, which I will then briefly discuss. We can use these developments to validate my evaluation – and yours – of the press release available earlier. Surprisingly, there is little overlap between the issues I note in the press release and what concerned post-publication commentators.

*A running commentary based on screening the press release

What once was a link to the “feeling blue and seeing blue” article now takes one only to

[Screenshot: retraction press release]

Fortunately, the original press release can still be reached here. The original article is preserved here.

My skepticism was already high after I read the opening two paragraphs of the press release:

The world might seem a little grayer than usual when we’re down in the dumps and we often talk about “feeling blue” — new research suggests that the associations we make between emotion and color go beyond mere metaphor. The results of two studies indicate that feeling sadness may actually change how we perceive color. Specifically, researchers found that participants who were induced to feel sad were less accurate in identifying colors on the blue-yellow axis than those who were led to feel amused or emotionally neutral.

“Our results show that mood and emotion can affect how we see the world around us,” says psychology researcher Christopher Thorstenson of the University of Rochester, first author on the research. “Our work advances the study of perception by showing that sadness specifically impairs basic visual processes that are involved in perceiving color.”

What Anglocentric nonsense. First, blue as a metaphor for sad does not occur across most languages other than English and Serbian. In German, to call someone blue is suggesting the person is drunk. In Russian, you are suggesting that the person is gay. In Arabic, if you say you are having a blue day, it is a bad one. But if you say in Portuguese that “everything is blue”, it suggests everything is fine.

In Indian culture, blue is more associated with happiness than sadness, probably traceable to the blue-skinned Krishna being associated with divine and human love in Hinduism. In Catholicism, the Virgin Mary is often depicted wearing blue, and so the color has come to be associated with calmness and truth.

We are off to a bad start. Going to the authors’ description of their first of two studies, we learn:

In one study, the researchers had 127 undergraduate participants watch an emotional film clip and then complete a visual judgment task. The participants were randomly assigned to watch an animated film clip intended to induce sadness or a standup comedy clip intended to induce amusement. The emotional effects of the two clips had been validated in previous studies and the researchers confirmed that they produced the intended emotions for participants in this study.

Oh no! This is not a study of clinical depression, but another study of normal college students “made sad” with a mood induction.

So-called mood induction tasks don’t necessarily change actual mood state, but they do convey to research participants what is expected of them and how they are supposed to act. In one of the earliest studies I ever did, we described a mood induction procedure to subjects without actually having them experience it. We then asked them to respond as if they had received it. Their responses were indistinguishable from those of participants who actually underwent the induction. We concluded that we could not rule out that what were considered effects of a mood induction task were simply demand characteristics – what research participants perceive as instructions as to how they should behave.

It was fashionable way back then for psychology researchers who were isolated in departments that did not have access to clinically depressed patients to claim that they were nonetheless conducting analog studies of depression. Subjecting students to an unsolvable anagram task or uncontrollable loud noises was seen as inducing learned helplessness in them, thereby allowing investigators an analog study of depression. We demonstrated a problem with that idea. If students believed that the next task that they were administered was part of the same experiment, they performed poorly, as if they were in a state of learned helplessness or depression. However, if they believed that the second task was unrelated to the first, they would show no such deficits. Their negative state of helplessness or depression was confined to their performance in what they thought was the same setting in which the induction had occurred. Shortly after our experiments, Marty Seligman wisely stopped doing studies “inducing” learned helplessness in humans, but he continued to make the same claims about the studies he had done.

Analog studies of depression disappeared for a while, but I guess they have come back into fashion.

But the sad/blue experiment could also be seen as a priming experiment. The research participants were primed by the film clip, and their response to a color-naming task was then examined.

It is fascinating that neither the press release nor the article itself ever mentioned the word priming. It was only a few years ago that APS press releases were crowing about priming studies. For instance, a 2011 press release entitled “Life is one big priming experiment…” declared:

One of the most robust ideas to come out of cognitive psychology in recent years is priming. Scientists have shown again and again that they can very subtly cue people’s unconscious minds to think and act certain ways. These cues might be concepts—like cold or fast or elderly—or they might be goals like professional success; either way, these signals shape our behavior, often without any awareness that we are being manipulated.

Whoever wrote that press release should be embarrassed today. In the interim, priming effects have not proven robust. Priming studies that cannot be replicated have figured heavily in the assessment that the psychological literature is untrustworthy. Priming studies also figure heavily in the 56 retracted studies of fraudster psychologist Diederik Stapel. He claims that he turned to inventing data when his experiments failed to demonstrate priming effects that he knew were there. Yet, once he resorted to publishing studies with fabricated data, others claimed to replicate his work.

I made up research, and wrote papers about it. My peers and the journal editors cast a critical eye over it, and it was published. I would often discover, a few months or years later, that another team of researchers, in another city or another country, had done more or less the same experiment, and found the same effects.  My fantasy research had been replicated. What seemed logical was true, once I’d faked it.

So, we have an APS press release reporting a study that assumes that the association between sadness and the color blue is so hardwired and culturally universal that it is reflected in basic visual processes. Yet the study does not involve clinical depression, only an analog mood induction, and a closer look reveals that once again APS is pushing a priming study. I think it’s time to move on. But let’s read on:

The results cannot be explained by differences in participants’ level of effort, attention, or engagement with the task, as color perception was only impaired on the blue-yellow axis.

“We were surprised by how specific the effect was, that color was only impaired along the blue-yellow axis,” says Thorstenson. “We did not predict this specific finding, although it might give us a clue to the reason for the effect in neurotransmitter functioning.”

The researchers note that previous work has specifically linked color perception on the blue-yellow axis with the neurotransmitter dopamine.

The press release tells us that the finding is very specific, occurring only on the blue-yellow axis, not the red-green axis, and that differences are not found in level of effort, attention, or engagement with the task. The researchers did not expect such a specific finding; they were surprised.

The press release wants to convince us of an exciting story of novelty and breakthrough. A skeptic sees it differently: this is an isolated, unanticipated finding getting all dressed up. See, we should’ve moved on.

The press release wants to convince us that the evidence is exciting because it is specific and novel. The researchers are celebrating the specificity of their finding, but the blue-yellow axis finding may be the only statistically significant one precisely because it arose from chance or an artifact.

And bringing up unmeasured “neurotransmitter functioning” is pretentious and unwise. I challenge the researchers to show that the effects of watching a brief movie clip register in measurable changes in neurotransmitters. I’m skeptical even about whether depressed persons drawn from the community or outpatient samples reliably differ from non-depressed persons in measures of the neurotransmitter dopamine.

“This is new work and we need to take time to determine the robustness and generalizability of this phenomenon before making links to application,” he concludes.

Claims in APS press releases are not known for their “robustness and generalizability.” I don’t think this particular claim should prompt an effort at independent replication when scientists have so many more useful things to keep them busy.

Maybe, these investigators should have checked robustness and generalizability before rushing into print. Maybe APS should stop pestering us with findings that surprise researchers and that have not yet been replicated.

A flying machine in pieces on the ground

Sadness impairs color perception was sent soaring high, lifted by an APS press release now removed from the web but still available here. The press release was initially uncritically echoed – usually cut-and-paste or outright churnaled – in over two dozen media mentions.

But, alas, Sadness impairs color perception is now a flying machine in pieces on the ground 

Notice of the article’s problems seems to have started with some chatter among skeptically-minded individuals on Twitter, which led to comments at PubPeer, where the article was torn to pieces. What unfolded was a wonderful demonstration of crowdsourced post-publication peer review in action. Lesson: PubPeer rocks and can overcome the failures of pre-publication peer review to keep bad stuff out of the literature.

You can follow the thread of comments at PubPeer.

  • An anonymous skeptic started off by pointing out an apparent lack of a significant statistical effect where one was claimed.
  • There was an immediate call for a retraction, but it seemed premature.
  • Soon re-analyses of the data from the paper were being reported, confirming the lack of a significant statistical effect when analyses were done appropriately and reported transparently.
  • The data set for the article was mysteriously changed after it had been uploaded.
  • Doubts were expressed about the integrity of the data – had they been tinkered with?
  • The data disappeared.
  • There was an announcement of a retraction.

The retraction notice indicated that the researchers were still convinced of the validity of their hypothesis, despite deciding to retract their paper.

We remain confident in the proposition that sadness impairs color perception, but would like to acquire clearer evidence before making this conclusion in a journal the caliber of Psychological Science.

The retraction note also carries a curious Editor’s note:

Although I believe it is already clear, I would like to add an explicit statement that this retraction is entirely due to honest mistakes on the part of the authors.

Since then, doubts have been expressed about whether retraction was a sufficient response or whether something more is needed. Some of the participants in the PubPeer discussion drafted a letter to the editor incorporating their reanalyses and prepared to submit it to Psychological Science. Unfortunately, having succeeded in getting the bad science retracted, these authors reduced the likelihood of their reanalysis being accepted by Psychological Science. As of this date, their fascinating account remains unpublished but available on the web.

Postscript

Next time you see an APS or APA press release, what will be your starting probabilities about the trustworthiness of the article being promoted? Do you agree with Chris Mooney that you should simply defer to the expertise of the professional organization?

Why would professional organizations risk embarrassment with these kinds of press releases? Apparently they are worth the risk. Such press releases can echo through the conventional and social media and attract early attention to an article. The game is increasing the journal impact factor (JIF).

Although it is unclear precisely how journal impact factors are calculated, the number reflects the average number of citations an article obtains within two years of publication. However, if press releases promote “early releases” of articles, the journal can acquire citations before the clock starts ticking for the two years. APS and APA are in intense competition for the prestige of their journals and their membership. It matters greatly to them which organization can claim the most prestigious journals, as demonstrated by their JIFs.
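The two-year arithmetic itself is simple; what is opaque is which items and citations get counted. A toy calculation with invented numbers shows the mechanics, and why citations banked during an “early release” period are pure gain:

```python
# Toy arithmetic for a two-year journal impact factor, with invented counts.
# JIF for year Y = (citations in Y to items from Y-1 and Y-2)
#                  / (citable items published in Y-1 and Y-2)
citations_in_2016_to = {2014: 1200, 2015: 1800}   # hypothetical citation counts
citable_items = {2014: 150, 2015: 160}            # hypothetical article counts

jif_2016 = sum(citations_in_2016_to.values()) / sum(citable_items.values())
print(f"JIF 2016 = {jif_2016:.2f}")   # 3000 / 310 ≈ 9.68

# Citations accrued while an article sits in "early release" arrive before
# its formal publication year, padding the numerator of a later JIF.
```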

So, press releases are important for garnering early attention. Apparently breakthroughs, innovations, and “first ever” matter more than trustworthiness. The professional organizations hope we won’t remember the fate of past claims.

 

Promoting a positive psychology self-help book with a Wikipedia entry

This edition of Mind the Brain continues an odd and fascinating story of the aggressive promotion of a positive psychology self-help book. In this chapter, I tell how the promotion is being aided by the author’s son creating a laudatory Wikipedia entry.

 The story can simply be appreciated as amusing. Or it can be used to raise the consciousness of readers concerning just what is involved in the promotion of sciencey self-help books. The story could raise readers’ level of skepticism about what they might have previously seen as a spontaneous outpouring of enthusiasm for the launch of books.

 The story can also be used to raise questions about the blurry lines between science, self-promotion of persons who traffic in the label of being a scientist, and commercial profitability.

 Is the science behind positive psychology self-help books being shaped and even distorted in the way it appears in the peer-reviewed literature and social media in order to make books and other commercial products like workshops and training for coaches more profitable? Do we need more routine declarations of conflicts of interest in scientific publications of persons writing self-help books?

I wonder how many people have ever thought of inventing a term and having a Wikipedia entry written for it in order to appropriate – claim personal credit for – a cherry-picked literature. Having redefined the relevant scientific literature, such a clever person can then select and scrub the literature so that it shines brilliantly with positive findings, excluding a considerable amount of negative findings and work done by others. All in the service of promoting a self-help book. Clever or crass?

Staking a claim on a piece of the scientific literature as your own.

Appropriating an area of research under your new label, such as mental contrasting or grit, allows you to take charge of what studies to include as relevant and what to exclude. Others outside your laboratory who take your appropriation seriously will miss a potentially larger relevant literature when they attempt a search with standard electronic bibliographic sources like Google Scholar or Web of Science using the existing terms that are being replaced by the new one. They are not searching your concept, only the old one.

Naïve PhD students who are inspired to investigate the renamed, appropriated concept will need to cite the author’s work. Critics who are motivated to challenge the confirmatory bias included under the rubric of the new term will be faced with the objection that they did not actually investigate it, only an alternative topic for which they are trying to claim relevance.

Step 1: Appropriate the literature with a novel renaming of a corner of the scientific literature.

Step 2: Write a self-help book.

Step 3: Get your son to write an entry for Wikipedia promoting the concept. A loving son who will please his mom by citing her for 19 of the 20 citations included in the Wikipedia entry.

Some background.

I was persuaded by an extraordinary publicity campaign to purchase a self-help book, Rethinking Positive Thinking. With stories in prominent media outlets with titles like

 The Case Against Positive Thinking

I thought I was buying a long-overdue critique of positive psychology. Instead, the book represents a clever repackaging of the familiar wild claims of positive psychology gurus that life transformations await anyone doing their exercises. In the case of Rethinking Positive Thinking, the pitch is that positive fantasies are not enough, but that one needs only a simple and superficial consideration of the obstacles involved in achieving them and what could be done. Rather than any elaborate process of problem definition and consideration of coping options, the book calls for a swift application of a WOOP exercise (Wish, Outcome, Obstacle, Plan).

I quickly saw that WOOP is just a reheating of common old stuff in the self-help and clinical literature – like, for instance, the familiar Stop and Think of problem-solving therapy.

I read the book to the end on a long train ride, but from the outset I found that it was being misrepresented as evidence-based. Over a series of blog posts, I am exploring the book’s promotion and the bad science in which it is grounded.

Some of what is claimed as the science behind this book is not peer-reviewed. Readers have no opportunity to go to an outside source and decide for themselves whether claims are valid, bolstered in their confidence that the sources at least survived peer review. Some of what passes for the science behind the book likely predates the conception of the book and any deal with publishers. But some papers that are cited have a distinct quality of being experimercials concocted as part of creating a marketing advantage for the book as more sciencey than its competitors. We’ll come back to that in a later blog post.

The author of the book coined the term mental contrasting and the acronym WOOP to selectively appropriate and represent part of a larger literature concerning implementation intentions and positive fantasies. Relying on the author’s work alone, along with that of her husband, one would get the impression that together they have developed a whole literature that has produced results uniformly consistent with their theory and supportive of their self-help products.

Checking with Wikipedia

Only late in my investigation did I come across a Wikipedia entry for mental contrasting.

The Wikipedia entry prominently displays an exclamation point with a warning and a plea:

This entry contains content that is written like an advertisement. Please help improve it by removing promotional content and inappropriate external links, and by adding encyclopedic content written from a neutral point of view. (April 2015).

The entry stakes out the self-help book author’s claim of the invention:

Mental contrasting (MC) is a problem-solving strategy and motivational tool that leads to selective behavior modification.[1] It was introduced by psychologist Gabriele Oettingen in 2001.[2]

There are 20 references included for the entry. Nineteen are to the work of the author of the self-help book.

How the Wikipedia entry got there was a matter of mystery and speculation until it occurred to me to click on the View History link for the entry.

It revealed that the entry had been created by Anton Gollwitzer, described as a contributor who does not have a Wikipedia user page. He happens to have the same last name as the husband of the author of the self-help book. [*] Anton created his entry just at the time the self-help book was published.

Clicking on the talk link for him, we immediately come to a warning:

Speedy deletion of “Woop (Scientific Strategy)”

A page you created, Woop (Scientific Strategy), has been tagged for deletion, as it meets one or more of the criteria for speedy deletion; specifically, you removed all content from the page or otherwise requested its deletion.

You are welcome to contribute content which complies with our content policies and any applicable inclusion guidelines. However, please do not simply re-create the page with the same content. You may also wish to read our introduction to editing and guide to writing your first article.

Thank you. — Rrburke (talk) 17:55, 27 October 2014 (UTC)

This was followed by another entry:

Your contributed article, WOOP (scientific strategy)

which began:

Hello, I noticed that you recently created a new page, WOOP (scientific strategy). First, thank you for your contribution; Wikipedia relies solely on the efforts of volunteers such as you. Unfortunately, the page you created covers a topic on which we already have a page – Mental contrasting. Because of the duplication, your article has been tagged for speedy deletion. Please note that this is not a comment on you personally and we hope you will continue helping to improve Wikipedia. If the topic of the article you created is one that interests you, then perhaps you would like to help out at Mental contrasting – you might like to discuss new information at the article’s talk page.

It was then followed by another entry:

Managing a conflict of interest

That began:

Hello, AntonGollwitzer. We welcome your contributions to Wikipedia, but if you are affiliated with some of the people, places or things you have written about on Wikipedia, you may have a conflict of interest or close connection to the subject.

All editors are required to comply with Wikipedia’s neutral point of view content policy. People who are very close to a subject often have a distorted view of it, which may cause them to inadvertently edit in ways that make the article either too flattering or too disparaging. People with a close connection to a subject are not absolutely prohibited from editing about that subject, but they need to be especially careful about ensuring their edits are verified by reliable sources and writing with as little bias as possible.

If you are very close to a subject, here are some ways you can reduce the risk of problems:

Avoid or exercise great caution when editing or creating articles related to you, your organization, or its competitors, as well as projects and products they are involved with.

Avoid linking to the Wikipedia article or website of your organization in other articles (see Wikipedia:Spam).

Exercise great caution so that you do not accidentally breach Wikipedia’s content policies.

This is getting more embarrassing. And then comes another entry:

Nomination of WOOP (scientific strategy) for deletion

A discussion is taking place as to whether the article WOOP (scientific strategy) is suitable for inclusion in Wikipedia according to Wikipedia’s policies and guidelines or whether it should be deleted.

The article will be discussed at Wikipedia: Articles for deletion/WOOP (scientific strategy) until a consensus is reached, and anyone is welcome to contribute to the discussion. The nomination will explain the policies and guidelines which are of concern. The discussion focuses on high-quality evidence and our policies and guidelines.

Users may edit the article during the discussion, including to improve the article to address concerns raised in the discussion. However, do not remove the article-for-deletion notice from the top of the article. DGG ( talk ) 04:11, 29 March 2015 (UTC)

I can’t wait to see where all this is going. But is anyone else offended by this misuse of Wikipedia?

NOTE

*I was wrapping up this blog post when I did a Google Scholar search that I should have done earlier. I found that when I entered the names Anton Gollwitzer and Gabriele Oettingen, the first citation was

Gollwitzer, A., Oettingen, G., Kirby, T. A., Duckworth, A. L., & Mayer, D. (2011). Mental contrasting facilitates academic performance in school children. Motivation and Emotion, 35(4), 403-412.

Angela Duckworth provided a wildly enthusiastic endorsement of the book.

“I was once asked by educators to identify the single most effective intervention for improving self-control. Every scientist I spoke to referred me to the work summarized here – masterfully and with incomparable insight and warmth. Read this brilliant book and then go out and do what Gabriele Oettingen recommends. It will change the way you think about making your dreams come true.”

Duckworth has her own contract for a self-help book. Like Oettingen, she appropriated an existing literature under her own term, grit. Maybe Oettingen will return the favor of Duckworth’s endorsement and do the same for her. What a wonderful mutual admiration society the positive psychology community is.

Do positive fantasies prevent dieters from losing weight?

Want to WOOP yourself into amazing shape and fulfill your wildest dreams? Then get the self-help book being promoted through the Association for Psychological Science or the British Psychological Society Division of Health Psychology… Well, not really: save your money.

In this issue of Mind the Brain, I discuss tracking claims about positive fantasies and weight loss, made for a self-help book promoted as science-based, back into the scientific literature. I locate the study from which the claims supposedly arose. I find no basis for the misleading and highly unrealistic claims. Rather than disseminating any science of positive psychology, the marketing effort for the book promotes unrealistic assumptions about what can be accomplished by dieters trying to lose weight. People who take these claims seriously can be demoralized by unrealistic expectations and encouraged to blame themselves when they can’t achieve what is presented as so simple. This promotion holds out the unwarranted promise that if people want to lose weight, they just need to buy this book and integrate its simple exercises into their everyday life. If they have failed in the past, they can now succeed. Dieters are being exploited and made to feel bad.

Rethinking Positive Thinking and an associated WOOP app were highlighted in a featured book signing at the annual convention of the Association for Psychological Science. If you missed that opportunity, you can still get to a site promoting the book through links in advertisements for the British Psychological Society Division of Health Psychology Annual Meeting.

The book/app package is organized around a simplistic idea. From the book’s preface:

Rethinking Positive Thinking presents scientific research suggesting that starry-eyed dreaming isn’t all it’s cracked up to be. The book then examines and documents the power of a deceptively simple task: juxtaposing our dreams with the obstacles that prevent their attainment. I delve into why such mental contrasting works, particularly via our nonconscious minds, and introduce a specific planning process that renders it even more effective. In the book’s last two chapters, applying the method of mental contrasting to three areas of personal change – becoming healthier, nurturing better relationships, and performing better at school and work – I offer advice on how to get started with the method in your own life. In particular, I present a four-step procedure based on mental contrasting called WOOP – Wish, Outcome, Obstacle, Plan – that is easy to learn, easy to apply to short- and long-term wishes, and is scientifically shown to help you become more energized and directed.

From the outset, the author tries to convince us that positive thinking or, more precisely, positive fantasies by themselves lead to negative outcomes. The research that is cited is almost entirely the author’s own and often consists of contrived laboratory studies with weak findings. A large body of null and contradictory findings from others is shoved aside. This is not about translating scientific findings into practical life strategies, it’s about selling a self-help product as more sciencey than the rest. Buyers beware.

Like me, you probably figure from everyday life experience that positive fantasies are rather harmless.(*) Asked the question, “Are positive fantasies good or bad, helpful or destructive?” we would probably answer “It depends.” By themselves, positive fantasies can have little or no effect, and when they do, effects can be positive or negative.

Certainly we don’t want to get caught up in unrealistic fantasies, but who succumbs to them? Maybe you do, if you have been taken in by Chicken Soup for the Soul or a Tony Robbins seminar and think that you can dream yourself to health and wealth. Of course, it helps to be realistic and have a workable plan, but we don’t need a self-help book to tell us that. This book provides very little useful advice about how we should cope with the obstacles we encounter.

Way back when I was in graduate school, there was a lot of excitement about using positive fantasies elicited from people as a way of predicting achievement motivation. Interest in the idea waned when it was shown that such assessments were generally unreliable. Any predictive value disappeared when IQ or productivity was taken into account. Keep that in mind as you read on: why should we think that fantasies elicited in contrived exercises should have much predictive value about things off in the future and subject to lots of other influences? Why would we presume that a fantasy elicited at the beginning of a weight loss program would predict what was actually lost a year later?

But the author is selling a book making a strong case that positive fantasies are destructive to achieving your goals. An impressive publicity campaign hit major media outlets with a mind-numbing repetition of the same message. You could find pretty much the same thing being said in the Wall Street Journal, USA Today, the New York Times Sunday Magazine, the New Yorker, The Guardian, the Atlantic, Psychology Today, Huff Post, etc., etc.

There were also impressive endorsements from celebrity positive psychology gurus. Like the media coverage, these endorsements had a certain sameness suggesting the endorsers were coached, if not outright provided with a script. Typical of these endorsements, Angela Duckworth, the author’s labmate and sometime co-author back in Seligman’s lab, gushed:

“I was once asked by educators to identify the single most effective intervention for improving self-control. Every scientist I spoke to referred me to the work summarized here – masterfully and with incomparable insight and warmth. Read this brilliant book and then go out and do what Gabriele Oettingen recommends. It will change the way you think about making your dreams come true.”

I wanted to track some wild claims in the book and promotion back into the scientific literature and see if they held up. A recurring claim about weight loss triggered my skepticism.

In USA Today: Positive thinking? It’s not enough to reach your goals

ROSY VISIONS CAN BACKFIRE

One of Oettingen’s earliest studies showed that positive thinking alone can backfire when it comes to losing weight. In that study, women in a one-year weight loss program who had the most positive fantasies about future slimness lost an average of 24 pounds less than women with less rosy visions.

In the Wall Street Journal: The Case Against Positive Thinking

In one of Dr. Oettingen’s studies, obese participants who fantasized about successfully losing weight lost 24 pounds less than those who refrained from doing so.

A difference of 24 pounds in a weight loss program is huge. Consult a 2015 meta-analysis of weight loss in self-help programs. You will see that at six months, participants in weight loss programs are typically better off than those in the control condition by only 1.85 kg, or about 4.1 pounds. At 12 months, any benefit of being in the self-help program has disappeared.

Another meta-analysis evaluated commercial weight-loss programs like Weight Watchers, Jenny Craig, and Nutrisystem. Available evidence was limited and of poor quality, plagued by short follow-up periods of generally less than a year, high dropout rates, and the evaluation of outcome not being blinded to which participants had been assigned to the active weight loss program or a control condition.

Nonetheless, the review suggested that at 12 months, Weight Watchers achieved 2.6% more weight loss than education/control comparison treatments. Jenny Craig had 4.9% greater weight loss. Nutrisystem did 3.4% better than education/control groups. These figures are a long way from a difference of 24 pounds.

Harriet Brown on Obesity

In an exceptionally evidence-based recent Slate article, Harriet Brown argued that it was time to stop telling fat people to become thin. Even when dieters lose weight in the short term, 97% of them regain everything they lost within three years. The article criticizes studies evaluating weight loss programs because they typically have too short a follow-up period.

I was also skeptical about the disadvantages the author of the self-help book attached to the positive fantasies that dieters have. Most participants in weight loss programs have unrealistic fantasies about how much weight they will lose. But the fantasies do not strongly predict the modest amounts of weight they actually lose. So, there is no argument for targeting unrealistic expectations and fantasies if the intent is only to improve weight loss.

I started my search for the evidence behind the claims in press releases that women with positive fantasies lost 24 pounds less than women with less positive fantasies. Using the author’s name and “weight loss” in Google Scholar, I immediately came to the article to which I eventually tracked the claim.

Oettingen, G., & Wadden, T. A. (1991). Expectation, fantasy, and weight loss: Is the impact of positive thinking always positive?. Cognitive Therapy and Research, 15(2), 167-175.

But I couldn’t immediately see its relevance, and so I kept looking. I stumbled upon a non-peer-reviewed chapter by the author made available on the Internet.

The chapter cited the same 1991 weight reduction study with Tom Wadden at Penn. But the chapter made a claim that was not obvious in the original paper:

After one year, patients with high expectations lost about 12 kg more than subjects with negative fantasies. After two years, the respective differences were 15 and 12 kg. These patterns of results stayed unchanged when subjects’ weight loss aspirations, as well as subjective incentives to reach their desired weight loss, were covariates. The findings supported our assumption that optimistic expectations and positive fantasies are different types of optimistic thinking, and that they have differential effects on motivation and action. Apparently, images of getting slim and of resisting food temptations impeded weight loss. Subjects seemed to daydream that weight loss had occurred without their having to make any effort.

And then I stumbled upon a later peer-reviewed overview article that also reviewed the 1991 Oettingen and Wadden study. It converted the kilograms to pounds and elaborated:

Participants with positive expectations about losing weight (i.e., ‘‘It is likely that I will lose the indicated amount of weight’’) lost on average 26 pounds more than those with negative expectations (i.e., ‘‘It is unlikely that I will lose the indicated amount of weight’’). However, participants with positive fantasies (e.g., those who imagined shining when going out with the friend and easily resisting the temptation of the leftover box of doughnuts in the lunch room) lost on average 24 pounds less than participants with negative fantasies (e.g., those who imagined having disappointed the friend and having a hard time resisting the leftover box of doughnuts in the lunch room). In short, while positive expectations predicted successful weight loss, positive fantasies predicted little success in reaching one’s desired weight.
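As a sanity check, the overview’s pound figures can be converted back to kilograms; a minimal sketch (the conversion factor is standard, and mapping the 26- and 24-pound figures onto the chapter’s “about 12 kg” difference is my inference, not the authors’):

```python
LB_TO_KG = 0.453592  # pounds to kilograms

# The overview reports a 26-pound difference for expectations and a
# 24-pound difference for fantasies; convert to the chapter's kg scale.
for lb in (26, 24):
    print(f"{lb} lb = {lb * LB_TO_KG:.1f} kg")
# 26 lb is about 11.8 kg and 24 lb about 10.9 kg -- on the order of
# the "about 12 kg" difference quoted from the chapter above.
```

So the two secondary accounts at least agree with each other; whether either faithfully represents the 1991 data is the question taken up next.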

So I gave the 1991 paper a closer look.

The abstract stated

We investigated the impact of expectation and fantasy on the weight losses of 25 obese women participating in a behavioral weight reduction program. Both expectations of reaching one’s goal weight and spontaneous weight-related fantasies were measured at pretreatment before subjects began 1 year of weekly group-treatment. Consistent with our hypothesis that expectation and fantasy are different in quality, these variables predicted weight change in opposite directions. Optimistic expectations but negative fantasies favored weight loss. Subjects who displayed pessimistic expectations combined with positive fantasies had the poorest treatment outcome. Finally, expectation but not fantasy predicted program attendance. The effects of fantasy are discussed with regard to their potential impact on weight reduction therapy and the need for further studies of dieters’ spontaneous thoughts and images.

From the method section I learned

  • Subjects weighed an average of 106.4 kg, with a BMI of 39.1. They were recruited with advertisements seeking women at least 25 kg overweight.
  • 13 subjects were randomly assigned to a very low calorie diet and 12 were assigned to a balanced-deficit diet.

Such a small randomized trial can’t reliably give effect sizes for anything. At best, it can only suggest the feasibility of conducting such a trial on a larger scale. Weight-related fantasies were not manipulated, but they were measured:

Weight-Related Fantasy. Each subject was asked to vividly imagine herself as the main character in four hypothetical weight- and food-related scenarios. Two stories were designed to elicit fantasies about the subject’s weight loss, whereas two others described encounters with tempting foods. Each story led to an unspecified outcome that subjects were asked to complete (in writing) by describing the stream of thoughts and images that occurred to them. Care was taken to make the scenarios open ended in order to elicit a variety of responses. One of the scenarios is described below:

You’ve just completed Penn’s weight loss program. Tonight you have made plans to go out with an old friend whom you haven’t seen in about a year. As you wait for your friend to arrive, you imagine

Subjects rated the positivity, negativity, and intensity of their responses to each scenario, as well as their imagined body shape (using seven-point scales; 1 = low, 7 = high). After completing one scenario, they proceeded to the next. Scores were averaged across all four scenarios to form positivity, negativity, intensity, and body shape scales.

The study also assessed participants’ expectations of reaching their goal weight with three related questions:

(1) “How likely do you think it is that during this weight reduction program you’ll lose the amount of weight (that you have specified)?”; (2) “Do you feel that you will be successful in the weight loss program?”; and (3) “How confident are you that after this program is completed, you will have lost the amount of weight you indicated in question 1?” Questions were answered using 7-point scales (1 = low, 7 = high).

Results

The results suggested that this exceptionally strict, long-term weight reduction program yielded significant losses in both groups.

At weeks 17 and 52, weight losses for the very low calorie diet participants were 17.1 kg and 16.1 kg, respectively. Losses for the balanced-deficit diet participants were 11.1 kg and 14.8 kg.

But where does the extraordinary claim about fantasies get support? That is really not clear from anything presented.

Weight-related fantasy predicted weight loss at week 17 (r = -.34, p = .05) but not at week 52 (r = -.31, p = .09). But these numbers demonstrate the problem: with so few participants, a correlation of .34 can be significant while the trivially different .31 is not. Beam me up, Scotty, nothing interesting happening here.
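The knife-edge nature of these results is easy to verify from the usual t transformation of a correlation. A minimal sketch, assuming n = 25 and that the reported p values are one-tailed (the paper’s exact conventions are an assumption here):

```python
import math

def t_from_r(r, n):
    """t statistic for a Pearson correlation, with df = n - 2."""
    return abs(r) * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 25  # participants in the 1991 study
t_week17 = t_from_r(-0.34, n)  # reported p = .05
t_week52 = t_from_r(-0.31, n)  # reported p = .09

# The one-tailed critical value for alpha = .05 at df = 23 is about 1.714:
# one correlation barely clears it, the other barely misses it.
print(round(t_week17, 2), round(t_week52, 2))  # 1.73 1.56
```

A difference of .03 in r moves the result across the significance line only because the sample is tiny; with a larger n both correlations would land clearly on the same side.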

The authors then undertook multiple regression analyses that were inappropriate for a number of reasons. [Warning! Briefly getting technical ahead] First, with so few subjects, the equation was overfit with too many independent variables: initial weight, fantasy, and expectation were entered simultaneously in the first step, and the interaction between fantasy and expectation in the second. Weight at weeks 17 and 52 served as the dependent variables of the two analyses, respectively. The second issue is that, with expectations and fantasy correlated .45, entering both variables simultaneously would yield misleading results, likely different from what would be obtained if each were entered alone.
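Two back-of-the-envelope checks illustrate the problem (the 10-to-15-subjects-per-predictor rule of thumb and the variance inflation factor are standard regression heuristics, not figures from the paper):

```python
n = 25          # participants
predictors = 4  # initial weight, fantasy, expectation, fantasy x expectation

# Conventional guidance calls for roughly 10-15 subjects per predictor.
subjects_per_predictor = n / predictors
print(subjects_per_predictor)  # 6.25 -- well below the usual minimum

# Variance inflation factor for two predictors correlated at r = .45:
# each coefficient's sampling variance is inflated by about 25%,
# on top of the instability that comes from the tiny sample.
r = 0.45
vif = 1 / (1 - r ** 2)
print(round(vif, 2))  # 1.25
```

Neither figure is catastrophic on its own, but together, in a sample of 25, they mean the individual regression coefficients are far too unstable to support the claims built on them.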

In these dubious complex analyses, positive fantasies were not significant at 17 weeks, but were at 52 weeks. If anyone is still taking these analyses seriously, these are contradictory results. But who cares?

The authors then furthered their illusion by graphing the interaction effect, crossing fantasies with expectations. Think of it: they had only 25 patients, yet they graphed participants after creating three groups (low, medium, high) based on fantasy scores and then three groups (low, medium, high) based on expectations – nine cells averaging fewer than three participants each. We can’t take these results seriously.

So, I can find no basis in this study for the claim that women having positive fantasies lost 24 fewer pounds than those having less positive fantasies. There was no randomization with respect to fantasies and no results justifying such an astonishing claim. We’ve got numbers here, but not science, and no basis for claiming that this self-help book is more sciencey than its competitors. But citing numbers is impressive, particularly when it’s so hard to find out where they came from.


So we have the reality that most people who try to lose weight won’t succeed, and they certainly won’t succeed in keeping it off. At some point along the way, they probably fantasize about what it would be like to be slimmer. We should give them a break. Instead, the author gives them reason to feel bad about themselves by suggesting that somehow WOOP could have saved them – as if weight loss were that much under their control. And they can still save their dignity by buying this book. And if they don’t succeed in losing the weight, they just haven’t integrated the book’s exercises into their lives enough.

Okay, the author was capitalizing on a 1991 study that she’d probably completed long before she even thought about the book – dare I say, before she fantasized about the book? – and the idea of turning the article into a promotion for the book was not a good one.

But in a forthcoming blog, maybe not my next, I will critique another study that she published while she was working on the book. It serves as an experimercial in promoting the book. The study claims that a drop of only 1 mmHg in systolic blood pressure in women told to have positive fantasies about how they will look in high heels represents a serious sapping of energy that can be generalized to real-world situations. Yep, experimercial. I’m going to introduce a new and very useful term linking together the packaging and publication of weak studies when they serve the promotion of commercial products. That’s a lot more of what positive psychology is about than we recognize.

Finally, why are supposedly scientific organizations like the British Psychological Society Division of Health Psychology hawking a self-help book with a weak relationship to science that is likely to mislead consumers with its pitch?

Note

(*) British Psychological Society President-elect, Peter Kinderman says he is frightened by his positive fantasies of “winning Nobel prizes, winning Pulitzer prizes, being elected to this and that, being awarded knighthoods,” but he’s an odd bird.


Biomarker Porn: From Bad Science to Press Release to Praise by NIMH Director

Concluding installment of NIMH biomarker porn: Depression, daughters, and telomeres

Pioneer HPA-axis researcher Bernard “Barney” Carroll’s comment left no doubt about what he thought of the Molecular Psychiatry article I discussed in my last issue of Mind the Brain:

Where is the HPA axis dysregulation? It is mainly in the minds of the authors, in service of their desired narrative. Were basal cortisol levels increased? No. Were peak cortisol levels increased? They didn’t say. Was the cortisol increment increased? Only if we accept a p value of 0.042 with no correction for multiple comparisons. Most importantly, was the termination of the stress cortisol response impaired? No, it wasn’t (Table 3). That variable is a feature of allostasis, about which co-author Wolkowitz is well informed. Termination of the stress response is a crucial component of HPA axis regulation (see PubMed #18282566), and it was no different between the two groups. So, where’s the beef? The weakness of this report tells us not only about the authors’ standards but also about the level of editorial tradecraft on display in Molecular Psychiatry. [Hyperlink added]

You also can see my response to Professor Carroll in the comments.

I transferred another comment to the blog from my Facebook wall. It gave me an opportunity to elaborate on why

we shouldn’t depend on small convenience samples to attempt to understand phenomena that must be examined in larger samples followed prospectively.

I explained

There are lots of unanswered questions about the authors’ sampling of adolescents. We don’t know what they are like when their mothers are not depressed. The young girls could also simply be reacting to environmental conditions contributing to their mother’s depression, not to their mother’s depression per se. We don’t know how representative this convenience sample is of other daughters of depressed mothers. Is it unusual or common that daughters of this age are not depressed concurrently with their mothers’ depression? What factors about the daughters, the mothers, or their circumstances determine that the mother’s and daughter’s depression do not occur at the same time? What about differences with the daughters of mothers who are prone to depression, but are not currently depressed? We need to keep in mind that most biomarkers associated with depression are state dependent, not trait dependent. And these daughters were chosen because they are not depressed…

But with no differences in cortisol response, what are we explaining anyway?

The Molecular Psychiatry article provides an excellent opportunity to learn to spot bad science.

From http://www.compoundchem.com/2014/04/02/a-rough-guide-to-spotting-bad-science/

I encourage interested readers to map what is said in the article onto the chart at the right.

This second installment of my two-part blog examines how the exaggerations and distortions of the article reverberate through a press release and then coverage in NIMH Director Thomas Insel’s personal blog.

The Stanford University press release headline is worthy of the trashy newspapers we find at supermarket checkouts:

Girls under stress age more rapidly, new Stanford study reveals

The press release says things that didn’t appear in the article, but echoes the distorted literature review of the article’s introduction in claiming well-established links between shortened telomeres, frequent infections, chronic disease, and death that just are not there.

The girls also had telomeres that were shorter by the equivalent of six years in adults. Telomeres are caps on the ends of chromosomes. Every time a cell divides the telomeres get a little shorter. Telomere length is like a biological clock corresponding to age. Telomeres also shorten as a result of exposure to stress. Scientists have uncovered links in adults between shorter telomeres and premature death, more frequent infections and chronic diseases.

From http://news.stanford.edu/news/2014/october/telomeres-depression-girls-10-28-2014.html

And the claim of “the equivalent of six years” comes from a direct quote obtained from senior author Professor Ian Gotlib.

“It’s the equivalent in adults of six years of biological aging,” Gotlib said, but “it’s not at all clear that that makes them 18, because no one has done this measurement in children.”

Dr. Gotlib seems confused himself about what he means by the 10-to-14-year-old girls having aged an additional six years. Does he really think that they are now 18? If so, in what way? What could he possibly mean – do they look six years older than age-matched controls? That would be really strange if they did.

I hope he lets us know when he figures out what he was saying, but he shouldn’t have given the statement to the Stanford press officer unless he was clear about what he meant.

The press release noted that Dr. Gotlib had already moved on to intervention studies designed to prevent telomere shortening in these girls.

In other studies, Gotlib and his team are examining the effectiveness of stress reduction techniques for girls. Neurofeedback and attention bias training (redirecting attention toward the positive) seem promising. Other investigators are studying techniques based on mindfulness training.

That’s a move based on speculation, if not outright science fiction. Neurofeedback has some very preliminary evidence for effectiveness in treating current depression, but I would like to see evidence that it has any benefit for preventing depression in young persons who have never been depressed.

Gotlib’s claims play right into popular fantasies about rigging people up with some sort of apparatus that changes their brain. But everything changes the brain, even reading this blog post. I don’t think that reading this blog post has any less evidence for preventing later depression than neurofeedback does. Nonetheless, I’m hoping that my blogging implants a healthy dose of skepticism in readers’ brains, so that they are immunized against further confusion from exposure to such press releases. For an intelligent, consumer-oriented discussion of neurofeedback, see Christian Jarrett’s

Read this before paying $100s for neurofeedback therapy

Attention bias training is a curious choice. It is almost as trendy as neurofeedback, but would it work? We have the benefit of a systematic review and recent meta-analysis that suggests a lack of evidence for attention bias training in treating depression and no evidence for preventing it. If it’s ineffectual in treating depression, how could we possibly expect it to prevent depression? Evidence, please!

Let’s speculate about the implications if the authors had found the cortisol differences between the daughters of the depressed mothers and the daughters of controls that they hypothesized but did not find. What then could have been done for these young girls? Note that the daughters of depressed mothers were chosen because they were functioning well, not currently depressed themselves. Just because they differed from the control girls would not necessarily indicate that any cortisol variables were in the abnormal range. Cortisol levels are not like blood pressure – we cannot specify a threshold above which cortisol has to be brought down for better health and functioning.

Note also that these daughters were selected on the basis of their mothers being depressed, and that could mean the daughters themselves were facing a difficult situation. We can’t make the mother-bashing assumption that their mother’s depression was inflicting stress on them. Maybe any psychobiological stress response that was evident was due to the circumstances that led to their mother’s depression. We don’t know enough to specify what levels of cortisol variables would be optimal and consistent with good coping with the situation – we can’t even specify what is normal. And we don’t know how the daughters would recover from any abnormalities without formal treatment when their circumstances changed.

The bottom line is that these investigators did not get the results they hypothesized. Even if they had, the results would not necessarily lead to clinical applications.

Nonetheless, the director of NIMH saw fit to single this paper out – or maybe he was just picking up on the press release.

Thomas Insel’s Personal Blog: Depression, Daughters, and Telomeres.

Thomas Insel’s Director’s Blog starts by acknowledging that there are no genetic or imaging markers predicting risk for depression, but research by Stanford Psychology Professor Ian Gotlib and colleagues in Molecular Psychiatry is “worth watching.”

Insel describes Gotlib’s “longitudinal” research as following depressed mothers’ early adolescent daughters.

The young girls have not yet developed depression, but 60 percent will become depressed by the age of 18.

I can find no basis in the article for Insel’s claim that Gotlib has found that 60 percent of these girls will be depressed by age 18. The estimate seems exaggerated, particularly given the case mix of the mothers of these girls. It appears that some or most of the mothers were drawn from the community. We cannot expect the severe course and biological correlates of depression that we would expect from an inpatient sample.

Searching the papers coming out of this lab, I could find only one study involving a 30-month follow-up of 22 daughters of depressed mothers in the same age range as the sample in the Molecular Psychiatry article. That’s hardly a basis for the strong claim of 60% becoming depressed by age 18.

Insel embellishes the importance of differences in telomere length. He perpetuates the illusion that we can be confident that differences in telomere length show these girls were experiencing accelerated aging and will be at high risk for disease when they reach middle and late life. Without the backing of data from the paper or the existing literature, Insel zeroes in on a

Troubling early sign of risk for premature biological aging and possibly age-related chronic diseases, such as cardiovascular disease. Investigating the cause and timing of decreased telomere length—to what extent it may result from abnormalities in stress responses or is genetically influenced, for example—will be important for understanding the relationship between cellular aging, depression, and other medical conditions.

Insel ponders how such young, healthy girls could possibly show signs of aging. According to him the answer is not clear, but it might be tied to the increased stress reactivity these girls show in performing laboratory tasks.

But as Professor Carroll noted, the study just does not offer much evidence of “increased stress reactivity.”

Nonetheless, Insel indicates that Gotlib’s next step is

Using neurofeedback to help these girls retrain their brain circuits and hopefully their stress responses. It will be a few years before we will know how much this intervention reduces risk for depression, but anything that prevents or slows the telomere shortening may be an early indication of success.

It’s interesting that Insel sidestepped the claim in the press release that Gotlib was trying out a cognitive behavioral intervention to affect stress reactivity. Instead he presents a fanciful notion that neurofeedback will somehow retrain these girls’ brain circuits, reduce their stress response throughout their time at home, and prevent them from getting depressed by their mother’s depression.

Oh, if that were only so: Insel would be vindicated in requiring, as a condition of funding, that researchers get down to basic mechanisms and simply bypass existing diagnoses, which have limited reliability but at least some ties to patients’ verbal reports of why they are seeking treatment. In his world of science fiction, patients, or at least these young girls, would come in to have their brains retrained to forestall the telomere shortening that threatens them not only with becoming depressed later, but with chronic diseases in middle and late life and early death.

So, let’s retrace what was said in the original Molecular Psychiatry article to what was claimed in the Stanford University press release and what was disseminated in the social media of Dr. Insel’s personal blog. The authors spin bad science in a peer-reviewed article. They collaborate with their university’s press relations department by providing even more exaggerated claims. And Dr. Insel’s purpose is served by simply passing them on in social media.

There’s a lot in Dr. Insel’s Personal Blog to disappoint and even outrage

  • Researchers seeking guidance about funding priorities.
  • Clinicians in the trenches needing to do something now to deal with the symptoms and simple misery that are being presented to them.
  • Consumers looking for guidance from the Director of NIMH as to whether they should be concerned about their daughters and what they should do about it.

A lot of bad science and science fiction is being served to back up false promises about anything likely to occur in our lifetimes, if ever.

Taxpayers need to appreciate where Dr. Insel is taking the funding of mental health research. He will no longer fund grants exploring different psychotherapeutic strategies for common mental health problems as they are currently understood – you know, diagnoses tied to what patients complain about. Instead he is offering a futuristic vision in which we no longer have to pay for primary care physicians or mental health clinicians to spend time talking to patients about the problems in their lives. Rather, patients can bring in a saliva sample to assess their telomere length. They can then be rigged up to a videogame providing a social stress challenge. They will then be given neurofeedback and asked to provide another saliva sample. If the cortisol levels aren’t where they are supposed to be, they will come back for more neurofeedback and videogames.

But wait! We don’t even need to wait until people develop problems in their lives. We can start collecting spit samples when they are preteens and head off any problems developing in their lives with neurofeedback.

Presumably all this could be done by technicians who don’t need to be taught communication skills. And if the technicians are having problems, we can collect spit samples from them and maybe give them some neurofeedback.

Sure, mild to moderate depression in the community is a large and mixed grouping. The diagnostic category of major depression loses some of its already limited reliability and validity when applied at this level of severity. But I still have a lot more confidence in this diagnosis than in unproven notions about treating telomere length and cortisol parameters in people who do not currently complain about their mental health or their circumstances. And I have even less confidence in the lame notion that this can be done without any empathy or understanding.

It’s instructive to compare what Insel says in this blog post to what he recently said in another post.

He acknowledged some of the serious barriers to the development of valid, clinically useful biomarkers:

Patients with mental disorders show many biological abnormalities which distinguish them from normal volunteers; however, few of these have led to tests with clinical utility. Several reasons contribute to this delay: lack of a biological ‘gold standard’ definition of psychiatric illnesses; a profusion of statistically significant, but minimally differentiating, biological findings;‘approximate replications’ of these findings in a way that neither confirms nor refutes them; and a focus on comparing prototypical patients to healthy controls which generates differentiations with limited clinical applicability. Overcoming these hurdles will require a new approach. Rather than seek biomedical tests that can ‘diagnose’ DSM-defined disorders, the field should focus on identifying biologically homogenous subtypes that cut across phenotypic diagnosis—thereby sidestepping the issue of a gold standard.

All but the last sentence could have been part of a negative review of the Molecular Psychiatry article or of the grant that provided funding for it. But the last sentence is the kind of nonsense that a Director of NIMH can lay on the research community and expect to see reflected in grant applications.

But just what was the theme of this other blog post from Dr. Insel? P-hacking and the crisis concerning results of biomedical research not being consistently reproducible.

The relentless quest for a significant “P” value is only one of the many problems with data analysis that could contribute to the reproducibility problem. Many mistakenly believe that “P” values convey information about the size of the difference between two groups. P values are actually only a way of estimating the likelihood that the difference you observe could have occurred by chance. In science, “significance” usually means a P value of less than 0.05 or 1 in 20, but this does not mean that the difference observed between two groups is functionally important. Perhaps the biggest problem is the tendency for scientists to report data that have been heavily processed rather than showing or explaining the details. This suggests one of the solutions for P-hacking and other problems in data analysis: provide the details, including what comparisons were planned prior to running the experiment.
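The distinction Insel draws between a “significant” P value and a functionally important difference is easy to make concrete. Here is a minimal sketch (my illustration, not from either blog post; all numbers hypothetical): with a large enough sample, a true difference of a mere 0.03 standard deviations sails past the p < .05 threshold.

```python
import random
import statistics

random.seed(0)
n = 50_000  # a deliberately huge sample
a = [random.gauss(0.00, 1.0) for _ in range(n)]
b = [random.gauss(0.03, 1.0) for _ in range(n)]  # true difference: 0.03 SD, trivial

diff = statistics.fmean(b) - statistics.fmean(a)
se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
z = diff / se  # z > 1.96 corresponds to p < .05 (two-sided, normal approximation)
print(f"difference = {diff:.3f} SD, z = {z:.1f}")
```

The z statistic here is “highly significant,” yet the difference itself is negligible for any practical purpose, which is exactly the point about P values not conveying the size of a difference.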

Maybe because Insel is Director of NIMH, he doesn’t expect anybody to call him on the contradictions in what he is requesting. In the p-hacking blog post, he endorsed a call to action to address the problem of a lot of federal money being wasted on research that can’t lead to improvements in the health and well-being of the population because the research is simply unreliable and depends on “heavily processed” data for which investigators don’t provide the details. Yet in the Depression, Daughters, and Telomeres post he grabs an outrageous example of this being done and tells the research community he wants to see more of it.

 


NIMH Biomarker Porn: Depression, Daughters, and Telomeres Part 1

Does having to cope with their mother’s depression REALLY inflict irreversible damage on daughters’ psychobiology and shorten their lives?

Telomere

A recent BMJ article revived discussion of responsibility for hyped and distorted coverage of scientific work in the media. The usual suspects, self-promoting researchers, are passed over and their University press releases are implicated instead.

But university press releases are not distributed without authors’ approval.  Exaggerated statements in press releases are often direct quotes from authors. And don’t forget the churnaling journalists and bloggers who uncritically pass on press releases without getting second opinions.  Gary Schwitzer remarked:

Don’t let news-release-copying journalists off the hook so easily. It’s journalism, not stenography.

In this two-part blog post, I’ll document this process of amplification of the distortion of science from article to press release to subsequent coverage. In the first installment, I’ll provide a walkthrough commentary and critique of a flawed small study of telomere length among daughters of depressed women published in the prestigious Nature Publishing Group journal, Molecular Psychiatry. In the second, I will compare the article and press release to media coverage, specifically the personal blog of NIMH Director Thomas Insel.

I warn the squeamish that I will whack some bad science and outrageous assumptions with demands for evidence and pelt the study, its press release, and Insel’s interpretation with contradictory evidence.

I’m devoting a two-part blog post to this effort because bad science with misogynist, mother-bashing assumptions is being touted by the Director of NIMH as an example to be followed. When he speaks, others pay attention because he sets funding priorities. Okay, Dr. Insel, we will listen up, but we will do so skeptically.

A paper that shares an author with the Molecular Psychiatry paper was criticized by Daniel Engber for delivering

A mishmash of suspect stats and overbroad conclusions, marshaled to advance a theory that’s both unsupported by the data and somewhat at odds with existing research in the field.

The criticism applies to this paper as well.

But first, we need to understand some things about telomere length…

What is a Telomere?

Telomeres are caps on the ends of every chromosome. They protect the chromosome from losing important genes or sticking to other chromosomes. They become shorter every time the cell divides.

I have assembled some resources in an issue of Science-Based Medicine:

Skeptic’s Guide to Debunking Claims about Telomeres in the Scientific and Pseudoscientific Literature

As I say in that blog, there are many exaggerated and outright pseudoscientific claims about telomere length as a measure of “cellular aging” and therefore how long we’re going to live.

I explain the concepts of biomarker and surrogate endpoint, which are needed to understand the current fuss about telomeres. I show why the evidence is against routinely accepting telomere length as a biomarker or surrogate endpoint for accelerated aging and other health outcomes.

I note

  • A recent article in American Journal of Public Health claimed that drinking 20 ounces of carbonated (but not noncarbonated) sugar-sweetened drinks was associated with shortened telomere length “equivalent to an approximately 4.6 additional years of aging.” So, the effect of drinking soda on life expectancy is supposedly equivalent to what we know about smoking’s effect.
  • Rubbish. Just ignore the telomere length data and directly compare the effects of drinking 20 ounces of soda to the effects of smoking on life expectancy. There is no equivalence. The authors confused differences in what they thought was a biomarker with differences in health outcomes and relied on some dubious statistics. The American Journal of Public Health soda study was appropriately skewered in a wonderful Slate article, which I strongly recommend.
  • Claims are made for telomere length as a marker for the effects of chronic stress and risk of chronic disease. But telomere length has a large genetic component and is correlated with age. When appropriate controls are introduced, correlations among telomere length, stress, and health outcomes tend to disappear or are sharply reduced.
  • A 30-year birth cohort study did not find an association between exposure to stress and telomere length.
  • Articles from a small group of investigators claim findings about telomere lengths that do not typically get reproduced in larger, more transparently reported studies by independent groups. This group of investigators tends to have or have had conflicts of interest in marketing of telomere diagnostic services, as well as promotion of herbal products to slow or reverse the shortening of telomere length.
  • Generally speaking, reproducible findings concerning telomere length require large samples with well-defined phenotypes, i.e., individuals having well-defined clinical presentations of particular characteristics, and we can expect associations to be small.

Based on what I have learned about the literature concerning telomere length, I would suggest

  • Beware of small studies claiming strong associations between telomere length and characteristics other than age, race, and gender.
  • Beware of studies claiming differences in telomere length arising in cross-sectional research or in the short term if they are not reproduced in longitudinal, prospective studies.
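The first caution reflects simple statistical power. Suppose the true correlation between telomere length and some characteristic is weak, say r = 0.10 (the sort of association larger studies tend to report). A rough simulation (all numbers hypothetical; this uses the approximate Fisher-z significance test) shows how rarely a sample of about 90 would detect it:

```python
import math
import random

random.seed(7)

def sample_r(n, rho):
    """Pearson r from one simulated sample of n pairs with true correlation rho."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho * rho) * random.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

n, rho, trials = 90, 0.10, 2000              # hypothetical: weak true effect, small sample
r_crit = math.tanh(1.96 / math.sqrt(n - 3))  # approximate p < .05 cutoff via Fisher z
hits = sum(abs(sample_r(n, rho)) > r_crit for _ in range(trials))
print(f"power ≈ {hits / trials:.2f}")        # far below the conventional 0.80
</antml>```

Most such small studies will miss a weak true effect entirely, and the ones that do reach significance will, on average, overestimate it. That is one reason striking findings from small telomere studies so often fail to replicate.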

A walk-through commentary and critique of the actual article

Gotlib, I. H., LeMoult, J., Colich, N. L., Foland-Ross, L. C., Hallmayer, J., Joormann, J., … & Wolkowitz, O. M. (2014). Telomere length and cortisol reactivity in children of depressed mothers. Molecular Psychiatry.

Molecular Psychiatry is a pay-walled journal, but a downloadable version of the article is available here.

Conflict of Interest Statement

The authors report no conflicts of interest. However, in the soda article published in December 2014, one of the authors of the present paper, Jun Lin, disclosed being a shareholder in Telomere Diagnostics, Inc., a telomere measurement company. Links in my previous blog post take you to “Telomeres and Your Health: Get the Facts” at the website of that company. There you find claims that herbal products based on traditional Chinese medicine can reduce the shortening of telomeres.

Jun Lin has a record of outrageous claims. For instance, another article claimed that normal women whose minds wander may be losing four years of life, based on the association between self-reported mind wandering and telomere length. So, if we pit this claim against what is known about the effects of smoking on life expectancy, women can extend their lives almost as much by paying better attention as by quitting smoking.

Hmm, I don’t know if we have undeclared conflict of interest here, but we certainly have a credibility problem.

The Abstract

Past research shows distorted and exaggerated media portrayals of studies are often already evident in abstracts of journal articles. Authors engage in a lot of cherry picking and spin results to strengthen the case their work is innovative and significant.

The opening sentence of the abstract to this article is a mashup of wild claims about telomere length in depression and risk for physical illnesses. But I will leave commenting until we reach the introduction, where the identical statement appears with elaboration and a single reference to one of the authors’ own work.

The abstract goes on to state

Both MDD and telomere length have been associated independently with high levels of stress, implicating dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and anomalous levels of cortisol secretion in this relation.

When I showed this to a pioneer in the study of the HPA axis, he remarked:

If you can find coherence in this from the Abstract you are smarter than I am…The phrase dysregulation of the HPA axis has been used to support more hand waving than substance.

The abstract ends with

This study is the first to demonstrate that children at familial risk of developing MDD are characterized by accelerated biological aging, operationalized as shortened telomere length, before they had experienced an onset of depression; this may predispose them to develop not only MDD but also other age-related medical illnesses. It is critical, therefore, that we attempt to identify and distinguish genetic and environmental mechanisms that contribute to telomere shortening.

This breathless editorializing about the urgency of pursuing this line of research is not tied to the actual methods and results of the study. “Accelerated biological aging” and “predispose them to develop… other age-related medical illnesses” are not summaries of the study’s findings, but only dubious assumptions.

Actually, the evidence for telomere length as a biomarker for aging is equivocal and does not meet American Federation for Aging Research criteria. A large-scale prospective study did not find that telomere length predicted the onset of diabetes or cardiovascular disease.

And wait until we examine whether the study had reproducible results concerning either shorter telomeres and depression or telomeres being related to cortisol reactivity.

The introduction

The 6-paragraph introduction packs in a lot of questionable assumptions backed by a highly selective citation of the literature.

A growing body of research demonstrates that individuals diagnosed with major depressive disorder (MDD) are characterized by shortened telomere length, which has been posited to underlie the association between depression and increased rates of medical illness, including cardiovascular disease, diabetes, metabolic syndrome, osteoporosis and dementia (see Wolkowitz et al.1 for a review).

Really? A study co-authored by Wolkowitz and cited later in the introduction actually concluded

telomere shortening does not antedate depression and is not an intrinsic feature. Rather, telomere shortening may progress in proportion to lifetime depression exposure.

“Exposure” = personal experience being depressed. This would seem to undercut the rationale for examining telomere shortening in young girls who have not yet become depressed.

But more importantly, neither the Molecular Psychiatry article nor the Wolkowitz review acknowledges the weakness of the evidence for

  • Depression being characterized by shortened telomere length.
  • The association of depression and medical illness in older persons representing a causal role for depression that can be modified by prevention or treatment of depression in young people.
  • Telomere length observed in the young underlying any association between depression and medical illnesses when they get old.

Wolkowitz’s “review” is a narrative, nonsystematic review. The article assumes at the outset that depression represents “accelerated aging” and offers a highly selective consideration of the available literature.

In neither it nor the Molecular Psychiatry article are we told that

  • Some large-scale studies with well-defined phenotypes fail to find associations between telomeres and depressive disorder or depressive symptoms. One large-scale study co-authored by Wolkowitz found associations between depression and telomere length too weak to be detected in the present small sample. Any apparent association may well be spurious.
  • The American Heart Association does not consider depression as a (causal) risk factor for cardiovascular disease, but as a risk marker because of a lack of the evidence needed to meet formal criteria for causality. Depression after a heart attack predicts another heart attack. However, our JAMA systematic review revealed a lack of evidence that screening cardiac patients for depression and offering treatment reduces their likelihood of having another heart attack or improves their survival. An updated review confirmed our conclusions.
  • The association between recent depressive symptoms and subsequent dementia is evident at very low levels of symptoms, suggesting that it reflects residual confounding of depressive symptoms with other risk factors, including poor health and functioning, and reverse causation. I published a commentary in the British Medical Journal that criticized a claim that we should begin intervening for even low symptoms of depression in order to prevent dementia. I suggested that we would be treating a confound and that it would be unlikely to make a difference in outcomes.

I could go on. Depression causally linked to diabetes via differences in telomere length? Causing osteoporosis? You gotta be kidding. I demand quality evidence. The burden of evidence is on anyone who makes such wild claims.

Sure, there is lots of evidence that if people have been depressed in the past, they are more likely to get depressed again when they have a chronic illness. And their episodes of depression will last longer.

In general, there are associations between depression and the onset and outcome of chronic illness. But the simple, unadjusted association is typically seen at low levels of symptoms and increases with age, the accumulation of other risk factors, and physical co-morbidities. People who are older, already showing signs of illness, or who have poor health-related behaviors tend to get sicker and die. Statistical control for these factors reduces or eliminates the apparent association of depressive symptoms with illness outcomes. So, we are probably not dealing with depression per se. If you are interested in further discussion of this, see my slide presentation:

Negative emotion and health: why do we keep stalking bears, when we only find scat in the woods?

I explain risk factors (like bears) versus risk markers (like scat) and why shooting scat does not eliminate the health risk posed by bears.

I doubt that many people familiar with the literature believe that the associations among telomeres and depression, depression and the onset of chronic illness, and telomeres and chronic illness are such that a case could be made for telomere length in young girls being importantly related to physical disease in their mid and late life. This is science fiction falsely presented as evidence-based.

The authors of the Molecular Psychiatry paper are similarly unreliable when discussing “dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and anomalous levels of cortisol secretion.” You would think that they are referring to established biomarkers for risk of depression. Actually, most biological correlates of depression are modest, nonspecific to depression, and state, not trait-related – limited to when people are actually depressed.

MDD and ND [nondepressed] individuals exhibited similar baseline and stress cortisol levels, but MDD patients had much higher cortisol levels during the recovery period than their ND counterparts.

We did not find the expected main effects of maternal depression on children’s cortisol  reactivity.

  • They misrepresent a directly relevant study that examined cortisol secretion in the saliva of adolescents as a predictor of the subsequent development of depression. It actually found that no baseline cortisol measure predicted development of depression except the cortisol awakening response.

In general, cortisol secretion is more related to stress than to clinical depression. One study concluded

The hypothalamic—pituitary—adrenal axis is sensitive to social stress but does not mediate vulnerability to depression.

What is most outrageous about the introduction, however, is the specification of the pathway between having a depressed mother and shortened telomere length:

The chronic exposure of these children to this stress as a function of living with mothers who have experienced recurrent episodes of depression could represent a mechanism of accelerated biologic aging, operationalized as having shorter telomere length.

Recognize the argument being set up: having to deal with the mother’s depression is a chronic stressor for the daughters, which sets in motion irreversible processes before the daughters even become depressed themselves, leading to accelerated aging, chronic illness, and early death. We can ignore all the characteristics, including common social factors, that the daughters share with their mothers and that might be the source of any of the daughters’ problems.

This article is a dream paper for the lawyers of men seeking custody of their children in a divorce: “Your honor, sole custody for my client is the children’s only hope, if it is not already too late. His wife’s depression is irreversibly damaging the children, causing later sickness and early death. I introduce as evidence an article by Ian Gotlib that was endorsed by the Director of the National Institute of Mental Health…”

Geraldine Downey and I warned about this trap in a classic review, Children of Depressed Parents, cited 2,300 times according to Google Scholar and still going strong. We noted that depressed mothers and their children share a lot of uncharted biological, psychological, and environmental factors. But we also found that among the strongest risk factors for maternal depression are marital conflict, other life events generated by the marriage and husband, and a lack of marital support. These same factors could contribute to any problems in the children. Indeed, the husband could himself be a source of child problems. Ignoring these possibilities constitutes a “consistent, if unintentional, ‘mother-bashing’ in the literature.”

The authors have asked readers to buy into a reductionist delusion. They assume some biological factors in depression are so clearly established that they can serve as biomarkers.  The transmission of any risk for depression associated with having a depressed mother is by way of irreversible damage to telomeres. We can forget about any other complex social and psychological processes going on, except that the mothers’ depression is stressing the daughters and we can single out a couple of biological variables to examine this.

Methods

The Methods section lacks basic details necessary to evaluate the appropriateness of what was done and the conclusions drawn from any results. Nonetheless, there is good reason to believe that we are dealing with a poorly selected sample of daughters from poorly selected mothers.

We’re not told much about the mothers except that they have experienced recurrent depression during the childhood of the daughters. We have to look to other papers coming out of this research group to discover how these mothers were probably identified. What we see is that they are a mixed group, in part drawn from outpatient settings and in part from advertisements in the community.

Recall that identification of biological factors associated with depression requires well-defined phenotypes. The optimal group to study would be patients with severe depression. We know that depression is highly heterogeneous and that “depressed” people in the community who are not in specialty treatment are likely to just barely meet criteria. We are dealing with milder disorder that is less likely to be characterized by any of the biological features of more severe disorder. Social factors likely play more of a role in their misery. In many countries, medication would not be the first line of treatment.

Depression is a chronic, remitting, recurrent disorder with varying degrees of severity of overall course and in particular episodes. It has its onset in adolescence or early adulthood. By the time women have daughters who are 10 to 14 years old, they are likely to have had multiple episodes. But in a sample selected from the community, these episodes may have been mild and not necessarily treated, nor even noticeable by the daughters. The bottom line is we should not be too impressed with the label “recurrent depression” without better documentation of the length, severity, and associated impairment of functioning.

Presumably the depressed mothers in the study were selected because they were currently depressed. That makes it difficult to separate out enduring factors in the mothers and their social context from those tied to the women currently being depressed. And because we know that most biological factors associated with depression are state dependent, we may be getting a picture of the biology of these women – and their daughters, for that matter – that is skewed relative to other times.

Basically, we are dealing with a poorly selected sample of daughters from a poorly selected sample of mothers with depression. The authors are not telling us crucial details that we need to understand any results they get. Apparently they are not measuring relevant variables, and they have too small a sample to apply statistical controls anyway. As I said about another small study making claims for a blood test for depression, these authors are

Looking for love biomarkers in all the wrong places.

Recall that I also said that results from small samples like this one often conflict with results from epidemiologic studies with larger samples and better-defined phenotypes. I think we can see the reasons why developing here. The small sample consists only of daughters who have a depressed mother but who have not yet become depressed themselves and who have low scores on a child depression checklist. Just how representative is this sample? What proportion of daughters this age of depressed women would meet these criteria? How are they similar to or different from daughters who have already become depressed? Do the differences lie in their mothers, in the daughters, or both? We can’t address any of these questions, but they are highly relevant. That’s why we need more large clinical epidemiologic studies and fewer small studies of poorly defined samples. Who knows what selection biases are operating?

Searching the literature for what this lab group was doing in other studies in terms of mother and daughter recruitment, I came across a number of small studies of various psychological and psychobiological characteristics of the daughters. We have no idea whether the samples are overlapping or distinct. We have no idea about how the results of these other modest studies confirm or contradict results of the present one. But integrating their results with the results of the present study could have been a start in better understanding it.

As noted in my post at Science Based Medicine, we get a sense from the methods section of the Molecular Psychiatry article of the unreliability of single assessments of telomeres. Read the description of the assay of telomere length in the article to see how the authors had to rely on multiple measurements, as well as the unreliability of any single assessment. Look at the paragraph beginning

To control for interassay variability…

This description reflects more general problems in the comparability of assessments of telomeres across individuals, samples, and laboratories, problems that preclude recommending telomere length as a biomarker or surrogate endpoint with any precision.

Results and Interpretation

As in the methods, the authors fail to supply basic details of the results and leave us having to trust them. There is a striking lack of simple descriptive statistics and bivariate relations, i.e., simple correlations. But we can see signs of unruly, difficult to tame data and spun statistics. And in the end, there are real doubts that there is any connection in these data between telomeres and cortisol.

The authors report a significant difference in telomere length between the daughters of depressed women and the daughters in the control group. Given how the data had to be preprocessed, I would really like to see a scatter plot and examine the effects of outliers before coming to a firm conclusion. With only 50 daughters of depressed mothers and 40 controls, differences could have arisen from the influence of one or two outliers.
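A toy simulation shows how little it takes (my illustration; the values are made up and nothing here comes from the paper’s data): two groups of 50 and 40 drawn from the same distribution show essentially no mean difference until two extreme values are placed in one group.

```python
import random
import statistics

random.seed(42)
# Hypothetical telomere-length-like values; SAME true distribution in both groups.
risk_group = [random.gauss(1.00, 0.10) for _ in range(50)]
controls = [random.gauss(1.00, 0.10) for _ in range(40)]

def mean_diff(a, b):
    return statistics.fmean(a) - statistics.fmean(b)

d0 = mean_diff(risk_group, controls)  # near zero: no real group difference

# Two extreme low values (e.g., assay artifacts) slipped into one group:
risk_group[0], risk_group[1] = 0.55, 0.60
d1 = mean_diff(risk_group, controls)  # an apparent "deficit" appears

print(f"before: {d0:+.3f}   after two outliers: {d1:+.3f}")
```

A scatter plot, or a simple sensitivity analysis dropping the most extreme points, would expose this immediately, which is why the absence of such plots matters.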

We are told that the two groups of young girls did not differ in Tanner scores, i.e., self-reported signs of puberty. If the daughters of depressed women had indeed endured “accelerated aging,” shouldn’t it be reflected in Tanner scores? The authors, and for that matter Insel, seem to take this accelerated-aging notion quite literally.

I think we have another seemingly large difference coming from a small sample, one that, given past findings, is statistically improbable to have yielded such a difference. I could be convinced by these data of group differences in telomere length, but only if the findings were replicated in an independent, adequately sized sample. And I still would not know what to make of them.

The authors fuss about anticipating a “dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and anomalous levels of cortisol secretion.” They indicate that the cortisol data were highly skewed and had to be tamed by winsorizing, i.e., substituting arbitrary cutoff values for outliers. We are not told for how many subjects this was done or from which group they came. The authors then engaged in some fancy multivariate statistics, “a piecewise linear growth model to fit the quadratic nature of the [winsorized] data.” We need to keep in mind that multilevel modeling is not a magic wand for transforming messy data. Rather, it involves assumptions that need to be tested, not assumed. We get no evidence of the assumptions being tested, and the sample size is so small that they could not be tested reliably.
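For readers unfamiliar with the technique, winsorizing simply clamps extreme values to chosen percentile cutoffs. A minimal sketch follows; the 5th/95th percentile cutoffs and the example values are my assumptions for illustration, since the article does not report which cutoffs were used or how many values were altered:

```python
import numpy as np

def winsorize(values, lower_pct=5, upper_pct=95):
    """Replace values below/above the given percentiles with the percentile values."""
    values = np.asarray(values, dtype=float)
    lo = np.percentile(values, lower_pct)
    hi = np.percentile(values, upper_pct)
    return np.clip(values, lo, hi)

# Hypothetical skewed cortisol values with one extreme observation.
cortisol = np.array([4.1, 4.8, 5.2, 5.5, 6.0, 6.3, 7.1, 25.0])
tamed = winsorize(cortisol)  # the 25.0 is pulled down toward the bulk of the data
```

The point is not that winsorizing is illegitimate, but that the substituted values are arbitrary, so readers should be told how many values were altered and in which group.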

The authors found no differences in baseline cortisol secretion. Moreover, they found no differences in distress recovery for telomere length, group (depressed versus nondepressed mother), or the group by telomere interaction. For cortisol reactivity, they found no effect for group or for the group by telomere interaction, but they did find a just-significant (p < .042) main effect for telomere length. This would not seem to offer much support for a dysregulation of the HPA axis or anomalous levels of cortisol secretion associated with group membership (having a depressed versus nondepressed mother). If we are guided by the meta-analysis of depression and cortisol secretion, the authors should have obtained a group difference in recovery, which they did not. I really doubt this is reproducible in a larger, independent sample with transparently reported statistics.

Recognize what we have here: prestigious journals like Molecular Psychiatry have a strong publication bias in requiring statistical significance. Authors therefore must chase and obtain statistical significance. There is a minuscule difference between p < .042 and p < .06 – or p < .07, for that matter – particularly in the context of multivariate statistics applied to skewed and winsorized data. The difference is well within the error of messy measurements. Yet if the authors had obtained p < .06 or p < .07, we probably would not have gotten to read their story, at least not in Molecular Psychiatry.*
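A small simulation makes the fragility of a p ≈ .04 result vivid. Drawing repeated small samples from populations with a fixed, modest true difference and computing a permutation p-value each time (a sketch under assumed parameters of my own choosing, not a model of this study), the p-values vary enormously from one replication to the next:

```python
import random
import statistics

def perm_pvalue(a, b, n_perm=2000, seed=1):
    """Two-sided permutation-test p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if d >= observed:
            hits += 1
    return hits / n_perm

rng = random.Random(42)
pvals = []
for _ in range(20):  # 20 replications of the same small two-group study
    a = [rng.gauss(0.3, 1.0) for _ in range(25)]  # assumed modest true effect (0.3 SD)
    b = [rng.gauss(0.0, 1.0) for _ in range(25)]
    pvals.append(perm_pvalue(a, b))

print(sorted(pvals))  # p-values scatter widely from replication to replication
```

With the same true effect, identically designed small studies yield p-values ranging across the conventional threshold, which is one reason a lone p < .042 in noisy, preprocessed data carries so little evidential weight.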

Stay tuned for my next installment in which I compare results of this study to the press release and coverage in Insel’s personal blog.  I particularly welcome feedback before then.

*For a discussion of whether “The number of p-values in the psychology literature that barely meet the criterion for statistical significance (i.e., that fall just below .05) is unusually large,” see Masicampo and LaLande (2012) and Lakens (2015).