Wisdom of the Ego: Childhood Adverse Experiences Are Not Destiny

Today’s readers probably can’t appreciate how radical George Vaillant’s work was in its day.

George Vaillant drew upon a longitudinal study of adult development to challenge the Freudian idea of childhood adverse experiences as destiny.

Free download of George Vaillant’s Wisdom of the Ego

You can learn more about the study Vaillant headed:

Harvard study of development 

And

Summary of the Harvard Grant Study: Triumphs of Experience

I know that in his last book, George Vaillant turned into something of a positive psychology guru, using results of the study to espouse views about how to lead a happy and meaningful life. I’ll just have to live with that, and maybe with some of the liberties he took in interpreting his data.

But now the important thing is that his classic book, Wisdom of the Ego, is available free for download. Get it here

The website is perfectly safe. I’ve made one of my own books available there. After having made lots of money from publishing mainly psychoanalytically and psychodynamically oriented psychotherapy books, Jason Aronson, Publisher is on a mission to give a lot of books away free.

As of October 1, 2018 readers just like you from 200 countries and territories around the world have saved $55,685,206.30 on 1,149,012 FREE downloads of classic psychotherapy books.

From the original blurb for the book:

Freud tells us that the first five years of life constitute destiny. If this were so, Vaillant asks, then how could so many deeply troubled youths become well-adjusted, productive adults? Drawing on the Study of Adult Development, based at Harvard University, this book takes us into the lives of such individuals—thriving men and women who suffered grievous disadvantages and abuses during childhood—to show us that the mind’s remarkable defenses develop well into adulthood, that the maladjustments of adolescence can evolve into the virtues of maturity. In one fascinating case after another, he introduces us to middle-aged men and women learning how to love, to make meaning, to reorder chaos.

Because creativity is so intrinsic to this alchemy of the ego, Vaillant mingles these life studies with psychobiographies of famous artists and others. We meet Florence Nightingale, the intractable hypochondriac and hopeless dreamer who, at the age of thirty-one, wrote in her diary, “I see nothing desirable but death,” and we watch as she transforms her anguish into altruism, her hapless fantasies into fantastic success. In the tormented life of Sylvia Plath, we see psychosis as not only a defect but also an effort at repair, her poetry as an extraordinary illustration of the adaptive process. We witness the mature working of the mind’s defenses in the career of Anna Freud, their greatest elucidator. And we see the wisdom of the ego at work as Eugene O’Neill evolves from self-destructive youth to creator of great art.

In these compelling portraits of obscure and famous lives, Vaillant charts the evolution of the ego’s defenses, from the psychopathic to the sublime, and from the mundane to the most ingenious. An account of the boundless psychological resilience of adult development, The Wisdom of the Ego is a brilliant summation of the mind’s amazing power to fashion creative victories out of life’s would-be defeats (1041 pgs).

From a couple of reviews at the time:

“A richly textured, elegantly written, and humane book by the person who is becoming the Anna Freud of his day. Vaillant’s sympathetic treatment of the defenses is itself wise and creative.” —Robert Kegan, Harvard University and Massachusetts School of Professional Psychology

“Vaillant tells us that ego defenses are not pathological formations or symptoms of mental illness. They are ingenious self-deceptions that serve adaptation… He is to be commended for bringing certain unconscious processes into focus and for illuminating the various ways in which ego defenses contribute to a person’s adaptation to life.”—Louise J. Kaplan, The Boston Sunday Globe

You may also be interested in two of my controversial, but most heavily accessed blog posts:

Stop using the Adverse Childhood Experiences Checklist to make claims about trauma causing physical and mental health problems

And

In a classic study of early childhood abuse and neglect, effects on later mental health nearly disappeared when….

 

 

How to get a flawed systematic review and meta-analysis withdrawn from publication: a detailed example

Cochrane normally requires authors to agree to withdraw completed reviews that have been published. This withdrawal in the face of resistance from the authors is extraordinary.

There is a lot to be learned from this letter and the accompanying documents in terms of Courtney calmly and methodically laying out a compelling case for withdrawal of a review with important clinical practice and policy implications.

Robert Courtney’s wonderfully detailed cover letter probably proved decisive in getting the Cochrane review withdrawn, along with the work of another citizen scientist/patient advocate, Tom Kindlon.

Especially take a look at the exchanges with the author Lillebeth Larun that are included in the letter.

Excerpt from the cover letter below:

It is my opinion that the published Cochrane review unfortunately fails to meet the standards expected by the public of Cochrane in terms of publishing rigorous, unbiased, transparent and independent analysis; so I would very much appreciate it if you could investigate all of the problems I raised in my submitted comments and ensure that corrections are made or, at the very least, that responses are provided which allow readers to understand exactly why Cochrane believe that no corrections are required, with reference to Cochrane guidelines.

On this occasion, in certain respects, I consider the review to lack rigour, to lack clarity, to be misleading, and to be flawed. I also consider the review (including the discussions, some of the analyses, and unplanned changes to the protocol) to indicate bias in favour of the treatments which it investigates.

Another key excerpt summarized Courtney’s four comments on the Cochrane review that had not yet succeeded in getting the review withdrawn:

In summary, my four submissions focus on, but are not restricted to the following issues:

  • The review authors switched their primary outcomes in the review, and used unplanned analyses, which has had the effect of substantially transforming some of the interpretation and reporting of the primary outcomes of the review;

  • The review fails to prominently explain and describe the primary outcome switching and to provide a prominent sensitivity analysis. In my opinion, the review also fails to justify the primary outcome switching;

  • The review fails to clearly report that there were no significant treatment effects at follow-up for any pooled outcomes in any measures of health (except for sleep, a secondary outcome), but instead the review gives the impression that most follow-up outcomes indicated significant improvements, and that the treatments were largely successful at follow-up;

  • The review uses some unpublished and post-hoc data from external studies, despite the review-authors claiming that they have included only formally published data and pre-specified outcome data. Using post-hoc and unpublished data, which contradicts the review’s protocol and stated methodology, may have had a significant effect on the review outcomes, possibly even changing the review outcomes from non-significant to significant;

  • The main discussion sections in the review include incorrect and misleading reports of the review’s own outcomes, giving a false overall impression of the efficacy of the reviewed therapies;

  • The review includes an inaccurate assessment of bias (according to the Cochrane guidelines for reporting bias) with respect to some of the studies included in the review’s analyses.

These are all serious issues, that I believe we should not be seeing in a Cochrane review.

Digression: My Correspondence with Tom Kindlon regarding this blog post

James Coyne <jcoynester@gmail.com>

Oct 18, 2018, 12:45 PM (3 days ago)

to Tom

I’m going to be doing a couple of blog posts about Bob, one of them about the details of the lost year of his life (2017) which he shared with me in February 2018, shortly before he died. But the other blog post is going to be basically this long email posted with commentary. I am concerned that you get your proper recognition as fully sharing the honors with him for ultimately forcing the withdrawal of the exercise review. Can you give me some suggestion how that might be assured? references? blogs

Do you know the details of Bob ending his life? I know it was a deliberate decision, but was it an accompanied suicide? More people need to know about his involuntary hospitalization and stupid diagnosis of anorexia.

Kind regards

Tom Kindlon

Tom Kindlon’s reply to me

Tom Kindlon

Oct 18, 2018, 1:01 PM (3 days ago)

Hi James/Jim,

It is great you’re going to write on this.

I submitted two long comments on the Cochrane review of exercise therapy for CFS, which can be read here:

<https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003200.pub7/detailed-comment/en?messageId=157054020>

<https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003200.pub7/detailed-comment/en?messageId=157052118>

Robert Courtney then also wrote comments. When he was not satisfied with the responses, he made a complaint.

All the comments can be read on the review here:

<https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003200.pub7/read-comments>

but as I recall the comments by people other than Robert and myself were not substantial.

I will ask what information can be given out about Bob’s death.

Thanks again for your work on this,

Tom

The Cover Letter: Did it break the impasse about withdrawing the review?

from: Bob <brightonbobbob@yahoo.co.uk>

to: James Coyne <jcoynester@gmail.com>

date: Feb 18, 2018, 5:06 PM

subject: Fw: Formal complaint – Cochrane review CD003200

Sun, Feb 18, 1:15 PM

THIS IS A COPY OF A FORMAL COMPLAINT SENT TO DR DAVID TOVEY.

Formal Complaint

12th February 2018

From:

Robert Courtney.

UK

To:

Dr David Tovey

Editor in Chief of the Cochrane Library

Cochrane Editorial Unit

020 7183 7503

dtovey@cochrane.org

Complaint with regards to:

Cochrane Database of Systematic Reviews.

Larun L, Brurberg KG, Odgaard-Jensen J, Price JR. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev. 2017; CD003200. DOI: 10.1002/14651858.CD003200.pub7

Dear Dr David Tovey,

This is a formal complaint with respect to the current version of “Exercise therapy for chronic fatigue syndrome” by L. Larun et al. (Cochrane Database Syst Rev. 2017; CD003200.)

First of all, I would like to apologise for the length of my submissions relating to this complaint. The issues are technical and complex and I hope that I have made them easy to read and understand despite the length of the text.

I have attached four PDF files to this email which outline the details of my complaint. In 2016, I submitted each of these documents as part of the Cochrane comments facility. They have now been published in the updated version of the review. (For your convenience, the details of these submissions are listed at the end of this email with a weblink to an online copy of each document.)

I have found the responses to my comments, by L. Larun, the lead author of the review, to be inadequate, especially considering the seriousness of some of the issues raised.

It is my opinion that the published Cochrane review unfortunately fails to meet the standards expected by the public of Cochrane in terms of publishing rigorous, unbiased, transparent and independent analysis; so I would very much appreciate it if you could investigate all of the problems I raised in my submitted comments and ensure that corrections are made or, at the very least, that responses are provided which allow readers to understand exactly why Cochrane believe that no corrections are required, with reference to Cochrane guidelines.

On this occasion, in certain respects, I consider the review to lack rigour, to lack clarity, to be misleading, and to be flawed. I also consider the review (including the discussions, some of the analyses, and unplanned changes to the protocol) to indicate bias in favour of the treatments which it investigates.

Exercise as a therapy for chronic fatigue syndrome is a highly controversial subject, and so there may be more of a need for independent oversight and scrutiny of this Cochrane review than might usually be the case.

In addition to the technical/methodological issues raised in my four submitted comments, I would also like you to consider whether there may be a potential lack of independence on the part of the authors of this review.

All of the review authors, bar Price, are currently working in collaboration on another Cochrane project with some of the authors of the studies included in this review. (The project involves co-authoring a protocol for a future Cochrane review) [2]. One of the meetings held to develop the protocol for this new review was funded by Peter White’s academic fund [1]. White is the Primary Investigator for the PACE trial (a study included in this Cochrane review).

It is important that Cochrane is seen to uphold high standards of independence, transparency and rigour.

Please refer to my four separate submissions (attached) for the details of my complaint regarding the contents of the review. As way of an introduction, only, I will also briefly discuss, below, some of the points I have raised in my four documents.

In summary, my four submissions focus on, but are not restricted to the following issues:

  • The review authors switched their primary outcomes in the review, and used unplanned analyses, which has had the effect of substantially transforming some of the interpretation and reporting of the primary outcomes of the review;
  • The review fails to prominently explain and describe the primary outcome switching and to provide a prominent sensitivity analysis. In my opinion, the review also fails to justify the primary outcome switching;
  • The review fails to clearly report that there were no significant treatment effects at follow-up for any pooled outcomes in any measures of health (except for sleep, a secondary outcome), but instead the review gives the impression that most follow-up outcomes indicated significant improvements, and that the treatments were largely successful at follow-up;
  • The review uses some unpublished and post-hoc data from external studies, despite the review-authors claiming that they have included only formally published data and pre-specified outcome data. Using post-hoc and unpublished data, which contradicts the review’s protocol and stated methodology, may have had a significant effect on the review outcomes, possibly even changing the review outcomes from non-significant to significant;
  • The main discussion sections in the review include incorrect and misleading reports of the review’s own outcomes, giving a false overall impression of the efficacy of the reviewed therapies;
  • The review includes an inaccurate assessment of bias (according to the Cochrane guidelines for reporting bias) with respect to some of the studies included in the review’s analyses.

These are all serious issues, that I believe we should not be seeing in a Cochrane review.

These issues have already caused misunderstanding and misreporting of the review in academic discourse and publishing. (See an example of this below.)

All of the issues listed above are explained in full detail in the four PDF files attached to this email. They should be considered to be the basis of this complaint.

For the purposes of this correspondence, I will illustrate some specific issues in more detail.

In the review, the following health indicators were used as outcomes to assess treatment effects: fatigue, physical function, overall health, pain, quality of life, depression, anxiety, and sleep. All of these health indicators, except uniquely for sleep (a secondary outcome), demonstrated a non-significant outcome for pooled treatment effects at follow-up for exercise therapy versus passive control. But a reader would not be aware of this from reading any of the discussion in the review. I undertook a lengthy and detailed analysis of the data in the review before I could comprehend this. I would like these results to be placed in a prominent position in the review, and reported correctly and with clarity, so that a casual reader can quickly understand these important outcomes. These outcomes cannot be understood from reading the discussion, and some outcomes have been reported incorrectly in the discussion. In my opinion, Cochrane is not maintaining its expected standards.

Unfortunately, there is a prominent and important error in the review, which I believe helps to give the mis-impression that the investigated therapies were broadly effective. Physical function and overall-health (both at follow-up) have been mis-reported in the main discussion as being positive outcomes at follow-up, when in fact they were non-significant outcomes. This seems to be an important failing of the review that I would like to be investigated and corrected.

Regarding one of the points listed above, copied here:

“The review fails to clearly report that there were no significant treatment effects at follow-up for any pooled outcomes in any measures of health (except for sleep, a secondary outcome), but instead the review gives the impression that most follow-up outcomes indicated significant improvements, and that the treatments were largely successful at follow-up”

This is one of the most substantial issues that I have highlighted. This issue is related to the primary outcome switching in the review.

(This relates to assessing fatigue at long-term follow-up for exercise therapy vs passive control.)

An ordinary (i.e. casual) reader of the review may easily be left with the impression that the review demonstrates that the investigated treatment has almost universal beneficial health effects. However, there were no significant treatment effects for pooled outcome analyses at follow-up for any health outcomes except for sleep (a secondary outcome). The lack of universal treatment efficacy at follow-up is not at all clear from a casual read of the review, or even from a thorough read. Instead, a careful analysis of the data is necessary to understand the outcomes. I believe that the review is unhelpful in the way it has presented the outcomes, and lacks clarity.

These follow-up outcomes are a very important issue for medical, patient and research communities, but I believe that they have been presented in a misleading and unhelpful way in the discussions of the review. This issue is discussed mainly in my submission no.4 (see my list of PDF documents at the bottom of this correspondence), and also a little in submission no.3.

I will briefly explain some of the specific details, as way of an introduction, but please refer to my attached documents for the full details.

The pre-specified primary outcomes were pooled treatment effects (i.e. using pooled data from all eligible studies) immediately after treatment and at follow-up.

However, for fatigue, this pre-specified primary outcome (i.e. pooled treatment effects for the combination of data from all eligible studies) was abandoned/switched (for what I consider to be questionable reasons) and replaced with a non-pooled analysis. The new unplanned analysis did not pool the data from all eligible studies but analysed data from studies grouped together by the specific measure used to assess fatigue (i.e. grouped by the various different fatigue questionnaire assessments).

Looking at these post-hoc grouped outcomes, for fatigue at follow-up, two out of the three grouped outcomes had significant treatment effects, and the other outcome was a non-significant effect. This post-hoc analysis indicates that the majority of outcomes (i.e. two out of three) demonstrated a significant treatment effect; however, this does not mean that the pre-specified pooled analysis of all eligible studies would have demonstrated a positive treatment effect. Therefore switching outcomes, and using a post-hoc analysis, allows for the potential introduction of bias to the review. Indeed, on careful inspection of the minutiae of the review, the pre-specified analysis of pooled outcomes demonstrates a non-significant treatment effect, for fatigue at follow-up (exercise therapy versus passive control).

The (non-significant) outcome of this pre-specified pooled analysis of fatigue at follow-up is somewhat buried within the data tables of the review, and is very difficult to find; it is not discussed prominently or highlighted. Furthermore, the explanation that the primary outcome was switched is only briefly mentioned and can easily be missed. Uniquely, for the main outcomes, there is no table outlining the details of the pre-specified pooled analysis of fatigue at follow-up. In contrast, the post-hoc analysis, which has mainly positive outcomes, has been given high prominence throughout the review with little explanation that it is a post-hoc outcome.

So, to reiterate, the (two out of three significant, and one non-significant) post-hoc outcomes for fatigue at follow-up were reported as primary outcomes instead of the (non-significant) pre-specified pooled treatment effect for all eligible studies. Two out of three post-hoc outcomes were significant in effect; however, the pre-specified pooled treatment effect, for the same measures, was not significant (for fatigue at follow-up – exercise therapy versus passive control). Thus, the outcome switching transformed one of the main outcomes of the review, from a non-significant effect to a mainly significant effect.
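
The point that "two out of three subgroups significant" need not carry over to the pooled result can be illustrated with a minimal Python sketch. It uses inverse-variance fixed-effect pooling, which is only one of several pooling methods and is an assumption here, not a statement of how the review pooled its data. The three subgroup estimates and standard errors are entirely hypothetical, chosen for illustration, and are not taken from the Cochrane review.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of effect estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

def is_significant(estimate, se, z_crit=1.96):
    """True when the approximate 95% confidence interval excludes zero."""
    return abs(estimate / se) > z_crit

# HYPOTHETICAL subgroup results as (effect estimate, standard error),
# all expressed in the same standardised units. Not the review's data.
subgroups = {
    "scale_a": (-0.50, 0.20),
    "scale_b": (-0.45, 0.20),
    "scale_c": (0.15, 0.10),
}

subgroup_flags = {
    name: is_significant(est, se) for name, (est, se) in subgroups.items()
}
pooled, pooled_se = pool_fixed_effect(
    [est for est, _ in subgroups.values()],
    [se for _, se in subgroups.values()],
)
pooled_flag = is_significant(pooled, pooled_se)

print(subgroup_flags)
print(round(pooled, 3), round(pooled_se, 3), pooled_flag)
```

With these made-up numbers, two of the three subgroups pass the significance threshold while the pooled estimate does not, because the largest (most heavily weighted) subgroup points the other way. A real analysis with this much disagreement between subgroups would probably also need a random-effects model.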

Furthermore, for exercise therapy versus passive control at follow-up, all the other health outcomes were non-significant (except sleep – a secondary outcome), but I believe the casual reader would be unaware of this because it is not explained clearly or prominently in the discussion, and some outcomes have been reported erroneously in the discussion as indicating a significant effect.

All of the above is outlined in my four PDF submissions, with detailed reference to specific sections of the review and specific tables etc.

I believe that the actual treatment effects at follow-up are different to the impression gained from a casual read of the review, or even a careful read of the review. It’s only by an in-depth analysis of the entire review that these issues would be noticed.

In what I believe to be a reasonable request in my submissions, I asked the reviewers to: “Clearly and unambiguously explain that all but one health indicator (i.e. fatigue, physical function, overall health, pain, quality of life, depression, and anxiety, but not sleep) demonstrated a non-significant outcome for pooled treatment effects at follow-up for exercise therapy versus passive control”. My request was not acted upon.

The Cochrane reviewers did provide a reason for the change to the protocol, from a pooled analysis to analyses of groups of mean difference values: “We realise that the standardised mean difference (SMD) is much more difficult to conceptualise and interpret than the normal mean difference (MD) […]”.

However, this is a questionable and unsubstantiated claim, and in my opinion isn’t an adequate explanation or justification for changing the primary outcomes; personally, I find it easier to interpret a single pooled analysis than a group of different analyses with each analysis using a different non-standardised scale to measure fatigue.

Using a SMD is standard practice for Cochrane reviews; Cochrane’s guidance recommends using pooled analyses when the outcomes use different measures, which was the case in this review. Thus I struggle to understand why (in an unplanned change to methodology) using a SMD was considered unhelpful by the reviewers in this case. My PDF document no.4 challenges the reviewers’ reason, with reference to the official Cochrane reviewers’ guidelines.
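
For readers unfamiliar with the distinction, the following minimal Python sketch shows why a standardised mean difference lets trials that used different questionnaires be put on a common footing. The two trials, their means, standard deviations and sample sizes are hypothetical. The sketch divides the mean difference by the pooled standard deviation and omits the small-sample correction that meta-analysis software often applies.

```python
import math

def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled standard deviation of treatment and control arms."""
    return math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )

def mean_difference(mean_t, mean_c):
    """Raw mean difference (MD), in the units of the questionnaire."""
    return mean_t - mean_c

def standardised_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Mean difference divided by the pooled SD (unit-free)."""
    return mean_difference(mean_t, mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)

# HYPOTHETICAL trial A: fatigue measured on a short scale.
md_a = mean_difference(6.0, 7.0)
smd_a = standardised_mean_difference(6.0, 2.0, 50, 7.0, 2.0, 50)

# HYPOTHETICAL trial B: fatigue measured on a scale with three times the range.
md_b = mean_difference(18.0, 21.0)
smd_b = standardised_mean_difference(18.0, 6.0, 50, 21.0, 6.0, 50)

print(md_a, md_b)    # raw differences depend on the scale used
print(smd_a, smd_b)  # standardised differences are directly comparable
```

In this toy case the raw mean differences differ by a factor of three purely because of the scale, while the standardised differences coincide, which is what makes a single pooled analysis across different measures possible.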

This review has already led to an academic misunderstanding and mis-reporting of its outcomes, which is demonstrated in the following published letter from one of the co-authors of the IPD protocol……

CMAJ (Canada) recommends exercise for CFS [http://www.cmaj.ca/content/188/7/510/tab-e-letters]

The letter claims: “We based the recommendations on the Cochrane systematic review which looked at 8 randomised trials of exercise for chronic fatigue, and together showed a consistent modest benefit of exercise across the different patient groups included. The clear and consistent benefit suggests indication rather than contraindication of exercise.”

However, there was not a “consistent modest benefit of exercise” and there was not a “clear and consistent benefit” considering that there were no significant treatment effects for any pre-specified (pooled) health outcomes at follow-up, except for sleep. The actual outcomes of the review seem to contradict the interpretation expressed in the letter.

Even if we include the unplanned analyses in our considerations, then it would still be the case that most outcomes did not indicate a beneficial treatment effect at follow-up for exercise therapy versus passive control. Furthermore, one of the most important outcomes, physical function, did not indicate a significant improvement at follow up (despite the discussion erroneously stating that it was a significant effect).

Two of my submissions discuss other issues, which I will outline below.

My first submission is in relation to the following…

The review states that all the analysed data had previously been formally published and was pre-specified in the relevant published studies. However, the review includes an analysis of external data that had not been formally published and is post-hoc in nature, despite alternative data being available that has been formally published and had been pre-specified in the relevant study. The post-hoc data relates to the FINE trial (Wearden 2010). The use of this data was not in accordance with the Cochrane review’s protocol and also contradicts the review’s stated methodology and the discussion of the review.

Specifically, the fatigue data taken from the FINE trial was not pre-specified for the trial and was not included in the original FINE trial literature. Instead, the data had been informally posted on a BMJ rapid response by the FINE trial investigators[3].

The review analyses post-hoc fatigue data from the FINE trial which is based on the Likert scoring system for the Chalder fatigue questionnaire, whereas the formally published FINE trial literature uses the same Chalder fatigue questionnaires but uses the bimodal scoring system, giving different outcomes for the same patient questionnaires. The FINE trial’s post-hoc Likert fatigue data (used in the review) was initially published by the FINE authors only in a BMJ rapid response post [3], apparently as an after-thought.
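
To see how the two scoring schemes can pull apart on identical questionnaires, here is a minimal Python sketch. It assumes the 11-item version of the Chalder questionnaire with four response options per item, scored 0-1-2-3 under Likert scoring (hence a 33-point scale) and 0-0-1-1 under bimodal scoring. The two patients are hypothetical and are not drawn from the FINE trial.

```python
# Weight given to each of the four response options (index 0..3).
LIKERT_WEIGHTS = (0, 1, 2, 3)   # assumed 0123 scoring, total range 0-33
BIMODAL_WEIGHTS = (0, 0, 1, 1)  # assumed bimodal scoring, total range 0-11

def score(responses, weights):
    """Total questionnaire score for one patient under a scoring scheme."""
    return sum(weights[r] for r in responses)

# HYPOTHETICAL patients: one chosen option index per item, 11 items each.
patient_x = [2] * 11            # moderately affected on every item
patient_y = [3] * 6 + [1] * 5   # severe on six items, mild on five

for name, responses in (("x", patient_x), ("y", patient_y)):
    print(
        name,
        "bimodal:", score(responses, BIMODAL_WEIGHTS),
        "likert:", score(responses, LIKERT_WEIGHTS),
    )
```

Under these assumptions, patient x scores higher than patient y on the bimodal scheme but lower on the Likert scheme, so group means and treatment effects computed from the same answer sheets can differ depending on which scheme is applied.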

This is the response to my first letter…

Larun said she was “extremely concerned and disappointed” with the Cochrane editors’ actions. “I disagree with the decision and consider it to be disproportionate and poorly justified,” she said.

———————-

Larun said:

Dear Robert Courtney

Thank you for your detailed comments on the Cochrane review ‘Exercise Therapy for Chronic Fatigue Syndrome’. We have the greatest respect for your right to comment on and disagree with our work. We take our work as researchers extremely seriously and publish reports that have been subject to rigorous internal and external peer review. In the spirit of openness, transparency and mutual respect we must politely agree to disagree.

The Chalder Fatigue Scale was used to measure fatigue. The results from the Wearden 2010 trial show a statistically significant difference in favour of pragmatic rehabilitation at 20 weeks, regardless whether the results were scored bi-modally or on a scale from 0-3. The effect estimate for the 70 week comparison with the scale scored bi-modally was -1.00 (CI-2.10 to +0.11; p =.076) and -2.55 (-4.99 to -0.11; p=.040) for 0123 scoring. The FINE data measured on the 33-point scale was published in an online rapid response after a reader requested it. We therefore knew that the data existed, and requested clarifying details from the authors to be able to use the estimates in our meta-analysis. In our unadjusted analysis the results were similar for the scale scored bi-modally and the scale scored from 0 to 3, i.e. a statistically significant difference in favour of rehabilitation at 20 weeks and a trend that does not reach statistical significance in favour of pragmatic rehabilitation at 70 weeks. The decision to use the 0123 scoring did does not affect the conclusion of the review.

Regards,

Lillebeth Larun

——————

In her response, above, Larun discusses the FINE trial and quotes an effect size for post-hoc outcome data (fatigue at follow-up) from the FINE trial that is included in the review. Her quoted figures accurately reflect the data quoted by the FINE authors in their BMJ rapid-response comment [3] but, confusingly, these are slightly different from the data in the Cochrane review. In her response, Larun states that the FINE trial effect size for fatigue at 70 weeks using Likert data is -2.55 (-4.99 to -0.11; p=.040), whereas the Cochrane Review states that it is -2.12 [-4.49, 0.25].
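
The three sets of figures mentioned so far can be compared with a short Python sketch. It assumes that each interval is a symmetric 95% confidence interval based on a normal approximation, which is not stated in the correspondence, and back-calculates an approximate two-sided p-value from the estimate and interval. The labels are descriptive names introduced here for illustration.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval (assumed level)

def se_from_ci(lower, upper):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z_95)

def two_sided_p(estimate, lower, upper):
    """Approximate two-sided p-value under a normal approximation."""
    z = estimate / se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))

def ci_excludes_zero(lower, upper):
    """True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0

# (estimate, lower, upper) as quoted in the correspondence above.
figures = {
    "bimodal, Larun response": (-1.00, -2.10, 0.11),
    "Likert, Larun response": (-2.55, -4.99, -0.11),
    "Likert, Cochrane review": (-2.12, -4.49, 0.25),
}

results = {
    label: (two_sided_p(est, lo, hi), ci_excludes_zero(lo, hi))
    for label, (est, lo, hi) in figures.items()
}

for label, (p, significant) in results.items():
    print(f"{label}: p ~ {p:.3f}, CI excludes zero: {significant}")
```

Under these assumptions, the back-calculated p-values for the two figures in Larun's response land close to the p-values she quotes, and only the Likert figure from her response has an interval that excludes zero. The Likert figure printed in the review does not, which is why the discrepancy between the two Likert figures matters.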

This inconsistency makes this discussion confusing. Unfortunately there is no authoritative source for the data because it had not been formally published when the Cochrane review was published.

It seems that, in her response, Larun has quoted the BMJ rapid response data by Wearden et al.[3], rather than her own review’s data. Referring to her review’s data, Larun says that in “our unadjusted analysis the results were similar for the scale scored bi-modally and the scale scored from 0 to 3, i.e. a statistically significant difference in favour of rehabilitation at 20 weeks and a trend that does not reach statistical significance in favour of pragmatic rehabilitation at 70 weeks”.

It is not clear exactly why there are now two different Likert effect sizes, for fatigue at 70 weeks, but we can be sure that the use of this data undermines the review’s claim that “for this updated review, we have not collected unpublished data for our outcomes…”

This confusion, perhaps, demonstrates one of the pitfalls of using unpublished data. The difference between the data published in the review and the data quoted by Larun in her response (which are both supposedly the same unpublished data from the FINE trial) raises the question of exactly what data has been analysed in the review, and what exactly is the source. If it is unpublished data, and seemingly variable in nature, how are readers expected to scrutinise or trust the Cochrane analysis?

With respect to the FINE trial outcomes (fatigue at 70 week follow-up), Larun has provided the mean differences (effect size) for the (pre-specified) bimodal data and for (post-hoc) Likert data. These two different scoring methods (bimodel and Likert), are used for identical patient Chalder fatigue questionnaires, and provide different effect sizes, so switching the fatigue scoring methods may possibly have had an impact on the review’s primary outcomes for fatigue.

Larun hasn’t provided the effect estimates for fatigue at end-of-treatment, but these would also demonstrate variance between bimodal and Likert scoring, so switching the outcomes might have had a significant impact on the primary outcome of the Cochrane review at end-of-treatment, as well as at follow-up.

Note that the effect estimates outlined in this correspondence, for the FINE trial, are mean differences (this is the data taken from the FINE trial), rather than standardised mean differences (which are sometimes used in the meta-analyses in the Cochrane review); it is important not to get confused between the two different statistical analyses.

Larun said: “The decision to use the 0123 [i.e. Likert] scoring did does [sic] not affect the conclusion of the review.”

But it is not possible for a reader to verify that because Larun has not provided any evidence to demonstrate that switching outcomes has had no effect on the conclusion of the review, i.e. there is no sensitivity analysis, despite the review switching outcomes and using unpublished post-hoc data instead of published pre-specified data. This change in methodology means that the review does not conform to its own protocol and stated methodology. This seems like a significant issue.

Are we supposed to accept the word of the author, rather than review the evidence for ourselves? This is a Cochrane review – renowned for rigour and impartiality.

Note that Larun has acknowledged that I am correct with respect to the FINE trial data used in the review (i.e. that the data was unpublished and not part of the formally published FINE trial study, but was simply posted informally in a BMJ rapid response). Larun confirms that: “…the 33-point scale was published in an online rapid response after a reader requested it. We therefore knew that the data existed, and requested clarifying details from the authors…” But then Larun confusingly (for me) says we must “agree to disagree”.

Larun has not amended her literature to resolve the situation; Larun has not changed her unplanned analysis back to her planned analyses (i.e. to use published pre-specified data as per the review protocol, rather than unpublished post-hoc data); nor has she amended the text of the review so that it clearly and prominently indicates that the primary outcomes were switched. Neither has a sensitivity analysis been published using the FINE trial’s published pre-specified data.

Note the difference in the effect estimates at 70 weeks for bimodal scoring [-1.00 (CI -2.10 to +0.11; p =.076)] vs Likert scoring [-2.55 (-4.99 to -0.11; p=.040)] (as per Larun’s response and the BMJ rapid response where the data was initially presented to the public) or -2.12 [-4.49, 0.25] (also Likert scoring) as per the Cochrane analysis.

Confusingly, there are two different effect sizes for the same (Likert) data; one shows a significant treatment effect and the other shows a non-significant treatment effect. This seems like a rather chaotic situation for a Cochrane review. The data is neither consistent nor transparent. The unplanned Cochrane analysis uses data which has not been published and cannot be scrutinised.

Furthermore, we now have three sets of data for the same outcomes. Because an unplanned analysis was used in the review, it is nearly impossible to work out what is what.

In her response, above, Larun says that both fatigue outcomes (i.e. bimodal & Likert scoring systems) at 70 weeks are non-significant. This is true of the data published in the Cochrane review but, confusingly, this isn’t true if we consider the data that Larun has provided in her response, above. The bimodal and Likert data (fatigue at 70 weeks) presented in the review both have a non-significant effect, however, the Likert data quoted in Larun’s correspondence (which reflects the data in the FINE trial authors’ BMJ rapid response) shows a significant outcome. This may reflect the use of adjusted vs unadjusted data, but it isn’t clear.

Using post-hoc data may allow bias to creep into the review; for example, the Cochrane reviewers might have seen the post hoc data for the FINE trial, because it was posted in an open-access BMJ rapid response [3] prior to the Cochrane review publication date. I am not accusing the authors of conscious bias but Cochrane guidelines are put in place to avoid doubt and to maintain rigour and transparency. Hypothetically, a biased author may have seen that a post-hoc Likert analysis allowed for better outcomes to be reported for the FINE trial. The Cochrane guidelines are established in order to avoid such potential pitfalls and bias, and to avoid the confusion that is inherent in this review.

Note that the review still incorrectly says that all the data is previously published data – even though Larun admits in the letter that it isn’t. (i.e. the data are not formally published in a peer-reviewed journal; I assume that the review wasn’t referring to data that might be informally published in blogs or magazines etc, because the review pretends to analyse formally published data only.)

The authors have practically dismissed my concerns and have not amended anything in the review, despite admitting in the response that they’ve used post-hoc data.

The fact that this is all highly confusing, even after I have studied it in detail, demonstrates that these issues need to be straightened out and fixed.

It surely shouldn’t be the case, in a Cochrane review, that we (for the same outcomes) have three sets of results being bandied about, and the data used in a post hoc analysis seems to vary over time, and change from a non-significant treatment effect to a significant treatment effect, depending on where it is quoted. Because it is unpublished, independent scrutiny is made more difficult.

For your information, the BMJ rapid response (Wearden et al.) includes the following data: “Effect estimates [95% confidence intervals] for 20 week comparisons are: PR versus GPTAU -3.84 [-6.17, -1.52], SE 1.18, P=0.001; SL versus GPTAU +0.30 [-1.73, +2.33], SE 1.03, P=0.772. Effect estimates [95% confidence intervals] for 70 week comparisons are: PR versus GPTAU -2.55 [-4.99,-0.11], SE 1.24, P=0.040; SL versus GPTAU +0.36 [-1.90, 2.63], SE 1.15, P=0.752.”
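
[Editorial illustration, not part of the original correspondence: a 95% confidence interval and two-sided p-value can be roughly reconstructed from an estimate and its standard error. The sketch below assumes a normal (Wald) approximation, which may differ slightly from whatever method the FINE authors used, and the function names are hypothetical. It uses only the figures quoted in this letter.]

```python
from statistics import NormalDist


def wald_summary(estimate, se, level=0.95):
    """Approximate CI and two-sided p-value under a normal approximation (an assumption)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = estimate - z_crit * se
    upper = estimate + z_crit * se
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return lower, upper, p


def ci_excludes_zero(lower, upper):
    """An interval that excludes zero corresponds to a statistically significant effect."""
    return lower > 0 or upper < 0


# 70-week Likert estimate quoted from the BMJ rapid response: -2.55, SE 1.24
lower, upper, p = wald_summary(-2.55, 1.24)
print(round(lower, 2), round(upper, 2), round(p, 3))
print("Rapid response interval excludes zero:", ci_excludes_zero(lower, upper))

# Interval printed in the Cochrane review for the same outcome: [-4.49, 0.25]
print("Review interval excludes zero:", ci_excludes_zero(-4.49, 0.25))
```

Under this approximation the interval quoted in the rapid response stays below zero, while the interval printed in the review spans zero, which is the discrepancy the letter describes.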

My second submission was in relation to the following…

I believe that properly applying the official Cochrane guidelines would require the review to categorise the PACE trial (White 2011) data as ‘unplanned’ rather than ‘pre-specified’, and would require the risk of bias in relation to ‘selective reporting’ to be categorised accordingly. The Cochrane review currently categorises the risk of ‘selective reporting’ bias for the PACE trial as “low”, whereas the official Cochrane guidelines indicate (unambiguously) that the risk of bias for the PACE data should be “high”. I believe that my argument is fairly robust and water-tight.

This is the response to my second letter…

———————–

Larun said:

Dear Robert Courtney

Thank you for your detailed comments on the Cochrane review ‘Exercise Therapy for Chronic Fatigue Syndrome’. We have the greatest respect for your right to comment on and disagree with our work. We take our work as researchers extremely seriously and publish reports that have been subject to rigorous internal and external peer review. In the spirit of openness, transparency and mutual respect we must politely agree to disagree.

Cochrane reviews aim to report the review process in a transparent way, for example, are reasons for the risk of bias stated. We do not agree that Risk of Bias for the Pace trial (White 2011) should be changed, but have presented it in a way so it is possible to see our reasoning. We find that we have been quite careful in stating the effect estimates and the certainty of the documentation. We note that you read this differently.

Regards,

Lillebeth

————————-

I do not understand what is meant by: “We do not agree that Risk of Bias for the Pace trial (White 2011) should be changed, but have presented it in a way so it is possible to see our reasoning.” …

The review does not discuss the issue of the PACE data being unplanned and I, for one, do not understand the reasoning for not correcting the category for the risk of selective reporting bias. The response to my submission fails to engage with the substantive and serious issues that I raised.

To date, nearly all the issues raised in my letters have been entirely dismissed by Larun. I find this surprising, especially considering that some of the points that I have made were factual (i.e. not particularly open to interpretation) and difficult to dispute. Indeed, Larun’s response even accepts the factual point that I made, in relation to the FINE data, but then confusingly dismisses my request for the issue to be remedied.

There is more detail in the four PDF submissions which are attached to this email, and which have now been published in the latest version of the Cochrane review. I will stop this email now so as not to overwhelm you, and so I don’t repeat myself.

Again, I apologise for the complexity. My four submissions, attached to this email as PDF files, form the basis of my complaint so I ask you to consider them to be the central basis of my complaint. I hope that they will be sufficiently clear.

I trust that you will wish to investigate these issues, with a view to upholding the high standards expected from a Cochrane review.

I look forward to hearing from you in due course. Please feel free to email me at any time with any questions, or if you believe it would be helpful to discuss any of the issues raised.

Regards,

Robert Courtney.

My ‘comments’ (submitted to the Cochrane review authors):

Please note that the four attached PDF documents form the basis of this complaint.

For your convenience, I have included a weblink to a downloadable online copy of each document, and I have attached copies to this email as PDF files, and the comments have now been published in the latest updated version of the review.

The dates refer to the date the comments were submitted to Cochrane.

  1. Query re use of post-hoc unpublished outcome data: Scoring system for the Chalder fatigue scale, Wearden 2010.

Robert Courtney

16th April 2016

https://sites.google.com/site/mecfsnotes/submissions-to-the-cochrane-review-of-exercise-therapy-for-chronic-fatigue-syndrome/fine-trial-unpublished-data

  2. Assessment of Selective Reporting Bias in White 2011.

Robert Courtney

1st May 2016

https://sites.google.com/site/mecfsnotes/submissions-to-the-cochrane-review-of-exercise-therapy-for-chronic-fatigue-syndrome/pace-trial-selective-reporting-bias

  3. A query regarding the way outcomes for physical function and overall health have been described in the abstract, conclusion and discussions of the review.

Robert Courtney

12th May 2016

https://sites.google.com/site/mecfsnotes/submissions-to-the-cochrane-review-of-exercise-therapy-for-chronic-fatigue-syndrome/misreporting-of-outcomes-for-physical-function

  4. Concerns regarding the use of unplanned primary outcomes in the Cochrane review.

Robert Courtney

3rd June 2016

https://sites.google.com/site/mecfsnotes/submissions-to-the-cochrane-review-of-exercise-therapy-for-chronic-fatigue-syndrome/primary-outcome-switching

References:

  1. Quote from Cochrane reference CD011040:

“Acknowledgements[…]The author team held three meetings in 2011, 2012 and 2013 which were funded as follows: […]2013 via Peter D White’s academic fund (Professor of Psychological Medicine, Centre for Psychiatry, Wolfson Institute of Preventive Medicine, Barts and The London School of Medicine and Dentistry, Queen Mary University of London).”

  2. Larun L, Odgaard-Jensen J, Brurberg KG, Chalder T, Dybwad M, Moss-Morris RE, Sharpe M, Wallman K, Wearden A, White PD, Glasziou PP. Exercise therapy for chronic fatigue syndrome (individual patient data) (Protocol). Cochrane Database of Systematic Reviews 2014, Issue 4. Art. No.: CD011040.

http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD011040/abstract

http://www.cochrane.org/CD011040/DEPRESSN_exercise-therapy-for-chronic-fatigue-syndrome-individual-patient-data

 

  3. Wearden AJ, Dowrick C, Chew-Graham C, et al. Fatigue scale. BMJ Rapid Response. 2010.

http://www.bmj.com/rapid-response/2011/11/02/fatigue-scale-0 (accessed Feb 21, 2016).

End.

Cochrane complaints procedure:

http://www.cochranelibrary.com/help/the-cochrane-library-complaints-procedure.html

The lost last year of one of the key two people in getting the Cochrane review of exercise withdrawn

Did the struggle to get the Cochrane review withdrawn kill Robert Courtney? Or did the denial of his basic human rights by the medical system?

An incomplete story that urgently needs to be told. We need to get some conversations going.

LONDON, Oct 17 (Reuters) – A respected science journal is to withdraw a much-cited review of evidence on an illness known as chronic fatigue syndrome (CFS) amid fierce criticism and pressure from activists and patients.

Robert Courtney from https://www.meaction.net/2018/03/19/a-tribute-to-robert-courtney/

Citizen scientists and patient advocates Tom Kindlon and Robert Courtney played a decisive role in getting the Cochrane review withdrawn.

In the next few days, I will provide the cover letter email sent by Robert Courtney to Senior Cochrane Editor David Tovey that accompanied his last decisive contribution. Robert is now deceased.

I will also provide links to Tom Kindlon’s contributions that are just as important.

Readers will be able to see from what David Tuller calls their cogent, persuasive and unassailable submissions that the designation of these two as citizen scientists is well-deserved.

Background

Since 2015, I have kept in touch with an advisory group of about a dozen patients with myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). I send emails to myself with this group blind copied. The rationale was that any one of them could respond to me and not have the response revealed to anyone else. A number of patients requested that kind of confidentiality, given the divisions within the patient community.

Robert Courtney was a valued, active member of that group, but then he mysteriously disappeared in January 2017. Patients have their own reasons for entering and withdrawing from social engagement. Sometimes they announce taking leave, sometimes not. I’ve learned to respect absences without challenge, but I sometimes ask around. In the case of Robert, I could learn nothing from the community except that he was not well.

Then in February 2018, Robert reemerged with the email message below. I had assumed his recovery would continue and he would participate in telling his story. Obviously there were a lot more details to tell, but he died by suicide a few weeks later.

Long, unbroken periods of being housebound and often bedridden are one of the curses of having severe ME/CFS. Able-bodied persons need to understand the reluctance of patients to invite them into their homes, even able-bodied persons who believe that they have forged strong bonds with patients on social media.

I nonetheless occasionally make such offers to meet as I travel through Europe. I’m typically told things like “sorry, I only leave my house for medical appointments and a twice a year holiday with my family.”

We have to learn not to be offended.

Consequently, few people who were touched by Robert Courtney and his efforts have ever met him. Most know little about him beyond his strong presence in social media.

From MEpedia, a crowd-sourced encyclopedia of ME and CFS science and history:

Robert Courtney (d. March 7, 2018) was a patient advocate for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and an outspoken critic of the PACE trial and the biopsychosocial model of chronic fatigue syndrome. He authored numerous published letters in medical journals regarding the PACE trial and, also, filed freedom of information requests in an attempt to get the authors of the PACE trial to release the full trial data to the public for scrutiny.

The day after I received the email below, Robert Courtney sent his final comments off to Senior Cochrane Editor David Tovey.

The email describes the horrible conditions of his last year and his mistreatment and the denial of basic human rights by the medical system. I think airing his story as a wake-up call can become another of his contributions to the struggle for the dignity and rights of the patient community.

An excerpt from the email, which is reproduced in full below:

It seems that this type of mistreatment is all too typical for ME patients. Since I’ve been out of hospital, many patients have told me that they have similar nutritional difficulties, and that they are too scared to seek medical assistance, and that quite a lot of them have been threatened with detention or indeed have been detained under the mental health act. It is a much worse situation than I ever realised. -Robert “Bob” Courtney

We can never know whether Bob’s determined effort to get the review withdrawn led to his medical collapse. The speculation is not just a mindless invoking of “stress kills.” One of the cardinal, defining symptoms of myalgic encephalomyelitis is post-exertional malaise.

We usually think of the “exertion” as being physical, but patients with a severe form of the illness learn to anticipate that sustained emotional arousal can, within 48 hours or so, put them in their beds for weeks. That applies to positive emotion, like a birthday party, and certainly to negative emotion. Aside from the stress, frustration, and uncertainty of trying to get bad science out of the literature, Bob and other members of the patient community had to contend with enormous vilification and gaslighting, which still continues today.

After the anorexia diagnosis, they rediagnosed my ME symptoms as being part of a somatoform disorder, and placed me on an eating disorders unit. -Robert “Bob” Courtney

On Sat, Feb 17, 2018 at 2:44 PM, Bob <brightonbobbob@yahoo.co.uk> wrote:

Hi James,

I don’t know if you’ll remember me. I am an ME patient who was in regular contact with you in 2016. Unfortunately I had a health crisis in early 2017 and I was hospitalised for most of the year. I had developed severe food intolerances and associated difficulties with eating and nutrition. When I admitted myself to hospital they quickly decided there was nothing medically wrong with me and then diagnosed me with anorexia (to my shock and bewilderment), and subsequently detained me under the mental health act. I’m not anorexic. The level of ignorance, mistreatment, neglect, abuse, and miscommunication was staggering. After the anorexia diagnosis, they rediagnosed my ME symptoms as being part of a somatoform disorder, and placed me on an eating disorders unit. Then they force-fed me. It is a very long and troubling story and I’ll spare you the details. I’d quite like a journalist to write up my story but that will have to wait while I address my ongoing health issues.

Unfortunately, it seems that this type of mistreatment is all too typical for ME patients. Since I’ve been out of hospital, many patients have told me that they have similar nutritional difficulties, and that they are too scared to seek medical assistance, and that quite a lot of them have been threatened with detention or indeed have been detained under the mental health act. It is a much worse situation than I ever realised. It is only by sharing my story that people have approached me and been able to tell me what had happened to them. It is such an embarrassing situation both to have eating difficulties and to be detained. The detention is humiliating and the eating difficulties are also excruciatingly embarrassing. Having difficulties with food makes one feel subhuman. So I have discovered that many patients keep their stories to themselves.

You might remember that in 2016 I submitted four lengthy comments to Cochrane with respect to the exercise therapy for chronic fatigue syndrome review. Before hospital, I had also written an incomplete draft complaint to follow up my submitted comments, but my health crisis interrupted the process and so I haven’t yet sent it.

I am out of hospital now and have finished editing the complaint and I am about to send it. I am going to blind copy you into the complaint so this email is just to let you know to expect it. I’ll probably send it within the next 24 hours. The complaint isn’t as concise or carefully formatted as it could be because I’m still unwell and I have limited capacity.

Anyway this is just to give you some advance notice. I hope this email finds you in good spirits. I haven’t been keeping up to date with the news and activities, while I’ve been away, but I see there’s been a lot of activity. Thanks so much for your ongoing efforts.

Best wishes,

Bob (Robert Courtney)

My replies

James Coyne <jcoynester@gmail.com>

Feb 17, 2018, 2:50 PM

to Bob

Bob, I remember you well as one of the heroes of the patient movement, and a particularly exemplary hero because you so captured my idea of the citizen scientist gathering the data and the sense of methodology to understand the illness and battle the PACE people. I’m so excited to see your reemergence. I look forward to what you send.

Warmest regards

Jim

James Coyne <jcoynester@gmail.com>

Feb 17, 2018, 3:11 PM

to Bob

Your first goal must be to look after yourself and keep yourself as active and well as possible. You know, the patient conception of pacing. You are an important model and resource for lots of people.

But when you are ready, I look forward to your telling your story and how it fits with others.

Warmest of regards

Jim

Lessons we need to learn from a Lancet Psychiatry study of the association between exercise and mental health

The closer we look at a heavily promoted study of exercise and mental health, the more its flaws become obvious. There is little support for the most basic claims being made – despite the authors marshaling enormous attention to the study.

Apparently, the editor of Lancet Psychiatry and reviewers did not give the study a close look before it was accepted.

The article was used to raise funds for a startup company in which one of the authors was heavily invested. This was disclosed, but doesn’t let the authors off the hook for promoting a seriously flawed study. Nor should the editor of Lancet Psychiatry or reviewers escape criticism, nor the large number of people on Twitter who thoughtlessly retweeted and “liked” a series of tweets from the last author of the study.

This blog post is intended to raise consciousness about bad science appearing in prestigious journals and to allow citizen scientists to evaluate their own critical thinking skills in terms of their ability to detect misleading and exaggerated claims.

1. Sometimes a disclosure of extensive conflicts of interest alerts us not to pay serious attention to a study. Instead, we should question why the study got published in a prestigious peer-reviewed journal when it had such an obvious risk of bias.

2. We need citizen scientists with critical thinking skills to identify such promotional efforts and alert others in their social network that hype and hokum are being delivered.

3. We need to stand up to authors who use scientific papers for commercial purposes, especially when they troll critics.

Read on and you will see what a skeptical look at the paper and its promotion revealed.

  • The study failed to capitalize on the potential of multiple years of data for developing and evaluating statistical models. Bigger is not necessarily better. Combining multiple years of data was wasteful and served only the purpose of providing the authors bragging rights and the impressive, but meaningless p-values that come from overly large samples.
  • The study relied on an unvalidated and inadequate measure of mental health that confounded recurring stressful environmental conditions in the work or home with mental health problems, even where validated measures of mental health would reveal no effects.
  • The study used an odd measure of history of mental health problems that undoubtedly exaggerated past history.
  • The study confused physical activity with (planned) exercise. Authors amplified their confusion by relying on an exceedingly odd strategy for getting an estimate of how much participants exercised: Estimates of time spent in a single activity were used in analyses of total time spent exercising. All other physical activity was ignored.
  • The study made a passing acknowledgment of the problems interpreting simple associations as causal, but then went on to selectively sample the existing literature to make the case that interventions to increase exercise improve mental health.
  • Taken together, a skeptical assessment of this article provides another demonstration that disclosure of substantial financial conflicts of interest should alert readers to a high likelihood of a hyped, inaccurately reported study.
  • The article was paywalled so that anyone interested in evaluating the authors’ claims for themselves had to write to the author or have access to the article through a university library site. I am waiting for the authors to reply to my requests for the supplementary tables that are needed to make full sense of their claims. In the meantime, I’ll just complain about authors with significant conflicts of interest heavily promoting studies that they hide behind paywalls.
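
The point in the first bullet about impressive but meaningless p-values can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: the standardized difference of 0.02 is a hypothetical value chosen for illustration, not a figure from the paper, the function name is my own, and a simple two-sample z-test with equal group sizes and unit variance is assumed.

```python
import math


def two_sample_p_value(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d.

    Assumes a z-test with equal group sizes and unit variance in both groups.
    """
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


# The same trivial difference (hypothetical d = 0.02) at increasing sample sizes
for n in (100, 10_000, 100_000, 1_000_000):
    print(n, two_sample_p_value(0.02, n))
```

The difference between groups never changes in this sketch. Only the sample size does, and the p-value shrinks toward zero anyway, which is why a tiny p-value from a very large sample says little about whether an effect matters.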

I welcome you to examine the author’s thread of tweets. Request the actual article from the author if you want to evaluate my claims independently. This can be great material for a master’s or honors class on critical appraisal, whether in psychology or journalism.

[Image: title of article]

Let me know if you think that I’ve been too hard on this study.

A thread of tweets from the last author celebrated the success of a well-orchestrated publicity campaign for a new article concerning exercise and mental health in Lancet Psychiatry.

The thread started:

Our new @TheLancetPsych paper was the biggest ever study of exercise and mental health. it caused quite a stir! here’s my guided tour of the paper, highlighting some of our excitements and apprehensions along the way [thread] 1/n

And ended with a pitch for the author’s do-good startup company:

Where do we go from here? Over @spring_health – our mental health startup in New York City – we’re using these findings to develop personalized exercise plans. We want to help every individual feel better—faster, and understand exactly what each patient needs the most.

I wasn’t long into the thread before my skepticism was stimulated. The fourth tweet in the thread had a figure that didn’t get any comments about how bizarre it was.

The tweet

It looks like those differences mattered. for example, people who exercised for about 45 minutes seemed to have better mental health than people who exercised for less than 30, or more than 60 minutes. — a sweet spot for mental health, perhaps?

[Image: graphs from the paper]

Apparently the author does not comment on an anomaly either. Housework appears to be better for mental health than a summary score of all exercise and looks equal to or better than cycling or jogging. But how did housework slip into the category “exercise”?

I began wondering what the authors meant by “exercise” or if they’d given the definition serious consideration when constructing their key variable from the survey data.

But then that tweet was followed by another one that generated more confusion with a graph that seemingly contradicted the figures in the last one:

the type of exercise people did seems important too! People doing team sports or cycling had much better mental health than other sports. But even just walking or doing household chores was better than nothing!

Then a self-congratulatory tweet for a promotional job well done.

for sure — these findings are exciting, and it has been overwhelming to see the whole world talking openly and optimistically about mental health, and how we can help people feel better. It isn’t all plain sailing though…

The author’s next tweet revealed a serious limitation to the measure of mental health used in the study in a screenshot.

[Image: screenshot of tweet with the mental health variable]

The author acknowledged the potential problem, sort of:

(1b- this might not be the end of the world. In general, most peple have a reasonable understanding of their feelings, and in depressed or anxious patients self-report evaluations are highly correlated with clinician-rated evaluations. But we could be more precise in the future)

“Not the end of the world?” Since when does the author of a paper in the Lancet family of journals so casually brush off a serious methodological issue? A lot of us who have examined the validity of mental health measures would be skeptical of this dismissal of a potentially fatal limitation.

No validation is provided for this measure. On the face of it, respondents could endorse it on the basis of facing recurring stressful situations that had no consequences for their mental health. This reflects the ambiguity of the term “stress” for both laypersons and scientists. “Stress” could variously refer to an environmental situation, a subjective experience of stress, or an adaptational outcome. Waitstaff could consider Thursday, when the chef is off, a recurrent, weekly stress. Persons with diagnosable persistent depressive disorder would presumably endorse more days than not as being a mental health challenge. But they would mean something entirely different.

The author acknowledged that the association between exercise and mental health might be bidirectional in terms of causality:

[Image: tweet on the many reasons to believe the relationship goes both ways]

But then made a strong claim for increased exercise leading to better mental health.

[Image: tweet claiming that exercise increases mental health]

[Actually, as we will see, the evidence from randomized trials of exercise to improve mental health is modest, and entirely disappears when one limits oneself to the high-quality studies.]

The author then runs off the rails with the claim that the benefits of exercise exceed the benefits of having greater than poverty-level income.

[Image: tweet, “why are we so excited”]

I could not resist responding.

Stop comparing adjusted correlations obtained under different circumstances as if they demonstrated what would be obtained in RCT. Don’t claim exercising would have more effect than poor people getting more money.

But I didn’t get a reply from the author.

Eventually, the author got around to plugging his startup company.

I didn’t get it. Just how did this heavily promoted study advance the science for such “personalized recommendations”?

Important things I learned from others’ tweets about the study

I follow @BrendonStubbs on Twitter and you should too. Brendon often makes wise critical observations of studies that most everyone else is uncritically praising. But he also identifies some studies that I otherwise would miss and says very positive things about them.

He started his own thread of tweets about the study on a positive note, but then he identified a couple of critical issues.

First, he took issue with the author’s weak claim to have identified a tipping point, below which exercise is beneficial and above which exercise could prove detrimental to mental health.

4/some interpretations are troublesome. Most confusing, are the assumptions that higher PA is associated/worsens your MH. Would we say based on cross sect data that those taking most medication/using CBT most were making their MH worse?

A postdoctoral fellow, @joefirth7, seconded that concern:

I agree @BrendonStubbs: idea of high PA worsening mental health limited to observation studies. Except in rare cases of athletes overtraining, there’s no exp evidence of ‘tipping point’ effect. Cross-sect assocs of poor MH <–> higher PA likely due to multiple other factors…

Ouch! But then Brendon follows up with the concern that the measure of physical activity has not been adequately validated, noting that such self-report measures often prove to be invalid.

5/ one consideration not well discussed, is self report measures of PA are hopeless (particularly in ppl w mental illness). Even those designed for population level monitoring of PA https://journals.humankinetics.com/doi/abs/10.1123/jpah.6.s1.s5 … it is also not clear if this self report PA measure has been validated?

As we will soon see, the measure used in this study is quite flawed in its conceptualization and in its odd methodology of requiring participants to estimate the time spent exercising for only one activity, chosen from 75 options.

Next, Brendon points to a particular problem with using self-reported physical activity in persons with a mental disorder and gives an apt reference:

6/ related to this, self report measures of PA shown to massively overestimate PA in people with mental ill health/illness – so findings of greater PA linked with mental illness likely bi-product of over-reporting of PA in people with mental illness e.g Validity and Value of Self-reported Physical Activity and Accelerometry in People With Schizophrenia: A Population-Scale Study of the UK Biobank [ https://academic.oup.com/schizophreniabulletin/advance-article/doi/10.1093/schbul/sbx149/4563831 ]

7/ An additional point he makes: anyone working in field of PA will immediately realise there is confusion & misinterpretation about the concepts of exercise & PA in the paper, which is distracting. People have been trying to prevent this happening over 30 years

Again, Brendon provides a spot-on citation clarifying the distinction between physical activity and exercise: Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research

The mysterious pseudonymous Zad Chow @dailyzad called attention to a blog post they had just uploaded. Let’s take a look at some of the key points.

Lessons from a blog post: Exercise, Mental Health, and Big Data

Zad Chow is quite balanced in dispensing praise and criticism of the Lancet Psychiatry paper. They noted the ambiguity of any causality in a cross-sectional correlation and then investigated the literature on their own.

So what does that evidence say? Meta-analyses of randomized trials seem to find that exercise has large and positive treatment effects on mental health outcomes such as depression.

Study name            # of randomized trials    Effect (SMD), 95% CI
Schuch et al. 2016    25                        1.11 (0.79 to 1.43)
Gordon et al. 2018    33                        0.66 (0.48 to 0.83)
Krogh et al. 2017     35                        −0.66 (−0.86 to −0.46)

But, when you only pool high-quality studies, the effects become tiny.

“Restricting this analysis to the four trials that seemed less affected of bias, the effect vanished into −0.11 SMD (−0.41 to 0.18; p=0.45; GRADE: low quality).” – Krogh et al. 2017
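For readers who want to check such figures themselves, here is a minimal sketch of how a reported 95% confidence interval can be turned back into an approximate standard error and p-value. It assumes the interval is a symmetric, normal-theory interval on the SMD scale, which may not match exactly how the meta-analysts computed it. The function name `p_from_ci` is my own, for illustration only.

```python
from statistics import NormalDist

def p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate SE, z and two-sided p implied by a symmetric normal-theory CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)              # half-width divided by z_crit
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p

# Figures quoted above from Krogh et al. 2017
se_all, z_all, p_all = p_from_ci(-0.66, -0.86, -0.46)   # all 35 trials
se_hq, z_hq, p_hq = p_from_ci(-0.11, -0.41, 0.18)       # four lower-bias trials

print(f"all trials:      SE={se_all:.3f}, z={z_all:.2f}, p={p_all:.2g}")
print(f"low-bias trials: SE={se_hq:.3f}, z={z_hq:.2f}, p={p_hq:.2f}")
```

Under these assumptions, the interval for the four lower-bias trials implies a p-value in the neighborhood of the reported p=0.45, while the pooled estimate from all trials sits many standard errors from zero.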

Hmm, would you have guessed this from the Lancet Psychiatry author’s thread of tweets?

Zad Chow showed the hype and untrustworthiness of the press coverage in prestigious media with a sampling of screenshots.

[Screenshots of press coverage collected by Zad Chow]

I personally checked and don’t see that Zad Chow’s selection of press coverage was skewed. Coverage in the media all seemed to be saying the same thing. I found the distortion to continue with uncritical parroting (a.k.a. churnalism) of the claims of the Lancet Psychiatry authors in the Wall Street Journal.

The WSJ repeated a number of the author’s claims that I’ve already thrown into question and added a curiosity:

In a secondary analysis, the researchers found that yoga and tai chi—grouped into a category called recreational sports in the original analysis—had a 22.9% reduction in poor mental-health days. (Recreational sports included everything from yoga to golf to horseback riding.)

And NHS England got it totally wrong:

[Screenshot: NHS England getting it wrong]

So, we learned that the broad category “recreational sports” covers yoga and tai chi, as well as golf and horseback riding. This raises serious questions about the lumping and splitting of categories of physical activity in the analyses that are being reported.

I needed to access the article in order to uncover some important things 

I’m grateful for the clues that I got from Twitter, and especially from Zad Chow, which I used in examining the article itself.

I got hung up on the title proclaiming that the study involved 1·2 million individuals. When I checked the article, I saw that the authors used three waves of publicly available data to get that number. Having that many participants gave them no real advantage except for bragging rights and the likelihood that modest associations could be expressed in spectacular p-values, like p<2·2 × 10⁻¹⁶. I don’t understand why the authors didn’t conduct analyses with one wave and cross-validate the results in another.
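To make the point about sample size concrete, here is a rough sketch of what happens to the p-value of a trivially small correlation as N grows. The correlation of .01 is a hypothetical value chosen for illustration, not a figure from the article, and the p-value uses the Fisher z normal approximation rather than an exact test.

```python
import math

def p_value_for_r(r, n):
    """Two-sided p for a Pearson r with n pairs, via the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc keeps precision for extremely small tail probabilities
    return math.erfc(abs(z) / math.sqrt(2))

r_tiny = 0.01                      # hypothetical, trivially small association
variance_explained = r_tiny ** 2   # share of variance accounted for

p_small_sample = p_value_for_r(r_tiny, 25)
p_huge_sample = p_value_for_r(r_tiny, 1_200_000)

print(f"variance explained: {variance_explained:.4%}")
print(f"N = 25:        p = {p_small_sample:.2f}")
print(f"N = 1,200,000: p = {p_huge_sample:.1e}")
```

The same association that is nowhere near significance in a small sample clears the p<2·2 × 10⁻¹⁶ bar with 1·2 million participants, while still explaining a hundredth of one percent of the variance. The p-value tells us about the sample size, not about the importance of the association.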

The obligatory Research in Context box made it sound like a systematic search of the literature had been undertaken. Maybe, but the authors were highly selective in what they chose to comment upon, as seen in how their account is contradicted by Zad Chow’s brief review. The authors would have us believe that the existing literature is quite limited and inconclusive, supporting the need for a study like theirs.

research in context

Caveat lector: a strong confirmation bias likely lies ahead in this article.

Questions accumulated quickly as to the appropriateness of the items available from a national survey undoubtedly constructed for other purposes. Certainly these items would not have been selected if the original investigators had been interested in the research question at the center of this article.

Participants self-reported a previous diagnosis of depression or depressive episode on the basis of the following question: “Has a doctor, nurse, or other health professional EVER told you that you have a depressive disorder, including depression, major depression, dysthymia, or minor depression?”

Our own work has cast serious doubt on how well reports of a history of depression, given in response to a brief question embedded in a larger survey, correspond with the results of a structured interview in which respondents’ answers can be probed. We found that answers to such questions were more related to current distress than to actual past diagnoses and treatment of depression. However, the survey question used in the Lancet Psychiatry study introduced further ambiguity and invalidity by adding “or minor depression.” I am not sure under what circumstances a health care professional would disclose a diagnosis of “minor depression” to a patient, but I doubt it would be in a context in which the professional felt treatment was needed.

Despite the skepticism that I was developing about the usefulness of the survey data, I was unprepared for the assessment of “exercise.”

“Other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?” Participants who answered yes to this question were then asked: “What type of physical activity or exercise did you spend the most time doing during the past month?” A total of 75 types of exercise were represented in the sample, which were grouped manually into eight exercise categories to balance a diverse representation of exercises with the need for meaningful cell sizes (appendix).

Participants indicated the number of times per week or month that they did this exercise and the number of minutes or hours that they usually spend exercising in this way each time.

I had already been tipped off by the discussion on Twitter that there would be a thorough confusion of planned exercise and mere physical activity. But now that was compounded. Why was physical activity during employment excluded? What if participants were engaged in a number of different physical activities, like both jogging and bicycling? If so, the survey obtained data for only one of these activities, with the other excluded, and the choice of which one the participant identified as the one to be counted could have been quite arbitrary.

Anyone who has ever constructed surveys would be alert to the problems posed by participants’ awareness that saying “yes” to exercising would require contemplating 75 different options and arbitrarily choosing one of them for a further question about how much time the participant engaged in this activity. Unless participants were strongly motivated, there was an incentive to simply say no, they didn’t exercise.

I suppose I could go on, but it was my judgment that any validity of what the authors were claiming had been ruled out. As someone once said on an NIH grant review panel, there are no vital signs left, let’s move on to the next item.

But let’s refocus just a bit on the overall intention of these authors. They want to use a large data set to make statements about the association between physical activity and a measure of mental health. They have used matching and statistical controls to equate participants. But that strategy effectively eliminates consideration of crucial contextual variables. Persons’ preferences and opportunities to exercise are powerfully shaped by their personal and social circumstances, including finances and competing demands on their time. Said differently, people are embedded in contexts that a lot of statistical maneuvering has sought to eliminate.

To suggest a small number of the many complexities: how much physical activity participants get in their employment may be an important determinant of choices for additional activity, as may how much time is left outside of work. If work typically involves a lot of physical exertion, people may simply be left too tired for additional planned physical activity, a.k.a. exercise, and their physical health may require it less. Environments differ greatly in terms of the opportunities and the safety of engaging in various kinds of physical activities. Team sports require other people being available. Etc., etc.

What I learned from the editorial accompanying the Lancet Psychiatry article

The brief editorial accompanying the article aroused my curiosity as to whether someone assigned to reading and commenting on this article would catch things that apparently the editor and reviewer missed.

Editorial commentators are chosen to praise, not to bury articles. There are strong social pressures to say nice things. However, this editorial leaked a number of serious concerns.

First:

In presenting mental health as a workable, unified concept, there is a presupposition that it is possible and appropriate to combine all the various mental disorders as a single entity in pursuing this research. It is difficult to see the justification for this approach when these conditions differ greatly in their underlying causes, clinical presentation, and treatment. Dementia, substance misuse, and personality disorder, for example, are considered as distinct entities for research and clinical purposes; capturing them for study under the combined banner of mental health might not add a great deal to our understanding.

The problem here of categorisation is somewhat compounded by the repeated uncomfortable interchangeability between mental health and depression, as if these concepts were functionally equivalent, or as if other mental disorders were somewhat peripheral.

Then:

A final caution pertains to how studies approach a definition of exercise. In the current study, we see the inclusion of activities such as childcare, housework, lawn-mowing, carpentry, fishing, and yoga as forms of exercise. In other studies, these activities would be excluded for not fulfilling the definition of exercise as offered by the American College of Sports Medicine: “planned, structured and repetitive bodily movement done to improve or maintain one or more components of physical fitness.” 11 The study by Chekroud and colleagues, in its all-encompassing approach, might more accurately be considered a study in physical activity rather than exercise.

The authors were listening for a theme song with which they could promote their startup company in a very noisy data set. They thought they had a hit. I think they had noise.

The authors’ extraordinary disclosure of interests (see below this blog post) should have precluded publication of this seriously flawed piece of work, either simply because of the high likelihood of bias or because it should have prompted the editor and reviewers to look more carefully at the serious flaws hiding in plain sight.

Postscript: Send in the trolls.

On Twitter, Adam Chekroud announced he felt no need to respond to critics. Instead, he retweeted and “liked” trolling comments directed at critics from the Twitter accounts of his brother, his mother, and even the official Twitter account of a local fried chicken joint, @chickenlodge, which offered free food for retweets and suggested including Adam Chekroud’s Twitter handle if you wanted to be noticed.

chicken lodge

Really, Adam, if you can’t stand the heat, don’t go near where they are frying chicken.

The Declaration of Interests from the article.

declaration of interest 1

declaration of interest 2

 

Headspace mindfulness training app no better than a fake mindfulness procedure for improving critical thinking, open-mindedness, and well-being.

The Headspace app increased users’ critical thinking and open-mindedness. So did practicing a sham mindfulness procedure. Participants simply sat with their eyes closed, but thought they were meditating.

Results call into question claims about Headspace coming from other studies that did not have such a credible, active control group comparison.

Results also call into question the widespread use of standardized self-report measures of mindfulness to establish whether someone is in the state of mindfulness. These measures don’t distinguish between the practice of standard versus fake mindfulness.

Results can be seen as further evidence that practicing mindfulness depends on nonspecific factors (AKA placebo), rather than any active, distinctive ingredient.

Hopefully this study will prompt better studies evaluating the Headspace App, as well as evaluations of mindfulness training more generally, using credible active treatments, rather than no treatment or waitlist controls.

Maybe it is time for a moratorium on trials of mindfulness without such an active control or at least a tempering of claims based on poorly controlled  trials.

This study points to the need for development of more psychometrically sophisticated measures of mindfulness that are not so vulnerable to experimenter expectations and demand characteristics.

Until the accumulation of better studies with better measures, claims about the effects of practicing mindfulness ought to be recognized as based on relatively weak evidence.

The study

Noone, C. & Hogan, M. Randomised active-controlled trial of effects of online mindfulness intervention on executive control, critical thinking and key thinking dispositions. BMC Psychology, 2018.

Trial registration

The study was initially registered in the AEA Social Science Registry before the recruitment was initiated (RCT ID: AEARCTR-0000756; 14/11/2015) and retrospectively registered in the ISRCTN registry (RCT ID: ISRCTN16588423) in line with requirements for publishing the study protocol.

Excerpts from the Abstract

The aim of this study was…investigating the effects of an online mindfulness intervention on executive function, critical thinking skills, and associated thinking dispositions.

Method

Participants recruited from a university were randomly allocated, following screening, to either a mindfulness meditation group or a sham meditation group. Both the researchers and the participants were blind to group allocation. The intervention content for both groups was delivered through the Headspace online application, an application which provides guided meditations to users.

And

Primary outcome measures assessed mindfulness, executive functioning, critical thinking, actively open-minded thinking, and need for cognition. Secondary outcome measures assessed wellbeing, positive and negative affect, and real-world outcomes.

Results

Significant increases in mindfulness dispositions and critical thinking scores were observed in both the mindfulness meditation and sham meditation groups. However, no significant effects of group allocation were observed for either primary or secondary measures. Furthermore, mediation analyses testing the indirect effect of group allocation through executive functioning performance did not reveal a significant result and moderation analyses showed that the effect of the intervention did not depend on baseline levels of the key thinking dispositions, actively open-minded thinking, and need for cognition.

The authors conclude

While further research is warranted, claims regarding the benefits of mindfulness practice for critical thinking should be tempered in the meantime.

[Image: Headspace being used on an iPhone]

The active control condition

The sham treatment control condition was embarrassingly straightforward and simple. But as we will see, participants found it credible.

This condition presented the participants with guided breathing exercises. Each session began by inviting the participants to sit with their eyes closed. These exercises were referred to as meditation but participants were not given guidance on how to control their awareness of their body or breath. This approach was designed to control for the effects of expectations surrounding mindfulness and physiological relaxation to ensure that the effect size could be attributed to mindfulness practice specifically. This content was also delivered by Andy Puddicombe and was developed based on previous work by Zeidan and colleagues [55, 57, 58].

What can we conclude about the standard self-report measures of the state of mindfulness?

The study used the Five Facet Mindfulness Questionnaire, which is widely used to assess whether people are in a state of mindfulness. It has been cited almost 4000 times.

Participants assigned to the mindfulness condition had significant changes for all five facets from baseline to follow up: observing, non-reactivity, non-judgment, acting with awareness, and describing. In the absence of a comparison with change in the sham mindfulness group, these pre-post results would seem to suggest that the measure was sensitive to whether participants had practiced mindfulness. However, there were no differences between the changes observed for the participants assigned to mindfulness and those observed for participants who were simply asked to sit with their eyes closed.

I asked Chris Noone about the questionnaires his group used to assess mindfulness:

The participants genuinely thought they were meditating in the sham condition so I think both non-specific and demand characteristics were roughly equivalent across both groups. I’m also skeptical regarding the ability of the Five-Facet Mindfulness Questionnaire (or any mindfulness questionnaire for that matter) to capture anything other than “perceived mindfulness”. The items used in these questionnaires feature similar content to the scripts used by the people delivering the mindfulness (and sham) guided meditations. The improvement in critical thinking across both groups is just a mix of learning across a semester and habituation to the task (as the same problems were posed at both measurements).

What I like about this trial

The trial provides a critical test of a key claim for mindfulness:

Mindfulness should facilitate critical thinking in higher-education, based on early Buddhist conceptualizations of mindfulness as clarity of thought.

The trial was registered before recruitment and departures from protocol were noted.

Sample size was determined by power analysis.

The study had a closely matched, active control condition, a sham mindfulness treatment.

The credibility and equivalence of this sham condition versus the active treatment under study was repeatedly assessed.

“Manipulation checks were carried out to assess intervention acceptability, technology acceptance and meditation quality 2 weeks after baseline and 4 weeks after baseline.”

The study tested some a priori hypotheses about mediation and moderation.

Analyses were intention to treat.

How the study conflicts with past studies

Previous studies claimed to show positive effects of mindfulness on aspects of executive functioning [25 and 26].

How the contradiction of past studies by these results is resolved

 “There are many studies using guided meditations similar to those in our mindfulness meditation condition, delivered through smartphone applications [49, 50, 52, 90, 91], websites [92, 93, 94, 95, 96, 97] and CDs [98, 99], which show effects on measures of outcomes reliably associated with increases in mindfulness such as depression, anxiety, stress, wellbeing and compassion. There are two things to note about these studies – they tend not to include a measure of dispositional mindfulness (e.g. only 4% of all mindfulness intervention studies reviewed in a recent meta-analysis include such measures at baseline and follow-up; [54]) and they usually employ a weak form of control group such as a no-treatment control or waitlist control [54]. Therefore, even when change in mindfulness is assessed in mindfulness meditation intervention studies, it is usually overestimated and this must be borne in mind when comparing the results of this study with those of previous studies. This combined with generally only moderate correlations with behavioural outcomes [54] suggests that when mindfulness interventions are effective, dispositional measures do not fully capture what has changed.”

The broader take away messages

“Our results show that, for most outcomes, there were significant changes from baseline to follow-up but none which can be specifically attributed to the practice of mindfulness.”

This creative use of a sham mindfulness control condition is a breakthrough that should be widely followed. First, it allowed a fair test of whether mindfulness is any better than another active, credible treatment. Second, because the active treatment was a sham, results provide a challenge to the notion that apparent effects of mindfulness on critical thinking are anything more than a placebo effect.

The Headspace App is enormously popular and successful, based on claims about what benefits its use will provide. Some of these claims may need to be tempered, not only in terms of critical thinking, but effects on well-being.

The Headspace App platform lends itself to such critical evaluations with respect to a sham treatment with a degree of standardization that is not readily possible with face-to-face mindfulness training. This opportunity should be exploited further with other active control groups constructed on the basis of specific hypotheses.

There is far too much research on the practice of mindfulness being done that does not advance understanding of what works or how it works. We need a lot fewer studies, and more with adequate control/comparison groups.

Perhaps we should have a moratorium on evaluations of mindfulness without adequate control groups.

Perhaps articles that make enthusiastic claims to their audiences for the benefits of mindfulness should routinely note whether these claims are based on adequately controlled studies. Most are not.

Creating TED talks from peer-reviewed growth mindset research papers with colored brain pictures

The TED talk fallacy – When you confuse what presenters say about a peer-reviewed article – the breathtaking, ‘breakthrough’ strength of findings demanded for a TED talk – with what a transparent, straightforward analysis and reporting of relevant findings would reveal. 

A reminder that consumers, policymakers, and other stakeholders should not rely on TED talks for their views of what constitutes solid “science” or “best evidence,” even when presenters are established scientists.

The authors of this modest, but overhyped paper do not give TED talks. But this article became the basis for a number of TED and TED-related talks by a psychologist who integrated a story of its findings with stories about her own publications. She has a booking agent for expensive talks and a line of self-help products. This raises the question: Should such information routinely be reported as a conflict of interest in publications?

We will contrast the message of the paper under discussion in this post, along with the TED talk, with a new pair of comprehensive meta-analyses. The meta-analyses show that the association between growth mindset and academic achievement is weak and that interventions to improve mindset are ineffectual.

The study

 Moser JS, Schroder HS, Heeter C, Moran TP, Lee YH. Mind your errors: Evidence for a neural mechanism linking growth mind-set to adaptive posterror adjustments. Psychological Science. 2011 Dec;22(12):1484-9.

 Key issues with the study.

The abstract is uninformative as a guide to what was done and what was found in this study. It ends with a rousing promotion of growth mind set as a way of understanding and improving academic achievement.

A study with N = 25 is grossly underpowered for most purposes and should not be used to generate estimates of associations.

Key details of methods and results needed for independent evaluation are not available in the article.

The colored brain graphics in the article were labeled “for illustrative purposes only.”

Where would you find such images of the brain not tied to the data in a credible neuroscience journal? Articles in real neuroscience journals are increasingly retracted because of the discovery of suspected pasted-in or altered brain graphics.

The discussion has a strong confirmation bias, ignoring relevant literature and overselling the use of event-related potentials for monitoring and evaluating the determinants of academic achievement.

The press release issued by the Association for Psychological Science.

How Your Brain Reacts To Mistakes Depends On Your Mindset

Concludes:

The research shows that these people are different on a fundamental level, Moser says. “This might help us understand why exactly the two types of individuals show different behaviors after mistakes.” People who think they can learn from their mistakes have brains that are tuned to pay more attention to mistakes, he says. This research could help in training people to believe that they can work harder and learn more, by showing how their brain is reacting to mistakes.

The abstract.

The abstract does not report basic details of methods and results, except what is consistent with the authors’ intended message. The crucial final sentence is quote worthy and headed for clickbait. When we look at what was done and what was found in this study, this conclusion is grossly overstated.

How well people bounce back from mistakes depends on their beliefs about learning and intelligence. For individuals with a growth mind-set, who believe intelligence develops through effort, mistakes are seen as opportunities to learn and improve. For individuals with a fixed mind-set, who believe intelligence is a stable characteristic, mistakes indicate lack of ability. We examined performance-monitoring event-related potentials (ERPs) to probe the neural mechanisms underlying these different reactions to mistakes. Findings revealed that a growth mind-set was associated with enhancement of the error positivity component (Pe), which reflects awareness of and allocation of attention to mistakes. More growth-minded individuals also showed superior accuracy after mistakes compared with individuals endorsing a more fixed mind-set. It is critical to note that Pe amplitude mediated the relationship between mind-set and posterror accuracy. These results suggest that neural mechanisms indexing on-line awareness of and attention to mistakes are intimately involved in growth-minded individuals’ ability to rebound from mistakes.

The introduction.

The introduction opens with:

Decades of research by Dweck and her colleagues indicate that academic and occupational success depend not only on cognitive ability, but also on beliefs about learning and intelligence (e.g., Dweck, 2006).

This sentence echoes the Amazon blurb for the pop psychology book  that is being cited:

After decades of research, world-renowned Stanford University psychologist Carol S. Dweck, Ph.D., discovered a simple but groundbreaking idea: the power of mindset. In this brilliant book, she shows how success in school, work, sports, the arts, and almost every area of human endeavor can be dramatically influenced by how we think about our talents and abilities.

Nowhere in the introduction are there balancing references to studies investigating Carol Dweck’s theory independently, from outside her group, nor is there any citation of inconsistent findings. This is a selective, strongly confirmation-driven review of the relevant literature. (Contrast this view with an independent assessment from a recent comprehensive meta-analysis at the end of this post.)

The method.

Twenty-five native-English-speaking undergraduates (20 female, 5 male; mean age = 20.25 years) participated for course credit.

There is no discussion of why a sample of only 25 participants was chosen or any mention of a power analysis.

If we stick to simple bivariate correlations with the full sample of N = 25:

r = .40, p < .05 (p = 0.0475)

r = .51, p < .01 (p = 0.0092)

N = 25 does not allow a small to moderate-sized relationship to be reliably detected as statistically significant where one exists.

Any statistically significant finding will of necessity be large: r > .40 for p < .05 and r > .51 for p < .01.
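These thresholds can be checked with a short sketch using only the Python standard library. The exact p-values come from the usual t test for a correlation (df = N − 2), with the t distribution integrated numerically. The power figure uses the Fisher z approximation, so treat it as a rough estimate. All function names are my own, and the "true" correlation of .20 is a hypothetical value chosen to stand in for a small to moderate effect.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(r, n, steps=2000):
    """Two-sided p for a Pearson r with n pairs (t test with df = n - 2)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps                       # Simpson's rule on [0, t]; steps is even
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

def critical_r(n, alpha, tol=1e-6):
    """Smallest |r| that reaches two-sided significance at alpha, by bisection."""
    lo, hi = 0.0, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if two_sided_p(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

def approx_power(true_r, n, alpha=0.05):
    """Approximate power to detect true_r (Fisher z normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = math.atanh(true_r) * math.sqrt(n - 3)
    return (1 - NormalDist().cdf(z_crit - shift)) + NormalDist().cdf(-z_crit - shift)

n = 25
print("critical r at .05 and .01:", round(critical_r(n, 0.05), 3), round(critical_r(n, 0.01), 3))
print("p for r = .40 and r = .51:", round(two_sided_p(0.40, n), 4), round(two_sided_p(0.51, n), 4))
print("approx. power for a true r of .20:", round(approx_power(0.20, n), 2))
```

The critical values should land close to the .40 and .51 given above, and the approximate power to detect a true correlation of .20 comes out far below the conventional 80% target.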

As has been noted elsewhere:

In systematic studies of psychological and biomedical effect sizes (e.g., Meyer et al., 2001)  one rarely encounters correlations greater than .4.

How growth mindset scores were calculated is crucially important, but the information that is presented about the measure is inadequate. There is no reference to an established scale with psychometric data and cross validation. Rather:

Following the flanker task [a noise letter version of the Eriksen flanker task (Eriksen & Eriksen, 1974)], participants completed a TOI scale that asked respondents to rate the extent to which they agreed with four fixed-mind-set statements on a 6-point Likert-type scale (1 = strongly disagree, 6 = strongly agree). These statements (e.g., “You have a certain amount of intelligence and you really cannot do much to change it”) were drawn from previous studies measuring TOI (e.g., Hong, Chiu, Dweck, Lin, & Wan, 1999). TOI items were reverse-scored so that higher scores indicated more endorsement of a growth mind-set, and lower scores indicated more of a fixed mind-set.
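The scoring described in that passage can be sketched as follows. This is only an illustration of reverse-scoring on a 1 to 6 scale. The article does not say whether the four items were averaged or summed, so the averaging here is my assumption, and the function name and the example ratings are hypothetical.

```python
def growth_mindset_score(fixed_item_ratings, scale_min=1, scale_max=6):
    """Reverse-score agreement with fixed-mind-set statements and average them.

    Averaging (rather than summing) is an assumption; the article does not say.
    """
    for rating in fixed_item_ratings:
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside the {scale_min}-{scale_max} scale")
    # On a 1-6 scale, reverse-scoring maps 1 -> 6, 2 -> 5, ..., 6 -> 1
    reversed_ratings = [scale_min + scale_max - rating for rating in fixed_item_ratings]
    return sum(reversed_ratings) / len(reversed_ratings)

# Hypothetical respondents, four items each
strongly_fixed = growth_mindset_score([6, 5, 6, 5])    # agrees with fixed statements
strongly_growth = growth_mindset_score([1, 2, 1, 1])   # disagrees with fixed statements
print(strongly_fixed, strongly_growth)
```

Note that every item is worded in the fixed-mind-set direction, so a "growth" score here is nothing more than disagreement with four similar statements.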

Details in the referenced Hong et al (1999) study are difficult to follow, but the paper lays out the following requirement:

Those participants who believe that intelligence is fixed (entity theorists) should consistently endorse responses at the lower (agree) end of the scale (yielding a mean score of 3.0 or lower), whereas participants who believe that intelligence is malleable (incremental theorists) should consistently endorse responses at the upper (disagree) end of the scale (yielding a mean score of 4.0 or above).

If this distribution occurred naturally, it would be an extraordinary set of questions. In the Hong et al (1999) study, this distribution was achieved by throwing away data in the middle of the distribution that didn’t fit the investigators’ preconceived notion.

Excluding the middle third of a distribution of scores when N is only 25 compounds the errors that the practice introduces even in a larger sample. With the already small number of scores reduced to N = 17, the influence of a single outlying participant would increase, and any generalization to the larger population would be even more problematic. We cannot readily evaluate whether scores in the present sample were neatly and naturally bimodal. We are not provided the basic data, not even the means and standard deviations in text or table. However, as we will see, one graphic representation leaves some doubts.

Overview of data analyses.

Repeated measures analyses of variance (ANOVAs) were first conducted on behavioral and ERP measures without regard to individual differences in TOIs in order to establish baseline experimental effects. ANOVAs conducted on behavioral measures and the ERN included one 2-level factor: accuracy (error vs. correct response). The Pe [error positivity component ]was analyzed using a 2 (accuracy: error vs. correct response) × 2 (time window: 150–350 ms vs. 350–550 ms) ANOVA. Subsequently, TOI scores were entered into ANOVAs as covariates to assess the main and interactive effects of mind-set on behavioral and ERP measures. When significant effects of TOI score were detected, we conducted follow-up correlational analyses to aid in the interpretation of results.

Thus, the effects of the growth mindset (TOI) were examined in multiple post hoc analyses: TOI was added as a covariate after the baseline ANOVAs, and correlational analyses followed wherever a main or interaction effect of TOI reached significance.

Highlights of the results.

Only a few of the numerous analyses produced significant results for TOI. Given the sample size and multiple tests without correction, we probably should not attach substantive interpretations to them.
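To see why uncorrected multiple tests matter, here is a minimal sketch of how the chance of at least one false positive grows with the number of tests. It assumes the tests are independent, which is a simplification, and the values of k are hypothetical because the article does not report how many tests of TOI were run.

```python
def familywise_error(k, alpha=0.05):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k, alpha=0.05):
    """Per-test alpha that keeps the family-wise rate at or below alpha."""
    return alpha / k


# Hypothetical numbers of tests, for illustration only.
for k in (1, 5, 10, 20):
    print(f"k = {k:2d}: family-wise error = {familywise_error(k):.2f}, "
          f"Bonferroni per-test alpha = {bonferroni_threshold(k):.4f}")
```

With ten independent tests at the .05 level, the chance of at least one spurious "significant" result is about 40%. A result reported as p < .05 would not survive a Bonferroni correction for even a handful of tests.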

Behavioral data.

Overall accuracy was not correlated with TOI (r = .06, p > .79).

[Speed on error vs. correct trials] When TOI was entered into the ANOVA as a covariate, there were no significant effects (Fs < 1.78, ps > .19, ηp2s < .08) [where ‘ps’ and ‘no significant effects’ refer to either main or interaction effects].

[Posterror adjustments] When TOI was entered into the ANOVA as a covariate, there were no significant effects (Fs < 1.15, ps > .29, ηp2s < .05).

When entered into the ANOVA as a covariate, however, TOI scores interacted with postresponse accuracy, F(1, 23) = 5.22, p < .05, ηp2= .19. Correlational analysis showed that as TOI scores increased, indicating a growth mind-set, so did accuracy on trials immediately following errors relative to accuracy on trials immediately following correct responses (i.e., posterror accuracy – postcorrect-response accuracy; r = .43, p < .05).

ERPs (event-related potentials).

As expected, the ANOVA confirmed greater ERP negativity on error trials (M = –3.43 μV, SD = 4.76 μV) relative to correct trials (M = –0.23 μV, SD = 4.20 μV), F(1, 24) = 24.05, p < .001, ηp2 = .50, in the 0- to 100-ms postresponse time window. This result is consistent with the presence of an ERN. There were no significant effects involving TOI (Fs < 1.24, ps > .27, ηp2s < .06).

When entered as a covariate, TOI showed a significant interaction with accuracy, F(1, 23) = 8.64, p < .01, ηp2 = .27. Correlational analysis demonstrated that as TOI scores increased so did positivity on error trials relative to correct trials averaged across both time windows (i.e., error activity – correct-response activity; r = .52, p < .01).
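A quick arithmetic check suggests that the "follow-up" correlations are not independent confirmation of the ANOVA interactions. For an effect with one degree of freedom, partial eta squared equals F / (F + error df), and when a continuous covariate interacts with a two-level within-subject factor, its square root should equal the absolute correlation between the covariate and the difference score. The sketch below uses only the F values quoted above. It assumes the correlations were computed on the same difference scores and the same 25 participants, which the article implies but which I cannot verify.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported in the passages quoted above.
reported = {
    "TOI x postresponse accuracy": (5.22, 1, 23),
    "TOI x Pe accuracy": (8.64, 1, 23),
}

for label, (f_value, df1, df2) in reported.items():
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: partial eta squared = {eta:.3f}, implied |r| = {math.sqrt(eta):.3f}")
```

The implied correlations agree, to within rounding, with the reported r = .43 and r = .52. If the assumption holds, each correlation is the same test as its ANOVA interaction, expressed in different units.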

Mediation analysis.

As Figure 2 illustrates, controlling for Pe amplitude significantly attenuated the relationship between TOI scores and posterror accuracy. The 95% confidence intervals derived from the bootstrapping test did not include zero (.01–.04), and thus indicated significant mediation.

So the a priori condition for claiming significant mediation was met because a confidence interval barely excluded zero (.01–.04), with no correction for the many tests of TOI in the study. But what are we doing exploring mediation with N = 25?

Distribution of TOI [growth mindset] scores.

Let’s look at the distribution of TOI scores, which can be read off the x-axis of Figure 1.

graph with outlier

Any dichotomization of these continuous scores would be arbitrary. Close scores falling on different sides of the median would be treated as different, while widely diverging scores on the same side of the median would be treated as the same. Any association between TOI and ERPs (event-related potentials) could be due to one or a few participants whose brains differ from the rest, or to intraindividual variability of ERPs across occasions. These are not the kind of data from which generalizable estimates of effects can be obtained.

The depiction of brains with fixed versus growth mind sets.

The one picture of brains in the main body of this article supposedly contrasts fixed versus growth mindsets. The differences appear dramatic, in sharply contrasting colors. But in the article itself, no such dichotomization is discussed. Nor should it be. Furthermore, the simulation is based on an isolation of one of the few significant effects of TOI. Readers are cautioned that the picture is “for illustrative purposes only.”

fixed vs growth mind set

The discussion.

As in the introduction, the literature is cited selectively, with a strong confirmation bias. There is no reference to weak or null findings or to any controversy concerning growth mindset that might have accumulated over a decade of research. There is no acknowledgment of the folly of making substantive interpretations of significant findings from such a small, underpowered study. Results of the mediation analysis are confidently presented, with no indication of doubt about whether it should even have been conducted, and no acknowledgment that, even under the best of circumstances, such mediational analyses remain correlational and provide only weak evidence of causal mechanisms. Event-related evoked potentials are proposed as biomarkers and as surrogate outcomes in implementations of growth mindset interventions. A lot of misunderstanding and neurononsense is crammed into a few sentences. There is no mention of any limitations to the study.

The APS Observer press release revisited.

Why was this article recognized with a special press release by the APS? The press release is tied much more to the authors’ claims about their study than to their actual methods and results. It provides an opportunity to publicize the study with further exaggeration of what it accomplished.

This is an unfortunate message to authors about what they need to do to be promoted by APS. Your intended message can override your actual results if you strategically emphasize the message and downplay any discrepancy with the results. Don’t mention any limitations of your study.

The TED talks.

A number of TED and TED-related talks incorporate a discussion of the study, with its picture of fixed versus growth mindset brains. There is remarkable overlap among these talks. I have chosen the TEDxNorrkoping talk “The power of believing that you can improve” because it had a handy transcript available.

 same screenshot in TED talk1

On the left, you see the fixed-mindset students. There’s hardly any activity. They run from the error. They don’t engage with it. But on the right, you have the students with the growth mindset, the idea that abilities can be developed. They engage deeply. Their brain is on fire with yet. They engage deeply. They process the error. They learn from it and they correct it.

“On fire”? The presenter exploits the arbitrary red color chosen for the for-illustrative-purposes-only picture.

The brain graphic is reduced to a cartoon in a comic-book-level account of action heroes who engage their errors deeply, learn from them, and correct their next response, while ordinary mortals run away like cowards.

The presenter soon introduces another cartoon for her comic book depiction of the effects of growth mindset on the brain. But first, here is an overview of how this talk fits the predictable structure of a TED talk.

The TED talk begins with a personal testimony concerning “a critical event early in my career, a real turning point.” It is recognizable to TED talk devotees as an epiphany (an “epiphimony,” if you like) through which the speaker shares a personal journey of insight and realisation, with its triumphs and tribulations. In telling the story, the presenter introduces an epic struggle between the children of the darkness (the “now” of a fixed mindset) and the children of the light (the “yet” or “not yet” of a growth mindset).

The impression is much more of a televangelist than of an academic presenting an accurate summary of her research to a lay audience. Sure, the live audience and the millions of viewers of this and related talks were not seeking a colloquium or even a Cafe Scientifique. The audience came to be entertained with a good story. But how much license can be taken with the background science? After all, the information being discussed is relevant to their personal decisions as parents, and as citizens and communities making important choices about how to improve academic performance. The issue becomes more serious when the presenter gets to claims of dramatic transformations of impoverished students in economically deprived school settings.

The presenter cites one of her studies for an account of what students “gripped with the tyranny of now” did in difficult learning experiences:

So what do they do next? I’ll tell you what they do next. In one study, they told us they would probably cheat the next time instead of studying more if they failed a test. In another study, after a failure, they looked for someone who did worse than they did so they could feel really good about themselves.

cheat vs study

We are encouraged to think ‘Students with a fixed mind set cheat instead of studying more. How horrible!’ But I looked up the study:

Blackwell LS, Trzesniewski KH, Dweck CS. Implicit Theories of Intelligence Predict Achievement Across an Adolescent Transition: A Longitudinal Study and an Intervention. Child Development. 2007 Jan 1;78(1):246-63.

I searched for “cheat” and found one mention:

Students rated how likely they would be to engage in positive, effort-based strategies (e.g., “I would work harder in this class from now on,” “I would spend more time studying for tests”) or negative, effort-avoidant strategies (e.g., “I would try not to take this subject ever again,” “I would spend less time on this subject from now on,” “I would try to cheat on the next test”). Positive and negative items were combined to form a mean Positive Strategies score.

All subsequent reporting of results was in terms of this composite Positive Strategies score. So I was unable to evaluate how often students endorsed “I would try to cheat…”

Three minutes into the talk, the speaker introduces an element of moral panic about a threat to Western civilization as we know it:

How are we raising our children? Are we raising them for now instead of yet? Are we raising kids who are obsessed with getting As? Are we raising kids who don’t know how to dream big dreams? Their biggest goal is getting the next A, or the next test score? And are they carrying this need for constant validation with them into their future lives? Maybe, because employers are coming to me and saying, “We have already raised a generation of young workers who can’t get through the day without an award.”

Less than a minute later, the presenter gets ready to roll out her solution.

So what can we do? How can we build that bridge to yet?

Praising performance in terms of fixed characteristics like IQ or ability is ridiculed. However, great promises are made for praising process, regardless of outcome.

Here are some things we can do. First of all, we can praise wisely, not praising intelligence or talent. That has failed. Don’t do that anymore. But praising the process that kids engage in, their effort, their strategies, their focus, their perseverance, their improvement. This process praise creates kids who are hardy and resilient.

“Yet” or “not yet” becomes a magical incantation. The presenter builds on her comic book science of the effects of growth mindset by introducing a cartoon of a synapse (mislabeled as a neuron), linked to her own research only by some wild speculation.

build stronger connections synapse

Just the words “yet” or “not yet,” we’re finding, give kids greater confidence, give them a path into the future that creates greater persistence. And we can actually change students’ mindsets. In one study, we taught them that every time they push out of their comfort zone to learn something new and difficult, the neurons in their brain can form new, stronger connections, and over time, they can get smarter.

I found no relevant measurements of brain activity in Dweck’s studies, but let’s not ruin a good story.

Look what happened: In this study, students who were not taught this growth mindset continued to show declining grades over this difficult school transition, but those who were taught this lesson showed a sharp rebound in their grades. We have shown this now, this kind of improvement, with thousands and thousands of kids, especially struggling students.

Up to this point, we have disappointingly hyped and inaccurate accounts of how to foster academic achievement. But the talk soon turns into a cruel hoax when claims are made about improving the performance of underprivileged children in under-resourced settings.

So let’s talk about equality. In our country, there are groups of students who chronically underperform, for example, children in inner cities, or children on Native American reservations. And they’ve done so poorly for so long that many people think it’s inevitable. But when educators create growth mindset classrooms steeped in yet, equality happens. And here are just a few examples. In one year, a kindergarten class in Harlem, New York scored in the 95th percentile on the national achievement test. Many of those kids could not hold a pencil when they arrived at school. In one year, fourth-grade students in the South Bronx, way behind, became the number one fourth-grade class in the state of New York on the state math test. In a year, to a year and a half, Native American students in a school on a reservation went from the bottom of their district to the top, and that district included affluent sections of Seattle. So the Native kids outdid the Microsoft kids.

This happened because the meaning of effort and difficulty were transformed. Before, effort and difficulty made them feel dumb, made them feel like giving up, but now, effort and difficulty, that’s when their neurons are making new connections, stronger connections. That’s when they’re getting smarter.

“So the Native kids outdid the Microsoft kids.” Some kind of poetic license is being taken here in describing the results of an intervention. The message is that subjective mindset can trump entrenched structural inequalities and accumulated deficits in skills and knowledge, as well as limits on ability. All school staff and parents need to do is wave the magic wand and recite the incantation “Not yet.” How reassuring to those in politics who control resources but don’t want to fund schools adequately. They just need to exhort anyone who wants to improve outcomes to recite the magic.

And what do we say when we don’t witness dramatic improvements? Who is to blame when such failures need to be explained? The cruel irony is that school boards will blame principals, who blame teachers, and parents will blame schools and their children. All will be held to unrealistic expectations.

But it gets worse. The presenter ends with a call to action arguing that not buying into her program would violate the human rights of vulnerable children.

Let’s not waste any more lives, because once we know that abilities are capable of such growth, it becomes a basic human right for children, all children, to live in places that create that growth, to live in places filled with “yet”.

Paradox: Do poor kids with a growth mindset suffer negative consequences?

Maybe so, suggests some recent research concerning the longer term outcomes of disadvantaged African American children.

A newly published study in the peer-reviewed journal Child Development …finds traditionally marginalized youth who grew up believing in the American ideal that hard work and perseverance naturally lead to success show a decline in self-esteem and an increase in risky behaviors during their middle-school years. The research is considered the first evidence linking preteens’ emotional and behavioral outcomes to their belief in meritocracy, the widely held assertion that individual merit is always rewarded.

“If you’re in an advantaged position in society, believing the system is fair and that everyone could just get ahead if they just tried hard enough doesn’t create any conflict for you … [you] can feel good about how [you] made it,” said Erin Godfrey, the study’s lead author and an assistant professor of applied psychology at New York University’s Steinhardt School. But for those marginalized by the system—economically, racially, and ethnically—believing the system is fair puts them in conflict with themselves and can have negative consequences.

We know surprisingly little about the adverse events associated with growth mindset interventions or their negative unintended consequences for children and school systems. Cost/benefit analyses of mindset interventions should compare them with academic interventions known to be effective when given equivalent resources, not with no treatment.

Overall associations of growth mind set with academic achievement are weak and interventions are not effective.

Sisk VF, Burgoyne AP, Sun J, Butler JL, Macnamara BN. To What Extent and Under Which Circumstances Are Growth Mind-Sets Important to Academic Achievement? Two Meta-Analyses. Psychological Science. 2018 Mar 1:0956797617739704.

This newly published article in Psychological Science starts by noting the influence of growth mindset.

These ideas have led to the establishment of nonprofit organizations (e.g., Project for Education Research that Scales [PERTS]), for-profit entities (e.g., Mindset Works, Inc.), schools purchasing mind-set intervention programs (e.g., Brainology), and millions of dollars in funding to individual researchers, nonprofit organizations, and for-profit companies (e.g., Bill and Melinda Gates Foundation, Department of Education, Institute of Educational Sciences).

In our first meta-analysis (k = 273, N = 365,915), we examined the strength of the relationship between mind-set and academic achievement and potential moderating factors. In our second meta-analysis (k = 43, N = 57,155), we examined the effectiveness of mind-set interventions on academic achievement and potential moderating factors. Overall effects were weak for both meta-analyses.

The first meta analysis integrated 273 effect sizes. The overall effect was very weak, by conventional standards, hardly consistent with the TED talks.

The meta-analytic average correlation (i.e., the average of various population effects) between growth mind-set and academic achievement is r̄ = .10, 95% confidence interval (CI) = [.08, .13], p < .001.

The second meta-analysis, of growth mindset interventions, integrated 43 effect sizes, and 37 of the 43 (86%) were not significantly different from zero.
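To put these numbers in perspective, squaring a correlation gives the proportion of variance shared between two measures. The following is a small sketch using only the figures quoted above, with helper names of my own.

```python
def variance_explained(r):
    """Proportion of variance shared between two variables correlated at r."""
    return r * r


def share(count, total):
    """Simple proportion, e.g. nonsignificant effect sizes out of all effect sizes."""
    return count / total


# Meta-analytic mean correlation and its 95% CI bounds, as quoted above.
for label, r in (("lower bound", 0.08), ("mean", 0.10), ("upper bound", 0.13)):
    print(f"{label}: r = {r:.2f}, variance explained = {variance_explained(r):.1%}")

print(f"nonsignificant intervention effects: {share(37, 43):.0%}")
```

A correlation of .10 corresponds to about 1% of the variance in academic achievement, and even the upper bound of the confidence interval stays below 2%.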

The authors conclude:

Some researchers have claimed that mind-set interventions can “lead to large gains in student achievement” and have “striking effects on educational achievement” (Yeager & Walton, 2011, pp. 267 and 268, respectively). Overall, our results do not support these claims. Mind-set interventions on academic achievement were nonsignificant for adolescents, typical students, and students facing situational challenges (transitioning to a new school, experiencing stereotype threat). However, our results support claims that academically high-risk students and economically disadvantaged students may benefit from growth-mind-set interventions (see Paunesku et al., 2015; Raizada & Kishiyama, 2010), although these results should be interpreted with caution because (a) few effect sizes contributed to these results, (b) high-risk students did not differ significantly from non-high-risk students, and (c) relatively small sample sizes contributed to the low-SES group.

Part of the reshaping effort has been to make funding mind-set research a “national education priority” (Rattan et al., 2015, p. 723) because mind-sets have “profound effects” on school achievement (Dweck, 2008, para. 2). Our meta-analyses do not support this claim.

And

From a practical perspective, resources might be better allocated elsewhere than mind-set interventions. Across a range of treatment types, Hattie, Biggs, and Purdie (1996) [https://www.teachertoolkit.co.uk/wp-content/uploads/2014/04/effect-of-learning-skills.pdf ] found that the meta-analytic average effect size for a typical educational intervention on academic performance is 0.57. All meta-analytic effects of mind-set interventions on academic performance were < 0.35, and most were null. The evidence suggests that the “mindset revolution” might not be the best avenue to reshape our education system.

The presenter’s speaker fees.

Presenters of TED talks are not paid, but a successful talk can lead to lucrative speaking engagements. It is informative to Google the speaking fees of the presenters of highly accessed TED talks. In the case of Carol Dweck, I found the booking agency All American Speakers.

carol dweck speaking

fee range

Mindsetonline provides products for sale as well as success stories about people and organizations adopting a growth mindset.

buy the book / buy the software

business and leadership

There is even a 4-item measure of mindset you can complete online. Each item is some paraphrase of ‘you can’t change your intelligence very much,’ stated either straightforwardly or in reverse (‘you can’).

Consumers beware! TED talks are not reliable dissemination of best evidence.

TED talks are to best evidence like historical fiction is to history.

Even TED talks by eminent psychologists often are little more than infomercials for self-help products and lucrative speaking engagements and workshops.

Academics are under increasing pressure to demonstrate that the impact of their work goes beyond citations of publications in prestigious journals. Social impact is being used to balance journal impact factors.

It is also being recognized that outreach involves the need to equip lay audiences to be able to grasp what are initially difficult or confusing concepts.

But color pictures of brains can be used to dumb down consumers and to disarm their intuitive skepticism about behavioral science working magic and miracles. Even PhD psychologists are inclined to be overly impressed when references to neuroscience and color pictures of brains are introduced into the discussion. The vulnerability of lay audiences to neurononsense or neurobollocks is even greater.

False and exaggerated claims about academic interventions harm school systems, teachers, and ultimately, students. In communicating to lay audiences, psychologists need to be sensitive to the possible misunderstandings they are reinforcing. They have an ethical responsibility to do their best to strengthen the critical thinking skills of their audiences, not damage them.

TED talks and declarations of potential conflicts of interest.

Personally, I have found that calling out the pseudoscience behind claims for unproven medicine like acupuncture or homeopathy does not produce much blowback, except from proponents of these treatments. Similarly, campaigning for better disclosure of potential conflicts of interest does not meet much resistance when the focus is on pharmaceutical companies.

However, it’s a whole different matter to call out the pseudoscience behind self-help and the exaggerated or outright false claims about behavioral science being able to work miracles and magic. There seems to be a double standard in psychology by which it is inappropriate to exaggerate the strength of findings when communicating with other professionals, but perfectly okay when communicating with lay audiences.

We need to think about TED talks more like we think about talks by opinion leaders with ties to the pharmaceutical industry. Presenters should start with a standard slide disclosing financial interests that may influence opinions offered about specific products mentioned in the talk. Given the pressure to get findings that will fit into the next TED talk, presenters should routinely disclose in their peer-reviewed articles that they give TED talks or have a booking agent.

 

Can we predict suicide from Twitter language?

Can we predict county-level death by suicide from Twitter data? We tried. Our surprising results added weight to results of our re-analyses of Twitter data attempting to predict death from heart disease. Analyzing Twitter data in bulk does not add to our understanding of geographical variations in health outcomes.

Nick Brown and I (*) recently posted a preprint:

No Evidence That Twitter Language Reliably Predicts Heart Disease: A Reanalysis of Eichstaedt et al. (2015a)

We reanalyze Eichstaedt et al.’s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of U.S. counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with “negative” language being associated with higher rates of death from AHD and “positive” language associated with lower rates…We conclude that there is no evidence that analyzing Twitter data in bulk in this way can add anything useful to our ability to understand geographical variation in AHD mortality rates.

You can find the original article here:

Eichstaedt JC, Schwartz HA, Kern ML, Park G, Labarthe DR, Merchant RM, Jha S, Agrawal M, Dziurzynski LA, Sap M, Weeg C. Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science. 2015 Feb;26(2):159-69.

 

A press release from the Association for Psychological Science heaped lavish praise on the original article. It can be found here.

“Twitter seems to capture a lot of the same information that you get from health and demographic indicators,” co-author Gregory Park said, “but it also adds something extra. So predictions from Twitter can actually be more accurate than using a set of traditional variables.”

 Our overarching conclusion:

… There is a very large amount of noise in the measures of the meaning of Twitter data used by Eichstaedt et al., and these authors’ complex analysis techniques (involving, for example, several steps to deal with high multicollinearity) are merely modeling this noise to produce the illusion of a psychological mechanism that acts at the level of people’s county of residence.

Our look at key assumptions and re-analyses

The choice of atherosclerotic heart disease (AHD) as the health outcome of interest fits with lay understanding of what causes heart attacks, but was unfortunate.

Folk beliefs about negative emotion causing heart attacks had been bolstered by some initial promising findings in small samples suggesting a link between Type A behavior pattern (TABP) and cardiac events and mortality. In our preprint, we discuss how subsequent, better controlled studies did not confirm these results.

Type A behavior pattern cannot readily be distinguished from other negative emotion variables. These negative emotion variables converge in what has been called by Paul Meehl a “crud factor” or by others a “big mess.” Such negative affect variables are non-informative risk markers, not true risk factors. These variables have too many correlates in background, pre-existing variables, including poor physical health, and in concurrent variables that cannot readily be separated in statistical analyses, even with prospective data. See “Negative emotions and health: why do we keep stalking bears when we only find scat” for a further discussion.

While we were finishing up our manuscript, an article came out that analyzed and succinctly summarized this issue:

A substantial part of the distress–IHD [ischaemic heart disease] association is explained by confounding and functional limitations . . . . Emphasis should be on psychological distress as a marker of healthcare need and IHD risk, rather than a causative factor.

AHD is actually a chronic condition, slowly developing over a lifetime. Many of the crucial determinants of whether someone later shows signs and symptoms of AHD occur in childhood or adolescence.

Americans are a highly mobile population, and when they reach middle age with its increase in heart attacks, they may have moved geographically far away from where they lived when their chronic disease developed. The counties in which participants are identified for the purposes of this Twitter study are not the counties in which they developed their condition.

Most of the people who are tweeting in a county are younger than the people likely to be dying from AHD. So, we are assessing one population to predict health events in another.

Some of our other findings that are discussed more fully in our preprint:

Coding of AHD as the cause of death in this study was highly unreliable and subject to major variability across counties.

The process for selecting counties to be included in the study was biased.

The Twitter-based dictionaries used for coding appear not to be a faithful summary of the words that were actually typed by users. There were puzzling omissions.

Arbitrary and presumably post-hoc choices were apparently made in some of the dictionary-based analyses and these choices strengthened the appearance of an association between Twitter language and death from AHD.

There were numerous problems associated with the use of counties as the unit of analysis. Counties vary greatly in size, in the heterogeneity of sociodemographic and socioemotional factors within them, and in the proportion of residents who were actually on Twitter.

The predictive power of the model, including the associated maps, appears to be questionable.

While we were working on the manuscript that became a preprint, another relevant paper came out:

Jensen, E. A. (2017). Putting the methodological brakes on claims to measure national happiness through Twitter: Methodological limitations in social media analytics. PLOS ONE, 12(9), e0180080.

We endorse its conclusion:

When researchers approach a data set, they need to understand and publicly account for not only the limits of the data set, but also the limits of which questions they can ask . . . and what interpretations are appropriate (p. 6).

Using Twitter data to predict death by suicide

OK, I have already spoiled the story by arguing up front that trying to predict health outcomes from big Twitter data is not a good idea.

But a case can be made that if we are going to predict a health outcome from Twitter, suicide is a better candidate than AHD. This was Nick’s idea, but I wanted to emphasize it more than he did.

Although suicide can be the result of long-term mental health problems and other stressors, a person’s psychological state in the months and days leading up to the point at which they take their own life clearly has a substantial degree of relevance to their decision. Hence, we might expect any county-level psychological factors that act directly on the health and welfare of members of the local community to be more closely reflected in the mortality statistics for suicide than those for a chronic disease such as AHD.

We [collective “we” the authors, but actually Nick] also downloaded comparable mortality data for the ICD-10 categories X60–X84, collectively labeled “Intentional self-harm”—in order to test the idea that suicide might be at least as well predicted by Twitter language as AHD—as well as the data for several other causes of death (including all-cause mortality) for comparison purposes.

We therefore examined the relationship of the set of causes of death listed by the CDC as “self-harm” with Twitter language usage, using the procedures reported in the first subsections entitled “Language variables from Twitter” and “Statistical analysis” of Eichstaedt et al.’s (2015a, p. 161) Method section. Because of the limitation of the CDC Wonder database, noted earlier, whereby mortality rates are only available when at least 10 deaths per year are recorded in a given county, data for self-harm were only available for 741 counties; however, these represented 89.9% of the population of Eichstaedt et al.’s set of 1,347 counties.
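A quick aside on how 741 of 1,347 counties can still cover 89.9% of the population: the 10-deaths-per-year cutoff mostly drops small counties. The sketch below is only an illustration. The four toy counties and their numbers are invented for the example and are not CDC data; only the cutoff of 10 and the 741 and 1,347 figures come from the text above.

```python
# Toy illustration only: the county rows below are invented, not CDC Wonder data.
SUPPRESSION_THRESHOLD = 10  # rates are unavailable below 10 deaths per year (from the text)

toy_counties = [
    # (hypothetical name, population, annual self-harm deaths)
    ("A", 2_000_000, 240),
    ("B", 900_000, 110),
    ("C", 60_000, 8),
    ("D", 40_000, 5),
]


def coverage(rows, threshold=SUPPRESSION_THRESHOLD):
    """Return (share of counties kept, share of population kept)."""
    kept = [row for row in rows if row[2] >= threshold]
    county_share = len(kept) / len(rows)
    population_share = sum(row[1] for row in kept) / sum(row[1] for row in rows)
    return county_share, population_share


toy_county_share, toy_population_share = coverage(toy_counties)
print(f"toy data: {toy_county_share:.1%} of counties, "
      f"{toy_population_share:.1%} of population")

# Share of counties retained in the actual analysis, from the figures in the text.
real_county_share = 741 / 1347
print(f"reported: {real_county_share:.1%} of counties retained")
```

In the toy data, half the counties are lost but only a small slice of the population, which is the same pattern as in the reported figures.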

Our findings

[Image: self-harm and Twitter]

In the “Dictionaries” analysis, we found that mortality from self-harm was negatively correlated with all five “negative” language factors, with three of these correlations (for anger, negative-relationship, and negative-emotion words) being statistically significant at the .05 level (see our Table 1). That is, counties whose residents made greater use of negative language on Twitter had lower rates of suicide, or, to borrow Eichstaedt et al.’s (2015a, p. 162) words, use of negative language was “significantly protective” against self-harm; this statistical significance was unchanged when income and education were added as covariates. In a further contrast to AHD mortality, two of the three positive language factors (positive relations and positive emotions) were positively correlated with mortality from self-harm, although these correlations were not statistically significant.

Next, we analyzed the relationship between Twitter language and self-harm outcomes at the “Topics” level. Among the topics most highly correlated with increased risk of self-harm were those associated with spending time surrounded by nature (e.g., grand, creek, hike; r = .214, CI = [.144, .281]), romantic love (e.g., beautiful, love, girlfriend; r = .176, CI = [.105, .245]), and positive evaluation of one’s social situation (e.g., family, friends, wonderful; r = .175, CI = [.104, .244]). There were also topics of discussion that appeared to be strongly “protective” against the risk of self-harm, such as baseball (e.g., game, Yankees, win; r = −.317, CI = [−.381, −.251]), binge drinking (e.g., drunk, sober, hungover; r = −.249, CI = [−.316, −.181]), and watching reality TV (e.g., Jersey, Shore, episode; r = −.200, CI = [−.269, −.130]). All of the correlations between these topics and self-harm outcomes, both positive and negative, were significant at the same Bonferroni-corrected significance level (i.e., .05/2,000 = .000025) used by Eichstaedt et al. (2015a), and remained significant at that level after adjusting for income and education. That is, several topics that were ostensibly associated with “positive,” “eudaimonic” approaches to life predicted higher rates of county-level self-harm mortality, whereas apparently hedonistic topics were associated with lower rates of self-harm mortality, and the magnitude of these associations was at least as great as (and in a few cases, even greater than) those found by Eichstaedt et al. These topics are shown in “word cloud” form (generated at https://www.jasondavies.com/wordcloud/) in our Figure 2 (cf. Eichstaedt et al.’s Figure 1).
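For readers who want to sanity-check numbers like these, here is a minimal sketch. It is not the code used for the preprint. It assumes the intervals are ordinary 95% confidence intervals from the Fisher z-transformation, that the sample is the 741 counties with self-harm data, and that a normal approximation is good enough for the p-values. Under those assumptions the intervals come out within rounding of the reported ones, and every topic clears the Bonferroni threshold.

```python
import math

N_COUNTIES = 741       # assumption: counties with self-harm data, per the text
ALPHA = 0.05 / 2000    # Bonferroni-corrected level quoted in the text
Z_CRIT_95 = 1.959964   # two-sided 95% critical value of the standard normal


def fisher_ci(r, n, z_crit=Z_CRIT_95):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def two_sided_p(r, n):
    """Normal-approximation p-value for H0: rho = 0, using Fisher's z."""
    z_stat = abs(math.atanh(r)) * math.sqrt(n - 3)
    return math.erfc(z_stat / math.sqrt(2))


# Topic correlations as reported in the passage above.
topic_r = {
    "nature": 0.214,
    "romantic love": 0.176,
    "social situation": 0.175,
    "baseball": -0.317,
    "binge drinking": -0.249,
    "reality TV": -0.200,
}

results = {}
for topic, r in topic_r.items():
    low, high = fisher_ci(r, N_COUNTIES)
    p = two_sided_p(r, N_COUNTIES)
    results[topic] = (low, high, p)
    print(f"{topic:17s} r={r:+.3f}  CI=[{low:+.3f}, {high:+.3f}]  "
          f"p={p:.2e}  below threshold: {p < ALPHA}")
```

Small differences in the third decimal place are to be expected, because the correlations fed in here are already rounded.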

[Image: word cloud, time spent with nature]

[Image: word cloud, baseball]

If anyone insists on giving this finding a substantive interpretation…

This discovery would seem to pose a problem for Eichstaedt et al.’s (2015a, p. 166) claim to have shown the existence of “community-level psychological factors that are important for the cardiovascular health of communities.” Apparently the “positive” versions of these factors, while acting via some unspecified mechanism to make the community as a whole less susceptible to developing hardening of the arteries, also simultaneously manage to make the same people more likely to commit suicide, and vice versa. More research into the possible risks of increased levels of self-harm would seem to be needed before any program to enhance these “community-level psychological factors” is undertaken.

But actually, no, we don’t want to do that.

Of course, there is no suggestion that the study of the language used on Twitter by the inhabitants of any particular county has any real predictive value for the local suicide rate; we believe that such associations are likely to be the entirely spurious results of imperfect measurements and chance factors, and to use Twitter data to predict which areas might be about to experience higher suicide rates is likely to prove extremely inaccurate (and perhaps ethically questionable as well).

Note

*When published, this preprint will serve as one of the articles that will be bundled in Nick Brown’s PhD thesis submitted to University Medical Centre, Groningen. As Nick’s adviser, I was pleased to have a role that justified an authorship. I want to be clear, however: my role was more like that of a midwife observing a natural birth than an OB-GYN having to induce labor. Nick can’t say what I can say: there is some real brilliance to this paper. The brilliance belongs to Nick, not me. And I mean brilliance in the restricted American sense, not the promiscuous British sense, as in “that is a brilliant dessert.”

I encourage you to dig in and enjoy. There are lots of treats and curious observations. Nick not only retrieved and analyzed the data but also did some programming to capture the color depiction of counties and AHD rates. He identified some anomalies and then developed his own depiction with some corrections to the original. Truly amazing.

[Image: map differences]