Lessons we need to learn from a Lancet Psychiatry study of the association between exercise and mental health

The closer we look at a heavily promoted study of exercise and mental health, the more its flaws become obvious. There is little support for the most basic claims being made – despite the authors marshaling enormous attention to the study.


Apparently, the editor of Lancet Psychiatry and reviewers did not give the study a close look before it was accepted.

The article was used to raise funds for a startup company in which one of the authors was heavily invested. This was disclosed, but doesn’t let the authors off the hook for promoting a seriously flawed study. Nor should the editor of Lancet Psychiatry or reviewers escape criticism, nor the large number of people on Twitter who thoughtlessly retweeted and “liked” a series of tweets from the last author of the study.

This blog post is intended to raise consciousness about bad science appearing in prestigious journals and to allow citizen scientists to evaluate their own critical thinking skills in terms of their ability to detect misleading and exaggerated claims.

1. Sometimes a disclosure of extensive conflicts of interest alerts us not to pay serious attention to a study. Instead, we should question why the study got published in a prestigious peer-reviewed journal when it had such an obvious risk of bias.

2. We need citizen scientists with critical thinking skills to identify such promotional efforts and alert others in their social network that hype and hokum are being delivered.

3. We need to stand up to authors who use scientific papers for commercial purposes, especially when they troll critics.

Read on and you will see what a skeptical look at the paper and its promotion revealed.

  • The study failed to capitalize on the potential of multiple years of data for developing and evaluating statistical models. Bigger is not necessarily better. Combining multiple years of data was wasteful and served only the purpose of providing the authors bragging rights and the impressive, but meaningless p-values that come from overly large samples.
  • The study relied on an unvalidated and inadequate measure of mental health that confounded recurring stressful environmental conditions in the work or home with mental health problems, even where validated measures of mental health would reveal no effects.
  • The study used an odd measure of history of mental health problems that undoubtedly exaggerated past history.
  • The study confused physical activity with (planned) exercise. The authors amplified their confusion by relying on an exceedingly odd strategy for estimating how much participants exercised: the estimate of time spent in a single activity was used in analyses of total time spent exercising. All other physical activity was ignored.
  • The study made a passing acknowledgment of the problems interpreting simple associations as causal, but then went on to selectively sample the existing literature to make the case that interventions to increase exercise improve mental health.
  • Taken together, a skeptical assessment of this article provides another demonstration that disclosure of substantial financial conflicts of interest should alert readers to a high likelihood of a hyped, inaccurately reported study.
  • The article was paywalled, so anyone interested in evaluating the authors’ claims for themselves had to write to the author or have access to the article through a university library site. I am waiting for the authors to reply to my requests for the supplementary tables that are needed to make full sense of their claims. In the meantime, I’ll just complain about authors with significant conflicts of interest heavily promoting studies that they hide behind paywalls.

I welcome you to examine the author’s thread of tweets. Request the actual article from the author if you want to evaluate my claims independently. This can be great material for a master’s or honors class on critical appraisal, whether in psychology or journalism.

[Image: title of the article]

Let me know if you think that I’ve been too hard on this study.

A thread of tweets from the last author celebrated the success of a well-orchestrated publicity campaign for a new article concerning exercise and mental health in Lancet Psychiatry.

The thread started:

Our new @TheLancetPsych paper was the biggest ever study of exercise and mental health. it caused quite a stir! here’s my guided tour of the paper, highlighting some of our excitements and apprehensions along the way [thread] 1/n

And ended with a pitch for the author’s do-good startup company:

Where do we go from here? Over @spring_health – our mental health startup in New York City – we’re using these findings to develop personalized exercise plans. We want to help every individual feel better—faster, and understand exactly what each patient needs the most.

I wasn’t long into the thread before my skepticism was stimulated. The fourth tweet in the thread presented a figure without any comment on how bizarre it was.

The tweet read:

It looks like those differences mattered. for example, people who exercised for about 45 minutes seemed to have better mental health than people who exercised for less than 30, or more than 60 minutes. — a sweet spot for mental health, perhaps?

[Figure: graphs from the paper]

Apparently the author did not comment on an anomaly either. Housework appears to be better for mental health than a summary score of all exercise, and looks equal to or better than cycling or jogging. But how did housework slip into the category “exercise”?

I began wondering what the authors meant by “exercise” and whether they had given the definition serious consideration when constructing their key variable from the survey data.

But then that tweet was followed by another one that generated more confusion, with a graph that seemingly contradicted the figures in the last one:

the type of exercise people did seems important too! People doing team sports or cycling had much better mental health than other sports. But even just walking or doing household chores was better than nothing!

Then a self-congratulatory tweet for a promotional job well done.

for sure — these findings are exciting, and it has been overwhelming to see the whole world talking openly and optimistically about mental health, and how we can help people feel better. It isn’t all plain sailing though…

The author’s next tweet revealed a serious limitation to the measure of mental health used in the study in a screenshot.

[Screenshot: tweet showing the mental health survey item]

The author acknowledged the potential problem, sort of:

(1b- this might not be the end of the world. In general, most peple have a reasonable understanding of their feelings, and in depressed or anxious patients self-report evaluations are highly correlated with clinician-rated evaluations. But we could be more precise in the future)

“Not the end of the world?” Since when does the author of a paper in the Lancet family of journals so casually brush off a serious methodological issue? A lot of us who have examined the validity of mental health measures would be skeptical of this dismissal of a potentially fatal limitation.

No validation is provided for this measure. On the face of it, respondents could endorse it on the basis of facing recurring stressful situations that had no consequences for their mental health. This reflects the ambiguity of the term stress for both laypersons and scientists. “Stress” could variously refer to an environmental situation, a subjective experience of stress, or an adaptational outcome. Waitstaff could consider Thursday, when the chef is off, a recurrent weekly stress. Persons with diagnosable persistent depressive disorder would presumably endorse more days than not as being a mental health challenge. But they would mean something entirely different.

The author acknowledged that the association between exercise and mental health might be bidirectional in terms of causality.

[Screenshot: tweet acknowledging lots of reasons to believe the relationship goes both ways]

But then made a strong claim for increased exercise leading to better mental health.

[Screenshot: tweet claiming exercise increases mental health]

[Actually, as we will see, the evidence from randomized trials of exercise to improve mental health is modest, and it entirely disappears once one limits oneself to the higher-quality studies.]

The author then runs off the rails with the claim that the benefits of exercise exceed the benefits of having a greater-than-poverty-level income.

[Screenshot: tweet, “why are we so excited”]

I could not resist responding.

Stop comparing adjusted correlations obtained under different circumstances as if they demonstrated what would be obtained in RCT. Don’t claim exercising would have more effect than poor people getting more money.

But I didn’t get a reply from the author.

Eventually, the author got around to plugging his startup company.

I didn’t get it. Just how did this heavily promoted study advance the science of such “personalized recommendations”?

Important things I learned from others’ tweets about the study

I follow @BrendonStubbs on Twitter and you should too. Brendon often makes wise critical observations of studies that most everyone else is uncritically praising. But he also identifies some studies that I otherwise would miss and says very positive things about them.

He started his own thread of tweets about the study on a positive note, but then he identified a couple of critical issues.

First, he took issue with the author’s weak claim to have identified a tipping point, below which exercise is beneficial and above which exercise could prove detrimental to mental health.

4/some interpretations are troublesome. Most confusing, are the assumptions that higher PA is associated/worsens your MH. Would we say based on cross sect data that those taking most medication/using CBT most were making their MH worse?

A postdoctoral fellow @joefirth7 seconded that concern:

I agree @BrendonStubbs: idea of high PA worsening mental health limited to observation studies. Except in rare cases of athletes overtraining, there’s no exp evidence of ‘tipping point’ effect. Cross-sect assocs of poor MH <–> higher PA likely due to multiple other factors…

Ouch! But then Brendon follows up with concerns that the measure of physical activity has not been adequately validated, noting that such self-report measures often prove to be invalid.

5/ one consideration not well discussed, is self report measures of PA are hopeless (particularly in ppl w mental illness). Even those designed for population level monitoring of PA https://journals.humankinetics.com/doi/abs/10.1123/jpah.6.s1.s5 … it is also not clear if this self report PA measure has been validated?

As we will soon see, the measure used in this study is quite flawed in its conceptualization and in its odd methodology of requiring participants to estimate the time spent exercising for only one activity, chosen from 75 options.

Next, Brendon points to a particular problem using self-reported physical activity in persons with mental disorders and gives an apt reference:

6/ related to this, self report measures of PA shown to massively overestimate PA in people with mental ill health/illness – so findings of greater PA linked with mental illness likely bi-product of over-reporting of PA in people with mental illness e.g Validity and Value of Self-reported Physical Activity and Accelerometry in People With Schizophrenia: A Population-Scale Study of the UK Biobank [ https://academic.oup.com/schizophreniabulletin/advance-article/doi/10.1093/schbul/sbx149/4563831 ]

7/ An additional point he makes: anyone working in field of PA will immediately realise there is confusion & misinterpretation about the concepts of exercise & PA in the paper, which is distracting. People have been trying to prevent this happening over 30 years

Again, Brendon provides a spot-on citation clarifying the distinction between physical activity and exercise: Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research.

The mysterious, pseudonymous Zad Chow (@dailyzad) called attention to a blog post they had just uploaded. Let’s take a look at some of its key points.

Lessons from a blog post: Exercise, Mental Health, and Big Data

Zad Chow is quite balanced in dispensing praise and criticism of the Lancet Psychiatry paper. They noted the ambiguity of any causal claim based on cross-sectional correlations, and they investigated the literature on their own.

So what does that evidence say? Meta-analyses of randomized trials seem to find that exercise has large and positive treatment effects on mental health outcomes such as depression.

Study Name           Randomized Trials    Effect (SMD) [95% CI]
Schuch et al. 2016          25             1.11 (0.79 to 1.43)
Gordon et al. 2018          33             0.66 (0.48 to 0.83)
Krogh et al. 2017           35            −0.66 (−0.86 to −0.46)

But, when you only pool high-quality studies, the effects become tiny.

“Restricting this analysis to the four trials that seemed less affected of bias, the effect vanished into −0.11 SMD (−0.41 to 0.18; p=0.45; GRADE: low quality).” – Krogh et al. 2017
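To make concrete how much a pooled estimate can shift when the small, high risk-of-bias trials are dropped, here is a minimal inverse-variance pooling sketch. The effect sizes and standard errors below are invented for illustration only; they are not Krogh et al.’s data.

```python
# Fixed-effect, inverse-variance pooling of standardized mean differences (SMDs).
# All numbers are invented for illustration, not taken from any actual meta-analysis.
import math

def pool_fixed(smds, ses):
    """Return the inverse-variance weighted mean SMD and its 95% CI."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * d for w, d in zip(weights, smds)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

# Eight hypothetical trials: four small ones with large effects (the pattern typical
# of high risk of bias) and four larger ones with effects near zero.
all_smds = [-1.2, -0.9, -0.8, -0.6, -0.15, -0.05, 0.02, -0.10]
all_ses  = [0.30, 0.25, 0.30, 0.25, 0.15, 0.15, 0.15, 0.15]

print(pool_fixed(all_smds, all_ses))          # pooled SMD around -0.25
print(pool_fixed(all_smds[4:], all_ses[4:]))  # low-bias subset: around -0.07, CI crossing zero
```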

Hmm, would you have guessed this from the Lancet Psychiatry author’s thread of tweets?

Zad Chow showed the hype and untrustworthiness of the press coverage in prestigious media with a sampling of screenshots.

[Image: Zad Chow’s screenshots of press coverage]

I personally checked and don’t see that Zad Chow’s selection of press coverage was skewed. Coverage in the media all seemed to be saying the same thing. I found the distortion continued with uncritical parroting – a.k.a. churnalism – of the claims of the Lancet Psychiatry authors in the Wall Street Journal.

The WSJ repeated a number of the author’s claims that I’ve already thrown into question and added a curiosity:

In a secondary analysis, the researchers found that yoga and tai chi—grouped into a category called recreational sports in the original analysis—had a 22.9% reduction in poor mental-health days. (Recreational sports included everything from yoga to golf to horseback riding.)

And NHS England totally got it wrong:

[Screenshot: NHS England coverage getting it wrong]

So, we learned that the broad category “recreational sports” covers yoga and tai chi, as well as golf and horseback riding. This raises serious questions about the lumping and splitting of categories of physical activity in the analyses that are being reported.

I needed to access the article in order to uncover some important things 

I’m grateful for the clues that I got from Twitter, and especially from Zad Chow, which I used in examining the article itself.

I got hung up on the title proclaiming that the study involved 1·2 million individuals. When I checked the article, I saw that the authors used three waves of publicly available data to get that number. Having that many participants gave them no real advantage except for bragging rights and the likelihood that modest associations could be expressed in spectacular p-values, like p < 2·2 × 10⁻¹⁶. I don’t understand why the authors didn’t conduct analyses in one wave and cross-validate the results in another.
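To see why I am unimpressed by p-values of that size, here is a minimal simulation, with purely made-up data rather than the survey itself: with 1.2 million observations, an association explaining a fraction of a percent of the variance already sails far past p < 2·2 × 10⁻¹⁶.

```python
# Simulated data (not the study's): with n = 1,200,000, a trivially small
# association is already "significant" far beyond p < 2.2e-16.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1_200_000
exercise = rng.normal(size=n)
# Outcome sharing only ~0.25% of its variance with exercise
mental_health_days = 0.05 * exercise + rng.normal(size=n)

r, p = stats.pearsonr(exercise, mental_health_days)
print(f"r = {r:.3f}, p = {p:.3g}")  # r ~ 0.05; p underflows toward zero
```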

The obligatory Research in Context box made it sound like a systematic search of the literature had been undertaken. Maybe, but the authors were highly selective in what they chose to comment upon, as is evident from the contradiction with Zad Chow’s brief review. The authors would have us believe that the existing literature is quite limited and inconclusive, supporting the need for a study like theirs.

[Image: the article’s Research in Context box]

Caveat lector: a strong confirmation bias likely lies ahead in this article.

Questions accumulated quickly as to the appropriateness of the items available from a national survey undoubtedly constructed for other purposes. Certainly these items would not have been selected if the original investigators had been interested in the research question at the center of this article.

Participants self-reported a previous diagnosis of depression or depressive episode on the basis of the following question: “Has a doctor, nurse, or other health professional EVER told you that you have a depressive disorder, including depression, major depression, dysthymia, or minor depression?”

Our own work has cast serious doubt on the correspondence between reports of a history of depression in response to a brief question embedded in a larger survey and the results of a structured interview in which respondents’ answers can be probed. We found that answers to such questions were more related to current distress than to actual past diagnoses and treatment of depression. However, the survey question used in the Lancet Psychiatry study introduced further ambiguity and invalidity with the added “or minor depression.” I am not sure under what circumstances a health care professional would disclose a diagnosis of “minor depression” to a patient, but I doubt it would be in a context in which the professional felt treatment was needed.

Despite the skepticism that I was developing about the usefulness of the survey data, I was unprepared for the assessment of “exercise.”

“Other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?” Participants who answered yes to this question were then asked: “What type of physical activity or exercise did you spend the most time doing during the past month?” A total of 75 types of exercise were represented in the sample, which were grouped manually into eight exercise categories to balance a diverse representation of exercises with the need for meaningful cell sizes (appendix).

Participants indicated the number of times per week or month that they did this exercise and the number of minutes or hours that they usually spend exercising in this way each time.

I had already been tipped off by the discussion on Twitter that there would be a thorough confusion of planned exercise and mere physical activity. But now that was compounded. Why was physical activity during employment excluded? What if participants were engaged in a number of different physical activities, like both jogging and bicycling? If so, the survey obtained data for only one of these activities, with the other excluded, and the choice of which activity the participant identified as the one to be counted could have been quite arbitrary.

Anyone who has ever constructed surveys would be alert to the problems posed by participants’ awareness that saying “yes” to exercising would require contemplating 75 different options and arbitrarily choosing one of them for a further question about how much time they spent on that activity. Unless participants were strongly motivated, there was an incentive to simply say no, they didn’t exercise.

I suppose I could go on, but it was my judgment that any validity to what the authors were claiming had been ruled out. As someone once said on an NIH grant review panel: there are no vital signs left, let’s move on to the next item.

But let’s refocus just a bit on the overall intention of these authors. They want to use a large data set to make statements about the association between physical activity and a measure of mental health. They have used matching and statistical controls to equate participants. But that strategy effectively eliminates consideration of crucial contextual variables. Persons’ preferences and opportunities to exercise are powerfully shaped by their personal and social circumstances, including finances and competing demands on their time. Said differently, people are embedded in contexts that a lot of statistical maneuvering has sought to eliminate.

To suggest a small number of the many complexities: how much physical activity participants get in their employment may be an important determinant of their choices for additional activity, as well as of how much time is left outside of work. If work typically involves a lot of physical exertion, people may simply be left too tired for additional planned physical activity, a.k.a. exercise, and their physical health may require less of it. Environments differ greatly in terms of the opportunities for, and the safety of, engaging in various kinds of physical activities. Team sports require other people being available. Etc., etc.

What I learned from the editorial accompanying the Lancet Psychiatry article

The brief editorial accompanying the article aroused my curiosity as to whether someone assigned to reading and commenting on this article would catch things that the editor and reviewers apparently missed.

Editorial commentators are chosen to praise, not to bury articles. There are strong social pressures to say nice things. However, this editorial leaked a number of serious concerns.

First

In presenting mental health as a workable, unified concept, there is a presupposition that it is possible and appropriate to combine all the various mental disorders as a single entity in pursuing this research. It is difficult to see the justification for this approach when these conditions differ greatly in their underlying causes, clinical presentation, and treatment. Dementia, substance misuse, and personality disorder, for example, are considered as distinct entities for research and clinical purposes; capturing them for study under the combined banner of mental health might not add a great deal to our understanding.

The problem here of categorisation is somewhat compounded by the repeated uncomfortable interchangeability between mental health and depression, as if these concepts were functionally equivalent, or as if other mental disorders were somewhat peripheral.

Then:

A final caution pertains to how studies approach a definition of exercise. In the current study, we see the inclusion of activities such as childcare, housework, lawn-mowing, carpentry, fishing, and yoga as forms of exercise. In other studies, these activities would be excluded for not fulfilling the definition of exercise as offered by the American College of Sports Medicine: “planned, structured and repetitive bodily movement done to improve or maintain one or more components of physical fitness.” 11 The study by Chekroud and colleagues, in its all-encompassing approach, might more accurately be considered a study in physical activity rather than exercise.

The authors were listening for a theme song with which they could promote their startup company in a very noisy data set. They thought they had a hit. I think they had noise.

The authors’ extraordinary disclosure of interests (see below in this blog post) should have precluded publication of this seriously flawed piece of work, either simply by reason of the high likelihood of bias or because it should have prompted the editor and reviewers to look more carefully at the serious flaws hiding in plain sight.

Postscript: Send in the trolls.

On Twitter, Adam Chekroud announced he felt no need to respond to critics. Instead, he retweeted and “liked” trolling comments directed at critics from the Twitter accounts of his brother, his mother, and even the official Twitter account of a local fried chicken joint, @chickenlodge, which offered free food for retweets and suggested including Adam Chekroud’s Twitter handle if you wanted to be noticed.

[Screenshot: @chickenlodge tweet]

Really, Adam, if you can’t stand the heat, don’t go near where they are frying chicken.

The Declaration of Interests from the article.

[Image: declaration of interests, part 1]

[Image: declaration of interests, part 2]

 

When psychotherapy trials have multiple flaws…

Multiple flaws pose more threats to the validity of psychotherapy studies than would be inferred when the individual flaws are considered independently.



We can learn to spot features of psychotherapy trials that are likely to lead to exaggerated claims of efficacy for treatments, or to claims that will not generalize beyond the sample being studied in a particular clinical trial. We can look to the adequacy of the sample size, and spot what the Cochrane Collaboration has defined as risks of bias in its handy assessment tool.

We can look at the case mix in the particular sites where patients were recruited. We can examine the adequacy of the diagnostic criteria used for entering patients into a trial. We can examine how blinded the trial was: who assigned patients to particular conditions, and whether the patients, the treatment providers, and the evaluators knew which condition particular patients were assigned to.

And so on. But what about combinations of these factors?

We typically do not pay enough attention to multiple flaws in the same trial. I include myself among the guilty. We may suspect that flaws are seldom simply additive in their effect, but we don’t consider whether there may even be synergism in their negative effects on the validity of a trial. As we will see in this analysis of a clinical trial, multiple flaws can pose more threats to the validity of a trial than we might infer when the individual flaws are considered independently.

The particular paper we are probing is described in its discussion section as the “largest RCT to date testing the efficacy of group CBT for patients with CFS.” It also takes on added importance because two of the authors, Gijs Bleijenberg and Hans Knoop, are considered leading experts in the Netherlands. The treatment protocol was developed over time by the Dutch Expert Centre for Chronic Fatigue (NKCV, http://www.nkcv.nl; Knoop and Bleijenberg, 2010). Moreover, these senior authors dismiss any criticism and even ridicule critics. This study is cited as support for their overall assessment of their own work.  Gijs Bleijenberg claims:

Cognitive behavioural therapy is still an effective treatment, even the preferential treatment for chronic fatigue syndrome.

But

Not everybody endorses these conclusions, however their objections are mostly baseless.

Spoiler alert

This is a long read blog post. I will offer a summary for those who don’t want to read through it, but who still want the gist of what I will be saying. However, as always, I encourage readers to be skeptical of what I say and to look to my evidence and arguments and decide for themselves.

Authors of this trial stacked the deck to demonstrate that their treatment is effective. They are striving to support the extraordinary claim that group cognitive behavior therapy fosters not only better adaptation, but actually recovery from what is internationally considered a physical condition.

There are some obvious features of the study that contribute to the likelihood of a positive effect, but these features need to be considered collectively, in combination, to appreciate the strength of this effort to guarantee positive results.

This study represents the perfect storm of design features that operate synergistically:

perfect storm

 Referral bias – Trial conducted in a single specialized treatment setting known for advocating psychological factors maintaining physical illness.

Strong self-selection bias of a minority of patients enrolling in the trial seeking a treatment they otherwise cannot get.

Broad, overinclusive diagnostic criteria for entry into the trial.

An active treatment condition carrying a strong message about how patients should respond to outcome assessment, i.e., with reports of improvement.

An unblinded trial with a waitlist control lacking the nonspecific elements (placebo) that confound the active treatment.

Subjective self-report outcomes.

Specifying a clinically significant improvement that required only that a primary outcome fall below the score needed for entry into the trial.

Deliberate exclusion of relevant objective outcomes.

Avoidance of any recording of negative effects.

Despite the prestige attached to this trial in Europe, the US Agency for Healthcare Research and Quality (AHRQ) excludes this trial from providing evidence for its database of treatments for chronic fatigue syndrome/myalgic encephalomyelitis. We will see why in this post.

The take-away message: Although not many psychotherapy trials incorporate all of these factors, most trials have some. We should be more sensitive to when multiple factors occur in the same trial, like bias in the site for patient recruitment, lack of blinding, lack of balance between the active treatment and the control condition in terms of nonspecific factors, and subjective self-report measures.

The article reporting the trial is

Wiborg JF, van Bussel J, van Dijk A, Bleijenberg G, Knoop H. Randomised controlled trial of cognitive behaviour therapy delivered in groups of patients with chronic fatigue syndrome. Psychotherapy and Psychosomatics. 2015;84(6):368-76.

Unfortunately, the article is currently behind a paywall. Perhaps readers could contact the corresponding author, Hans.knoop@radboudumc.nl, and request a PDF.

The abstract

Background: Meta-analyses have been inconclusive about the efficacy of cognitive behaviour therapies (CBTs) delivered in groups of patients with chronic fatigue syndrome (CFS) due to a lack of adequate studies. Methods: We conducted a pragmatic randomised controlled trial with 204 adult CFS patients from our routine clinical practice who were willing to receive group therapy. Patients were equally allocated to therapy groups of 8 patients and 2 therapists, 4 patients and 1 therapist or a waiting list control condition. Primary analysis was based on the intention-to-treat principle and compared the intervention group (n = 136) with the waiting list condition (n = 68). The study was open label. Results: Thirty-four (17%) patients were lost to follow-up during the course of the trial. Missing data were imputed using mean proportions of improvement based on the outcome scores of similar patients with a second assessment. Large and significant improvement in favour of the intervention group was found on fatigue severity (effect size = 1.1) and overall impairment (effect size = 0.9) at the second assessment. Physical functioning and psychological distress improved moderately (effect size = 0.5). Treatment effects remained significant in sensitivity and per-protocol analyses. Subgroup analysis revealed that the effects of the intervention also remained significant when both group sizes (i.e. 4 and 8 patients) were compared separately with the waiting list condition. Conclusions: CBT can be effectively delivered in groups of CFS patients. Group size does not seem to affect the general efficacy of the intervention which is of importance for settings in which large treatment groups are not feasible due to limited referral

The trial registration

http://www.isrctn.com/ISRCTN15823716

Who was enrolled into the trial?

Who gets into a psychotherapy trial is a function of the particular treatment setting of the study, the diagnostic criteria for entry, and patient preferences for getting their care through a trial, rather than what is being routinely provided in that setting.

We need to pay particular attention to when patients enter psychotherapy trials hoping they will receive a treatment they prefer and not be assigned to the other condition. Patients may be in a clinical trial for the betterment of science, but in some settings they are willing to enroll because of the probability of getting a treatment they otherwise could not get. This in turn affects their evaluation both of the condition in which they get the preferred treatment and of the condition in which they are denied it. Simply put, they register being pleased if they got what they wanted or displeased if they did not.

The setting is relevant to evaluating who was enrolled in a trial.

The authors’ own outpatient clinic at the Radboud University Medical Center was the site of the study. The group has an international reputation for promoting the biopsychosocial model, in which psychological factors are assumed to be the decisive factor in maintaining somatic complaints.

All patients were referred to our outpatient clinic for the management of chronic fatigue.

There is thus a clear referral bias or case-mix bias, but we are not provided a ready basis for quantifying it or even estimating its effects.

The diagnostic criteria.

The article states:

In accordance with the US Center for Disease Control [9], CFS was defined as severe and unexplained fatigue which lasts for at least 6 months and which is accompanied by substantial impairment in functioning and 4 or more additional complaints such as pain or concentration problems.

Actually, the US Centers for Disease Control and Prevention would now reject this trial because these entry criteria are considered obsolete, overinclusive, and not sufficiently exclusive of other conditions that might be associated with chronic fatigue.*

There is a real paradigm shift happening in America. Both the 2015 IOM Report and the Centers for Disease Control and Prevention (CDC) website emphasize post-exertional malaise in M.E., that is, getting more ill after any effort. CBT is no longer recommended by the CDC as a treatment.

[Image: CDC diagnostic criteria]

The only mandatory symptom for inclusion in this study is fatigue lasting 6 months. Most properly, this trial targets chronic fatigue [period] and not the condition, chronic fatigue syndrome.

Current US CDC recommendations (see Box 7-1 from the IOM document, above) require postexertional malaise for a diagnosis of myalgic encephalomyelitis (ME). See below.

[Image: post-exertional malaise (PEM) criteria]

Patients meeting the current American criteria for ME would be eligible for enrollment in this trial, but it’s unclear what proportion of the patients enrolled actually met the American criteria. Because of the over-inclusiveness of the entry diagnostic criteria, it is doubtful whether the results would generalize to an American sample. A look at patient flow into the study will be informative.

Patient flow

Let’s look at what is said in the text, but also in the chart depicting patient flow into the trial for any self-selection that might be revealed.

In total, 485 adult patients were diagnosed with CFS during the inclusion period at our clinic (fig. 1). One hundred and fifty-seven patients were excluded from the trial because they declined treatment at our clinic, were already asked to participate in research incompatible with inclusion (e.g. research focusing on individual CBT for CFS) or had a clinical reason for exclusion (i.e. they received specifically tailored interventions because they were already unsuccessfully treated with individual CBT for CFS outside our clinic or were between 18 and 21 years of age and the family had to be involved in the therapy). Of the 328 patients who were asked to engage in group therapy, 99 (30%) patients indicated that they were unwilling to receive group therapy. In 25 patients, the reason for refusal was not recorded. Two hundred and four patients were randomly allocated to one of the three trial conditions. Baseline characteristics of the study sample are presented in table 1. In total, 34 (17%) patients were lost to follow-up. Of the remaining 170 patients, 1 patient had incomplete primary outcome data and 6 patients had incomplete secondary outcome data.

[Figure: patient flow chart]

We see that the investigators invited two thirds of the patients attending the clinic to enroll in the trial. Of these, 41% refused. We don’t know the reason for some of the refusals, but almost a third of the patients approached declined because they did not want group therapy. The authors were left able to randomize 42% of the patients coming to the clinic, or less than two thirds of the patients they actually asked. Of these patients, a little more than two thirds received the treatment to which they were randomized and were available for follow-up.

These patients, who received the treatment to which they were randomized and who were available for follow-up, are a self-selected minority of the patients coming to the clinic. This self-selection process likely reduced the proportion of patients with myalgic encephalomyelitis. It is estimated that 25% of patients meeting the American criteria are housebound and 75% are unable to work. It is reasonable to infer that patients meeting the full criteria would opt out of a treatment that requires regular attendance at group sessions.

The trial is thus biased toward ambulatory patients with fatigue, not ME. Their fatigue is likely due to some combination of factors such as multiple co-morbidities, as-yet-undiagnosed medical conditions, drug interactions, and the common mild and subsyndromal anxiety and depressive symptoms that characterize primary care populations.

The treatment being evaluated

Group cognitive behavior therapy for chronic fatigue syndrome, either delivered in a small (4 patients and 1 therapist) or larger (8 patients and 2 therapists) group format.

The intervention consisted of 14 group sessions of 2 h within a period of 6 months followed by a second assessment. Before the intervention started, patients were introduced to their group therapist in an individual session. The intervention was based on previous work of our research group [4,13] and included personal goal setting, fixing sleep-wake cycles, reducing the focus on bodily symptoms, a systematic challenge of fatigue-related beliefs, regulation and gradual increase in activities, and accomplishment of personal goals. A formal exercise programme was not part of the intervention.

Patients received a workbook with the content of the therapy. During sessions, patients were explicitly invited to give feedback about fatigue-related cognitions and behaviours to fellow patients. This aspect was introduced to facilitate a pro-active attitude and to avoid misperceptions of the sessions as support group meetings which have been shown to be insufficient for the treatment of CFS.

And note:

In contrast to our previous work [4], we communicated recovery in terms of fatigue and disabilities as general goal of the intervention.

Some impressions of the intensity of this treatment: this is a rather intensive treatment, with patients having considerable opportunities for interaction with providers. This factor alone distinguishes being assigned to the intervention group from being left in the waitlist control group, and it could prove powerful. It will be difficult to distinguish the intensity of contact from any content or active ingredients of the therapy.

I’ll leave for another time a fuller discussion of the extent to which what was labeled as cognitive behavior therapy in this study is consistent with cognitive therapy as practiced by Aaron Beck and other leaders of the field. However, a few comments are warranted. What is offered in this trial does not sound like cognitive therapy as Americans practice it. What is offered in this trial seems to emphasize challenging beliefs and pushing patients to get more active, along with psychoeducational activities. I don’t see indications of the supportive, collaborative relationship in which patients are encouraged to work on what they want to work on, engage in outside activities (homework assignments), and get feedback.

What is missing in this treatment is what Beck calls collaborative empiricism, “a systemic process of therapist and patient working together to establish common goals in treatment, has been found to be one of the primary change agents in cognitive-behavioral therapy (CBT).”

Importantly, in Beck’s approach, the therapist does not assume cognitive distortions on the part of the patient. Rather, in collaboration with the patient, the therapist introduces alternatives to the interpretations that the patient has been making and encourages the patient to consider the difference. In contrast, rather than eliciting goal statements from patients, the therapists in this study impose the goal of increased activity. Therapists in this study also seem ready to impose their view that the patients’ fatigue-related beliefs are maladaptive.

The treatment offered in this trial is complex, with multiple components making multiple assumptions that seem quite different from what is called cognitive therapy or cognitive behavioral therapy in the US.

The authors’ communication of recovery from fatigue and disability seems a radical departure not only from cognitive behavior therapy for anxiety, depression, and pain, but also from cognitive behavior therapy offered for adaptation to acute and chronic physical illnesses. We will return to this “communication” later.

The control group

Patients not randomized to group CBT were placed on a waiting list.

Think about it! What do patients think about having gotten involved in all the inconvenience and burden of a clinical trial in the hope that they would get treatment, only to be assigned to the control group with just waiting? Not only are they going to be disappointed and register that in their subjective evaluations at the outcome assessments; patients may also worry about jeopardizing their right to the treatment they are waiting for if they endorse overly positive outcomes. There is a potential for a nocebo effect, compounding the placebo effect of assignment to the CBT active treatment groups.

What are informative comparisons between active treatments and control conditions?

We need to ask more often what the inclusion of a control group accomplishes for the evaluation of a psychotherapy. In doing so, we need to keep in mind that psychotherapies do not have effect sizes; only comparisons of psychotherapies with control conditions have effect sizes.

A pre-post evaluation of psychotherapy from baseline to follow-up includes the effects of any active ingredient in the psychotherapy, a host of nonspecific (placebo) factors, and any changes that would have occurred in the absence of the intervention. These include regression to the mean: patients are more likely to enter a clinical trial now, rather than later or previously, if there has been an exacerbation of their symptoms.
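Here is a minimal simulation of that point, with invented parameters rather than anything from this trial: patients enroll during a symptom flare, everyone drifts back toward their usual level, and the “treatment” arm gets only a nonspecific boost. The pre-post change in the treated arm looks much larger than the between-group difference that a proper control comparison would reveal.

```python
# Invented-parameter simulation: regression to the mean plus nonspecific improvement
# can make a pre-post change look impressive even with no specific treatment effect.
import numpy as np

rng = np.random.default_rng(1)
n_per_arm = 100

def simulate_arm(nonspecific_boost):
    true_severity = rng.normal(50, 8, n_per_arm)             # stable underlying severity
    baseline = true_severity + rng.normal(5, 6, n_per_arm)   # enrolled during a flare
    followup = true_severity - nonspecific_boost + rng.normal(0, 6, n_per_arm)
    return baseline, followup

ctrl_base, ctrl_follow = simulate_arm(nonspecific_boost=0)   # waitlist: the flare simply resolves
trt_base, trt_follow = simulate_arm(nonspecific_boost=3)     # attention/placebo boost only

print(f"pre-post change in treated arm:       {trt_base.mean() - trt_follow.mean():.1f}")   # ~8 points
print(f"between-group difference at follow-up: {ctrl_follow.mean() - trt_follow.mean():.1f}")  # ~3 points
```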

So, a proper comparison/control condition includes everything that the patients randomized to the intervention group get except for the active treatment. Ideally, the intervention and the comparison/control group are equivalent on all these factors, except the active ingredient of the intervention.

That is clearly not what is happening in this trial. Patients randomized to the intervention group get the intervention, the added intensity and frequency of contact with professionals that the intervention provides, and all the support that goes with it; and the positive expectations that come with getting a therapy that they wanted.

Attempts to evaluate group CBT versus the waitlist control group thus confound the active ingredients of the CBT with all these nonspecific effects. The deck is clearly being stacked in favor of CBT.

This may be a randomized trial, but properly speaking, this is not a randomized controlled trial, because the comparison group does not control for nonspecific factors, which are imbalanced.

The unblinded nature of the trial

In RCTs of psychotropic drugs, the ideal is to compare the psychotropic drug to an inert pill placebo, with providers, patients, and evaluators blinded as to whether the patients received the psychotropic drug or the comparison pill.

While it is difficult to achieve a comparable level of blindness in a psychotherapy trial, more of an effort to achieve blindness is desirable. For instance, in this trial, the authors took pains to distinguish the CBT from what would have happened in a support group. A much more adequate comparison would therefore be CBT versus either a professionally led or a peer-led support group with equivalent amounts of contact time. Further blinding would be possible if patients were told only that two forms of group therapy were being compared. If that was the information available to patients contemplating consenting to the trial, it would not have been so obvious from the outset to the patients being randomly assigned that one group was preferable to the other.

Subjective self-report outcomes.

The primary outcomes for the trial were the fatigue subscale of the Checklist Individual Strength, the physical functioning subscale of the Short Form 36 Health Survey (SF-36), and overall impairment as measured by the Sickness Impact Profile (SIP).

Realistically, self-report outcomes are often all that is available in many psychotherapy trials. Commonly these are self-report assessments of anxiety and depressive symptoms, although these may be supplemented by interviewer-based assessments. We don’t have objective biomarkers with which to evaluate psychotherapy.

These three self-report measures are relatively nonspecific, particularly in a population that is not characterized by ME. Self-reported fatigue in a primary care population lacks discriminative validity with respect to pain, anxiety and depressive symptoms, and general demoralization.  The measures are susceptible to receipt of support and re-moralization, as well as gratitude for obtaining a treatment that was sought.

The self-report entry criteria include a score of 35 or higher on the fatigue severity subscale. Yet a score of less than 35 on this same scale at follow-up is part of what is defined as a clinically significant improvement, within a composite score from combined self-report measures.
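A small sketch of how permissive that criterion is on the fatigue subscale alone (the cut-offs come from the paper’s description; the patient scores are invented, and the full definition also involves other measures in a composite):

```python
# Entry requires a fatigue severity score of 35 or higher; the "clinically significant
# improvement" definition requires, in part, scoring below 35 at follow-up.
ENTRY_CUTOFF = 35

def meets_fatigue_improvement(baseline: int, followup: int) -> bool:
    """Illustrates only the fatigue-subscale component of the composite criterion."""
    return baseline >= ENTRY_CUTOFF and followup < ENTRY_CUTOFF

print(meets_fatigue_improvement(35, 34))  # True: a one-point drop crosses the threshold
print(meets_fatigue_improvement(55, 36))  # False: a 19-point drop does not
```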

We know from medical trials that differences can be observed with subjective self-report measures that will not be found with objective measures. Thus, mildly asthmatic patients fail to distinguish in their subjective self-reports between the effective inhalant albuterol, an inert inhalant, and sham acupuncture, though they rate all three as better than getting no intervention. However, albuterol shows a strong advantage over the other conditions on an objective measure, maximum forced expiratory volume in 1 second (FEV1), as assessed with spirometry.

The suppression of objective outcome measures

We cannot let the authors of this trial off the hook for their dependence on subjective self-report outcomes. They are instructing patients that recovery is the goal, which implies that it is an attainable goal. We can reasonably be skeptical about claims of recovery based on changes in self-report measures. Were the patients actually able to exercise? What was their exercise capacity, as objectively measured? Did they return to work?

These authors have included such objective measurements in past studies, but did not include them as primary outcomes, nor, in some cases, even report them in the main paper reporting the trial.

Wiborg JF, Knoop H, Stulemeijer M, Prins JB, Bleijenberg G. How does cognitive behaviour therapy reduce fatigue in patients with chronic fatigue syndrome? The role of physical activity. Psychol Med. 2010 Jan 5:1

The senior authors’ review fails to mention their three studies using actigraphy that did not find effects for CBT. I am unaware of any studies that did find enduring effects.

Perhaps this is what they mean when they say the protocol has been developed over time – they removed what they found to be threats to the findings that they wanted to claim.

Dismissal of any need to consider negative effects of treatment

Most psychotherapy trials fail to assess any adverse effects of treatment, but this is usually done discreetly, without mention. In contrast, this article states:

Potential harms of the intervention were not assessed. Previous research has shown that cognitive behavioural interventions for CFS are safe and unlikely to produce detrimental effects.

Patients who meet stringent criteria for ME would be put at risk by pressure to exert themselves. By definition, they are vulnerable to postexertional malaise (PEM). Any trial of this nature needs to assess that risk. Maybe no adverse effects would be found. If that were so, it would strongly suggest the absence of patients with appropriate diagnoses.

Timing of assessment of outcomes varied between intervention and control group.

I at first did not believe what I was reading when I encountered this statement in the results section.

The mean time between baseline and second assessment was 6.2 months (SD = 0.9) in the control condition and 12.0 months (SD = 2.4) in the intervention group. This difference in assessment duration was significant (p < 0.001) and was mainly due to the fact that the start of the therapy groups had to be frequently postponed because of an irregular patient flow and limited treatment capacities for group therapy at our clinic. In accordance with the treatment manual, the second assessment was postponed until the fourteenth group session was accomplished. The mean time between the last group session and the second assessment was 3.3 weeks (SD = 3.5).

So, outcomes were assessed for the intervention group shortly after completion of therapy, when nonspecific (placebo) effects would be stronger, but a mean of six months later than for patients assigned to the control condition.

Post-hoc statistical controls are not sufficient to rescue the study from this important group difference, and it compounds other problems in the study.

Take away lessons

Pay more attention to how the limitations of any clinical trial may compound each other, leading the trial to provide exaggerated estimates of the effects of treatment or of the generalizability of the results to other settings.

Be careful of loose diagnostic criteria, because a trial may not generalize to the same criteria applied in settings that differ either in patient population or in the availability of different treatments. This is particularly important when a treatment setting has a bias in referrals and only a minority of the patients invited to participate in the trial actually agree and are enrolled.

Ask questions about just what information is obtained by comparing the active treatment group in the study to its control/comparison group. For a start, just what is being controlled, and how might that affect the estimates of the effectiveness of the active treatment?

Pay particular attention to the potent combination of a trial being unblinded, a weak comparison/control condition, and an active treatment that is not otherwise available to patients.

Note

*The means of determining whether the six months of fatigue might be accounted for by other medical factors was specific to the setting. Note that a review of medical records was considered sufficient for an unknown proportion of patients, with no further examination or medical tests.

The Department of Internal Medicine at the Radboud University Medical Center assessed the medical examination status of all patients and decided whether patients had been sufficiently examined by a medical doctor to rule out relevant medical explanations for the complaints. If patients had not been sufficiently examined, they were seen for standard medical tests at the Department of Internal Medicine prior to referral to our outpatient clinic. In accordance with recommendations by the Centers for Disease Control, sufficient medical examination included evaluation of somatic parameters that may provide evidence for a plausible somatic explanation for prolonged fatigue [for a list, see [9]. When abnormalities were detected in these tests, additional tests were made based on the judgement of the clinician of the Department of Internal Medicine who ultimately decided about the appropriateness of referral to our clinic. Trained therapists at our clinic ruled out psychiatric comorbidity as potential explanation for the complaints in unstructured clinical interviews.

[Image: medical workup]

Is Donald Trump suffering from Pick’s Disease (frontotemporal dementia)?

Changing the conversation about Donald Trump’s fitness for office from whether he has a personality disorder to whether he has an organic brain disorder.


For a long while there has been an ongoing debate about whether Donald Trump suffers from a personality disorder that might contribute to his being unfit to be President of the United States. Psychiatrists have ethical constraints on what they say because of the so-called Goldwater rule, barring them from commenting on the mental health of political figures whom they have not personally interviewed.

I am a clinical psychologist, not a psychiatrist. I feel the need to speak out that the behavior of Donald Trump is abnormal and we should caution against normalizing it. The problem with settling on his behavior being simply that of a bad person or con man is it doesn’t prepare us for just how erratic his behavior can be.

I’ll refrain from making a formal psychiatric diagnosis. I actually think that in clinical practice, a lot of mental health professionals too casually make diagnoses of personality disorders for patients (or privately, even for colleagues) they find difficult or annoying.  If they ever gave these people a structured interview,  I suspect they would be found to fall  below the threshold for any particular personality disorder.

Changing the conversation

But now an article in Stat has changed the conversation from whether Donald Trump suffers from a personality disorder to whether he is developing an organic brain disorder.

I’m a brain specialist. I think Trump should be tested for a degenerative brain disease

When President Trump slurred his words during a news conference this week, some Trump watchers speculated that he was having a stroke. I watched the clip and, as a physician who specializes in brain function and disability, I don’t think a stroke was behind the slurred words. But having evaluated the chief executive’s remarkable behavior through my clinical lens for almost a year, I do believe he is displaying signs that could indicate a degenerative brain disorder.

As the president’s demeanor and unusual decisions raise the potential for military conflict in two regions of the world, the questions surrounding his mental competence have become urgent and demand investigation.

And

I see worrisome symptoms that fall into three main categories: problems with language and executive function; problems with social cognition and behavior; and problems with memory, attention, and concentration. None of these are symptoms of being a bad or mean person. Nor do they require spelunking into the depths of his psyche to understand. Instead, they raise concern for a neurocognitive disease process in the same sense that wheezing raises the alarm for asthma.

In addition to being a medical journalist, the article’s author, Ford Vox, is a neurorehabilitation physician, board-certified in physical medicine and rehabilitation, with additional subspecialty board certification in brain injury medicine.

I was alerted to the possibility of a diagnosis of frontotemporal dementia by a tweet from Barney Carroll. He is a senior psychiatrist whom I have come to trust as a mentor on social media, even though we’ve never overlapped in the same department at the same time.

[Screenshot: Barney Carroll’s tweet, “forget psychoanalysis”]

And then there was this tweet about the Stat story, though I could not judge its credibility because I did not know the tweeter or her source:

[Screenshot: tweet about Trump’s disease]

I followed up with a Google search and came across an article from August 2016, before the election:

Finally figured out Trump’s medical diagnosis after watching this:

It’s called Pick’s Disease, or frontotemporal dementia

Look at the symptoms, all of these which fit Trump quite closely:

  • Impulsivity and poor judgment
  • Extreme restlessness (early stages)
  • Overeating or drinking to excess
  • Sexual exhibitionism or promiscuity
  • Decline in function at work and home
  • Repetitive or obsessive behavior

And especially these, listed earlier in the article:

Excess protein build-up causes the frontal and temporal lobes of the brain, which control speech and personality, to slowly atrophy. 

Then I followed up with more Google searches, hitting MedlinePlus, the website for patients and their families and friends maintained by the National Institutes of Health and produced by the National Library of Medicine.

Pick disease

Pick disease is a rare form of dementia that is similar to Alzheimer disease, except that it tends to affect only certain areas of the brain.

Causes

People with Pick disease have abnormal substances (called Pick bodies and Pick cells) inside nerve cells in the damaged areas of the brain.

Pick bodies and Pick cells contain an abnormal form of a protein called tau. This protein is found in all nerve cells. But some people with Pick disease have an abnormal amount or type of this protein.

The exact cause of the abnormal form of the protein is unknown. Many different abnormal genes have been found that can cause Pick disease. Some cases of Pick disease are passed down through families.

Pick disease is rare. It can occur in people as young as 20. But it usually begins between ages 40 and 60. The average age at which it begins is 54.

Symptoms

The disease gets worse slowly. Tissues in parts of the brain shrink over time. Symptoms such as behavior changes, speech difficulty, and problems thinking occur slowly and get worse.

Early personality changes can help doctors tell Pick disease apart from Alzheimer disease. (Memory loss is often the main, and earliest, symptom of Alzheimer disease.)

People with Pick disease tend to behave the wrong way in different social settings. The changes in behavior continue to get worse and are often one of the most disturbing symptoms of the disease. Some persons have more difficulty with decision making, complex tasks, or language (trouble finding or understanding words or writing).

The website notes

A brain biopsy is the only test that can confirm the diagnosis.

However, some alternative diagnoses can be ruled out:

Your doctor might order tests to help rule out other causes of dementia, including dementia due to metabolic causes. Pick disease is diagnosed based on symptoms and results of tests, including:

  • Assessment of the mind and behavior (neuropsychological assessment)
  • Brain MRI
  • Electroencephalogram (EEG)
  • Examination of the brain and nervous system (neurological exam)
  • Examination of the fluid around the central nervous system (cerebrospinal fluid) after a lumbar puncture
  • Head CT scan
  • Tests of sensation, thinking and reasoning (cognitive function), and motor function

Back to Ford Vox in his Stat article:

In Trump’s case, we have no relevant testing to review. His personal physician issued a thoroughly unsatisfying letter before the election that didn’t contain much in the way of hard data. That’s a situation many people want to correct via an independent medical panel that can objectively evaluate the president’s fitness to serve. But the prospects for getting Congress to use the 25th Amendment in this way seem poor at the moment.

What we do have are a growing array of signs and symptoms displayed in public for all to see. It’s time to discuss these issues in a clinical context, even if this is a very atypical form of examination. It’s all we have. And even if the president has a physical exam early next year and releases the records, as announced by the White House, what he really needs is thorough cognitive testing.

So?

Before biting the bullet, I also spoke with Dr. Dennis Agliano, who chairs the AMA’s Council on Ethical and Judicial Affairs, the panel that wrote the new ethical guidance. He advised me to be careful: “You can get yourself into hot water, since there are people who like Trump, and they may submit a complaint to the AMA,” the Tampa otolaryngologist told me. Ultimately, he reassured me that I should just do what I think is right.

Which is to warn that the president needs to be evaluated for a brain disease.

Good luck, Dr. Vox, but at least we have a reasonable hypothesis on the table. As Barney Carroll says, "Time will tell."


Using F1000 “peer review” to promote politics over evidence about delivering psychosocial care to cancer patients

The F1000 platform allowed authors and the reviewers whom they nominated to collaborate in crafting more of their special interest advocacy that they have widely disseminated elsewhere. Nothing original in this article and certainly not best evidence!


A newly posted article on the F1000 website raises questions about what the website claims is a “peer-reviewed” open research platform.

Infomercial? The F1000 platform allowed authors and the reviewers whom they nominated to collaborate in crafting more of their special interest advocacy that they have widely disseminated elsewhere. Nothing original in this article and certainly not best evidence!

I challenge the authors and the reviewers they picked to identify something said in the F1000 article that they have not said numerous times before either alone or in papers co-authored by some combination of authors and the reviewers they picked for this paper.

F1000 makes the attractive and misleading claim that versions of articles that are posted on its website reflect the response to reviewers.

Readers should be wary of uncritically accepting articles on the F1000 website as having been peer-reviewed in any conventional sense of the term.

Will other special interests groups exploit this opportunity to brand their claims as “peer-reviewed” without the risk of having to tone down their claims in peer review? Is this already happening?

In the case of this article, reviewers were all chosen by the authors and have a history of co-authoring papers with the authors of the target paper in active advocacy of a shared political perspective, one that is contrary to available evidence.

Cynically, future authors might be motivated to divide their team, with some remaining as authors and others dropping off to be nominated as reviewers. The reviewers could then suggest content that the team had already agreed to include but had deliberately left out so that it could be added during the review process.

F1000

F1000Research bills itself as

An Open Research publishing platform for life scientists, offering immediate publication of articles and other research outputs without editorial bias. All articles benefit from transparent refereeing and the inclusion of all source data.

Material posted on this website is labeled as having received rapid peer-review:

Articles are published rapidly as soon as they are accepted, after passing an in-house quality check. Peer review by invited experts, suggested by the authors, takes place openly after publication.

My recent Google Scholar alert called attention to an article posted on F1000:

Advancing psychosocial care in cancer patients [version 1; referees: 3 approved]

 Who were the reviewers?

[Screenshot: open peer review reports for "Advancing psychosocial care in cancer patients"]

Google the names of the authors and reviewers. You will discover a pattern of co-authorship; leadership positions in the International Psycho-Oncology Society, a group promoting the mandating of specialty mental health services for cancer patients; and lots of jointly and separately authored articles making a pitch for increased involvement of mental health professionals in routine cancer care. This article adds almost nothing to what is available elsewhere in highly redundant publications.

Given a choice of reviewers, these authors would be unlikely to nominate me. Nonetheless, here is my review of the article.

 As I might do in a review of a manuscript, I’m not providing citations for these comments, but support can readily be found by a search of blog posts at my website @CoyneoftheRealm.com and Google Scholar search of my publications. I welcome queries from anybody seeking documentation of these points below.

 Fighting Spirit

The notion that a fighting spirit improves cancer patients' survival is popular in the lay press and in promotion of the power of the mind over cancer, but it has been thoroughly discredited.

Early on, the article identifies fighting spirit as an adaptive coping style. In actuality, fighting spirit was initially thought to predict mortality in a small, methodologically flawed study. But that is no longer claimed.

Even one of the authors of the original study, Maggie Watson, expressed relief when her own larger, better designed study failed to confirm the impression that a fighting spirit extended life after a diagnosis of cancer. Why? Dr. Watson was concerned that the concept was being abused to blame cancer patients who were dying, as if death were due to their personal deficiency of not having enough fighting spirit.

Fighting spirit is rather useless as a measure of psychological adaptation. It confounds severity of cancer-related dysfunction with efforts to cope with cancer.

Distress as the sixth vital sign for cancer patients

[Image: distress thermometer]

Beware of a marketing slogan posing as an empirical statement. Its emptiness is similar to that of "Pepsi is the one." Can you imagine anyone conducting a serious study in which they conclude "Pepsi is not the one"?

Once again in this article, a vacuous marketing slogan is presented in impressive but pseudo-medical terms. Distress cannot be a vital sign in the conventional sense. The vital signs are objective measurements that do not depend on patient self-report: body temperature, pulse rate, and respiration rate (rate of breathing). (Blood pressure is not considered a vital sign, but is often measured along with the vital signs.)

Pain was declared a fifth vital sign, with physicians mandated by guidelines to provide routine self-report screening of patients, regardless of their reason for visit. Pain being the fifth vital sign seems to have been the inspiration for declaring distress the sixth vital sign for cancer patients. However, policy makers' declaring pain the fifth vital sign did not result in improved patient pain levels. Subsequently making intervention mandatory for any report of pain led to a rise in unnecessary back and knee surgery, with a substantial rise in associated morbidity and loss of function. The next shift, to prescription of opioids that were claimed not to be addictive, was the beginning of the current epidemic of addiction to prescription opioids. Making pain the fifth vital sign killed a lot of patients and turned others into addicts craving drugs on the street because they had lost their prescriptions for the opioids that addicted them.


 Cancer as a mental health issue

There is a lack of evidence that cancer carries a greater risk of psychiatric disorder than other chronic and catastrophic illnesses. However, the myth that there is something unique or unusual about cancer's threat to mental health is commonly invoked by mental health professional advocacy groups to justify increased resources for their specialized services.

The article provides an inflated estimate of psychiatric morbidity by counting adjustment disorders as psychiatric disorders. Essentially, a cancer patient who seeks mental health interventions for distress qualifies by virtue of help seeking being defined as impairment.

The conceptual and empirical muddle of “distress” in cancer patients

The article repeats the standard sloganeering definition of distress that the authors and reviewers have circulated elsewhere.

It has been very broadly defined as "a multifactorial, unpleasant, emotional experience of a psychological (cognitive, behavioural, emotional), social and/or spiritual nature that may interfere with the ability to cope effectively with cancer, its physical symptoms and its treatment and that extends along a continuum, ranging from common normal feelings of vulnerability, sadness and fears to problems that can become disabling, such as depression, anxiety, panic, social isolation and existential and spiritual crisis"5

[You might try googling this. I’m sure you’ll discover an amazing number of repetitions in similar articles advocating increasing psychosocial services for cancer patients organized around this broad definition.]

Distress is so broadly defined and all-encompassing that there can be no meaningful independent validation of distress measures except by other measures of distress, not by conventional measures of adaptation or mental health. I have discussed this in a recent blog post.

If we restrict "distress" to the more conventional meaning of stress or negative affect, we find that any elevation in distress associated with the diagnosis of cancer (usually 35% or so of patients) tends to follow a natural trajectory of decline without formal intervention. Elevations in distress for most cancer patients are resolved within 3 to 6 months without intervention. The residual 9 to 11% of cancer patients with elevated distress is likely attributable to pre-existing psychiatric disorder.

Routine screening for distress

The slogan "distress is the sixth vital sign" is used to justify mandatory routine screening of cancer patients for distress. In the United States, surgeons cannot close the electronic medical record for a patient and go on to the next patient without recording whether they screened the patient for distress and, if the patient reported distress, what intervention was provided. Clinicians simply asking patients informally if they are distressed and responding to a "yes" by providing an antidepressant without further follow-up allows surgeons to close the medical record.

As I have done before, I challenge advocates of routine screening of cancer patients for distress to produce evidence that simply introducing routine screening without additional resources leads to better patient outcomes.

Routine screening for distress as uncovering unmet needs among cancer patients

Studies in the Netherlands suggest that there is not a significant increase in need for services from mental health or allied health professionals associated with a diagnosis of cancer. There is some disruption of such services that patients were receiving before diagnosis. It doesn't take screening and discussion to suggest to patients that they resume those services at some point if they wish. There is also some increased need for physical therapy and nutritional counseling.

If patients are simply asked whether they want a discussion of the services that are available (in Dutch: "Zou u met een deskundige willen praten over uw problemen?", roughly, "Would you like to talk with an expert about your problems?"), many will decline.

Much of the demand for supportive services like counseling and support groups, especially among breast cancer patients, is not from the most distressed patients. One of the problems with clinical trials of psychosocial interventions is that most of the patients who seek enrollment are not distressed, unless they are prescreened. This poses a dilemma: if we require elevated distress on a screening instrument, we end up rationing services and excluding many of the patients who would otherwise be receiving them.

I welcome clarification from F1000 about just what it offers over other preprint repositories. When one downloads a preprint from some other repositories, it clearly displays "not yet peer-reviewed." F1000 carries the advantage of the label "peer-reviewed," but the label does not seem to have been earned here.

Notes

Slides are from two recent talks at the Dutch International Congress on Insurance Medicine, Thursday, November 9, 2017, Almere, Netherlands:

Will primary care be automated screening and procedures or talking to patients and problem-solving? Invited presentation

and

Why you should not routinely screen your patients for depression and what you should do instead. Plenary Presentation

Stop using the Adverse Childhood Experiences Checklist to make claims about trauma causing physical and mental health problems

Scores on the adverse childhood experiences (ACE) checklist (or ACC) are widely used in making claims about the causal influence of childhood trauma on mental and physical health problems. Does anyone making these claims bother to look at how the checklist is put together and consider what a summary score might mean?

 



In this issue of Mind the Brain, we begin taking a skeptical look at the ACE checklist. We ponder some of the assumptions implicit in what items were included and how summary scores of the number of items checked are interpreted. Readers will be left with profound doubts that the ACE is suitable for making claims about trauma.

This blog will eventually be followed by another that presents the case that scores on the ACC do not represent a risk factor for health problems, only a relatively uninformative risk marker. In contrast to potentially modifiable risk factors, risk markers are best interpreted as calling attention to the influence of some combination of other risk factors, many of them as yet unspecified, but undoubtedly of an entirely different nature than what is being studied. What?!! You will have to stay tuned, but I'll give some hints about what I am talking about in the current blog post.

Summary of key points

 The ACE checklist is a collection of very diverse and ambiguous items that cannot be presumed to necessarily represent traumatic experiences.

Items variously

  • Represent circumstances that are not typically traumatic.
  • Reflect the respondent’s past or current psychopathology.
  • Treat as equivalent, and as traumatic, vastly different experiences, many of them neutral and some positive.
  • Represent a personal vulnerability due to familial transmission of psychopathology, direct or indirect, rather than simply an exposure to events.
  • Ignore crucial contextual information, including timing of events.

There is reason not to assume that higher summed scores for the ACE represent more exposure to trauma than lower scores.

Are the professionals misinterpreting the ACE checklist just careless, or are they ideologues selectively identifying "evidence" for positions that don't depend on evidence at all?

Witness claims based on research with the ACE that migraines are caused by sexual abuse and that psychotherapy addressing that abuse should be first-line treatment. Or claims that childhood trauma is as strong a risk factor for psychosis and schizophrenia as smoking is for lung cancer [*] and so psychotherapy is equivalent to medication in its effects. Or claims that myalgic encephalomyelitis, formerly known as chronic fatigue syndrome, is caused by childhood trauma and that psychological treatments can be recommended as the treatment of choice. These claims share a speculative, vague, neo-cryptic, pseudopsychoanalytic set of assumptions that is seldom articulated or explicitly confronted with evidence. Authors typically leap from claims about childhood trauma causing later problems to non sequitur claims about the efficacy of psychological intervention in treating these problems by addressing trauma. These claims about the efficacy of trauma-focused treatment are not borne out when the effects observed in randomized controlled trials are actually examined.

Rather than attempting to address a provocative question about investigator motivation without a ready way of answering it, I will show that most claims about trauma causing mental and physical health problems are, at best, based on very weak evidence if they depend solely on the ACE checklist.

I will leave it to my readers to decide whether some authors who make such a fuss about the ACE have bothered to look at the instrument, or care that it is so inappropriate for the purposes to which they put it.

The ACE is reproduced at the bottom of this post and it is a good idea to compare what I’m saying about it to the actual checklist.

What “science” is behind such speculations?

The ACE was originally intended for educational purposes, not as a scientific instrument. Perhaps that explains its gross deficiencies as a key measure of psychological and epidemiological constructs.

The ACE checklist is a collection of very different and ambiguous items that cannot be presumed to represent traumatic experiences.

The ACE consists of ten dichotomous items for which the respondent is asked to indicate yes/no whether an experience occurred before the age of 18. However, for six of the 10 items, the respondent is given further choices that often differ greatly in the kind of experience to which the items refer. Scoring of the instrument does not take into account which of these experiences is the basis of a response. For example,

5. Did you often feel that … You didn’t have enough to eat, had to wear dirty clothes, and had no one to protect you? or

Your parents were too drunk or high to take care of you or take you to the doctor if you needed it?

Yes   No     If yes enter 1     ________

This item treats some very different circumstances as equivalent. The first half is complex: it largely covers the experience of living in poverty, but combines that with "having no one to protect you." In contrast, the second half refers to substance abuse on the part of parents. In neither case is there any room for interpreting what mitigating circumstances in the respondent's life might have influenced effects of exposure. Presumably, the timing of this exposure would be important. If the exposure occurred only at the end of the 18-year period covered by the checklist, effects could be mitigated by other individual and social resources the respondent had.

Single items are added together in a summary score. We have to ask whether there is an equivalency between the two halves of an item that are treated as the same. This will be an accumulating concern as we go through the 10-item questionnaire.

The items vary greatly in the likelihood that they refer to an experience that was traumatic. Seldom do any of the researchers who use the ACE explain what they mean by trauma. If they did, I doubt that they could make a good argument that endorsing many of these items would indicate that a respondent had faced a trauma.

From the third edition of the American Psychiatric Association Diagnostic and Statistical Manual (DSM-III) onward to DSM-5, the assumption has been that a traumatic event is a catastrophic stressor outside the range of usual human experience.

With that criterion in mind, we have to ask whether items are likely to represent a traumatic experience for most people. In answering this question, we also have to ask how willing we are to consider a particular item equivalent to other items in arriving at an overall score reflecting exposure to trauma before age 18. Yet, if summary scores are to be meaningful, the assumption has to be made that items contribute equally if they are endorsed.

6. Were your parents ever separated or divorced?

Yes   No     If yes enter 1     ________

The item refers to a highly prevalent and complex event, the nature and consequences of which are likely to unfold over time. Importantly, we need a sense of context to judge whether the event is traumatic and, if so, how severe. Presumably, it would matter greatly when, across the 18-year span, the event occurred. No timing or other information is asked of the respondent, only whether or not this event occurred. Neither the respondent nor anyone interpreting a score on the inventory has further information as to what is meant.

Other problems with ambiguous items.

Questions can be raised about the validity of all the individual items and the wisdom of combining them as equivalent in creating a summary score.

Items 1 and 2: Items raise questions about what role the respondent played in eliciting the event.

 Did an event simply befall a respondent? Was it related to some pre-existing characteristic of the respondent? Or did the respondent have an active role in generating the event?

Did a parent or other adult member of the household often…

Swear at you, insult you, put you down, or humiliate you?

or

Act in a way that made you afraid that you might be physically hurt?

Yes   No     If yes enter 1     ________

And

Did a parent or other adult in the household often …

Push, grab, slap, or throw something at you?

or

Ever hit you so hard that you had marks or were injured?

Yes   No     If yes enter 1     ________

Here, as throughout the rest of the checklist, questions can be raised about whether these items refer simply to an environmental exposure in epidemiological terms, say, equivalent to asbestos or tobacco. We don't know the frequency, intensity, or context of the behavior in question, all of which may be crucial in evaluating whether a trauma occurred. For instance, it matters greatly if the behavior happened frequently when the respondent was a toddler or was limited to a struggle that occurred when the respondent was a teen, high on drugs, attempting to take the car keys and go for an after-midnight drive.

Like most of the rest of the questionnaire, there is the question of timing.

Item 3: There is so much ambiguity in endorsements of (ostensible) sexual abuse. Maybe it was a positive, liberating experience.

This is a crucial item and discussions of the ACE often assume that it is endorsed and represents a traumatic experience:

Did an adult or person at least 5 years older than you ever…

Touch or fondle you or have you touch their body in a sexual way?

or

Try to or actually have oral, anal, or vaginal sex with you?

Note that this is a complex item for which endorsement could be on the basis of a single instance of a person at least 5 years older touching or fondling the respondent. What if the presumed "perpetrator" is the 20-year-old boyfriend or girlfriend of a 14-year-old?

Are we willing to treat as equivalent "touch or fondle you" and "having anal sex" in all instances?

Arguably, the event being construed as trauma could actually be quite positive, as in the respondent forming a secure attachment with a somewhat older, but nonetheless appropriate, partner. All that is unconventional is not traumatic. What if the respondent and the alleged "perpetrator" were in a deeply intimate relationship or already married?

The research that attempts to link endorsement of such an item to lasting mental and physical health problems is remarkably contradictory and inconsistent.

Item 4:  Does this  item reflect the respondent’s serious clinical depression or other mental disorder before age 18 or currently, when the checklist is being completed?

Did you often feel that …  No one in your family loved you or thought you were important or special?    or

Your family didn’t look out for each other, feel close to each other, or support each other?

Yes   No     If yes enter 1     ________

As elsewhere in the checklist, there is no place for the respondent, or for someone interpreting a "yes" response, to take into account timing or contextual factors that might mitigate or compound effects of this "exposure."

Item 5: Is this a  traumatic exposure or an enduring set of circumstances conferring multiple known risks to mental and physical health?

Did you often feel that …

You didn’t have enough to eat, had to wear dirty clothes, and had no one to protect you?

or

Your parents were too drunk or high to take care of you or take you to the doctor if you needed it?

Yes   No     If yes enter 1     ________

This item has already been discussed above, but it is worth revisiting in terms of raising the issue of whether particular items refer either directly or indirectly to enduring sets of circumstances that pose their own enduring threat. The relevant question is whether items that ostensibly represent "traumatic events" and risk for subsequent problems are not risk factors, but only risk indicators, and not particularly informative ones.

Item 7: Could an ostensibly traumatic exposure actually be no exposure at all?

Was your mother or stepmother:

Often pushed, grabbed, slapped, or had something thrown at her?    or

Sometimes or often kicked, bitten, hit with a fist, or hit with something hard?    or

Ever repeatedly hit over at least a few minutes or threatened with a gun or knife?

Yes   No     If yes enter 1     ________

Like item 3, which refers to ostensible sexual abuse, this item seems to be one of the least ambiguous in terms of representing exposure to risk. But does it? We don't know the timing, duration, or context. For instance, the mother might no longer be in the home, and the respondent might not have known what happened at the time. There is even the possibility that the respondent was the "perpetrator" of such violence against the mother.

Items 8 and 9: Are traumatic exposures or indications of familial transmission of psychopathology?

Did you live with anyone who was a problem drinker or alcoholic or who used street drugs?

Yes   No

If yes enter 1     ________

And

Was a household member depressed or mentally ill or did a household member attempt suicide?    Yes   No     If yes enter 1     ________

These items are highly ambiguous. They don't take into consideration whether the person was a biological relative, or whether they were a parent, a sibling, or someone not biologically related. They don't take into account timing. There may not even have been any direct exposure to the substance misuse or the attempted suicide; the respondent may only later have learned of something that had been kept closeted.

Item 10: Traumatic exposure or relief from exposure?

Did a household member go to prison?

Yes   No

If yes enter 1     ________

The implications of endorsing this item depend greatly on who the household member was and the circumstances of their going to prison.

There may have been a familial relationship with this person, but it could have been an abusive stepparent or stepsibling, with the incarceration representing lasting relief from an oppressive situation. Or the person who became incarcerated may not have been an immediate family member but someone more transient, maybe someone who was just renting a room or given a place to stay. We just don't know.

Does adding up all these endorsements in a summary score clarify or confuse further?

Now add up your “Yes” answers:   _______   This is your ACE Score

 It would be useful to briefly review the assumptions involved in summing across items of a checklist and entering the summary score as a continuous variable in statistical analyses.

Classical test theory recognizes that the individual items may imperfectly reflect the underlying construct, in this case, traumatic exposure. However, in constructing a sum, the expectation is that the imperfections or errors of measurement in particular items cancel each other out. The summed score becomes a purer representation of the underlying construct than any of the original items. Thus, the summary score will be more reliable and valid than any of the individual items would be.

There are a number of problems in applying this assumption to a summary ACE score. The items are quite heterogeneous, i.e., they vary wildly in whether they are likely to represent a traumatic exposure and, if so, in the severity of that exposure. More importantly, there is a huge amount of variation in what these brief items would represent for particular individuals in the contexts they found themselves in over the first 18 years of their lives. Undoubtedly, most endorsements of these items would represent false positives if we hold ourselves to any strict definition of trauma. If we don't do so, we risk equating merely normative experiences that may have neutral or even positive effects on the respondent with serious exposures to traumatic events with lasting consequences.

We are not in a position to know whether a score of five or even eight necessarily represents more traumatic exposure than a score of one.
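To make the measurement problem concrete, here is a minimal, purely illustrative sketch in Python of how a simple count of "yes" answers can misorder respondents. The ten item labels paraphrase the checklist, but the "illustrative severity" weights, the weighted helper function, and the two hypothetical respondents are all invented for this example; they are not part of the ACE or of any validated scoring scheme. The actual ACE score is nothing more than the count of items endorsed.

```python
# Hypothetical illustration: summing heterogeneous yes/no items discards severity.
# Item labels paraphrase the ACE checklist; the severity weights are invented for
# this sketch and are NOT part of the ACE or any validated scoring scheme.

ITEMS = [
    ("verbal abuse / humiliation", 2),
    ("physical abuse", 3),
    ("sexual contact by someone 5+ years older", 3),
    ("felt unloved / family not close", 1),
    ("neglect / parents too impaired to provide care", 3),
    ("parents separated or divorced", 1),
    ("mother treated violently", 3),
    ("household substance abuse", 1),
    ("household mental illness or suicide attempt", 1),
    ("household member went to prison", 1),
]

def ace_score(endorsements):
    """Standard ACE scoring: one point per item endorsed, regardless of which item."""
    return sum(endorsements)

def illustrative_severity(endorsements):
    """A made-up weighted score, only to show how much information the plain count throws away."""
    return sum(weight for (_, weight), yes in zip(ITEMS, endorsements) if yes)

# Respondent A endorses four arguably low-impact items: parental divorce, household
# substance abuse, household mental illness, and an incarcerated household member.
a = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
# Respondent B endorses two items that most clinicians would consider severe exposures.
b = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]

for name, resp in [("A", a), ("B", b)]:
    print(name, "ACE score:", ace_score(resp),
          "| illustrative severity:", illustrative_severity(resp))
# A's ACE score (4) exceeds B's (2), yet under the made-up weights B's exposures are
# more severe. The count alone cannot distinguish the two situations.
```

Even these weights are a fiction, of course: as argued throughout this post, severity depends on timing, frequency, and context that the checklist never asks about, so no fixed weighting could rescue the summary score.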

Moreover, there is important empirical research on the clustering of events. We certainly cannot consider them random and unrelated. One classic study found:

In our data, total CCA was related to depressive symptoms, drug use, and antisocial behavior in a quadratic manner. Without further elucidation, this higher order relationship could have been interpreted as support for a sensitization process in which the long-term impact of each additional adversity on mental health compounds as childhood adversity accumulates. However, further analysis revealed that this acceleration effect was an artifact of the confounding of high cumulative adversity scores with the experience of more severe events. Thus, respondents with higher total CCA had disproportionately poorer emotional and behavioral functioning because of both the number and severity of the adversities they were exposed to, not the cumulative number of different types of adversities experienced.

And

Because low-impact adversities did not present a cumulative hazard to young adult mental health, they functioned as suppressor events in the total sum score, consistent with Turner and Wheaton’s (1997) expectation. Their inclusion increased the “noise” in the score and greatly watered down the influence of high-impact events. Thus, in addition to decreasing efficiency, total scores may seriously underestimate the cumulative effects of severe forms of childhood adversity, such as abuse and serious neglect.

But what if many or most of the high scores in a particular sample represent only a clustering of low- or no-impact adversities?

Another key large-sample study cautioned:

Significant effects of parental separation/divorce in predicting subsequent mood disorders and addictive disorders are powerfully affected by whether or not there was parental violence and psychopathology in the household prior to the break-up and whether exposure to these adversities was reduced as a result of the separation (Kessler et al. 1997a). There are some situations – such as one in which the father was a violent alcoholic – where our data suggest that parental divorce and subsequent removal of the respondent from exposure to the father might actually be associated with a significant improvement in the respondent’s subsequent disorder risk profile, a possibility that has important social policy implications.

[Image: "Finding Your ACE Score" checklist]

NOTE

*Richard Bentall commonly interprets summed ACE scores in peer-reviewed articles as having a traditional dose-response association with mental health outcomes, and therefore as representing a modifiable causal factor in psychosis. In books and on social media, his claims become simply absurd.


I don't think his interpretations withstand scrutiny of the items and of what a summed score might conceivably represent.

Preorders are being accepted for e-books providing skeptical looks at mindfulness and positive psychology, and arming citizen scientists with critical thinking skills.

I will also be offering scientific writing courses on the web as I have been doing face-to-face for almost a decade. I want to give researchers the tools to get into the journals where their work will get the attention it deserves.

Sign up at my website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. Lots to see at CoyneoftheRealm.com.