When psychotherapy trials have multiple flaws…

Multiple flaws pose more threats to the validity of psychotherapy studies than would be inferred when the individual flaws are considered independently.

We can learn to spot features of psychotherapy trials that are likely to lead to exaggerated claims of efficacy for treatments or to claims that will not generalize beyond the sample being studied in a particular clinical trial. We can look at the adequacy of sample size, and spot what the Cochrane Collaboration has defined as risk of bias in its handy assessment tool.

We can look at the case-mix in the particular sites where patients were recruited. We can examine the adequacy of the diagnostic criteria that were used for entering patients into a trial. We can examine how blinded the trial was, in terms of whoever assigned patients to particular conditions, but also in terms of whether the patients, the treatment providers, and the evaluators knew to which condition particular patients were assigned.

And so on. But what about combinations of these factors?

We typically do not pay enough attention to multiple flaws in the same trial. I include myself among the guilty. We may suspect that flaws are seldom simply additive in their effect, but we don’t consider whether there may even be synergism in their negative effects on the validity of a trial. As we will see in this analysis of a clinical trial, multiple flaws can pose more threats to the validity of a trial than we might infer when the individual flaws are considered independently.

The particular paper we are probing is described in its discussion section as the “largest RCT to date testing the efficacy of group CBT for patients with CFS.” It also takes on added importance because two of the authors, Gijs Bleijenberg and Hans Knoop, are considered leading experts in the Netherlands. The treatment protocol was developed over time by the Dutch Expert Centre for Chronic Fatigue (NKCV, http://www.nkcv.nl; Knoop and Bleijenberg, 2010). Moreover, these senior authors dismiss any criticism and even ridicule critics. This study is cited as support for their overall assessment of their own work.  Gijs Bleijenberg claims:

Cognitive behavioural therapy is still an effective treatment, even the preferential treatment for chronic fatigue syndrome.

But

Not everybody endorses these conclusions, however their objections are mostly baseless.

Spoiler alert

This is a long read blog post. I will offer a summary for those who don’t want to read through it, but who still want the gist of what I will be saying. However, as always, I encourage readers to be skeptical of what I say and to look to my evidence and arguments and decide for themselves.

Authors of this trial stacked the deck to demonstrate that their treatment is effective. They are striving to support the extraordinary claim that group cognitive behavior therapy fosters not only better adaptation, but actually recovery from what is internationally considered a physical condition.

There are some obvious features of the study that contribute to the likelihood of a positive effect, but these features need to be considered collectively, in combination, to appreciate the strength of this effort to guarantee positive results.

This study represents the perfect storm of design features that operate synergistically:

Referral bias – Trial conducted in a single specialized treatment setting known for advocating the view that psychological factors maintain physical illness.

Strong self-selection bias of a minority of patients enrolling in the trial seeking a treatment they otherwise cannot get.

Broad, overinclusive diagnostic criteria for entry into the trial.

The active treatment condition carries a strong message that patients should report improvement at outcome assessment.

An unblinded trial with a waitlist control lacking the nonspecific elements (placebo) that confound the active treatment.

Subjective self-report outcomes.

Specifying a clinically significant improvement that required only that a primary outcome score be lower than the score needed for entry into the trial.

Deliberate exclusion of relevant objective outcomes.

Avoidance of any recording of negative effects.

Despite the prestige attached to this trial in Europe, the US Agency for Healthcare Research and Quality (AHRQ) excludes this trial from providing evidence for its database of treatments for chronic fatigue syndrome/myalgic encephalomyelitis. We will see why in this post.

The take away message: Although not many psychotherapy trials incorporate all of these factors, most trials have some. We should be more sensitive to when multiple factors occur in the same trial, like bias in the site for patient recruitment, lack of blinding, lack of balance between active treatment and control condition in terms of nonspecific factors, and subjective self-report measures.

The article reporting the trial is

Wiborg JF, van Bussel J, van Dijk A, Bleijenberg G, Knoop H. Randomised controlled trial of cognitive behaviour therapy delivered in groups of patients with chronic fatigue syndrome. Psychotherapy and Psychosomatics. 2015;84(6):368-76.

Unfortunately, the article is currently behind a paywall. Perhaps readers could contact the corresponding author, Hans.knoop@radboudumc.nl, and request a PDF.

The abstract

Background: Meta-analyses have been inconclusive about the efficacy of cognitive behaviour therapies (CBTs) delivered in groups of patients with chronic fatigue syndrome (CFS) due to a lack of adequate studies. Methods: We conducted a pragmatic randomised controlled trial with 204 adult CFS patients from our routine clinical practice who were willing to receive group therapy. Patients were equally allocated to therapy groups of 8 patients and 2 therapists, 4 patients and 1 therapist or a waiting list control condition. Primary analysis was based on the intention-to-treat principle and compared the intervention group (n = 136) with the waiting list condition (n = 68). The study was open label. Results: Thirty-four (17%) patients were lost to follow-up during the course of the trial. Missing data were imputed using mean proportions of improvement based on the outcome scores of similar patients with a second assessment. Large and significant improvement in favour of the intervention group was found on fatigue severity (effect size = 1.1) and overall impairment (effect size = 0.9) at the second assessment. Physical functioning and psychological distress improved moderately (effect size = 0.5). Treatment effects remained significant in sensitivity and per-protocol analyses. Subgroup analysis revealed that the effects of the intervention also remained significant when both group sizes (i.e. 4 and 8 patients) were compared separately with the waiting list condition. Conclusions: CBT can be effectively delivered in groups of CFS patients. Group size does not seem to affect the general efficacy of the intervention which is of importance for settings in which large treatment groups are not feasible due to limited referral

The trial registration

http://www.isrctn.com/ISRCTN15823716

Who was enrolled into the trial?

Who gets into a psychotherapy trial is a function of the particular treatment setting of the study, the diagnostic criteria for entry, and patient preferences for getting their care through a trial, rather than what is being routinely provided in that setting.

We need to pay particular attention when patients enter psychotherapy trials hoping to receive a treatment they prefer and hoping not to be assigned to the other condition. Patients may be in a clinical trial for the betterment of science, but in some settings they are willing to enroll because of the chance of getting a treatment they otherwise could not get. This in turn affects their evaluation both of the condition in which they get the preferred treatment and of the condition in which they are denied it. Simply put, they register being pleased if they got what they wanted and displeased if they did not.

The setting is relevant to evaluating who was enrolled in a trial.

The authors’ own outpatient clinic at the Radboud University Medical Center was the site of the study. The group has an international reputation for promoting the biopsychosocial model, in which psychological factors are assumed to be the decisive factor in maintaining somatic complaints.

All patients were referred to our outpatient clinic for the management of chronic fatigue.

There is thus a clear referral bias, or case-mix bias, but we are not provided a ready basis for quantifying it or even estimating its effects.

The diagnostic criteria.

The article states:

In accordance with the US Center for Disease Control [9], CFS was defined as severe and unexplained fatigue which lasts for at least 6 months and which is accompanied by substantial impairment in functioning and 4 or more additional complaints such as pain or concentration problems.

Actually, the US Center for Disease Control would now reject this trial because these entry criteria are considered obsolete, overinclusive, and not sufficiently exclusive of other conditions that might be associated with chronic fatigue.*

There is a real paradigm shift happening in America. Both the 2015 IOM Report and the Centers for Disease Control and Prevention (CDC) website emphasize Post Exertional Malaise and getting more ill after any effort with M.E. CBT is no longer recommended by the CDC as treatment.

The only mandatory symptom for inclusion in this study is fatigue lasting 6 months. Most properly, this trial targets chronic fatigue [period] and not the condition, chronic fatigue syndrome.

Current US CDC recommendations (see Box 7-1 from the IOM document, above) for diagnosis require postexertional malaise for a diagnosis of myalgic encephalomyelitis (ME). See below.

Patients meeting the current American criteria for ME would be eligible for enrollment in this trial, but it’s unclear what proportion of the patients enrolled actually met the American criteria. Because of the over-inclusiveness of the entry diagnostic criteria, it is doubtful whether the results would generalize to an American sample. A look at patient flow into the study will be informative.

Patient flow

Let’s look at what is said in the text, but also in the chart depicting patient flow into the trial for any self-selection that might be revealed.

In total, 485 adult patients were diagnosed with CFS during the inclusion period at our clinic (fig. 1). One hundred and fifty-seven patients were excluded from the trial because they declined treatment at our clinic, were already asked to participate in research incompatible with inclusion (e.g. research focusing on individual CBT for CFS) or had a clinical reason for exclusion (i.e. they received specifically tailored interventions because they were already unsuccessfully treated with individual CBT for CFS outside our clinic or were between 18 and 21 years of age and the family had to be involved in the therapy). Of the 328 patients who were asked to engage in group therapy, 99 (30%) patients indicated that they were unwilling to receive group therapy. In 25 patients, the reason for refusal was not recorded. Two hundred and four patients were randomly allocated to one of the three trial conditions. Baseline characteristics of the study sample are presented in table 1. In total, 34 (17%) patients were lost to follow-up. Of the remaining 170 patients, 1 patient had incomplete primary outcome data and 6 patients had incomplete secondary outcome data.

We see that the investigators invited two thirds of the patients diagnosed at the clinic to enroll in the trial. Of those invited, 38% were not randomized (124 of 328). We don’t know the reason for some of the refusals, but almost a third of the patients approached declined because they did not want group therapy. The authors were left able to randomize 42% of the patients coming to the clinic, or less than two thirds of the patients they actually asked. Of these patients, a little more than two thirds received the treatment to which they were randomized and were available for follow-up.
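Readers who want to recheck these proportions can do so from the counts in the quoted passage alone. What follows is a minimal sketch in Python: the variable names are mine, and the only inputs are the numbers the authors report in the text. It does not use the flow chart, and it can say nothing about how many patients actually received the treatment to which they were allocated, because the quoted passage reports only loss to follow-up.

```python
# Counts taken from the patient-flow passage quoted above.
# Variable names are my own labels, not the authors'.
diagnosed = 485                    # adults diagnosed with CFS during inclusion
excluded_before_invitation = 157   # declined clinic treatment, other research, clinical reasons
unwilling_group_therapy = 99       # said they were unwilling to receive group therapy
reason_not_recorded = 25           # refusal reason not recorded
randomized = 204
lost_to_follow_up = 34

invited = diagnosed - excluded_before_invitation                  # asked to engage in group therapy
not_randomized = unwilling_group_therapy + reason_not_recorded    # invited but never randomized
followed_up = randomized - lost_to_follow_up

# Sanity check: the reported counts are internally consistent.
assert invited - not_randomized == randomized


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


flow = {
    "invited / diagnosed": pct(invited, diagnosed),
    "not randomized / invited": pct(not_randomized, invited),
    "unwilling (group therapy) / invited": pct(unwilling_group_therapy, invited),
    "randomized / diagnosed": pct(randomized, diagnosed),
    "randomized / invited": pct(randomized, invited),
    "followed up / randomized": pct(followed_up, randomized),
    "followed up / diagnosed": pct(followed_up, diagnosed),
}

for label, value in flow.items():
    print(f"{label}: {value}%")
```

The last ratio is the one I find most telling: the patients with a second assessment are roughly a third of those diagnosed at the clinic during the inclusion period.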

These patients, who received the treatment to which they were randomized and who were available for follow-up, are a self-selected minority of the patients coming to the clinic. This self-selection process likely reduced the proportion of patients with myalgic encephalomyelitis. It is estimated that 25% of patients meeting the American criteria are housebound and 75% are unable to work. It is reasonable to infer that patients meeting the full criteria would opt out of a treatment that requires regular attendance at group sessions.

The trial is biased toward ambulatory patients with fatigue and not ME. Their fatigue is likely due to some combination of factors such as multiple co-morbidities, as-yet-undiagnosed medical conditions, drug interactions, and the common mild and subsyndromal anxiety and depressive symptoms that characterize primary care populations.

The treatment being evaluated

Group cognitive behavior therapy for chronic fatigue syndrome, either delivered in a small (4 patients and 1 therapist) or larger (8 patients and 2 therapists) group format.

The intervention consisted of 14 group sessions of 2 h within a period of 6 months followed by a second assessment. Before the intervention started, patients were introduced to their group therapist in an individual session. The intervention was based on previous work of our research group [4,13] and included personal goal setting, fixing sleep-wake cycles, reducing the focus on bodily symptoms, a systematic challenge of fatigue-related beliefs, regulation and gradual increase in activities, and accomplishment of personal goals. A formal exercise programme was not part of the intervention.

Patients received a workbook with the content of the therapy. During sessions, patients were explicitly invited to give feedback about fatigue-related cognitions and behaviours to fellow patients. This aspect was introduced to facilitate a pro-active attitude and to avoid misperceptions of the sessions as support group meetings which have been shown to be insufficient for the treatment of CFS.

And note:

In contrast to our previous work [4], we communicated recovery in terms of fatigue and disabilities as general goal of the intervention.

Some impressions of the intensity of this treatment. This is a rather intensive treatment with patients having considerable opportunities for interactions with providers. This factor alone distinguishes being assigned to the intervention group versus being left in the wait list control group and could prove powerful. It will be difficult to distinguish intensity of contact from any content or active ingredients of the therapy.

I’ll leave for another time a fuller discussion of the extent to which what was labeled as cognitive behavior therapy in this study is consistent with cognitive therapy as practiced by Aaron Beck and other leaders of the field. However, a few comments are warranted. What is offered in this trial does not sound like cognitive therapy as Americans practice it. What is offered in this trial seems to emphasize challenging beliefs and pushing patients to get more active, along with psychoeducational activities. I don’t see indications of the supportive, collaborative relationship in which patients are encouraged to work on what they want to work on, engage in outside activities (homework assignments) and get feedback.

What is missing in this treatment is what Beck calls collaborative empiricism, “a systemic process of therapist and patient working together to establish common goals in treatment, has been found to be one of the primary change agents in cognitive-behavioral therapy (CBT).”

Importantly, in Beck’s approach, the therapist does not assume cognitive distortions on the part of the patient. Rather, in collaboration with the patient, the therapist introduces alternatives to the interpretations that the patient has been making and encourages the patient to consider the difference. In contrast, rather than eliciting goal statements from patients, therapists in this study impose the goal of increased activity. Therapists in this study also seem ready to impose their view that the patients’ fatigue-related beliefs are maladaptive.

The treatment offered in this trial is complex, with multiple components making multiple assumptions that seem quite different from what is called cognitive therapy or cognitive behavioral therapy in the US.

The authors’ communication of recovery from fatigue and disability seems a radical departure not only from cognitive behavior therapy for anxiety, depression, and pain, but also from cognitive behavior therapy offered for adaptation to acute and chronic physical illnesses. We will return to this “communication” later.

The control group

Patients not randomized to group CBT were placed on a waiting list.

Think about it! What do patients think about having gotten involved in all the inconvenience and burden of a clinical trial in the hope that they would get treatment, and then being assigned to a control group that just waits? Not only are they going to be disappointed and register that in their subjective evaluations at the outcome assessments; they may also worry about jeopardizing their right to the treatment they are waiting for if they overly endorse positive outcomes. There is a potential for a nocebo effect, compounding the placebo effect of assignment to the CBT active treatment groups.

What are informative comparisons between active treatments and control conditions?

We need to ask more often what inclusion of a control group accomplishes for the evaluation of a psychotherapy. In doing so, we need to keep in mind that psychotherapies do not have effect sizes; only comparisons of psychotherapies and control conditions have effect sizes.

A pre-post evaluation of psychotherapy from baseline to follow-up includes the effects of any active ingredient in the psychotherapy, a host of nonspecific (placebo) factors, and any changes that would have occurred in the absence of the intervention. These include regression to the mean: patients are more likely to enter a clinical trial now, rather than later or previously, if there has been an exacerbation of their symptoms.

So, a proper comparison/control condition includes everything that the patients randomized to the intervention group get except for the active treatment. Ideally, the intervention and the comparison/control group are equivalent on all these factors, except the active ingredient of the intervention.

That is clearly not what is happening in this trial. Patients randomized to the intervention group get the intervention, the added intensity and frequency of contact with professionals that the intervention provides, all the support that goes with it, and the positive expectations that come with getting a therapy that they wanted.

Attempts to evaluate the group CBT versus the wait-list control group confound the active ingredients of the CBT with all these nonspecific effects. The deck is clearly being stacked in favor of CBT.

This may be a randomized trial, but properly speaking, this is not a randomized controlled trial, because the comparison group does not control for nonspecific factors, which are imbalanced.
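The confounding described in this section can be put as simple arithmetic. The sketch below is purely illustrative: the three numbers are hypothetical values I made up, not estimates from this or any other trial, and the additive model is itself a simplifying assumption (my argument is that these influences may well interact rather than merely add up). It only shows which components a given comparison can and cannot separate.

```python
def observed_change(active: float, nonspecific: float, natural: float) -> float:
    """Pre-post change under a simple additive model (an assumption, not a finding)."""
    return active + nonspecific + natural


# HYPOTHETICAL values in arbitrary outcome units, chosen only for illustration.
ACTIVE = 2.0       # effect of the active ingredient of the therapy
NONSPECIFIC = 5.0  # contact time, support, positive expectations (placebo factors)
NATURAL = 3.0      # change that would occur anyway, e.g. regression to the mean

cbt = observed_change(ACTIVE, NONSPECIFIC, NATURAL)
wait_list = observed_change(0.0, 0.0, NATURAL)
matched_control = observed_change(0.0, NONSPECIFIC, NATURAL)  # e.g. a support group with equal contact

# A wait-list comparison cannot separate the active ingredient from nonspecific factors.
difference_vs_wait_list = cbt - wait_list

# A control matched on nonspecific factors isolates the active ingredient.
difference_vs_matched = cbt - matched_control

print(f"CBT vs wait list:       {difference_vs_wait_list}")
print(f"CBT vs matched control: {difference_vs_matched}")
```

Any nocebo effect in the wait-list group would enter this model as a negative nonspecific term for the control, widening the apparent difference further.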

The unblinded nature of the trial

In RCTs of psychotropic drugs, the ideal is to compare the psychotropic drug to an inert pill placebo, with providers, patients, and evaluators being blinded as to whether the patients received the psychotropic drug or the comparison pill.

While it is difficult to achieve a comparable level of blindness in a psychotherapy trial, more of an effort to achieve blindness is desirable. For instance, in this trial, the authors took pains to distinguish the CBT from what would have happened in a support group. A much more adequate comparison would therefore be CBT versus either a professional or peer-led support group with equivalent amounts of contact time. Further blinding would be possible if patients were told only that two forms of group therapy were being compared. If that were the information available to patients contemplating consenting to the trial, it wouldn’t have been so obvious from the outset to the patients being randomly assigned that one group was preferable to the other.

Subjective self-report outcomes.

The primary outcomes for the trial were the fatigue subscale of the Checklist Individual Strength; the physical functioning subscale of the 36-item Short Form Health Survey (SF-36); and overall impairment as measured by the Sickness Impact Profile (SIP).

Realistically, self-report outcomes are often all that is available in many psychotherapy trials. Commonly these are self-report assessments of anxiety and depressive symptoms, although these may be supplemented by interviewer-based assessments. We don’t have objective biomarkers with which to evaluate psychotherapy.

These three self-report measures are relatively nonspecific, particularly in a population that is not characterized by ME. Self-reported fatigue in a primary care population lacks discriminative validity with respect to pain, anxiety and depressive symptoms, and general demoralization. The measures are susceptible to receipt of support and re-moralization, as well as gratitude for obtaining a treatment that was sought.

Self-report entry criteria include a score of 35 or higher on the fatigue severity subscale. Yet a score of less than 35 on this scale at follow-up is part of what is defined as a clinically significant improvement, based on a composite score from combined self-report measures.
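To make concrete how little movement this requires on the fatigue scale, here is a minimal sketch. It encodes only the fatigue cutoff stated above; the function names are mine, and the other components of the authors’ composite definition are not modeled because I have not described them here.

```python
ENTRY_CUTOFF = 35  # fatigue severity score required for entry, as stated in the text


def eligible_for_trial(fatigue_score: int) -> bool:
    """Entry criterion on the fatigue severity subscale: 35 or higher."""
    return fatigue_score >= ENTRY_CUTOFF


def meets_fatigue_component(follow_up_score: int) -> bool:
    """Fatigue component of 'clinically significant improvement': below 35 at follow-up.

    Only one part of the composite; the remaining parts are not modeled here.
    """
    return follow_up_score < ENTRY_CUTOFF


# A hypothetical patient who enters at the cutoff and reports one point less at follow-up.
baseline, follow_up = 35, 34

print(eligible_for_trial(baseline))        # True
print(meets_fatigue_component(follow_up))  # True
print(baseline - follow_up)                # 1
```

A one-point shift in a self-report score, well within what expectations or gratitude could produce, is enough to move such a patient from eligible to having met the fatigue part of the improvement definition.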

We know from medical trials that differences can be observed with subjective self-report measures that will not be found with objective measures. Thus, mildly asthmatic patients will fail to distinguish in their subjective self-reports between the effective inhalant albuterol, an inert inhalant, and sham acupuncture, but will rate all three as producing more improvement than no intervention. However, albuterol shows a strong advantage over the other three conditions on an objective measure, maximum forced expiratory volume in 1 second (FEV1) as assessed with spirometry.

The suppression of objective outcome measures

We cannot let the authors of this trial off the hook for their dependence on subjective self-report outcomes. They are instructing patients that recovery is the goal, which implies that it is an attainable goal. We can reasonably be skeptical about a claim of recovery based on changes in self-report measures. Were the patients actually able to exercise? What was their exercise capacity, as objectively measured? Did they return to work?

These authors have included such objective measurements in past studies, but have not included them as primary outcomes nor, in some cases, even reported them in the main paper reporting the trial.

Wiborg JF, Knoop H, Stulemeijer M, Prins JB, Bleijenberg G. How does cognitive behaviour therapy reduce fatigue in patients with chronic fatigue syndrome? The role of physical activity. Psychol Med. 2010 Jan 5:1

The senior authors’ review fails to mention their three studies using actigraphy that did not find effects for CBT. I am unaware of any studies that did find enduring effects.

Perhaps this is what they mean when they say the protocol has been developed over time – they removed what they found to be threats to the findings that they wanted to claim.

Dismissal of any need to consider negative effects of treatment

Most psychotherapy trials fail to assess any adverse effects of treatment, but this is usually done discreetly, without mention. In contrast, this article states:

Potential harms of the intervention were not assessed. Previous research has shown that cognitive behavioural interventions for CFS are safe and unlikely to produce detrimental effects.

Patients who meet stringent criteria for ME would be put at risk by pressure to exert themselves. By definition they are vulnerable to postexertional malaise (PEM). Any trial of this nature needs to assess that risk. Maybe no adverse effects would be found. If that were so, it would strongly indicate the absence of patients with appropriate diagnoses.

Timing of assessment of outcomes varied between intervention and control group.

I at first did not believe what I was reading when I encountered this statement in the results section.

The mean time between baseline and second assessment was 6.2 months (SD = 0.9) in the control condition and 12.0 months (SD = 2.4) in the intervention group. This difference in assessment duration was significant (p < 0.001) and was mainly due to the fact that the start of the therapy groups had to be frequently postponed because of an irregular patient flow and limited treatment capacities for group therapy at our clinic. In accordance with the treatment manual, the second assessment was postponed until the fourteenth group session was accomplished. The mean time between the last group session and the second assessment was 3.3 weeks (SD = 3.5).

So, outcomes were assessed for the intervention group shortly after completion of therapy, when nonspecific (placebo) effects would be stronger, but a mean of six months later than for patients assigned to the control condition.

Post-hoc statistical controls are not sufficient to rescue the study from this important group difference, and it compounds other problems in the study.

Take away lessons

Pay more attention to how the limitations of any clinical trial may compound each other, leading the trial to provide exaggerated estimates of the effects of treatment or of the generalizability of the results to other settings.

Be careful of loose diagnostic criteria, because the results of a trial may not generalize when the same criteria are applied in settings that differ in terms of either the patient population or the availability of different treatments. This is particularly important when a treatment setting has a bias in referrals and only a minority of the patients invited to participate in the trial actually agree and are enrolled.

Ask questions about just what information is obtained by comparing the active treatment group in the study to its control/comparison group. For a start, just what is being controlled, and how might that affect the estimates of the effectiveness of the active treatment?

Pay particular attention to the potent combination of the trial being unblinded, a weak comparison/control, and an active treatment that is not otherwise available to patients.

Note

*The means of determining whether the six months of fatigue might be accounted for by other medical factors was specific to the setting. Note that a review of medical records was sufficient for an unknown proportion of patients, with no further examination or medical tests.

The Department of Internal Medicine at the Radboud University Medical Center assessed the medical examination status of all patients and decided whether patients had been sufficiently examined by a medical doctor to rule out relevant medical explanations for the complaints. If patients had not been sufficiently examined, they were seen for standard medical tests at the Department of Internal Medicine prior to referral to our outpatient clinic. In accordance with recommendations by the Centers for Disease Control, sufficient medical examination included evaluation of somatic parameters that may provide evidence for a plausible somatic explanation for prolonged fatigue [for a list, see [9]. When abnormalities were detected in these tests, additional tests were made based on the judgement of the clinician of the Department of Internal Medicine who ultimately decided about the appropriateness of referral to our clinic. Trained therapists at our clinic ruled out psychiatric comorbidity as potential explanation for the complaints in unstructured clinical interviews.

Power Poseur: The lure of lucrative pseudoscience and the crisis of untrustworthiness of psychology

This is the second of two segments of Mind the Brain aimed at redirecting the conversation concerning power posing to the importance of conflicts of interest in promoting and protecting its scientific status. 

The market value of many lines of products offered to consumers depends on their claims of being “science-based”. Products from psychologists that invoke wondrous mind-body or brain-behavior connections are particularly attractive. My colleagues and I have repeatedly scrutinized such claims, sometimes reanalyzing the original data, and consistently find the claims false or premature and exaggerated.

There is so little risk and so much money and fame to be gained in promoting questionable and even junk psychological science to lay audiences. Professional organizations confer celebrity status on psychologists who succeed, provide them with forums and free publicity that enhance their credibility, and protect their claims of being “science-based” from critics.

How much money academics make from popular books, corporate talks, and workshops and how much media attention they garner serve as alternative criteria for a successful career, sometimes seeming to be valued more than the traditional ones of quality and quantity of publications and the amount of grant funding obtained.

Efforts to improve the trustworthiness of what psychologists publish in peer-reviewed journals have no parallel in any efforts to improve the accuracy of what psychologists say to the public outside of the scientific literature.

By the following reasoning, there may be limits to how much the former efforts at reform can succeed without the latter. In the hypercompetitive marketplace, only the most dramatic claims gain attention. Seldom are the results of rigorously done, transparently reported scientific work strong and unambiguous enough to back up the claims with the broadest appeal, especially in psychology. Psychologists who remain in academic settings but want to market their merchandise to consumers face a dilemma: How much do they have to hype and distort their findings in peer-reviewed journals to fit with what they say to the public?

It is important for readers of scientific articles to know that authors are engaged in these outside activities and are under pressure to obtain particular results. The temptation of being able to make bold claims clashes with the requirements to conduct solid science and report results transparently and completely. Let readers decide if this matters for their receptivity to what authors say in peer-reviewed articles by having information available to them. But almost never is a conflict of interest declared. Just search articles in Psychological Science and see if you can find a single declaration of a COI, even when the authors have booking agents and give high-priced corporate talks and seminars.

The discussion of the quality of science backing power posing should have been shorter.

Up until now, much attention to power posing in academic circles has been devoted to the quality of the science behind it, whether results can be independently replicated, and whether critics have behaved badly. The last segment of Mind the Brain examined the faulty science of the original power posing paper in Psychological Science and showed why it could not contribute a credible effect size to the literature.

The discussion of the science behind power posing should have been much shorter and should have reached a definitive conclusion: the original power posing paper should never have been published in Psychological Science. Once the paper had been published, a succession of editors failed in their expanded Pottery-Barn responsibility to publish critiques by Steven J. Stanton  and by Marcus Crede and Leigh A. Phillips that were quite reasonable in their substance and tone. As is almost always the case, bad science was accorded an incumbent advantage once it was published. Any disparagement or criticism of this paper would be held by editors to strict and even impossibly high standards if it were to be published. Let’s review the bad science uncovered in the last blog. Readers who are familiar with that post can skip to the next section.

A brief unvarnished summary of the bad science of the original power posing paper as a biobehavioral intervention study

Reviewers of the original paper should have balked at the uninformative and inaccurate abstract. Minimally, readers need to know at the outset that there were only 42 participants (26 females and 16 males) in the study comparing high-power versus low-power poses. Studies with so few participants cannot be expected to provide reproducible effect sizes. Furthermore, there is no basis for claiming that results held for both men and women because that claim depended on analyses with even smaller numbers. Note the 16 males were distributed in some unknown way across the two conditions. If power is fixed by the smaller cell size, even the optimal 8 males per cell is far too few to yield a credible effect size. Any apparent significant effects in this study are likely to be meaning imposed on noise.
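To put rough numbers on how little such cell sizes can tell us, here is a minimal sketch using the standard large-sample approximation for the standard error of Cohen's d. The even 21/21 split of the 42 participants is my assumption for illustration, not a figure from the paper, and the function names are hypothetical.

```python
import math


def cohen_d_standard_error(n1: int, n2: int, d: float = 0.0) -> float:
    """Large-sample approximation to the standard error of Cohen's d."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))


def ci_half_width(n1: int, n2: int, d: float = 0.0, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval around d."""
    return z * cohen_d_standard_error(n1, n2, d)


# ASSUMPTION: the 42 participants were split evenly across the two poses.
full_sample = ci_half_width(21, 21)

# Best case for the men-only analysis: the 16 males split 8 and 8.
males_only = ci_half_width(8, 8)

print(f"21 per cell: d +/- {full_sample:.2f}")
print(f" 8 per cell: d +/- {males_only:.2f}")
```

Under these assumptions the interval around a true effect of zero spans roughly 0.6 standard deviations in either direction for the full sample and nearly a full standard deviation for the men alone, so an observed "medium" or even "large" effect is well within what noise would produce.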

The end sentence in the abstract is an outrageously untrue statement of results. Yet, as we will see, it served as the basis of a product launch worth in the seven-figure range that was already taking shape:

That a person can, by assuming two simple 1-minute poses, embody power and instantly become more powerful has real-world, actionable implications.

Aside from the small sample size, as an author, editor and critic in clinical and health psychology for over 40 years, I greet a claim of ‘real-world actionable implications’ from two one-minute manipulations of participants’ posture with extreme skepticism. My skepticism grows as we delve into the details of the study.

Investigators’ collecting a single pair of pre-post assessments of salivary cortisol is at best a meaningless ritual, and can contribute nothing to understanding what is going on in the study at a hormonal level.

Men in the age range of the participants in this study have six times more testosterone than women. Statistical “control” of testosterone by controlling for gender is a meaningless gesture producing uninterpretable results. Controlling for baseline testosterone in analyses of cortisol and vice versa eliminates any faint signal in the loud noise of the hormonal data.

Although it was intended as a manipulation check (and subsequently claimed as evidence of the effect of power posing on feelings), the crude subjective self-report ratings of how “powerful” and “in charge” participants felt on a 1-4 scale could simply communicate the experimenters’ expectancies to participants. Endorsing whether they felt more powerful indicated how smart participants were and whether they were willing to go along with the purpose of the study. Inferences beyond that uninteresting finding require external validation.

In clinical and health psychology trials, we are quite wary of simple subjective self-report analogue scales, particularly when there is poor control of the unblinded experimenters’ behavior and what they communicate to participants.

The gambling task lacks external validation. Low stakes could simply reduce it to another communication of experimenters’ expectancies. Note that the saliva assessments were obtained after completion of the task and if there is any confidence left in the assessments of hormones, this is an important confound.

The unblinded experimenters’ physically placing participants in either two 1-minute high-power or two 1-minute low-power poses is a weird, unvalidated experimental manipulation that could not have the anticipated effects on hormonal levels. Neither high- nor low-power poses are credible, but the hypothesis that the low-power pose would actually raise cortisol particularly strains credibility, if cortisol assessments in the study had any meaning at all.

Analyses were not accurately described, and statistical controls of any kind with such a small sample  are likely to add to spurious findings. The statistical controls in this study were particularly inappropriate and there is evidence of the investigators choosing the analyses to present after the results were known.

There is no there there: The original power pose paper did not introduce a credible effect size into the literature.

The published paper cannot introduce a credible effect size into the scientific literature. Power posing may be an interesting and important idea that deserves careful scientific study, but any future study of the idea would be a “first ever,” not a replication of the Psychological Science article. The two commentaries that were blocked from publication in Psychological Science but published elsewhere amplify any dismissal of the paper, but we are already well over the top. But then there is the extraordinary repudiation of the paper by the first author and her exposure of the exploitation of investigator degrees of freedom and outright p-hacking. How many stakes do you have to plunge into the heart of a vampire idea?

Product launch

Even before the power posing article appeared in Psychological Science, Amy Cuddy was promoting it at Harvard, first in Power Posing: Fake It Until You Make It in Harvard Business School’s Working Knowledge: Business Research for Business Leaders. Shortly afterwards came a redundant but more elaborate article in Harvard Magazine, subtitled Amy Cuddy probes snap judgments, warm feelings, and how to become an “alpha dog.”

Amy Cuddy is the middle author on the actual Psychological Science article, between first author Dana Carney and third author Andy J Yap, Dana Carney’s graduate student. Yet the Harvard Magazine article lists Cuddy first. The Harvard Magazine article is also noteworthy in unveiling what would grow into Cuddy’s redemptive self narrative, although Susan Fiske’s role as the “attachment figure” who nurtures Cuddy’s realization of her inner potential was only hinted at.

QUITE LITERALLY BY ACCIDENT, Cuddy became a psychologist. In high school and in college at the University of Colorado at Boulder, she was a serious ballet dancer who worked as a roller-skating waitress at the celebrated L.A. Diner. But one night, she was riding in a car whose driver fell asleep at 4:00 A.M. while doing 90 miles per hour in Wyoming; the accident landed Cuddy in the hospital with severe head trauma and “diffuse axonal injury,” she says. “It’s hard to predict the outcome after that type of injury, and there’s not much they can do for you.”

Cuddy had to take years off from school and “relearn how to learn,” she explains. “I knew I was gifted–I knew my IQ, and didn’t think it could change. But it went down by two standard deviations after the injury. I worked hard to recover those abilities and studied circles around everyone. I listened to Mozart–I was willing to try anything!” Two years later her IQ was back. And she could dance again.

Yup, leading up to promoting the idea that overcoming circumstances and getting what you want is as simple as adopting these 2 minutes of behavioral manipulation.

The last line of the Psychological Science abstract was easily fashioned into the pseudoscientific basis for this ease of changing behavior and outcomes, which now include the success of venture-capital pitches:

 

“Tiny changes that people can make can lead to some pretty dramatic outcomes,” Cuddy reports. This is true because changing one’s own mindset sets up a positive feedback loop with the neuroendocrine secretions, and also changes the mindset of others. The success of venture-capital pitches to investors apparently turns, in fact, on nonverbal factors like “how comfortable and charismatic you are.”

Soon, The New York Times columnist David Brooks placed power posing solidly within the positive thinking product line of positive psychology, even if Cuddy had no need to go out on that circuit: “If you act powerfully, you will begin to think powerfully.”

In 2011, both first author Dana Carney and Amy Cuddy received the Rising Star Award from the Association for Psychological Science (APS) for having “already made great advancements in science.” Carney cited her power posing paper as one that she liked. Cuddy didn’t nominate the paper, but reported that her recent work examined “how brief nonverbal expressions of competence/power and warmth/connection actually alter the neuroendocrine levels, expressions, and behaviors of the people making the expressions, even when the expressions are ‘posed.’”

The same year, 2011, Cuddy also appeared at PopTech, which is a “global community of innovators, working together to expand the edge of change” with tickets selling for $2,000. According to an article in The Chronicle of Higher Education:

When her turn came, Cuddy stood on stage in front of a jumbo screen showing Lynda Carter as Wonder Woman while that TV show’s triumphant theme song announced the professor’s arrival (“All the world is waiting for you! And the power you possess!”). After the music stopped, Cuddy proceeded to explain the science of power poses to a room filled with would-be innovators eager to expand the edge of change.

But that performance was just a warm up for Cuddy’s TedGlobal Talk which has now received almost 42 million views.

A Ted Global talk that can serve as a model for all Ted talks: Your body language may shape who you are  

This link takes you not only to Amy Cuddy’s Ted Global talk but to a transcript in 49 different languages

 Amy Cuddy’s TedGlobal Talk is brilliantly crafted and masterfully delivered. It has two key threads. The first thread is what David McAdams has described as an obligatory personal narrative of a redeemed self.  McAdams summarizes the basic structure:

As I move forward in life, many bad things come my way—sin, sickness, abuse, addiction, injustice, poverty, stagnation. But bad things often lead to good outcomes—my suffering is redeemed. Redemption comes to me in the form of atonement, recovery, emancipation, enlightenment, upward social mobility, and/or the actualization of my good inner self. As the plot unfolds, I continue to grow and progress. I bear fruit; I give back; I offer a unique contribution.

This is interwoven with a second thread, the claims of the strong science of power pose derived from the Psychological Science article. Without the science thread, the talk is reduced to a motivational talk of the genre of Oprah Winfrey or Navy Seal Admiral William McRaven Sharing Reasons You Should Make Bed Everyday

It is not clear that we should hold the redeemed self of a Ted Talk to the criteria of historical truth. Does it really matter whether Amy Cuddy’s IQ temporarily fell two standard deviations after an auto accident (13:22)? That Cuddy’s “angel adviser” Susan Fiske saved her from feeling like an imposter with the pep talk that inspired the “fake it until you make it” theme of power posing (17:03)? That Cuddy similarly transformed the life of her graduate student (18:47) with:

So I was like, “Yes, you are! You are supposed to be here! And tomorrow you’re going to fake it, you’re going to make yourself powerful, and, you know –

This last segment of the Ted talk is best viewed, rather than read in the transcript. It brings Cuddy to tears and the cheering, clapping audience to their feet. And Cuddy wraps up with her takeaway message:

The last thing I’m going to leave you with is this. Tiny tweaks can lead to big changes. So, this is two minutes. Two minutes, two minutes, two minutes. Before you go into the next stressful evaluative situation, for two minutes, try doing this, in the elevator, in a bathroom stall, at your desk behind closed doors. That’s what you want to do. Configure your brain to cope the best in that situation. Get your testosterone up. Get your cortisol down. Don’t leave that situation feeling like, oh, I didn’t show them who I am. Leave that situation feeling like, I really feel like I got to say who I am and show who I am.

So I want to ask you first, you know, both to try power posing, and also I want to ask you to share the science, because this is simple. I don’t have ego involved in this. (Laughter) Give it away. Share it with people, because the people who can use it the most are the ones with no resources and no technology and no status and no power. Give it to them because they can do it in private. They need their bodies, privacy and two minutes, and it can significantly change the outcomes of their life.

Who cares if the story is literal historical truth? Maybe we should not. But I think psychologists should care about the misrepresentation of the study. For that matter, anyone concerned with truth in advertising to consumers. Anyone who believes that consumers have the right to fair and accurate portrayal of science in being offered products, whether anti-aging cream, acupuncture, or self-help merchandise:

Here’s what we find on testosterone. From their baseline when they come in, high-power people experience about a 20-percent increase, and low-power people experience about a 10-percent decrease. So again, two minutes, and you get these changes. Here’s what you get on cortisol. High-power people experience about a 25-percent decrease, and the low-power people experience about a 15-percent increase. So two minutes lead to these hormonal changes that configure your brain to basically be either assertive, confident and comfortable, or really stress-reactive, and feeling sort of shut down. And we’ve all had the feeling, right? So it seems that our nonverbals do govern how we think and feel about ourselves, so it’s not just others, but it’s also ourselves. Also, our bodies change our minds.

Why should we care? Buying into such simple solutions prepares consumers to accept other outrageous claims. It can be a gateway drug for other quack treatments like Harvard psychologist Ellen Langer’s claims that changing mindset can overcome advanced cancer.

Unwarranted claims break down the barriers between evidence-based recommendations and nonsense. Such claims discourage consumers from accepting more deliverable promises that evidence-based interventions like psychotherapy can indeed make a difference, but they take work and effort, and effects can be modest. Who would invest time and money in cognitive behavior therapy, when two one-minute self-manipulations can transform lives? Like all unrealistic promises of redemption, such advice may ultimately lead people to blame themselves when they don’t overcome adversity. After all, it is so simple and just a matter of taking charge of your life. Their predicament indicates that they did not take charge or that they are simply losers.

But some consumers can be turned cynical about psychology. Here is a Harvard professor trying to sell them crap advice. Psychology sucks, it is crap.

Conflict of interest: Nothing to declare?

In an interview with The New York Times, Amy Cuddy said: “I don’t care if some people view this research as stupid,” she said. “I feel like it’s my duty to share it.”

Amy Cuddy may have been giving her power pose advice away for free in her Ted Talk, but she already had given it away at the $2,000 a ticket PopTech talk. The book contract for Presence: Bringing Your Boldest Self to Your Biggest Challenges was reportedly for around a million dollars.  And of course, like many academics who leave psychology for schools of management, Cuddy had a booking agency soliciting corporate talks and workshops. With the Ted talk, she could command $40,000-$100,000.

Does this discredit the science of power posing? Not necessarily, but readers should be informed and free to decide for themselves. Certainly, all this money in play might make Cuddy more likely to respond defensively to criticism of her work. If she repudiated this work the way that first author Dana Carney did, would there be a halt to her speaking gigs, a product recall, or refunds issued by Amazon for Presence?

I think it is fair to suggest that there is too much money in play for Cuddy to respond to academic debate.  Maybe things are outside that realm because of these stakes.

The replicationados attempt replications: Was it counterproductive?

Faced with overwhelming evidence of the untrustworthiness of the psychological literature, some psychologists have organized replication initiatives and accumulated considerable resources for multisite replications. But replication initiatives are insufficient to salvage the untrustworthiness of many areas of psychology, particularly clinical and health psychology intervention studies, and may inadvertently dampen more direct attacks on bad science. Many of those who promote replication initiatives are silent when investigators refuse to share data for studies with important clinical and public health implications. They are also silent when journals like Psychological Science fail to publish criticism of papers with blatantly faulty science.

Replication initiatives take time, and results are often, but not always, ultimately published outside of the journals where a flawed original work was published. But an important unintended consequence is that they lend credibility to effect sizes that had no validity whatsoever when they occurred in the original papers. In debate attempting to resolve discrepancies between original studies and large-scale replications, the original underpowered studies are often granted a more entrenched incumbent advantage.

It should be no surprise that in a large-scale attempted replication, Ranehill, Dreber, Johannesson, Leiberg, Sul, and Weber failed to replicate the key, nontrivial findings of the original power pose study.

Consistent with the findings of Carney et al., our results showed a significant effect of power posing on self-reported feelings of power. However, we found no significant effect of power posing on hormonal levels or in any of the three behavioral tasks.

It is also not surprising that Cuddy invoked her I-said-it-first-and-I-was-peer-reviewed incumbent advantage in reasserting her original claim, along with a review of 33 studies including the attempted replication:

The work of Ranehill et al. joins a body of research that includes 33 independent experiments published with a total of 2,521 research participants. Together, these results may help specify when nonverbal expansiveness will and will not cause embodied psychological changes.

Cuddy asserted that methodological differences between the original study and the attempted Ranehill replication may have moderated the effects of posing. But no study has shown that putting participants into a power pose affects hormones.

Joe Simmons and Uri Simonsohn performed a meta-analysis of the studies nominated by Cuddy, which was ultimately published in Psychological Science. Their blog Data Colada succinctly summarized the results:

Consistent with the replication motivating this post, p-curve indicates that either power-posing overall has no effect, or the effect is too small for the existing samples to have meaningfully studied it. Note that there are perfectly benign explanations for this: e.g., labs that run studies that worked wrote them up, labs that run studies that didn’t, didn’t.

While the simplest explanation is that all studied effects are zero, it may be that one or two of them are real (any more and we would see a right-skewed p-curve). However, at this point the evidence for the basic effect seems too fragile to search for moderators or to advocate for people to engage in power posing to better their lives.

Come on, guys, there was never a there there. Don’t invent one by continuing to try to explain it.

It is interesting that none of these three follow-up articles in Psychological Science has an abstract, especially in contrast to the original power pose paper that effectively delivered its misleading message in the abstract.

Just as this blog post was being polished, a special issue of Comprehensive Results in Social Psychology (CRSP) on Power Poses was released.

  1. No preregistered tests showed positive effects of expansive poses on any behavioral or hormonal measures. This includes direct replications and extensions.
  2. Surprise: A Bayesian meta-analysis across the studies reveals a credible effect of expansive poses on felt power. (Note that this is described as a ‘manipulation check’ by Cuddy in 2015.) Whether this is anything beyond a demand characteristic and whether it has any positive downstream behavioral effects is unknown.

No, not a surprise, just an uninteresting artifact. But stay tuned for the next model of power pose dropping the tainted name and focusing on “felt power.” Like rust, commercialization of bad psychological science never really sleeps, it only takes power naps.

Meantime, professional psychological organizations, with their flagship journals and publicity machines, need to:

  • Lose their fascination with psychologists whose celebrity status depends on Ted talks and the marketing of dubious advice products grounded in pseudoscience.
  • Embrace and adhere to an expanded Pottery Barn rule that covers not only direct replications, but corrections to bad science that has been published.
  • Make the protection of  consumers from false and exaggerated claims a priority equivalent to the vulnerable reputations of academic psychologists in efforts to improve the trustworthiness of psychology.
  • Require detailed conflicts of interest statements for talks and articles.

All opinions expressed here are solely those of Coyne of the Realm and not necessarily of PLOS blogs, PLOS One or his other affiliations.

Disclosure:

I receive money for writing these blog posts, less than $200 per post. I am also marketing a series of e-books,  including Coyne of the Realm Takes a Skeptical Look at Mindfulness and Coyne of the Realm Takes a Skeptical Look at Positive Psychology.

Maybe I am just making a fuss to attract attention to these enterprises. Maybe I am just monetizing what I have been doing for years virtually for free. Regardless, be skeptical. But to get more information and get on a mailing list for my other blogging, go to coyneoftherealm.com and sign up.

 

 

 

 

Danish RCT of cognitive behavior therapy for whatever ails your physician about you

I was asked by a Danish journalist to examine a randomized controlled trial (RCT) of cognitive behavior therapy (CBT) for functional somatic symptoms. I had not previously given the study a close look.

I was dismayed by how highly problematic the study was in so many ways.

I doubted that the results of the study showed any benefits to the patients or have any relevance to healthcare.

I then searched and found the website for the senior author’s clinical offerings.  I suspected that the study was a mere experimercial or marketing effort of the services he offered.

Overall, I think what I found hiding in plain sight has broader relevance to scrutinizing other studies claiming to evaluate the efficacy of CBT for what are primarily physical illnesses, not psychiatric disorders. Look at the other RCTs. I am confident you will find similar problems. But then there is the bigger picture…

[A controversial assessment ahead? You can stop here and read the full text of the RCT and its trial registration before continuing with my analysis.]

Schröder A, Rehfeld E, Ørnbøl E, Sharpe M, Licht RW, Fink P. Cognitive–behavioural group treatment for a range of functional somatic syndromes: randomised trial. The British Journal of Psychiatry. 2012 Apr 13:bjp-p.

A summary overview of what I found:

 The RCT:

  • Was unblinded to patients, interventionists, and to the physicians continuing to provide routine care.
  • Had a grossly unmatched, inadequate control/comparison group that leads to any benefit from nonspecific (placebo) factors in the trial counting toward the estimated efficacy of the intervention.
  • Relied on subjective self-report measures for primary outcomes.
  • With such a familiar trio of design flaws, even an inert homeopathic treatment would be found effective, if it were provided with the same positive expectations and support as the CBT in this RCT. [This may seem a flippant comment that reflects on my credibility, not the study. But please keep reading to my detailed analysis where I back it up.]
  • The study showed an inexplicably high rate of deterioration in both treatment and control groups. Apparent improvement in the treatment group might only reflect less deterioration than in the control group.
  • The study is focused on unvalidated psychiatric diagnoses being applied to patients with multiple somatic complaints, some of whom may not yet have a medical diagnosis, but most clearly had confirmed physical illnesses.

But wait, there is more!

  • It’s not CBT that was evaluated, but a complex multicomponent intervention in which what was called CBT is embedded in a way that its contribution cannot be evaluated.

The “CBT” did not map well onto international understandings of the assumptions and delivery of CBT. The complex intervention included weeks of indoctrination of the patient with an understanding of their physical problems that incorporated simplistic pseudoscience before any CBT was delivered. It focused on goals imposed by a psychiatrist that didn’t necessarily fit with patients’ sense of their most pressing problems and the solutions.

And the kicker.

  • The authors switched primary outcomes – reconfiguring the scoring of their subjective self-report measures years into the trial, based on a peeking at the results with the original scoring.

The investigators have a website that markets their services. Rather than a quality contribution to the literature, this study can be seen as an experimercial doomed to bad science and questionable results from before the first patient was enrolled. An undeclared conflict of interest in play? There is another serious undeclared conflict of interest for one of the authors.

For the uninformed and gullible, the study handsomely succeeds as an advertisement for the investigators’ services to professionals and patients.

Personally, I would be indignant if a primary care physician tried to refer me or a friend or family member to this trial. In the absence of overwhelming evidence to the contrary, I assume that people around me who complain of physical symptoms have legitimate physical concerns. If they do not yet have a confirmed diagnosis, it serves little purpose to stop the probing and refer them to psychiatrists. This trial operates with an anachronistic Victorian definition of psychosomatic condition.

But why should we care about a patently badly conducted trial with switched outcomes? Is it only a matter of something being rotten in the state of Denmark? Aside from the general impact on the existing literature concerning CBT for somatic conditions, results of this trial were entered into a Cochrane review of nonpharmacological interventions for medically unexplained symptoms. I previously complained about one of the authors of this RCT also being listed as an author on another Cochrane review protocol. Prior to that, I complained to Cochrane about this author’s larger research group influencing a decision to include switched outcomes in another Cochrane review. A lot of us rightfully depend heavily on the verdict of Cochrane reviews for deciding best evidence. That trust is being put into jeopardy.

Detailed analysis

1. This is an unblinded trial, a particularly weak methodology for examining whether a treatment works.

The letter that alerted physicians to the trial had essentially encouraged them to refer patients they were having difficulty managing.

‘Patients with a long-term illness course due to medically unexplained or functional somatic symptoms who may have received diagnoses like fibromyalgia, chronic fatigue syndrome, whiplash associated disorder, or somatoform disorder.’

Patients and the physicians who referred them subsequently got feedback about which group patients were assigned to, either routine care or what was labeled as CBT. This information could have had a strong influence on the outcomes that were reported, particularly for the patients left in routine care.

Patients’ learning that they did not get assigned to the intervention group was undoubtedly disappointing and demoralizing. The information probably did nothing to improve the positive expectations and support available to patients in routine care. This could have had a nocebo effect. The feedback may have contributed to the otherwise inexplicably high rates of subjective deterioration [to be noted below] reported by patients left in the routine care condition. In contrast, the authors’ disclosure that patients had been assigned to the intervention group undoubtedly boosted the morale of both patients and physicians and also increased the gratitude of the patients. This would be reflected in the responses to the subjective outcome measures.

The gold standard alternative to an unblinded trial is a double-blind, placebo-controlled trial in which neither providers, nor patients, nor even the assessors rating outcomes know to which group particular patients were assigned. Of course, this is difficult to achieve in a psychotherapy trial. Yet a fair alternative is a psychotherapy trial in which patients and those who refer them are blind to the nature of the different treatments, and in which an effort is made to communicate credible positive expectations about the comparison control group.

Conclusion: A lack of blinding seriously biases this study toward finding a positive effect for the intervention, regardless of whether the intervention has any active, effective component.

2. A claim that this is a randomized controlled trial depends on the adequacy of the control offered by the comparison group, enhanced routine care. Just what is being controlled by the comparison? In evaluating a psychological treatment, it’s important that the comparison/control group offers the same frequency and intensity of contact, positive expectations, attention and support. This trial decidedly did not.

There were large differences between the intervention and control conditions in the amount of contact time. Patients assigned to the cognitive therapy condition received an additional 9 group sessions with a psychiatrist, each lasting 3.5 hours, plus the option of even more consultations. The more than 30 hours of contact time with a psychiatrist should be very attractive to patients who wanted it and could not otherwise obtain it. For some, it undoubtedly represented an opportunity to have someone to listen to their complaints of pain and suffering in a way that had not previously happened. This is also more than the intensity of psychotherapy typically offered in clinical trials, which is closer to 10 to 15 sessions of 50 minutes each.
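The arithmetic behind that comparison is simple enough to check directly. The sketch below uses only the figures quoted in this post (9 group sessions of 3.5 hours, against 10 to 15 sessions of 50 minutes); the variable names are mine, and the optional extra consultations are left out because their number is not given.

```python
# Contact time in the intervention arm, from the figures quoted in the text.
GROUP_SESSIONS = 9
HOURS_PER_GROUP_SESSION = 3.5
intervention_hours = GROUP_SESSIONS * HOURS_PER_GROUP_SESSION

# Typical psychotherapy trial dose cited in the text: 10 to 15 sessions of 50 minutes.
MINUTES_PER_TYPICAL_SESSION = 50
typical_low_hours = 10 * MINUTES_PER_TYPICAL_SESSION / 60
typical_high_hours = 15 * MINUTES_PER_TYPICAL_SESSION / 60

print(f"Intervention arm: {intervention_hours:.1f} hours")
print(f"Typical trial:    {typical_low_hours:.1f} to {typical_high_hours:.1f} hours")
print(f"Ratio:            {intervention_hours / typical_high_hours:.1f}x "
      f"to {intervention_hours / typical_low_hours:.1f}x")
```

Even against the upper end of the typical range, the intervention arm received roughly two and a half times the contact time, before counting any of the optional consultations.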

The intervention group thus received substantially more support and contact time, which was delivered with more positive expectations. This wealth of nonspecific factors favoring the intervention group compromises an effort to disentangle the specific effects of any active ingredient in the CBT intervention package. From what has been said so far, the trial’s providing a fair and generalizable evaluation of the CBT intervention is nigh impossible.

Conclusion: This is a methodologically poor choice of control groups with the dice loaded to obtain a positive effect for CBT.

3. The primary outcomes, both as originally scored and after switching, are subjective self-report measures that are highly responsive to nonspecific treatments, alleviation of mild depressive symptoms and demoralization. They are not consistently related to objective changes in functioning. They are particularly problematic when used as outcome measures in the context of an unblinded clinical trial with an inadequate control group.

There have been consistent demonstrations that assigning patients to inert treatments and measuring the outcomes with subjective measures may register improvements that will not correspond to what would be found with objective measures.

For instance, a provocative New England Journal of Medicine study showed that sham acupuncture was as effective as an established medical treatment – an albuterol inhaler – for asthma when judged with subjective measures, but there was a large superiority for the established medical treatment when judged with objective measures.

There have been a number of demonstrations that treatments such as the one offered in the present study to patient populations similar to those in the study produce changes in subjective self-report that are not reflected in objective measures.

Much of the improvement in primary outcomes occurred before the first assessment after baseline and not very much afterwards. The early response is consistent with a placebo response.

The study actually included one largely unnoticed objective measure, utilization of routine care. Presumably if the CBT was effective as claimed, it would have produced a significant reduction in healthcare utilization. After all, isn’t the point of this trial to demonstrate that CBT can reduce health-care utilization associated with (as yet) medically unexplained symptoms? Curiously, utilization of routine care did not differ between groups.

The combination of subjective outcomes, the unblinded nature of the trial, and a poorly chosen control group brings together features that are highly likely to produce the appearance of positive effects, without any substantial benefit to the functioning and well-being of the patients.

Conclusion: Evidence for the efficacy of a CBT package for somatic complaints that depends solely on subjective self-report measures is unreliable, and unlikely to generalize to more objective measures of meaningful impact on patients’ lives.

4. We need to take into account the inexplicably high rates of deterioration in both groups, but particularly in the control group receiving enhanced care.

There was unexplained deterioration in 50% of the control group and 25% of the intervention group. Rates of deterioration are given only a one-sentence mention in the article, but deserve much more attention. These rates of deterioration need to qualify and dampen any generalizable clinical interpretation of other claims about outcomes attributed to the CBT. We need to keep in mind that clinical trials cannot determine how effective treatments are, but only how different a treatment is from a control group. So, an effect claimed for a treatment relative to a control can largely or entirely come from deterioration in the control group, not from what the treatment offers. The claim of success for CBT probably depends largely on the deterioration in the control group.
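The logic can be made concrete with a toy calculation. The numbers below are entirely hypothetical and are not taken from the trial; the function name is mine. The point is only that a trial estimates a difference between arms, and the same difference can arise in very different ways.

```python
def between_group_difference(treatment_change, control_change):
    """What a trial can estimate: change in the treatment arm minus change in the control arm."""
    return treatment_change - control_change

# Hypothetical scenario A: treated patients improve by 5 points, controls stay put.
effect_a = between_group_difference(treatment_change=5.0, control_change=0.0)

# Hypothetical scenario B: treated patients do not change, controls get 5 points worse.
effect_b = between_group_difference(treatment_change=0.0, control_change=-5.0)

print(effect_a, effect_b)
```

Both scenarios yield the same between-group difference, yet in the second no treated patient is any better off than at baseline.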

One interpretation of this trial is that spending an extraordinary 30 hours with a psychiatrist leads to only half the deterioration experienced with nothing more than routine care. But this raises the question of why half the patients left in routine care are deteriorating. What possibly could be going on?

Conclusion: Unexplained deterioration in the control group may explain apparent effects of the treatment, but both groups are doing badly.

5. The diagnosis of “functional somatic symptoms” or, as the authors prefer – Severe Bodily Distress Syndromes – is considered by the authors to be a psychiatric diagnosis. It is not accepted as a valid diagnosis internationally. Its validation is limited to work done almost entirely within the author group, which is explicitly labeled as “preliminary.” This biased sample of patients is quite heterogeneous, beyond their physicians having difficulty managing them. They have a full range of subjective complaints and documented physical conditions. Many of these patients would not be considered as primarily having a psychiatric disorder internationally and certainly within the US, except where they had major depression or an anxiety disorder. Such psychiatric disorders were not an exclusion criterion.

Once sent on the pathway to a psychiatric diagnosis by their physicians’ making a referral to the study, patients had to meet additional criteria:

To be eligible for participation individuals had to have a chronic (i.e. of at least 2 years duration) bodily distress syndrome of the severe multi-organ type, which requires functional somatic symptoms from at least three of four bodily systems, and moderate to severe impairment in daily living.

The condition identified in the title of the article is not validated as a psychiatric diagnosis. The two papers to which the authors refer are their own studies (1, 2), both drawn from a single sample. The title of one of these papers makes a rather immodest claim:

Fink P, Schröder A. One single diagnosis, bodily distress syndrome, succeeded to capture 10 diagnostic categories of functional somatic syndromes and somatoform disorders. Journal of Psychosomatic Research. 2010 May 31;68(5):415-26.

In neither the two papers nor the present RCT is there sufficient effort to rule out a physical basis for the complaints qualifying these patients for a psychiatric diagnosis. There is also a lack of follow-up to see if physical diagnoses were later applied.

Citation patterns of these papers strongly suggest the authors have not gained much traction internationally. The criterion of symptoms from three out of four bodily systems is arbitrary and unvalidated. Many patients with known physical conditions would meet this criterion without any psychiatric diagnosis being warranted.

The authors relate what is essentially their homegrown diagnosis to functional somatic syndromes, diagnoses which are themselves subject to serious criticism. See for instance the work of Allen Frances M.D., who had been the chair of the American Psychiatric Association’s Diagnostic and Statistical Manual (DSM-IV) Task Force. He became a harsh critic of its shortcomings and the failures of APA to correct coverage of functional somatic syndromes in the next DSM.

Mislabeling Medical Illness As Mental Disorder

Unless DSM-5 changes these incredibly over inclusive criteria, it will greatly increase the rates of diagnosis of mental disorders in the medically ill – whether they have established diseases (like diabetes, coronary disease or cancer) or have unexplained medical conditions that so far have presented with somatic symptoms of unclear etiology.

And:

The diagnosis of mental disorder will be based solely on the clinician’s subjective and fallible judgment that the patient’s life has become ‘subsumed’ with health concerns and preoccupations, or that the response to distressing somatic symptoms is ‘excessive’ or ‘disproportionate,’ or that the coping strategies to deal with the symptom are ‘maladaptive’.

And:

These are inherently unreliable and untrustworthy judgments that will open the floodgates to the overdiagnosis of mental disorder and promote the missed diagnosis of medical disorder.

The DSM 5 Task force refused to adopt changes proposed by Dr. Frances.

Bad News: DSM 5 Refuses to Correct Somatic Symptom Disorder

This led Frances to apologize to patients:

My heart goes out to all those who will be mislabeled with this misbegotten diagnosis. And I regret and apologize for my failure to be more effective.

The chair of The DSM Somatic Symptom Disorder work group has delivered a scathing critique of the very concept of medically unexplained symptoms.

Dimsdale JE. Medically unexplained symptoms: a treacherous foundation for somatoform disorders?. Psychiatric Clinics of North America. 2011 Sep 30;34(3):511-3.

Dimsdale noted that applying this psychiatric diagnosis sidesteps the quality of medical examination that led up to it. Furthermore:

Many illnesses present initially with nonspecific signs such as fatigue, long before the disease progresses to the point where laboratory and physical findings can establish a diagnosis.

And such diagnoses may encompass far too varied a group of patients for any intervention to make sense:

One needs to acknowledge that diseases are very heterogeneous. That heterogeneity may account for the variance in response to intervention. Histologically, similar tumors have different surface receptors, which affect response to chemotherapy. Particularly in chronic disease presentations such as irritable bowel syndrome or chronic fatigue syndrome, the heterogeneity of the illness makes it perilous to diagnose all such patients as having MUS and an underlying somatoform disorder.

I tried making sense of a table of the additional diagnoses that the patients in this study had been given. A considerable proportion of patients had physical conditions that would not be considered psychiatric problems in the United States. Many patients could be suffering from multiple symptoms arising not only from these conditions, but also from side effects of the medications being offered. It is very difficult to manage the multiple medications required by multiple comorbidities. Physicians from the community found that these patients taxed their competence and their ability to spend time with them.

table of functional somatic symptoms

Most patients had a diagnosis of “functional headaches.” It’s not clear what this designation means, but conceivably it could include migraine headaches, which are accompanied by multiple physical complaints. CBT is not an evidence-based treatment of choice for functional headaches, much less migraines.

Over a third of the patients had irritable bowel syndrome (IBS). A systematic review of the comorbidity of irritable bowel syndrome concluded physical comorbidity is the norm in IBS:

The nongastrointestinal nonpsychiatric disorders with the best-documented association are fibromyalgia (median of 49% have IBS), chronic fatigue syndrome (51%), temporomandibular joint disorder (64%), and chronic pelvic pain (50%).

In the United States, many patients and specialists would find it offensive and counterproductive to consider irritable bowel syndrome a psychiatric condition. There is growing evidence that irritable bowel syndrome is a disturbance in the gut microbiota. It involves a gut-brain interaction, but the primary direction of influence is of the disturbance in the gut on the brain. Anxiety and depression symptoms are secondary manifestations, a product of activity in the gut influencing the nervous system.

Most of the patients in the sample had a diagnosis of fibromyalgia and over half of all patients in this study had a diagnosis of chronic fatigue syndrome.

Other patients had diagnosable anxiety and depressive disorders, which, particularly at the lower end of severity, are responsive to nonspecific treatments.

Undoubtedly many of these patients, perhaps most of them, are demoralized by not being able to get a diagnosis for what they have good basis to believe is a medical condition, aside from the discomfort, pain, and interference with their life that they are experiencing. They could be experiencing a demoralization secondary to physical illness.

These patients presented with pain, fatigue, general malaise, and demoralization. I have trouble imagining how their specific most pressing concerns could be addressed in group settings. These patients pose particular problems for making substantive clinical interpretation of outcomes that are highly general and subjective.

Conclusion: Diagnosing patients with multiple physical symptoms as having a psychiatric condition is highly controversial. Results will not generalize to countries and settings where the practice is not accepted. Many of the patients involved in the study had recognizable physical conditions, and yet they are being shunted to psychiatrists who focused only on their attitude towards the symptoms. They are being denied the specialist care and treatments that might conceivably reduce the impact of their conditions on their lives.

6. The “CBT” offered in this study is part of a complex, multicomponent treatment that does not resemble cognitive behavior therapy as it is practiced in the United States.

As seen in Figure 1 in the article, the multicomponent intervention is quite complex and consists of more than cognitive behavior therapy. Moreover, at least in the United States, CBT has distinctive elements of collaborative empiricism. Patients and therapist work together selecting issues on which to focus and developing strategies, with the patients reporting back on efforts to implement them. From the details available in the article, the treatment sounded much more like exhortation or indoctrination, even arguing with the patients, if necessary. An English version, available on the web, of the educational material used in initial sessions confirmed that a lot of condescending pseudoscience was presented to convince the patients that their problems were largely in their heads.

Without a clear application of learning theory, behavioral analysis, or cognitive science, the “CBT” treatment offered in this RCT has much more in common with the creative novation therapy offered by Hans Eysenck, which is now known to have been justified with fraudulent data. Indeed, comparing the educational materials for this study to what was offered in Eysenck’s study reveals striking similarities. Eysenck was advancing the claim that his intervention could prevent cardiovascular disease and cancer and overcome the iatrogenic effects. I know, this sounds really crazy, but see my careful documentation elsewhere.

Conclusion: The embedding of an unorthodox “CBT” in a multicomponent intervention in this study does not allow isolating any specific, active component of CBT that might be at work.

7. The investigators disclose having altered their scoring of their primary outcome years after the trial began, and probably after a lot of outcome data had been collected.

I found a casual disclosure in the method section of this article unsettling, particularly noting that the original trial registration was:

We found an unexpected moderate negative correlation of the physical and mental component summary measures, which are constructed as independent measures. According to the SF-36 manual, a low or zero correlation of the physical and mental components is a prerequisite of their use.23 Moreover, three SF-36 scales that contribute considerably to the PCS did not fulfil basic scaling assumptions.31 These findings, together with a recent report of problems with the PCS in patients with physical and mental comorbidity,32 made us concerned that the PCS would not reliably measure patients’ physical health in the study sample. We therefore decided before conducting the analysis not to use the PCS, but to use instead the aggregate score as outlined above as our primary outcome measure. This decision was made on 26 February 2009 and registered as a protocol change at clinicaltrials.gov on 11 March 2009. Only baseline data had been analysed when we made our decision and the follow-up data were still concealed.

Switching outcomes, particularly after some results are known, constitutes a serious violation of best research practices and leads to suspicion of the investigators refining their hypotheses after they had peeked at the data. See How researchers dupe the public with a sneaky practice called “outcome switching”.

The authors had originally proposed a scoring consistent with a very large body of literature. Dropping the original scoring precludes any direct comparison with this body of research, including basic norms. They claim that they switched scoring because two key subscales were correlated in the opposite direction of what is reported in the larger literature. This is a troubling indication that something has gone terribly wrong in the authors’ recruitment of a sample. It should not be pushed under the rug.

The authors claim that they switched outcomes based only on an examination of baseline data from their study. However, one of the authors, Michael Sharpe, is also an author on the controversial PACE trial. A parallel switch was made to the scoring of the subjective self-reports in that trial. When the data were eventually re-analyzed using the original scoring, any positive findings for the trial were substantially reduced and arguably disappeared.

Even if the authors of the present RCT did not peek at their outcome data before deciding to switch scoring of the primary outcome, they certainly had strong indications from other sources that the original scoring would produce weak or null findings. In 2009, one of the authors, Michael Sharpe, had access to results of a relevant trial. What is called the FINE trial had null findings, which affected decisions to switch outcomes in the PACE trial. Is it just a coincidence that the scoring of the outcomes was then switched for the present RCT?

Conclusion: The outcome switching for the present trial represents bad research practice. For the trial to have any credibility, the investigators should make their data publicly available so these data could be independently re-analyzed with the original scoring of primary outcomes.

The senior author’s clinic

I invite readers to take a virtual tour of the website for the senior author’s clinical services. Much of it is available in English. Recently, I blogged about dubious claims of a health care system in Detroit achieving a goal of “zero suicide.” I suggested that the evidence for this claim was quite dubious, but was a powerful advertisement for the health care system. I think the present report of an RCT can similarly be seen as an infomercial for training and clinical services available in Denmark.

Conflict of interest

No conflict of interest is declared for this RCT. Under somewhat similar circumstances, I formally complained about undeclared conflicts of interest in a series of papers published in PLOS One. A correction has been announced, but not yet posted.

Aside from the senior author’s need to declare a conflict of interest, the same can be said for one of the authors, Michael Sharpe.

Apart from professional and reputational interests (his whole career has been built on making strong claims about such interventions), Sharpe works for insurance companies and publishes on the subject. He declared a conflict of interest for the PACE trial.

MS has done voluntary and paid consultancy work for government and for legal and insurance companies, and has received royalties from Oxford University Press.

Here’s Sharpe’s report written for the social benefits reinsurance company UnumProvident.

If the results of this trial are accepted at face value, they will lend credibility to claims that effective interventions are available to reduce social disability. It doesn’t matter that the intervention is not effective. Rather, persons receiving social disability payments can be disqualified because they are not enrolled in such treatment.

Effects on the credibility of Cochrane collaboration report

The switched outcomes of the trial were entered into a Cochrane systematic review, to which primary care health professionals look for guidance in dealing with a complex clinical situation. The review gives no indication of the host of problems that I exposed here. Furthermore, I have glanced at some of the other trials included and I see similar difficulties.

I have been unable to convince Cochrane to clean up the conflicts of interest that are attached to switched outcomes being entered in reviews. Perhaps some of my readers will want to approach Cochrane to revisit this issue.

I think this post raises larger issues about whether Cochrane has any business conducting and disseminating reviews of such a bogus psychiatric diagnosis, medically unexplained symptoms. These reviews do patients no good, and may sidetrack them from getting the medical care they deserve. The reviews do serve special interests, including disability insurance companies.

Special thanks to John Peters and to Skeptical Cat for their assistance with my writing this blog. However, I have sole responsibility for any excesses or distortions.

 

Why PhD students should not evaluate a psychotherapy for their dissertation project

  • Things some clinical and health psychology students wish they had known before they committed themselves to evaluating a psychotherapy for their dissertation study.
  • A well designed pilot study addressing feasibility and acceptability issues in conducting and evaluating psychotherapies is preferable to an underpowered study which won’t provide a valid estimate of the efficacy of the intervention.
  • PhD students would often be better off as research parasites – making use of existing published data – rather than attempting to organize their own original psychotherapy study, if their goal is to contribute meaningfully to the literature and patient care.
  • Reading this blog, you will encounter a link to free, downloadable software that allows you to make quick determinations of the number of patients needed for an adequately powered psychotherapy trial.

I so relish the extra boost of enthusiasm that many clinical and health psychology students bring to their PhD projects. They not only want to complete a thesis of which they can be proud, they want their results to be directly applicable to improving the lives of their patients.

Many students are particularly excited about a new psychotherapy about which extravagant claims are being made that it’s better than its rivals.

I have seen lots of fads and fashions come and go: third wave, new wave, and no wave therapies. When I was a PhD student, progressive relaxation was in. Then it died, mainly because it was so boring for therapists who had to mechanically provide it. Client centered therapy was fading with doubts that anyone else could achieve the results of Carl Rogers or that his three facilitative conditions of unconditional positive regard, empathy, and congruence were actually distinguishable enough to study. Gestalt therapy was supercool because of the charisma of Fritz Perls, who distracted us with his showmanship from the utter lack of evidence for its efficacy.

I hate to see PhD students demoralized when their grand plans prove unrealistic. Inevitably, circumstances force them to compromise in ways that limit the usefulness of their project, and maybe even threaten their getting done within a reasonable time period. Overly ambitious plans are the formidable enemy of the completed dissertation.

The numbers are stacked against a PhD student conducting an adequately powered evaluation of a new psychotherapy.

This blog post argues against PhD students taking on the evaluation of a new therapy in comparison to an existing one, if they expect to complete their projects and make a meaningful contribution to the literature and to patient care.

I’ll be drawing on some straightforward analysis done by Pim Cuijpers to identify what PhD students are up against when trying to demonstrate that any therapy is better than treatments that are already available.

Pim has literally done dozens of meta-analyses, mostly of treatments for depression and anxiety. He commands a particular credibility, given the quality of this work. The way Pim and his colleagues present a meta-analysis is so straightforward and transparent that you can readily examine the basis of what he says.

Disclosure: I collaborated with Pim and a group of other authors in conducting a meta-analysis as to whether psychotherapy was better than a pill placebo. We drew on all the trials allowing a head-to-head comparison, even though nobody ever really set out to pit the two conditions against each other as their first agenda.

Pim tells me that the brief and relatively obscure letter on which I will draw, New Psychotherapies for Mood and Anxiety Disorders: Necessary Innovation or Waste of Resources?, is among his most unpopular pieces of work. Lots of people don’t like its inescapable message. But I think that if PhD students pay attention, they might avoid a lot of pain and disappointment.

But first…

Note how many psychotherapies have been claimed to be effective for depression and anxiety. Anyone trying to make sense of this literature has to contend with claims being based on a lot of underpowered trials – too small in sample size to be reasonably expected to detect the effects that investigators claim – and that are otherwise compromised by methodological limitations.

Some investigators were simply naïve about clinical trial methodology and the difficulties of doing research with clinical populations. They may not have understood statistical power.

But many psychotherapy studies end up in bad shape because the investigators were unrealistic about the feasibility of what they were undertaking and the low likelihood that they could recruit the patients in the numbers that they had planned in the time that they had allotted. After launching the trial, they had to change strategies for recruitment, maybe relax their selection criteria, or even change the treatment so it was less demanding of patients’ time. And they had to make difficult judgments about what features of the trial to drop when resources ran out.

Declaring a psychotherapy trial to be a “preliminary” or a “pilot study” after things go awry

The titles of more than a few articles reporting psychotherapy trials contain the apologetic qualifier after a colon: “a preliminary study” or “a pilot study”. But the studies weren’t intended at the outset to be preliminary or pilot studies. The investigators are making excuses post-hoc – after the fact – for not having been able to recruit sufficient numbers of patients and for having had to compromise their design from what they had originally planned. The best they can hope is that the paper will somehow be useful in promoting further research.

Too many studies from which effect sizes are entered into meta-analyses should have been left as pilot studies and not considered tests of the efficacy of treatments. The rampant problem in the psychotherapy literature is that almost no one treats small scale trials as mere pilot studies. In a recent blog post, I provided readers with some simple screening rules to identify meta-analyses of psychotherapy studies that they could dismiss from further consideration. One was whether there were sufficient numbers of adequately powered studies. Often there are not.

Readers take the inflated claims made for the results of small studies seriously, when these estimates should be seen as unrealistic and unlikely to be replicated, given a study’s sample size. The large effect sizes that are claimed are likely the product of p-hacking and the confirmation bias required to get published. With enough alternative outcome variables to choose from and enough flexibility in analyzing and interpreting data, almost any intervention can be made to look good.

The problem is readily seen in the extravagant claims about acceptance and commitment therapy (ACT), which are so heavily dependent on small, under-resourced studies supervised by promoters of ACT that should not have been used to generate effect sizes.

Back to Pim Cuijpers’ brief letter. He argues, based on his numerous meta-analyses, that it is unlikely that a new treatment will be substantially more effective than an existing credible, active treatment. There are some exceptions like relaxation training versus cognitive behavior therapy for some anxiety disorders, but mostly only small differences of no more than d = .20 are found between two active, credible treatments. If you search the broader literature, you can find occasional exceptions like CBT versus psychoanalysis for bulimia, but most of the exceptions you find prove to be false positives, usually based on investigator bias in conducting and interpreting a small, underpowered study.

You can see this yourself by plugging d = 0.20 into the freely downloadable G*Power program to calculate the number of patients needed for a study. To be safe, add more patients to allow for the expectable 25% dropout rate that has occurred across trials. The number you get would require a larger study than has ever been done in the past, including the well-financed NIMH Collaborative trial.

G power analyses
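For readers without G*Power at hand, the same calculation can be roughed out in Python. This is a minimal sketch using the standard normal approximation for comparing two group means, not a reproduction of G*Power’s exact t-based routine, which gives slightly larger numbers. The two-sided alpha of .05 and power of .80 are my assumptions, as are the function names; only d = 0.20 and the 25% dropout rate come from the text above.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate completers needed per arm for a two-sided, two-group comparison.

    Normal approximation: n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


def inflate_for_dropout(n, dropout=0.25):
    """Patients to recruit per arm so that about n remain after the given dropout rate."""
    return math.ceil(n / (1 - dropout))


completers = n_per_group(0.20)               # assumed alpha = .05, power = .80
recruited = inflate_for_dropout(completers)  # 25% dropout, as in the text

print(f"Completers needed per arm: {completers}")
print(f"Recruit per arm allowing for dropout: {recruited}")
print(f"Total to recruit for two arms: {2 * recruited}")
```

Under these assumptions the answer is on the order of 400 completers per arm, and about a thousand patients recruited in total for a simple two-arm trial. Because the required number scales with 1/d², halving the expected effect size quadruples the sample needed.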

Even more patients would be needed for the ideal situation in which a third comparison group allowed the investigator to show that the active comparison treatment had actually performed better than a nonspecific treatment, and so had been delivered with the same effectiveness that it had shown in earlier trials. Otherwise, a defender of the established therapy might argue that the older treatment had not been properly implemented.

So, unless warned off, the PhD student plans a study to show not only that the null hypothesis that the new treatment is no better than the existing one can be rejected, but also that in the same study the existing treatment is shown to be better than wait list. Oh my, just try to find an adequately powered, properly analyzed example of a comparison of two active treatments plus a control comparison group in the existing published literature. The few examples of three group designs in which a new psychotherapy has come out better than an effectively implemented existing treatment are grossly underpowered.

These calculations so far have all been based on what would be needed to reject the null hypothesis of no difference between the active treatment and a more established one. But if the claim is that the new treatment is superior to the existing treatment, our PhD student now needs to conduct a superiority trial in which some criterion is pre-set (such as greater than a moderate difference, d = .30) and the null hypothesis is that the advantage of the new treatment is less. We are now way out into the fantasyland of breakthrough, but uncompleted, dissertation studies.

Two take away messages

The first take away message is that we should be skeptical of claims that a new treatment is better than past ones except when the claim occurs in a well-designed study with some assurance that it is free of investigator bias. But the claim also has to arise in a trial that is larger than almost any psychotherapy study that has ever been done. Yup, most comparative psychotherapy studies are underpowered and we cannot expect claims that one treatment is superior to another to be robust.

But for PhD students doing a dissertation project, the second take away message is that they should not attempt to show that one treatment is superior to another in the absence of resources they probably don’t have.

The psychotherapy literature does not need another study with too few patients to support its likely exaggerated claims.

An argument can be made that it is unfair and even unethical to enroll patients in a psychotherapy RCT with insufficient sample size. Some of the patients will be randomized to a control condition that is not what attracted them to the trial. All of the patients will be denied having been in a trial that makes a meaningful contribution to the literature and to better care for patients like themselves.

What should the clinical or health psychology PhD student do, besides maybe curb their enthusiasm? One opportunity to make a meaningful contribution to the literature is by conducting small studies testing hypotheses that can lead to improvement in the feasibility or acceptability of treatments to be tested in studies with more resources.

Think of what would’ve been accomplished if PhD students had determined in modest studies that it is tough to recruit and retain patients in an Internet therapy study without some communication to the patients that they are involved in a human relationship – without them having what Pim Cuijpers calls supportive accountability. Patients may stay involved with the Internet treatment when it proves frustrating only because they have the support of, and accountability to, someone beyond their encounter with an impersonal computer. Somewhere out there, there is a human being who supports them in sticking it out with the Internet psychotherapy and will be disappointed if they don’t.

A lot of resources have been wasted in Internet therapy studies in which patients have not been convinced that what they’re doing is meaningful and that they have the support of a human being. They drop out or fail to do diligently any homework expected of them.

Similarly, mindfulness studies are routinely being conducted without anyone establishing that patients actually practice mindfulness in everyday life or what they would need to do so more consistently. The assumption is that patients assigned to the mindfulness condition diligently practice mindfulness daily. A PhD student could make a valuable contribution to the literature by examining the rates of patients actually practicing mindfulness when they have been assigned to it in a psychotherapy study, along with barriers and facilitators of them doing so. A discovery that the patients are not consistently practicing mindfulness might explain weaker findings than anticipated. One could even suggest that any apparent effects of practicing mindfulness were actually nonspecific, with patients getting all caught up in the enthusiasm of being offered a treatment that they had sought, but not actually practicing mindfulness.

An unintended example: How not to recruit cancer patients for a psychological intervention trial

Sometimes PhD students just can’t be dissuaded from undertaking an evaluation of a psychotherapy. I was a member of a PhD committee of a student who at least produced a valuable paper concerning how not to recruit cancer patients for a trial evaluating problem-solving therapy, even though the project fell far short of conducting an adequately powered study.

The PhD student was aware that claims of effectiveness of problem-solving therapy reported in the prestigious Journal of Consulting and Clinical Psychology were exaggerated. The developer of problem-solving therapy for cancer patients (and current JCCP Editor) claimed a huge effect size: 3.8 if only the patient were involved in treatment and an even better 4.4 if the patient had an opportunity to involve a relative or friend as well. Effect sizes for this trial have subsequently had to be excluded from at least four meta-analyses as extreme outliers (1,2,3,4).

The student adopted the much more conservative assumption that a moderate effect size of .6 would be obtained in comparison with a waitlist control. You can use G*Power to see that 50 patients would be needed per group, 60 if allowance is made for dropouts.
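For readers without G*Power at hand, the same kind of calculation can be roughly sketched in Python. This is only an approximation under stated assumptions: it uses the normal approximation for a two-sided comparison of two groups (so it runs slightly below an exact t-test calculation), and the alpha of .05 and the power values are assumptions, since the settings behind the figure of 50 are not given here. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate patients needed per arm for a two-sided, two-group comparison.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    # Moderate effect versus a waitlist control, at several assumed power levels
    for power in (0.80, 0.85, 0.90):
        n = n_per_group(0.6, power=power)
        print(f"d = 0.6, power = {power:.2f}: about {n} per group")
    # A hypothetical smaller effect, to show how quickly the requirement grows
    print(f"d = 0.2, power = 0.80: about {n_per_group(0.2)} per group")
```

Under these assumptions the approximation gives about 44 patients per group at 80% power and about 50 at 85% power. Because the required sample grows with the inverse square of the effect size, cutting the assumed effect to a third multiplies the requirement roughly ninefold.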

Such a basically inert control group, of course, has a greater likelihood of seeming to demonstrate a treatment is effective than when the comparison is another active treatment. Of course, such a control group also has the problem of not allowing a determination if it was the active ingredient of the treatment that made the difference, or just the attention, positive expectations, and support that were not available in the waitlist control group.

But PhD students should have the same option as their advisors to contribute another comparison between an active treatment and a waitlist control to the literature, even if it does not advance our knowledge of psychotherapy. They can take the same low road to a successful career that so many others have traveled.

This particular student was determined to make a different contribution to the literature. Notoriously, studies of psychotherapy with cancer patients often fail to recruit samples that are distressed enough to register any effect. The typical breast cancer patient, for instance, who seeks to enroll in a psychotherapy or support group trial does not have clinically significant distress. The prevalence of positive effects claimed in the literature for interventions with cancer patients in published studies likely represents a confirmation bias.

The student wanted to address this issue by limiting patients whom she enrolled in the study to those with clinically significant distress. Enlisting colleagues, she set up screening of consecutive cancer patients in oncology units of local hospitals. Patients were first screened for self-reported distress and, if they were distressed, for whether they were interested in services. Those who met both criteria were then re-contacted to see if they would be willing to participate in a psychological intervention study, without the intervention being identified. As I reported in the previous blog post:

  • Combining results of  the two screenings, 423 of 970 patients reported distress, of whom 215 patients indicated need for services.
  • Only 36 (4% of 970) patients consented to trial participation.
  • We calculated that 27 patients needed to be screened to recruit a single patient, with 17 hours of time required for each patient recruited.
  • 41% (n= 87) of 215 distressed patients with a need for services indicated that they had no need for psychosocial services, mainly because they felt better or thought that their problems would disappear naturally.
  • Finally, 36 patients were eligible and willing to be randomized, representing 17% of 215 distressed patients with a need for services.
  • This represents 8% of all 423 distressed patients, and 4% of 970 screened patients.
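The arithmetic behind this recruitment funnel is easy to re-run. The sketch below, in Python, uses only the counts reported in the bullets above plus the target of 120 patients (60 per group) mentioned earlier; the function names are illustrative, and the projection assumes the observed yield and the 17 hours per recruit would stay constant, which is a simplification.

```python
# Counts reported in the bullets above
screened = 970
distressed = 423
wanted_services = 215
consented = 36
hours_per_recruit = 17  # reported recruitment time per enrolled patient


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


def screening_burden(target_n):
    """Patients to screen and staff hours needed to enroll `target_n` patients,
    assuming the observed yield and time per recruit stay constant."""
    screened_per_recruit = screened / consented
    return round(target_n * screened_per_recruit), target_n * hours_per_recruit


if __name__ == "__main__":
    print("consented as % of screened:", pct(consented, screened))
    print("consented as % of distressed:", pct(consented, distressed))
    print("consented as % of those wanting services:", pct(consented, wanted_services))
    print("screened per recruit:", round(screened / consented))
    to_screen, hours = screening_burden(120)
    print(f"to enroll 120 patients: screen about {to_screen}, spend about {hours} hours")
```

On these figures, enrolling the 120 patients the power calculation called for would have meant screening over 3,000 patients and spending about 2,000 hours on recruitment alone.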

So, the PhD student’s heroic effort did not yield the sample size that she anticipated. But she ended up making a valuable contribution to the literature that challenges some of the basic assumptions that were being made about cancer patients in psychotherapy research: that all or most were distressed. She also ended up producing some valuable evidence that the minority of cancer patients who report psychological distress are not necessarily interested in psychological interventions.

Fortunately, she had been prepared to collect systematic data about these research questions, not just scramble within a collapsing effort at a clinical trial.

Becoming a research parasite as an alternative to PhD students attempting an under-resourced study of their own

Psychotherapy trials represent an enormous investment of resources, not only in the public funding that is often provided for them, but in the time, inconvenience, and exposure to ineffective treatments experienced by patients who participate in the trials. Increasingly, funding agencies require that investigators who get money to do a psychotherapy study at some point make their data available for others to use. The 14 prestigious medical journals whose editors make up the International Committee of Medical Journal Editors (ICMJE) each published earlier in 2016 a declaration that:

there is an ethical obligation to responsibly share data generated by interventional clinical trials because participants have put themselves at risk.

These statements proposed that as a condition for publishing a clinical trial, investigators would be required to share with others appropriately de-identified data not later than six months after publication. Further, the statements proposed that investigators describe their plans for sharing data in the registration of trials.

Of course, a proposal is only exactly that, a proposal, and these requirements were intended to take effect only after the document is circulated and ratified. The incomplete and inconsistent adoption of previous proposals for registering trials in advance and for investigators making declarations of conflicts of interest does not encourage a lot of enthusiasm that we will see uniform implementation of this bold proposal anytime soon.

Some editors of medical journals are already expressing alarm over the prospect of data sharing becoming required. The editors of New England Journal of Medicine were lambasted in social media for raising worries about “research parasites” exploiting the availability of data:

a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Richard Lehman’s Journal Review at The BMJ’s blog delivered a brilliant, sarcastic response to these concerns that concludes:

I think we need all the data parasites we can get, as well as symbionts and all sorts of other creatures which this ill-chosen metaphor can’t encompass. What this piece really shows, in my opinion, is how far the authors are from understanding and supporting the true opportunities of clinical data sharing.

However, lost in all the outrage that The New England Journal of Medicine editorial generated was a more conciliatory proposal at the end:

How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

The PLOS family of journals has gone on record as requiring that all data for papers published in their journals be publicly available without restriction. A February 24, 2014 announcement, PLOS’ New Data Policy: Public Access to Data, declared:

In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.

Many of us are aware of the difficulties in achieving this lofty goal. I am holding my breath and turning blue, waiting for some specific data.

The BMJ has expanded their previous requirements for data being available:

Loder E, Groves T. The BMJ requires data sharing on request for all trials. BMJ. 2015 May 7;350:h2373.

The movement to make data from clinical trials widely accessible has achieved enormous success, and it is now time for medical journals to play their part. From 1 July The BMJ will extend its requirements for data sharing to apply to all submitted clinical trials, not just those that test drugs or devices. The data transparency revolution is gathering pace.

I am no longer heading dissertation committees after one that I am currently supervising is completed. But if any PhD students asked my advice about a dissertation project concerning psychotherapy, I would strongly encourage them to enlist their advisor to identify and help them negotiate access to a data set appropriate to the research questions they want to investigate.

Most well-resourced psychotherapy trials have unpublished data concerning how they were implemented, with what bias, and with which patient groups ending up underrepresented or inadequately exposed to the intensity of treatment presumed to be needed for benefit. A story awaits to be told. The data available from a published trial are usually much more adequate than any a graduate student could collect with the limited resources available for a dissertation project.

I look forward to the day when such data is put into a repository where anyone can access it.

In this blog post I have argued that PhD students should not take on responsibility for developing and testing a new psychotherapy for their dissertation project. I think that using data from existing published trials is a much better alternative. However, PhD students may currently find it difficult, but certainly not impossible, to get appropriate data sets. I certainly am not recruiting them to be front-line infantry in advancing the cause of routine data sharing. But they can make an effort to obtain such data, and they deserve all the support they can get from their dissertation committees in obtaining data sets and in recognizing when, realistically, data are not being made available, even when the data have been promised to be available as a condition for publishing. Advisors, please request the data from published trials for your PhD students and protect them from the heartache of trying to collect such data themselves.

 

COBRA study would have shown homeopathy can be substituted for cognitive behavior therapy for depression

If The Lancet COBRA study had evaluated homeopathy rather than behavioural activation (BA), homeopathy would likely have similarly been found “non-inferior” to cognitive behavior therapy.

This is not an argument for treating depression with homeopathy, but an argument that the 14 talented authors of The Lancet COBRA study stacked the deck for their conclusion that BA could be substituted for CBT in routine care for depression without loss of effectiveness. Conflict of interest and catering to politics intruded on science in the COBRA trial.

If a study like COBRA produces phenomenally similar results with treatments based on distinct mechanisms of change, one possibility is that background nonspecific factors are dominating the results. Insert homeopathy, a bogus treatment with strong nonspecific effects, in place of BA, and noninferiority may well be shown.

Why homeopathy?

Homeopathy involves diluting a substance so thoroughly that no molecules are likely to be present in what is administered to patients. The original substance is first diluted to one part per 100 parts alcohol or distilled water. This step is performed six times in all, ending up with the original material diluted by a factor of $100^{-6} = 10^{-12}$.

Nonetheless, a super diluted and essentially inert substance is selected and delivered within a complex ritual.  The choice of the particular substance being diluted and the extent of its dilution is determined with detailed questioning of patients about their background, life style, and personal functioning. Naïve and unskeptical patients are likely to perceive themselves as receiving exceptionally personalized medicine delivered by a sympathetic and caring provider. Homeopathy thus has potentially strong nonspecific (placebo) elements that may be lacking in the briefer and less attentive encounters of routine medical care.

As an academic editor at PLOS One, I received considerable criticism for having accepted a failed trial of homeopathy for depression. The study had been funded by the German government and had fallen miserably short in efforts to recruit the intended sample size. I felt the study should be published in PLOS One to provide evidence about whether such worthless studies should be undertaken in the future. But I also wanted readers to have the opportunity to see what I had learned from the article about just how ritualized homeopathy can be, with a strong potential for placebo effects.

Presumably, readers would then be better equipped to evaluate when authors claim in other contexts that homeopathy is effective on the basis of clinical trials with inadequate control of nonspecific effects. But that is also a pervasive problem in psychotherapy trials [1, 2] that do not have a suitable comparison/control group.

I have tried to reinforce this message in the evaluation of complementary or integrative treatments in Relaxing vs Stimulating Acupressure for Fatigue Among Breast Cancer Patients: Lessons to be Learned.

The Lancet COBRA study

The Lancet COBRA study has received extraordinary promotion as evidence for the cost-effectiveness of substituting behavioural activation therapy (BA) delivered by minimally trained professionals for cognitive behaviour therapy (CBT) for depression. The study  is serving as the basis for proposals to cut costs in the UK National Health Service by replacing more expensive clinical psychologists with less trained and experienced providers.

Coached by the Science Media Centre, the authors of The Lancet study focused our attention on their finding no inferiority of BA to CBT. They are distracting us from the more important question of whether either treatment had any advantage over nonspecific interventions in the unusual context in which they were evaluated.

The editorial accompanying the COBRA study suggests that BA involves a simple message delivered by providers with very little training:

“Life will inevitably throw obstacles at you, and you will feel down. When you do, stay active. Do not quit. I will help you get active again.”

I encourage readers to stop and think how depressed persons suffering substantial impairment, including reduced ability to experience pleasure, would respond to such suggestions. It sounds all too much like the “Snap out of it, Debbie” they may have already heard from people around them or in their own self-blame.

Snap out of it, Debbie (from South Park)

 BA by any other name…

Actually, this kind of activation is routinely provided in primary care in some countries as a first-stage treatment in a stepped care approach to depression.

In such a system, when emergent mild to moderate depressive symptoms are uncovered in a primary medical care setting, providers are encouraged neither to initiate an active treatment nor even make a formal psychiatric diagnosis of a condition that could prove self-limiting with a brief passage of time. Rather, providers are encouraged to defer diagnosis and schedule a follow-up appointment. This is more than simple watchful waiting. Until the next appointment, providers encourage patients to undertake some guided self-help, including engagement in pleasant activities of their choice, much as apparently done in the BA condition in the COBRA study. Increasingly, they may encourage Internet-based therapy.

In a few parts of the UK, general practitioners may refer patients to a green gym.


It’s now appreciated that to have any effectiveness, such prescriptions have to be made in a relationship of supportive accountability. For patients to adhere adequately to such prescriptions and not feel they are simply being dismissed by the provider and sent away, they need to have a sense that the prescription is occurring within the context of a relationship with someone who cares about whether they carry out and benefit from the prescription.

Used in this way, this BA component of stepped care could possibly be part of reducing unnecessary medication and the need for more intensive treatment. However, evaluation of cost effectiveness is complicated by the need for a support structure in which treatment can be monitored, including any antidepressant medication that is subsequently prescribed. Otherwise, the needs of a substantial number of patients needing more intensive, quality care for depression would be neglected.

The shortcomings of COBRA as an evaluation of BA in context

COBRA does not provide an evaluation of any system offering BA to the large pool of patients who do not require more intensive treatment in a system where they would be provided appropriate timely evaluation and referral onwards.

It is the nature of mild to moderate depressive symptoms being presented in primary care, especially when patients are not specifically seeking mental health treatment, that the threshold for a formal diagnosis of major depression is often met by the minimum or only one more than the five required symptoms. Diagnoses are of necessity unreliable, in part because the judgment of particular symptoms meeting a minimal threshold of severity is unreliable. After a brief passage of time and in the absence of formal treatment, a substantial proportion of patients will no longer meet diagnostic criteria.

COBRA also does not evaluate BA versus CBT in the more select population that participates in clinical trials of treatment for depression. Sir David Goldberg is credited  with first describing the filters that operate on the pathway of patients from presenting a complex combination of problems in living and psychiatric symptoms in primary medical care to treatment in specialty settings.

Results of the COBRA study cannot be meaningfully integrated into the existing literature concerning BA as a component of stepped care or treatment for depression that is sufficient in itself.

More recently, I reviewed in detail The Lancet COBRA study, highlighting how one of the most ambitious and heavily promoted psychotherapy studies ever was noninformative. The authors’ claim that it would be wise to substitute BA delivered by minimally trained providers for cognitive behavior therapy delivered by clinical psychologists was unwarranted.

I refer readers to that blog post for further elaboration of some points I will be making here. For instance, some readers might want to refresh their sense of how a noninferiority trial differs from a conventional comparison of two treatments.

Risk of bias in noninferiority trial

 Published reports of clinical trials are notoriously unreliable and biased in terms of the authors’ favored conclusions.

With the typical evaluation of an active treatment versus a control condition, the risk of bias is that reported results will favor the active treatment. However, the issue of bias in a noninferiority trial is more complex. The investigators’ interest is in demonstrating that, within certain limits, there are no significant differences between two treatments. Yet, although it is not always tested directly, the intention is to show that this lack of difference is due to them both being effective, rather than ineffective.
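The bare logic of a noninferiority test can be sketched in a few lines of Python. This is a generic illustration, not the COBRA analysis: the function name, the margin, the standard error, and all scores are hypothetical, and it assumes a symptom measure on which higher scores mean worse depression.

```python
def is_noninferior(mean_new, mean_ref, se_diff, margin, z=1.96):
    """Declare the new treatment noninferior if the upper limit of the 95%
    confidence interval for (new - reference), on a higher-is-worse symptom
    score, falls below the prespecified margin."""
    diff = mean_new - mean_ref
    upper = diff + z * se_diff
    return upper < margin


if __name__ == "__main__":
    # Hypothetical end-of-trial scores; baseline assumed to be 18 in both arms.
    # Scenario A: both arms improved a lot (18 -> 8.4 and 18 -> 8.0).
    print(is_noninferior(8.4, 8.0, se_diff=0.7, margin=1.9))
    # Scenario B: neither arm improved much (18 -> 16.4 and 18 -> 16.0).
    print(is_noninferior(16.4, 16.0, se_diff=0.7, margin=1.9))
```

Both scenarios pass, because the test looks only at the difference between arms, never at how much either arm improved. Whether both treatments worked has to be established some other way.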

In COBRA, the authors’ clear intention was to show that less expensive BA was not inferior to CBT, with the assumption that both were effective. Biases can emerge from building in features of the design, analysis, and interpretation of the study that minimized differences between these two treatments. But bias can also arise from a study design in which nonspecific effects are distributed across interventions so that any difference in active ingredients is obscured by shared features of the circumstances in which the interventions are delivered. As in Alice in Wonderland [https://en.wikipedia.org/wiki/Dodo_bird_verdict ], the race is rigged so that almost everybody can get a prize.

Why COBRA could have shown almost any treatment with nonspecific effects was noninferior to CBT for depression

1. The investigators chose a population and a recruitment strategy that increased the likelihood that patients participating in the trial would get better with the minimal support and contact available in either of the two conditions, BA or CBT.

The recruited patients were not actively seeking treatment. They were identified from GPs’ records as having had a diagnosis of depression, but were required not to be currently in psychotherapy.

GP recording of a diagnosis of depression has poor concordance with a formal, structured interview-based diagnosis, with considerable overdiagnosis and overtreatment.

A recent Dutch study found that persons meeting interview-based criteria for major depression in the community who do not have a past history of treatment mostly are not found to be depressed upon re-interview.

To be eligible for participation in the study, the patients also had to meet criteria for major depression in a semi-structured research interview, the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (SCID). Diagnoses with the SCID obtained under these circumstances are also likely to have a considerable proportion of false positives.

A dirty secret from someone who has supervised thousands of SCID interviews of medical patients: the developers of the SCID recognized that it yielded a lot of false positives and inflated rates of disorder among patients who are not seeking mental health care.

They attempted to compensate by requiring that respondents not only endorse symptoms, but indicate that the symptoms are a source of impairment. This is the so-called clinical significance criterion. Respondents automatically meet the criterion if they are seeking mental health treatment. Those who are not seeking treatment are asked directly whether the symptoms impair them. This is a particularly unvalidated aspect of the SCID, and patients typically do not endorse their symptoms as a source of impairment.

When we asked breast cancer patients who otherwise met criteria for depression with the SCID whether the depressive symptoms impaired them, they uniformly said something like ‘No, my cancer impairs me.’ When we conducted a systematic study of the clinical significance criterion, we found that whether or not it was endorsed substantially affected individual and overall rates of diagnosis. Robert Spitzer, who developed the SCID interview along with his wife Janet Williams, conceded to me in a symposium that application of the clinical significance criterion was a failure.

What is the relevance in a discussion of the COBRA study? I would wager that the authors, like most investigators who use the SCID, did not inquire about the clinical significance criterion, and as a result they had a lot of false positives.

The population being sampled in the recruitment strategy used in COBRA is likely to yield a sample unrepresentative of patients participating in the usual trials of psychotherapy and medication for depression.

2. Most patients participating in COBRA reported already receiving antidepressants at baseline, but adherence and follow-up are unknown and likely to be inadequate.

Notoriously, patients receiving a prescription for an antidepressant in primary care actually take the medication inconsistently and for only a short time, if at all. They receive inadequate follow-up and reassessment. Their depression outcomes may actually be poorer than for patients receiving a pill placebo in the context of a clinical trial, where there is blinding and a high degree of positive expectations, attention and support.

Studies, including one by an author of the COBRA study, suggest that augmenting adequately managed antidepressant treatment with psychotherapy is unlikely to improve outcomes.

We’re stumbling upon one of the messier features of COBRA. Most patients had already been prescribed medication at baseline, but their adherence and follow-up are left unreported and are likely to be poor. The prescription is likely to have been made up to two years before baseline.

It would not be cost-effective to introduce psychotherapy to such a sample without reassessing whether they were adequately receiving medication. Such a sample would also be highly susceptible to nonspecific interventions providing positive expectations, support, and attention that they are not receiving in their antidepressant treatment. There are multiple ways in which nonspecific effects could improve outcomes – perhaps by improving adherence, but perhaps because of the healing effects of support on mild depressive symptoms.

3. The COBRA authors’ way of dealing with co-treatment with antidepressants blocked readers’ ability to independently evaluate main effects and interactions with BA versus CBT.

The authors used antidepressant treatment as a stratification factor, ensuring that the 70% of patients receiving them were evenly distributed across the BA and CBT conditions. This strategy made it more difficult to separate effects of antidepressants. However, the problem is compounded by the authors’ failure to provide subgroup analyses based on whether patients had received an antidepressant prescription, as well as the authors’ failure to provide any descriptions of the extent to which patients received management of their antidepressants at baseline or during active psychotherapy and follow-up. The authors incorporated data concerning the cost of medication into their economic analyses, but did not report the data in a way that could be scrutinized.

I anticipate requesting these data from the authors to find out more, although they have not responded to my previous query concerning anomalies in the reporting of how long since patients had first received a prescription for antidepressants.

4. The 12-month assessment designated as the primary outcome capitalized on natural recovery patterns, unreliability of initial diagnosis, and simple regression to the mean.

Depression identified in the community and in primary care patient populations is variable in its course, but typically resolves in nine months. Making reassessment of primary outcomes at 12 months increases the likelihood that effects of active ingredients of the two treatments would be lost in a natural recovery process.

5. The intensity of treatment (an allowable number of 20 sessions plus four additional sessions) offered in the study exceeded what is available in typical psychotherapy trials and exceeded what was actually accessed by patients.

Allowing this level of intensity of treatment generates a lot of noise in any interpretation of the resulting data. Offering so much treatment encourages patients to drop out, with the loss of their follow-up data. We can’t tell if they simply dropped out because they had received what they perceived as sufficient treatment or if they were dissatisfied. This intensity of offered treatment reduces generalizability to what actually occurs in routine care and hampers comparing and contrasting results of the COBRA study with the existing literature.

 6. The low rate of actual uptake of psychotherapy and retention of patients for follow-up present serious problems for interpreting the results of the COBRA study.

Intent-to-treat analyses with imputation of missing data are simply voodoo statistics with so much missing data. Imputation and other multivariate techniques make the assumption that data are missing at random, but as I just noted, this is an improbable assumption. [I refer readers who want to learn more about intent-to-treat versus per-protocol analyses back to my previous blog post.]

The authors cite past literature in their choice to emphasize the per-protocol analyses. That means that they based their interpretation of the results on 135 of 221 patients originally assigned to BA and 151 of 219 patients originally assigned to CBT. This is a messy approach and precludes generalizing back to original assignment. That’s why intent-to-treat analyses are emphasized in conventional evaluations of psychotherapy.
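To see how much of the randomized sample those per-protocol numbers leave out, here is a small Python sketch. It uses only the counts quoted above; the variable and function names are illustrative.

```python
# (per-protocol n, randomized n) for each arm, as quoted above
arms = {"BA": (135, 221), "CBT": (151, 219)}


def retained_fraction(kept, randomized):
    """Share of randomized patients who remain in the per-protocol analysis."""
    return kept / randomized


total_kept = sum(kept for kept, _ in arms.values())
total_randomized = sum(randomized for _, randomized in arms.values())

if __name__ == "__main__":
    for name, (kept, randomized) in arms.items():
        share = retained_fraction(kept, randomized)
        print(f"{name}: {kept}/{randomized} = {share:.0%} retained")
    overall = retained_fraction(total_kept, total_randomized)
    print(f"Overall: {total_kept}/{total_randomized} = {overall:.0%} retained")
```

Roughly 61% of the BA arm and 69% of the CBT arm remain, so about a third of all randomized patients sit outside the analysis that the interpretation rests on.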

A skeptical view of what will be done with the COBRA data

The authors’ clear intent was to produce data supporting an argument that more expensive clinical psychologists could be replaced by less trained clinicians providing a simplified treatment. The striking lack of differences between BA and CBT might be seen as strong evidence that BA could replace CBT. Yet, I am suggesting that the striking lack of differences could also indicate features built into the design that swamped any differences and limited any generalizability to what would happen if all depressed patients were referred to BA delivered by clinicians with little training versus CBT. I’m arguing that homeopathy would have done as well.

BA is already being implemented in the UK and elsewhere as part of stepped care initiatives for depression. Inclusion of BA is inadequately evaluated, as is the overall strategy of stepped care. See here for an excellent review of stepped care initiatives and a tentative conclusion that they are moderately effective, but that many questions remain.

If the COBRA authors were most committed to improving the quality of depression care in the UK, they would’ve either designed their study as a fairer test of substituting BA for CBT or they would have tackled the more urgent task of evaluating rigorously whether stepped care initiatives work.

Years ago, collaborative care programs for depression were touted as reducing overall costs. These programs, which were found to be robustly effective in many contexts, involved placing depression managers in primary care to assist the GPs in improved monitoring and management of treatment. Often the most immediate and effective improvement was that patients got adequate follow-up, where previously they were simply being ignored. Collaborative care programs did not prove to be cheaper, which is not surprising, because better care is often more expensive than ineptly provided inadequate care.

We should be extremely skeptical of experienced investigators who claim that they demonstrate that they can cut costs and maintain quality with a wholesale reduction in the level of training of providers treating depression, a complex and heterogeneous disorder, especially when their expensive study fails to deal with this complexity and heterogeneity.

 

A skeptical look at The Lancet behavioural activation versus CBT for depression (COBRA) study

A skeptical look at:

Richards DA, Ekers D, McMillan D, Taylor RS, Byford S, Warren FC, Barrett B, Farrand PA, Gilbody S, Kuyken W, O’Mahen H. et al. Cost and Outcome of Behavioural Activation versus Cognitive Behavioural Therapy for Depression (COBRA): a randomised, controlled, non-inferiority trial. The Lancet. 2016 Jul 23.

 

All the Queen’s horses and all the Queen’s men (and a few women) can’t put a flawed depression trial back together again.

Were they working below their pay grade? The 14 authors of the study collectively have impressive expertise. They claim to have obtained extensive consultation in designing and implementing the trial. Yet they produced:

  • A study doomed from the start by serious methodological problems that kept it from yielding any scientifically valid and generalizable results.
  • Tortured results that pander to policymakers seeking an illusory cheap fix.

 

Why the interests of persons with mental health problems are not served by translating the hype from a wasteful project into clinical practice and policy.

Maybe you were shocked and awed, as I was, by the publicity campaign mounted by The Lancet on behalf of a terribly flawed article in The Lancet Psychiatry about whether locked inpatient wards fail suicidal patients.

It was a minor league effort compared to the campaign orchestrated by the Science Media Centre for a recent article in The Lancet. The study concerned a noninferiority trial of behavioural activation (BA) versus cognitive behaviour therapy (CBT) for depression. The message echoing through social media without any critical response was that behavioural activation for depression delivered by minimally trained mental health workers was cheaper but just as effective as cognitive behavioural therapy delivered by clinical psychologists.

Reflecting the success of the campaign, the immediate reactions to the article are like nothing I have recently seen. Here are the published altmetrics for an article with an extraordinary overall score of 696 (!) as of August 24, 2016.

altmetrics

 

Here is the press release.

Here is the full article reporting the study, which nobody in the Twitter storm seems to have consulted.

some news coverage

Here are supplementary materials.

Here is the well-orchestrated, uncritical response from tweeters, UK academics and policy makers.

The Basics of the study

The study was an open-label, two-armed non-inferiority trial of behavioural activation therapy (BA) versus cognitive behavioural therapy (CBT) for depression with no non-specific comparison/control treatment.

The primary outcome was depression symptoms measured with the self-report PHQ-9 at 12 months.

Delivery of both BA and CBT followed written manuals for a maximum of 20 60-minute sessions over 16 weeks, but with the option of four additional booster sessions if the patients wanted them. Receipt of eight sessions was considered an adequate exposure to the treatments.

The BA was delivered by

Junior mental health professionals —graduates trained to deliver guided self-help interventions, but with neither professional mental health qualifications nor formal training in psychological therapies—delivered an individually tailored programme re-engaging participants with positive environmental stimuli and developing depression management strategies.

CBT, in contrast, was delivered by

Professional or equivalently qualified psychotherapists, accredited as CBT therapists with the British Association of Behavioural and Cognitive Psychotherapy, with a postgraduate diploma in CBT.

The interpretation provided by the journal article:

Junior mental health workers with no professional training in psychological therapies can deliver behavioural activation, a simple psychological treatment, with no lesser effect than CBT has and at less cost. Effective psychological therapy for depression can be delivered without the need for costly and highly trained professionals.

A non-inferiority trial

An NHS website explains non-inferiority trials:

The objective of non-inferiority trials is to compare a novel treatment to an active treatment with a view of demonstrating that it is not clinically worse with regards to a specified endpoint. It is assumed that the comparator treatment has been established to have a significant clinical effect (against placebo). These trials are frequently used in situations where use of a superiority trial against a placebo control may be considered unethical.

I have previously critiqued [1, 2] noninferiority psychotherapy trials. I will simply reproduce a passage here:

Noninferiority trials (NIs) have a bad reputation. Consistent with a large literature, a recent systematic review of NI HIV trials found the overall methodological quality to be poor, with a high risk of bias. The people who brought you CONSORT saw fit to develop special reporting standards for NIs so that misuse of the design in the service of getting publishable results is more readily detected.

Basically, an NI RCT commits investigators and readers to accepting null results as support for a new treatment because it is no worse than an existing one. Suspicions are immediately raised as to why investigators might want to make that point.

Noninferiority trials are very popular among Pharma companies marketing rivals to popular medications. They use noninferiority trials to show that their brand is no worse than the already popular medication. But by not including a nonspecific control group, the trialists don’t bother to show that either of the medications is more effective than placebo under the conditions in which they were administered in these trials. Often, the medication dominating the market had achieved FDA approval for advertising with evidence of being only modestly effective. So, potatoes are noninferior to spuds.
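For readers unfamiliar with the mechanics, here is a minimal sketch in Python of the usual decision rule in an NI trial. The margin and the confidence limits are invented for illustration and are not the COBRA values; the function name is mine.

```python
def is_noninferior(ci_upper, margin):
    """Decision rule sketch for a higher-is-worse outcome scale.

    ci_upper: upper confidence limit for (new treatment minus standard treatment).
    margin:   prespecified noninferiority margin on the same scale.
    The new treatment is declared noninferior when the whole interval
    stays below the margin.
    """
    return ci_upper < margin


# Hypothetical numbers only: symptom-score difference, higher = worse.
margin = 2.0
print(is_noninferior(ci_upper=1.4, margin=margin))
print(is_noninferior(ci_upper=2.6, margin=margin))
```

Note what the rule never asks: whether either treatment did better than a placebo or no treatment at all under the conditions of the trial.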

Compounding the problems of a noninferiority trial many times over

Let’s not dwell on this trial being a noninferiority trial, although I will return to the problem of knowing what would happen in the absence of either intervention or with a credible, nonspecific control group. Let’s focus instead on some other features of the trial that seriously compromised an already compromised trial.

Essentially, we will see that the investigators reached out to primary care patients who were mostly already receiving treatment with antidepressants, but unlikely with the support and positive expectations or even adherence necessary to obtain benefit. By providing these nonspecific factors, any psychological intervention would be likely to prove effective in the short run.

The total amount of treatment offered substantially exceeded what is typically provided in clinical trials of CBT. However, uptake and actual receipt of treatment is likely to be low in such a population recruited by outreach, not actively seeking treatment. So, noise is being introduced by offering so much treatment.

A considerable proportion of primary care patients identified as depressed won’t accept treatment or will not accept the full intensity available. However, without careful consideration of data that are probably not available for this trial, it will be ambiguous whether the amount of treatment received by particular patients represented dropping out prematurely or simply dropping out when they were satisfied with the benefits they had received. Undoubtedly, failures to receive minimal intensity of treatment and even the overall amount of treatment received by particular patients are substantial and complexly determined, but nonrandom and differ between patients.

Dropping out of treatment is often associated with dropping out of a study – further data not being available for follow-up. These conditions set the stage for considerable challenges in analyzing and generalizing from whatever data are available. Clearly, the assumption of data being missing at random will be violated. But that is the key assumption required by multivariate statistical strategies that attempt to compensate for incomplete data.
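A toy simulation shows why this matters. The sketch below is in Python and uses entirely made-up numbers that have nothing to do with the actual COBRA data: it assumes that patients with worse follow-up scores are more likely to have no follow-up score recorded, and then compares the true average with the average among completers.

```python
import random
import statistics


def simulate(n=10000, seed=0):
    """Hypothetical follow-up scores where sicker patients are likelier to drop out."""
    rng = random.Random(seed)
    # Made-up symptom scores at follow-up; higher = more depressed.
    true_scores = [rng.gauss(10, 5) for _ in range(n)]
    observed = []
    for score in true_scores:
        # Assumed dropout rule: chance of a missing score rises with the score itself.
        p_missing = min(max((score - 5) / 20, 0.0), 0.9)
        if rng.random() >= p_missing:
            observed.append(score)
    return statistics.mean(true_scores), statistics.mean(observed), len(observed)


true_mean, observed_mean, n_observed = simulate()
print(f"true mean {true_mean:.2f}, completers' mean {observed_mean:.2f}, n = {n_observed}")
```

Under this assumed dropout rule, the completers look healthier than the full sample really is. Because the missingness depends on the very scores that were never observed, methods that assume data are missing at random have nothing to work with to correct the distortion.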

12 months – the time point designated for assessment of primary outcomes – is likely to exceed the duration of a depressive episode in a primary care population, which is approximately 9 months. In the absence of a nonspecific active comparison/control or even a waitlist control group, recovery that would’ve occurred in the absence of treatment will be ascribed to the two active interventions being studied.

12 months is likely to exceed substantially the end of any treatment being received and so effects of any active treatments are likely to dissipate. The design allowed for up to four booster sessions. However, access to booster sessions was not controlled. It was not assigned and access cannot be assumed to be random. As we will see when we examine the CONSORT flowchart for the study, there was no increase in the number of patients receiving an adequate exposure to psychotherapy from 6 to 12 months. That is likely to indicate that most active treatment had ended within the first six months.

Focusing on 12 months outcomes, rather than six months, increases the unreliability of any analyses because more 12 month outcomes will be missing than what were available at six months.

Taken together, the excessively long 12 month follow-up being designated as primary outcome and the unusually large amount of treatment being offered, but not necessarily being accepted, create substantial problems of missing data that cannot be compensated for by typical imputation and multivariate methods; difficulties interpreting results in terms of the amount of treatment actually received; and difficulties comparing results to the primary outcomes of typical trials of psychotherapy offered to patients seeking psychotherapy.

The authors’ multivariate analysis strategy was inappropriate, given the amount of missing data and the violation of the assumption of data being missing at random.

Surely the more experienced of the 14 authors of The Lancet article should have anticipated these problems and the low likelihood that this study would produce generalizable results.

Recruitment of patients

The article states:

 We recruited participants by searching the electronic case records of general practices and psychological therapy services for patients with depression, identifying potential participants from depression classification codes. Practices or services contacted patients to seek permission for researcher contact. The research team interviewed those that responded, provided detailed information on the study, took informed written consent, and assessed people for eligibility.

Eligibility criteria

Eligible participants were adults aged 18 years or older who met diagnostic criteria for major depressive disorder assessed by researchers using a standard clinical interview (Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition [SCID]). We excluded people at interview who were receiving psychological therapy, were alcohol or drug dependent, were acutely suicidal or had attempted suicide in the previous 2 months, or were cognitively impaired, or who had bipolar disorder or psychosis or psychotic symptoms.

Table 3 Patient Characteristics reveals a couple of things about co-treatment with antidepressants that must be taken into consideration in evaluating the design and interpreting results.

antidepressant stratification

And

antidepressant stratification

So, investigators did not wait for patients to refer themselves or to be referred by physicians to the trial; they reached out to them. Applying their exclusion criteria, the investigators obtained a sample that mostly had been prescribed antidepressants, with no indication that the prescription had ended. The length of time for which 70% of patients had been on antidepressants was highly skewed, with a mean of 164 weeks and a median of 19. These figures strain credibility. I have reached out to the authors with a question whether there is an error in the table and await clarification.

We cannot assume that patients whose records indicate they were prescribed an antidepressant were refilling their prescriptions at the time of recruitment, were faithfully adhering, or were even being monitored. The length of time since initial prescription increases skepticism whether there was adequate exposure to antidepressants at the time of recruitment to the study.

The inadequacy of antidepressant treatment in routine primary care

Refilling of first prescriptions of antidepressants in primary care, adherence, and monitoring and follow-up by providers are notoriously low.

Guideline-congruent treatment with antidepressants in the United States requires a five week follow-up visit, which is only infrequently received in routine care. When the five week follow-up visit is kept,

Rates of improvement in depression associated with prescription of an antidepressant in routine care approximate that achieved with pill placebo in antidepressant trials. The reasons for this are complex, but center on depression being of mild to moderate severity in primary care. Perhaps more important is that the attention, provision of positive expectations, and support provided in routine primary care are lower than what is provided in the blinded pill-placebo condition in clinical trials. In blinded trials, neither the provider nor the patient knows whether the active medication or a pill placebo is being administered. The famous NIMH National Collaborative Study found, not surprisingly, that response in the pill-placebo condition was predicted by the quality of the therapeutic alliance between patient and provider.

In The Lancet study, readers are not provided with important baseline characteristics of the patients that are crucial to interpreting the results and their generalizability. We don’t know the baseline or subsequent adequacy of antidepressant treatment or of the quality of the routine care being provided for it. Given that antidepressants are not the first-line treatment for mild to moderate depression, we don’t know why these patients were not receiving psychotherapy. We don’t know even whether the recruited patients were previously offered psychotherapy and with what uptake, except that they were not receiving it two months prior to recruitment.

There is a fascinating missing story about why these patients were not receiving psychotherapy at the start of the study and why and with what accuracy they were described as taking antidepressants.

Readers are not told what happened to antidepressant treatment during the trial. To what extent did patients who were not receiving antidepressants begin doing so? As a result of the more frequent contact and support provided in the psychotherapy, to what extent was there improvement in adherence, as well as in ongoing support and attention from primary care providers?

Depression identified in primary care is a highly heterogeneous condition, more so than among patients recruited from treatment in specialty mental health settings. Much of the depression has only the minimum number of symptoms required for a diagnosis or one more. The reliability of diagnosis is therefore lower than in specialty mental health settings. Much of the depression and anxiety disorders identified with semi-structured research instruments in populations that are not selected for having sought treatment resolves itself without formal intervention.

The investigators were using less than ideal methods to recruit patients from a population in which major depressive disorder is highly heterogeneous and subject to recovery in the absence of treatment by the time point designated for assessment of primary outcome. They did not sufficiently address the problem of a high level of co-treatment having been prescribed long before the beginning of the study. They did not even assess the extent to which that prescribed treatment had patient adherence or provider monitoring and follow-up. The 12 month follow-up allowed the influence of lots of factors beyond the direct effects of the active ingredients of the two interventions being compared in the absence of a control group.

decline in scores

Examination of a table presented in the supplementary materials suggests that most change occurred in the first six months after enrollment and little thereafter. We don’t know the extent to which there was any treatment beyond the first six months or what effect it had. In a population with clinically significant depression drawn from specialty care, some deterioration can be expected after withdrawal of active treatment. In a primary care population, such a graph could be produced in large part because of the recovery from depression that would be observed in the absence of active treatment.

 

Cost-effectiveness analyses reported in the study address the wrong question. These analyses only considered the relative cost of these two active treatments, leaving unaddressed the more basic question of whether it is cost-effective to offer either treatment at this intensity. It might be more cost-effective to have a person with even less mental health training contact patients, inquire about adherence, side effects, and clinical outcomes, and prompt patients to accept another appointment with the GP if an algorithm indicates that would be appropriate.

The intensity of treatment being offered and received

The 20 sessions plus 4 booster sessions of psychotherapy being offered in this trial are considerably more than the 12 to 16 sessions offered in the typical RCT for depression. Having more sessions available than typical introduces some complications. Results are not comparable to what is found in the trials offering less treatment. But in a primary care population not actively seeking psychotherapy for depression, there is a further complication in that many patients will not access the full 20 sessions. There will be difficulties interpreting results in terms of intensity of treatment because of the heterogeneity of reasons for getting less treatment. Effectively, offering so much therapy to a group that is less inclined to accept psychotherapy introduces a lot of noise in trying to make sense of the data, particularly when cost-effectiveness is an issue.

This excerpt from the CONSORT flowchart demonstrates the multiple problems associated with offering so much treatment to a population that was not actively seeking it and yet needing twelve-month data for interpreting the results of a trial.

CONSORT chart

The number of patients who had no data at six months increased by 12 months. There was apparently no increase in the number of patients receiving an adequate exposure to psychotherapy.

Why the interests of people with mental health problems are not served by the results claimed by these investigators being translated into clinical practice.

 The UK National Health Service (NHS) is seriously underfunding mental health services. Patients being referred for psychotherapy from primary care have waiting periods that often exceed the expected length of an episode of depression in primary care. Simply waiting for depression to remit without treatment is not necessarily cost effective because of the unneeded suffering, role impairment, and associated social and personal costs of an episode that persist. Moreover, there is a subgroup of depressed patients in primary care who need more intensive or different treatment. Guidelines recommending assessment after five weeks are not usually reflected in actual clinical practice.

There’s a desperate search for ways in which costs can be further reduced in the NHS. The Lancet study is being interpreted to suggest that more expensive clinical psychologists can be replaced by less expensive and less trained mental health workers. Uncritically and literally accepted, the message is that clinical psychologists working half-time addressing particular common clinical problems can be replaced by less expensive mental health workers achieving the same effects in the same amount of time.

The pragmatic translation of these claims into practice is to replace clinical psychologists with cheaper mental health workers. I don’t think it’s being cynical to anticipate the NHS seizing upon an opportunity to reduce costs, while ignoring effects on overall quality of care.

Care for the severely mentally ill in the NHS is already seriously compromised for other reasons. Patients experiencing an acute or chronic breakdown in psychological and social functioning often do not get minimal support and contact time to avoid more intensive and costly interventions like hospitalization. I think it would be naïve to expect that the resources freed up by replacing a substantial portion of clinical psychologists with minimally trained mental health workers would be put into addressing unmet needs of the severely mentally ill.

Although not always labeled as such, some form of BA is integral to stepped care approaches to depression in primary care. Before being prescribed antidepressants or being referred to psychotherapy, patients are encouraged to increase pleasant activities. In Scotland, they may even be given free movie passes for participating in cleanup of parks.

A stepped care approach is attractive, but evaluation of cost effectiveness is complicated by consideration of the need for adequate management of antidepressants for those patients who go on to that level of care.

If we are considering a sample of primary care patients mostly already receiving antidepressants, the relevant comparator is introduction of a depression care manager.

Furthermore, there are issues in the adequacy of addressing the needs of patients who do not benefit from lower intensity care. Is the lack of improvement with low levels of care adequately monitored and addressed? Is the uncertain escalation in level of care adequately supported so that referrals are completed?

The results of The Lancet study don’t tell us very much about the adequacy of care that patients who were enrolled in the study were receiving or whether BA is as effective as CBT as stand-alone treatments or whether nonspecific treatments would’ve done as well. We don’t even know whether patients assigned to a waitlist control would’ve shown as much improvement by 12 months and we have reason to suspect that many would.

I’m sure that the administrations of the NHS are delighted with the positive reception of this study. I think it should be greeted with considerable skepticism. I am disappointed that the huge resources that went into conducting this study were not put into more informative and useful research.

I end with two questions for the 14 authors – Can you recognize the shortcomings of your study and of the interpretation that you have offered? Are you at least a little uncomfortable with the use to which these results will be put?

 

 

 

 

Trusted source? The Conversation tells migraine sufferers that child abuse may be at the root of their problems

Patients and family members face a challenge obtaining credible, evidence-based information about health conditions from the web.

Migraine sufferers have a particularly acute need because their condition is often inadequately self-managed without access to best available treatment approaches. Demoralized by the failure of past efforts to get relief, some sufferers may give up consulting professionals and desperately seek solutions on the Internet.

A lot of both naïve and exploitative quackery awaits them.

Even well-educated patients cannot always distinguish the credible from the ridiculous.

One search strategy is to rely on websites that have proven themselves as trusted sources.

The Conversation has promoted itself as such a trusted source, but its brand is tarnished by recent nonsense we will review concerning the role of child abuse in migraines.

Despite some excellent material that has appeared in other articles in The Conversation, I’m issuing a reader’s advisory:

The Conversation cannot be trusted because this article shamelessly misinforms migraine sufferers that child abuse could be at the root of their problems.

The Conversation article concludes with a non sequitur that shifts sufferers and their primary care physicians away from getting consultation with the medical specialists who are most able to improve management of a complex condition.

 

The Conversation article tells us:

Within a migraine clinic population, clinicians should pay special attention to those who have been subjected to maltreatment in childhood, as they are at increased risk of being victims of domestic abuse and intimate partner violence as adults.

That’s why clinicians should screen migraine patients, and particularly women, for current abuse.

This blog post identifies clickbait, manipulation, misapplied buzz terms, and misinformation in The Conversation article.

Perhaps the larger message of this blog post is that persons with complex medical conditions and those who provide formal and informal care for them should not rely solely on what they find on the Internet. This exercise specifically focusing on The Conversation article serves to demonstrate this.

Hopefully, The Conversation will issue a correction, as they promise to do at the website when errors are found.

We are committed to responsible and ethical journalism, with a strict Editorial Charter and codes of conduct. Errors are corrected promptly.

The Conversation article –

Why emotional abuse in childhood may lead to migraines in adulthood

A clickbait title offered a seductive integration of a trending emotionally laden social issue – child abuse – with a serious medical condition – migraines – for which management is often not optimal. A widely circulating estimate is that 60% of migraine sufferers do not get appropriate medical attention in large part because they do not understand the treatment options available and may actually stop consulting physicians.

Some quick background about migraine from another, more credible source:

Migraines are different from other headaches. People who suffer migraines have other debilitating symptoms.

  • visual disturbances (flashing lights, blind spots in the vision, zig zag patterns etc).
  • nausea and / or vomiting.
  • sensitivity to light (photophobia).
  • sensitivity to noise (phonophobia).
  • sensitivity to smells (osmophobia).
  • tingling / pins and needles / weakness / numbness in the limbs.

Persons with migraines differ greatly among themselves in terms of the frequency, intensity, and chronicity of their symptoms, as well as their triggers for attacks.

Migraine is triggered by an enormous variety of factors – not just cheese, chocolate and red wine! For most people there is not just one trigger but a combination of factors which individually can be tolerated. When these triggers occur altogether, a threshold is passed and a migraine is triggered. The best way to find your triggers is to keep a migraine diary. Download your free diary now!

Into The Conversation article: What is the link between emotional abuse and migraines?

Without immediately providing a clicklink so that  readers can check sources themselves, The Conversation authors say they are drawing on “previous research, including our own…” to declare there is indeed an association between past abuse and migraines.

Previous research, including our own, has found a link between experiencing migraine headaches in adulthood and experiencing emotional abuse in childhood. So how strong is the link? What is it about childhood emotional abuse that could lead to a physical problem, like migraines, in adulthood?

In invoking the horror of childhood emotional abuse, the authors imply that they are talking about something infrequent – outside the realm of most people’s experience.  If “childhood emotional abuse” is commonplace, how could  it be horrible and devastating?

In their pursuit of click bait sensationalism, the authors have only succeeded in trivializing a serious issue.

A minority of people endorsing items concerning past childhood emotional abuse actually currently meet criteria for a diagnosis of posttraumatic stress disorder. Their needs are not met by throwing them into a larger pool of people who do not meet these criteria and making recommendations based on evidence derived from the combined group.

The Conversation authors employ a manipulative puffer fish strategy [1 and 2]. They take what is a presumably infrequent condition and attach horror to it. But they then wildly increase the presumed prevalence by switching to a definition that arises in a very different context:

Any act or series of acts of commission or omission by a parent or other caregiver that results in harm, potential for harm, or threat of harm to a child.

So we are now talking about ‘any act or series of acts’ that results in ‘harm, potential for harm, or threat of harm’? The authors then assert that yes, whatever they are talking about is indeed that common. But the clicklink offered in support of this claim takes the reader behind a paywall where a consumer can’t venture without access to a university library account.

Most readers are left with the authors’ assertion as an authority they can’t check. I have access to a med school library and I checked. The link is to a secondary source. It is not a systematic review of the full range of available evidence. Instead, it is a selective search for evidence favoring particular speculations. Disconfirming evidence is mostly ignored. Yet, this article actually contradicts other assertions of The Conversation authors. For instance, the paywalled article says that there is actually little evidence that cognitive behavior therapy is effective for people whose need for therapy is only because they reported abuse in early childhood.

Even if you can’t check The Conversation authors’ claims, know that adults’ retrospective reports of childhood adversity are not particularly reliable or valid, especially in studies relying on checklist responses of adults to broad categories, as this research does.

When we are dealing with claims that depend on adult retrospective reports of childhood adversity, we are dealing with a literature with serious deficiencies. This literature grossly overinterprets common endorsement of particular childhood experiences as strong evidence of exposure to horrific conditions. This literature has a strong confirmation bias. Positive findings are highlighted. Negative findings do not get cited much. Serious limitations in methodology and inconsistency in findings are generally ignored.

[This condemnation is worthy of a blog post or two itself. But ahead I will provide some documentation.]

The Conversation authors note a discrepancy: administrative data suggest that one in eight children suffers abuse or neglect before age 18, while retrospective adult reports yield much higher estimates. They explain it on the basis of so much abuse going unreported:

The discrepancy may be because so many cases of childhood abuse, particularly cases of emotional or psychological abuse, are unreported. This specific type of abuse may occur within a family over the course of years without recognition or detection.

This could certainly be true, but let’s see the evidence. A lack of reporting could also indicate a lack of many experiences reaching a threshold prompting reporting. I’m willing to be convinced otherwise, but let’s see the evidence.

The link between emotional abuse and migraines

The Conversation authors provide links only to their own research for their claim:

While all forms of childhood maltreatment have been shown to be linked to migraines, the strongest and most significant link is with emotional abuse. Two studies using nationally representative samples of older Americans (the mean ages were 50 and 56 years old, respectively) have found a link.

The first link is to an article that is paywalled except for its abstract. The abstract shows the study does not involve a nationally representative sample of adults. The study compared patients with tension headaches to patients with migraines, without a no-headache control group. There is thus no opportunity to examine whether persons with migraines recall more emotional abuse than persons who do not suffer headaches. Any significant associations in a huge sample disappeared after controlling for self-reported depression and anxiety.

My interpretation: There is nothing robust here. Results could be due to crude measurement or to confounding of retrospective self-report by current self-reported anxious or depressive symptoms. We can’t say much without a no-headache control group.

The second of the authors’ studies is also paywalled, but we can see from the abstract:

We used data from the Adverse Childhood Experiences (ACE) study, which included 17,337 adult members of the Kaiser Health Plan in San Diego, CA who were undergoing a comprehensive preventive medical evaluation. The study assessed 8 ACEs including abuse (emotional, physical, sexual), witnessing domestic violence, growing up with mentally ill, substance abusing, or criminal household members, and parental separation or divorce. Our measure of headaches came from the medical review of systems using the question: “Are you troubled by frequent headaches?” We used the number of ACEs (ACE score) as a measure of cumulative childhood stress and hypothesized a “dose–response” relationship of the ACE score to the prevalence and risk of frequent headaches.

Results — Each of the ACEs was associated with an increased prevalence and risk of frequent headaches. As the ACE score increased the prevalence and risk of frequent headaches increased in a “dose–response” fashion. The risk of frequent headaches increased more than 2-fold (odds ratio 2.1, 95% confidence interval 1.8-2.4) in persons with an ACE score ≥5, compared to persons with an ACE score of 0. The dose–response relationship of the ACE score to frequent headaches was seen for both men and women.

The Conversation authors misrepresent this study. It is about self-reported headaches, not the subgroup of these patients reporting migraines. But in the first of their own studies they just cited, the authors contrast tension headaches with migraine headaches, with no controls.

So the data did not allow examination of the association between adult retrospective reports of childhood emotional abuse and migraines. There is no mention of self-reported depression and anxiety, which wiped out any relationship between childhood adversity and headaches in the first study. I would expect that a survey of ACEs would include such self-report. And the ACE score treats parental divorce or separation (common situations that are likely to occur together and so may be counted twice) as equivalent to sexual abuse in calculating an overall score.

The authors make a big deal of the “dose-response” they found. But this dose-response could just represent uncontrolled confounding: more ACEs indicate more confounding, a greater likelihood that respondents faced other social, personal, economic, and neighborhood deprivations. The higher the ACE score, the greater the likelihood that other background characteristics are coming into play.

The only other evidence the authors cite is yet another one of their own papers, available only as a conference abstract. The abstract states:

Results: About 14.2% (n = 2,061) of the sample reported a migraine diagnosis. Childhood abuse was recalled by 60.5% (n =1,246) of the migraine sample and 49% (n = 6,088) of the non-migraine sample. Childhood abuse increased the chances of a migraine diagnosis by 55% (OR: 1.55; 95% CI 1.35 – 1.77). Of the three types of abuse, emotional abuse had a stronger effect on migraine (OR: 1.52; 95% CI 1.34 – 1.73) when compared to physical and sexual abuse. When controlled for depression and anxiety, the effect of childhood abuse on migraine (OR: 1.32; 95% CI 1.15 – 1.51) attenuated but remained significant. Similarly, the effect of emotional abuse on migraine decreased but remained significant (OR: 1.33; 95% CI 1.16 – 1.52), when controlled for depression and anxiety.

The rates of childhood abuse seem curiously high for both the migraine and non-migraine samples. If you dig a bit on the web for details of the National Longitudinal Study of Adolescent Health, you can find how crude the measurement is. The broad question assessing emotional abuse covers the full range of normal to abnormal situations without distinguishing among them.

How often did a parent or other adult caregiver say things that really hurt your feelings or made you feel like you were not wanted or loved? How old were you the first time this happened? (Emotional abuse).

An odds ratio of 1.33 is not going to attract much attention from an epidemiologist, particularly when it is obtained from such messy data.
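To put these numbers in perspective, here is a rough back-of-the-envelope sketch using only figures from the abstract quoted above. It first recomputes a crude odds ratio from the reported percentages (60.5% versus 49% recalling abuse), then asks what the adjusted odds ratio of 1.33 would mean in absolute terms. The function names are my own, and the baseline is an assumption: I use the overall 14.2% migraine prevalence as a stand-in for the rate among people not reporting abuse, which the abstract does not give.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratio(p_group_a, p_group_b):
    """Odds ratio comparing two proportions."""
    return odds(p_group_a) / odds(p_group_b)


def risk_from_odds_ratio(baseline_risk, ratio):
    """Risk implied by applying an odds ratio to a baseline risk."""
    new_odds = odds(baseline_risk) * ratio
    return new_odds / (1.0 + new_odds)


# Percentages recalling childhood abuse, as reported in the abstract.
crude_or = odds_ratio(0.605, 0.49)

# ASSUMPTION: 14.2% is the overall migraine prevalence in the abstract,
# used here only as a stand-in for the unexposed baseline.
baseline = 0.142
implied_risk = risk_from_odds_ratio(baseline, 1.33)

print(f"crude OR from percentages: {crude_or:.2f}")
print(f"implied risk at OR 1.33:   {implied_risk:.1%} vs baseline {baseline:.1%}")
```

Under that assumed baseline, an odds ratio of 1.33 corresponds to a difference of a few percentage points in migraine prevalence. The crude figure recomputed from the percentages also will not exactly match the 1.55 in the abstract; the abstract does not say why, so treat this only as an order-of-magnitude check.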

I conclude that the authors have made only a weak case for the following statement: While all forms of childhood maltreatment have been shown to be linked to migraines, the strongest and most significant link is with emotional abuse.

Oddly, if we jump ahead to the closing section of The Conversation article, the authors concede:

Childhood maltreatment probably contributes to only a small portion of the number of people with migraine.

But, as we will see, they make recommendations that assume a strong link has been established.

Why would emotional abuse in childhood lead to migraines in adulthood?

This section throws out a number of trending buzz terms and strings them together in a way that should impress and intimidate consumers, rather than allow them to evaluate independently what is being said.

The section also comes below a stock blue picture of the brain. In web searches, the picture is associated with social media posts where the brain is superficially brought into discussions where neuroscience is not relevant.

An Australian neuroscientist commented on Facebook:

[Screenshot of Facebook comment: Deborah on blowing brains]

The section starts out:

The fact that the risk goes up in response to increased exposure is what indicates that abuse may cause biological changes that can lead to migraine later in life. While the exact mechanism between migraine and childhood maltreatment is not yet established, research has deepened our understanding of what might be going on in the body and brain.

We could get lost in a quagmire trying to figure out the evidence for the loose associations that are packed into a five-paragraph section. Instead, I’ll make some observations that can be followed up by interested readers.

The authors acknowledge that no mechanism has been established linking migraines and child maltreatment. The link for this statement takes the reader to the authors’ own paywalled article that is explicitly labeled “Opinion Statement”.

The authors ignore a huge literature that acknowledges great heterogeneity among sufferers of migraines, but points to some rather strong evidence for treatments based on particular mechanisms identified among carefully selected patients. For instance, a paper published in The New England Journal of Medicine with well over 1500 citations:

Goadsby PJ, Lipton RB, Ferrari MD. Migraine—current understanding and treatment. New England Journal of Medicine. 2002 Jan 24;346(4):257-70.

Speculations concerning the connections between childhood adversity, migraines and the HPA axis are loose. The Conversation authors treat these connections as obvious, but they need to be better documented with evidence.

For instance, if we try to link “childhood adversity” to the HPA axis, we need to consider the lack of specificity of “childhood adversity” as defined by retrospective endorsement of Adverse Childhood Experiences (ACEs). Suppose we rely on individual checklist items or cumulative scores based on number of endorsements. We can’t be sure that we are dealing with actual rather than assumed exposure to traumatic events, or that there will be any consistent correlates in current measures derived from the HPA axis.

Any non-biological factor defined so vaguely is not going to be a candidate for mapping into causal processes or biological measurements.

An excellent recent Mind the Brain article by my colleague blogger Shaili Jain interviews Dr. Rachel Yehuda, who had a key role in researching the HPA axis in stress. Dr. Yehuda says endocrinologists would cringe at the kind of misrepresentations that are being made in The Conversation article.

A recent systematic review concludes that the evidence for specific links between child maltreatment and inflammatory markers is limited and of poor quality.

Coelho R, Viola TW, Walss‐Bass C, Brietzke E, Grassi‐Oliveira R. Childhood maltreatment and inflammatory markers: a systematic review. Acta Psychiatrica Scandinavica. 2014 Mar 1;129(3):180-92.

The Conversation article glosses over gross inconsistencies in the evidence and misrepresents biological correlates as biomarkers. There are as yet no biomarkers for migraines in the sense of a biological measurement that reliably distinguishes persons with migraines from other patient populations with whom they may be confused. See an excellent, funny blog post by Hilda Bastian.

Notice the rhetorical trick in The Conversation authors’ assertion that

Migraine is considered to be a hereditary condition. But, except in a small minority of cases, the genes responsible have not been identified.

Genetic denialists like Oliver James or Richard Bentall commonly frame the question in this manner, as a matter of hereditary versus non-hereditary. But complex traits like height, intelligence, or migraines involve combinations of variations in a number of genes, not a single gene or even a few genes. For an example of the kind of insights that sophisticated genetic studies of migraines are yielding see:

Yang Y, Ligthart L, Terwindt GM, Boomsma DI, Rodriguez-Acevedo AJ, Nyholt DR. Genetic epidemiology of migraine and depression. Cephalalgia. 2016 Mar 9:0333102416638520.

The Conversation article ends with some signature nonsense speculation about epigenetics:

However, stress early in life induces alterations in gene expression without altering the DNA sequence. These are called epigenetic changes, and they are long-lasting and may even be passed on to offspring.

Interested readers can find these claims demolished in Epigenetic Ain’t Magic by PZ Myers, a biologist who attempts to rescue an extremely important developmental concept from its misuse.

Or Carl Zimmer’s Growing Pains for Field of Epigenetics as Some Call for Overhaul.

What does this mean for doctors treating migraine patients?

The Conversation authors startle readers with an acknowledgment that contradicts what they have been saying earlier in their article:

Childhood maltreatment probably contributes to only a small portion of the number of people with migraine.

It is therefore puzzling when they next say:

But because research indicates that there is a strong link between the two, clinicians may want to bear that in mind when evaluating patients.

Cognitive behavior therapy is misrepresented as an established effective treatment for migraines. A recent systematic review and meta-analysis had to combine migraines with other chronic headaches in order to get ten studies to consider.

The conclusion of this meta-analysis:

Methodology inadequacies in the evidence base make it difficult to draw any meaningful conclusions or to make any recommendations.

The Conversation article notes that the FDA has approved anti-epileptic drugs such as valproate and topiramate for treatment of migraines. However, the article’s claim that the efficacy of these drugs is due to their effects on epigenetics is quite inconsistent with what is said in the larger literature.

Clinicians specializing in treating fibromyalgia or irritable bowel syndrome would be troubled by the authors’ lumping these conditions with migraines and suggesting that a psychiatric consultation is the most appropriate referral for patients who are having difficulty achieving satisfactory management.

See for instance the links contained in my blog post, No, irritable bowel syndrome is not all in your head.

The Conversation article closes with:

Within a migraine clinic population, clinicians should pay special attention to those who have been subjected to maltreatment in childhood, as they are at increased risk of being victims of domestic abuse and intimate partner violence as adults.

That’s why clinicians should screen migraine patients, and particularly women, for current abuse.

It’s difficult to see how this recommendation is relevant to what has preceded it. Routine screening is not evidence-based.

The authors should know that the World Health Organization formerly recommended screening women in primary care for intimate partner abuse but withdrew the recommendation because of a lack of evidence that it improved outcomes for women facing abuse and a lack of evidence that no harm was being done.

I am sharing this blog post with the authors of The Conversation article. I am requesting a correction from The Conversation. Let’s see what they have to say.

Meanwhile, patients seeking health information are advised to avoid The Conversation.