When psychotherapy trials have multiple flaws…

Multiple flaws pose more threats to the validity of psychotherapy studies than would be inferred when the individual flaws are considered independently.


We can learn to spot features of psychotherapy trials that are likely to lead to exaggerated claims of efficacy for treatments or claims that will not generalize beyond the sample that is being studied in a particular clinical trial. We can look to the adequacy of sample size, and spot what the Cochrane Collaboration defines as risk of bias in its handy assessment tool.

We can look at the case-mix in the particular sites where patients were recruited. We can examine the adequacy of the diagnostic criteria used for entering patients into a trial. We can examine how well the trial was blinded: not only for whoever assigned patients to conditions, but also whether the patients, the treatment providers, and the evaluators knew which condition each patient was assigned to.

And so on. But what about combinations of these factors?

We typically do not pay enough attention to multiple flaws in the same trial. I include myself among the guilty. We may suspect that flaws are seldom simply additive in their effect, but we don't consider whether they may even be synergistic in their negative effects on the validity of a trial. As we will see in this analysis of a clinical trial, multiple flaws can pose more threats to a trial's validity than we might infer when the individual flaws are considered independently.

The particular paper we are probing is described in its discussion section as the “largest RCT to date testing the efficacy of group CBT for patients with CFS.” It also takes on added importance because two of the authors, Gijs Bleijenberg and Hans Knoop, are considered leading experts in the Netherlands. The treatment protocol was developed over time by the Dutch Expert Centre for Chronic Fatigue (NKCV, http://www.nkcv.nl; Knoop and Bleijenberg, 2010). Moreover, these senior authors dismiss any criticism and even ridicule critics. This study is cited as support for their overall assessment of their own work.  Gijs Bleijenberg claims:

Cognitive behavioural therapy is still an effective treatment, even the preferential treatment for chronic fatigue syndrome.

But

Not everybody endorses these conclusions, however their objections are mostly baseless.

Spoiler alert

This is a long-read blog post. I will offer a summary for those who don’t want to read through it, but who still want the gist of what I will be saying. However, as always, I encourage readers to be skeptical of what I say and to look to my evidence and arguments and decide for themselves.

The authors of this trial stacked the deck to demonstrate that their treatment is effective. They are striving to support the extraordinary claim that group cognitive behavior therapy fosters not only better adaptation, but actually recovery from what is internationally considered a physical condition.

There are some obvious features of the study that contribute to the likelihood of a positive effect, but these features need to be considered collectively, in combination, to appreciate the strength of this effort to guarantee positive results.

This study represents the perfect storm of design features that operate synergistically:


Referral bias – Trial conducted in a single specialized treatment setting known for advocating the view that psychological factors maintain physical illness.

Strong self-selection bias of a minority of patients enrolling in the trial seeking a treatment they otherwise cannot get.

Broad, overinclusive diagnostic criteria for entry into the trial.

An active treatment condition carrying a strong message that patients should respond to outcome assessments with improvement.

An unblinded trial with a waitlist control lacking the nonspecific elements (placebo) that confound the active treatment.

Subjective self-report outcomes.

Defining clinically significant improvement in a way that required only that a primary outcome fall below the score needed for entry into the trial.

Deliberate exclusion of relevant objective outcomes.

Avoidance of any recording of negative effects.

Despite the prestige attached to this trial in Europe, the US Agency for Healthcare Research and Quality (AHRQ) excludes this trial from providing evidence for its database of treatments for chronic fatigue syndrome/myalgic encephalomyelitis. We will see why in this post.

The take-away message: Although not many psychotherapy trials incorporate all of these factors, most trials have some. We should be more sensitive to when multiple factors occur in the same trial, such as bias in the site of patient recruitment, lack of blinding, lack of balance between the active treatment and control conditions in nonspecific factors, and subjective self-report measures.

The article reporting the trial is

Wiborg JF, van Bussel J, van Dijk A, Bleijenberg G, Knoop H. Randomised controlled trial of cognitive behaviour therapy delivered in groups of patients with chronic fatigue syndrome. Psychotherapy and Psychosomatics. 2015;84(6):368-76.

Unfortunately, the article is currently behind a paywall. Perhaps readers could contact the corresponding author, Hans.knoop@radboudumc.nl, and request a PDF.

The abstract

Background: Meta-analyses have been inconclusive about the efficacy of cognitive behaviour therapies (CBTs) delivered in groups of patients with chronic fatigue syndrome (CFS) due to a lack of adequate studies. Methods: We conducted a pragmatic randomised controlled trial with 204 adult CFS patients from our routine clinical practice who were willing to receive group therapy. Patients were equally allocated to therapy groups of 8 patients and 2 therapists, 4 patients and 1 therapist or a waiting list control condition. Primary analysis was based on the intention-to-treat principle and compared the intervention group (n = 136) with the waiting list condition (n = 68). The study was open label. Results: Thirty-four (17%) patients were lost to follow-up during the course of the trial. Missing data were imputed using mean proportions of improvement based on the outcome scores of similar patients with a second assessment. Large and significant improvement in favour of the intervention group was found on fatigue severity (effect size = 1.1) and overall impairment (effect size = 0.9) at the second assessment. Physical functioning and psychological distress improved moderately (effect size = 0.5). Treatment effects remained significant in sensitivity and per-protocol analyses. Subgroup analysis revealed that the effects of the intervention also remained significant when both group sizes (i.e. 4 and 8 patients) were compared separately with the waiting list condition. Conclusions: CBT can be effectively delivered in groups of CFS patients. Group size does not seem to affect the general efficacy of the intervention which is of importance for settings in which large treatment groups are not feasible due to limited referral

The trial registration

http://www.isrctn.com/ISRCTN15823716

Who was enrolled into the trial?

Who gets into a psychotherapy trial is a function of the particular treatment setting of the study, the diagnostic criteria for entry, and patient preferences for getting their care through a trial, rather than what is being routinely provided in that setting.

We need to pay particular attention when patients enter psychotherapy trials hoping they will receive a treatment they prefer and not be assigned to the other condition. Patients may enroll in a clinical trial for the betterment of science, but in some settings they are willing to enroll because of the chance of getting a treatment they otherwise could not get. This in turn affects their evaluation both of the condition in which they get the preferred treatment and of the condition in which they are denied it. Simply put, they register being pleased if they got what they wanted and displeased if they did not.

The setting is relevant to evaluating who was enrolled in a trial.

The authors’ own outpatient clinic at the Radboud University Medical Center was the site of the study. The group has an international reputation for promoting the biopsychosocial model, in which psychological factors are assumed to be the decisive factor in maintaining somatic complaints.

All patients were referred to our outpatient clinic for the management of chronic fatigue.

There is thus a clear referral or case-mix bias, but we are given no ready basis for quantifying it or even estimating its effects.

The diagnostic criteria.

The article states:

In accordance with the US Center for Disease Control [9], CFS was defined as severe and unexplained fatigue which lasts for at least 6 months and which is accompanied by substantial impairment in functioning and 4 or more additional complaints such as pain or concentration problems.

Actually, the US Centers for Disease Control and Prevention (CDC) would now reject this trial, because these entry criteria are considered obsolete, overinclusive, and not sufficiently exclusive of other conditions that might be associated with chronic fatigue.*

There is a real paradigm shift happening in America. Both the 2015 Institute of Medicine (IOM) report and the CDC website emphasize post-exertional malaise, getting more ill after any effort, in ME. The CDC no longer recommends CBT as a treatment.

The only mandatory symptom for inclusion in this study is fatigue lasting 6 months. Properly speaking, this trial targets chronic fatigue [period], not the condition chronic fatigue syndrome.

Current US CDC recommendations (see Box 7-1 from the IOM document, above) require post-exertional malaise for a diagnosis of myalgic encephalomyelitis (ME). See below.

Patients meeting the current American criteria for ME would be eligible for enrollment in this trial, but it is unclear what proportion of the enrolled patients actually met the American criteria. Because the entry criteria are overinclusive, it is doubtful whether the results would generalize to American samples. A look at patient flow into the study will be informative.

Patient flow

Let’s look at what is said in the text, and also at the chart depicting patient flow into the trial, for any evidence of self-selection.

In total, 485 adult patients were diagnosed with CFS during the inclusion period at our clinic (fig. 1). One hundred and fifty-seven patients were excluded from the trial because they declined treatment at our clinic, were already asked to participate in research incompatible with inclusion (e.g. research focusing on individual CBT for CFS) or had a clinical reason for exclusion (i.e. they received specifically tailored interventions because they were already unsuccessfully treated with individual CBT for CFS outside our clinic or were between 18 and 21 years of age and the family had to be involved in the therapy). Of the 328 patients who were asked to engage in group therapy, 99 (30%) patients indicated that they were unwilling to receive group therapy. In 25 patients, the reason for refusal was not recorded. Two hundred and four patients were randomly allocated to one of the three trial conditions. Baseline characteristics of the study sample are presented in table 1. In total, 34 (17%) patients were lost to follow-up. Of the remaining 170 patients, 1 patient had incomplete primary outcome data and 6 patients had incomplete secondary outcome data.

flow chart

We see that the investigators invited about two thirds of the patients diagnosed at the clinic (328 of 485) to enroll in the trial. Of these, 124 (38%) refused. We don’t know the reason for some of the refusals, but almost a third of the patients approached (99 of 328) declined because they did not want group therapy. The authors were left able to randomize only 42% of the patients coming to the clinic (204 of 485), or less than two thirds of the patients they actually asked. Of these patients, a little more than two thirds received the treatment to which they were randomized and were available for follow-up.
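As a check, the percentages in this paragraph can be recomputed from the counts in the quoted passage. The short Python sketch below does only that arithmetic; the variable names and the pct helper are illustrative, not the authors'. Its follow-up figure counts only patients available for follow-up, not whether they received their assigned treatment, which would require the per-protocol numbers in the flow chart.

```python
# Patient-flow arithmetic using the counts quoted from the article.
# Variable names and the pct() helper are illustrative, not the authors'.

diagnosed = 485          # adults diagnosed with CFS at the clinic during inclusion
asked = 328              # asked to engage in group therapy
unwilling_group = 99     # declined because they did not want group therapy
reason_unrecorded = 25   # declined, reason not recorded
randomized = 204         # randomly allocated to one of the three conditions
lost_to_follow_up = 34   # lost to follow-up during the trial

refused = asked - randomized
assert refused == unwilling_group + reason_unrecorded  # 124 either way


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "asked / diagnosed": pct(asked, diagnosed),
    "refused / asked": pct(refused, asked),
    "unwilling (group) / asked": pct(unwilling_group, asked),
    "randomized / diagnosed": pct(randomized, diagnosed),
    "randomized / asked": pct(randomized, asked),
    "followed up / randomized": pct(randomized - lost_to_follow_up, randomized),
}

for label, value in summary.items():
    print(f"{label:<28}{value:>6.1f}%")
```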

These patients who received the treatment to which they were randomized and who were available for follow-up are a self-selected minority of the patients coming to the clinic. This self-selection process likely reduced the proportion of patients with myalgic encephalomyelitis. It is estimated that 25% of patients meeting the American criteria are housebound and 75% are unable to work. It is reasonable to infer that patients meeting the full criteria would opt out of a treatment that requires regular attendance at group sessions.

The trial is biased toward ambulatory patients with fatigue, not ME. Their fatigue is likely due to some combination of factors such as multiple comorbidities, as-yet-undiagnosed medical conditions, drug interactions, and the common mild and subsyndromal anxiety and depressive symptoms that characterize primary care populations.

The treatment being evaluated

Group cognitive behavior therapy for chronic fatigue syndrome, either delivered in a small (4 patients and 1 therapist) or larger (8 patients and 2 therapists) group format.

The intervention consisted of 14 group sessions of 2 h within a period of 6 months followed by a second assessment. Before the intervention started, patients were introduced to their group therapist in an individual session. The intervention was based on previous work of our research group [4,13] and included personal goal setting, fixing sleep-wake cycles, reducing the focus on bodily symptoms, a systematic challenge of fatigue-related beliefs, regulation and gradual increase in activities, and accomplishment of personal goals. A formal exercise programme was not part of the intervention.

Patients received a workbook with the content of the therapy. During sessions, patients were explicitly invited to give feedback about fatigue-related cognitions and behaviours to fellow patients. This aspect was introduced to facilitate a pro-active attitude and to avoid misperceptions of the sessions as support group meetings which have been shown to be insufficient for the treatment of CFS.

And note:

In contrast to our previous work [4], we communicated recovery in terms of fatigue and disabilities as general goal of the intervention.

Some impressions of the intensity of this treatment: it is rather intensive, with patients having considerable opportunities for interaction with providers. This factor alone distinguishes being assigned to the intervention group from being left in the waitlist control group, and it could prove powerful. It will be difficult to distinguish intensity of contact from the content or active ingredients of the therapy.

I’ll leave for another time a fuller discussion of the extent to which what was labeled cognitive behavior therapy in this study is consistent with cognitive therapy as practiced by Aaron Beck and other leaders of the field. However, a few comments are warranted. What is offered in this trial does not sound like cognitive therapy as Americans practice it. It seems to emphasize challenging beliefs and pushing patients to become more active, along with psychoeducational activities. I don’t see indications of the supportive, collaborative relationship in which patients are encouraged to work on what they want to work on, engage in outside activities (homework assignments), and get feedback.

What is missing in this treatment is what Beck calls collaborative empiricism, “a systemic process of therapist and patient working together to establish common goals in treatment, has been found to be one of the primary change agents in cognitive-behavioral therapy (CBT).”

Importantly, in Beck’s approach, the therapist does not assume cognitive distortions on the part of the patient. Rather, in collaboration with the patient, the therapist introduces alternatives to the interpretations the patient has been making and encourages the patient to consider the difference. In contrast, rather than eliciting goal statements from patients, therapists in this study impose the goal of increased activity. They also seem ready to impose their view that the patients’ fatigue-related beliefs are maladaptive.

The treatment offered in this trial is complex, with multiple components making multiple assumptions that seem quite different from what is called cognitive therapy or cognitive behavioral therapy in the US.

The authors’ communication of recovery from fatigue and disability as the goal seems a radical departure not only from cognitive behavior therapy for anxiety, depression, and pain, but also from cognitive behavior therapy offered to help patients adapt to acute and chronic physical illnesses. We will return to this “communication” later.

The control group

Patients not randomized to group CBT were placed on a waiting list.

Think about it! How do patients feel about having taken on all the inconvenience and burden of a clinical trial in the hope of getting treatment, only to be assigned to a control group that simply waits? Not only are they going to be disappointed and register that in their subjective outcome assessments; they may also worry about jeopardizing their right to the treatment they are waiting for if they endorse positive outcomes too strongly. There is potential for a nocebo effect, compounding the placebo effect of assignment to the active CBT groups.

What are informative comparisons between active treatments and control conditions?

We need to ask more often what inclusion of a control group accomplishes for the evaluation of a psychotherapy. In doing so, we need to keep in mind that psychotherapies do not have effect sizes; only comparisons between psychotherapies and control conditions have effect sizes.
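To make the point concrete, here is a minimal Python sketch with invented numbers that do not come from this trial. The treated arm improves by the same 10 points in both scenarios, yet the between-group effect size shrinks sharply when the comparison condition also delivers the nonspecific benefits. The function name between_group_d and all values are hypothetical; a real analysis would estimate the pooled standard deviation from the data.

```python
def between_group_d(change_treated, change_control, pooled_sd):
    """Standardized difference between the two arms' mean change scores."""
    return (change_treated - change_control) / pooled_sd


# Hypothetical values: the treated arm improves by 10 points in both scenarios.
TREATED_CHANGE = 10.0
POOLED_SD = 8.0

scenarios = {
    "waiting list (little nonspecific benefit)": 2.0,
    "attention-matched control (most nonspecific benefit)": 7.0,
}

for label, control_change in scenarios.items():
    d = between_group_d(TREATED_CHANGE, control_change, POOLED_SD)
    print(f"vs {label}: d = {d:.2f}")
```

The same therapy "has" an effect size of 1.0 against one comparator and well under half that against the other, which is why the choice of control condition matters so much.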

A pre-post evaluation of psychotherapy from baseline to follow-up includes the effects of any active ingredients in the psychotherapy, a host of nonspecific (placebo) factors, and any changes that would have occurred in the absence of the intervention. These include regression to the mean: patients are more likely to enter a clinical trial now, rather than earlier or later, if their symptoms have recently worsened, and such exacerbations tend to ease on their own.
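Regression to the mean can be illustrated with a small simulation. The Python sketch below assumes a hypothetical model in which each person's score is a stable component plus independent noise at each measurement, and only people scoring at or above a cutoff at baseline enroll. The cutoff of 35 merely echoes this trial's entry criterion; the means and standard deviations are invented. No treatment is given, yet the enrolled group's average score falls at follow-up.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, threshold=35.0, seed=1):
    """Hypothetical model: stable 'true' level plus independent noise at each
    measurement. Only people at or above `threshold` at baseline enroll.
    Nobody receives any treatment."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    baseline_enrolled, followup_enrolled = [], []
    for _ in range(n):
        true_level = rng.gauss(30, 6)             # stable component (assumed)
        baseline = true_level + rng.gauss(0, 6)   # noisy score at entry
        followup = true_level + rng.gauss(0, 6)   # fresh noise at follow-up
        if baseline >= threshold:                 # selection on a high score
            baseline_enrolled.append(baseline)
            followup_enrolled.append(followup)
    return statistics.mean(baseline_enrolled), statistics.mean(followup_enrolled)


base, follow = simulate_regression_to_mean()
print(f"enrolled mean at baseline:  {base:.1f}")
print(f"same patients at follow-up: {follow:.1f} (no treatment given)")
```

Any pre-post "improvement" in such a design partly reflects this selection artifact, which is one reason a comparison group is needed at all.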

So, a proper comparison/control condition includes everything that the patients randomized to the intervention group get except for the active treatment. Ideally, the intervention and the comparison/control group are equivalent on all these factors, except the active ingredient of the intervention.

That is clearly not what is happening in this trial. Patients randomized to the intervention group get the intervention, the added intensity and frequency of contact with professionals that the intervention provides, and all the support that goes with it; and the positive expectations that come with getting a therapy that they wanted.

Evaluating group CBT against the waitlist control therefore confounds the active ingredients of CBT with all these nonspecific effects. The deck is clearly stacked in favor of CBT.

This may be a randomized trial, but properly speaking, this is not a randomized controlled trial, because the comparison group does not control for nonspecific factors, which are imbalanced.

The unblinded nature of the trial

In RCTs of psychotropic drugs, the ideal is to compare the drug with an inert pill placebo, with providers, patients, and evaluators blinded as to whether each patient received the psychotropic drug or the comparison pill.

While it is difficult to achieve a comparable level of blinding in a psychotherapy trial, more of an effort to achieve it is desirable. For instance, in this trial, the authors took pains to distinguish the CBT from what would have happened in a support group. A much more adequate comparison would therefore be CBT versus either a professional or peer-led support group with an equivalent amount of contact time. Further blinding would be possible if patients were told only that two forms of group therapy were being compared. If that had been the information available to patients contemplating consent to the trial, it would not have been so obvious from the outset that one condition was preferable to the other.

Subjective self-report outcomes.

The primary outcomes for the trial were the fatigue subscale of the Checklist Individual Strength (CIS); the physical functioning subscale of the Short Form-36 Health Survey (SF-36); and overall impairment as measured by the Sickness Impact Profile (SIP).

Realistically, self-report outcomes are often all that is available in many psychotherapy trials. Commonly these are self-report assessments of anxiety and depressive symptoms, although these may be supplemented by interviewer-based assessments. We don’t have objective biomarkers with which to evaluate psychotherapy.

These three self-report measures are relatively nonspecific, particularly in a population that is not characterized by ME. Self-reported fatigue in a primary care population lacks discriminative validity with respect to pain, anxiety and depressive symptoms, and general demoralization. The measures are susceptible to the effects of receiving support and remoralization, as well as to gratitude for obtaining a treatment that was sought.

The self-report entry criteria include a score of 35 or higher on the fatigue severity subscale. Yet a score of less than 35 on this scale at follow-up is part of what defines clinically significant improvement, in a composite criterion combining self-report measures.
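A toy example shows how little change this part of the criterion can demand. In the Python sketch below, the function names are hypothetical and only the fatigue component of the composite criterion is modelled; the other components used by the authors are left out. A patient who entered at exactly 35 and scored 34 at follow-up satisfies the fatigue part of "clinically significant improvement" after a one-point change.

```python
ENTRY_CUTOFF = 35  # fatigue severity score required for entry (35 or higher)


def eligible_at_entry(score):
    """Entry criterion on the fatigue severity subscale."""
    return score >= ENTRY_CUTOFF


def meets_fatigue_part_of_improvement(score):
    """Fatigue component only; other parts of the composite criterion are omitted."""
    return score < ENTRY_CUTOFF


# Hypothetical patient who enters at the cutoff and improves by one point.
baseline, follow_up = 35, 34
print("eligible at entry:", eligible_at_entry(baseline))
print("fatigue part of improvement met:", meets_fatigue_part_of_improvement(follow_up))
print("change in score:", follow_up - baseline)
```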

We know from medical trials that differences can be observed with subjective self-report measures that will not be found with objective measures. For example, mildly asthmatic patients fail to distinguish in their subjective self-reports among the effective inhaled drug albuterol, an inert placebo inhaler, and sham acupuncture, though they rate all three as better than getting no intervention. However, albuterol shows a strong advantage over the other three conditions on an objective measure, maximum forced expiratory volume in 1 second (FEV1), as assessed with spirometry.

The suppression of objective outcome measures

We cannot let the authors of this trial off the hook for their dependence on subjective self-report outcomes. They are instructing patients that recovery is the goal, which implies that it is an attainable goal. We can reasonably be skeptical about a claim of recovery based on changes in self-report measures. Were the patients actually able to exercise? What was their exercise capacity, as objectively measured? Did they return to work?

These authors have included such objective measurements in past studies, but not included them as primary outcomes, nor, even in some cases, reported them in the main paper reporting the trial.

Wiborg JF, Knoop H, Stulemeijer M, Prins JB, Bleijenberg G. How does cognitive behaviour therapy reduce fatigue in patients with chronic fatigue syndrome? The role of physical activity. Psychol Med. 2010 Jan 5:1

The senior authors’ review fails to mention their three studies using actigraphy that did not find effects for CBT. I am unaware of any studies that did find enduring effects.

Perhaps this is what they mean when they say the protocol has been developed over time – they removed what they found to be threats to the findings that they wanted to claim.

Dismissing of any need to consider negative effects of treatment

Most psychotherapy trials fail to assess any adverse effects of treatment, but they usually do so discreetly, without mention. In contrast, this article states:

Potential harms of the intervention were not assessed. Previous research has shown that cognitive behavioural interventions for CFS are safe and unlikely to produce detrimental effects.

Patients who meet stringent criteria for ME would be put at risk by pressure to exert themselves. By definition, they are vulnerable to post-exertional malaise (PEM). Any trial of this nature needs to assess that risk. Maybe no adverse effects would be found. If so, it would strongly indicate the absence of patients with appropriate diagnoses.

Timing of assessment of outcomes varied between the intervention and control groups.

I at first did not believe what I was reading when I encountered this statement in the results section.

The mean time between baseline and second assessment was 6.2 months (SD = 0.9) in the control condition and 12.0 months (SD = 2.4) in the intervention group. This difference in assessment duration was significant (p < 0.001) and was mainly due to the fact that the start of the therapy groups had to be frequently postponed because of an irregular patient flow and limited treatment capacities for group therapy at our clinic. In accordance with the treatment manual, the second assessment was postponed until the fourteenth group session was accomplished. The mean time between the last group session and the second assessment was 3.3 weeks (SD = 3.5).

So, outcomes were assessed for the intervention group shortly after completion of therapy, when nonspecific (placebo) effects would be stronger, but a mean of six months later than for patients assigned to the control condition.

Post-hoc statistical controls are not sufficient to rescue the study from this important group difference, and it compounds other problems in the study.

Take away lessons

Pay more attention to how the limitations of any clinical trial may compound each other, so that the trial provides exaggerated estimates of the effects of treatment or of the generalizability of the results to other settings.

Be careful of loose diagnostic criteria, because a trial may not generalize to the same criteria applied in settings that differ in patient population or in the availability of other treatments. This is particularly important when a treatment setting has a referral bias and only a minority of the patients invited to participate in the trial actually agree and are enrolled.

Ask just what information is obtained by comparing the active treatment group in a study with its control/comparison group. For a start, just what is being controlled, and how might that affect estimates of the effectiveness of the active treatment?

Pay particular attention to the potent combination of an unblinded trial, a weak comparison/control condition, and an active treatment that is not otherwise available to patients.

Note

*The means of determining whether the six months of fatigue might be accounted for by other medical factors was specific to the setting. Note that a review of medical records was deemed sufficient for an unknown proportion of patients, with no further examination or medical tests.

The Department of Internal Medicine at the Radboud University Medical Center assessed the medical examination status of all patients and decided whether patients had been sufficiently examined by a medical doctor to rule out relevant medical explanations for the complaints. If patients had not been sufficiently examined, they were seen for standard medical tests at the Department of Internal Medicine prior to referral to our outpatient clinic. In accordance with recommendations by the Centers for Disease Control, sufficient medical examination included evaluation of somatic parameters that may provide evidence for a plausible somatic explanation for prolonged fatigue [for a list, see [9]. When abnormalities were detected in these tests, additional tests were made based on the judgement of the clinician of the Department of Internal Medicine who ultimately decided about the appropriateness of referral to our clinic. Trained therapists at our clinic ruled out psychiatric comorbidity as potential explanation for the complaints in unstructured clinical interviews.


Creating illusions of wondrous effects of yoga and meditation on health: A skeptic exposes tricks

The tour of the sausage factory is starting; here’s your brochure telling you what you’ll see.

 

A recent review has received a lot of attention, with it being used to support claims that mind-body interventions have distinct molecular signatures that point to potentially dramatic health benefits for those who take up these practices.

What Is the Molecular Signature of Mind–Body Interventions? A Systematic Review of Gene Expression Changes Induced by Meditation and Related Practices.  Frontiers in Immunology. 2017;8.

Few who are tweeting about this review or its press coverage are likely to have read it or to understand it, if they read it. Most of the new agey coverage in social media does nothing more than echo or amplify the message of the review’s press release.  Lazy journalists and bloggers can simply pass on direct quotes from the lead author or even just the press release’s title, ‘Meditation and yoga can ‘reverse’ DNA reactions which cause stress, new study suggests’:

“These activities are leaving what we call a molecular signature in our cells, which reverses the effect that stress or anxiety would have on the body by changing how our genes are expressed.”

And

“Millions of people around the world already enjoy the health benefits of mind-body interventions like yoga or meditation, but what they perhaps don’t realise is that these benefits begin at a molecular level and can change the way our genetic code goes about its business.”

[The authors of this review actually identified some serious shortcomings to the studies they reviewed. I’ll be getting to some excellent points at the end of this post that run quite counter to the hype. But the lead author’s press release emphasized unwarranted positive conclusions about the health benefits of these practices. That is what is most popular in media coverage, especially from those who have stuff to sell.]

Interpretation of the press release and review authors’ claims requires going back to the original studies, which most enthusiasts are unlikely to do. If readers do go back, they will have trouble interpreting some of the deceptive claims that are made.

Yet, a lot is at stake. This review is being used to recommend mind-body interventions for people having or who are at risk of serious health problems. In particular, unfounded claims that yoga and mindfulness can increase the survival of cancer patients are sometimes hinted at, but occasionally made outright.

This blog post is written with the intent of protecting consumers from such false claims and providing tools so they can spot pseudoscience for themselves.

Discussion of the review in the media speaks broadly of alternative and complementary interventions. The coverage aims to inspire confidence in this broad range of treatments and to encourage people facing health crises to invest time and money in outright quackery. Seemingly benign recommendations for yoga, tai chi, and mindfulness (after all, what’s the harm?) often become the entry point to more dubious and expensive treatments that substitute for established treatments. Once they are drawn to centers for integrative health care for classes, cancer patients are likely to spend hundreds or even thousands on other products and services that are unlikely to benefit them. One study reported:

More than 72 oral or topical, nutritional, botanical, fungal and bacterial-based medicines were prescribed to the cohort during their first year of IO care…Costs ranged from $1594/year for early-stage breast cancer to $6200/year for stage 4 breast cancer patients. Of the total amount billed for IO care for 1 year for breast cancer patients, 21% was out-of-pocket.

Coming up, I will take a skeptical look at the six randomized trials that were highlighted by this review. But in this post, I will provide you with some tools and insights so that you do not have to make such an effort in order to make an informed decision.

Like many of the other studies cited in the review, these randomized trials were quite small and underpowered. But I will focus on the six because they are as good as it gets: randomized trials are considered a higher form of evidence than simple observational studies or case reports. [It is too bad the authors of the review don’t even highlight which studies are randomized trials. They are lumped with others as “longitudinal studies.”]

As a group, the six studies do not actually add any credibility to the claims that mind-body interventions (specifically yoga, tai chi, and mindfulness training or retreats) improve health by altering DNA. With these trials, we can be no more confident in those claims than we would be if they had never been done.

I found the task of probing and interpreting the studies quite labor-intensive and ultimately unrewarding.

I had to get past poor reporting of what was actually done in the trials, to which patients, and with what results. My task often involved seeing through cover-ups, with authors exercising considerable flexibility in reporting what measures they actually collected and what analyses they attempted, before arriving at the best possible tale of the wondrous effects of these interventions.

Interpreting clinical trials should not be so hard, because trials should be honestly and transparently reported, have a registered protocol, and stick to it. These reports of trials were sorely lacking. The full extent of the problems took some digging to uncover, but some things emerged before I even got to the methods and results.

The introductions of these studies consistently exaggerated the strength of existing evidence for the effects of these interventions on health, even while somehow concluding that this particular study was urgently needed and might even be the “first ever.” The introductions to the six papers typically cross-referenced each other, without giving any indication of how poor the quality of evidence in the other papers was. What a mutual admiration society these authors are.

One giveaway is how the introductions referred to the biggest, most badass, most comprehensive and well-done review, that of Goyal and colleagues.

That review clearly states that the evidence for the effects of mindfulness is of poor quality because of the lack of comparisons with credible active treatments. The typical randomized trial of mindfulness involves a comparison with no treatment, a waiting list, or patients remaining in routine care where the target problem is likely to be ignored. If we depend on the bulk of the existing literature, we cannot rule out that any apparent benefits of mindfulness are due to having more positive expectations, attention, and support compared with simply getting nothing. Only a handful of the hundreds of trials of mindfulness include appropriate, active treatment comparison/control groups. The results of those studies are not encouraging.

One of the first things I do in probing the introduction of a study claiming health benefits for mindfulness is see how they deal with the Goyal et al review. Did the study cite it, and if so, how accurately? How did the authors deal with its message, which undermines claims of the uniqueness or specificity of any benefits to practicing mindfulness?

For yoga, we cannot yet rule out that it is no better than regular exercise, in groups or alone, or than simply having relaxing routines. The literature concerning tai chi is even smaller and of poorer quality, but there is the same need to show that practicing tai chi has any benefits over exercising in groups with comparable positive expectations and support.

Even more than mindfulness, yoga and tai chi attract a lot of pseudoscientific mumbo jumbo about integrating Eastern wisdom and Western science. We need to look past that and insist on evidence.

Like their introductions, the discussion sections of these articles are quite prone to exaggerating how strong and consistent the evidence from existing studies is. The discussion sections cherry-pick positive findings in the existing literature, sometimes recklessly distorting them. The authors then discuss how their own positively spun findings fit with what is already known, while minimizing or outright neglecting discussion of any of their negative findings. I was not surprised to see one trial of mindfulness for cancer patients find no effects on depressive symptoms or perceived stress, but then go on to suggest that mindfulness might powerfully affect the expression of DNA.

If you want to dig into the details of these studies, the going can get rough and the yield for doing a lot of mental labor is low. For instance, these studies involved drawing blood and analyzing gene expression. Readers will inevitably encounter passages like:

In response to KKM treatment, 68 genes were found to be differentially expressed (19 up-regulated, 49 down-regulated) after adjusting for potentially confounded differences in sex, illness burden, and BMI. Up-regulated genes included immunoglobulin-related transcripts. Down-regulated transcripts included pro-inflammatory cytokines and activation-related immediate-early genes. Transcript origin analyses identified plasmacytoid dendritic cells and B lymphocytes as the primary cellular context of these transcriptional alterations (both p < .001). Promoter-based bioinformatic analysis implicated reduced NF-κB signaling and increased activity of IRF1 in structuring those effects (both p < .05).

Intimidated? Before you defer to the “experts” doing these studies, let me show you some things I noticed in the six studies and how you can debunk the relevance of these studies for promoting health and dealing with illness. Actually, I will show that even if these six studies had gotten the results the authors claimed (and they did not), the effects would at best be trivial and lost among the other things going on in patients’ lives.

Fortunately, there are lots of signs that you can dismiss such studies and go on to something more useful, if you know what to look for.

Some general rules:

  1. Don’t accept claims of efficacy/effectiveness based on underpowered randomized trials. Dismiss them. A reliable rule of thumb is to dismiss trials that have fewer than 35 patients in the smallest group. In such studies, true moderate-sized effects will be missed over half the time, even if they are actually there.
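The 35-patient rule of thumb can be checked against a standard power calculation. The sketch below is illustrative and rests on assumptions not stated in the post: a two-arm trial with equal groups, a two-sided test at α = 0.05, a “moderate” effect of Cohen’s d = 0.5, and the normal approximation to the t-test (which slightly overstates power in small samples, so real power would be a little lower still).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two equal-sized groups.

    Uses the normal approximation to the two-sample t-test: with n patients
    per group, the expected test statistic is d * sqrt(n / 2).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    shift = d * sqrt(n_per_group / 2)
    # Probability that the statistic falls beyond either critical value.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_for_power(d: float, target: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest per-group sample size whose approximate power reaches target."""
    if d <= 0:
        raise ValueError("d must be positive, or no sample size will suffice")
    n = 2
    while approx_power(d, n, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n in (20, 30, 35, 50):
        print(f"{n} per group: power = {approx_power(0.5, n):.2f}")
    print("per-group n for 80% power at d = 0.5:", n_for_power(0.5))
```

Under these assumptions, power hovers around one half at 30 to 35 patients per group and falls well below it at 20, while roughly 63 patients per group would be needed to reach the conventional 80%. Smaller true effects make the picture worse.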

Due to publication bias, most of the positive effects published from trials of this size will be false positives and won’t hold up in well-designed, larger trials.

When significant positive effects from such trials are reported in published papers, they have to be large to have reached significance. If not outright false, these effect sizes won’t be matched in larger trials. So, significant, positive effect sizes from small trials are likely to be false positives or exaggerated, and probably won’t replicate. For that reason, we can consider small studies to be pilot or feasibility studies, but not as providing estimates of how large an effect size we should expect from a larger study. Investigators do it all the time, but they should not: they base power calculations for how many patients a larger trial needs on effect sizes from such small studies. No, no, no!
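The inflation of significant effects from small trials (often called the “winner’s curse”) can be shown with a small simulation. This is a sketch under illustrative values chosen here, not figures from the six trials: normally distributed outcomes, a modest true effect of d = 0.3, 20 patients per arm, and only significant positive results counting as “published.” Significance is judged with a normal approximation (statistic above 1.96) to keep the code dependency-free.

```python
import random
from math import sqrt
from statistics import mean, stdev


def observed_d(treated: list[float], control: list[float]) -> float:
    """Cohen's d using the pooled standard deviation (equal group sizes)."""
    pooled_sd = sqrt((stdev(treated) ** 2 + stdev(control) ** 2) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def simulate_small_trials(true_d: float, n_per_group: int, n_trials: int,
                          seed: int = 1) -> tuple[float, float]:
    """Simulate many small two-arm trials with a known true effect.

    Returns (share of trials that are significant and positive,
             average observed d among those significant trials).
    """
    rng = random.Random(seed)
    significant_ds = []
    for _ in range(n_trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        d = observed_d(treated, control)
        stat = d * sqrt(n_per_group / 2)  # equals the pooled two-sample t statistic
        if stat > 1.96:                   # keep only "publishable" positive results
            significant_ds.append(d)
    share = len(significant_ds) / n_trials
    return share, mean(significant_ds)


if __name__ == "__main__":
    share, avg_d = simulate_small_trials(true_d=0.3, n_per_group=20, n_trials=5000)
    print(f"significant and positive: {share:.0%}; "
          f"mean 'published' d = {avg_d:.2f} (true d = 0.30)")
```

With 20 patients per arm, a trial can only cross the threshold when its observed effect exceeds about d = 0.62 (1.96 divided by √10), so every “published” estimate in the simulation is more than double the true effect. Feeding such an estimate into a power calculation would make a follow-up trial look adequately sized when it is not.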

Having spent decades examining clinical trials, I am generally comfortable dismissing effect sizes that come from trials with fewer than 35 patients in the smaller group. I agree with the suggestion that if at least two larger trials are available in a given literature, go with those and ignore the smaller studies. If there are not at least two larger studies, keep the jury out on whether there is a significant effect.

Applying the Rule of 35, five of the six trials can be dismissed, and the sixth is ambiguous because of loss of patients to follow-up. If promoters of mind-body interventions want to convince us that they have beneficial effects on physical health by conducting trials like these, they have to do better. None of the individual trials should increase our confidence in their claims. Collectively, the trials collapse in a mess without providing a single credible estimate of effect size. This attests to the poor quality of evidence and disrespect for methodology that characterizes this literature.

  1. Don’t be taken in by titles of peer-reviewed articles that announce that these interventions work. Titles may not be telling the truth.

What I found extraordinary is that five of the six randomized trials had a title indicating that a positive effect was found. I suspect that most people encountering the title will not actually go on to read the study. So, they will be left with the false impression that positive results were indeed obtained. It’s quite a clever trick to turn the title of an article, by which most people will remember it, into a false advertisement for what was actually found.

For a start, we can simply remind ourselves that with these underpowered studies, investigators should not even be making claims about efficacy/effectiveness. So, one trick of the developing skeptic is to check whether the claims being made in the title fit the size of the study. Going on to the results section, one can also find other discrepancies between what was found and what is being claimed.

I think it’s a good general rule of thumb to be careful of titles of reports of randomized trials that declare results. Even when what is claimed in the title fits the actual results, it often creates the illusion of greater consistency with the existing literature than there is. Furthermore, even when future studies inevitably fail to replicate what is claimed in the title, the false claim lives on, because failure to replicate key findings is almost never grounds for retracting a paper.

  1. Check the institutional affiliations of the authors. These 6 trials serve as a depressing reminder that we can’t go on researchers’ institutional affiliation or having federal grants to reassure us of the validity of their claims. These authors are not from Quack-Quack University and they get funding for their research.

In all cases, the investigators had excellent university affiliations, mostly in California. Most studies were conducted with some form of funding, often federal grants. A quick Google search would reveal that at least one of the authors on each study, usually more, had federal funding.

  1. Check the conflicts of interest, but don’t expect the declarations to be informative, and be skeptical of what you find. It is disappointing that a check of the conflict of interest statements for these articles would be unlikely to arouse suspicion that the claimed results might have been influenced by financial interests. One cannot readily see that the studies were generally done in settings promoting alternative, unproven treatments that would benefit from the publicity generated by the studies. One cannot see that some of the authors have lucrative book contracts and speaking tours that require making claims for dramatic effects of mind-body treatments, claims that could not possibly be supported by transparent reporting of the results of these studies. As we will see, one of the studies was actually conducted in collaboration with Deepak Chopra and with money from his institution. That would definitely raise flags in the skeptic community. But the dubious tie might be missed by patients and their families, who are vulnerable to unwarranted claims and unrealistic expectations of what can be obtained outside of conventional medicine, like chemotherapy, surgery, and pharmaceuticals.

Based on what I found probing these six trials, I can suggest some further rules of thumb. (1) Don’t assume that all relevant conflicts of interest are disclosed in articles about health effects of alternative treatments. Check the setting in which the study was conducted and whether an integrative [complementary and alternative, meaning mostly unproven] care setting was used for recruiting or running the trial. Not only would this represent potential bias on the part of the authors, it would represent selection bias in recruitment of patients and in their responsiveness to placebo effects consistent with the marketing themes of these settings. (2) Google the authors and see if they have lucrative pop psychology book contracts, TED talks, or speaking gigs at positive psychology or complementary and alternative medicine gatherings. None of these lucrative activities is typically expected to be disclosed as a conflict of interest, but all require making strong claims that are not supported by available data. Such rewards are perverse incentives for authors to distort and exaggerate positive findings and to suppress negative findings in peer-reviewed reports of clinical trials. (3) Check and see if known quacks have prepared recruitment videos for the study, informing patients what will be found. (Seriously, I was tipped off to look, and I found that.)

  1. Look for the usual suspects. A surprisingly small, tight, interconnected group is generating this research. You could look the authors up on Google or Google Scholar or browse through my previous blog posts and see what I have said about them. As I will point out in my next blog post, one got withering criticism for her claim that drinking carbonated sodas, but not sweetened fruit drinks, shortened your telomeres, so that drinking soda was worse than smoking. My colleagues and I re-analyzed the data of another of the authors. We found, contrary to what he claimed, that pursuing meaning rather than pleasure in your life did not affect gene expression related to immune function. We also showed that substituting randomly generated data worked as well as what he got from blood samples in replicating his original results. I don’t think it is ad hominem to point out that both of these authors have a history of making implausible claims. It speaks to source credibility.
  1. Check and see if there is a trial registration for a study, but don’t stop there. You can quickly check with PubMed whether a report of a randomized trial is registered. Trial registration is intended to ensure that investigators commit themselves in advance to a primary outcome, or maybe two, so that readers can check whether that is what they emphasized in their paper. You can then check whether what is said in the report of the trial fits with what was promised in the protocol. Unfortunately, I could find a registration for only one of these trials. That registration was vague about what outcome variables would be assessed and did not mention the outcome emphasized in the published paper (!). The registration also said the sample would be larger than what was reported in the published study. When researchers have difficulty in recruitment, their study is often compromised in other ways. I’ll show how this study was compromised.

Well, it looks like applying these generally useful rules of thumb is not always so easy with these studies. I think the small sample sizes across all of the studies would be enough to decide that this research has yet to yield meaningful results and certainly does not support the claims being made.

But readers who are motivated to put in the time to probe deeper will come up with strong signs of p-hacking and questionable research practices.

  1. Check the report of the randomized trial and see if you can find any declaration of one or two primary outcomes and a limited number of secondary outcomes. What you will find instead is that the studies always have more outcome variables than patients receiving these interventions. The opportunities for cherry picking positive findings and discarding the rest are huge, especially because it is so hard to assess what data were collected but not reported.
  1. Check and see if you can find tables of unadjusted primary and secondary outcomes. Honest and transparent reporting involves giving readers a look at simple statistics so they can decide if results are meaningful. For instance, if effects on stress and depressive symptoms are claimed, are the results impressive and clinically relevant? In almost all cases, there is no peeking allowed. Instead, authors provide analyses and statistics with lots of adjustments made. They break lots of rules in doing so, especially with such a small sample. These authors are virtually assured of getting results to crow about.
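The cherry-picking risk from having many outcome variables can be put into numbers. Under the simplifying assumption (an assumption made here, not a description of these trials) that the outcomes are independent and that the intervention truly affects none of them, the chance that at least one outcome reaches p < .05 is 1 − (1 − 0.05)^k for k outcomes. The sketch computes that figure and checks it with a quick simulation in which each null p-value is uniform on [0, 1].

```python
import random


def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among k independent null outcomes."""
    return 1 - (1 - alpha) ** k


def simulated_any_hit(k: int, n_studies: int, alpha: float = 0.05,
                      seed: int = 7) -> float:
    """Monte Carlo check: under the null, each outcome's p-value is uniform."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))  # did any outcome "work"?
        for _ in range(n_studies)
    )
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 10, 20, 40):
        print(f"{k:>2} outcomes: P(at least one p < .05) = {familywise_error(k):.2f}")
    print("simulated, 20 outcomes:", simulated_any_hit(20, 10_000))
```

With 20 outcomes, the chance of at least one “significant” finding when nothing works is about 64%. Real outcomes are usually correlated, which changes the exact figure, but trying several adjusted analyses for each outcome adds still more chances to find something to report.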

Famously and hilariously, Joe Simmons, Leif Nelson, and Uri Simonsohn published claims that briefly listening to the Beatles’ “When I’m 64” left students a year and a half younger than if they were assigned to listen to “Kalimba.” They knew this was nonsense, but their intent was to show what researchers can do if they have free rein with how they analyze their data and what they report. They revealed the tricks they used, but those tricks were minor league and amateurish compared to what the authors of these trials consistently did in claiming that yoga, tai chi, and mindfulness modified expression of DNA.

Stay tuned for my next blog post, where I go through the six studies. But if you or a loved one has to make an immediate decision about whether to plunge into the world of woo-woo unproven medicine in hopes of altering DNA expression, consider this. I will show that the authors of these studies did not get the results they claimed. But who should care even if they did? The effects were laughably trivial. As the authors of the review I have been complaining about noted:

One other problem to consider are the various environmental and lifestyle factors that may change gene expression in similar ways to MBIs [Mind-Body Interventions]. For example, similar differences can be observed when analyzing gene expression from peripheral blood mononuclear cells (PBMCs) after exercise. Although at first there is an increase in the expression of pro-inflammatory genes due to regeneration of muscles after exercise, the long-term effects show a decrease in the expression of pro-inflammatory genes (55). In fact, 44% of interventions in this systematic review included a physical component, thus making it very difficult, if not impossible, to discern between the effects of MBIs from the effects of exercise. Similarly, food can contribute to inflammation. Diets rich in saturated fats are associated with pro-inflammatory gene expression profile, which is commonly observed in obese people (56). On the other hand, consuming some foods might reduce inflammatory gene expression, e.g., drinking 1 l of blueberry and grape juice daily for 4 weeks changes the expression of the genes related to apoptosis, immune response, cell adhesion, and lipid metabolism (57). Similarly, a diet rich in vegetables, fruits, fish, and unsaturated fats is associated with anti-inflammatory gene profile, while the opposite has been found for Western diet consisting of saturated fats, sugars, and refined food products (58). Similar changes have been observed in older adults after just one Mediterranean diet meal (59) or in healthy adults after consuming 250 ml of red wine (60) or 50 ml of olive oil (61). However, in spite of this literature, only two of the studies we reviewed tested if the MBIs had any influence on lifestyle (e.g., sleep, diet, and exercise) that may have explained gene expression changes.

How about taking tango lessons instead? You would at least learn dance steps, get exercise, and decrease any social isolation. And so what if there were more benefits than taking up these other activities?


Jane Brody promoting the pseudoscience of Barbara Fredrickson in the New York Times

Journalists’ coverage of positive psychology and health is often shabby, even in prestigious outlets like The New York Times.

Jane Brody’s latest installment of the benefits of being positive on health relied heavily on the work of Barbara Fredrickson that my colleagues and I have thoroughly debunked.

All of us need to recognize that research concerning the effects of positive psychology interventions often consists of disguised randomized controlled trials.

With that insight, we need to evaluate this research in terms of reporting standards like CONSORT and declarations of conflict of interests.

We need to be more skeptical of claims that small changes in behavior can profoundly improve health.

When in doubt, assume that much of what we read in the media about positivity and health is false or at least exaggerated.

Jane Brody starts her article in The New York Times by describing how most mornings she is “grinning from ear to ear, uplifted not just by my own workout but even more so” by her interaction with toddlers on the way home from where she swims. When I read Brody’s “Turning Negative Thinkers Into Positive Ones,” I was not left grinning ear to ear. I was left profoundly bummed.

I thought real hard about what was so unsettling about Brody’s article. I now have some clarity.

I don’t mind suffering even pathologically cheerful people in the morning. But I do get bothered when they serve up pseudoscience as the real thing.

I had expected to be served up Brody’s usual recipe of positive psychology pseudoscience concocted to coerce readers into heeding her Barnum advice about how they should lead their lives. “Smile or die!” Apologies to my friend Barbara Ehrenreich for borrowing here the title her book was given outside of North America. I invoke the phrase because Jane Brody makes the case that unless we do what she says, we risk hurting our health and shortening our lives. So we better listen up.

What bummed me most this time was that Brody was drawing on the pseudoscience of Barbara Fredrickson that my colleagues and I have worked so hard to debunk. We took the trouble of obtaining data sets for two of her key papers for reanalysis. We were dismayed by the quality of the data. To start with, we uncovered carelessness at the level of data entry that undermined her claims. But her basic analyses and interpretations did not hold up either.

Fredrickson publishes exaggerated claims about dramatic benefits of simple positive psychology exercises. Fredrickson is very effective in blocking or muting the publication of criticism and getting on with hawking her wares. My colleagues and I have talked to others who similarly met considerable resistance from editors in getting detailed critiques and re-analyses published. Fredrickson is also aided by uncritical people like Jane Brody in promoting her weak and inconsistent evidence as strong stuff. It sells a lot of positive psychology merchandise, like self-help books and workshops, to needy and vulnerable people.

If it is taken seriously, Fredrickson’s research concerns health effects of behavioral interventions. Yet, her findings are presented in a way that does not readily allow their integration with the rest of the health psychology literature. It would be difficult, for instance, to integrate Fredrickson’s randomized trials of loving-kindness meditation with other research, because she makes it almost impossible to isolate effect sizes in a form that could be combined with other studies in a meta-analysis. Moreover, Fredrickson has published contradictory claims from the same data set multiple times without acknowledging the duplicate publication. [Please read on. I will document all of these claims before the post ends.]

The need of self-help gurus to generate support for their dramatic claims in lucrative positive psychology self-help products is never acknowledged as a conflict of interest. It should be.

Just imagine that someone had a contract based on a book prospectus promising that the claims of their last pop psychology book would be surpassed. Such books inevitably paint life too simply, with simple changes in behavior having profound and lasting effects unlike anything obtained in the randomized trials of clinical and health psychology. Readers ought to be informed that the pressure to meet the demands of a lucrative book contract could generate a strong confirmation bias. Caveat emptor auditor, but how about at least informing readers and letting them decide whether following the money influences their interpretation of what they read?

Psychology journals almost never require disclosures of conflicts of interest of this nature. I am campaigning to make such disclosure routine, and to make nondisclosure of such financial benefits tantamount to scientific misconduct. I am calling for readers to take to social media when these disclosures do not appear in scientific journals where they should be featured prominently, and to hold editors responsible for non-enforcement. I can cite Fredrickson’s work as a case in point, but there are many other examples, inside and outside of positive psychology.

Back to Jane Brody’s exaggerated claims for Fredrickson’s work.

I lived for half a century with a man who suffered from periodic bouts of depression, so I understand how challenging negativism can be. I wish I had known years ago about the work Barbara Fredrickson, a psychologist at the University of North Carolina, has done on fostering positive emotions, in particular her theory that accumulating “micro-moments of positivity,” like my daily interaction with children, can, over time, result in greater overall well-being.

The research that Dr. Fredrickson and others have done demonstrates that the extent to which we can generate positive emotions from even everyday activities can determine who flourishes and who doesn’t. More than a sudden bonanza of good fortune, repeated brief moments of positive feelings can provide a buffer against stress and depression and foster both physical and mental health, their studies show.

“Research…demonstrates” (?). Brody is feeding stupid-making pablum to readers. Fredrickson’s kind of research may produce evidence one way or the other, but it is too strong a claim, an outright illusion, to even begin suggesting that it “demonstrates” (proves) what follows in this passage.

Where, outside of tabloids and self-help products, do we find immodest claims that one or a few poor-quality studies “demonstrate” anything?

Negative feelings activate a region of the brain called the amygdala, which is involved in processing fear and anxiety and other emotions. Dr. Richard J. Davidson, a neuroscientist and founder of the Center for Healthy Minds at the University of Wisconsin — Madison, has shown that people in whom the amygdala recovers slowly from a threat are at greater risk for a variety of health problems than those in whom it recovers quickly.

Both he and Dr. Fredrickson and their colleagues have demonstrated that the brain is “plastic,” or capable of generating new cells and pathways, and it is possible to train the circuitry in the brain to promote more positive responses. That is, a person can learn to be more positive by practicing certain skills that foster positivity.

We are knee deep in neuro-nonsense. Try asking a serious neuroscientist about the claims that this duo have “demonstrated that the brain is ‘plastic’,” or that practicing certain positivity skills changes the brain with the health benefits that they claim via Brody. Or that they are studying “amygdala recovery” associated with reduced health risk.

For example, Dr. Fredrickson’s team found that six weeks of training in a form of meditation focused on compassion and kindness resulted in an increase in positive emotions and social connectedness and improved function of one of the main nerves that helps to control heart rate. The result is a more variable heart rate that, she said in an interview, is associated with objective health benefits like better control of blood glucose, less inflammation and faster recovery from a heart attack.

I will dissect this key claim about loving-kindness meditation and vagal tone/heart rate variability shortly.

Dr. Davidson’s team showed that as little as two weeks’ training in compassion and kindness meditation generated changes in brain circuitry linked to an increase in positive social behaviors like generosity.

We will save discussing Richard Davidson for another time. But really, Jane, just two weeks to better health? Where is the generosity center in brain circuitry? I dare you to ask a serious neuroscientist and embarrass yourself.

“The results suggest that taking time to learn the skills to self-generate positive emotions can help us become healthier, more social, more resilient versions of ourselves,” Dr. Fredrickson reported in the National Institutes of Health monthly newsletter in 2015.

In other words, Dr. Davidson said, “well-being can be considered a life skill. If you practice, you can actually get better at it.” By learning and regularly practicing skills that promote positive emotions, you can become a happier and healthier person. Thus, there is hope for people like my friend’s parents should they choose to take steps to develop and reinforce positivity.

In her newest book, “Love 2.0,” Dr. Fredrickson reports that “shared positivity — having two people caught up in the same emotion — may have even a greater impact on health than something positive experienced by oneself.” Consider watching a funny play or movie or TV show with a friend of similar tastes, or sharing good news, a joke or amusing incidents with others. Dr. Fredrickson also teaches “loving-kindness meditation” focused on directing good-hearted wishes to others. This can result in people “feeling more in tune with other people at the end of the day,” she said.

Brody ends with 8 things Fredrickson and others endorse to foster positive emotions. (Why only 8 recommendations? Why not come up with 10 and make them commandments?) These include “Do good things for other people” and “Appreciate the world around you.” Okay, but do Fredrickson and Davidson really show that engaging in these activities has immediate and dramatic effects on our health? I have examined their research and I doubt it. I think the larger problem, though, is the suggestion that physically ill people facing shortened lives risk being blamed for being bad people: they obviously did not do these 8 things, or else they would be healthy.

If Brody were selling herbal supplements or coffee enemas, we would readily label the quackery. We should do the same for advice about psychological practices that are promised to transform lives.

Brody’s sloppy links to support her claims: Love 2.0

Journalists who talk of “science” and respect their readers will provide links to their actual sources in the peer-reviewed scientific literature. That way, readers who are motivated can independently review the evidence. That is especially true in an outlet as prestigious as The New York Times.

Jane Brody is outright promiscuous in the links that she provides, often to secondary or tertiary sources. The first link she provides for her discussion of Fredrickson’s Love 2.0 is actually to a somewhat negative review of the book. https://www.scientificamerican.com/article/mind-reviews-love-how-emotion-afftects-everything-we-feel/

Fredrickson builds her case by expanding on research that shows how sharing a strong bond with another person alters our brain chemistry. She describes a study in which best friends’ brains nearly synchronize when exchanging stories, even to the point where the listener can anticipate what the storyteller will say next. Fredrickson takes the findings a step further, concluding that having positive feelings toward someone, even a stranger, can elicit similar neural bonding.

This leap, however, is not supported by the study and fails to bolster her argument. In fact, most of the evidence she uses to support her theory of love falls flat. She leans heavily on subjective reports of people who feel more connected with others after engaging in mental exercises such as meditation, rather than on more objective studies that measure brain activity associated with love.

I would go even further than the reviewer. Fredrickson builds her case by very selectively drawing on the literature, choosing only a few studies that fit.  Even then, the studies fit only with considerable exaggeration and distortion of their findings. She exaggerates the relevance and strength of her own findings. In other cases, she says things that have no basis in anyone’s research.

I came across Love 2.0: How Our Supreme Emotion Affects Everything We Feel, Think, Do, and Become (Unabridged), which sells for $17.95. The product description reads:

We all know love matters, but in this groundbreaking book positive emotions expert Barbara Fredrickson shows us how much. Even more than happiness and optimism, love holds the key to improving our mental and physical health as well as lengthening our lives. Using research from her own lab, Fredrickson redefines love not as a stable behemoth, but as micro-moments of connection between people – even strangers. She demonstrates that our capacity for experiencing love can be measured and strengthened in ways that improve our health and longevity. Finally, she introduces us to informal and formal practices to unlock love in our lives, generate compassion, and even self-soothe. Rare in its scope and ambitious in its message, Love 2.0 will reinvent how you look at and experience our most powerful emotion.

There is a mishmash of language games going on here. Fredrickson’s redefinition of love is not based on her research. Her claim that love is ‘really’ micro-moments of connection between people, even strangers, is a weird redefinition. Attempt to read her book, if you have time to waste.

You will quickly see that much of what she says makes no sense for long-term relationships that are solid but beyond the honeymoon stage. Ask partners in long-term relationships and they will undoubtedly admit that they lack lots of such “micro-moments of connection.” I doubt it is adaptive for people seeking to build long-term relationships to adopt the yardstick that if lots of such micro-moments don’t keep coming all the time, the relationship is in trouble. But it is Fredrickson who is selling the strong claims, and the burden is on her to produce the evidence.

If you try to take Fredrickson’s work seriously, you wind up seeing that she has a rather superficial view of close relationships and can’t seem to distinguish them from what goes on between strangers in drunken one-night stands. But that is supposed to be revolutionary science.

We should not confuse much of what Fredrickson emphatically states with testable hypotheses. Many statements sound more like marketing slogans, what Joachim Krueger and his student Thomas Mairunteregger identify as the McDonaldization of positive psychology. Like a Big Mac, Fredrickson’s Love 2.0 requires a lot of imagination to live up to its advertisement.

Fredrickson’s love as the supreme emotion vs. ’Trane’s A Love Supreme

Where Fredrickson’s selling of love as the supreme emotion is not simply an advertising slogan, it is a bad summary of the research on love and health. John Coltrane makes no empirical claim about love being supreme. But listening to him is an effective way to self-soothe after taking Love 2.0 seriously and trying to figure it out. Simply enjoy, and don’t worry about what it does for your positivity ratio or micro-moments, shared or alone.

Fredrickson’s study of loving-kindness meditation

Jane Brody, like Fredrickson herself, depends heavily on a study of loving-kindness meditation in proclaiming the wondrous, transformative health benefits of being loving and kind. After obtaining Fredrickson’s data set and reanalyzing it, my colleagues (James Heathers, Nick Brown, and Harris Friedman) and I arrived at a very different interpretation of her study. As we first encountered it, the study was:

Kok, B. E., Coffey, K. A., Cohn, M. A., Catalino, L. I., Vacharkulksemsuk, T., Algoe, S. B., . . . Fredrickson, B. L. (2013). How positive emotions build physical health: Perceived positive social connections account for the upward spiral between positive emotions and vagal tone. Psychological Science, 24, 1123-1132.

The Consolidated Standards of Reporting Trials (CONSORT) are widely accepted for at least two reasons. First, clinical trials should be clearly identified as such so that their results are recognized and available in systematic searches and can be integrated with other studies; CONSORT requires that RCTs be clearly identified in their titles and abstracts. Second, once RCTs are labeled as such, the CONSORT checklist provides a handy tally of what needs to be reported.

It is only in supplementary material that the Kok and Fredrickson paper is identified as a clinical trial. Only in that supplement is the primary outcome identified, and even then only in passing. No means are reported anywhere in the paper or supplement. Results are presented in terms of what Kok and Fredrickson term “a variant of a mediational, parallel process, latent-curve model.” Basic statistics needed for its evaluation are left to readers’ imagination. Figure 1 in the article depicts the awe-inspiring parallel-process mediational model that guided the analyses. We showed the figure to a number of statistical experts, including Andrew Gelman. While some elements were readily recognizable, the overall figure was not, especially the mysterious large dot (a causal pathway roundabout?) near the top.

So not only might the study go undetected as an RCT, it also fails to report the information needed to calculate effect sizes.

Furthermore, when studies are labeled as RCTs, we immediately look for protocols published ahead of time that specify the basic elements of design and analysis and the primary outcomes. At Psychological Science, studies with protocols are unusual enough to earn the authors a badge. In the clinical and health psychology literature, publishing a protocol is becoming as routine as flushing a toilet after using a public restroom. No one runs up and says, “Thank you for flushing/publishing your protocol.”

If Fredrickson and her colleagues are going to be using the study to make claims about the health benefits of loving kindness meditation, they have a responsibility to adhere to CONSORT and to publish their protocol. This is particularly the case because this research was federally funded and results need to be transparently reported for use by a full range of stakeholders who paid for the research.

We identified a number of other problems and submitted a manuscript based on a reanalysis of the data. Our manuscript was promptly rejected by Psychological Science. The associate editor, Batja Mesquita, noted that two of my co-authors, Nick Brown and Harris Friedman, had co-authored a paper resulting in a partial retraction of Fredrickson’s positivity ratio paper.

Brown NJ, Sokal AD, Friedman HL. The Complex Dynamics of Wishful Thinking: The Critical Positivity Ratio. American Psychologist. 2013 Jul 15.

I won’t go into the details, except to say that Nick and Harris, along with Alan Sokal, unambiguously established that Fredrickson’s positivity ratio of 2.9013 positive to negative experiences was a fake fact. Fredrickson had been promoting the number as an “evidence-based guideline” of a ratio acting as a “tipping point beyond which the full impact of positive emotions becomes unleashed.” Once Brown and his co-authors overcame strong resistance to getting their critique published, their paper garnered a lot of attention in social and conventional media. A hilarious account is available at Nick Brown Smelled Bull.

Batja Mesquita argued that the previously published critique discouraged her from accepting our manuscript. To do so, she would be participating in “a witch hunt,” and

 The combatant tone of the letter of appeal does not re-assure me that a revised commentary would be useful.

Welcome to one-sided tone policing. We appealed her decision, but Editor Eric Eich indicated that there was no appeal process at Psychological Science, contrary to the requirements of the Committee on Publication Ethics (COPE).

Eich relented after I shared an email to my co-authors in which I threatened to take the whole issue to social media, where there would be no peer review in the traditional, outdated sense of the term. Numerous revisions of the manuscript were submitted, some of them in response to reviews by Fredrickson and Kok, who did not want the paper published. A year passed before our paper was accepted and appeared on the journal’s website. You can read our paper here. I think you will see that the fatal problems are obvious.

Heathers JA, Brown NJ, Coyne JC, Friedman HL. The elusory upward spiral: A reanalysis of Kok et al. (2013). Psychological Science. 2015 May 29:0956797615572908.

Beyond the original paper’s failure to adhere to CONSORT, we noted the following:

  1. There was no effect of whether participants were assigned to loving-kindness meditation vs. the no-treatment control group on the key physiological variable, cardiac vagal tone. This is a thoroughly disguised null trial.
  2. Kok and Fredrickson claimed that there was an effect of meditation on cardiac vagal tone, but any appearance of an effect was due to reduced vagal tone in the control group, which cannot readily be explained.
  3. Kok and Fredrickson essentially interpreted changes in cardiac vagal tone as a surrogate outcome for more general changes in physical health. However, other researchers have noted that observed changes in cardiac vagal tone are not consistently related to changes in other health variables and are susceptible to variations in experimental conditions that have nothing to do with health.
  4. No attention was given to whether participants assigned to loving-kindness meditation actually practiced it with any frequency or fidelity, even though the article reported that such data had been collected.

Point 2 is worth elaborating. Participants in the control condition received no intervention. Their repeated assessment of cardiac vagal tone/heart rate variability was essentially a test-retest reliability check on what should have been a stable physiological characteristic. Yet participants assigned to this no-treatment condition showed as much change as the participants assigned to meditation, but in the opposite direction. Kok and Fredrickson ignored this and attributed all differences to meditation. Houston, we have a problem, a big one, with unreliability of measurement in this study.
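The problem in point 2 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a reanalysis of the Kok et al. data: it assumes each untreated participant has a fixed “true” value of some physiological measure, and that every measurement adds independent noise as large as the true between-person spread (a hypothetical reliability of 0.5). The function name `simulate_no_treatment_change` and all parameter values are illustrative inventions. Even though nobody receives any intervention, individual change scores scatter widely, and the mean change in a small group wanders away from zero by chance.

```python
import random
import statistics


def simulate_no_treatment_change(n, trait_sd=1.0, noise_sd=1.0, seed=None):
    """Hypothetical illustration: change scores (time 2 minus time 1) for a
    group that receives no intervention at all.

    Each simulated person has a stable true value; each measurement adds
    independent noise. Any change is therefore pure measurement error.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(n):
        true_value = rng.gauss(0.0, trait_sd)       # stable characteristic
        t1 = true_value + rng.gauss(0.0, noise_sd)  # baseline measurement
        t2 = true_value + rng.gauss(0.0, noise_sd)  # follow-up measurement
        changes.append(t2 - t1)                     # true value cancels out
    return changes


if __name__ == "__main__":
    # Spread of individual change scores in a very large untreated sample.
    big = simulate_no_treatment_change(100_000, seed=1)
    print("SD of individual change:", round(statistics.stdev(big), 2))

    # Mean change in many hypothetical untreated groups of 26 people.
    group_means = [
        statistics.fmean(simulate_no_treatment_change(26, seed=s))
        for s in range(2000)
    ]
    print("SD of group mean change:", round(statistics.stdev(group_means), 2))
```

With noise as large as the true spread, individual change scores have a standard deviation of about √2 times the noise, and the mean change in a group of 26 (roughly half of the 52 participants analyzed) varies from group to group by about 1.41/√26 ≈ 0.28 units on this hypothetical scale. Apparent “effects” of that size can arise with no treatment at all.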

We could not squeeze all of our critique into our word limit, but James Heathers, who is an expert on cardiac vagal tone/heart rate variability, elaborated elsewhere:

  • The study was underpowered from the outset, and the sample size decreased from 65 to 52 due to missing data.
  • Cardiac vagal tone is unreliable except under carefully controlled measurement conditions, with multiple measurements on each participant and a much larger sample size. None of these conditions were met.
  • There were numerous anomalies in the data, including some participants included without baseline data, improbable baseline or follow-up scores, and improbable changes. These alone would invalidate the results.
  • Despite not reporting basic statistics, the article was full of graphs, impressive to the uninformed but useless to readers attempting to make sense of what was done and with what results.
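To make “underpowered” concrete, here is a rough power calculation. It is a hedged sketch: it uses a simple normal approximation for comparing two equal-sized groups on a single outcome, not the latent-curve model Kok and Fredrickson actually fitted, and the effect size (a “moderate” standardized difference of d = 0.5) is a hypothetical assumption. The function name `two_group_power` is illustrative.

```python
from statistics import NormalDist


def two_group_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation with equal group sizes; d is the standardized
    effect size (difference in means divided by the common SD).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5  # expected z statistic if the effect is real
    # Probability of landing beyond either critical value.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


if __name__ == "__main__":
    # Hypothetical moderate effect (d = 0.5); the true effect is unknown.
    for n_total in (65, 52):
        power = two_group_power(0.5, n_total / 2)
        print(f"N = {n_total}: approximate power = {power:.2f}")
```

Under these assumptions, power is roughly 0.5 with all 65 participants and lower still with the 52 who remained, well short of the conventional 0.80 target, which this approximation reaches only at about 64 participants per group.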

We later learned that the same data had been used for another published paper. There was no cross-citation and the duplicate publication was difficult to detect.

Kok, B. E., & Fredrickson, B. L. (2010). Upward spirals of the heart: Autonomic flexibility, as indexed by vagal tone, reciprocally and prospectively predicts positive emotions and social connectedness. Biological Psychology, 85, 432–436. doi:10.1016/j.biopsycho.2010.09.005

Pity the poor systematic reviewer and meta-analyst trying to make sense of this RCT and integrate it with the rest of the literature concerning loving-kindness meditation.

This was not our only experience of obtaining data for a paper crucial to Fredrickson’s claims and then having difficulty publishing our findings. We obtained data for claims that she and her colleagues had solved the classical philosophical problem of whether we should pursue pleasure or meaning in our lives. Pursuing pleasure, they argued, would adversely affect genomic transcription.

We found that we could redo the extremely complicated analyses and replicate the original findings, but there were errors in the original data entry that entirely shifted the results when corrected. Furthermore, we could replicate the original findings when we substituted output from a random number generator for the data collected from study participants. After struggles similar to those we experienced with Psychological Science, we succeeded in getting our critique published.

The original paper

Fredrickson BL, Grewen KM, Coffey KA, Algoe SB, Firestine AM, Arevalo JM, Ma J, Cole SW. A functional genomic perspective on human well-being. Proceedings of the National Academy of Sciences. 2013 Aug 13;110(33):13684-9.

Our critique

Brown NJ, MacDonald DA, Samanta MP, Friedman HL, Coyne JC. A critical reanalysis of the relationship between genomics and well-being. Proceedings of the National Academy of Sciences. 2014 Sep 2;111(35):12705-9.

See also:

Nickerson CA. No Evidence for Differential Relations of Hedonic Well-Being and Eudaimonic Well-Being to Gene Expression: A Comment on Statistical Problems in Fredrickson et al. (2013). Collabra: Psychology. 2017 Apr 11;3(1).

A partial account of the reanalysis is available in:

Reanalysis: No health benefits found for pursuing meaning in life versus pleasure. PLOS Blogs Mind the Brain

Wrapping it up

Strong claims about health effects require strong evidence.

  • Evidence produced in randomized trials needs to be reported according to established conventions like CONSORT, and duplicate publications need to be clearly labeled.
  • When research is conducted with public funds, these responsibilities are increased.

I have often examined health claims in high-profile media like The New York Times and The Guardian. My MO has been to trace the claims back to their original sources in peer-reviewed publications and to evaluate both the media reports and the quality of the primary sources.

I hope that I am arming citizen scientists to engage in these activities independently of me, and even to arrive at appraisals that contradict mine.

  • I don’t think I can expect to get many people to ask for data and perform independent analyses and certainly not to overcome the barriers my colleagues and I have met in trying to publish our results. I share my account of some of those frustrations as a warning.
  • I still think I can offer some take away messages to citizen scientists interested in getting better quality, evidence-based information on the internet.
  • Assume that most of the claims you encounter about psychological states and behavior being easily changed and profoundly influencing physical health are false or exaggerated. When in doubt, disregard the claims and certainly don’t retweet or “like” them.
  • Ignore journalists who do not provide adequate links for their claims.
  • Learn to identify generally reliable sources and take journalists off the list when they have made extravagant or undocumented claims.
  • Appreciate the financial gains to be made by scientists who feed journalists false or exaggerated claims.

Advice to citizen scientists who are cultivating more advanced skills:

Some key studies that Brody invokes in support of her claims being science-based are poorly conducted and reported clinical trials that are not labeled as such. This is quite common in positive psychology, but you need to cultivate skills to even detect that this is what is going on. Even prestigious psychology journals are often lax in labeling studies as RCTs and in enforcing reporting standards. Authors’ conflicts of interest are ignored.

It is up to you to

  • Identify when the claims you are being fed should have been evaluated in a clinical trial.
  • Be skeptical when the original research is not clearly identified as a clinical trial but nonetheless compares participants who received the intervention with those who did not.
  • Be skeptical when CONSORT is not followed and there is no published protocol.
  • Be skeptical of papers published in journals that do not enforce these requirements.

Disclaimer

I think I have provided enough details for readers to decide for themselves whether I am unduly influenced by my experiences with Barbara Fredrickson and her data. She and her colleagues have differing accounts of her research and of the events I have described in this blog.

As a disclosure, I receive money for writing these blog posts, less than $200 per post. I am also marketing a series of e-books,  including Coyne of the Realm Takes a Skeptical Look at Mindfulness and Coyne of the Realm Takes a Skeptical Look at Positive Psychology.

Maybe I am just making a fuss to attract attention to these enterprises. Maybe I am just monetizing what I have been doing for years virtually for free. Regardless, be skeptical. But to get more information and get on a mailing list for my other blogging, go to coyneoftherealm.com and sign up.

Will lessons in happiness solve the crisis in child mental health care?

Rome gave citizens bread and circuses. Is London giving citizens worthless randomized trials of inert interventions to solve the crisis of child mental health care without spending substantially more funds?

The UK Department for Education (DfE) issued an Expression of Interest for a large randomized trial comparing three preventive mental health interventions for promoting well-being among primary school children.

The three trialed interventions are:

Mindfulness

Mindfulness is the ability to direct attention to experience as it unfolds. It enables those who have learned it to be more able to be with their present experience, and respond more skilfully to whatever is happening. There is some evidence that it may be helpful in reducing anxiety, depressive symptoms and stress and improving wellbeing, attention, focus and cognitive skills. We know that mindfulness techniques are currently used by schools, with a range of existing programmes and approaches, but there is limited understanding of whether less intensive approaches work effectively in a school setting. The successful bidder will develop and trial a light touch (10-15 minute) intervention, comprising of simple exercises repeated at regular intervals (e.g. weekly or more than once a week) which provides teachers with materials to guide mindfulness practice e.g. audio tracks or guided exercises.

Protective behaviours

Protective behaviours is a practical approach to personal safety, teaching children and young people to recognise early warning signs of not feeling safe and how to recognise where they can get help. It seeks to provide life skills, develop support structures and instil positive help seeking behaviours which can help keep children safe from a range of risks that may impact wellbeing and increase the risk of mental health problems. It is a well-established approach, with indications of ongoing use in schools, however evidence of effectiveness is limited. Some evidence suggests that it is beneficial for those at risk of mental health difficulties as well as the wider population, and it is relatively easy to integrate into the school environment. The successful bidder will develop and trial a light touch protective behaviours intervention which can easily be included in the school day, can be delivered by teachers/school staff to a whole class, with a small amount of training, and which builds on existing programmes and materials.

Relaxation and breathing-based techniques

Relaxation and breathing-based techniques and training for schools originated as targeted interventions to assist pupils with anxiety. However, there is emerging use of these approaches universally in primary schools, particularly in the form of short breathing exercises, with some reported increases in concentration, resilience, self-perception positivity and connection with others. There is currently limited evidence of wider use in schools or effectiveness, but there is a theoretical unpinning linking relaxation with improved wellbeing and engagement with learning. The successful bidder will develop and trial a light touch intervention that offers short regular exercises, delivered by teachers in the classroom with minimal training and materials, and which build on existing relation and breathing-based techniques.

Note the requirement that all three interventions be delivered in low-intensity “light touch” versions, i.e., versions that can “easily be included in the school day, can be delivered by teachers/school staff to a whole class, with a small amount of training, and which builds on existing programmes and materials.”

The planned trial is ambitious and large-scale, involving:

  • Recruitment of 100 volunteer primary schools…representing a range of different school types, locations and demographics.
  • Even randomization of schools into one of three arms corresponding to the three interventions, with 33 schools in each arm.
  • Classes in each school evenly randomized to intervention or control group.
  • A small amount of funding would help cover costs of participation and incentivise full engagement with the trial.
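The two-level design described above can be sketched in a few lines. This is an illustrative sketch under assumptions the Expression of Interest does not state: school and class IDs are hypothetical, schools are shuffled and dealt into the three arms in turn, and within each school a random half of the classes gets the intervention. Note that 100 schools cannot be split into three equal arms (3 × 33 = 99), so in this sketch one arm receives a 34th school.

```python
import random
from collections import Counter

ARMS = ("mindfulness", "protective behaviours", "relaxation and breathing")


def allocate(schools, seed=None):
    """Hypothetical two-level allocation.

    schools: dict mapping a school ID to a list of its class IDs.
    Returns (school_arm, class_condition): school_arm maps school -> arm;
    class_condition maps (school, class) -> "intervention" or "control".
    """
    rng = random.Random(seed)
    school_ids = sorted(schools)
    rng.shuffle(school_ids)
    school_arm, class_condition = {}, {}
    for i, school in enumerate(school_ids):
        # Deal shuffled schools into arms in turn; arm sizes differ by at most one.
        school_arm[school] = ARMS[i % len(ARMS)]
        classes = list(schools[school])
        rng.shuffle(classes)
        half = len(classes) // 2  # with an odd number of classes, control gets the extra one
        for c in classes[:half]:
            class_condition[(school, c)] = "intervention"
        for c in classes[half:]:
            class_condition[(school, c)] = "control"
    return school_arm, class_condition


if __name__ == "__main__":
    demo = {f"school{n:03d}": ["A", "B", "C", "D"] for n in range(100)}
    arms, conditions = allocate(demo, seed=2017)
    print(Counter(arms.values()))        # schools per arm: 34, 33, 33
    print(Counter(conditions.values()))  # 200 intervention and 200 control classes
```

Because classes in the same school share teachers and a school climate, any analysis of such a design would also have to account for that clustering; the sketch covers allocation only.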

Final selection of primary and secondary outcomes is left to applicants, but outcomes are expected to include short measures of:

  • Subjective Wellbeing
  • Mental health/psychological wellbeing
  • Engagement with education

The larger context

The expression of interest was a follow-up to “The Shared Society”, UK Prime Minister Theresa May’s recent speech at the Charity Commission. In that speech the Prime Minister identified “the burning injustice of mental illness” and stated:

“This is an historic opportunity to right a wrong, and give people deserving of compassion and support the attention and treatment they deserve. And for all of us to change the way we view mental illness so that striving to improve mental wellbeing is seen as just as natural, positive and good as striving to improve our physical wellbeing.”

However, the Independent noted:

The speech however barely announces any extra cash to improve underfunded services – with just an extra £15m expected to be pledged for creating “places of safety”. This amounts to about £23,000 per parliamentary constituency.

Research conducted by the Education Policy Institute Independent Commission on Children and Young People’s Mental Health in November found that a quarter of young people seeking mental health care are turned away by specialist services because of a lack of resources. Waiting times for treatment in many areas are also incredibly long.

The House of Commons Public Accounts Committee said in September that it was “sceptical” about the Government’s attempt to improve mental health services without a significant amount of extra cash.

Praise for the speech

Nonetheless, the Independent reported praise for the Prime Minister’s speech:

Paul Farmer, chief executive of Mind, the mental health charity, said it was good that the Prime Minister was talking about mental health.

“It’s important to see the Prime Minister talking about mental health and shows how far we have come in bringing the experiences of people with mental health problems up the political agenda,” he said.

“Mental health should be at the heart of government, and at the heart of society and communities – it’s been on the periphery for far too long.”

He said he welcomed the focus on prevention in schools and workplaces and support for people in crisis.

Sir Ian Cheshire, chairman of the Heads Together Campaign described the Prime Minister’s announcements as “extremely important and very welcome”.

“They show both a willingness to tackle the broad challenge of mental health support and a practical grasp of how to start making a real difference,” he said.

As I noted in another blog post, the Heads Together Campaign is an initiative of the Royals.


Praise for the interventions that were selected for evaluation

 An article in The Guardian reported praise for the interventions that were selected for evaluation:

Laura Henry, an early years consultant and Ofsted inspector, said the trials could save the government billions in social care and housing costs down the line. “I think it’s an excellent idea,” she said. “Over the last decade there has been a massive push to academia, results and school league tables and children’s personal social development has been left behind.

“A holistic approach is needed and children should be able to self-regulate their own behaviour.”

Henry, a former teacher whose elder son is on the autistic spectrum, said specially trained teachers should help with grieving techniques and that any questions about bullying and pupils’ friends needed to be sensitive.

“It’s absolutely the best way to spend DfE money,” she said. “It will save x amount of money in social care when they are adults.”

And:

The mindfulness trial was welcomed by the educational pioneer Sir Anthony Seldon, who was pooh-poohed when he brought in such classes while master of the private school Wellington College. He said: “It was negligent of government [in the past] to have this unintelligent response to wellbeing, saying this was la-la land and psychobabble. We have a crisis in mental health which is reducible now that government is beginning to take seriously the right interventions to look after the wellbeing of young people.”

Professor Alan Smithers, of the University of Buckingham, where Seldon is vice-chancellor, was more sceptical. He said: “It is good the government is having a trial and not rushing in. There are so many demands on resources for schools that it is important we know that mindfulness lessons work. “There are many calls on school funding: the need for teachers, the squeeze on budgets and school buildings.”

Lord Layard, Britain’s “happiness guru”, and Lord O’Donnell, the former cabinet secretary, will meet the government this month to discuss how to enable schools to measure children’s wellbeing as a guide to performance. They want schools to give similar weight to children’s happiness as to their academic results. Under their proposals, schools would be measured on whether pupils’ happiness improved or declined. Children would fill in a questionnaire asking, among other things, whether “I have at least one good friend”; “other people generally like me”; “other people pick on me or bully me”; and “I would rather be alone than with others”.

Pupils’ scores would be confidential, but could be used to alert the school to serious difficulties. NHS workers would provide psychological treatment to children in schools at short notice before they became so ill that they qualified for admission to mental health services.

The interventions are unlikely to improve mental health outcomes, even self-reported well-being, and may prove harmful.

Tssk! The UK has some talented mental health services researchers. Why aren’t we hearing their collective voices of outrage about a useless trial squandering millions of pounds, potentially harming schools and students, and mainly serving to distract from the government’s lack of action to correct the underfunding of both mental health care for children and the school systems?

Instead, we have some self-proclaimed authorities waxing enthusiastic. As a group, they lack mental health training and stand to benefit immensely from these initiatives. Journalists should get them out of the picture, or at least better reveal their conflicts of interest and balance their commentary with comments that are more evidence-driven.

Even when delivered at full intensity, the interventions lack the evidence of effectiveness needed to justify a large-scale trial. Yet the Expression of Interest specifies that they be delivered in a lite form: only a few minutes a week. This is unlikely to improve the measured outcomes or to affect the use of already scarce child mental health services with unacceptably long wait times.

Funding the trial is a poor substitute for better funding of mental health services and schools. Yet politicians and policymakers can point to the trial and argue that the UK is conducting the research desperately needed to address these issues, so we need to be patient.

I’m not sure we should consider these trials serious attempts to contribute to the mental health services literature. Selection of the particular interventions to be trialed seems to be political and tied to what is already being done in some schools. Their existing implementation likely reflects vested interests that undoubtedly influenced their selection for trialing and that hope to benefit financially from the opportunities the trial will provide. I don’t think the mere fact that interventions are already in use justifies an ambitious and expensive evaluation of them unless there is further evidence that they are likely to be effective.

The Expression of Interest cites one review of mindfulness studies. I looked it up, and it is unusually candid about the limitations in the quality and quantity of studies relevant to whether mindfulness training can affect such outcomes. The review stands in sharp contrast to the unbalanced and prematurely enthusiastic Mindful Nation UK report.

We should have serious concerns about the lack of evidence that Protective Behaviours could have any effect on the outcomes selected to evaluate the programs. Conceivably, it could do some harm to at-risk children. Getting children to disclose bullying and frank abuse at home and school can only aggravate these problems and invite retaliation if effective intervention is not available to deal with them. I would be curious to know the extent to which primary school teachers are already aware of such problems but lack the tools or time to address them.

Basically, Protective Behaviours is a kind of screening program facilitated by encouragement to disclose. Such programs can prove ineffective if they do not occur in a system prepared to quickly offer effective interventions. They can also compete for scarce resources that would otherwise be used to deal with already known problems requiring more intensive and focused intervention.

There is the precedent of GPs screening women for domestic abuse. Routine screening seemed to address a documented tendency to ignore the problem. However, the World Health Organization (WHO) withdrew the recommendation because of a lack of any evidence that screening improved health outcomes for women, and because of consistent evidence that at least some women were harmed by ineffectual interventions that heightened the abuse they were experiencing.

The breathing and relaxation exercises might conceivably serve as a nonspecific control condition, except that all of the interventions are untried, lack evidence, and are delivered at such a low intensity that they are themselves at best nonspecific control conditions. I think it’s inconceivable that meaningful differences will be demonstrated among the three interventions. At best, the trials can conclude that they are equally effective or not effective at all. The question of whether these interventions are better than other active interventions or other deployments of scarce resources is left unaddressed.

I will soon be offering e-books providing skeptical looks at mindfulness and positive psychology, as well as scientific writing courses on the web, as I have been doing face-to-face for almost a decade.

Sign up at my new website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. Lots to see at CoyneoftheRealm.com.

Danish RCT of cognitive behavior therapy for whatever ails your physician about you

I was asked by a Danish journalist to examine a randomized controlled trial (RCT) of cognitive behavior therapy (CBT) for functional somatic symptoms. I had not previously given the study a close look.

I was dismayed by how highly problematic the study was in so many ways.

I doubted that the results of the study showed any benefit to the patients or had any relevance to healthcare.

I then searched for and found the website for the senior author’s clinical offerings. I suspected that the study was a mere experimercial, a marketing effort for the services he offered.

Overall, I think what I found hiding in plain sight has broader relevance to scrutinizing other studies claiming to evaluate the efficacy of CBT for what are primarily physical illnesses, not psychiatric disorders. Look at the other RCTs. I am confident you will find similar problems. But then there is the bigger picture…

[A controversial assessment ahead? You can stop here and read the full text of the RCT and its trial registration before continuing with my analysis.]

Schröder A, Rehfeld E, Ørnbøl E, Sharpe M, Licht RW, Fink P. Cognitive–behavioural group treatment for a range of functional somatic syndromes: randomised trial. The British Journal of Psychiatry. 2012 Apr 13:bjp-p.

A summary overview of what I found:

 The RCT:

  • Was unblinded to patients, interventionists, and to the physicians continuing to provide routine care.
  • Had a grossly unmatched, inadequate control/comparison group, so that any benefit from nonspecific (placebo) factors in the trial counts toward the estimated efficacy of the intervention.
  • Relied on subjective self-report measures for primary outcomes.
  • With such a familiar trio of design flaws, even an inert homeopathic treatment would be found effective, if it were provided with the same positive expectations and support as the CBT in this RCT. [This may seem a flippant comment that reflects on my credibility, not the study. But please keep reading to my detailed analysis where I back it up.]
  • The study showed an inexplicably high rate of deterioration in both the treatment and control groups. Apparent improvement in the treatment group might only reflect less deterioration than in the control group.
  • The study is focused on unvalidated psychiatric diagnoses being applied to patients with multiple somatic complaints, some of whom may not yet have had a medical diagnosis, but most of whom clearly had confirmed physical illnesses.

But wait, there is more!

  • It’s not CBT that was evaluated, but a complex multicomponent intervention in which what was called CBT is embedded in a way that its contribution cannot be evaluated.

The “CBT” did not map well onto international understandings of the assumptions and delivery of CBT. The complex intervention included weeks of indoctrination of the patient with an understanding of their physical problems that incorporated simplistic pseudoscience before any CBT was delivered. It focused on goals imposed by a psychiatrist that didn’t necessarily fit with patients’ sense of their most pressing problems and the solutions.

And the kicker.

  • The authors switched primary outcomes, reconfiguring the scoring of their subjective self-report measures years into the trial, based on peeking at the results with the original scoring.

The investigators have a website marketing their services. Rather than a quality contribution to the literature, this study can be seen as an experimercial, doomed to bad science and questionable results from before the first patient was enrolled. An undeclared conflict of interest in play? There is another serious undeclared conflict of interest for one of the authors.

For the uninformed and gullible, the study handsomely succeeds as an advertisement for the investigators’ services to professionals and patients.

Personally, I would be indignant if a primary care physician tried to refer me or a friend or family member to this trial. In the absence of overwhelming evidence to the contrary, I assume that people around me who complain of physical symptoms have legitimate physical concerns. If they do not yet have a confirmed diagnosis, it serves little purpose to stop the probing and refer them to psychiatrists. This trial operates with an anachronistic Victorian definition of a psychosomatic condition.

But why should we care about a patently badly conducted trial with switched outcomes? Is it only a matter of something being rotten in the state of Denmark? Aside from the general impact on the existing literature concerning CBT for somatic conditions, results of this trial were entered into a Cochrane review of nonpharmacological interventions for medically unexplained symptoms. I previously complained about one of the authors of this RCT also being listed as an author on another Cochrane review protocol. Prior to that, I complained to Cochrane about this author’s larger research group influencing a decision to include switched outcomes in another Cochrane review. A lot of us rightfully depend heavily on the verdict of Cochrane reviews for deciding best evidence. That trust is being put into jeopardy.

Detailed analysis

1. This is an unblinded trial, a particularly weak methodology for examining whether a treatment works.

The letter that alerted physicians to the trial essentially encouraged them to refer patients they were having difficulty managing:

‘Patients with a long-term illness course due to medically unexplained or functional somatic symptoms who may have received diagnoses like fibromyalgia, chronic fatigue syndrome, whiplash associated disorder, or somatoform disorder.’

Patients and the physicians who referred them were subsequently told to which group patients had been assigned, either routine care or what was labeled as CBT. This information could have had a strong influence on the outcomes that were reported, particularly for the patients left in routine care.

Patients’ learning that they did not get assigned to the intervention group was undoubtedly disappointing and demoralizing. The information probably did nothing to improve the positive expectations and support available to patients in routine care. This could have had a nocebo effect. The feedback may have contributed to the otherwise inexplicably high rates of subjective deterioration [to be noted below] reported by patients left in the routine care condition. In contrast, the authors’ disclosure that patients had been assigned to the intervention group undoubtedly boosted the morale of both patients and physicians and also increased the gratitude of the patients. This would be reflected in the responses to the subjective outcome measures.

The gold standard alternative to an unblinded trial is a double-blind, placebo-controlled trial in which neither providers, nor patients, nor even the assessors rating outcomes know to which group particular patients were assigned. Of course, this is difficult to achieve in a psychotherapy trial. Yet a fair alternative is a psychotherapy trial in which patients and those who refer them are blind to the nature of the different treatments, and in which an effort is made to communicate credible positive expectations about the comparison control group.

Conclusion: A lack of blinding seriously biases this study toward finding a positive effect for the intervention, regardless of whether the intervention has any active, effective component.

2. A claim that this is a randomized controlled trial depends on the adequacy of the control offered by the comparison group, enhanced routine care. Just what is being controlled by the comparison? In evaluating a psychological treatment, it’s important that the comparison/control group offers the same frequency and intensity of contact, positive expectations, attention and support. This trial decidedly did not.

There were large differences between the intervention and control conditions in the amount of contact time. Patients assigned to the cognitive therapy condition received an additional nine 3.5-hour group sessions with a psychiatrist, plus the option of even more consultations. The more than 30 hours of contact time with a psychiatrist should be very attractive to patients who wanted it and could not otherwise obtain it. For some, it undoubtedly represented an opportunity to have someone listen to their complaints of pain and suffering in a way that had not previously happened. This is also more than the intensity of psychotherapy typically offered in clinical trials, which is closer to 10 to 15 sessions of 50 minutes.
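To put the imbalance in numbers, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (nine 3.5-hour sessions, and the rough benchmark of 10 to 15 sessions of 50 minutes). The optional extra consultations are left out, so if anything it understates the gap. The function name is my own, for illustration only.

```python
# Rough contact-time comparison using only the figures quoted in the text.
# The "typical trial" range (10-15 sessions of 50 minutes) is a ballpark
# benchmark, not data from this RCT. Optional extra consultations are ignored.

def contact_hours(sessions: int, minutes_per_session: float) -> float:
    """Total therapist contact time in hours (hypothetical helper)."""
    return sessions * minutes_per_session / 60

cbt_arm = contact_hours(9, 3.5 * 60)   # nine 3.5-hour group sessions
typical_low = contact_hours(10, 50)    # 10 sessions of 50 minutes
typical_high = contact_hours(15, 50)   # 15 sessions of 50 minutes

print(f"CBT arm: {cbt_arm:.1f} h")
print(f"Typical trial: {typical_low:.1f} to {typical_high:.1f} h")
print(f"CBT arm vs typical: {cbt_arm / typical_high:.1f}x to {cbt_arm / typical_low:.1f}x")
```

On these assumptions, the CBT arm got roughly two and a half to nearly four times the contact time of a typical psychotherapy trial, all of it on top of the routine care that the control group also received.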

The intervention group thus received substantially more support and contact time, which was delivered with more positive expectations. This wealth of nonspecific factors favoring the intervention group compromises any effort to disentangle the specific effects of any active ingredient in the CBT intervention package. From what has been said so far, it is nigh impossible for the trial to provide a fair and generalizable evaluation of the CBT intervention.

Conclusion: This is a methodologically poor choice of control groups with the dice loaded to obtain a positive effect for CBT.

3. The primary outcomes, both as originally scored and after switching, are subjective self-report measures that are highly responsive to nonspecific treatments and to the alleviation of mild depressive symptoms and demoralization. They are not consistently related to objective changes in functioning. They are particularly problematic when used as outcome measures in the context of an unblinded clinical trial with an inadequate control group.

There have been consistent demonstrations that assigning patients to inert treatments and measuring the outcomes with subjective measures may register improvements that will not correspond to what would be found with objective measures.

For instance, a provocative New England Journal of Medicine study showed that sham acupuncture was as effective as an established medical treatment, an albuterol inhaler, for asthma when judged with subjective measures, but the established medical treatment was markedly superior when judged with objective measures.

There have been a number of demonstrations that treatments such as the one offered in the present study to patient populations similar to those in the study produce changes in subjective self-report that are not reflected in objective measures.

Much of the improvement in primary outcomes occurred before the first assessment after baseline and not very much afterwards. The early response is consistent with a placebo response.

The study actually included one largely unnoticed objective measure, utilization of routine care. Presumably, if the CBT were as effective as claimed, it would have produced a significant reduction in healthcare utilization. After all, isn’t the point of this trial to demonstrate that CBT can reduce health-care utilization associated with (as yet) medically unexplained symptoms? Curiously, utilization of routine care did not differ between groups.

The combination of the choice of subjective outcomes, unblinded nature of the trial, and poorly chosen control group bring together features that are highly likely to produce the appearance of positive effects, without any substantial benefit to the functioning and well-being of the patients.

Conclusion: Evidence for the efficacy of a CBT package for somatic complaints that depends solely on subjective self-report measures is unreliable, and unlikely to generalize to more objective measures of meaningful impact on patients’ lives.

4. We need to take into account the inexplicably high rates of deterioration in both groups, but particularly in the control group receiving enhanced care.

Unexplained deterioration was reported in 50% of the control group and 25% of the intervention group. Rates of deterioration are given only a one-sentence mention in the article, but deserve much more attention. These rates need to qualify and dampen any generalizable clinical interpretation of other claims about outcomes attributed to the CBT. We need to keep in mind that clinical trials cannot determine how effective treatments are, only how different a treatment is from a control group. So an effect claimed for a treatment relative to a control can come largely or entirely from deterioration in the control group, not from what the treatment offers. The claim of success for CBT probably depends largely on the deterioration in the control group.
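The point that a between-group difference can come entirely from deterioration in the control arm can be made concrete with a little arithmetic. The sketch below is hypothetical: it assumes that no patient in either arm improves, uses 100 patients per arm for round numbers, and takes only the 50% and 25% deterioration rates from the article.

```python
# Hypothetical scenario: NOBODY in either arm improves. The arms differ only
# in how many patients deteriorate (50% control vs 25% CBT, the rates cited
# in the text). 100 patients per arm is an assumption for round numbers.
n_per_arm = 100
deteriorated = {"control": 50, "cbt": 25}

# Proportion of each arm that is "not worse" (here simply unchanged).
not_worse = {arm: (n_per_arm - k) / n_per_arm for arm, k in deteriorated.items()}

# Between-group contrasts that a trial report could present as a treatment effect.
risk_difference = not_worse["cbt"] - not_worse["control"]
risk_ratio_deterioration = deteriorated["cbt"] / deteriorated["control"]
nnt = 1 / risk_difference  # patients "treated" per deterioration avoided

print(f"Not worse: CBT {not_worse['cbt']:.0%}, control {not_worse['control']:.0%}")
print(f"Risk difference {risk_difference:.2f}, "
      f"risk ratio for deterioration {risk_ratio_deterioration:.2f}, NNT {nnt:.0f}")
```

Under these assumptions the comparison yields a respectable-looking advantage for CBT (a 25-point difference and a number needed to treat of 4), even though not a single patient got better.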

One interpretation of this trial is that spending an extraordinary 30 hours with a psychiatrist leads to only half the deterioration experienced by patients receiving nothing more than routine care. But this raises the question of why half of the patients left in routine care deteriorated. What could possibly be going on?

Conclusion: Unexplained deterioration in the control group may explain apparent effects of the treatment, but both groups are doing badly.

5. The diagnosis of “functional somatic symptoms” or, as the authors prefer, Severe Bodily Distress Syndromes, is considered by the authors to be a psychiatric diagnosis. It is not accepted as a valid diagnosis internationally. Its validation is limited to work done almost entirely within the author group, which is explicitly labeled as “preliminary.” Beyond their physicians’ having difficulty managing them, this biased sample of patients is quite heterogeneous. They have a full range of subjective complaints and documented physical conditions. Many of these patients would not be considered primarily to have a psychiatric disorder internationally, and certainly not within the US, except where they had major depression or an anxiety disorder. Such psychiatric disorders were not exclusion criteria.

Once sent on the pathway to a psychiatric diagnosis by their physicians’ referral to the study, patients had to meet additional criteria:

To be eligible for participation individuals had to have a chronic (i.e. of at least 2 years duration) bodily distress syndrome of the severe multi-organ type, which requires functional somatic symptoms from at least three of four bodily systems, and moderate to severe impairment in daily living.

The condition identified in the title of the article is not validated as a psychiatric diagnosis. The two papers the authors cite in support of it are their own studies (1, 2) of a single sample. The title of one of these papers makes a rather immodest claim:

Fink P, Schröder A. One single diagnosis, bodily distress syndrome, succeeded to capture 10 diagnostic categories of functional somatic syndromes and somatoform disorders. Journal of Psychosomatic Research. 2010 May 31;68(5):415-26.

In neither the two papers nor the present RCT is there sufficient effort to rule out a physical basis for the complaints qualifying these patients for a psychiatric diagnosis. There is also a lack of follow-up to see if physical diagnoses were later applied.

Citation patterns of these papers strongly suggest that the authors have not gained much traction internationally. The criterion of symptoms from three out of four bodily systems is arbitrary and unvalidated. Many patients with known physical conditions would meet these criteria without any psychiatric diagnosis being warranted.

The authors relate their essentially homegrown diagnosis to functional somatic syndromes, diagnoses that are themselves subject to serious criticism. See, for instance, the work of Allen Frances, M.D., who had been the chair of the American Psychiatric Association’s Diagnostic and Statistical Manual (DSM-IV) Task Force. He became a harsh critic of its shortcomings and of the APA’s failure to correct the coverage of functional somatic syndromes in the next DSM.

Mislabeling Medical Illness As Mental Disorder

Unless DSM-5 changes these incredibly over inclusive criteria, it will greatly increase the rates of diagnosis of mental disorders in the medically ill – whether they have established diseases (like diabetes, coronary disease or cancer) or have unexplained medical conditions that so far have presented with somatic symptoms of unclear etiology.

And:

The diagnosis of mental disorder will be based solely on the clinician’s subjective and fallible judgment that the patient’s life has become ‘subsumed’ with health concerns and preoccupations, or that the response to distressing somatic symptoms is ‘excessive’ or ‘disproportionate,’ or that the coping strategies to deal with the symptom are ‘maladaptive’.

And:

These are inherently unreliable and untrustworthy judgments that will open the floodgates to the overdiagnosis of mental disorder and promote the missed diagnosis of medical disorder.

The DSM-5 Task Force refused to adopt the changes proposed by Dr. Frances.

Bad News: DSM 5 Refuses to Correct Somatic Symptom Disorder

This led Frances to apologize to patients:

My heart goes out to all those who will be mislabeled with this misbegotten diagnosis. And I regret and apologize for my failure to be more effective.

The chair of the DSM-5 Somatic Symptom Disorder work group has delivered a scathing critique of the very concept of medically unexplained symptoms.

Dimsdale JE. Medically unexplained symptoms: a treacherous foundation for somatoform disorders? Psychiatric Clinics of North America. 2011 Sep 30;34(3):511-3.

Dimsdale noted that applying this psychiatric diagnosis sidesteps the question of the quality of the medical examination that led up to it. Furthermore:

Many illnesses present initially with nonspecific signs such as fatigue, long before the disease progresses to the point where laboratory and physical findings can establish a diagnosis.

And such diagnoses may encompass far too varied a group of patients for any intervention to make sense:

One needs to acknowledge that diseases are very heterogeneous. That heterogeneity may account for the variance in response to intervention. Histologically, similar tumors have different surface receptors, which affect response to chemotherapy. Particularly in chronic disease presentations such as irritable bowel syndrome or chronic fatigue syndrome, the heterogeneity of the illness makes it perilous to diagnose all such patients as having MUS and an underlying somatoform disorder.

I tried making sense of a table of the additional diagnoses that the patients in this study had been given. A considerable proportion of patients had physical conditions that would not be considered psychiatric problems in the United States. Many patients could be suffering from multiple symptoms arising not only from their conditions but also from side effects of the medications they were taking. It is very difficult to manage the multiple medications required by multiple comorbidities. Community physicians found that these patients taxed their competence and their ability to spend time with them.

[Table of functional somatic symptoms]

Most patients had a diagnosis of “functional headaches.” It’s not clear what this designation means, but conceivably it could include migraine headaches, which are accompanied by multiple physical complaints. CBT is not an evidence-based treatment of choice for functional headaches, much less migraines.

Over a third of the patients had irritable bowel syndrome (IBS). A systematic review of the comorbidity of irritable bowel syndrome concluded that physical comorbidity is the norm in IBS:

The nongastrointestinal nonpsychiatric disorders with the best-documented association are fibromyalgia (median of 49% have IBS), chronic fatigue syndrome (51%), temporomandibular joint disorder (64%), and chronic pelvic pain (50%).

In the United States, many patients and specialists would find labeling irritable bowel syndrome a psychiatric condition offensive and counterproductive. There is growing evidence that irritable bowel syndrome reflects a disturbance in the gut microbiota. It involves a gut-brain interaction, but the primary direction of influence is from the disturbance in the gut to the brain. Anxiety and depression symptoms are secondary manifestations, a product of activity in the gut influencing the nervous system.

Most of the patients in the sample had a diagnosis of fibromyalgia and over half of all patients in this study had a diagnosis of chronic fatigue syndrome.

Other patients had diagnosable anxiety and depressive disorders, which, particularly at the lower end of severity, are responsive to nonspecific treatments.

Undoubtedly many of these patients, perhaps most of them, are demoralized, not only by the discomfort, pain, and interference with their lives that they are experiencing, but also by not being able to get a diagnosis for what they have good reason to believe is a medical condition. They could be experiencing demoralization secondary to physical illness.

These patients presented with pain, fatigue, general malaise, and demoralization. I have trouble imagining how their specific most pressing concerns could be addressed in group settings. These patients pose particular problems for making substantive clinical interpretation of outcomes that are highly general and subjective.

Conclusion: Diagnosing patients with multiple physical symptoms as having a psychiatric condition is highly controversial. Results will not generalize to countries and settings where the practice is not accepted. Many of the patients involved in the study had recognizable physical conditions, and yet they are being shunted to psychiatrists who focus only on their attitude toward the symptoms. They are being denied the specialist care and treatments that might conceivably reduce the impact of their conditions on their lives.

6. The “CBT” offered in this study is part of a complex, multicomponent treatment that does not resemble cognitive behavior therapy as it is practiced in the United States.

As seen in figure 1 in the article, the multicomponent intervention is quite complex and consists of more than cognitive behavior therapy. Moreover, at least in the United States, CBT has distinctive elements of collaborative empiricism. Patients and therapist work together selecting issues on which to focus and developing strategies, with the patients reporting back on efforts to implement them. From the details available in the article, the treatment sounded much more like exhortation or indoctrination, even arguing with the patients if necessary. An English version, available on the web, of the educational material used in initial sessions confirmed that a lot of condescending pseudoscience was presented to convince the patients that their problems were largely in their heads.

Without a clear application of learning theory, behavioral analysis, or cognitive science, the “CBT” treatment offered in this RCT has much more in common with the creative novation therapy offered by Hans Eysenck, which is now known to have been justified with fraudulent data. Indeed, comparing the educational materials for this study with what was offered in Eysenck’s study reveals striking similarities. Eysenck was advancing the claim that his intervention could prevent cardiovascular disease and cancer and overcome the iatrogenic effects. I know this sounds really crazy, but see my careful documentation elsewhere.

Conclusion: The embedding of an unorthodox “CBT” in a multicomponent intervention in this study does not allow isolating any specific, active component of CBT that might be at work.

7. The investigators disclose having altered their scoring of their primary outcome years after the trial began, and probably after a lot of outcome data had been collected.

I found a casual disclosure in the methods section of this article unsettling, particularly when set against the original trial registration:

We found an unexpected moderate negative correlation of the physical and mental component summary measures, which are constructed as independent measures. According to the SF-36 manual, a low or zero correlation of the physical and mental components is a prerequisite of their use.23 Moreover, three SF-36 scales that contribute considerably to the PCS did not fulfil basic scaling assumptions.31 These findings, together with a recent report of problems with the PCS in patients with physical and mental comorbidity,32 made us concerned that the PCS would not reliably measure patients’ physical health in the study sample. We therefore decided before conducting the analysis not to use the PCS, but to use instead the aggregate score as outlined above as our primary outcome measure. This decision was made on 26 February 2009 and registered as a protocol change at clinicaltrials.gov on 11 March 2009. Only baseline data had been analysed when we made our decision and the follow-up data were still concealed.

Switching outcomes, particularly after some results are known, constitutes a serious violation of best research practices and leads to suspicion of the investigators refining their hypotheses after they had peeked at the data. See How researchers dupe the public with a sneaky practice called “outcome switching”
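The authors say only baseline data had been examined when they switched, and the sketch below does not test that claim. It is a generic Monte Carlo illustration of why the freedom to choose a scoring after seeing outcome data inflates false positives. Everything in it is an arbitrary assumption of mine: 50 patients per arm, two scorings of the same outcome correlated at 0.5, no true treatment effect, and a simple z-test with known variance rather than the trial's actual analysis.

```python
import math
import random


def simulate_false_positives(n_trials=4000, n_per_arm=50, rho=0.5, seed=1):
    """Return (pre-registered rate, flexible rate) of p < .05 results when
    the true treatment effect is zero.

    Each simulated patient has two scorings of the same outcome (A and B),
    each with unit variance, correlated at rho. The test is a two-sided
    z-test on the difference in means (variance known by construction).
    """
    rng = random.Random(seed)
    crit = 1.959964                 # two-sided 5% critical value, standard normal
    se = math.sqrt(2 / n_per_arm)   # SE of a difference in means, unit variance
    hits_prereg = 0
    hits_flexible = 0
    for _ in range(n_trials):
        # Per-arm totals for scoring A (pre-registered) and scoring B (switched-to).
        totals = {"treatment": [0.0, 0.0], "control": [0.0, 0.0]}
        for arm in totals:
            for _ in range(n_per_arm):
                a = rng.gauss(0, 1)
                b = rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
                totals[arm][0] += a
                totals[arm][1] += b
        z = [
            (totals["treatment"][k] - totals["control"][k]) / n_per_arm / se
            for k in (0, 1)
        ]
        sig_a = abs(z[0]) > crit
        sig_b = abs(z[1]) > crit
        hits_prereg += sig_a
        hits_flexible += sig_a or sig_b   # report whichever scoring "worked"
    return hits_prereg / n_trials, hits_flexible / n_trials


prereg, flexible = simulate_false_positives()
print(f"Pre-registered scoring only: {prereg:.3f}")
print(f"Either scoring, chosen after the fact: {flexible:.3f}")
```

Because the flexible analyst counts a trial as positive whenever the pre-registered scoring is, the flexible false-positive rate can never be lower, and the gap widens as the two scorings become less correlated. Registering one scoring in advance and sticking to it is what keeps the nominal 5% honest.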

The authors had originally proposed a scoring consistent with a very large body of literature. Dropping the original scoring precludes any direct comparison with this body of research, including basic norms. They claim that they switched scoring because two key summary measures were correlated in the opposite direction from what is reported in the larger literature. This is a troubling indication that something has gone terribly wrong in the authors’ recruitment of a sample. It should not be swept under the rug.

The authors claim that they switched outcomes after examining only baseline data from their study. However, one of the authors, Michael Sharpe, is also an author on the controversial PACE trial. A parallel switch was made to the scoring of the subjective self-reports in that trial. When the data were eventually re-analyzed using the original scoring, any positive findings for the trial were substantially reduced and arguably disappeared.

Even if the authors of the present RCT did not peek at their outcome data before deciding to switch the scoring of the primary outcome, they certainly had strong indications from other sources that the original scoring would produce weak or null findings. In 2009, one of the authors, Michael Sharpe, had access to the results of a relevant trial. The so-called FINE trial had null findings, which affected decisions to switch outcomes in the PACE trial. Is it just a coincidence that the scoring of the outcomes was then switched for the present RCT?

Conclusion: The outcome switching for the present trial represents bad research practices. For the trial to have any credibility, the investigators should make their data publicly available so these data could be independently re-analyzed with the original scoring of primary outcomes.

The senior author’s clinic

I invite readers to take a virtual tour of the website for the senior author’s clinical services. Much of it is available in English. Recently, I blogged about dubious claims by a health care system in Detroit to have achieved a goal of “zero suicide.” I suggested that the evidence for this claim was weak, but that the claim was a powerful advertisement for the health care system. I think the present report of an RCT can similarly be seen as an infomercial for training and clinical services available in Denmark.

Conflict of interest

 No conflict of interest is declared for this RCT. Under somewhat similar circumstances, I formally complained about undeclared conflicts of interest in a series of papers published in PLOS One. A correction has been announced, but not yet posted.

Aside from the senior author, one of the other authors, Michael Sharpe, also needed to declare a conflict of interest.

Apart from his professional and reputational interests (his whole career has been built on making strong claims about such interventions), Sharpe works for insurance companies and publishes on the subject. He declared a conflict of interest for the PACE trial:

MS has done voluntary and paid consultancy work for government and for legal and insurance companies, and has received royalties from Oxford University Press.

Here’s Sharpe’s report written for the social benefits reinsurance company UnumProvident.

If the results of this trial are accepted at face value, they will lend credibility to claims that effective interventions are available to reduce social disability. It doesn’t matter whether the intervention is actually effective. Rather, persons receiving social disability payments can be disqualified because they are not enrolled in such treatment.

Effects on the credibility of Cochrane Collaboration reports

The switched outcomes of the trial were entered into a Cochrane systematic review, to which primary care health professionals look for guidance in dealing with a complex clinical situation. The review gives no indication of the host of problems that I exposed here. Furthermore, I have glanced at some of the other trials included and I see similar difficulties.

I have been unable to convince Cochrane to clean up conflicts of interest that are attached to switched outcomes being entered in reviews. Perhaps some of my readers will want to approach Cochrane to revisit this issue.

I think this post raises larger issues about whether Cochrane has any business conducting and disseminating reviews of such a bogus psychiatric diagnosis, medically unexplained symptoms. These reviews do patients no good, and may sidetrack them from getting the medical care they deserve. The reviews do serve special interests, including disability insurance companies.

Special thanks to John Peters and to Skeptical Cat for their assistance with my writing this blog. However, I have sole responsibility for any excesses or distortions.

 

Why PhD students should not evaluate a psychotherapy for their dissertation project

  • Things some clinical and health psychology students wish they had known before they committed themselves to evaluating a psychotherapy for their dissertation study.
  • A well designed pilot study addressing feasibility and acceptability issues in conducting and evaluating psychotherapies is preferable to an underpowered study which won’t provide a valid estimate of the efficacy of the intervention.
  • PhD students would often be better off as research parasites – making use of existing published data – rather than attempting to organize their own original psychotherapy study, if their goal is to contribute meaningfully to the literature and patient care.
  • Reading this blog, you will encounter a link to free, downloadable software that allows you to make quick determinations of the number of patients needed for an adequately powered psychotherapy trial.

I so relish the extra boost of enthusiasm that many clinical and health psychology students bring to their PhD projects. They not only want to complete a thesis of which they can be proud, they want their results to be directly applicable to improving the lives of their patients.

Many students are particularly excited about a new psychotherapy for which extravagant claims are being made that it is better than its rivals.

I have seen lots of fads and fashions come and go: third wave, new wave, and no wave therapies. When I was a PhD student, progressive relaxation was in. Then it died, mainly because it was so boring for the therapists who had to provide it mechanically. Client-centered therapy was fading amid doubts that anyone else could achieve Carl Rogers’ results, or that his three facilitative conditions of unconditional positive regard, empathy, and congruence (genuineness) were distinguishable enough to study. Gestalt therapy was supercool because of the charisma of Fritz Perls, whose showmanship distracted us from the utter lack of evidence for its efficacy.

I hate to see PhD students demoralized when their grand plans prove unrealistic. Inevitably, circumstances force them to compromise in ways that limit the usefulness of their project and may even threaten their finishing within a reasonable time. Overly ambitious plans are the formidable enemy of the completed dissertation.

The numbers are stacked against a PhD student conducting an adequately powered evaluation of a new psychotherapy.

This blog post argues against PhD students taking on the evaluation of a new therapy in comparison to an existing one, if they expect to complete their projects and make a meaningful contribution to the literature and to patient care.

I’ll be drawing on some straightforward analysis done by Pim Cuijpers to identify what PhD students are up against when trying to demonstrate that any therapy is better than treatments that are already available.

Pim has literally done dozens of meta-analyses, mostly of treatments for depression and anxiety. He commands a particular credibility, given the quality of this work. The way Pim and his colleagues present a meta-analysis is so straightforward and transparent that you can readily examine the basis of what he says.

Disclosure: I collaborated with Pim and a group of other authors in conducting a meta-analysis as to whether psychotherapy was better than a pill placebo. We drew on all the trials allowing a head-to-head comparison, even though nobody ever really set out to pit the two conditions against each other as their first agenda.

Pim tells me that the brief and relatively obscure letter on which I will draw, New Psychotherapies for Mood and Anxiety Disorders: Necessary Innovation or Waste of Resources?, is among his most unpopular pieces of work. Lots of people don’t like its inescapable message. But I think that if PhD students pay attention to it, they might avoid a lot of pain and disappointment.

But first…

Note how many psychotherapies have been claimed to be effective for depression and anxiety. Anyone trying to make sense of this literature has to contend with claims based on a lot of underpowered trials – too small in sample size to reasonably be expected to detect the effects that investigators claim – and otherwise compromised by methodological limitations.
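To make “too small to detect” concrete, here is a minimal sketch of approximate statistical power for a two-arm trial. It assumes a two-sided test at α = .05, equal group sizes, and a normal approximation to the t-test (which slightly flatters very small trials); the function name `approx_power` is hypothetical, and this is not G*Power’s exact calculation. Under these assumptions, a trial with 30 patients per arm has only about a 12% chance of detecting a difference of d = 0.20.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    d           -- assumed true standardized mean difference (Cohen's d)
    n_per_group -- patients analyzed in each of two equal arms
    Uses a normal approximation and ignores rejections in the wrong direction.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    shift = d * sqrt(n_per_group / 2)           # expected z statistic under d
    return STD_NORMAL.cdf(shift - z_crit)


if __name__ == "__main__":
    # Illustrative effect sizes and arm sizes, not data from any trial.
    for d in (0.20, 0.50):
        for n in (20, 30, 50):
            print(f"d = {d:.2f}, {n} per arm: power = {approx_power(d, n):.2f}")
```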

Some investigators were simply naïve about clinical trial methodology and the difficulties of doing research with clinical populations. They may not have understood statistical power.

But many psychotherapy studies end up in bad shape because the investigators were unrealistic about the feasibility of what they were undertaking and about the low likelihood that they could recruit patients in the numbers they had planned in the time they had allotted. After launching the trial, they had to change recruitment strategies, perhaps relax their selection criteria, or even change the treatment so it was less demanding of patients’ time. And they had to make difficult judgments about which features of the trial to drop when resources ran out.

Declaring a psychotherapy trial to be a “preliminary” or a “pilot study” after things go awry

The titles of more than a few articles reporting psychotherapy trials contain the apologetic qualifier after a colon: “a preliminary study” or “a pilot study”. But the studies weren’t intended at the outset to be preliminary or pilot studies. The investigators are making excuses post hoc – after the fact – for not having been able to recruit sufficient numbers of patients and for having had to compromise their design from what they had originally planned. The best they can hope for is that the paper will somehow be useful in promoting further research.

Too many studies whose effect sizes are entered into meta-analyses should have been left as pilot studies and not treated as tests of the efficacy of treatments. The rampant problem in the psychotherapy literature is that almost no one treats small-scale trials as mere pilot studies. In a recent blog post, I provided readers with some simple screening rules for identifying meta-analyses of psychotherapy studies that they could dismiss from further consideration. One was whether there were sufficient numbers of adequately powered studies. Often there are not.

Readers take the inflated claims of small studies seriously, when these estimates should be seen as unrealistic and unlikely to be replicated, given the studies’ sample sizes. The large effect sizes that are claimed are likely the product of p-hacking and the confirmation bias required to get published. With enough alternative outcome variables to choose from and enough flexibility in analyzing and interpreting data, almost any intervention can be made to look good.
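The arithmetic behind “enough alternative outcome variables” is simple. A minimal sketch, assuming the treatment has no true effect and each outcome is tested independently at α = .05 (real outcomes are usually correlated, which lowers the figure, while flexible analysis choices push it back up): the chance that at least one outcome comes out “significant” is 1 − (1 − α)^k for k outcomes, roughly 23% for five outcomes and 40% for ten. The function name is hypothetical.

```python
def chance_of_false_positive(k_outcomes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent null outcomes has p < alpha."""
    return 1 - (1 - alpha) ** k_outcomes


if __name__ == "__main__":
    # Assumes no true effect and independent tests; illustrative only.
    for k in (1, 3, 5, 10, 20):
        print(f"{k:2d} outcomes: {chance_of_false_positive(k):.0%} chance of a 'finding'")
```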

The problem is readily seen in the extravagant claims about acceptance and commitment therapy (ACT), which depend so heavily on small, under-resourced studies, supervised by promoters of ACT, that should not have been used to generate effect sizes.

Back to Pim Cuijpers’ brief letter. He argues, based on his numerous meta-analyses, that a new treatment is unlikely to be substantially more effective than an existing credible, active treatment. There are some exceptions, like relaxation training versus cognitive behavior therapy for some anxiety disorders, but mostly only small differences of no more than d = .20 are found between two active, credible treatments. If you search the broader literature, you can find occasional exceptions like CBT versus psychoanalysis for bulimia, but most that you find prove to be false positives, usually based on investigator bias in conducting and interpreting a small, underpowered study.

You can see this for yourself by using the freely downloadable G*Power program and plugging in d = 0.20 to calculate the number of patients needed for a study. To be safe, add more patients to allow for the 25% dropout rate that can be expected across trials. The number you get would require a larger study than has ever been done, including the well-financed NIMH Collaborative trial.
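For readers without G*Power at hand, here is a minimal sketch of the same kind of calculation. It assumes a two-sided α = .05, 80% power, equal arms, and the normal approximation n = 2(z_(1−α/2) + z_(1−β))² / d² per group; G*Power’s exact t-test calculation gives a slightly larger figure. The function names are hypothetical. For d = 0.20 this gives 393 completers per arm, or 524 randomized per arm (1,048 in total) once the sample is inflated for 25% dropout.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm to detect difference d (two-sided test)."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


def inflate_for_dropout(n_completers: int, dropout: float) -> int:
    """Patients to randomize so that about n_completers remain after dropout."""
    return ceil(n_completers / (1 - dropout))


if __name__ == "__main__":
    completers = n_per_group(0.20)                      # small difference between active treatments
    randomized = inflate_for_dropout(completers, 0.25)  # 25% dropout, as in the text
    print(f"{completers} completers per arm, {randomized} randomized per arm, "
          f"{2 * randomized} randomized in a two-arm trial")
```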


Even more patients would be needed in the ideal situation, in which a third comparison group allowed the investigator to show that the established active treatment had actually performed better than a nonspecific treatment, and so had been delivered with the same effectiveness it had shown in earlier trials. Otherwise, a defender of the established therapy might argue that the older treatment had not been properly implemented.

So, unless warned off, the PhD student plans a study to show not only that the null hypothesis that the new treatment is no better than the existing one can be rejected, but also that, in the same study, the existing treatment is better than a wait list. Oh my, just try to find an adequately powered, properly analyzed example in the published literature of a comparison of two active treatments plus a control group. The few examples of three-group designs in which a new psychotherapy came out better than an effectively implemented existing treatment are grossly underpowered.

These calculations have all been based on what would be needed to reject the null hypothesis of no difference between the new treatment and a more established one. But if the claim is that the new treatment is superior to the existing treatment, our PhD student now needs to conduct a superiority trial in which some criterion is pre-set (such as a difference greater than d = .30), and the null hypothesis is that the advantage of the new treatment is smaller than that. We are now way out in the fantasyland of breakthrough, but uncompleted, dissertation studies.
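Why this is fantasyland can be shown with a minimal sketch of the sample size for a superiority-by-margin test, assuming a one-sided α = .025, 80% power, equal arms, and a normal approximation; the function name is hypothetical. The test requires the true advantage to exceed the pre-set margin, and the required sample grows with the inverse square of the gap between them. If differences between credible treatments are no larger than d = .20, no sample size can establish superiority by a margin of d = .30.

```python
from math import ceil, inf
from statistics import NormalDist


def n_per_group_superiority(true_d: float, margin: float,
                            alpha_one_sided: float = 0.025,
                            power: float = 0.80) -> float:
    """Approximate patients per arm to show the new treatment wins by more than `margin`.

    H0: advantage <= margin; H1: advantage > margin. Returns math.inf when the
    assumed true advantage does not exceed the margin, because no finite
    trial then has the requested power.
    """
    gap = true_d - margin
    if gap <= 0:
        return inf
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha_one_sided) + z(power)) ** 2 / gap ** 2)


if __name__ == "__main__":
    for true_d in (0.20, 0.40, 0.50):  # hypothetical true advantages
        print(f"true d = {true_d:.2f}, margin 0.30: "
              f"{n_per_group_superiority(true_d, 0.30)} per arm")
```

Even an optimistic true advantage of d = 0.40 would, under these assumptions, require roughly 1,570 patients per arm.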

Two take-away messages

The first take-away message is that we should be skeptical of claims that a new treatment is better than past ones, except when the claim comes from a well-designed study with some assurance that it is free of investigator bias. But the claim also has to arise in a trial larger than almost any psychotherapy study that has ever been done. Yup, most comparative psychotherapy studies are underpowered, and we cannot expect them to produce robust claims that one treatment is superior to another.

But for PhD students doing a dissertation project, the second take-away message is that they should not attempt to show that one treatment is superior to another without resources they probably don’t have.

The psychotherapy literature does not need another study with too few patients to support its likely exaggerated claims.

An argument can be made that it is unfair and even unethical to enroll patients in a psychotherapy RCT with an insufficient sample size. Some of the patients will be randomized to a control condition that is not what attracted them to the trial. All of the patients will be denied the chance to be in a trial that makes a meaningful contribution to the literature and to better care for patients like themselves.

What should the clinical or health psychology PhD student do, besides maybe curbing their enthusiasm? One way to make a meaningful contribution to the literature is to conduct small studies testing hypotheses that can improve the feasibility or acceptability of treatments to be tested in studies with more resources.

Think of what would have been accomplished if PhD students had determined in modest studies that it is tough to recruit and retain patients in an Internet therapy study without communicating to patients that they are involved in a human relationship, that is, without their having what Pim Cuijpers calls supportive accountability. Patients may stay involved with an Internet treatment that proves frustrating only because they have support from, and accountability to, someone beyond their encounter with an impersonal computer. Somewhere out there is a human being who supports them in sticking it out with the Internet psychotherapy and who will be disappointed if they don’t.

A lot of resources have been wasted on Internet therapy studies in which patients were not convinced that what they were doing was meaningful or that they had the support of a human being. They drop out or fail to diligently do any homework expected of them.

Similarly, mindfulness studies are routinely conducted without anyone establishing that patients actually practice mindfulness in everyday life, or what they would need to do so more consistently. The assumption is that patients assigned to mindfulness diligently practice it daily. A PhD student could make a valuable contribution to the literature by examining the rates at which patients assigned to mindfulness in a psychotherapy study actually practice it, along with the barriers to and facilitators of their doing so. A discovery that patients are not consistently practicing mindfulness might explain weaker findings than anticipated. One could even suggest that any apparent effects of practicing mindfulness were actually nonspecific: patients getting caught up in the enthusiasm of being offered a treatment they had sought, without actually practicing mindfulness.

An unintended example: How not to recruit cancer patients for a psychological intervention trial

Sometimes PhD students just can’t be dissuaded from undertaking an evaluation of a psychotherapy. I was on the PhD committee of a student who at least produced a valuable paper about how not to recruit cancer patients for a trial evaluating problem-solving therapy, even though the project fell far short of an adequately powered study.

The PhD student was aware that claims of the effectiveness of problem-solving therapy reported in the prestigious Journal of Consulting and Clinical Psychology were exaggerated. The developer of problem-solving therapy for cancer patients (and current JCCP Editor) claimed a huge effect size: 3.8 if only the patient was involved in treatment, and an even better 4.4 if the patient had an opportunity to involve a relative or friend as well. Effect sizes from this trial have subsequently had to be excluded from at least four meta-analyses as extreme outliers (1,2,3,4).

The student adopted the much more conservative assumption that a moderate effect size of .6 would be obtained in comparison with a waitlist control. You can use G*Power to see that 50 patients would be needed per group, 60 if allowance is made for dropouts.
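The post does not say which power level the student assumed, so this minimal sketch (normal approximation, two-sided α = .05, equal arms; the function name is hypothetical) shows how the per-group figure for d = 0.6 depends on it: 44 per group at 80% power, 50 at 85%, and 59 at 90%. G*Power’s exact t-test calculation runs slightly higher than this approximation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm for a two-sided, two-sample comparison."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


if __name__ == "__main__":
    # d = 0.6: the student's assumed advantage over a waitlist control.
    for power in (0.80, 0.85, 0.90):
        print(f"power {power:.0%}: {n_per_group(0.6, power=power)} per group")
```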

Such a basically inert control group, of course, makes a treatment more likely to seem effective than when the comparison is another active treatment. Such a control group also has the problem of not allowing a determination of whether it was the active ingredient of the treatment that made the difference, or just the attention, positive expectations, and support that were not available in the waitlist control group.

But PhD students should have the same option as their advisors to contribute another comparison between an active treatment and a waitlist control to the literature, even if it does not advance our knowledge of psychotherapy. They can take the same low road to a successful career that so many others have traveled.

This particular student was determined to make a different contribution to the literature. Notoriously, studies of psychotherapy with cancer patients often fail to recruit samples that are distressed enough to register any effect. The typical breast cancer patient, for instance, who seeks to enroll in a psychotherapy or support group trial does not have clinically significant distress. The prevalence of positive effects claimed in the literature for interventions with cancer patients in published studies likely represents a confirmation bias.

The student wanted to address this issue by limiting enrollment to patients with clinically significant distress. Enlisting colleagues, she set up screening of consecutive cancer patients in the oncology units of local hospitals. Patients were first screened for self-reported distress and, if they were distressed, for whether they were interested in services. Those who met both criteria were then re-contacted to see if they would be willing to participate in a psychological intervention study, without the intervention being identified. As I reported in the previous blog post:

  • Combining results of the two screenings, 423 of 970 patients reported distress, of whom 215 patients indicated need for services.
  • Only 36 (4% of 970) patients consented to trial participation.
  • We calculated that 27 patients needed to be screened to recruit a single patient, with 17 hours of time required for each patient recruited.
  • 41% (n= 87) of 215 distressed patients with a need for services indicated that they had no need for psychosocial services, mainly because they felt better or thought that their problems would disappear naturally.
  • Finally, 36 patients were eligible and willing to be randomized, representing 17% of 215 distressed patients with a need for services.
  • This represents 8% of all 423 distressed patients, and 4% of 970 screened patients.
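The recruitment funnel can be re-derived from the counts in the bullets above. This minimal sketch only recomputes ratios from those reported counts; the variable names are illustrative. Its unrounded results (for example, 8.5% of distressed patients randomized and 40.5% declining on re-contact) agree with the rounded percentages above to within about a percentage point.

```python
# Counts as reported in the bullets above.
SCREENED = 970
DISTRESSED = 423
DISTRESSED_WANTING_SERVICES = 215
DECLINED_ON_RECONTACT = 87
RANDOMIZED = 36


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


if __name__ == "__main__":
    print(f"Screened per randomized patient: {SCREENED / RANDOMIZED:.1f}")
    print(f"Randomized, % of screened: {pct(RANDOMIZED, SCREENED):.1f}%")
    print(f"Randomized, % of distressed: {pct(RANDOMIZED, DISTRESSED):.1f}%")
    print(f"Randomized, % of distressed wanting services: "
          f"{pct(RANDOMIZED, DISTRESSED_WANTING_SERVICES):.1f}%")
    print(f"Declined on re-contact, % of distressed wanting services: "
          f"{pct(DECLINED_ON_RECONTACT, DISTRESSED_WANTING_SERVICES):.1f}%")
```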

So, the PhD student’s heroic effort did not yield the sample size she anticipated. But she ended up making a valuable contribution to the literature, challenging a basic assumption being made about cancer patients in psychotherapy research: that all or most of them are distressed. She also produced valuable evidence that the minority of cancer patients who report psychological distress are not necessarily interested in psychological interventions.

Fortunately, she had been prepared to collect systematic data about these research questions, not just scramble within a collapsing effort at a clinical trial.

Becoming a research parasite as an alternative to PhD students attempting an under-resourced study of their own

Psychotherapy trials represent an enormous investment of resources: not only the public funding often provided for them, but also the time, inconvenience, and exposure to ineffective treatments experienced by the patients who participate in them. Increasingly, funding agencies require investigators who receive money for a psychotherapy study to make their data available for others to use at some point. The 14 prestigious medical journals whose editors make up the International Committee of Medical Journal Editors (ICMJE) each published, earlier in 2016, a declaration that:

there is an ethical obligation to responsibly share data generated by interventional clinical trials because participants have put themselves at risk.

These statements proposed that as a condition for publishing a clinical trial, investigators would be required to share with others appropriately de-identified data not later than six months after publication. Further, the statements proposed that investigators describe their plans for sharing data in the registration of trials.

Of course, a proposal is exactly that, a proposal, and these requirements were intended to take effect only after the document was circulated and ratified. The incomplete and inconsistent adoption of previous proposals for registering trials in advance and for investigators declaring conflicts of interest does not inspire much confidence that we will see uniform implementation of this bold proposal anytime soon.

Some editors of medical journals are already expressing alarm over the prospect of data sharing becoming required. The editors of the New England Journal of Medicine were lambasted on social media for raising worries about “research parasites” exploiting the availability of data:

a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Richard Lehman’s Journal Review at The BMJ’s blog delivered a brilliantly sarcastic response to these concerns that concludes:

I think we need all the data parasites we can get, as well as symbionts and all sorts of other creatures which this ill-chosen metaphor can’t encompass. What this piece really shows, in my opinion, is how far the authors are from understanding and supporting the true opportunities of clinical data sharing.

However, lost in all the outrage that The New England Journal of Medicine editorial generated was a more conciliatory proposal at the end:

How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

The PLOS family of journals has gone on record as requiring that all data for papers published in their journals be publicly available without restriction. A February 24, 2014 announcement, PLOS’ New Data Policy: Public Access to Data, declared:

In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.

Many of us are aware of the difficulties in achieving this lofty goal. I am holding my breath and turning blue, waiting for some specific data.

The BMJ has expanded its previous requirements for data availability:

Loder E, Groves T. The BMJ requires data sharing on request for all trials. BMJ. 2015 May 7;350:h2373.

The movement to make data from clinical trials widely accessible has achieved enormous success, and it is now time for medical journals to play their part. From 1 July The BMJ will extend its requirements for data sharing to apply to all submitted clinical trials, not just those that test drugs or devices. The data transparency revolution is gathering pace.

Once the dissertation I am currently supervising is completed, I will no longer be heading dissertation committees. But if any PhD student asked my advice about a dissertation project concerning psychotherapy, I would strongly encourage them to enlist their advisor to identify, and help them negotiate access to, a data set appropriate to the research questions they want to investigate.

Most well-resourced psychotherapy trials have unpublished data concerning how they were implemented, with what bias, and which patient groups ended up underrepresented or inadequately exposed to the intensity of treatment presumed to be needed for benefit. A story awaits telling. The data available from a published trial are usually much more adequate than any a graduate student could collect with the limited resources available for a dissertation project.

I look forward to the day when such data is put into a repository where anyone can access it.

In this blog post I have argued that PhD students should not take on responsibility for developing and testing a new psychotherapy for their dissertation project. I think that using data from existing published trials is a much better alternative. However, PhD students may currently find it difficult, though certainly not impossible, to get appropriate data sets. I certainly am not recruiting them to be front-line infantry in advancing the cause of routine data sharing. But they can make an effort to obtain such data, and they deserve all the support they can get from their dissertation committees, both in obtaining data sets and in recognizing realistically when data are not being made available, even when the data were promised as a condition of publication. Advisors, please request the data from published trials for your PhD students and protect them from the heartache of trying to collect such data themselves.