When psychotherapy trials have multiple flaws…

Multiple flaws pose more threats to the validity of psychotherapy studies than would be inferred when the individual flaws are considered independently.

We can learn to spot features of psychotherapy trials that are likely to lead to exaggerated claims of efficacy for treatments, or to claims that will not generalize beyond the sample being studied in a particular clinical trial. We can look to the adequacy of the sample size, and spot what the Cochrane Collaboration has defined as risks of bias in its handy assessment tool.

We can look at the case-mix in the particular sites where patients were recruited. We can examine the adequacy of the diagnostic criteria used for entering patients into a trial. We can examine how blinded the trial was: who assigned patients to particular conditions, and what the patients, the treatment providers, and the outcome evaluators knew about which condition particular patients had been assigned to.

And so on. But what about combinations of these factors?

We typically do not pay enough attention to multiple flaws in the same trial. I include myself among the guilty. We may suspect that flaws are seldom simply additive in their effect, but we don't consider whether there may even be synergism in their negative effects on the validity of a trial. As we will see in this analysis of a clinical trial, multiple flaws can pose more threats to the validity of a trial than we might infer when the individual flaws are considered independently.

The particular paper we are probing is described in its discussion section as the “largest RCT to date testing the efficacy of group CBT for patients with CFS.” It also takes on added importance because two of the authors, Gijs Bleijenberg and Hans Knoop, are considered leading experts in the Netherlands. The treatment protocol was developed over time by the Dutch Expert Centre for Chronic Fatigue (NKCV, http://www.nkcv.nl; Knoop and Bleijenberg, 2010). Moreover, these senior authors dismiss any criticism and even ridicule critics. This study is cited as support for their overall assessment of their own work.  Gijs Bleijenberg claims:

Cognitive behavioural therapy is still an effective treatment, even the preferential treatment for chronic fatigue syndrome.

But

Not everybody endorses these conclusions, however their objections are mostly baseless.

Spoiler alert

This is a long-read blog post. I will offer a summary for those who don't want to read through it, but who still want the gist of what I will be saying. However, as always, I encourage readers to be skeptical of what I say and to look to my evidence and arguments and decide for themselves.

Authors of this trial stacked the deck to demonstrate that their treatment is effective. They are striving to support the extraordinary claim that group cognitive behavior therapy fosters not only better adaptation, but actually recovery from what is internationally considered a physical condition.

There are some obvious features of the study that contribute to the likelihood of a positive effect, but these features need to be considered collectively, in combination, to appreciate the strength of this effort to guarantee positive results.

This study represents the perfect storm of design features that operate synergistically:

  • Referral bias – the trial was conducted in a single specialized treatment setting known for advocating psychological factors as maintaining physical illness.
  • Strong self-selection bias, with a minority of patients enrolling in the trial to seek a treatment they otherwise could not get.
  • Broad, overinclusive diagnostic criteria for entry into the trial.
  • An active treatment condition carrying a strong message about how patients should respond to outcome assessment, namely with reports of improvement.
  • An unblinded trial with a waitlist control lacking the nonspecific elements (placebo) that confound the active treatment.
  • Subjective self-report outcomes.
  • A definition of clinically significant improvement requiring only that a primary outcome fall below the threshold needed for entry into the trial.
  • Deliberate exclusion of relevant objective outcomes.
  • Avoidance of any recording of negative effects.

Despite the prestige attached to this trial in Europe, the US Agency for Healthcare Research and Quality (AHRQ) excludes this trial from providing evidence for its database of treatments for chronic fatigue syndrome/myalgic encephalomyelitis. We will see why in this post.

The take-away message: although not many psychotherapy trials incorporate all of these factors, most trials have some. We should be more sensitive to when multiple factors occur in the same trial, like bias in the site for patient recruitment, lack of blinding, lack of balance between the active treatment and control condition in terms of nonspecific factors, and subjective self-report measures.

The article reporting the trial is

Wiborg JF, van Bussel J, van Dijk A, Bleijenberg G, Knoop H. Randomised controlled trial of cognitive behaviour therapy delivered in groups of patients with chronic fatigue syndrome. Psychotherapy and Psychosomatics. 2015;84(6):368-76.

Unfortunately, the article is currently behind a pay wall. Perhaps readers could contact the corresponding author Hans.knoop@radboudumc.nl  and request a PDF.

The abstract

Background: Meta-analyses have been inconclusive about the efficacy of cognitive behaviour therapies (CBTs) delivered in groups of patients with chronic fatigue syndrome (CFS) due to a lack of adequate studies. Methods: We conducted a pragmatic randomised controlled trial with 204 adult CFS patients from our routine clinical practice who were willing to receive group therapy. Patients were equally allocated to therapy groups of 8 patients and 2 therapists, 4 patients and 1 therapist or a waiting list control condition. Primary analysis was based on the intention-to-treat principle and compared the intervention group (n = 136) with the waiting list condition (n = 68). The study was open label. Results: Thirty-four (17%) patients were lost to follow-up during the course of the trial. Missing data were imputed using mean proportions of improvement based on the outcome scores of similar patients with a second assessment. Large and significant improvement in favour of the intervention group was found on fatigue severity (effect size = 1.1) and overall impairment (effect size = 0.9) at the second assessment. Physical functioning and psychological distress improved moderately (effect size = 0.5). Treatment effects remained significant in sensitivity and per-protocol analyses. Subgroup analysis revealed that the effects of the intervention also remained significant when both group sizes (i.e. 4 and 8 patients) were compared separately with the waiting list condition. Conclusions: CBT can be effectively delivered in groups of CFS patients. Group size does not seem to affect the general efficacy of the intervention which is of importance for settings in which large treatment groups are not feasible due to limited referral

The trial registration

http://www.isrctn.com/ISRCTN15823716

Who was enrolled into the trial?

Who gets into a psychotherapy trial is a function of the particular treatment setting of the study, the diagnostic criteria for entry, and patient preferences for getting their care through a trial, rather than what is being routinely provided in that setting.

We need to pay particular attention when patients enter psychotherapy trials hoping they will receive a treatment they prefer and will not be assigned to the other condition. Patients may be in a clinical trial for the betterment of science, but in some settings, they are willing to enroll because of the probability of getting a treatment they otherwise could not get. This in turn affects their evaluation both of the condition in which they get the preferred treatment and of the condition in which they are denied it. Simply put, they register being pleased if they got what they wanted, or not being pleased if they did not.

The setting is relevant to evaluating who was enrolled in a trial.

The authors’ own outpatient clinic at the Radboud University Medical Center was the site of the study. The group has an international reputation for promoting the biopsychosocial model, in which psychological factors are assumed to be the decisive factor in maintaining somatic complaints.

All patients were referred to our outpatient clinic for the management of chronic fatigue.

There is thus a clear referral bias or case-mix bias, but we are not provided a ready basis for quantifying it or even estimating its effects.

The diagnostic criteria.

The article states:

In accordance with the US Center for Disease Control [9], CFS was defined as severe and unexplained fatigue which lasts for at least 6 months and which is accompanied by substantial impairment in functioning and 4 or more additional complaints such as pain or concentration problems.

Actually, the US Centers for Disease Control and Prevention would now reject this trial because these entry criteria are considered obsolete, overinclusive, and not sufficiently exclusive of other conditions that might be associated with chronic fatigue.*

There is a real paradigm shift happening in America. Both the 2015 IOM Report and the Centers for Disease Control and Prevention (CDC) website emphasize post-exertional malaise, i.e., getting more ill after any exertion, as central to ME. CBT is no longer recommended by the CDC as a treatment.

The only mandatory symptom for inclusion in this study is fatigue lasting 6 months. Most properly, this trial targets chronic fatigue [period] and not the condition, chronic fatigue syndrome.

Current US CDC recommendations for diagnosis (see Box 7-1 from the IOM document) require post-exertional malaise for a diagnosis of myalgic encephalomyelitis (ME).

Patients meeting the current American criteria for ME would be eligible for enrollment in this trial, but it's unclear what proportion of the patients enrolled actually met the American criteria. Because of the overinclusiveness of the entry diagnostic criteria, it is doubtful whether the results would generalize to an American sample. A look at patient flow into the study will be informative.

Patient flow

Let's look at what is said in the text, and also at the chart depicting patient flow into the trial, for any self-selection that might be revealed.

In total, 485 adult patients were diagnosed with CFS during the inclusion period at our clinic (fig. 1). One hundred and fifty-seven patients were excluded from the trial because they declined treatment at our clinic, were already asked to participate in research incompatible with inclusion (e.g. research focusing on individual CBT for CFS) or had a clinical reason for exclusion (i.e. they received specifically tailored interventions because they were already unsuccessfully treated with individual CBT for CFS outside our clinic or were between 18 and 21 years of age and the family had to be involved in the therapy). Of the 328 patients who were asked to engage in group therapy, 99 (30%) patients indicated that they were unwilling to receive group therapy. In 25 patients, the reason for refusal was not recorded. Two hundred and four patients were randomly allocated to one of the three trial conditions. Baseline characteristics of the study sample are presented in table 1. In total, 34 (17%) patients were lost to follow-up. Of the remaining 170 patients, 1 patient had incomplete primary outcome data and 6 patients had incomplete secondary outcome data.

[Figure 1 from the article: patient flow chart]

We see that the investigators invited two thirds of the patients attending the clinic to enroll in the trial. Of these, 41% refused. We don't know the reason for some of the refusals, but almost a third of the patients approached declined because they did not want group therapy. The authors were left able to randomize 42% of the patients coming to the clinic, or less than two thirds of the patients they actually asked. Of these patients, a little more than two thirds received the treatment to which they were randomized and were available for follow-up.

These patients, who received the treatment to which they were randomized and who were available for follow-up, are a self-selected minority of the patients coming to the clinic. This self-selection process likely reduced the proportion of patients with myalgic encephalomyelitis. It is estimated that 25% of patients meeting the American criteria are housebound and 75% are unable to work. It is reasonable to infer that patients meeting the full criteria would opt out of a treatment that requires regular attendance at group sessions.

The trial is thus biased toward ambulatory patients with fatigue, not ME. Their fatigue is likely due to some combination of factors such as multiple comorbidities, as-yet-undiagnosed medical conditions, drug interactions, and the common mild and subsyndromal anxiety and depressive symptoms that characterize primary care populations.

The treatment being evaluated

Group cognitive behavior therapy for chronic fatigue syndrome, either delivered in a small (4 patients and 1 therapist) or larger (8 patients and 2 therapists) group format.

The intervention consisted of 14 group sessions of 2 h within a period of 6 months followed by a second assessment. Before the intervention started, patients were introduced to their group therapist in an individual session. The intervention was based on previous work of our research group [4,13] and included personal goal setting, fixing sleep-wake cycles, reducing the focus on bodily symptoms, a systematic challenge of fatigue-related beliefs, regulation and gradual increase in activities, and accomplishment of personal goals. A formal exercise programme was not part of the intervention.

Patients received a workbook with the content of the therapy. During sessions, patients were explicitly invited to give feedback about fatigue-related cognitions and behaviours to fellow patients. This aspect was introduced to facilitate a pro-active attitude and to avoid misperceptions of the sessions as support group meetings which have been shown to be insufficient for the treatment of CFS.

And note:

In contrast to our previous work [4], we communicated recovery in terms of fatigue and disabilities as general goal of the intervention.

Some impressions of the intensity of this treatment: this is a rather intensive treatment, with patients having considerable opportunities for interaction with providers. This factor alone distinguishes being assigned to the intervention group from being left in the waitlist control group, and could prove powerful. It will be difficult to distinguish intensity of contact from any content or active ingredients of the therapy.

I'll leave for another time a fuller discussion of the extent to which what was labeled as cognitive behavior therapy in this study is consistent with cognitive therapy as practiced by Aaron Beck and other leaders of the field. However, a few comments are warranted. What is offered in this trial does not sound like cognitive therapy as Americans practice it. It seems to emphasize challenging beliefs and pushing patients to get more active, along with psychoeducational activities. I don't see indications of the supportive, collaborative relationship in which patients are encouraged to work on what they want to work on, engage in outside activities (homework assignments), and get feedback.

What is missing in this treatment is what Beck calls collaborative empiricism, “a systemic process of therapist and patient working together to establish common goals in treatment, has been found to be one of the primary change agents in cognitive-behavioral therapy (CBT).”

Importantly, in Beck's approach, the therapist does not assume cognitive distortions on the part of the patient. Rather, in collaboration with the patient, the therapist introduces alternatives to the interpretations that the patient has been making and encourages the patient to consider the difference. In contrast, rather than eliciting goal statements from patients, the therapists in this study impose the goal of increased activity. Therapists in this study also seem ready to impose their view that the patients' fatigue-related beliefs are maladaptive.

The treatment offered in this trial is complex, with multiple components making multiple assumptions that seem quite different from what is called cognitive therapy or cognitive behavioral therapy in the US.

The authors' communication of recovery from fatigue and disability seems a radical departure not only from cognitive behavior therapy for anxiety, depression, and pain, but also from cognitive behavior therapy offered for adaptation to acute and chronic physical illnesses. We will return to this "communication" later.

The control group

Patients not randomized to group CBT were placed on a waiting list.

Think about it! What do patients think about having gotten involved in all the inconvenience and burden of a clinical trial in the hope that they would get treatment, and then being assigned to the control group with just waiting? Not only are they going to be disappointed and register that in their subjective evaluations at the outcome assessments; patients may also worry about jeopardizing their right to the treatment they are waiting for if they endorse overly positive outcomes. There is a potential for a nocebo effect, compounding the placebo effect of assignment to the CBT active treatment groups.

What are informative comparisons between active treatments and control conditions?

We need to ask more often what inclusion of a control group accomplishes for the evaluation of a psychotherapy. In doing so, we need to keep in mind that psychotherapies do not have effect sizes; only comparisons of psychotherapies with control conditions have effect sizes.

A pre-post evaluation of psychotherapy from baseline to follow-up includes the effects of any active ingredient in the psychotherapy, a host of nonspecific (placebo) factors, and any changes that would have occurred in the absence of the intervention. These include regression to the mean: patients are more likely to enter a clinical trial now, rather than later or earlier, if there has been an exacerbation of their symptoms.

So, a proper comparison/control condition includes everything that the patients randomized to the intervention group get except for the active treatment. Ideally, the intervention and the comparison/control group are equivalent on all these factors, except the active ingredient of the intervention.
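To make this logic concrete, here is a minimal simulation sketch in Python, with entirely made-up numbers rather than data from this or any trial: both arms improve through nonspecific factors, natural course, and regression to the mean, and only the between-group comparison isolates what the active ingredient adds.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical fatigue scores (higher = worse); illustrative values only,
# not data from the trial under discussion.
baseline_tx  = rng.normal(45, 8, n)   # intervention arm at baseline
baseline_ctl = rng.normal(45, 8, n)   # control arm at baseline

# Both arms improve by ~10 points through nonspecific factors, natural course,
# and regression to the mean; the intervention adds only ~2 points on top.
followup_tx  = baseline_tx  - 10 - 2 + rng.normal(0, 8, n)
followup_ctl = baseline_ctl - 10     + rng.normal(0, 8, n)

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# The pre-post "effect size" within the treated arm bundles everything together...
print(f"Pre-post change in treated arm: d = {cohens_d(baseline_tx, followup_tx):.2f}")
# ...whereas the between-group comparison at follow-up isolates the specific effect.
print(f"Between-group difference:       d = {cohens_d(followup_ctl, followup_tx):.2f}")
```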

That is clearly not what is happening in this trial. Patients randomized to the intervention group get the intervention, the added intensity and frequency of contact with professionals that the intervention provides, all the support that goes with it, and the positive expectations that come with getting a therapy that they wanted.

Any attempt to evaluate group CBT against the wait-list control group therefore confounds the active ingredients of the CBT with all of these nonspecific effects. The deck is clearly being stacked in favor of CBT.

This may be a randomized trial, but properly speaking, this is not a randomized controlled trial, because the comparison group does not control for nonspecific factors, which are imbalanced.

The unblinded nature of the trial

In RCTs of psychotropic drugs, the ideal is to compare the psychotropic drug to an inert pill placebo, with providers, patients, and evaluators blinded as to whether patients received the psychotropic drug or the comparison pill.

While it is difficult to achieve a comparable level of blindness in a psychotherapy trial, more of an effort to achieve blindness is desirable. For instance, in this trial, the authors took pains to distinguish the CBT from what would have happened in a support group. A much more adequate comparison would therefore be CBT versus either a professionally led or peer-led support group with equivalent amounts of contact time. Further blinding would be possible if patients were told only that two forms of group therapy were being compared. If that were the information available to patients contemplating consenting to the trial, it would not have been so obvious from the outset to the patients being randomly assigned that one group was preferable to the other.

Subjective self-report outcomes.

The primary outcomes for the trial were the fatigue subscale of the Checklist Individual Strength; the physical functioning subscale of the Short Form Health Survey (SF-36); and overall impairment as measured by the Sickness Impact Profile (SIP).

Realistically, self-report outcomes are often all that is available in many psychotherapy trials. Commonly these are self-report assessments of anxiety and depressive symptoms, although these may be supplemented by interviewer-based assessments. We don’t have objective biomarkers with which to evaluate psychotherapy.

These three self-report measures are relatively nonspecific, particularly in a population that is not characterized by ME. Self-reported fatigue in a primary care population lacks discriminative validity with respect to pain, anxiety and depressive symptoms, and general demoralization.  The measures are susceptible to receipt of support and re-moralization, as well as gratitude for obtaining a treatment that was sought.

The self-report entry criteria include a score of 35 or higher on the fatigue severity subscale. Yet a score of less than 35 on this same scale at follow-up is part of what is defined as a clinically significant improvement, within a composite score from combined self-report measures.

We know from medical trials that differences can be observed with subjective self-report measures that will not be found with objective measures. Thus, mildly asthmatic patients will fail to distinguish in their subjective self-reports between the effective inhalant albuterol, an inert inhalant, and sham acupuncture, but will rate all three as better than getting no intervention. However, albuterol shows a strong advantage over the other three conditions on an objective measure, maximum forced expiratory volume in 1 second (FEV1), as assessed with spirometry.

The suppression of objective outcome measures

We cannot let the authors of this trial off the hook for their dependence on subjective self-report outcomes. They are instructing patients that recovery is the goal, which implies that it is an attainable goal. We can reasonably be skeptical about claims of recovery based on changes in self-report measures. Were the patients actually able to exercise? What was their exercise capacity, as objectively measured? Did they return to work?

These authors have included such objective measurements in past studies, but have not included them as primary outcomes, nor, in some cases, even reported them in the main paper reporting the trial.

Wiborg JF, Knoop H, Stulemeijer M, Prins JB, Bleijenberg G. How does cognitive behaviour therapy reduce fatigue in patients with chronic fatigue syndrome? The role of physical activity. Psychol Med. 2010 Jan 5:1

The senior authors’ review fails to mention their three studies using actigraphy that did not find effects for CBT. I am unaware of any studies that did find enduring effects.

Perhaps this is what they mean when they say the protocol has been developed over time – they removed what they found to be threats to the findings that they wanted to claim.

Dismissal of any need to consider negative effects of treatment

Most psychotherapy trials fail to assess any adverse effects of treatment, but this is usually done discreetly, without mention. In contrast, this article states:

Potential harms of the intervention were not assessed. Previous research has shown that cognitive behavioural interventions for CFS are safe and unlikely to produce detrimental effects.

Patients who meet stringent criteria for ME would be put at risk by pressure to exert themselves. By definition they are vulnerable to post-exertional malaise (PEM). Any trial of this nature needs to assess that risk. Maybe no adverse effects would be found. If that were so, it would strongly indicate the absence of patients with appropriate diagnoses.

Timing of assessment of outcomes varied between intervention and control group.

I at first did not believe what I was reading when I encountered this statement in the results section.

The mean time between baseline and second assessment was 6.2 months (SD = 0.9) in the control condition and 12.0 months (SD = 2.4) in the intervention group. This difference in assessment duration was significant (p < 0.001) and was mainly due to the fact that the start of the therapy groups had to be frequently postponed because of an irregular patient flow and limited treatment capacities for group therapy at our clinic. In accordance with the treatment manual, the second assessment was postponed until the fourteenth group session was accomplished. The mean time between the last group session and the second assessment was 3.3 weeks (SD = 3.5).

So, outcomes were assessed for the intervention group shortly after completion of therapy, when nonspecific (placebo) effects would be stronger, but a mean of six months later than for patients assigned to the control condition.

Post-hoc statistical controls are not sufficient to rescue the study from this important group difference, and it compounds other problems in the study.

Take away lessons

Pay more attention to how the limitations of any clinical trial may compound each other in producing exaggerated estimates of the effects of treatment or in limiting the generalizability of the results to other settings.

Be careful of loose diagnostic criteria, because a trial may not generalize to the same criteria being applied in settings that differ either in patient population or in the availability of other treatments. This is particularly important when a treatment setting has a bias in referrals and only a minority of the patients invited to participate in the trial actually agree and are enrolled.

Ask questions about just what information is obtained by comparing the active treatment group in a study to its control/comparison group. For a start, just what is being controlled, and how might that affect estimates of the effectiveness of the active treatment?

Pay particular attention to the potent combination of an unblinded trial, a weak comparison/control condition, and an active treatment that is not otherwise available to patients.

Note

*The means of determining whether the six months of fatigue might be accounted for by other medical factors was specific to the setting. Note that a review of medical records was deemed sufficient for an unknown proportion of patients, with no further examination or medical tests.

The Department of Internal Medicine at the Radboud University Medical Center assessed the medical examination status of all patients and decided whether patients had been sufficiently examined by a medical doctor to rule out relevant medical explanations for the complaints. If patients had not been sufficiently examined, they were seen for standard medical tests at the Department of Internal Medicine prior to referral to our outpatient clinic. In accordance with recommendations by the Centers for Disease Control, sufficient medical examination included evaluation of somatic parameters that may provide evidence for a plausible somatic explanation for prolonged fatigue [for a list, see [9]. When abnormalities were detected in these tests, additional tests were made based on the judgement of the clinician of the Department of Internal Medicine who ultimately decided about the appropriateness of referral to our clinic. Trained therapists at our clinic ruled out psychiatric comorbidity as potential explanation for the complaints in unstructured clinical interviews.

Why PhD students should not evaluate a psychotherapy for their dissertation project

  • Things some clinical and health psychology students wish they had known before they committed themselves to evaluating a psychotherapy for their dissertation study.
  • A well designed pilot study addressing feasibility and acceptability issues in conducting and evaluating psychotherapies is preferable to an underpowered study which won’t provide a valid estimate of the efficacy of the intervention.
  • PhD students would often be better off as research parasites – making use of existing published data – rather than attempting to organize their own original psychotherapy study, if their goal is to contribute meaningfully to the literature and patient care.
  • Reading this blog, you will encounter a link to free, downloadable software that allows you to make quick determinations of the number of patients needed for an adequately powered psychotherapy trial.

I so relish the extra boost of enthusiasm that many clinical and health psychology students bring to their PhD projects. They not only want to complete a thesis of which they can be proud, they want their results to be directly applicable to improving the lives of their patients.

Many students are particularly excited about a new psychotherapy about which extravagant claims are being made that it’s better than its rivals.

I have seen lots of fads and fashions come and go: third wave, new wave, and no wave therapies. When I was a PhD student, progressive relaxation was in. Then it died, mainly because it was so boring for the therapists who had to mechanically provide it. Client-centered therapy was fading, with doubts that anyone else could achieve the results of Carl Rogers or that his three facilitative conditions of unconditional positive regard, empathy, and genuineness (congruence) were actually distinguishable enough to study. Gestalt therapy was supercool because of the charisma of Fritz Perls, who distracted us with his showmanship from the utter lack of evidence for its efficacy.

I hate to see PhD students demoralized when their grand plans prove unrealistic.  Inevitably, circumstances force them to compromise in ways that limit any usefulness to their project, and maybe even threaten their getting done within a reasonable time period. Overly ambitious plans are the formidable enemy of the completed dissertation.

The numbers are stacked against a PhD student conducting an adequately powered evaluation of a new psychotherapy.

This blog post argues against PhD students taking on the evaluation of a new therapy in comparison to an existing one, if they expect to complete their projects and make meaningful contribution to the literature and to patient care.

I’ll be drawing on some straightforward analysis done by Pim Cuijpers to identify what PhD students are up against when trying to demonstrate that any therapy is better than treatments that are already available.

Pim has literally done dozens of meta-analyses, mostly of treatments for depression and anxiety. He commands a particular credibility, given the quality of this work. The way Pim and his colleagues present a meta-analysis is so straightforward and transparent that you can readily examine the basis of what he says.

Disclosure: I collaborated with Pim and a group of other authors in conducting a meta-analysis as to whether psychotherapy was better than a pill placebo. We drew on all the trials allowing a head-to-head comparison, even though nobody ever really set out to pit the two conditions against each other as their first agenda.

Pim tells me that the brief and relatively obscure letter on which I will draw, New Psychotherapies for Mood and Anxiety Disorders: Necessary Innovation or Waste of Resources?, is among his most unpopular pieces of work. Lots of people don't like its inescapable message. But I think that if PhD students pay attention to it, they might avoid a lot of pain and disappointment.

But first…

Note how many psychotherapies have been claimed to be effective for depression and anxiety. Anyone trying to make sense of this literature has to contend with claims based on a lot of underpowered trials – too small in sample size to be reasonably expected to detect the effects that investigators claim – and otherwise compromised by methodological limitations.

Some investigators were simply naïve about clinical trial methodology and the difficulties of doing research with clinical populations. They may not have understood statistical power.

But many psychotherapy studies end up in bad shape because the investigators were unrealistic about the feasibility of what they were undertaking and the low likelihood that they could recruit patients in the numbers they had planned in the time they had allotted. After launching the trial, they had to change strategies for recruitment, perhaps relax their selection criteria, or even change the treatment so it was less demanding of patients' time. And they had to make difficult judgments about what features of the trial to drop when resources ran out.

Declaring a psychotherapy trial to be a “preliminary” or a “pilot study” after things go awry

The titles of more than a few articles reporting psychotherapy trials contain the apologetic qualifier after a colon: “a preliminary study” or “a pilot study”. But the studies weren’t intended at the outset to be preliminary or pilot studies. The investigators are making excuses post-hoc – after the fact – for not having been able to recruit sufficient numbers of patients and for having had to compromise their design from what they had originally planned. The best they can hope is that the paper will somehow be useful in promoting further research.

Too many studies from which effect sizes are entered into meta-analyses should have been left as pilot studies and not considered tests of the efficacy of treatments. The rampant problem in the psychotherapy literature is that almost no one treats small-scale trials as mere pilot studies. In a recent blog post, I provided readers with some simple screening rules to identify meta-analyses of psychotherapy studies that they could dismiss from further consideration. One was whether there were sufficient numbers of adequately powered studies. Often there are not.

Readers take the inflated claims from small studies seriously, when these estimates should be seen as unrealistic and unlikely to be replicated, given the studies' sample sizes. The large effect sizes that are claimed are likely the product of p-hacking and the confirmation bias required to get published. With enough alternative outcome variables to choose from and enough flexibility in analyzing and interpreting data, almost any intervention can be made to look good.

The problem is readily seen in the extravagant claims about acceptance and commitment therapy (ACT), which are so heavily dependent on small, under-resourced studies supervised by promoters of ACT that should not have been used to generate effect sizes.

Back to Pim Cuijpers' brief letter. He argues, based on his numerous meta-analyses, that it is unlikely that a new treatment will be substantially more effective than an existing credible, active treatment. There are some exceptions, like relaxation training versus cognitive behavior therapy for some anxiety disorders, but mostly only small differences of no more than d = .20 are found between two active, credible treatments. If you search the broader literature, you can find occasional exceptions, like CBT versus psychoanalysis for bulimia, but most of what you find proves to be false positives, usually based on investigator bias in conducting and interpreting a small, underpowered study.

You can see this for yourself using the freely downloadable G*Power program: plug in d = 0.20 and calculate the number of patients needed for a study. To be safe, add more patients to allow for the expectable 25% dropout rate that has occurred across trials. The number you get would require a larger study than has ever been done in the past, including the well-financed NIMH Collaborative trial.
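For readers without G*Power at hand, here is a minimal sketch of the same calculation in Python using statsmodels. The two-sided alpha of .05, power of .80, and 25% dropout allowance are my assumptions about conventional settings; the letter itself specifies only d = 0.20.

```python
import math
from statsmodels.stats.power import TTestIndPower

d = 0.20        # plausible difference between two active, credible treatments
dropout = 0.25  # expectable dropout rate across trials

# Completers needed per arm for a two-sided, two-sample t-test at 80% power.
n_per_group = TTestIndPower().solve_power(effect_size=d, alpha=0.05,
                                          power=0.80, alternative='two-sided')
n_to_enroll = math.ceil(n_per_group / (1 - dropout))

print(f"Completers needed per group: {math.ceil(n_per_group)}")  # roughly 394
print(f"Enrollees needed per group:  {n_to_enroll}")             # roughly 525
```

With over a thousand patients needed in total, it is easy to see why such a comparison would be larger than almost any psychotherapy trial ever conducted.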

Even more patients would be needed for the ideal situation in which a third comparison group allowed the investigator to show that the active comparison treatment had actually performed better than a nonspecific treatment delivered with the same effectiveness it had shown in earlier trials. Otherwise, a defender of the established therapy might argue that the older treatment had not been properly implemented.

So, unless warned off, the PhD student plans a study to show not only that the null hypothesis can be rejected that the new treatment is no better than the existing one, but also that, in the same study, the existing treatment is better than a waitlist. Oh my, just try to find an adequately powered, properly analyzed example of a comparison of two active treatments plus a control comparison group in the existing published literature. The few examples of three-group designs in which a new psychotherapy came out better than an effectively implemented existing treatment are grossly underpowered.

These calculations so far have all been based on what would be needed to reject the null hypothesis of no difference between the active treatment and a more established one. But if the claim is that the new treatment is superior to the existing treatment, our PhD student now needs to conduct a superiority trial in which some criterion is pre-set (such as greater than a moderate difference, d = .30) and the null hypothesis is that the advantage of the new treatment is less than that. We are now way out into the fantasyland of breakthrough, but uncompleted, dissertation studies.

Two take away messages

The first take-away message is that we should be skeptical of claims that a new treatment is better than past ones, except when the claim occurs in a well-designed study with some assurance that it is free of investigator bias. But the claim also has to arise in a trial that is larger than almost any psychotherapy study that has ever been done. Yup, most comparative psychotherapy studies are underpowered, and we cannot expect claims that one treatment is superior to another to be robust.

But for PhD students doing a dissertation project, the second take-away message is that they should not attempt to show that one treatment is superior to another in the absence of resources they probably do not have.

The psychotherapy literature does not need another study with too few patients to support its likely exaggerated claims.

An argument can be made that it is unfair and even unethical to enroll patients in a psychotherapy RCT with an insufficient sample size. Some of the patients will be randomized to a control condition that is not what attracted them to the trial. All of the patients will be denied the experience of having been in a trial that makes a meaningful contribution to the literature and to better care for patients like themselves.

What should the clinical or health psychology PhD student do, besides maybe curb their enthusiasm? One opportunity to make a meaningful contribution to the literature is by conducting small studies testing hypotheses that can lead to improvements in the feasibility or acceptability of treatments to be tested in studies with more resources.

Think of what would have been accomplished if PhD students had determined in modest studies that it is tough to recruit and retain patients in an Internet therapy study without some communication to the patients that they are involved in a human relationship – without what Pim Cuijpers calls supportive accountability. Patients may stay involved with the Internet treatment when it proves frustrating only because they have support from, and accountability to, someone beyond their encounter with an impersonal computer. Somewhere out there, there is a human being who supports them in sticking it out with the Internet psychotherapy and will be disappointed if they don't.

A lot of resources have been wasted in Internet therapy studies in which patients have not been convinced that what they are doing is meaningful or that they have the support of a human being. They drop out or fail to do diligently any homework expected of them.

Similarly, mindfulness studies are routinely being conducted without anyone establishing that patients actually practice mindfulness in everyday life or what they would need to do so more consistently. The assumption is that patients assigned to the mindfulness condition diligently practice mindfulness daily. A PhD student could make a valuable contribution to the literature by examining the rates at which patients actually practice mindfulness when they have been assigned to it in a psychotherapy study, along with the barriers to and facilitators of their doing so. A discovery that the patients are not consistently practicing mindfulness might explain weaker findings than anticipated. One could even suggest that any apparent effects of practicing mindfulness were actually nonspecific: patients getting caught up in the enthusiasm of being offered a treatment they had sought, without actually practicing mindfulness.

An unintended example: How not to recruit cancer patients for a psychological intervention trial

Sometimes PhD students just can't be dissuaded from undertaking an evaluation of a psychotherapy. I was a member of the PhD committee of a student who at least produced a valuable paper concerning how not to recruit cancer patients for a trial evaluating problem-solving therapy, even though the project fell far short of conducting an adequately powered study.

The PhD student was aware that claims of the effectiveness of problem-solving therapy reported in the prestigious Journal of Consulting and Clinical Psychology were exaggerated. The developer of problem-solving therapy for cancer patients (and current JCCP Editor) claimed a huge effect size – 3.8 if only the patient were involved in treatment and an even better 4.4 if the patient had an opportunity to involve a relative or friend as well. Effect sizes from this trial have subsequently had to be excluded from at least four meta-analyses as extreme outliers (1, 2, 3, 4).

The student adopted the much more conservative assumption that a moderate effect size of .6 would be obtained in comparison with a waitlist control. You can use G*Power to see that 50 patients would be needed per group, 60 if allowance is made for dropouts.
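The same sketch as above, applied to the student's assumptions, gives figures in the ballpark the post reports; the 80% power level and the 17% dropout allowance are my assumptions, chosen to mirror the 50-to-60 step, and G*Power's exact answer depends on the power level selected.

```python
import math
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.6, alpha=0.05,
                                          power=0.80, alternative='two-sided')
print(math.ceil(n_per_group))               # ~45 completers per group at 80% power
print(math.ceil(n_per_group / (1 - 0.17)))  # ~54 enrollees allowing ~17% dropout
```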

Such a basically inert control group, of course, has a greater likelihood of seeming to demonstrate that a treatment is effective than when the comparison is another active treatment. Of course, such a control group also does not allow a determination of whether it was the active ingredient of the treatment that made the difference, or just the attention, positive expectations, and support that were not available in the waitlist control group.

But PhD students should have the same option as their advisors to contribute another comparison between an active treatment and a waitlist control to the literature, even if it does not advance our knowledge of psychotherapy. They can take the same low road to a successful career that so many others have traveled.

This particular student was determined to make a different contribution to the literature. Notoriously, studies of psychotherapy with cancer patients often fail to recruit samples that are distressed enough to register any effect. The typical breast cancer patient, for instance, who seeks to enroll in a psychotherapy or support group trial does not have clinically significant distress. The prevalence of positive effects claimed for interventions with cancer patients in the published literature likely represents a confirmation bias.

The student wanted to address this issue by limiting the patients whom she enrolled in the study to those with clinically significant distress. Enlisting colleagues, she set up screening of consecutive cancer patients in the oncology units of local hospitals. Patients were first screened for self-reported distress, and, if they were distressed, for whether they were interested in services. Those who met both criteria were then re-contacted to see if they would be willing to participate in a psychological intervention study, without the intervention being identified. As I reported in the previous blog post (the short calculation after this list reproduces the arithmetic):

  • Combining results of  the two screenings, 423 of 970 patients reported distress, of whom 215 patients indicated need for services.
  • Only 36 (4% of 970) patients consented to trial participation.
  • We calculated that 27 patients needed to be screened to recruit a single patient, with 17 hours of time required for each patient recruited.
  • 41% (n = 87) of the 215 distressed patients who had indicated a need for services stated, when re-contacted, that they had no need for psychosocial services, mainly because they felt better or thought that their problems would disappear naturally.
  • Finally, 36 patients were eligible and willing to be randomized, representing 17% of 215 distressed patients with a need for services.
  • This represents 8% of all 423 distressed patients, and 4% of 970 screened patients.
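A few lines of arithmetic reproduce the recruitment yields in the bullet points above (the counts are as reported in the previous post; the rounding is mine).

```python
screened, distressed, need_services, randomized = 970, 423, 215, 36

print(f"Distressed:            {distressed / screened:.0%} of those screened")               # 44%
print(f"Randomized:            {randomized / need_services:.0%} of those needing services")  # 17%
print(f"Randomized:            {randomized / distressed:.1%} of the distressed")             # 8.5%
print(f"Randomized:            {randomized / screened:.0%} of those screened")               # 4%
print(f"Screened per recruit:  {screened / randomized:.0f}")                                 # 27
```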

So, the PhD student's heroic effort did not yield the sample size that she anticipated. But she ended up making a valuable contribution to the literature by challenging a basic assumption that was being made about cancer patients in psychotherapy research: that all or most were distressed. She also ended up producing some valuable evidence that the minority of cancer patients who report psychological distress are not necessarily interested in psychological interventions.

Fortunately, she had been prepared to collect systematic data about these research questions, not just scramble within a collapsing effort at a clinical trial.

Becoming a research parasite as an alternative to PhD students attempting an under-resourced study of their own

Psychotherapy trials represent an enormous investment of resources, not only the public funding that is often provided for them, but the time, inconvenience, and exposure to ineffective treatments experienced by the patients who participate in them. Increasingly, funding agencies require that investigators who get money to do a psychotherapy study at some point make their data available for others to use. The 14 prestigious medical journals whose editors make up the International Committee of Medical Journal Editors (ICMJE) each published earlier in 2016 a declaration that:

there is an ethical obligation to responsibly share data generated by interventional clinical trials because participants have put themselves at risk.

These statements proposed that as a condition for publishing a clinical trial, investigators would be required to share with others appropriately de-identified data not later than six months after publication. Further, the statements proposed that investigators describe their plans for sharing data in the registration of trials.

Of course, a proposal is only exactly that, a proposal, and these requirements were intended to take effect only after the document had been circulated and ratified. The incomplete and inconsistent adoption of previous proposals for registering trials in advance and for investigators declaring conflicts of interest does not encourage a lot of enthusiasm that we will see uniform implementation of this bold proposal anytime soon.

Some editors of medical journals are already expressing alarm over the prospect of data sharing becoming required. The editors of the New England Journal of Medicine were lambasted in social media for raising worries about "research parasites" exploiting the availability of data:

a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Richard Lehman's Journal Review at the BMJ's blog delivered a brilliantly sarcastic response to these concerns, which concludes:

I think we need all the data parasites we can get, as well as symbionts and all sorts of other creatures which this ill-chosen metaphor can’t encompass. What this piece really shows, in my opinion, is how far the authors are from understanding and supporting the true opportunities of clinical data sharing.

However, lost in all the outrage that The New England Journal of Medicine editorial generated was a more conciliatory proposal at the end:

How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

The PLOS family of journals has gone on record as requiring that all data for papers published in its journals be publicly available without restriction. A February 24, 2014 announcement, PLOS' New Data Policy: Public Access to Data, declared:

In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.

Many of us are aware of the difficulties in achieving this lofty goal. I am holding my breath and turning blue, waiting for some specific data.

The BMJ has expanded their previous requirements for data being available:

Loder E, Groves T. The BMJ requires data sharing on request for all trials. BMJ. 2015 May 7;350:h2373.

The movement to make data from clinical trials widely accessible has achieved enormous success, and it is now time for medical journals to play their part. From 1 July The BMJ will extend its requirements for data sharing to apply to all submitted clinical trials, not just those that test drugs or devices. The data transparency revolution is gathering pace.

I am no longer heading dissertation committees after one that I am currently supervising is completed. But if any PhD students asked my advice about a dissertation project concerning psychotherapy, I would strongly encourage them to enlist their advisor to identify and help them negotiate access to a data set appropriate to the research questions they want to investigate.

Most well-resourced psychotherapy trials have unpublished data concerning how they were implemented, with what biases, and with which patient groups ending up underrepresented or inadequately exposed to the intensity of treatment presumed to be needed for benefit. A story awaits telling. The data available from a published trial are usually much more adequate than any a graduate student could collect with the limited resources available for a dissertation project.

I look forward to the day when such data is put into a repository where anyone can access it.

In this blog post I have argued that PhD students should not take on responsibility for developing and testing a new psychotherapy for their dissertation project. I think that using data from existing published trials is a much better alternative. However, PhD students may currently find it difficult, though certainly not impossible, to get appropriate data sets. I certainly am not recruiting them to be front-line infantry in advancing the cause of routine data sharing. But they can make an effort to obtain such data, and they deserve all the support they can get from their dissertation committees in obtaining data sets and in recognizing when, realistically, data are not being made available, even when availability was promised as a condition of publication. Advisors, please request the data from published trials for your PhD students and protect them from the heartache of trying to collect such data themselves.

 

COBRA study would have shown homeopathy can be substituted for cognitive behavior therapy for depression

If The Lancet COBRA study had evaluated homeopathy rather than behavioural activation (BA), homeopathy would likely have similarly been found “non-inferior” to cognitive behavior therapy.

This is not an argument for treating depression with homeopathy, but an argument that the 14 talented authors of The Lancet COBRA study stacked the deck for their conclusion that BA could be substituted for CBT in routine care for depression without loss of effectiveness. Conflict of interest and catering to politics intruded on science in the COBRA trial.

If a study like COBRA produces phenomenally similar results with treatments based on distinct mechanisms of change, one possibility is that background nonspecific factors are dominating the results. Insert homeopathy, a bogus treatment with strong nonspecific effects, in place of BA, and non-inferiority may well be shown.
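A minimal simulation sketch of that logic is below; the numbers are hypothetical, not COBRA data, and the non-inferiority margin is illustrative. When two arms improve only through shared nonspecific factors, the difference between them hovers near zero, and the trial will readily declare the new arm non-inferior, whatever that arm contains.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 250          # per arm; illustrative
margin = 1.9     # illustrative non-inferiority margin in symptom-scale points

# Follow-up depression scores: both arms improve to the same degree because the
# improvement is driven by nonspecific factors, natural course, and regression
# to the mean, not by the specific technique on offer.
cbt   = rng.normal(10.0, 6.0, n)
other = rng.normal(10.0, 6.0, n)   # BA, homeopathy, or any credible ritual

diff = other.mean() - cbt.mean()                              # higher = worse
se = np.sqrt(other.var(ddof=1) / n + cbt.var(ddof=1) / n)
upper_95 = diff + 1.96 * se                                   # upper confidence limit

print(f"Mean difference: {diff:.2f}; upper 95% limit: {upper_95:.2f}")
print("Declared non-inferior:", upper_95 < margin)
```

The simulation says nothing about BA itself; it only illustrates that a non-inferiority comparison between two packages sharing strong nonspecific elements is easy to "pass".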

Why homeopathy?

Homeopathy involves diluting a substance so thoroughly that no molecules are likely to be present in what is administered to patients. The original substance is first diluted to one part per 100 parts alcohol or distilled water. This process is repeated six times, ending up with the original material diluted by a factor of 100⁻⁶ = 10⁻¹².

Nonetheless, a super diluted and essentially inert substance is selected and delivered within a complex ritual.  The choice of the particular substance being diluted and the extent of its dilution is determined with detailed questioning of patients about their background, life style, and personal functioning. Naïve and unskeptical patients are likely to perceive themselves as receiving exceptionally personalized medicine delivered by a sympathetic and caring provider. Homeopathy thus has potentially strong nonspecific (placebo) elements that may be lacking in the briefer and less attentive encounters of routine medical care.

As an academic editor at PLOS One, I received considerable criticism for having accepted a failed trial of homeopathy for depression. The study had been funded by the German government and had fallen miserably short in its efforts to recruit the intended sample size. I felt the study should be published in PLOS One to provide evidence bearing on whether such worthless studies should be undertaken in the future. But I also wanted readers to have the opportunity to see what I had learned from the article about just how ritualized homeopathy can be, with a strong potential for placebo effects.

Presumably, readers would then be better equipped to evaluate claims made in other contexts that homeopathy is effective, based on clinical trials with inadequate control of nonspecific effects. But that is also a pervasive problem in psychotherapy trials [1, 2] that do not have a suitable comparison/control group.

I have tried to reinforce this message in the evaluation of complementary or integrative treatments in Relaxing vs Stimulating Acupressure for Fatigue Among Breast Cancer Patients: Lessons to be Learned.

The Lancet COBRA study

The Lancet COBRA study has received extraordinary promotion as evidence for the cost-effectiveness of substituting behavioural activation therapy (BA) delivered by minimally trained professionals for cognitive behaviour therapy (CBT) for depression. The study  is serving as the basis for proposals to cut costs in the UK National Health Service by replacing more expensive clinical psychologists with less trained and experienced providers.

Coached by the Science Media Centre, the authors of The Lancet study focused our attention on their finding that BA was not inferior to CBT. They are distracting us from the more important question of whether either treatment had any advantage over nonspecific interventions in the unusual context in which they were evaluated.

The editorial accompanying the COBRA study suggests that BA involves a simple message delivered by providers with very little training:

“Life will inevitably throw obstacles at you, and you will feel down. When you do, stay active. Do not quit. I will help you get active again.”

I encourage readers to stop and think how depressed persons suffering substantial impairment, including reduced ability to experience pleasure, would respond to such suggestions. It sounds all too much like the “Snap out of it, Debbie” they may have already heard from people around them or in their own self-blame.

Snap out of it, Debbie (from South Park)

 BA by any other name…

Actually, this kind of activation is routinely provided in primary care in some countries as a first-stage treatment in a stepped care approach to depression.

In such a system, when emergent mild to moderate depressive symptoms are uncovered in a primary medical care setting, providers are encouraged neither to initiate an active treatment nor even to make a formal psychiatric diagnosis of a condition that could prove self-limiting with a brief passage of time. Rather, providers are encouraged to defer diagnosis and schedule a follow-up appointment. This is more than simple watchful waiting. Until the next appointment, providers encourage patients to undertake some guided self-help, including engagement in pleasant activities of their choice, much as was apparently done in the BA condition in the COBRA study. Increasingly, they may encourage Internet-based therapy.

In a few parts of the UK, general practitioners may refer patients to a green gym.


It is now appreciated that, to have any effectiveness, such prescriptions have to be made within a relationship of supportive accountability. For patients to adhere adequately to such prescriptions and not feel they are simply being dismissed by the provider and sent away, they need a sense that the prescription occurs within the context of a relationship with someone who cares whether they carry it out and benefit from it.

Used in this way, this BA component of stepped care could possibly be part of reducing unnecessary medication and the need for more intensive treatment. However, evaluation of cost effectiveness is complicated by the need for a support structure in which treatment can be monitored, including any antidepressant medication that is subsequently prescribed. Otherwise, the needs of a substantial number of patients needing more intensive, quality care for depression would be neglected.

The shortcomings of COBRA as an evaluation of BA in context

COBRA does not provide an evaluation of a system offering BA to the large pool of patients who do not require more intensive treatment, within a system where they would receive appropriate, timely evaluation and referral onward.

It is in the nature of mild to moderate depressive symptoms presenting in primary care, especially when patients are not specifically seeking mental health treatment, that the threshold for a formal diagnosis of major depression is often met with only the minimum five required symptoms, or just one more. Diagnoses are of necessity unreliable, in part because the judgment of whether particular symptoms meet a minimal threshold of severity is unreliable. After a brief passage of time and in the absence of formal treatment, a substantial proportion of patients will no longer meet diagnostic criteria.

COBRA also does not evaluate BA versus CBT in the more select population that participates in clinical trials of treatment for depression. Sir David Goldberg is credited  with first describing the filters that operate on the pathway of patients from presenting a complex combination of problems in living and psychiatric symptoms in primary medical care to treatment in specialty settings.

Results of the COBRA study cannot be meaningfully integrated into the existing literature concerning BA as a component of stepped care or treatment for depression that is sufficient in itself.

More recently, I reviewed The Lancet COBRA study in detail, highlighting how one of the most ambitious and heavily promoted psychotherapy studies ever conducted was nonetheless uninformative. The authors’ claim that it would be wise to substitute BA delivered by minimally trained providers for cognitive behavior therapy delivered by clinical psychologists was unwarranted.

I refer readers to that blog post for further elaboration of some points I will be making here. For instance, some readers might want to refresh their sense of how a noninferiority trial differs from a conventional comparison of two treatments.

Risk of bias in a noninferiority trial

 Published reports of clinical trials are notoriously unreliable and biased in terms of the authors’ favored conclusions.

With the typical evaluation of an active treatment versus a control condition, the risk of bias is that reported results will favor the active treatment. However, the issue of bias in a noninferiority trial is more complex. The investigators’ interest is in demonstrating that, within certain limits, there are no significant differences between two treatments. Yet, although it is not always tested directly, the intention is to show that this lack of difference is due to both treatments being effective, rather than ineffective.

In COBRA, the authors’ clear intention was to show that less expensive BA was not inferior to CBT, with the assumption that both were effective. Biases can emerge from building in features of the design, analysis, and interpretation of the study that minimize differences between these two treatments. But bias can also arise from a study design in which nonspecific effects are distributed across interventions so that any difference in active ingredients is obscured by shared features of the circumstances in which the interventions are delivered. As in Alice in Wonderland [https://en.wikipedia.org/wiki/Dodo_bird_verdict], the race is rigged so that almost everybody can get a prize.
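
For readers who want the decision rule spelled out, here is a minimal sketch, with made-up numbers rather than the COBRA data, of how a noninferiority conclusion is reached: the new treatment is declared noninferior if the upper confidence limit of the difference stays below a pre-specified margin. Note that the rule is just as easy to satisfy when both treatments are effective as when neither is.

    # Noninferiority check: compare the upper confidence limit of the BA-minus-CBT
    # difference to a pre-specified margin. All numbers below are hypothetical.
    from statistics import NormalDist

    margin = 1.9        # assumed noninferiority margin on the depression scale
    mean_diff = 0.3     # observed mean difference (BA minus CBT), hypothetical
    se_diff = 0.7       # standard error of that difference, hypothetical

    upper_95 = mean_diff + NormalDist().inv_cdf(0.975) * se_diff
    print(f"upper 95% limit = {upper_95:.2f}; noninferior: {upper_95 < margin}")
    # A difference of zero between two placebos would pass this test just as easily.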

Why COBRA could have shown almost any treatment with nonspecific effects was noninferior to CBT for depression

1. The investigators chose a population and a recruitment strategy that increased the likelihood that patients participating in the trial would get better with the minimal support and contact available in either of the two conditions – BA or CBT.

The recruited patients were not actively seeking treatment. They were identified from GP records as having had a diagnosis of depression, but were required not to be currently in psychotherapy.

GP recording of a diagnosis of depression has poor concordance with a formal, structured interview-based diagnosis, with considerable overdiagnosis and overtreatment.

A recent Dutch study found that persons meeting interview-based criteria for major depression in the community who do not have a past history of treatment mostly are not found to be depressed upon re-interview.

To be eligible for participation in the study, the patients also had to meet criteria for major depression in a semi-structured research interview (the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition [SCID]). Diagnoses with the SCID obtained under these circumstances are also likely to include a considerable proportion of false positives.

A dirty secret from someone who has supervised thousands of SCID interviews of medical patients. The developers of the SCID recognized that it yielded a lot of false positives and inflated rates of disorder among patients who are not seeking mental health care.

They attempted to compensate by requiring that respondents not only endorse symptoms, but also indicate that the symptoms are a source of impairment. This is the so-called clinical significance criterion. Respondents automatically meet the criterion if they are seeking mental health treatment. Those who are not seeking treatment are asked directly whether the symptoms impair them. This is a particularly poorly validated aspect of the SCID, and such patients typically do not endorse their symptoms as a source of impairment.

When we asked breast cancer patients who otherwise met criteria for depression with the SCID whether the depressive symptoms impaired them, they uniformly said something like ‘No, my cancer impairs me.’ When we conducted a systematic study of the clinical significance criterion, we found that whether or not it was endorsed substantially affected both individual diagnoses and overall rates of diagnosis. Robert Spitzer, who developed the SCID interview along with his wife Janet Williams, conceded to me in a symposium that application of the clinical significance criterion was a failure.

What is the relevance in a discussion of the COBRA study? I would wager that the authors, like most investigators who use the SCID, did not inquire about the clinical significance criterion, and as a result they had a lot of false positives.

The population being sampled in the recruitment strategy used in COBRA is likely to yield a sample unrepresentative of patients participating in the usual trials of psychotherapy and medication for depression.

2. Most patients participating in COBRA reported already receiving antidepressants at baseline, but adherence and follow-up are unknown and likely to have been inadequate.

Notoriously, patients receiving a prescription for an antidepressant in primary care actually take the medication inconsistently and for only a short time, if at all. They receive inadequate follow-up and reassessment. Their depression outcomes may actually be poorer than for patients receiving a pill placebo in the context of a clinical trial, where there is blinding and a high degree of positive expectations, attention and support.

Studies, including one by an author of the COBRA study, suggest that augmenting adequately managed antidepressant treatment with psychotherapy is unlikely to improve outcomes.

We are stumbling upon one of the messier features of COBRA. Most patients had already been prescribed medication at baseline, but their adherence and follow-up are left unreported and are likely to have been poor. The prescription is likely to have been made up to two years before baseline.

It would not be cost-effective to introduce psychotherapy to such a sample without reassessing whether they were adequately receiving medication. Such a sample would also be highly susceptible to nonspecific interventions providing positive expectations, support, and attention that they are not receiving in their antidepressant treatment. There are multiple ways in which nonspecific effects could improve outcomes – perhaps by improving adherence, but perhaps because of the healing effects of support on mild depressive symptoms.

3. The COBRA authors’ way of dealing with co-treatment with antidepressants blocked readers’ ability to independently evaluate main effects and interactions with BA versus CBT.

The authors used antidepressant treatment as a stratification factor, ensuring that the 70% of patients receiving antidepressants were evenly distributed across the BA and CBT conditions. This strategy made it more difficult to separate the effects of antidepressants. The problem is compounded by the authors’ failure to provide subgroup analyses based on whether patients had received an antidepressant prescription, as well as their failure to describe the extent to which patients received management of their antidepressants at baseline or during active psychotherapy and follow-up. The authors incorporated data concerning the cost of medication into their economic analyses, but did not report the data in a way that could be scrutinized.

I anticipate requesting these data from the authors to find out more, although they have not responded to my previous query concerning anomalies in the reporting of how long since patients had first received a prescription for antidepressants.

4. The 12-month assessment designated as the primary outcome capitalized on natural recovery patterns, unreliability of initial diagnosis, and simple regression to the mean.

Depression identified in the community and in primary care patient populations is variable in its course, but typically resolves within nine months. Making reassessment of primary outcomes at 12 months increases the likelihood that effects of active ingredients of the two treatments would be lost in a natural recovery process.

5. The intensity of treatment offered in the study (an allowable 20 sessions plus four additional sessions) exceeded what is available in typical psychotherapy trials and exceeded what was actually accessed by patients.

Allowing this level of intensity of treatment generates a lot of noise in any interpretation of the resulting data. Offering so much treatment encourages patients to drop out, with the loss of their follow-up data. We cannot tell whether they dropped out because they had received what they perceived as sufficient treatment or because they were dissatisfied. This intensity of offered treatment reduces generalizability to what actually occurs in routine care and complicates comparing and contrasting the results of the COBRA study with the existing literature.

 6. The low rate of actual uptake of psychotherapy and retention of patients for follow-up present serious problems for interpreting the results of the COBRA study.

Intent-to-treat analyses with imputation of missing data are simply voodoo statistics with so much missing data. Imputation and other multivariate techniques assume that data are missing at random, but as I just noted, this is an improbable assumption. [I refer readers who want to learn more about intent-to-treat versus per-protocol analyses back to my previous blog post.]

The authors cite past literature in their choice to emphasize the per-protocol analyses. That means that they based their interpretation of the results on 135 of the 221 patients originally assigned to BA and 151 of the 219 patients originally assigned to CBT. This is a messy approach and precludes generalizing back to the original assignment. That is why intent-to-treat analyses are emphasized in conventional evaluations of psychotherapy.
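
As a purely hypothetical illustration of why this matters, the sketch below simulates dropout that depends on how patients are doing; the completers-only (per-protocol) average then looks better than the average for everyone randomized:

    # Hypothetical simulation: when dropout is related to outcome, a per-protocol
    # (completers-only) summary flatters the treatment relative to all randomized patients.
    import random
    random.seed(1)

    n = 220
    outcomes = [random.gauss(10, 4) for _ in range(n)]   # follow-up symptom scores (higher = worse)

    # Patients doing poorly are assumed more likely to drop out before follow-up.
    completers = [y for y in outcomes if random.random() < (0.9 if y < 10 else 0.5)]

    mean_all = sum(outcomes) / len(outcomes)
    mean_completers = sum(completers) / len(completers)
    print(f"all randomized: {mean_all:.1f}   completers only: {mean_completers:.1f}")
    # Imputation can only undo this distortion if the missing-at-random assumption holds.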

A skeptical view of what will be done with the COBRA data

The authors’ clear intent was to produce data supporting an argument that more expensive clinical psychologists could be replaced by less trained clinicians providing a simplified treatment. The striking lack of differences between BA and CBT might be seen as strong evidence that BA could replace CBT. Yet, I am suggesting that the striking lack of differences could also reflect features built into the design that swamped any differences and limited any generalizability to what would happen if all depressed patients were referred to BA delivered by clinicians with little training versus CBT. I am arguing that homeopathy would have done as well.

BA is already being implemented in the UK and elsewhere as part of stepped care initiatives for depression. Inclusion of BA is inadequately evaluated, as is the overall strategy of stepped care. See here for an excellent review of stepped care initiatives and a tentative conclusion that they are moderately effective, but that many questions remain.

If the COBRA authors were most committed to improving the quality of depression care in the UK, they would’ve either designed their study as a fairer test of substituting BA for CBT or they would have tackled the more urgent task of evaluating rigorously whether stepped care initiatives work.

Years ago, collaborative care programs for depression were touted as reducing overall costs. These programs, which were found to be robustly effective in many contexts, involved placing depression managers in primary care to assist the GPs in improved monitoring and management of treatment. Often the most immediate and effective improvement was that patients got adequate follow-up, where previously they were simply being ignored. Collaborative care programs did not prove to be cheaper, and not surprisingly, because better care is often more expensive than ineptly provided inadequate care.

We should be extremely skeptical of experienced investigators who claim to demonstrate that they can cut costs and maintain quality through a wholesale reduction in the level of training of providers treating depression, a complex and heterogeneous disorder, especially when their expensive study fails to deal with this complexity and heterogeneity.

 

Relaxing vs Stimulating Acupressure for Fatigue Among Breast Cancer Patients: Lessons to be Learned

  • A chance to test your rules of thumb for quickly evaluating clinical trials of alternative or integrative  medicine in prestigious journals.
  • A chance to increase your understanding of the importance of  well-defined control groups and blinding in evaluating the risk of bias of clinical trials.
  • A chance to understand the difference between merely evidence-based treatments versus science-based treatments.
  • Lessons learned can be readily applied to many wasteful evaluations of psychotherapy with shared characteristics.

A press release from the University of Michigan about a study of acupressure for fatigue in cancer patients was churnaled  – echoed – throughout the media. It was reproduced dozens of times, with little more than an editor’s title change from one report to the next.

Fortunately, the article that inspired all the fuss was freely available from the prestigious JAMA: Oncology. But when I gained access, I quickly saw that it was not worth my attention, based on what I already knew or, as I often say, my prior probabilities. Rules of thumb is a good enough term.

So the article became another occasion for us to practice our critical appraisal skills, including, importantly, being able to make reliable and valid judgments that some attention in the media is worth dismissing out of hand, even when tied to an article in a prestigious medical journal.

The press release is here: Acupressure reduced fatigue in breast cancer survivors: Relaxing acupressure improved sleep, quality of life.

A sampling of the coverage:

sample coverage

As we’ve come to expect, the UK Daily Mail editor added its own bit of spin:

daily mail

Here is the article:

Zick SM, Sen A, Wyatt GK, Murphy SL, Arnedt J, Harris RE. Investigation of 2 Types of Self-administered Acupressure for Persistent Cancer-Related Fatigue in Breast Cancer Survivors: A Randomized Clinical Trial. JAMA Oncol. Published online July 07, 2016. doi:10.1001/jamaoncol.2016.1867.

Here is the Trial registration:

All I needed to know was contained in a succinct summary at the Journal website:

key points

This is a randomized clinical trial (RCT) in which two active treatments that

  • Lacked credible scientific mechanisms
  • Were predictably shown to be better than
  • A routine care condition that lacked positive expectations and support.
  • A primary outcome assessed by subjective self-report amplified the illusory effectiveness of the treatments.

But wait!

The original research appeared in a prestigious peer-reviewed journal published by the American Medical Association, not a  disreputable journal on Beall’s List of Predatory Publishers.

Maybe  this means publication in a peer-reviewed prestigious journal is insufficient to erase our doubts about the validity of claims.

The original research was performed with a $2.65 million peer-reviewed grant from the National Cancer Institute.

Maybe NIH is wasting scarce money on useless research.

What is acupressure?

 According to the article

Acupressure, a method derived from traditional Chinese medicine (TCM), is a treatment in which pressure is applied with fingers, thumbs, or a device to acupoints on the body. Acupressure has shown promise for treating fatigue in patients with cancer,23 and in a study24 of 43 cancer survivors with persistent fatigue, our group found that acupressure decreased fatigue by approximately 45% to 70%. Furthermore, acupressure points termed relaxing (for their use in TCM to treat insomnia) were significantly better at improving fatigue than another distinct set of acupressure points termed stimulating (used in TCM to increase energy).24 Despite such promise, only 5 small studies24– 28 have examined the effect of acupressure for cancer fatigue.

You can learn more about acupressure here. It is a derivative of acupuncture that does not involve needles, but uses the same acupuncture pressure points, or acupoints, as acupuncture.

Don’t be fooled by references to traditional Chinese medicine (TCM) as a basis for claiming a scientific mechanism.

See Chairman Mao Invented Traditional Chinese Medicine.

Chairman Mao is quoted as saying “Even though I believe we should promote Chinese medicine, I personally do not believe in it. I don’t take Chinese medicine.”

 

Alan Levinovitz, author of the Slate article, further argues:

 

In truth, skepticism, empiricism, and logic are not uniquely Western, and we should feel free to apply them to Chinese medicine.

After all, that’s what Wang Qingren did during the Qing Dynasty when he wrote Correcting the Errors of Medical Literature. Wang’s work on the book began in 1797, when an epidemic broke out in his town and killed hundreds of children. The children were buried in shallow graves in a public cemetery, allowing stray dogs to dig them up and devour them, a custom thought to protect the next child in the family from premature death. On daily walks past the graveyard, Wang systematically studied the anatomy of the children’s corpses, discovering significant differences between what he saw and the content of Chinese classics.

And nearly 2,000 years ago, the philosopher Wang Chong mounted a devastating (and hilarious) critique of yin-yang five phases theory: “The horse is connected with wu (fire), the rat with zi (water). If water really conquers fire, [it would be much more convincing if] rats normally attacked horses and drove them away. Then the cock is connected with ya (metal) and the hare with mao (wood). If metal really conquers wood, why do cocks not devour hares?” (The translation of Wang Chong and the account of Wang Qingren come from Paul Unschuld’s Medicine in China: A History of Ideas.)

Trial design

A 10-week randomized, single-blind trial comparing self-administered relaxing acupressure with stimulating acupressure once daily for 6 weeks vs usual care with a 4-week follow-up was conducted. There were 5 research visits: at screening, baseline, 3 weeks, 6 weeks (end of treatment), and 10 weeks (end of washout phase). The Pittsburgh Sleep Quality Index (PSQI) and Long-Term Quality of Life Instrument (LTQL) were administered at baseline and weeks 6 and 10. The Brief Fatigue Inventory (BFI) score was collected at baseline and weeks 1 through 10.

Note that the trial was “single-blind.” It compared two forms of acupressure, relaxing versus stimulating. Only the patient was blinded to which of these two treatments was being provided, except patients clearly knew whether or not they were randomized to usual care. The providers were not blinded and were carefully supervised by the investigators and provided feedback on their performance.

The combination of providers not being blinded, patients knowing whether they were randomized to routine care, and subjective self-report outcomes together are the makings of a highly biased trial.

Interventions

Usual care was defined as any treatment women were receiving from health care professionals for fatigue. At baseline, women were taught to self-administer acupressure by a trained acupressure educator.29 The 13 acupressure educators were taught by one of the study’s principal investigators (R.E.H.), an acupuncturist with National Certification Commission for Acupuncture and Oriental Medicine training. This training included a 30-minute session in which educators were taught point location, stimulation techniques, and pressure intensity.

Relaxing acupressure points consisted of yin tang, anmian, heart 7, spleen 6, and liver 3. Four acupoints were performed bilaterally, with yin tang done centrally. Stimulating acupressure points consisted of du 20, conception vessel 6, large intestine 4, stomach 36, spleen 6, and kidney 3. Points were administered bilaterally except for du 20 and conception vessel 6, which were done centrally (eFigure in Supplement 2). Women were told to perform acupressure once per day and to stimulate each point in a circular motion for 3 minutes.

Note that the control/comparison condition was an ill-defined usual care in which it is not clear that patients received any attention and support for their fatigue. As I have discussed before, we need to ask just what was being controlled by this condition. There is no evidence presented that patients had similar positive expectations and felt similar support in this condition to what was provided in the two active treatment conditions. There is no evidence of equivalence of time with a provider devoted exclusively to the patients’ fatigue. Unlike patients assigned to usual care, patients assigned to one of the acupressure conditions received a ritual delivered with enthusiasm by a supervised educator.

Note the absurdity of the  naming of the acupressure points,  for which the authority of traditional Chinese medicine is invoked, not evidence. This absurdity is reinforced by a look at a diagram of acupressure points provided as a supplement to the article.

relaxing acupressure points
stimulating acupressure points

 

Among the many problems with “acupuncture pressure points” is that sham stimulation generally works as well as actual stimulation, especially when the sham is delivered with appropriate blinding of both providers and patients. Another is that targeting places of the body that are not defined as acupuncture pressure points can produce the same results. For more elaborate discussion see Can we finally just say that acupuncture is nothing more than an elaborate placebo?

 Worth looking back at credible placebo versus weak control condition

In a recent blog post I discussed an unusual study in the New England Journal of Medicine that compared an established active treatment for asthma to two credible control conditions: one, an inert spray indistinguishable from the active treatment, and the other, acupuncture. Additionally, the study involved a no-treatment control. For subjective self-report outcomes, the active treatment, the inert spray, and acupuncture were indistinguishable, but all were superior to the no-treatment control condition. However, for the objective outcome measure, the active treatment was more effective than all three comparison conditions. The message is that credible placebo control conditions are superior to control conditions lacking positive expectations, including no treatment and, I would argue, ill-defined usual care that lacks positive expectations. A further message is: beware of relying on subjective self-report measures to distinguish between active treatments and placebo control conditions.

Results

At week 6, the change in BFI score from baseline was significantly greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.6 [1.5] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.1 [1.6] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P  = .29). At week 10, the change in BFI score from baseline was greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.3 [1.4] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.0 [1.5] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P > .99) (Figure 2). The mean percentage fatigue reductions at 6 weeks were 34%, 27%, and −1% in relaxing acupressure, stimulating acupressure, and usual care, respectively.

These are entirely expectable results. Nothing new was learned in this study.

The bottom line for this study is that there was absolutely nothing to be gained by comparing one inert placebo condition to another inert placebo condition and to an uninformative condition lacking clear evidence that it controlled for nonspecific factors – positive expectations, support, and attention. This was a waste of patient time and effort, as well as government funds, and produced results that are potentially misleading to patients. Namely, the results are likely to be misinterpreted as showing that acupressure is an effective, evidence-based treatment for cancer-related fatigue.

How the authors explained their results

Why might both acupressure arms significantly improve fatigue? In our group’s previous work, we had seen that cancer fatigue may arise through multiple distinct mechanisms.15 Similarly, it is also known in the acupuncture literature that true and sham acupuncture can improve symptoms equally, but they appear to work via different mechanisms.40 Therefore, relaxing acupressure and stimulating acupressure could elicit improvements in symptoms through distinct mechanisms, including both specific and nonspecific effects. These results are also consistent with TCM theory for these 2 acupoint formulas, whereby the relaxing acupressure acupoints were selected to treat insomnia by providing more restorative sleep and improving fatigue and the stimulating acupressure acupoints were chosen to improve daytime activity levels by targeting alertness.

How could acupressure lead to improvements in fatigue? The etiology of persistent fatigue in cancer survivors is related to elevations in brain glutamate levels, as well as total creatine levels in the insula.15 Studies in acupuncture research have demonstrated that brain physiology,41 chemistry,42 and function43 can also be altered with acupoint stimulation. We posit that self-administered acupressure may have similar effects.

Among the fallacies of the authors’ explanation is the key assumption that they are dealing with a specific, active treatment effect rather than a nonspecific placebo intervention. Supposed differences between relaxing versus stimulating acupressure arise in trials with a high risk of bias due to unblinded providers of treatment and inadequate control/comparison conditions. ‘There is no there there’ to be explained, to paraphrase a quote attributed to Gertrude Stein.

How much did this project cost?

According to the NIH Research Portfolio Online Reporting Tools website, this five-year project involved support by the federal government of $2,265,212 in direct and indirect costs. The NCI program officer for investigator-initiated R01CA151445 is Ann O’Mara, who serves in a similar role for a number of integrative medicine projects.

How can expenditure of this money be justified for determining whether so-called stimulating acupressure is better than relaxing acupressure for cancer-related fatigue?

 Consider what could otherwise have been done with these monies.

Evidence-based versus science-based medicine

Proponents of unproven “integrative cancer treatments” can claim on the basis of this study that acupressure is an evidence-based treatment. Future Cochrane Collaboration reviews may even cite this study as evidence for this conclusion.

I normally label myself as an evidence-based skeptic. I require evidence for claims of the efficacy of treatments and am skeptical of the quality of the evidence that is typically provided, especially when it comes from enthusiasts of particular treatments. However, in other contexts, I describe myself as a science-based medicine skeptic. The stricter criterion for this term is that not only do I require evidence of efficacy for treatments, I also require evidence for the plausibility of the claimed mechanism. Acupressure might be defined by some as an evidence-based treatment, but it is certainly not a science-based treatment.

For further discussion of this important distinction, see Why “Science”-Based Instead of “Evidence”-Based?

Broader relevance to psychotherapy research

The efficacy of psychotherapy is often overestimated because of overreliance on RCTs that involve inadequate comparison/control groups. Adequately powered studies of the comparative efficacy of psychotherapy that include active comparison/control groups are infrequent and uniformly provide lower estimates of just how efficacious psychotherapy is. Most psychotherapy research includes subjective patient self-report measures as the primary outcomes, although some RCTs provide independent, blinded interview measures. A dependence on subjective patient self-report measures amplifies the bias associated with inadequate comparison/control groups.

I have raised these issues with respect to mindfulness-based stress reduction (MBSR) for physical health problems and for prevention of relapse and recurrence in patients being tapered from antidepressants.

However, there is a broader relevance to trials of psychotherapy provided to medically ill patients with a comparison/control condition that is inadequate in terms of positive expectations and support, along with a reliance on subjective patient self-report outcomes. The relevance is particularly important to note for conditions in which objective measures are appropriate, but not obtained, or obtained but suppressed in reports of the trial in the literature.

Getting realistic about changing the direction of suicide prevention research

A recent JAMA: Psychiatry article makes some important points about the difficulties addressing suicide as a public health problem before sliding into the authors’ promotion of their personal agendas.

Christensen H, Cuijpers P, Reynolds CF. Changing the Direction of Suicide Prevention Research: A Necessity for True Population Impact. JAMA Psychiatry. 2016.

This issue of Mind the Brain:

  • Reviews important barriers to effective approaches to reducing suicide, as cited in the editorial.
  • Discusses editorials in general as a form of privileged access publishing by which non-peer-reviewed material makes its way into ostensibly peer reviewed journals.
  • Identifies the self-promotional and personal agendas of the authors reflected in the editorial.
  • Notes that the leading means of death by suicide in the United States is not even mentioned, much less addressed in this editorial. I’ll discuss the politics behind this and why its absence reduces this editorial to a venture in triviality, except that it is a call for the waste of millions of dollars.

Barriers to reducing mortality by suicide

Prevention of death by suicide is an important public health and clinical goal because of suicide’s contribution to overall mortality, the seeming senselessness of suicide, and its costs at a personal and social level. Yet, as a relatively infrequent event, death by suicide resists prediction and effective preventive intervention.

Evidence concerning the formidable barriers to reducing death by suicide inevitably clashes with the strong emotional appeals and political agendas of those demanding suicide intervention programs.

Skeptics encounter stiff resistance and even vilification when they insist that clinical and social policy concerning suicide should be based on evidence.

A skeptic soon finds that trying to contest emotional and political appeals quickly becomes like trying to counter Ted Cruz or Donald Trump with evidence contradicting their proposals for dealing with terrorism or immigration. This is particularly likely after suicides by celebrities or a cluster of suicides by teenagers in a community. Who wants to pay attention to evidence when emotions are high and tears are flowing?

See my recent blog post, Preventing Suicide in All the Wrong Ways for some inconvenient truths about suicide and suicide prevention.

The JAMA: Psychiatry article’s identification of barriers

The JAMA: Psychiatry article identifies some key barriers to progress in reducing deaths due to suicide [bullet points added to direct quotes]:

  • Suicide rates in most Western countries have not decreased in the last decade, a finding that compares unfavorably with the progress made in other areas, such as breast and skin cancers, human immunodeficiency virus, and automobile accidents, for which the rates have decreased by 40% to 80%.
  • Preventing suicide is not easy. The base rate of suicide is low, making it hard to determine which individuals are at risk.
  • Our current approach to the epidemiologic risk factors has failed because prediction studies have no clinical utility—even the highest odds ratio is not informative at the individual level.
  • Decades of research on predicting suicides failed to identify any new predictors, despite the large numbers of studies.
  • A previous suicide attempt is our best marker of a future attempt, but 60% of suicides are by persons who had made no previous attempts.
  • Although recent studies in cognitive neuroscience have shed light on the cognitive “lesions” that underlie suicide risk, especially deficits in executive functioning, we have no biological markers of suicide risk, or indeed of any mental illness.
  • People at risk of suicide do not seek help. Eighty percent of people at risk have been in contact with health services prior to their attempts, but they do not identify themselves, largely because they do not think that they need help.
  • As clinicians, we know something about the long-term risk factors for suicide, but we are much less able to disambiguate short-term risk or high-risk factors from the background of long-term risk factors.

How do editorials come about? Not peer review!

Among the many privileges of being the editor-in-chief or an associate editor of a journal is the opportunity to commission articles that do not undergo peer review. Editors and their friends are among the regular recipients of these gifts, which largely escape scrutiny.

Editorials often provide a free opportunity for self-citation and the promotion of agendas. Over the years, I have noticed that editorials are frequently used to increase the likelihood that particular research topics will become a priority for funding or that particular ideas will be given an advantage in competition for funding.

Editorials offer great opportunities for self-citation. If an editorial in a prestigious journal cites articles published in less prestigious places, readers will often cite those articles without bothering to examine the original source. This is a way of lending false authority to poor-quality or irrelevant evidence.

Not only do authors of commissioned articles get to say what they wish without peer review, they can also restrict what can be said in reply. Journals are less willing to publish letters to the editor about editorials than about empirical papers. They often give the writers of the editorial veto power over what criticism is published, and they always give the writers of the editorial the last word in any exchange.

So, editorials and commentaries can be free sweet plums if you know how to use them strategically.

The authors

Helen Christensen, PhD Black Dog Institute, University of New South Wales, Randwick, New South Wales, Australia.

Pim Cuijpers, PhD Department of Clinical, Neuro, and Developmental Psychology, Vrije Universiteit Amsterdam, the Netherlands

Charles F. Reynolds III, MD Department of Psychiatry and Neurology, Western Psychiatric Institute and Clinic, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania.

The authors’ agendas

Helen Christensen

Helen Christensen is the Chief Scientist and Director of the Black Dog Institute, which is described at its website:

Our unique approach incorporates clinical services with our cutting-edge research, our health professional training and community education programs. We combine expertise in clinical management with innovative research to develop new, and more effective, strategies for people living with mental illness. We also place emphasis on teaching people to recognise the symptom of poor mental health in themselves and others, as well as providing them with the right psychological tools to hold the black dog at bay.

A key passage in the JAMA: Psychiatry editorial references her work.

Modeling studies have shown that if all evidence-based suicide prevention strategies were integrated into 1 multifaceted systems approach, about 20% to 25% of all suicides might be prevented.

Here is the figure from the editorial:

suicide prevention strategies

The paper that is cited  would be better characterized as an advocacy piece, rather than a balanced systematic review.

Most fundamentally, Christensen makes the mistake of summing attributable risks to obtain a grand total of what would be accomplished if all of a set of risk factors were addressed.

The problem is that attributable risks are dubious estimates derived from correlational analyses, which assume that the entire correlation represents modifiable risk. Such estimates ignore confounding. If one adds together attributable risks calculated in this manner, one gets a grossly inflated view of how much a phenomenon can be controlled. The risk factors are themselves correlated and share common confounds. That is why it is bad science to combine them.
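
A toy example with invented numbers, not Christensen’s figures, shows the problem. Two perfectly correlated risk markers each appear to account for a third of cases, so summing them suggests two thirds of cases are preventable, even though removing the one genuinely causal factor could prevent at most a third:

    # Toy example: summing attributable fractions for correlated, confounded risk
    # factors overstates what prevention could achieve. Numbers are invented.
    p_exposed = 0.5          # half the population carries both markers A and B (perfectly correlated)
    risk_unexposed = 0.01    # outcome risk with neither marker
    risk_exposed = 0.02      # outcome risk with both markers (only A is actually causal)

    p_outcome = p_exposed * risk_exposed + (1 - p_exposed) * risk_unexposed

    # Naive attributable fraction for each marker from its observed association:
    # (overall risk - risk in the unexposed) / overall risk
    paf_a = (p_outcome - risk_unexposed) / p_outcome
    paf_b = (p_outcome - risk_unexposed) / p_outcome     # identical, because B simply tracks A

    print(f"A: {paf_a:.0%}  B: {paf_b:.0%}  naive sum: {paf_a + paf_b:.0%}")
    # -> 33% each, summing to 67%, yet removing A (the only causal factor)
    #    could prevent at most 33% of cases.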

Christensen identifies the top three modifiable risk factors for suicide as (1) training general practitioners in the detection and treatment of suicidal risk, and notably depression; (2) training gatekeepers such as school personnel, police (and in some contexts, clergy) who might have contact with persons on the verge of dying by suicide; and (3) psychosocial treatments, namely psychotherapy.

Training of general practitioners and gatekeepers has not been shown to be an effective way of reducing rates of suicide. #Evidenceplease. I have been an external scientific advisor to over a decade of programs in Europe that emphasized these strategies. We will soon be publishing the last of our disappointing results.

Think of it: in order to be effective in averting death by suicide, training of police requires that police be on the scene in circumstances where they could use that training to prevent someone from dying by suicide, say, by jumping from a bridge or by a self-inflicted gunshot wound. The likelihood is low of a police officer with sufficient training being in the right place at the right time, with sufficient time and control of the situation to prevent a death. A police officer who had received training would likely encounter only a few such situations, if any, in an entire career.

The problem of death by suicide being an infrequent event that is poorly predicted again rears its ugly head.

Christensen also makes the dubious assumption that more readily available psychotherapy will substantially reduce the risk of suicide. The problem is that persons who die by suicide are often in contact with professionals, but they either break the contact shortly before death or never disclose their intentions.

Christensen provides a sizable estimate for the reduction in risk of suicide from means restriction. Yet I suspect that she underestimates the influence of this potentially modifiable factor.

She focuses on restricting access to prescription medications used in suicides by overdose. I don’t know whether the death-by-overdose data hold even for Australia, but the relevant means needing restriction in the United States is access to firearms. I will say more about that later.

So, Christensen makes use of the editorial to sell her pet ideas, and her institute markets training.

Pim Cuijpers

Pim Cuijpers doesn’t cite himself and doesn’t need to. He is rapidly accumulating a phenomenal record of publications and citations. But he is an advocate for large-scale programs incorporating technology, and notably the Internet, to reduce suicide. His interests are reflected in passages like:

Large-scale trials are also needed. Even if we did all of these things, large-scale research programs with millions of people are required, and technology by itself will not be enough. Although new large trials show that the effects of community programs can be effective,1,6 studies need to be bigger, combining all evidence-based medical and community strategies, using technology effectively to reduce costs of identification and treatment.

And

Help-seeking may well be assisted by using social media. Online social networks such as Facebook can be used to provide peer support and to change community attitudes in the ways already used by marketing industries. We can use the networks of “influencers” to modify attitudes and behavior in specific high-risk groups, such as the military, where suicide rates are high, or “captive audiences” in schools.

Disseminating effective programs is no longer difficult using online mental health programs. Although some early suicide apps and websites have been tested, better online interventions are needed that can respond to temporal fluctuations in suicide risk. The power of short-term prediction tools should be combined with the timely delivery of unobtrusive online or app personalized programs. However, if these development are not supported by government or industry and implemented at a population level, they will remain missed opportunities.

suicide is preventable
100% PREVENTABLE BY WHOM?

Pim Cuijpers is based in the Netherlands and is writing at a time when the European Research Council’s enthusiasm for funding large-scale suicide prevention programs is waning, especially for expensive ones requiring millions of participants. Such studies have been going on for over a decade and the yield is not impressive.

The projects on which I consulted adopted the reasonable assumption that, because suicide is a rare event, a population of 500,000 would not be sufficient to detect a statistically significant reduction in suicide rates of less than 30%. Considering all the extraneous events that can impinge on comparisons between intervention and control sites during the period in which the intervention could conceivably be influential, even this is too low an estimate of the sample that would be needed.

The larger the sample, the greater the likelihood of extraneous influences, the greater the likelihood that the intervention wouldn’t prove effective at key moments when it was needed to avert a death by suicide, and the greater the cost. See more about this here.
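
A rough power calculation makes the point. The sketch below uses illustrative figures (a baseline rate of about 12 suicides per 100,000 person-years, 250,000 people per arm, and two years of observation), not numbers from the projects I advised; even to detect a 30% reduction, such a trial would be badly underpowered.

    # Rough power sketch for detecting a 30% reduction in suicides with 500,000 people
    # split between intervention and control regions. All inputs are illustrative.
    from statistics import NormalDist
    from math import sqrt

    rate = 12 / 100_000      # assumed baseline suicide rate per person-year
    n_per_arm = 250_000      # people per arm
    years = 2                # observation period
    reduction = 0.30         # hoped-for effect of the intervention

    expected_control = n_per_arm * rate * years
    expected_intervention = expected_control * (1 - reduction)

    # Normal approximation to the difference of two Poisson counts
    se = sqrt(expected_control + expected_intervention)
    z = (expected_control - expected_intervention) / se
    power = 1 - NormalDist().cdf(1.96 - z)    # two-sided alpha = .05

    print(f"expected deaths: {expected_control:.0f} vs {expected_intervention:.0f}")
    print(f"approximate power: {power:.0%}")  # roughly 40-45%, well below the conventional 80%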

Pim Cuijpers has been quite influential in developing and evaluating web-based and app-based interventions. But after initial enthusiasm, the field is learning that such resources are not effective if left unattended, without users being given a sense that they are in some sort of human relationship within which their consistent use of the technology is monitored and appreciated, as seen in appropriate feedback. Pim Cuijpers has contributed the valuable concept of supportive accountability. I have borrowed it to explain what is missing when primary care physicians simply give depressed patients a password to an Internet program and leave it at that, expecting that they will get any benefit.

Evaluations of such technology have been limited to whether it reduces depressive symptoms. There is as much of a leap from evidence of such reductions, when they occur, to claims about preventing suicide as there is from evidence that psychotherapy reduces depressive symptoms to a case that psychotherapy prevents suicide.

Enlisting users of Facebook to monitor and report expressions of suicidality is not evidence based. It is regarded by some as a disaster, and a consumer group is circulating a petition demanding that such practices stop. A critical incident was a man getting arrested over a fake suicide message.

Charles F. Reynolds

Charles Reynolds does not discuss his own paper in the text of the editorial, but he nonetheless cites it.

I have critiqued the study elsewhere. It was funded in a special review only because of political pressure from Senator Harry Reid. The senator’s father had died by suicide shortly after a visit to a primary care physician. Harry Reid required that Congress fund a study showing that improving the detection and treatment of suicidality in the elderly by primary care physicians would reduce suicide.

I was called by an NIMH program officer when I failed to submit a letter of intent to apply for that initiative. I told her the initiative was a boondoggle because no one could show a reduction in suicides by targeting physician behavior. She didn’t disagree, but said a project would have to be funded. She ended up a co-author on the PROSPECT paper. You don’t often see program officers getting authorship on papers from projects they fund.

The resulting PROSPECT study involved 20 primary care practices in three regions of the Northeastern United States. In the course of the intervention study, one patient in the intervention group died by suicide, and two patients, one in each of the intervention and control groups, made serious attempts. A multimillion-dollar study confronted the low incidence of suicide, even among the elderly. Furthermore, the substantial baseline differences among the practices dwarfed any differences in suicidal ideation between the intervention and control groups. And as I have discussed elsewhere [ ], suicidal ideation is a surrogate end point that can be changed by factors that do not alter risk for suicide. No one advocating more money for these kinds of studies would want to get into the details of this one.

 

So, the editorial acknowledges the difficulties of studying and preventing suicide as a public health issue. It suggests that an unprecedentedly large study costing millions of dollars would be necessary if progress is to be made. Yet there are formidable barriers to implementing an intervention of the complexity the editorial suggests is necessary in a large population. Just look at the problems that PROSPECT encountered.

Who will set the direction of suicide prevention research?

The editorial opens with a citation of a blog post by the then-Director of NIMH:

Insel T. Director’s Blog: Targeting suicide. National Institutes of Health website. Posted April 2, 2015.

The blog calls for a large increase in funding for the research concerning suicide and its prevention. The definition of the problem is shaped by politics more than evidence. But at least the blog post is more candid than the editorial in making a passing reference to the leading means of suicide in the United States, firearms.

51 percent of suicide deaths in the U.S. were by firearms. Research has already demonstrated that reducing access to lethal means (including gun locks and barriers on bridges) can reduce death rates.

Great, but surely death by firearms deserves more mention than a passing reference to locks on guns if the Director of NIMH is serious about asking Congress for a massive increase in funding for suicide research. Or is he being smart in avoiding the issue, and even brave in the passing reference he does make to firearms?

Firearms deserve not only mention, but thoughtful analysis. But in the United States, it is politically dangerous and could threaten future funding. So we talk about other things.

Banning research on the role of firearms in suicide

For a source that is much more honest, evidence-based, and well argued than this JAMA: Psychiatry editorial, I recommend A Psychiatrist Debunks the Biggest Myths Surrounding Gun Suicides.

In 1996, Congress imposed a ban on research concerning the effects of gun ownership on public health, including suicide.

In the spring of 1996, the National Rifle Association and its allies set their sights on the Centers for Disease Control and Prevention for funding increasingly assertive studies on firearms ownership and the effects on public health. The gun rights advocates claimed the research veered toward advocacy and covered such logical ground as to be effectively useless.

At first, the House tried to close down the CDC’s entire, $46 million National Center for Injury Prevention. When that failed, [Congressman Jay Dickey, for whom the Dickey amendment is named] Dickey stepped in with an alternative: strip $2.6 million that the agency had spent on gun studies that year. The money would eventually be re-appropriated for studies unrelated to guns. But the far more damaging inclusion was language that stated, “None of the funds made available for injury prevention and control at the Centers for Disease Control and Prevention may be used to advocate or promote gun control.”

Dickey proclaimed victory — an end, he said at the time, to the CDC’s attempts “to raise emotional sympathy” around gun violence. But the agency spent the subsequent years petrified of doing any research on gun violence, making the costs of the amendment clear even to Dickey himself.

He said the law was over-interpreted. Now, he looks at simple advances in highway safety — safety barriers, for example — and wonders what could have been done for guns.

The Dickey amendment does not specifically ban NIMH from investigating the role of firearms in suicide, but I think Tom Insel and all NIMH directors before and after him get the message.

Recently an effort to repeal the Dickey amendment failed:

Just hours before the mass shooting in San Bernardino on Wednesday, physicians gathered on Capitol Hill to demand an end to the Dickey Amendment restricting federal funding for gun violence research. Members of Doctors for America, the American College of Preventative Medicine, the American Academy of Pediatrics and others presented a petition against the research ban signed by more than 2,000 doctors.

“Gun violence is probably the only thing in this country that kills so many people, injures so many people, that we are not actually doing sufficient research on,” Dr. Alice Chen, the executive director of Doctors for America, told The Huffington Post.

Well over half a million people have died by firearms since 1996, when the ban on gun violence research was enacted, according to a HuffPost calculation of data through 2013 from Centers for Disease Control and Prevention. According to its sponsors, the Dickey Amendment was supposed to tamp down funding for what the National Rifle Association and other critics claimed was anti-gun advocacy research by the CDC’s National Center for Injury Prevention. In effect, it stopped federal gun violence research almost entirely.

So, why didn’t the Associate Editor of JAMA: Psychiatry, Charles Reynolds, exercise his editorial prerogative and support this effort to repeal the Dickey amendment, rather than lining up with his co-authors in a call for more wasteful research that avoids this important issue?

Effect of a missing clinical trial on what we think about cognitive behavior therapy

  • Data collection for a large, well-resourced study of cognitive behavior therapy (CBT) for psychosis was completed years ago, but the study remains unpublished.
  • Its results could influence the overall evaluation of CBT versus alternative treatments if integrated with what is already known.
  • Political considerations can determine whether completed psychotherapy studies get published or remain lost.
  • This rich example demonstrates the strong influence of publication bias on how we assess psychotherapies.
  • What can be done to reduce the impact of this particular study having gone missing?

A few years ago Ben Goldacre suggested that we do a study of the registration of clinical trials.

let’s collaborate

I can’t remember the circumstances, but Goldacre and I did not pursue the idea further. I was already committed to studying psychological interventions, in which Goldacre was much less interested. Having battled to get the American Psychological Association to fully accept and implement CONSORT in its journals, I was well aware how difficult it was to get the professional organizations offering the prime outlets for psychotherapy studies to accept needed reform. I wanted to stay focused on that.

I continue to follow Goldacre’s work closely and cite him often. I also pay particular attention to John Ioannidis’ follow-up of his documentation that much of what is found in the biomedical literature is false or exaggerated, for example:

Ioannidis JP. Clinical trials: what a waste. BMJ. 2014 Dec 10;349:g7089

Many trials are entirely lost, as they are not even registered. Substantial diversity probably exists across specialties, countries, and settings. Overall, in a survey conducted in 2012, only 30% of journal editors requested or encouraged trial registration.

In a seemingly parallel world, I keep showing that in psychology the situation is worse. I had a simple explanation for why, one I now recognize was naïve: needed reforms enforced by regulatory bodies like the US Food and Drug Administration (FDA) take longer to influence the psychotherapy literature, where there are no such pressures.

I think we now know that in both biomedicine and psychology, broad declarations by governments, funding bodies, and even journals of a commitment to disclosing conflicts of interest, registering trials, and sharing data are insufficient to ensure that the literature gets cleaned up.

Statements were published across 14 major medical journals endorsing routine data sharing. Editors of some of the top journals immediately took steps to undermine implementation in their particular journals. Think of the specter of “research parasites” raised by the editors of the New England Journal of Medicine (NEJM).

Another effort at reform

Following each demonstration that reforms are not being implemented, we get more pressures to do better. For instance, the 2015 World Health Organization (WHO) position paper:

Rationale for WHO’s New Position Calling for Prompt Reporting and Public Disclosure of Interventional Clinical Trial Results

WHO’s 2005 statement called for all interventional clinical trials to be registered. Subsequently, there has been an increase in clinical trial registration prior to the start of trials. This has enabled tracking of the completion and timeliness of clinical trial reporting. There is now a strong body of evidence showing failure to comply with results-reporting requirements across intervention classes, even in the case of large, randomised trials [37]. This applies to both industry and investigator-driven trials. In a study that analysed reporting from large clinical trials (over 500 participants) registered on clinicaltrials.gov and completed by 2009, 23% had no results reported even after a median of 60 months following trial completion; unpublished trials included nearly 300,000 participants [3]. Among randomised clinical trials (RCTs) of vaccines against five diseases registered in a variety of databases between 2006–2012, only 29% had been published in a peer-reviewed journal by 24 months following study completion [4]. At 48 months after completion, 18% of trials were not reported at all, which included over 24,000 participants. In another study, among 400 randomly selected clinical trials, nearly 30% did not publish the primary outcomes in a journal or post results to a clinical trial registry within four years of completion [5].

Why is this a problem?

  • It affects understanding of the scientific state of the art.

  • It leads to inefficiencies in resource allocation for both research and development and financing of health interventions.

  • It creates indirect costs for public and private entities, including patients themselves, who pay for suboptimal or harmful treatments.

  • It potentially distorts regulatory and public health decision making.

Furthermore, it is unethical to conduct human research without publication and dissemination of the results of that research. In particular, withholding results may subject future volunteers to unnecessary risk.

How the psychotherapy literature is different from the medical literature

Unfortunately for the trustworthiness of the psychotherapy literature, the WHO statement is limited to medical interventions. We probably won’t see any direct effects on the psychotherapy literature anytime soon.

The psychotherapy literature has all the problems in implementing reforms that we see in biomedicine – and more. Professional organizations like the American Psychological Association and the British Psychological Society, which publish psychotherapy research, have the other important function of protecting their clinical members’ employment opportunities. More opportunities for employment show the organizations are meeting their members’ needs, and this results in more dues-paying members.

The organizations don’t want to facilitate third-party payers citing research showing that particular interventions their membership is already practicing are inferior and need to be abandoned. They want the branding of members practicing “evidence-based treatment” but not the burden of members having to make decisions based on what is actually evidence-based. More basically, psychologists’ professional organizations are cognizant of the need to demonstrate a place in providing services that are reimbursed because they improve mental and physical health. In this respect, they are competing with biomedical interventions for the same pot of money.

So, journals published by psychological organizations have vested interests in not stringently enforcing standards. The well-known questionable research practices of investigators are reinforced by questionable publication practices, like confirmation bias, that are tied to the organizations’ institutional agenda.

And the lower status journals that are not published by professional organizations may compromise their standards for publishing psychotherapy trials because of the status that having these articles confers.

Increasingly, medical journals like The Lancet and The Lancet Psychiatry are seen as more prestigious outlets for psychotherapy trials, but they take less seriously the need to enforce for psychotherapy studies the standards that regulatory agencies require for biomedical interventions. Example: The Lancet violated its own policies and accepted Tony Morrison’s CBT for psychosis study for publication even though it wasn’t registered until after the trial had started. The declared outcomes were vague enough that they could be re-specified after results were known.

Bottom line: when it comes to publishing all psychotherapy trials in a manner consistent with their published protocols, the problem is taken less seriously than it would be for a medical trial.

Overall, there is less requirement that psychotherapy trials be registered, and less attention is paid by editors and reviewers to whether trials were registered and whether the outcomes and analytic plans in the published study were consistent with the registration.

In a recent blog post, I identified results of a trial that had been published with switched outcomes and then re-published in another paper with different outcomes, without the registration even being noted.

But for all the same reasons cited by the recent WHO statement, publication of all psychotherapy trials matters.

Recovering an important CBT trial gone missing

I am now going to review the impact of a large, well-resourced study of CBT for psychosis remaining unpublished. I identified the study by a search of the ISRCTN registry:

The ISRCTN registry is a primary clinical trial registry recognised by WHO and ICMJE that accepts all clinical research studies (whether proposed, ongoing or completed), providing content validation and curation and the unique identification number necessary for publication. All study records in the database are freely accessible and searchable.

I then went back to the literature to see what had happened with it. Keep in mind that this step is not even possible for the many psychotherapy trials that are simply not registered at all.

Many trials are not registered because they are considered pilot and feasibility studies and therefore not suitable for entering effect sizes into the literature. Yet, if significant results are found, they will be exaggerated because they come from an underpowered study. And such results become the basis for entering findings into the literature as if they came from a planned clinical trial, with a considerable likelihood of not being replicable.
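
To see why a significant result from an underpowered trial implies exaggeration, consider a minimal simulation sketch (mine, not from any of the studies discussed): with a hypothetical modest true effect (d = 0.20) and only 20 patients per arm, the rare small trials that reach significance do so only by overestimating the effect, typically by a factor of three or more.

```python
# Illustrative simulation of effect-size inflation in underpowered trials.
# All numbers are hypothetical; this is a sketch, not a reanalysis of any study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n_per_arm, n_trials = 0.20, 20, 20000   # modest true effect, small trials

estimates, significant = [], []
for _ in range(n_trials):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(true_d, 1.0, n_per_arm)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    estimates.append((treated.mean() - control.mean()) / pooled_sd)  # Cohen's d
    significant.append(stats.ttest_ind(treated, control).pvalue < 0.05)

estimates, significant = np.array(estimates), np.array(significant)
print(f"Power: {significant.mean():.2f}")                                   # around 0.10
print(f"Mean d, all trials: {estimates.mean():.2f}")                        # near the true 0.20
print(f"Mean d, significant trials only: {estimates[significant].mean():.2f}")  # roughly 0.7
```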

There are whole classes of clinical and health psychology interventions that are dominated by underpowered, poor-quality studies that should have been flagged as weak evidence or excluded altogether. So, in centering on this trial, I’m picking an important example because it was available to be discovered, but there is much out there that is not available to be discovered because it was never registered.

CBT versus supportive therapy for persistent positive symptoms in psychotic disorders

The trial registration is:

Cognitive behavioural treatment for persistent positive symptoms in psychotic disorders. ISRCTN29242879, DOI: 10.1186/ISRCTN29242879

The trial registration indicates that recruitment started on January 1, 2007 and ended on December 31, 2008.

No publications are listed. I and others have sent repeated emails to the principal investigator inquiring about any publications and have failed to get a response. I even sent a German colleague to visit him and all he would say was that results were being written up. That was two years ago.

Google Scholar indicates the principal investigator continues to publish, but not the results of this trial.

A study to die for

The study protocol is available as a PDF

Klingberg S, Wittorf A, Meisner C, Wölwer W, Wiedemann G, Herrlich J, Bechdolf A, Müller BW, Sartory G, Wagner M, Kircher T. Cognitive behavioural therapy versus supportive therapy for persistent positive symptoms in psychotic disorders: The POSITIVE Study, a multicenter, prospective, single-blind, randomised controlled clinical trial. Trials. 2010 Dec 29;11(1):123.

The methods section makes it sound like a dream study, with resources beyond what is usually encountered in psychotherapy research. If the protocol were followed, the study would be an innovative, large, methodologically superior one.

Methods/Design: The POSITIVE study is a multicenter, prospective, single-blind, parallel group, randomised clinical trial, comparing CBT and ST with respect to the efficacy in reducing positive symptoms in psychotic disorders. CBT as well as ST consist of 20 sessions altogether, 165 participants receiving CBT and 165 participants receiving ST. Major methodological aspects of the study are systematic recruitment, explicit inclusion criteria, reliability checks of assessments with control for rater shift, analysis by intention to treat, data management using remote data entry, measures of quality assurance (e.g. on-site monitoring with source data verification, regular query process), advanced statistical analysis, manualized treatment, checks of adherence and competence of therapists.

The study was one of the rare ones providing for systematic assessment of adverse events and any harm to patients. Presumably, if CBT is powerful enough to effect positive change, it can have negative effects as well. But these remain entirely a matter of speculation.

Ratings of outcome were blinded and steps were taken to preserve the blinding even if an adverse event occurred. This is important because blinded trials are less susceptible to investigator bias.

Another unusual feature is the use of supportive therapy (ST), a credible but nonspecific condition, as a control/comparison.

ST is thought as an active treatment with respect to the patient-therapist relationship and with respect to therapeutic commitment [21]. In the treatment of patients suffering from psychotic disorders these ingredients are viewed to be essential as it has been shown consistently that the social network of these patients is limited. To have at least one trustworthy person to talk to may be the most important ingredient in any kind of treatment. However, with respect to specific processes related to modification of psychotic beliefs, ST is not an active treatment. Strategies specifically designed to change misperceptions or reasoning biases are not part of ST.

Use of this control condition allows evaluation of the important question of whether any apparent effects of CBT are due to the active ingredients of that approach or to the supportive therapeutic relationship within which the active ingredients are delivered.

Being able to rule out that the effects of CBT are due to nonspecific factors justifies the extra resources needed to provide specialized training in CBT. If equivalent effects are obtained in the ST group, it suggests that equivalent outcomes can be achieved simply by providing more support to patients, presumably from less trained and maybe even lay personnel.

It is a notorious feature of studies of CBT for psychosis that they lack comparison/control groups in any way equivalent to the CBT in terms of nonspecific intensity, support, encouragement, and positive expectations. Too often, the control group is an ill-defined treatment as usual (TAU) that lacks regular contact and fails to inspire any positive expectations. Basically, CBT is being compared to inadequate treatment, and sometimes no treatment, and so any apparent effects that are observed are due to correcting these inadequacies, not to any active ingredient.

The protocol hints in passing at the investigators’ agenda.

This clinical trial is part of efforts to intensify psychotherapy research in the field of psychosis in Germany, to contribute to the international discussion on psychotherapy in psychotic disorders, and to help implement psychotherapy in routine care.

Here we see an aim to justify implementation of CBT for psychosis in routine care in Germany. We have seen something similar in the repeated efforts of German investigators to demonstrate that long-term psychodynamic psychotherapy is more effective than shorter, less expensive treatments, despite the lack of credible data [ ].

And so, if the results would not contribute to getting psychotherapy implemented in routine care in Germany, do they get buried?

Science & Politics of CBT for Psychosis

A rollout of a CBT study for psychosis published in The Lancet made strong claims in a BBC article and audiotape promotion.


The attention attracted critical scrutiny that these claims could not withstand. After controversy on Twitter, the BBC headline was changed to a more modest claim.

Criticism mounted:

  • The study retained fewer participants receiving CBT at the end of the study than the authors acknowledged.
  • The comparison treatment was ill-defined, but for some patients meant no treatment because they were kicked out of routine care for refusing medication.
  • A substantial proportion of patients assigned to CBT began taking antipsychotic medication by the end of the study.
  • There was no evidence that the response to CBT was comparable to that achieved with antipsychotic medication alone in clinical trials.
  • No evidence that less intensive, nonspecific supportive therapy would not have achieved the same results as CBT.

And the authors ended up conceding in a letter to the editor that their trial had been registered after data collection had started and it did not produce evidence of equivalence to antipsychotic medication.

In a blog post containing the actual video of his presentation before the British Psychological Society, Keith Laws declares

Politics have overcome the science in CBT for psychosis

Recently the British Psychological Society invited me to give a public talk entitled CBT: The Science & Politics behind CBT for Psychosis. In this talk, which was filmed…, I highlight the unquestionable bias shown by the National Institute of Clinical Excellence (NICE) committee  (CG178) in their advocacy of CBT for psychosis.

The bias is not concealed, but unashamedly served-up by NICE as a dish that is high in ‘evidence-substitute’, uses data that are past their sell-by-date and is topped-off with some nicely picked cherries. I raise the question of whether committees – with such obvious vested interests – should be advocating on mental health interventions.

I present findings from our own recent meta-analysis (Jauhar et al 2014) showing that three-quarters of all RCTs have failed to find any reduction in the symptoms of psychosis following CBT. I also outline how trials which have used non-blind assessment of outcomes have inflated effect sizes by up to 600%. Finally, I give examples where CBT may have adverse consequences – both for the negative symptoms of psychosis and for relapse rates.

A pair of well-conducted and transparently reported Cochrane reviews suggest there is little evidence for the efficacy of CBT for psychosis (*)

[Slides summarizing the two Cochrane reviews]

These and other slides are available in a slideshow presentation of a talk I gave at the Edinburgh Royal Infirmary.

Yet, even after having to be tempered in the face of criticism, the original claims of the Morrison study get echoed in the antipsychiatry report Understanding Psychosis:

“Other forms of therapy can also be helpful, but so far it is CBTp that has been most intensively researched. There have now been several meta-analyses (studies using a statistical technique that allows findings from various trials to be averaged out) looking at its effectiveness. Although they each yield slightly different estimates, there is general consensus that on average, people gain around as much benefit from CBT as they do from taking psychiatric medication.”

Such misinformation can confuse patients making difficult decisions about whether to accept antipsychotic medication.

If the results from the missing CBT for psychosis study became available…

If the Klingberg study were available and integrated with existing data, it would be one of the largest and highest-quality studies, and it would provide insight into any advantage of CBT for psychosis. For those who can be convinced by data, a null finding from a large study added to mostly small and methodologically unsophisticated studies could be decisive.
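
To make the arithmetic concrete, here is a minimal sketch of fixed-effect, inverse-variance pooling, the standard meta-analytic calculation. The effect sizes and standard errors below are purely hypothetical, not data from any of the trials discussed; the point is simply that a large trial carries far more weight (1/SE²) than small trials, so a null result from it can pull the pooled estimate sharply toward zero.

```python
# Hedged illustration: fixed-effect (inverse-variance) meta-analysis with
# hypothetical numbers, showing the leverage of one large null trial.
import numpy as np

def pool_fixed_effect(effects, ses):
    """Return the inverse-variance weighted pooled effect and its standard error."""
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    weights = 1.0 / ses**2
    pooled = np.sum(weights * effects) / np.sum(weights)
    return pooled, np.sqrt(1.0 / np.sum(weights))

# Four hypothetical small trials with positive effects and large standard errors
small_effects, small_ses = [0.45, 0.50, 0.30, 0.60], [0.25, 0.30, 0.28, 0.32]
print("Small trials only:     d = %.2f (SE %.2f)" % pool_fixed_effect(small_effects, small_ses))

# Add one hypothetical large trial with a near-null effect and a small standard error
print("Plus large null trial: d = %.2f (SE %.2f)"
      % pool_fixed_effect(small_effects + [0.05], small_ses + [0.11]))
```

With these made-up numbers, the pooled effect drops from about 0.45 to about 0.20, which is why a completed but unpublished large trial matters so much.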

A recent meta-analysis of CBT for prevention of psychosis by Hutton and Taylor includes six studies and mentions the trial protocol in passing:

Two recent trials of CBT for established psychosis provide examples of good practice for reporting harms (Klingberg et al. 2010, 2012) and CONSORT (Consolidated Standards of Reporting Trials) provide a sensible set of recommendations (Ioannidis et al. 2004).

Yet it does not indicate why the trial is missing, and the trial does not appear in a list of completed but unpublished studies, even though the protocol indicates a study considerably larger than any of the studies that were included.

To communicate a better sense of the potential importance of this missing study and perhaps place more pressures on the investigators to release its results, I would suggest that future meta-analyses state:

The protocol for Klingberg et al. Cognitive behavioural treatment for persistent positive symptoms in psychotic disorders indicates that recruitment was completed in 2008. No publications have resulted. Emails to Professor Klingberg about the status of the study failed to get a response. If the study were completed consistent with its protocol, it would represent one of the largest studies of CBT for psychosis ever and one of the few with a fair comparison between CBT and supportive therapy. Inclusion of the results could potentially substantially modify the conclusions of the current meta-analysis.

 

Was independent peer review of the PACE trial articles possible?

I ponder this question guided by Le Chevalier C. Auguste Dupin, the first fictional detective, created before anyone was called a “detective.”

Articles reporting the PACE trial have extraordinary numbers of authors, acknowledgments, and institutional affiliations. A considerable proportion of all persons and institutions involved in researching chronic fatigue and related conditions in the UK have a close connection to PACE.

This raises issues about

  • Obtaining independent peer review of these articles that is not tainted by reviewer conflict of interest.
  • Just what authorship on a PACE trial paper represents and whether granting of authorship conforms to international standards.
  • The security of potential critics contemplating speaking out about whatever bad science they find in the PACE trial articles, and of potential reviewers who are negative and can be found out. Critics within the UK risk isolation and blacklisting by a large group who have investments in what could be exaggerated estimates of the quality and outcome of the PACE trial.
  • Whether grants associated with the multimillion-pound PACE study could have received the independent peer review that is so crucial to assuring that proposals selected for funding are of the highest quality.

Issues about the large number of authors, acknowledgments, and institutional affiliations become all the more salient as critics [1, 2, 3] once again find serious flaws in the conduct and reporting of the 2015 Lancet Psychiatry long-term follow-up study. Numerous obvious Questionable Research Practices (QRPs) survived peer review. That implies at least ineptness in peer review, or even Questionable Publication Practices (QPPs).

The important question becomes: how is the publication of questionable science to be explained?

Maybe there were difficulties finding reviewers with relevant expertise who were not in some way involved in the PACE trial or affiliated with departments and institutions that would be construed as benefiting from a positive review outcome, i.e. a publication?

Or in the enormous smallness of the UK, is independent peer review achieved by persons putting those relationships and affiliations aside to produce an impeccably detached and rigorous review process?

The untrustworthiness of both the biomedical and psychological literatures is well established. Nonpharmacological interventions have fewer safeguards than drug trials in terms of adherence to preregistration, reporting standards like CONSORT, and enforcement of data sharing.

Open-minded skeptics should be assured of independent peer review of nonpharmacological clinical trials, particularly when there is evidence that persons and groups with considerable financial interests attempt to control what gets published and what is said about their favored interventions. Reviewers with potential conflicts of interest should be excluded from evaluation of manuscripts.

Independent peer review of the PACE trial by those with relevant expertise might not be possible in the UK, where much of the conceivable expertise is in some way directly or indirectly attached to the PACE trial.

A Dutch observer’s astute observations about the PACE articles

My guest blogger, Dutch research biologist Klaas van Dijk, called attention to the exceptionally large number of authors and institutions listed for a pair of PACE trial papers.

Klaas noted

The PubMed entry for the 2011 Lancet paper lists 19 authors:

B J Angus, H L Baber, J Bavinton, M Burgess, T Chalder, L V Clark, D L Cox, J C DeCesare, K A Goldsmith, A L Johnson, P McCrone, G Murphy, M Murphy, H O’Dowd, PACE trial management group*, L Potts, M Sharpe, R Walwyn, D Wilks and P D White (re-arranged in an alphabetic order).

The actual article from the Lancet website ( http://www.thelancet.com/pdfs/journals/lancet/PIIS0140-6736(11)60096-2.pdf and also http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60096-2/fulltext ) lists 19 authors who are acting ‘on behalf of the PACE trial management group†’. But the end of the paper (page 835) states: “PACE trial group.” This term is not identical to “PACE trial management group”.
In total, another 19 names are listed under “PACE trial group” (page 835): Hiroko Akagi, Mansel Aylward, Barbara Bowman Jenny Butler, Chris Clark, Janet Darbyshire, Paul Dieppe, Patrick Doherty, Charlotte Feinmann, Deborah Fleetwood, Astrid Fletcher, Stella Law, M Llewelyn, Alastair Miller, Tom Sensky, Peter Spencer, Gavin Spickett, Stephen Stansfeld and Alison Wearden (re-arranged in an alphabetic order).

There is no overlap with the first 19 people who are listed as author of the paper.

So how many people can claim to be an author of this paper? Are all these 19 people of the “PACE trial management group” (not identical to “PACE trial group”???) also some sort of co-author of this paper? Do all these 19 people of the second group also agree with the complete contents of the paper? Do all 38 people agree with the full contents of the paper?

The paper lists many affiliations:
* Queen Mary University of London, UK
* King’s College London, UK
* University of Cambridge, UK
* University of Cumbria, UK
* University of Oxford, UK
* University of Edinburgh, UK
* Medical Research Council Clinical Trials Unit, London, UK
* South London and Maudsley NHS Foundation Trust, London, UK
* The John Radcliffe Hospital, Oxford, UK
* Royal Free Hospital NHS Trust, London, UK
* Barts and the London NHS Trust, London, UK
* Frenchay Hospital NHS Trust, Bristol, UK;
* Western General Hospital, Edinburgh, UK

Do all these affiliations also agree with the full contents of the paper? Am I right to assume that all 38 people (names see above) and all affiliations / institutes (see above) plainly refuse to give critics / other scientists / patients / patient groups (etc.) access to the raw research data of this paper, and am I right with my assumption that it is therefore impossible for all others (including allies of patients / other scientists / interested students, etc.) to conduct re-calculations, check all statements against the raw data, etc.?

Decisions about whether to accept manuscripts for publication are made in dark places, based on opinions offered by people whose identities may be known only to editors. Actually, though, in a small country like the UK, peer review may be a lot less anonymous than intended and possibly a lot less independent and free of conflicts of interest. Without a lot more transparency than is currently available concerning the peer review the published papers underwent, we are left to our speculation.

Prepublication peer review is just one aspect of the process by which research findings get vetted, shaped, and made available to the larger scientific community, an overall process that is now recognized as tainted with untrustworthiness.

Rules for granting authorship

Concerns about gift and unwarranted authorship have increased not only because of growing awareness of unregulated and unfair practices, but because of the importance attached to citations and authorship for professional advancement. Journals are increasingly requiring documentation that all authors have made an appropriate contribution to a manuscript and have approved the final version.

Yet operating rules for granting authorship in many institutional settings vary greatly from the stringent requirements of journals. Contrary to the signed statements that corresponding authors have to make when submitting a manuscript to a journal, many clinicians expect authorship in return for access to patients. Many competitive institutions award and withhold authorship based on politics and on good or bad behavior that have nothing to do with the requirements of journals.

Basically, despite the existence of numerous ethical guidelines and explicit policies, authors and institutions can largely do what they want when it comes to granting and withholding authorship.

Persons are quickly disappointed when they are naïve enough to complain about unwarranted authorships or being forced to include authors on papers without appropriate contribution or being denied authorship for an important contribution. They quickly discover that whistleblowers are generally considered more of a threat to institutions and punished more severely than alleged wrongdoers, no matter how strong the evidence may be.

The Lancet website notes

The Lancet is a signatory journal to the Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals, issued by the International Committee of Medical Journal Editors (ICMJE Recommendations), and to the Committee on Publication Ethics (COPE) code of conduct for editors. We follow COPE’s guidelines.

The ICMJE recommends that an author should meet all four of the following criteria:

  • Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work;
  • Drafting the work or revising it critically for important intellectual content;
  • Final approval of the version to be published;
  • Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

The intent of these widely endorsed recommendations is that persons associated with a large project have to do a lot to claim their places as authors.

Why the fuss about acknowledgments?

I’ve heard from a number of graduate students and junior investigators that they have had their first manuscripts held up in the submission process because they did not obtain written permission for acknowledgments. Why is that considered so important?

Mention in an acknowledgment is an honor. But it implies involvement in a project and approval of the resulting manuscript. In the past, there were numerous instances where people were named in acknowledgments without having given permission. There was a suspicion, sometimes confirmed, that they had been acknowledged only to improve the prospects of a manuscript getting published. There are other instances where persons were included in acknowledgments without permission with the intent of keeping them out of the review process because of the appearance of a conflict of interest.

The expectation is that anyone contributing enough to a manuscript to be acknowledged has a potential conflict of interest in deciding whether it is suitable for publication.

But, as with other aspects of a mysterious and largely anonymous review process, readers cannot establish whether people who were acknowledged in a manuscript were barred from participating in its review.

What is the responsibility of reviewers to declare conflict of interest?

Reviewers are expected to declare conflicts of interest when accepting a manuscript to review. But often they are presented with a tick box without a clear explanation of the criteria for the appearance of a conflict of interest. And reviewers can usually continue considering a manuscript after acknowledging that they have an association with the authors or an institutional affiliation but do not consider it a conflict. That statement is generally accepted.

Authors excluding from the review process persons they consider to have a negative bias

In submitting a manuscript, authors are offered an opportunity to identify persons who should be excluded from the review because of the appearance of a negative bias. Editors generally take these requests quite seriously. As an editor, I sometimes receive a large number of requested exclusions from authors who worry about the opinions of particular people.

While we don’t know what went on in prepublication peer review, the PACE investigators have repeatedly and aggressively attempted to manipulate post-publication portrayals of their trial in the media. Can we rule out that they similarly tried to control potential critics in the prepublication peer review of their papers?

The 2015 Lancet Psychiatry secondary mediation analysis article

Chalder, T., Goldsmith, K. A., Walker, J., White, P. D., Sharpe, M., & Pickles, A. R. (2015). Rehabilitative therapies for chronic fatigue syndrome: a secondary mediation analysis of the PACE trial. The Lancet Psychiatry, 2: 141–52.

The acknowledgments include

We acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, excluding ARP, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, LV Clark, DL Cox, JC DeCesare, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks. This report is independent research partly arising from a doctoral research fellowship supported by the NIHR.

Fifteen of the authors of the 2011 Lancet PACE paper are no longer present, and another author has been added. The PACE Trial Management Group is again acknowledged, but there is no mention of the separate PACE trial group. We can’t tell why there has been a major reduction in the number of authors and acknowledgments or how it came about, or whether people who had been dropped participated in the review of this paper. But what is obvious is that this is an exceedingly flawed mediation analysis crafted to a foregone conclusion. I’ll say more about that in future blogs, but we can only speculate how the bad publication practices made it through peer review.

This article is a crime against the practice of secondary mediation analysis. If I were a prospective author present at the discussion, I would flee before it became a crime scene.

I am told I have over 350 publications, but I consider it vulgar for authors to keep track of exact numbers. And there are many potential publications not included in this number because I declined authorship when I could not agree with the spin that others were trying to put on the reporting of the findings. In such instances, I exclude myself from review of the resulting manuscript because of the appearance of a conflict of interest. We can ponder how many of the large pool of past PACE authors refused authorship on this paper when it was offered and then declined to participate in subsequent peer review because of the appearance of a conflict of interest.

The 2015 Lancet Psychiatry long-term follow-up article

Sharpe, M., Goldsmith, K. A., Chalder, T., Johnson, A.L., Walker, J., & White, P. D. (2015). Rehabilitative treatments for chronic fatigue syndrome: long-term follow-up from the PACE trial. The Lancet Psychiatry, http://dx.doi.org/10.1016/S2215-0366(15)00317-X

The acknowledgments include

We gratefully acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, L V Clark, D L Cox, J C DeCesare, E Feldman, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks, and the King’s Clinical Trials Unit. We thank Hannah Baber for facilitating the long-term follow-up data collection.

Again, there are authors and acknowledgments missing from the earlier paper, and we are in the dark about how and why that happened and whether the missing persons were considered free enough of conflict of interest to evaluate this article when it was in manuscript form. But as documented in a blog post at Mind the Brain, there were serious, obvious flaws in the conduct and reporting of the follow-up study. It is a crime against best practices for the proper conduct and reporting of clinical trials. And again, we can only speculate how it got through peer review.

… And grant reviews?

Where can UK granting agencies obtain independent peer review of past and future grants associated with the PACE trial? To take just one example, the 2015 Lancet Psychiatry secondary mediation analysis was funded in part by an NIHR doctoral research fellowship grant. The resulting paper has many fewer authors than the 2011 Lancet paper. Did everyone who was an author or mentioned in the acknowledgments of that paper exclude themselves from review of the grant? Who, then, would be left?

In Germany and the Netherlands, concerns about avoiding the appearance of conflict of interest in obtaining independent peer review of grants have led to heavy reliance on expertise from outside the country. This does not imply any impropriety by experts within these countries, but rather the necessity of maintaining a strong appearance that vested interests have not unduly influenced grant review. Perhaps the situation apparent with the PACE trial suggests that journals and grant review panels within the UK might consider similar steps.

Contemplating the evidence against independent peer review

  • We have a mob of people as authors and mentions in acknowledgments. We have a huge conglomerate of institutions acknowledged.
  • We have some papers with blatant questionable research and reporting practices published in prestigious journals after ostensible peer review.
  • We are left in the dark about what exactly happened in peer review, but that the articles were adequately peer reviewed is a crucial part of their credibility.

What are we to conclude?

I think of what Edgar Allan Poe’s wise character, Le Chevalier C. Auguste Dupin, would say. For those of you who don’t know who he is:

Le Chevalier C. Auguste Dupin is a fictional detective created by Edgar Allan Poe. Dupin made his first appearance in Poe’s “The Murders in the Rue Morgue” (1841), widely considered the first detective fiction story. He reappears in “The Mystery of Marie Rogêt” (1842) and “The Purloined Letter” (1844)…

Poe created the Dupin character before the word detective had been coined. The character laid the groundwork for fictitious detectives to come, including Sherlock Holmes, and established most of the common elements of the detective fiction genre.

I think if we asked Dupin, he would say the danger is that the question is too fascinating to give up, but impossible to resolve without evidence to which we have no access. We can blog, we can discuss this important question, but in the end we cannot answer it with certainty.

Sigh.