Effect of a missing clinical trial on what we think about cognitive behavior therapy

  • Data collection for a large, well-resourced study of cognitive behavior therapy (CBT) for psychosis was completed years ago, but the study remains unpublished.
  • Its results could influence the overall evaluation of CBT versus alternative treatments if integrated with what is already known.
  • Political considerations can determine whether completed psychotherapy studies get published or remain lost.
  • This rich example demonstrates the strong influence of publication bias on how we assess psychotherapies.
  • What can be done to reduce the impact of this particular study having gone missing?

A few years ago Ben Goldacre suggested that we do a study of the registration of clinical trials.

I can’t remember the circumstances, but Goldacre and I did not pursue the idea further. I was already committed to studying psychological interventions, in which Goldacre was much less interested. Having battled to get the American Psychological Association to fully accept and implement CONSORT in its journals, I was well aware how difficult it was to get the professional organizations offering the prime outlets for psychotherapy studies to accept needed reform. I wanted to stay focused on that.

I continue to follow Goldacre’s work closely and cite him often. I also pay particular attention to John Ioannidis’ follow-up to his documentation that much of what we find in the biomedical literature is false or exaggerated, such as:

Ioannidis JP. Clinical trials: what a waste. BMJ. 2014 Dec 10;349:g7089

Many trials are entirely lost, as they are not even registered. Substantial diversity probably exists across specialties, countries, and settings. Overall, in a survey conducted in 2012, only 30% of journal editors requested or encouraged trial registration.

In a seemingly parallel world, I keep showing that in psychology the situation is worse. I had a simple explanation for why, which I now recognize was naïve: needed reforms enforced by regulatory bodies like the US Food and Drug Administration (FDA) take longer to influence the psychotherapy literature, where there are no such pressures.

I think we now know that, in both biomedicine and psychology, broad declarations by governments, funding bodies, and even journals of a commitment to disclosing conflicts of interest, registering trials, and sharing data are insufficient to ensure that the literature gets cleaned up.

Statements endorsing routine data sharing were published across 14 major medical journals. Editors of some of the top journals immediately took steps to undermine the implementation in their particular journals. Think of the specter of “research parasites” raised by the editors of the New England Journal of Medicine (NEJM).

Another effort at reform

Following each demonstration that reforms are not being implemented, we get more pressures to do better. For instance, the 2015 World Health Organization (WHO) position paper:

Rationale for WHO’s New Position Calling for Prompt Reporting and Public Disclosure of Interventional Clinical Trial Results

WHO’s 2005 statement called for all interventional clinical trials to be registered. Subsequently, there has been an increase in clinical trial registration prior to the start of trials. This has enabled tracking of the completion and timeliness of clinical trial reporting. There is now a strong body of evidence showing failure to comply with results-reporting requirements across intervention classes, even in the case of large, randomised trials [37]. This applies to both industry and investigator-driven trials. In a study that analysed reporting from large clinical trials (over 500 participants) registered on clinicaltrials.gov and completed by 2009, 23% had no results reported even after a median of 60 months following trial completion; unpublished trials included nearly 300,000 participants [3]. Among randomised clinical trials (RCTs) of vaccines against five diseases registered in a variety of databases between 2006–2012, only 29% had been published in a peer-reviewed journal by 24 months following study completion [4]. At 48 months after completion, 18% of trials were not reported at all, which included over 24,000 participants. In another study, among 400 randomly selected clinical trials, nearly 30% did not publish the primary outcomes in a journal or post results to a clinical trial registry within four years of completion [5].

Why is this a problem?

  • It affects understanding of the scientific state of the art.

  • It leads to inefficiencies in resource allocation for both research and development and financing of health interventions.

  • It creates indirect costs for public and private entities, including patients themselves, who pay for suboptimal or harmful treatments.

  • It potentially distorts regulatory and public health decision making.

Furthermore, it is unethical to conduct human research without publication and dissemination of the results of that research. In particular, withholding results may subject future volunteers to unnecessary risk.

How the psychotherapy literature is different from the medical literature.

Unfortunately for the trustworthiness of the psychotherapy literature, the WHO statement is limited to medical interventions. We probably won’t see any direct effects on the psychotherapy literature anytime soon.

The psychotherapy literature has all the problems in implementing reforms that we see in biomedicine – and more. Professional organizations like the American Psychological Association and the British Psychological Society that publish psychotherapy research also have the important function of developing employment opportunities for their clinical membership. More employment opportunities show that the organizations are meeting their members’ needs, and this results in more dues-paying members.

The organizations don’t want to make it easier for third-party payers to cite research showing that particular interventions their members already practice are inferior and need to be abandoned. They want the branding of members practicing “evidence-based treatment” but not the burden of members having to make decisions based on what is evidence-based. More basically, psychologists’ professional organizations are cognizant of the need to demonstrate a place in providing services that are reimbursed because they improve mental and physical health. In this respect, they are competing with biomedical interventions for the same pot of money.

So, journals published by psychological organizations have vested interests in not stringently enforcing standards. The well-known questionable research practices of investigators are strengthened by questionable publication practices, like confirmation bias, that are tied to the organizations’ institutional agenda.

And the lower-status journals that are not published by professional organizations may compromise their standards for publishing psychotherapy trials because of the status that having these articles confers.

Increasingly, medical journals like The Lancet and The Lancet Psychiatry are seen as more prestigious outlets for psychotherapy trials, but they take less seriously the need to enforce, for psychotherapy studies, the standards that regulatory agencies require for biomedical interventions. Example: The Lancet violated its own policies and accepted Tony Morrison’s CBT for psychosis study for publication even though the trial wasn’t registered until after it had started. The declared outcomes were vague enough that they could be re-specified after the results were known.

Bottom line: when it comes to publishing all psychotherapy trials consistent with their published protocols, the problem is taken less seriously than it would be for a medical trial.

Overall, psychotherapy trials face weaker requirements to be registered, and editors and reviewers pay less attention to whether trials were registered and whether outcomes and analytic plans were consistent between the registration and the published study.

In a recent blog post, I identified results of a trial that had been published with switched outcomes and then re-published in another paper with different outcomes, without the registration even being noted.
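Checking a trial for switched outcomes comes down to comparing two lists: what the registration promised and what the paper reports. Here is a minimal Python sketch of that comparison. The function name `audit_outcomes` and the outcome labels are hypothetical, and exact string matching is a simplifying assumption; real registry entries rarely match published wording word for word, so in practice the matching still takes human judgment.

```python
def audit_outcomes(registered, reported):
    """Compare registered and reported outcome labels by exact string match."""
    registered, reported = set(registered), set(reported)
    return {
        "dropped": sorted(registered - reported),   # registered but not reported
        "added": sorted(reported - registered),     # reported but never registered
        "retained": sorted(registered & reported),  # present in both
    }


# Hypothetical example: the registered primary outcome was quietly replaced
registered_primary = ["symptom score at 12 months"]
reported_primary = ["symptom score at 6 months", "quality of life at 12 months"]

result = audit_outcomes(registered_primary, reported_primary)
print(result)
```

Anything in "dropped" or "added" is a question for the authors and editors, not automatically evidence of misconduct.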

But for all the same reasons cited by the recent WHO statement, publication of all psychotherapy trials matters.

Recovering an important CBT trial gone missing

I am now going to review the impact of a large, well-resourced study of CBT for psychosis remaining unpublished. I identified the study by a search of the ISRCTN registry:

The ISRCTN registry is a primary clinical trial registry recognised by WHO and ICMJE that accepts all clinical research studies (whether proposed, ongoing or completed), providing content validation and curation and the unique identification number necessary for publication. All study records in the database are freely accessible and searchable.

I then went back to the literature to see what had happened to it. Keep in mind that this step is not even possible for the many psychotherapy trials that are simply not registered at all.

Many trials are not registered because they are considered pilot and feasibility studies and therefore not suitable for entering effect sizes into the literature. Yet, if significant results are found, they tend to be exaggerated because they come from an underpowered study. And such results then get entered into the literature as if they came from a planned clinical trial, with a considerable likelihood that they cannot be replicated.
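Why are significant results from small pilot studies exaggerated? A small study can only reach significance when chance pushes its observed effect well above the true one. The minimal Python simulation below sketches this "significance filter" under stated assumptions: normally distributed outcomes with SD 1, a hypothetical true standardized effect of d = 0.2, and an approximate z-test. It compares 15 patients per arm (an illustrative pilot size) with 165 per arm (the arm size planned in the POSITIVE protocol discussed below). The function `significant_effect_sizes` is hypothetical and keeps only the trials that come out significant in favour of treatment, the way a literature filtered by significance would.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance with an n - 1 denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def significant_effect_sizes(true_d, n_per_arm, n_trials=1000, seed=1):
    """Simulate two-arm trials with normally distributed outcomes (SD 1) and
    return the observed standardized effect sizes of only those trials whose
    approximate z statistic exceeds 1.96 in favour of treatment."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_trials):
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        var_t, var_c = variance(treated), variance(control)
        diff = mean(treated) - mean(control)
        z = diff / math.sqrt(var_t / n_per_arm + var_c / n_per_arm)
        if z > 1.96:
            # Cohen's d using the pooled SD (arms are equal in size)
            kept.append(diff / math.sqrt((var_t + var_c) / 2))
    return kept


TRUE_D = 0.2  # hypothetical true effect
for n in (15, 165):
    sig = significant_effect_sizes(TRUE_D, n)
    print(f"{n:3d} per arm: {len(sig) / 1000:.0%} of trials significant, "
          f"mean d among them = {mean(sig):.2f}")
```

Under these assumptions, the few small trials that clear the significance bar report effects several times larger than the true 0.2, while the significant large trials overshoot far less.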

There are whole classes of clinical and health psychology interventions that are dominated by underpowered, poor-quality studies that should have been flagged as weak evidence or excluded altogether. So, in centering on this trial, I’m picking an important example because it was available to be discovered, but much else is not available to be discovered because it was never registered.

CBT versus supportive therapy for persistent positive symptoms in psychotic disorders

The trial registration is:

Cognitive behavioural treatment for persistent positive symptoms in psychotic disorders. ISRCTN29242879, DOI 10.1186/ISRCTN29242879

The trial registration indicates that recruitment started on January 1, 2007 and ended on December 31, 2008.

No publications are listed. I and others have sent repeated emails to the principal investigator inquiring about any publications and have failed to get a response. I even sent a German colleague to visit him, and all the principal investigator would say was that results were being written up. That was two years ago.

Google Scholar indicates the principal investigator continues to publish, but not the results of this trial.

A study to die for

The study protocol is available as a PDF

Klingberg S, Wittorf A, Meisner C, Wölwer W, Wiedemann G, Herrlich J, Bechdolf A, Müller BW, Sartory G, Wagner M, Kircher T. Cognitive behavioural therapy versus supportive therapy for persistent positive symptoms in psychotic disorders: The POSITIVE Study, a multicenter, prospective, single-blind, randomised controlled clinical trial. Trials. 2010 Dec 29;11(1):123.

The methods section makes it sound like a dream study with resources beyond what is usually encountered for psychotherapy research. If the protocol is followed, the study would be an innovative, large, methodologically superior study.

Methods/Design: The POSITIVE study is a multicenter, prospective, single-blind, parallel group, randomised clinical trial, comparing CBT and ST with respect to the efficacy in reducing positive symptoms in psychotic disorders. CBT as well as ST consist of 20 sessions altogether, 165 participants receiving CBT and 165 participants receiving ST. Major methodological aspects of the study are systematic recruitment, explicit inclusion criteria, reliability checks of assessments with control for rater shift, analysis by intention to treat, data management using remote data entry, measures of quality assurance (e.g. on-site monitoring with source data verification, regular query process), advanced statistical analysis, manualized treatment, checks of adherence and competence of therapists.

The study was one of the rare ones providing for systematic assessments of adverse events and any harm to patients. Presumably, if CBT is powerful enough to effect positive change, it can have negative effects as well. But these remain entirely a matter of speculation.

Ratings of outcome were blinded and steps were taken to preserve the blinding even if an adverse event occurred. This is important because blinded trials are less susceptible to investigator bias.

Another unusual feature is the use of supportive therapy (ST), a credible but nonspecific condition, as a control/comparison.

ST is thought as an active treatment with respect to the patient-therapist relationship and with respect to therapeutic commitment [21]. In the treatment of patients suffering from psychotic disorders these ingredients are viewed to be essential as it has been shown consistently that the social network of these patients is limited. To have at least one trustworthy person to talk to may be the most important ingredient in any kind of treatment. However, with respect to specific processes related to modification of psychotic beliefs, ST is not an active treatment. Strategies specifically designed to change misperceptions or reasoning biases are not part of ST.

Use of this control condition allows evaluation of the important question of whether any apparent effects of CBT are due to the active ingredients of that approach or to the supportive therapeutic relationship within which the active ingredients are delivered.

Being able to rule out that the effects of CBT are due to nonspecific factors would justify the extra resources needed to provide specialized training in CBT. If equivalent effects are obtained in the ST group, it suggests that equivalent outcomes can be achieved simply by providing more support to patients, presumably by less trained and maybe even lay personnel.

It is a notorious feature of studies of CBT for psychosis that they lack comparison/control groups in any way equivalent to the CBT in terms of nonspecific intensity, support, encouragement, and positive expectations. Too often, the control group receives ill-defined treatment as usual (TAU) that lacks regular contact and inspires no positive expectations. Basically, CBT is being compared to inadequate treatment, and sometimes no treatment, so any apparent effects that are observed are due to correcting these inadequacies, not to any active ingredient.

The protocol hints in passing at the investigators’ agenda.

This clinical trial is part of efforts to intensify psychotherapy research in the field of psychosis in Germany, to contribute to the international discussion on psychotherapy in psychotic disorders, and to help implement psychotherapy in routine care.

Here we see an aim to justify implementation of CBT for psychosis in routine care in Germany. We have seen something similar in repeated efforts by German investigators to demonstrate that long-term psychodynamic psychotherapy is more effective than shorter, less expensive treatments, despite the lack of credible data [ ].

And so, if the results do not contribute to getting psychotherapy implemented in routine care in Germany, do they get buried?

Science & Politics of CBT for Psychosis

The rollout of a CBT for psychosis study published in The Lancet made strong claims in a BBC article and an audio promotion.

The attention attracted critical scrutiny that these claims couldn’t sustain. After controversy on Twitter, the BBC headline was changed to a more modest claim.

Criticism mounted:

  • The study retained fewer participants receiving CBT at the end of the study than there were authors.
  • The comparison treatment was ill-defined, but for some patients meant no treatment because they were kicked out of routine care for refusing medication.
  • A substantial proportion of patients assigned to CBT began taking antipsychotic medication by the end of the study.
  • There was no evidence that the response to CBT was comparable to that achieved with antipsychotic medication alone in clinical trials.
  • There was no evidence that less intensive, nonspecific supportive therapy would not have achieved the same results as CBT.

And the authors ended up conceding in a letter to the editor that their trial had been registered after data collection had started and it did not produce evidence of equivalence to antipsychotic medication.

In a blog post containing the actual video of his presentation before the British Psychological Society, Keith Laws declares:

Politics have overcome the science in CBT for psychosis

Recently the British Psychological Society invited me to give a public talk entitled CBT: The Science & Politics behind CBT for Psychosis. In this talk, which was filmed…, I highlight the unquestionable bias shown by the National Institute of Clinical Excellence (NICE) committee  (CG178) in their advocacy of CBT for psychosis.

The bias is not concealed, but unashamedly served-up by NICE as a dish that is high in ‘evidence-substitute’, uses data that are past their sell-by-date and is topped-off with some nicely picked cherries. I raise the question of whether committees – with such obvious vested interests – should be advocating on mental health interventions.

I present findings from our own recent meta-analysis (Jauhar et al 2014) showing that three-quarters of all RCTs have failed to find any reduction in the symptoms of psychosis following CBT. I also outline how trials which have used non-blind assessment of outcomes have inflated effect sizes by up to 600%. Finally, I give examples where CBT may have adverse consequences – both for the negative symptoms of psychosis and for relapse rates.

A pair of well-conducted and transparently reported Cochrane reviews suggest there is little evidence for the efficacy of CBT for psychosis (*)

These and other slides are available in a slideshow presentation of a talk I gave at the Edinburgh Royal Infirmary.

Yet, even after having to be tempered in the face of criticism, the original claims of the Morrison study get echoed in the antipsychiatry Understanding Psychosis:

“Other forms of therapy can also be helpful, but so far it is CBTp that has been most intensively researched. There have now been several meta-analyses (studies using a statistical technique that allows findings from various trials to be averaged out) looking at its effectiveness. Although they each yield slightly different estimates, there is general consensus that on average, people gain around as much benefit from CBT as they do from taking psychiatric medication.”

Such misinformation can confuse patients making difficult decisions about whether to accept antipsychotic medication.

If the results from the missing CBT for psychosis study became available…

If the Klingberg study were available and integrated with existing data, it would be one of the largest and highest-quality studies, and it would provide insight into any advantage of CBT for psychosis. For those who can be convinced by data, a null finding from a large study, added to mostly small and methodologically unsophisticated studies, could be decisive.
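How much could one large trial shift a pooled estimate? The minimal Python sketch below assumes a fixed-effect, inverse-variance meta-analysis. That is a simplification; a random-effects model would give the large trial less weight. All effect sizes and standard errors are hypothetical, not data from any real trial. The large null trial's standard error of 0.11 is roughly what 165 patients per arm would give for a standardized mean difference near zero.

```python
import math


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of (effect, standard_error) pairs.
    Returns the pooled effect and its 95% confidence interval."""
    weights = [1 / se ** 2 for _, se in studies]
    total_w = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, studies)) / total_w
    se_pooled = math.sqrt(1 / total_w)
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)


# Hypothetical small trials: (standardized mean difference, standard error)
small_trials = [(0.6, 0.35), (0.4, 0.30), (0.5, 0.40), (0.3, 0.25), (0.7, 0.45)]
# Hypothetical large trial with a null result
large_null_trial = (0.0, 0.11)
all_trials = small_trials + [large_null_trial]

without_large = pool_fixed_effect(small_trials)
with_large = pool_fixed_effect(all_trials)
large_weight_share = (1 / 0.11 ** 2) / sum(1 / se ** 2 for _, se in all_trials)

for label, (d, (lo, hi)) in [("small trials only", without_large),
                             ("plus large null trial", with_large)]:
    print(f"{label:>22}: d = {d:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
print(f"weight carried by the large trial: {large_weight_share:.0%}")
```

With these made-up numbers, the small trials alone yield a confidence interval that excludes zero. Adding the single large null trial, which carries most of the weight, pulls the estimate down until the interval includes zero.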

A recent meta-analysis of CBT for prevention of psychosis by Hutton and Taylor includes six studies and mentions the trial protocol in passing:

Two recent trials of CBT for established psychosis provide examples of good practice for reporting harms (Klingberg et al. 2010, 2012) and CONSORT (Consolidated Standards of Reporting Trials) provide a sensible set of recommendations (Ioannidis et al. 2004).

Yet, it does not indicate why the trial is missing, and the trial is not included in its list of completed but unpublished studies. This is despite the protocol describing a study considerably larger than any of the studies that were included.

To communicate a better sense of the potential importance of this missing study, and perhaps to place more pressure on the investigators to release its results, I would suggest that future meta-analyses state:

The protocol for Klingberg et al. Cognitive behavioural treatment for persistent positive symptoms in psychotic disorders indicates that recruitment was completed in 2008. No publications have resulted. Emails to Professor Klingberg about the status of the study failed to get a response. If the study were completed consistent with its protocol, it would represent one of the largest studies of CBT for psychosis ever and one of the few with a fair comparison between CBT and supportive therapy. Inclusion of the results could potentially substantially modify the conclusions of the current meta-analysis.

 

Cognitive behavior and psychodynamic therapy no better than routine care for anorexia.

Putting a positive spin on an ambitious, multisite trial doomed from the start.

I announced in my last blog post that this one would be about bad meta-analyses of weak data used to secure insurance reimbursement for long-term psychotherapy. But that is postponed so that I can give timely coverage to the report in Lancet of results of the Anorexia Nervosa Treatment of OutPatients (ANTOP) randomized clinical trial (RCT). The trial, proclaimed the largest ever of its kind, compared cognitive behavior therapy, focal psychodynamic therapy, and “optimized” routine care for the treatment of anorexia.

This post is an apt sequel to my last one. I had expressed a lot of enthusiasm for an RCT comparing cognitive behavior therapy (CBT) to psychoanalytic therapy for bulimia. I was impressed with its design and execution and the balanced competing investigator allegiances. The article’s reporting was transparent, substantially reducing risk of bias and allowing a clear message. You will not often see me being so positive about a piece of research in this blog, although I did note some limitations.

Hands down, CBT did better than psychoanalytic therapy in reducing binging and purging, despite there being only five months of cognitive therapy and two years of psychoanalysis. This difference seems to be a matter of psychoanalysis doing quite poorly, rather than CBT doing so well.

However, on my Facebook wall, Ioana Cristea, a known contrarian and evidence-based skeptic like myself, posted a comment about my blog:

Did you see there’s also a recent very similar Lancet study for anorexia? With different results, of course.

She was referring to

Zipfel, Stephan, Beate Wild, Gaby Groß, Hans-Christoph Friederich, Martin Teufel, Dieter Schellberg, Katrin E. Giel et al. Focal psychodynamic therapy, cognitive behaviour therapy, and optimised treatment as usual in outpatients with anorexia nervosa (ANTOP study): randomised controlled trial. The Lancet (2013).

The abstract of the Lancet article is available here, but the full text is behind a pay wall. Fortunately, the registered trial protocol for the study is available open access here. You can at least get the details of what the authors said they were going to do, ahead of doing it.

For an exceedingly quick read, try the press release for the trial here, entitled

Largest therapy trial worldwide: Psychotherapy treats anorexia effectively.

Or see an example of thorough, uncritical churnalism of this press release in the media here.

What we are told about anorexia

Media portrayals of anorexia often show the extreme self-starvation associated with the severe disorder, but this study recruited women with mild to moderate anorexia.

The introduction of the ANTOP article states

  • Anorexia nervosa is associated with serious medical morbidity and pronounced psychosocial comorbidity.
  • It has the highest mortality rate of all mental disorders, and relapse happens frequently.
  • The course of illness is very often chronic, particularly if left untreated.

A sobering accompanying editorial in Lancet stated

The evidence base for anorexia nervosa treatment is meagre [1, 2, 3] considering the extent to which this disorder erodes quality of life and takes far too many lives prematurely [4]. But clinical trials for anorexia nervosa are difficult to conduct, attributable partly to some patients’ deep ambivalence about recovery, the challenging task of offering a treatment designed to remove symptoms that patients desperately cling to, the fairly low prevalence of the disorder, and high dropout rates. The combination of high dropout and low treatment acceptability has led some researchers to suggest that we pause large-scale clinical trials for anorexia nervosa until we resolve these fundamental obstacles.

What the authors claim that this study found.

The press release states

“Overall, the two new types of therapy demonstrated advantages compared to the optimized therapy as usual,” said Prof. Zipfel. “At the end of our study, focal psychodynamic therapy proved to be the most successful method, while the specific cognitive behavior therapy resulted in more rapid weight gain.”

And the abstract

At the end of treatment, BMI [body mass index] had increased in all study groups (focal psychodynamic therapy 0·73 kg/m², enhanced cognitive behavior therapy 0·93 kg/m², optimised treatment as usual 0·69 kg/m²); no differences were noted between groups (mean difference between focal psychodynamic therapy and enhanced cognitive behaviour therapy –0·45, 95% CI –0·96 to 0·07; focal psychodynamic therapy vs optimised treatment as usual –0·14, –0·68 to 0·39; enhanced cognitive behaviour therapy vs optimised treatment as usual 0·30, –0·22 to 0·83). At 12-month follow-up, the mean gain in BMI had risen further (1·64 kg/m², 1·30 kg/m², and 1·22 kg/m², respectively), but no differences between groups were recorded (0·10, –0·56 to 0·76; 0·25, –0·45 to 0·95; 0·15, –0·54 to 0·83, respectively). No serious adverse events attributable to weight loss or trial participation were recorded.

How can we understand results presented in terms of changes in BMI?

You can find out more about BMI [body mass index] here and you can calculate your own here. But note that BMI is a controversial measure, does not directly assess body fat, and is not particularly accurate for people who are large- or small-framed or fit or athletic.

These patients had to have been quite underweight to be diagnosed with anorexia, so how much weight did they gain as a result of treatment? The authors should have given us the results in numbers that make sense to most people.

The young adult women in the study averaged 46.7 kg or 102.7 pounds at the beginning of the study. I had to do some calculations to translate the changes in BMI reported by the authors into weight, assuming that the women were of an average height of 5’6”, like other German women.

Four months after beginning the 10 month treatment, the women had gained an average of 5 pounds and at 12 months after the end of treatment (so 22 months after beginning treatment), they had gained another 3 pounds.

On average, the women participating in the trial were still underweight 22 months after the trial’s start and would have still qualified for entering the trial, at least according to the weight criterion.
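The conversion above is simple arithmetic. Because BMI = weight (kg) / height (m)², a change in BMI multiplied by height squared gives the change in kilograms. The minimal Python sketch below makes the same assumption of a 5’6” height and uses the end-of-treatment and 12-month follow-up gains quoted in the abstract. Averaging the three arms without weighting them by size is a further simplifying assumption. For "still underweight," it uses the conventional WHO cut-off, a BMI below 18.5.

```python
KG_PER_LB = 0.45359237
M_PER_INCH = 0.0254


def bmi_change_to_lb(delta_bmi, height_m):
    """BMI = weight_kg / height_m**2, so a BMI change times height squared
    is a weight change in kg; convert that to pounds."""
    return delta_bmi * height_m ** 2 / KG_PER_LB


height_m = 66 * M_PER_INCH  # assumed height of 5'6"
baseline_kg = 46.7
baseline_bmi = baseline_kg / height_m ** 2

# Mean BMI gains (kg/m^2) from the abstract, in the order: focal psychodynamic
# therapy, enhanced CBT, optimised treatment as usual
gain_end_of_treatment = [0.73, 0.93, 0.69]
gain_12m_follow_up = [1.64, 1.30, 1.22]

# Unweighted average across the three arms (ignores differing arm sizes)
avg_end = sum(gain_end_of_treatment) / 3
avg_follow_up = sum(gain_12m_follow_up) / 3

lb_end = bmi_change_to_lb(avg_end, height_m)
lb_follow_up = bmi_change_to_lb(avg_follow_up, height_m)
bmi_at_follow_up = baseline_bmi + avg_follow_up

print(f"baseline BMI: {baseline_bmi:.1f}")
print(f"gain by end of treatment: {lb_end:.1f} lb")
print(f"further gain by 12-month follow-up: {lb_follow_up - lb_end:.1f} lb")
print(f"BMI at follow-up: {bmi_at_follow_up:.1f} (WHO underweight cut-off: 18.5)")
```

Under these assumptions, the baseline BMI comes out around 16.6. The gain is just under 5 pounds by the end of treatment, with a further 3 to 4 pounds by follow-up, leaving an average BMI of about 18.0, still below 18.5.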

How the authors explain their results.

Optimised treatment as usual, combining psychotherapy and structured care from a family doctor, should be regarded as solid baseline treatment for adult outpatients with anorexia nervosa. Focal psychodynamic therapy proved advantageous in terms of recovery at 12-month follow-up, and enhanced cognitive behaviour therapy was more effective with respect to speed of weight gain and improvements in eating disorder psychopathology. Long-term outcome data will be helpful to further adapt and improve these novel manual-based treatment approaches.

My assessment after reading this article numerous times and consulting supplementary material:

  • Anorexia was treated with two therapies, each compared to an unusual control condition termed “optimized” treatment as usual. When the study was over and even in follow-up, anorexia won and the treatments lost.
  • In interpreting these results, note that the study involved a sample of young women with mostly only mild to moderate anorexia. Only a little more than half had full syndrome anorexia.
  • In post hoc “exploratory analyses,” the authors emphasized a single measure at a single time point that favored focal psychodynamic therapy, despite null findings with most other standard measures at all time points.
  • The authors expressed their outcomes in within-group effect sizes. This unusual choice exaggerates the results, particularly when comparisons are made to the effect sizes reported for other studies.
  • Put another way, results of the trial were very likely spun, starting with the abstract, and continuing in the results and press release.
  • The study demonstrates the difficulty of treating anorexia and of evaluating this treatment. Only modest increases in body weight were obtained despite intensive treatment. Interpretation of what happened is complicated by high rates of dropping out of therapy and loss to follow-up, and by the necessity of inpatient stays and other supplementary treatment.
  • The optimized routine care condition involved ill-described, uncontrolled psychotherapeutic and medical interventions. Little sense can be made of this clinical trial except that availability of manualized treatment proved no better (or no worse), and none of the treatments, including routine care, did particularly well.
  • The study is best understood as testing the effectiveness of treating anorexia in some highly unusual circumstances in Germany, not as an efficacy trial testing the strength of the two treatments. Results are not generalizable to either of the psychotherapies administered by themselves in other contexts.
  • The study probably demonstrates that meaningful RCTs of the treatment of anorexia cannot be conducted in Germany with generalizable results.
  • Maybe this trial is just another demonstration that we do not know enough to undertake a randomized study of the treatment of anorexia that would yield readily interpretable findings.

Sad, sad, sad. So you can stop here if all you wanted was my evaluation. Or you can continue reading to find out how I arrived at it and whether you agree.

Outcomes for the trial: why am I so unimpressed?

On average, the women were still underweight at follow-up, despite having had only mild to moderate anorexia at the start of the study. The sample was quite heterogeneous at baseline. We don’t know how much of the modest weight gain, and of the minority of women who were considered “fully recovered,” represents small improvements in women who started with a higher BMI and milder, subsyndromal anorexia at baseline.

Any discussion of outcomes has to take into account the substantial number of women not completing treatment and lost to follow up.

Missing data can be estimated with fancy imputation techniques. But they are not magic, and they involve assumptions that cannot be tested when patients are lost to follow-up in such small treatment groups. And yet, we need some way to account for all patients initially entering a clinical trial (termed an intent-to-treat analysis) for valid, generalizable results. So, we cannot ignore these problems and simply concentrate on the women who completed treatment and remained available.

And then there is the issue of nonstudy treatment, including inpatient stays. The study has no way of taking them into account, other than reporting them. Inpatient stays could have occurred for different reasons across the three conditions. We cannot determine if the inpatient stays contributed to the results that were observed or maybe interfered with the outpatient treatment. But here too, we cannot simply ignore this factor.

We certainly cannot assume that failures to complete treatment, loss to follow-up, and the need for inpatient stays are randomly distributed between groups. We cannot convincingly rule out that some combination of these factors is decisive for the results that were obtained.

The spinning of the trial in favor of focal psychodynamic treatment.

The preregistration of the trial listed BMI at the end of treatment as the primary outcome. That means the investigators staked any claims about the trial on this outcome at this time point. There were no overall differences between groups.

The preregistration also listed numerous secondary outcomes: the Morgan-Russell criteria; general Axis I psychopathology (SCID I); eating disorder-specific psychopathology (SIAB-Ex; Eating Disorder Inventory-2); severity of depressive comorbidity (PHQ-9); and quality of life according to the SF-36. Not all of these outcomes are reported in the article, and of those that are, almost none differ significantly between groups at any time point.

The authors' failure to designate one or two of these variables a priori (ahead of time) leaves them free to pick the best-looking result and engage in hypothesizing after the results are known, or HARKing. We do not actually know what was done, but there is a high risk of bias.

We should in general be highly skeptical about post hoc exploratory analyses of variables that were not pre-designated as outcomes for a clinical trial, in either primary or secondary analyses.

In table 3 of their article, the investigators present within-group effect sizes that portray the manualized treatments as doing impressively well.


Yet, as I will discuss in forthcoming blog posts, within-group effect sizes are highly misleading compared to the between-group effect sizes usually reported. Within-group effect sizes attribute all changes that occurred in a particular group to the effects of the intervention. That includes claiming credit for nonspecific effects common across conditions, as well as for any improvement due to positive expectations or to patients bouncing back after having enrolled in the study at a particularly bad time.

The conventional strategy is to provide between-group effect sizes comparing a treatment to what was obtained in the other groups. This preserves the effects of randomization and makes use of what can be learned from comparison/control conditions. Treatments do not have effect sizes, but comparisons of treatments do.

As an example, we do not pay much attention to the within-group effect size for antidepressants in a particular study, because these numbers do not take into account how the antidepressants did relative to a pill placebo condition. Presumably the pill placebo is chemically inert, but it comes with the same clinician attention, positive expectations, and support as the antidepressant. Once these factors shared by the antidepressant and pill placebo conditions are taken into account, the effect size for the antidepressant shrinks.

Take a look at weight gain by the end of the 12-month follow-up among patients receiving focal psychodynamic therapy. In Table 3, the within-group effect size for focal psychodynamic therapy is a whopping 1.6, p < .001. But the more appropriate between-group effect size comparing focal psychodynamic therapy with treatment as usual, shown in Table 2, is a wimpy, nonsignificant .13, p < .48 (!)
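To make the contrast concrete, here is a minimal sketch with made-up BMI summaries, not the ANTOP data. It assumes one common convention for the within-group effect size (pre-to-post change divided by the baseline SD; other conventions divide by the SD of the change scores) and uses Cohen's d with a pooled SD for the between-group comparison. The arm sizes and SDs are hypothetical.

```python
from math import sqrt

def within_group_d(pre_mean, post_mean, pre_sd):
    # Credits the intervention with ALL change in the group, including
    # nonspecific effects, expectations, and regression to the mean.
    return (post_mean - pre_mean) / pre_sd

def between_group_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    # Cohen's d: difference between arms at endpoint over the pooled SD,
    # so change shared by both arms cancels out.
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var)

# Hypothetical summaries: both arms start at the same BMI and both improve.
therapy_pre, therapy_post = 16.6, 18.2
tau_pre, tau_post = 16.6, 18.04
baseline_sd, endpoint_sd, n_per_arm = 1.0, 1.2, 80

print(within_group_d(therapy_pre, therapy_post, baseline_sd))  # about 1.6
print(within_group_d(tau_pre, tau_post, baseline_sd))          # about 1.44
print(between_group_d(therapy_post, endpoint_sd, n_per_arm,
                      tau_post, endpoint_sd, n_per_arm))       # about 0.13
```

Both arms look impressive on their own; only the comparison between them shows how little the therapy adds beyond what the comparison condition also delivered.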

An extraordinary “optimized” treatment as usual.

Descriptions in the preregistered study protocol, press releases, and methods section of the article do not do justice to the "optimized" treatment as usual. The methods section did not rouse any particular concern in me. It described patients assigned to treatment as usual being provided with a list of psychotherapists specializing in the treatment of eating disorders, and their family physicians assuming an active role in monitoring and providing actual treatment. This does not sound particularly unusual for a comparison/control group. After all, it would be unethical to leave women with such a serious, threatening disorder on a waiting list just to allow a comparison.

But then I came across this shocking description of the optimized routine care condition in the discussion section:

Under close guidance from their family doctor—eg, regular weight monitoring and essential blood testing—and with close supervision of their respective study centre, patients allocated optimised treatment as usual were able to choose their favourite treatment approach and setting (intensity, inpatient, day patient, or outpatient treatment) and their therapist, in accordance with German national treatment guidelines for anorexia nervosa.11 Moreover, comparisons of applied dosage and intensity of treatment showed that all patients— irrespective of treatment allocation—averaged a similar number of outpatient sessions over the course of the treatment and follow-up periods (about 40 sessions). These data partly reflect an important achievement of the German health-care system: that access to psychotherapy treatment is covered by insurance. However, patients allocated optimised treatment as usual needed additional inpatient treatment more frequently (41%) than either those assigned focal psychodynamic therapy (23%) or enhanced cognitive behaviour therapy (35%).

OMG! I have never seen such intensive treatment-as-usual in a clinical trial. I doubt anything like this treatment would be available elsewhere in the world as standard care.

This description raises a number of disturbing questions about the trial:

Why would any German women with anorexia enroll in the clinical trial? Although a desire to contribute to science is sometimes a factor, the main reasons patients enter clinical trials are that they think they will get better treatment, and perhaps that they can get a preferred treatment they cannot get elsewhere. But if this is the situation of routine care in Germany, why would eligible women not simply remain in routine care without the complications of being in a clinical trial?

At one point, the authors claim that 1% of the population has a diagnosis of anorexia. That represents a lot of women. Yet, they were only able to randomize 242 patients, despite a massive two-year effort to recruit patients involving 10 German departments of psychotherapy and psychosomatic medicine. It appears that a very small minority of the available patients were recruited, raising questions about the representativeness of the sample.

Patients had little incentive to remain in the clinical trial rather than dropping out. Dropping out of the clinical trial would still give them access to free treatment–without the hassle of remaining in the trial.

In a more typical trial, patients assigned to treatment as usual are provided with a list of referrals. Often few bother to follow up on a referral or remain in treatment, so we can assume that the treatment-as-usual condition usually represents minimal treatment, a comparison against which more active, free treatment can readily show a positive outcome. In the United States, patients enrolling in clinical trials often either do not have health insurance or can find only providers who will not accept their insurance for the treatment they want. Patients in the United States enter a clinical trial just to get the possibility of treatment, circumstances very different from those in Germany.

Overall, no matter which condition patients were assigned to, all received about the same amount of outpatient psychotherapy, about 40 sessions. How could these authors have expected to find a substantial difference between the two manualized treatments and routine care of this intensity? Under these conditions, differences between groups of the magnitude they assumed in calculating sample sizes would be truly extraordinary.
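For a rough sense of scale, here is a minimal sketch of the standard normal-approximation formula for patients needed per arm in a two-arm comparison of means. The 0.13 is the between-group effect size from Table 2 quoted earlier; 0.5 is a purely hypothetical "medium" effect for contrast. The two-sided alpha of .05 and 80% power are conventional assumptions, not the trial's stated design parameters.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided, two-sample comparison
    of means with standardized effect size d (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

for d in (0.5, 0.13):
    print(d, n_per_group(d))
# 0.5 -> 63 per arm; 0.13 -> 929 per arm
```

With 242 patients split across three arms, roughly 80 per arm, a difference as small as .13 was never realistically detectable.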

A lot of attention and support is provided in 40 sessions of such psychotherapy, making it difficult to detect the specific effects of the manualized therapies above and beyond the attention and support they provide.

In short, the manualized treatments were doomed to null findings in comparison to treatment as usual. The only thing really unexpected about this trial is that all three conditions did so poorly.

What is a comparison/control group supposed to accomplish, anyway?

Investigators undertaking randomized controlled trials of psychotherapies know about the necessity of comparison/control groups, but they generally understand less well the implications of their choice of a comparison/control group.

Most evidence-based treatments earned their status by proving superior in a clinical trial to a control group such as a wait list or no treatment at all. Such comparisons provide the backbone of claims for evidence-based treatments, but they are not particularly informative. It may simply be that many manualized, structured treatments are no better than other active treatments when patients receive a similar intensity of treatment, positive expectations, and attention and support.

Some investigators, however, are less interested in establishing the efficacy of treatments than in demonstrating the effectiveness of particular treatments over what is already being done in the community. Effectiveness studies typically find small effects, compared with the larger effects obtained in straw-man comparisons between a treatment and the weak outcomes observed in wait-list or no-treatment control groups.

But even if their intention is to conduct an effectiveness study, investigators need to better describe the nature of treatment as usual if they are to make reasonable generalizations to other clinical and health system contexts.

We know that the optimized treatment as usual was exceptionally intensive, but we have no idea from the published article what it entailed, except lots of treatment, as much as was provided in the active treatment conditions. It may even be that some of the women assigned to optimized treatment as usual found therapists providing much the same treatment.

Again, if all of the conditions had done well in terms of improved patient outcomes, then we could have concluded that introducing manualized treatment does not accomplish much in Germany at least. But my assessment is that none of the three conditions did particularly well.

The optimized treatment as usual is intensive but not evidence-based. In my last blog post, we viewed a situation in which less treatment proved better than more. Maybe the availability of intensive and extensive treatment discourages women from taking responsibility for their health-threatening condition. They do not improve simply because they can always get more treatment. That is only a hypothesis, but Germany is spending lots of money assuming that it is incorrect.

Why Germany may not be the best place to do a clinical trial for treatment of anorexia.

Germany may not be an appropriate place to do a clinical trial of treatment for anorexia for a number of reasons:

  • The ready availability of free, intensive treatment prevents recruitment of a large, representative sample of women with anorexia to a potentially burdensome clinical trial.
  • There is less incentive for women to remain in the study once they are enrolled because they can always drop out and get the same intensity of treatment elsewhere.
  • The control/comparison group of “optimized” treatment as usual complied with the extensive requirements of the German national treatment guidelines for anorexia nervosa. But these standards are not evidence-based and appear to have produced mediocre outcomes in at least this trial.
  • Treatment as usual available to everyone is not necessarily effective, but it precludes detecting incremental improvements obtained by less intensive, but focused treatments.

Prasad and Ioannidis have recently called attention to the pervasiveness of non-evidence-based medical treatments and practice guidelines that are neither cost-effective nor shown to ensure good patient outcomes or avoid unnecessary risks. They propose de-implementing such unproven practices, but acknowledge that cultural values, vested interests, and politics are likely to interfere with efforts to subject established but unproven practices to empirical test.

Surely, that would be the case in any effort to de-implement guidelines for the treatment of anorexia in Germany.

The potentially life-threatening nature of anorexia may discourage any temporary suspension of treatment guidelines until evidence can be obtained. But we need only look to the example of similarly life-threatening cancers, where improved treatments came about only when investigators were able to suspend well-established but unproven treatments and conduct randomized trials.

It would be unethical to assign women with anorexia to a wait-list control or no treatment when free treatment is readily available in the community. So there may be no option other than to use treatment as usual as the control condition.

If so, a finding of no differences between groups is almost certainly guaranteed. And given the poor performance of routine care observed in this study, such results would not represent the familiar Dodo Bird Verdict for comparisons between psychotherapies, in which all of the treatments are winners and all get prizes.

Why it may be premature to conduct randomized trials of treatment of anorexia.

This may well be, as the investigators proclaim in their press release, the largest ever RCT of treatment for anorexia. But it is very difficult to make sense of it, other than to conclude that no treatments, including treatment as usual, had particularly impressive results.

For me, this study highlights the enormous barriers to conducting a well-controlled RCT for anorexia with patients representative of those who would seek treatment in real-world clinical contexts.

There are unsolved issues of patient dropout and retention for follow-up that seriously threaten the integrity of any results. We just do not know how to recruit a representative sample of patients with anorexia and keep them in therapy and around for follow-up.

Maybe we should ask women with anorexia what they think. Maybe we could enlist some of them to help design a randomized trial, or at least a treatment from which investigators could retain enough of them to conduct a randomized trial.

I am not sure how we would otherwise get this understanding without involving women with anorexia in the design of treatment in future clinical trials.

There are unsolved issues of medical surveillance and co-treatment confounding. Anorexia poses physical health threats, notably those associated with sudden weight loss. But we do not have evidence-based protocols in place for standardizing surveillance and decision-making.

Before we undertake massive randomized trials such as ANTOP, we need to get information to set basic parameters from nonrandomized but nonetheless informative small-scale studies. Obviously the investigators in this study could not even estimate effect sizes in order to set sample sizes.

Well, having presumably made it through this long read, what do you think?