Putting a positive spin on an ambitious, multisite trial doomed from the start.
I announced in my last blog post that this one would be about bad meta-analyses of weak data used to secure insurance reimbursement for long-term psychotherapy. But that is postponed so that I can give timely coverage to the report in Lancet of results of the Anorexia Nervosa Treatment of OutPatients (ANTOP) randomized clinical trial (RCT). The trial, proclaimed the largest ever of its kind, compared cognitive behavior therapy, focal psychodynamic therapy, and “optimized” routine care for the treatment of anorexia.
This post is an apt sequel to my last one. I had expressed a lot of enthusiasm for an RCT comparing cognitive behavior therapy (CBT) to psychoanalytic therapy for bulimia. I was impressed with its design and execution and the balanced competing investigator allegiances. The article’s reporting was transparent, substantially reducing risk of bias and allowing a clear message. You will not see me very often being so positive about a piece of research in this blog, although I did note some limitations.
Hands down, CBT did better than psychoanalytic therapy in reducing binging and purging, despite there being only five months of cognitive therapy and two years of psychoanalysis. This difference seems to be a matter of psychoanalysis doing quite poorly, and not of CBT doing so well.
Did you see there’s also a recent very similar Lancet study for anorexia? With different results, of course.
She was referring to
Zipfel, Stephan, Beate Wild, Gaby Groß, Hans-Christoph Friederich, Martin Teufel, Dieter Schellberg, Katrin E. Giel et al. Focal psychodynamic therapy, cognitive behaviour therapy, and optimised treatment as usual in outpatients with anorexia nervosa (ANTOP study): randomised controlled trial. The Lancet (2013).
The abstract of the Lancet article is available here, but the full text is behind a paywall. Fortunately, the registered trial protocol for the study is available open access here. You can at least get the details of what the authors said they were going to do, ahead of doing it.
For an exceedingly quick read, try the press release for the trial here, entitled
Largest therapy trial worldwide: Psychotherapy treats anorexia effectively.
What we are told about anorexia
The introduction of the ANTOP article states
- Anorexia nervosa is associated with serious medical morbidity and pronounced psychosocial comorbidity.
- It has the highest mortality rate of all mental disorders, and relapse happens frequently.
- The course of illness is very often chronic, particularly if left untreated.
A sobering accompanying editorial in Lancet stated
The evidence base for anorexia nervosa treatment is meagre considering the extent to which this disorder erodes quality of life and takes far too many lives prematurely. But clinical trials for anorexia nervosa are difficult to conduct, attributable partly to some patients’ deep ambivalence about recovery, the challenging task of offering a treatment designed to remove symptoms that patients desperately cling to, the fairly low prevalence of the disorder, and high dropout rates. The combination of high dropout and low treatment acceptability has led some researchers to suggest that we pause large-scale clinical trials for anorexia nervosa until we resolve these fundamental obstacles.
What the authors claim that this study found.
The press release states
“Overall, the two new types of therapy demonstrated advantages compared to the optimized therapy as usual,” said Prof. Zipfel. “At the end of our study, focal psychodynamic therapy proved to be the most successful method, while the specific cognitive behavior therapy resulted in more rapid weight gain.”
And the abstract
At the end of treatment, BMI [body mass index] had increased in all study groups (focal psychodynamic therapy 0·73 kg/m², enhanced cognitive behavior therapy 0·93 kg/m², optimised treatment as usual 0·69 kg/m²); no differences were noted between groups (mean difference between focal psychodynamic therapy and enhanced cognitive behaviour therapy –0·45, 95% CI –0·96 to 0·07; focal psychodynamic therapy vs optimised treatment as usual –0·14, –0·68 to 0·39; enhanced cognitive behaviour therapy vs optimised treatment as usual –0·30, –0·22 to 0·83). At 12-month follow-up, the mean gain in BMI had risen further (1·64 kg/m², 1·30 kg/m², and 1·22 kg/m², respectively), but no differences between groups were recorded (0·10, –0·56 to 0·76; 0·25, –0·45 to 0·95; 0·15, –0·54 to 0·83, respectively). No serious adverse events attributable to weight loss or trial participation were recorded.
How can we understand results presented in terms of changes in BMI?
You can find out more about BMI [body mass index] here and you can calculate your own here. But note that BMI is a controversial measure, does not directly assess body fat, and is not particularly accurate for people who are large- or small-framed or fit or athletic.
These patients had to have been quite underweight to be diagnosed with anorexia, and so how much weight did they gain as a result of treatment? The authors should have given us the results in numbers that make sense to most people.
The young adult women in the study averaged 46.7 kg, or about 103 pounds, at the beginning of the study. To translate the changes in BMI reported by these authors into pounds, I had to do some calculations, assuming the women were of average height for German women, 5’6”.
Four months after beginning the 10-month treatment, the women had gained an average of 5 pounds and at 12 months after the end of treatment (so 22 months after beginning treatment), they had gained another 3 pounds.
On average, the women participating in the trial were still underweight 22 months after the trial’s start and would have still qualified for entering the trial, at least according to the weight criterion.
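The conversion behind these numbers can be sketched in a few lines. This is only a rough illustration: it takes the BMI gains from the abstract, applies my assumed height of 5’6” to every woman, and uses the general WHO underweight cutoff of a BMI of 18.5, which is my choice of reference point and not a figure taken from the article. The function and variable names are made up for the example.

```python
KG_PER_LB = 0.45359237
ASSUMED_HEIGHT_M = 1.6764   # 5'6", an assumption, not a trial measurement
BASELINE_WEIGHT_KG = 46.7   # average starting weight quoted above
UNDERWEIGHT_BMI = 18.5      # general WHO cutoff, used here only as a reference


def bmi(weight_kg, height_m=ASSUMED_HEIGHT_M):
    """Body mass index: weight in kg divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_change_to_pounds(delta_bmi, height_m=ASSUMED_HEIGHT_M):
    """Convert a change in BMI to pounds for a person of the given height."""
    return delta_bmi * height_m ** 2 / KG_PER_LB


# BMI gains reported in the abstract (kg/m^2), by group
gains = {
    "focal psychodynamic": {"end of treatment": 0.73, "12-month follow-up": 1.64},
    "enhanced CBT": {"end of treatment": 0.93, "12-month follow-up": 1.30},
    "optimised TAU": {"end of treatment": 0.69, "12-month follow-up": 1.22},
}

baseline_bmi = bmi(BASELINE_WEIGHT_KG)
for group, by_time in gains.items():
    for time_point, delta in by_time.items():
        pounds = bmi_change_to_pounds(delta)
        still_under = baseline_bmi + delta < UNDERWEIGHT_BMI
        print(f"{group}, {time_point}: +{pounds:.1f} lb, "
              f"still underweight on average: {still_under}")
```

Under these assumptions the gains work out to roughly 4 to 6 pounds at the end of treatment and roughly 7.5 to 10 pounds at follow-up, and the average woman in every group stays below the underweight cutoff.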
How the authors explain their results.
Optimised treatment as usual, combining psychotherapy and structured care from a family doctor, should be regarded as solid baseline treatment for adult outpatients with anorexia nervosa. Focal psychodynamic therapy proved advantageous in terms of recovery at 12-month follow-up, and enhanced cognitive behaviour therapy was more effective with respect to speed of weight gain and improvements in eating disorder psychopathology. Long-term outcome data will be helpful to further adapt and improve these novel manual-based treatment approaches.
My assessment after reading this article numerous times and consulting supplementary material:
- Anorexia was treated with two therapies, each compared to an unusual control condition termed “optimized” treatment as usual. When the study was over and even in follow-up, anorexia won and the treatments lost.
- In interpreting these results, note that the study involved a sample of young women with mostly only mild to moderate anorexia. Only a little more than half had full syndrome anorexia.
- In post hoc “exploratory analyses,” the authors emphasized a single measure at a single time point that favored focal psychodynamic therapy, despite null findings with most other standard measures at all time points.
- The authors expressed their outcomes in within-group effect sizes. This is an unusual approach that exaggerates results, particularly when comparisons are made to the effect sizes reported for other studies.
- Put another way, results of the trial were very likely spun, starting with the abstract, and continuing in the results and press release.
- The study demonstrates the difficulty of treating anorexia and of evaluating this treatment. Only modest increases in body weight were obtained despite intensive treatment. Interpretation of what happened is complicated by high rates of dropping out of therapy and loss to follow-up, and the necessity of inpatient stays and other supplementary treatment.
- The optimized routine care condition involved ill-described, uncontrolled psychotherapeutic and medical interventions. Little sense can be made of this clinical trial except that availability of manualized treatment proved no better (or no worse), and none of the treatments, including routine care, did particularly well.
- The study is best understood as testing the effectiveness of treating anorexia in some highly unusual circumstances in Germany, not as an efficacy trial testing the strength of the two treatments. Results are not generalizable to either of the psychotherapies administered by themselves in other contexts.
- The study probably demonstrates that meaningful RCTs of the treatment of anorexia cannot be conducted in Germany with generalizable results.
- Maybe this trial is just another demonstration that we do not know enough to undertake a randomized study of the treatment of anorexia that would yield readily interpretable findings.
Sad, sad, sad. So you can stop here if all you wanted was my evaluation. Or you can continue reading to find out how I arrived at it and whether you agree.
Outcomes for the trial: why am I so unimpressed?
On average, the women were still underweight at follow-up, despite having had only mild to moderate anorexia at the start of the study. The sample was quite heterogeneous at baseline. We don’t know how much of the modest weight gain and the minority of women who were considered “fully recovered” represent small improvements in women starting with higher BMI and milder, subsyndromal anorexia at baseline.
Any discussion of outcomes has to take into account the substantial number of women not completing treatment and lost to follow-up.
Missing data can be estimated with fancy imputation techniques. But they are not magic, and involve some assumptions that cannot be tested with loss of patients to follow-up in such small treatment groups. And yet, we need some way to account for all patients initially entering a clinical trial (termed an intent-to-treat analysis) for valid, generalizable results. So, we cannot ignore these problems and simply concentrate on the women completing treatment and remaining available.
And then there is the issue of nonstudy treatment, including inpatient stays. The study has no way of taking them into account, other than reporting them. Inpatient stays could have occurred for different reasons across the three conditions. We cannot determine if the inpatient stays contributed to the results that were observed or maybe interfered with the outpatient treatment. But here too, we cannot simply ignore this factor.
We certainly cannot assume that failures to complete treatment, loss to follow-up, and the necessity of inpatient stays are randomly distributed between groups. We cannot convincingly rule out that some combination of these factors is decisive for the results that were obtained.
The spinning of the trial in favor of focal psychodynamic treatment.
The preregistration of the trial listed BMI at the end of treatment as the primary outcome. That means the investigators staked any claims about the trial on this outcome at this time point. There were no overall differences.
The preregistration also listed numerous secondary outcomes: the Morgan-Russell criteria; general axis I psychopathology (SCID I); eating disorder specific psychopathology (SIAB-Ex; Eating Disorder Inventory-2); severity of depressive comorbidity (PHQ-9); and quality of life according to the SF-36. Not all of these outcomes are reported in the article, and for the ones that are reported, almost all are not significantly different at any time point.
The authors’ failure to designate one or two of these variables a priori (ahead of time) sets them up for picking the best result and hypothesizing after the results are known, or HARKing. We do not actually know what was done, but there is a high risk of bias.
We should in general be highly skeptical about post hoc exploratory analyses of variables that were not pre-designated as outcomes for a clinical trial, in either primary or secondary analyses.
In table 3 of their article, the investigators present within-group effect sizes that portray the manualized treatments as doing impressively well.
Yet, as I will discuss in forthcoming blog posts, within-group effect sizes are highly misleading compared to the usually reported between-group effect sizes. These within-group effect sizes attribute all changes that occurred in a particular group to the effects of the intervention. That includes claiming credit for nonspecific effects common across conditions, as well as any improvement due to positive expectations or patients bouncing back after having enrolled in the study at a particularly bad time.
The conventional strategy is to provide between-group effect sizes comparing a treatment to what was obtained in the other groups. This preserves the effects of randomization and makes use of what can be learned from comparison/control conditions. Treatments do not have effect sizes, but comparisons of treatments do.
As an example, we do not pay much attention to the within-group effect size for antidepressants in a particular study, because these numbers do not take into account how the antidepressants did relative to a pill placebo condition. Presumably the pill placebo is chemically inert, but it is provided with the same attention from clinicians, positive expectations, and support that come with the antidepressant. Once these factors shared by both the antidepressant and pill placebo conditions are taken into account, the effect size for antidepressant decreases.
Take a look at weight gain by the end of the 12-month follow-up among patients receiving focal psychodynamic therapy. In Table 3, the within-group effect size for focal psychodynamic therapy is a whopping 1.6, p < .001. But the more appropriate between-group effect size for comparing focal psychodynamic therapy to treatment as usual, shown in Table 2, is a wimpy, nonsignificant .13, p < .48 (!)
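To see how the same data can yield a huge within-group effect size and a negligible between-group one, here is a minimal sketch. The numbers are entirely made up for illustration and are not the ANTOP data. I am also assuming one common way of computing each statistic (mean change divided by the baseline standard deviation for the within-group figure, and Cohen's d on follow-up scores for the between-group figure); the article may have used different formulas.

```python
from statistics import mean, stdev


def within_group_d(baseline, followup):
    """Mean pre-to-post change divided by the baseline SD (one common variant)."""
    gains = [post - pre for pre, post in zip(baseline, followup)]
    return mean(gains) / stdev(baseline)


def between_group_d(followup_a, followup_b):
    """Cohen's d: difference in follow-up means divided by the pooled SD."""
    n_a, n_b = len(followup_a), len(followup_b)
    pooled_var = ((n_a - 1) * stdev(followup_a) ** 2
                  + (n_b - 1) * stdev(followup_b) ** 2) / (n_a + n_b - 2)
    return (mean(followup_a) - mean(followup_b)) / pooled_var ** 0.5


# Hypothetical BMI values: both groups start the same and both improve a lot.
baseline = [16.0, 16.5, 17.0, 17.5, 18.0]
treated_followup = [17.4, 18.3, 18.6, 19.2, 19.5]
control_followup = [17.3, 18.1, 18.5, 19.2, 19.4]

d_within_treated = within_group_d(baseline, treated_followup)
d_within_control = within_group_d(baseline, control_followup)
d_between = between_group_d(treated_followup, control_followup)

print(f"within-group d, treated: {d_within_treated:.2f}")
print(f"within-group d, control: {d_within_control:.2f}")
print(f"between-group d:         {d_between:.2f}")
```

In this toy example both groups show within-group effect sizes near 2, yet the between-group effect size is close to zero, because almost all of the change is shared by the two conditions.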
An extraordinary “optimized” treatment as usual.
Descriptions in the preregistered study protocol, press releases, and methods section of the article do not do justice to the “optimized” treatment as usual. The method section did not rouse particular concern from me. It described patients assigned to the treatment as usual being provided with a list of psychotherapists specializing in the treatment of eating disorders and their family physicians assuming an active role in monitoring and providing actual treatment. This does not sound particularly unusual for a comparison/control group. After all, it would be unethical to leave women with such a threatening, serious disorder on a waiting list just to allow a comparison.
But then I came across this shocking description of the optimized routine care condition in the discussion section:
Under close guidance from their family doctor—eg, regular weight monitoring and essential blood testing—and with close supervision of their respective study centre, patients allocated optimised treatment as usual were able to choose their favourite treatment approach and setting (intensity, inpatient, day patient, or outpatient treatment) and their therapist, in accordance with German national treatment guidelines for anorexia nervosa.11 Moreover, comparisons of applied dosage and intensity of treatment showed that all patients— irrespective of treatment allocation—averaged a similar number of outpatient sessions over the course of the treatment and follow-up periods (about 40 sessions). These data partly reflect an important achievement of the German health-care system: that access to psychotherapy treatment is covered by insurance. However, patients allocated optimised treatment as usual needed additional inpatient treatment more frequently (41%) than either those assigned focal psychodynamic therapy (23%) or enhanced cognitive behaviour therapy (35%).
OMG! I have never seen such intensive treatment-as-usual in a clinical trial. I doubt anything like this treatment would be available elsewhere in the world as standard care.
This description raises a number of disturbing questions about the trial:
Why would any German women with anorexia enroll in the clinical trial? Although a desire to contribute to science is sometimes a factor, the main reasons patients enter clinical trials are that they think they will get better treatment and maybe that they think they can get a preferred treatment which they cannot get elsewhere. But, if this is the situation of routine care in Germany, why would eligible women not just remain in routine care without the complications of being in a clinical trial?
At one point, the authors claim that 1% of the population has a diagnosis of anorexia. That represents a lot of women. Yet, they were only able to randomize 242 patients, despite a massive two-year effort to recruit patients involving 10 German departments of psychotherapy and psychosomatic medicine. It appears that a very small minority of the available patients were recruited, raising questions about the representativeness of the sample.
Patients had little incentive to remain in the clinical trial rather than dropping out. Dropping out of the clinical trial would still give them access to free treatment, without the hassle of remaining in the trial.
In a more typical trial, patients assigned to treatment as usual are provided with a list of referrals. Often few bother to complete a referral or remain in treatment, and so we can assume that the treatment-as-usual condition usually represents minimal treatment, a comparison against which more active, free treatment is likely to show a positive outcome. In the United States, patients enrolling in clinical trials often either do not have health insurance or can find only providers who will not accept what health insurance they have for the treatment they want. Patients in the United States enter a clinical trial just to get the possibility of treatment, very different circumstances than in Germany.
Overall, no matter what condition patients were assigned, all received about the same amount of outpatient psychotherapy, about 40 sessions. How could these authors have expected to find a substantial difference between the two manualized treatments and this intensity of routine care? Differences between groups of the magnitude they assumed in calculating sample sizes under these conditions would be truly extraordinary.
A lot of attention and support is provided in 40 sessions of such psychotherapy, making it difficult to detect the specific effects provided by the manualized therapies, above and beyond the attention and support they provide.
In short, the manualized treatments were doomed to null findings in comparison to treatment as usual. The only thing really unexpected about this trial is that all three conditions did so poorly.
What is a comparison/control group supposed to accomplish, anyway?
Investigators undertaking randomized controlled trials of psychotherapies know about the necessity of comparison/control groups, but they generally understand less well the implications of their choice of a comparison/control group.
Most evidence-based treatments earned their status by proving superior in a clinical trial to a control group such as wait list or no treatment at all. Such comparisons provide the backbone of claims of evidence-based treatments, but are not particularly informative. It may simply be that many manualized, structured treatments are no better than other active treatments when patients have similar intensity of treatment, positive expectations, and attention and support.
Some investigators, however, are less interested in establishing the efficacy of treatments than in demonstrating the effectiveness of particular treatments over what is already being done in the community. Effectiveness studies typically find smaller effects than those obtained in straw-man comparisons between treatments and the weak effects observed in control groups.
But even if their intention is to conduct an effectiveness study, investigators need to better describe the nature of treatment as usual, if they are to make reasonable generalizations to other clinical and health system contexts.
We know that the optimized treatment as usual was exceptionally intensive, but we have no idea from the published article what it entailed, except lots of treatment, as much as what was provided in the active treatment conditions. It may even be that some of the women assigned to optimized treatment obtained therapists providing much the same treatment.
Again, if all of the conditions had done well in terms of improved patient outcomes, then we could have concluded that introducing manualized treatment does not accomplish much in Germany at least. But my assessment is that none of the three conditions did particularly well.
The optimized treatment as usual is intensive but not evidence-based. In my last blog post, we viewed a situation in which less treatment proved better than more. Maybe the availability of intensive and extensive treatment discourages women from taking responsibility for their health-threatening condition. They do not improve, simply because they can always get more treatment. That is simply a hypothesis, but Germany is spending lots of money assuming that it is incorrect.
Why Germany may not be the best place to do a clinical trial for treatment of anorexia.
Germany may not be an appropriate place to do a clinical trial of treatment for anorexia for a number of reasons:
- The ready availability of free, intensive treatment prevents recruitment of a large, representative sample of women with anorexia to a potentially burdensome clinical trial.
- There is less incentive for women to remain in the study once they are enrolled because they can always drop out and get the same intensity of treatment elsewhere.
- The control/comparison group of “optimized” treatment as usual complied with the extensive requirements of the German national treatment guidelines for anorexia nervosa. But these standards are not evidence-based and appear to have produced mediocre outcomes in at least this trial.
- Treatment as usual available to everyone is not necessarily effective, but it precludes detecting incremental improvements obtained by less intensive, but focused treatments.
Prasad and Ioannidis have recently called attention to the pervasiveness of non-evidence-based medical treatments and practice guidelines that are neither cost-effective nor shown to ensure good patient outcomes or avoid unnecessary risks. They propose de-implementing such unproven practices, but acknowledge the likelihood that cultural values, vested interests, and politics can interfere with efforts to subject established but unproven practices to empirical test.
Surely, that would be the case in any effort to de-implement guidelines for the treatment of anorexia in Germany.
The potentially life-threatening nature of anorexia may discourage any temporary suspension of treatment guidelines until evidence can be obtained. But we need only look to the example of similarly life-threatening cancers, where improved treatments came about only when investigators were able to suspend well-established but unproven treatments and conduct randomized trials.
It would be unethical to assign women with anorexia to a waitlist control or no treatment when free treatment is readily available in the community. So, there may be no other option but to use treatment as usual as a control condition.
If so, a finding of no differences between groups is almost certainly guaranteed. And given the poor performance of routine care observed in this study, such results would not represent the familiar Dodo Bird Verdict for comparisons between psychotherapies, in which all of the treatments are winners and all get prizes.
Why it may be premature to conduct randomized trials of treatment of anorexia.
This may well be, as the investigators proclaim in their press release, the largest ever RCT of treatment for anorexia. But it is very difficult to make sense of it, other than to conclude that no treatments, including treatment as usual, had particularly impressive results.
For me, this study highlights the enormous barriers to conducting a well-controlled RCT for anorexia with patients representative of the kinds that would seek treatment in real-world clinical contexts.
There are unsolved issues of patient dropout and retention for follow-up that seriously threaten the integrity of any results. We just do not know how to recruit a representative sample of patients with anorexia and keep them in therapy and around for follow-up.
Maybe we should ask women with anorexia about what they think. Maybe we could enlist some of them to assist in the design of a randomized trial, or at least of a treatment in which investigators could retain sufficient numbers of them to conduct a randomized trial.
I am not sure how we would otherwise get this understanding without involving women with anorexia in the design of treatment in future clinical trials.
There are unsolved issues of medical surveillance and co-treatment confounding. Anorexia poses physical health problems, including the threats associated with sudden weight loss. But we do not have evidence-based protocols in place for standardizing surveillance and decision-making.
Before we undertake massive randomized trials such as ANTOP, we need to get information to set basic parameters from nonrandomized but nonetheless informative small-scale studies. Obviously the investigators in this study could not even estimate effect sizes in order to set sample sizes.
Well, presumably having made it through this long read, what do you think?