The COBRA study would have shown that homeopathy could be substituted for cognitive behavior therapy for depression

If The Lancet COBRA study had evaluated homeopathy rather than behavioural activation (BA), homeopathy would likely have similarly been found “non-inferior” to cognitive behavior therapy.

This is not an argument for treating depression with homeopathy, but an argument that the 14 talented authors of The Lancet COBRA study stacked the deck for their conclusion that BA could be substituted for CBT in routine care for depression without loss of effectiveness. Conflict of interest and catering to politics intruded on science in the COBRA trial.

If a study like COBRA produces strikingly similar results with treatments based on distinct mechanisms of change, one possibility is that background nonspecific factors are dominating the results. Insert homeopathy, a bogus treatment with strong nonspecific effects, in place of BA, and non-inferiority may well be shown.

Why homeopathy?

Homeopathy involves diluting a substance so thoroughly that no molecules are likely to be present in what is administered to patients. The original substance is first diluted to one part per 100 parts alcohol or distilled water. This process is repeated six times, ending up with the original material diluted by a factor of 100⁻⁶ = 10⁻¹².
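The dilution arithmetic can be sketched in a few lines of Python. The 6C figure matches the description above; the 30C potency is not mentioned in the text and is included only as an assumed illustration of why highly diluted preparations are expected to contain no molecules at all.

```python
# Back-of-the-envelope arithmetic for serial centesimal dilution,
# assuming a 1:100 dilution repeated at each step, as described above.
AVOGADRO = 6.022e23  # molecules in one mole

def dilution_factor(steps: int, ratio: float = 1 / 100) -> float:
    """Fraction of the original substance remaining after `steps` serial dilutions."""
    return ratio ** steps

six_c = dilution_factor(6)       # 100^-6 = 1e-12, as in the text
thirty_c = dilution_factor(30)   # 100^-30 = 1e-60, a common higher potency

# Even a full mole of starting material leaves an expected ~6e-37 molecules
# at 30C -- i.e., almost certainly none.
expected_molecules_30c = AVOGADRO * thirty_c

print(six_c, thirty_c, expected_molecules_30c)
```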

Nonetheless, a super diluted and essentially inert substance is selected and delivered within a complex ritual. The choice of the particular substance being diluted and the extent of its dilution is determined through detailed questioning of patients about their background, lifestyle, and personal functioning. Naïve and unskeptical patients are likely to perceive themselves as receiving exceptionally personalized medicine delivered by a sympathetic and caring provider. Homeopathy thus has potentially strong nonspecific (placebo) elements that may be lacking in the briefer and less attentive encounters of routine medical care.

As an academic editor at PLOS One, I received considerable criticism for having accepted a failed trial of homeopathy for depression. The study had been funded by the German government and had fallen miserably short in its efforts to recruit the intended sample size. I felt the study should be published in PLOS One to provide evidence on whether such seemingly worthless studies should be undertaken in the future. But I also wanted readers to have the opportunity to see what I had learned from the article about just how ritualized homeopathy can be, with a strong potential for placebo effects.

Presumably, readers would then be better equipped to evaluate claims made in other contexts that homeopathy has been shown effective in clinical trials with inadequate control of nonspecific effects. But that is also a pervasive problem in psychotherapy trials [1, 2] that do not have a suitable comparison/control group.

I have tried to reinforce this message in the evaluation of complementary or integrative treatments in Relaxing vs Stimulating Acupressure for Fatigue Among Breast Cancer Patients: Lessons to be Learned.

The Lancet COBRA study

The Lancet COBRA study has received extraordinary promotion as evidence for the cost-effectiveness of substituting behavioural activation therapy (BA) delivered by minimally trained professionals for cognitive behaviour therapy (CBT) for depression. The study  is serving as the basis for proposals to cut costs in the UK National Health Service by replacing more expensive clinical psychologists with less trained and experienced providers.

Coached by the Science Media Centre, the authors of The Lancet study focused our attention on their finding of no inferiority of BA to CBT. They are distracting us from the more important question of whether either treatment had any advantage over nonspecific interventions in the unusual context in which they were evaluated.

The editorial accompanying the COBRA study suggests that BA involves a simple message delivered by providers with very little training:

“Life will inevitably throw obstacles at you, and you will feel down. When you do, stay active. Do not quit. I will help you get active again.”

I encourage readers to stop and think how depressed persons suffering substantial impairment, including reduced ability to experience pleasure, would respond to such suggestions. It sounds all too much like the “Snap out of it, Debbie” they may have already heard from people around them or in their own self-blame.

Snap out of it, Debbie (from South Park)

 BA by any other name…

Actually, this kind of activation is routinely provided in primary care in some countries as a first-stage treatment in a stepped care approach to depression.

In such a system, when emergent mild to moderate depressive symptoms are uncovered in a primary medical care setting, providers are encouraged neither to initiate an active treatment nor even make a formal psychiatric diagnosis of a condition that could prove self-limiting with a brief passage of time. Rather, providers are encouraged to defer diagnosis and schedule a follow-up appointment. This is more than simple watchful waiting. Until the next appointment, providers encourage patients to undertake some guided self-help, including engagement in pleasant activities of their choice, much as apparently done in the BA condition in the COBRA study. Increasingly, they may encourage Internet-based therapy.

In a few parts of the UK, general practitioners may refer patients to a green gym.

green gym

It’s now appreciated that to have any effectiveness, such prescriptions have to be made in a relationship of supportive accountability. For patients to adhere adequately to such prescriptions and not feel they are simply being dismissed by the provider and sent away, they need a sense that the prescription occurs within the context of a relationship with someone who cares whether they carry out and benefit from it.

Used in this way, this BA component of stepped care could possibly be part of reducing unnecessary medication and the need for more intensive treatment. However, evaluation of cost effectiveness is complicated by the need for a support structure in which treatment can be monitored, including any antidepressant medication that is subsequently prescribed. Otherwise, the needs of a substantial number of patients needing more intensive, quality care for depression would be neglected.

The shortcomings of COBRA as an evaluation of BA in context

COBRA does not provide an evaluation of a system offering BA to the large pool of patients who do not require more intensive treatment, one in which they would also receive appropriate, timely evaluation and onward referral.

It is in the nature of mild to moderate depressive symptoms presented in primary care, especially when patients are not specifically seeking mental health treatment, that the threshold for a formal diagnosis of major depression is often met with only the minimum five required symptoms, or just one more. Diagnoses are of necessity unreliable, in part because judgments of whether particular symptoms meet a minimal threshold of severity are unreliable. After a brief passage of time and in the absence of formal treatment, a substantial proportion of patients will no longer meet diagnostic criteria.

COBRA also does not evaluate BA versus CBT in the more select population that participates in clinical trials of treatment for depression. Sir David Goldberg is credited  with first describing the filters that operate on the pathway of patients from presenting a complex combination of problems in living and psychiatric symptoms in primary medical care to treatment in specialty settings.

Results of the COBRA study cannot be meaningfully integrated into the existing literature concerning BA as a component of stepped care or treatment for depression that is sufficient in itself.

More recently, I reviewed The Lancet COBRA study in detail, highlighting how one of the most ambitious and heavily promoted psychotherapy studies ever conducted was uninformative. The authors’ claim that it would be wise to substitute BA delivered by minimally trained providers for cognitive behavior therapy delivered by clinical psychologists was unwarranted.

I refer readers to that blog post for further elaboration of some points I will be making here. For instance, some readers might want to refresh their sense of how a noninferiority trial differs from a conventional comparison of two treatments.

Risk of bias in noninferiority trials

 Published reports of clinical trials are notoriously unreliable and biased in terms of the authors’ favored conclusions.

With the typical evaluation of an active treatment versus a control condition, the risk of bias is that reported results will favor the active treatment. However, the issue of bias in a noninferiority trial is more complex. The investigators’ interest is in demonstrating that, within certain limits, there are no significant differences between two treatments. Yet, although it is not always tested directly, the intention is to show that this lack of difference is due to both treatments being effective, rather than ineffective.

In COBRA, the authors’ clear intention was to show that less expensive BA was not inferior to CBT, with the assumption that both were effective. Biases can emerge from building in features of the design, analysis, and interpretation of the study that minimize differences between these two treatments. But bias can also arise from a study design in which nonspecific effects are distributed across interventions so that any difference in active ingredients is obscured by shared features of the circumstances in which the interventions are delivered. As in Alice in Wonderland [https://en.wikipedia.org/wiki/Dodo_bird_verdict], the race is rigged so that almost everybody can get a prize.

Why COBRA could have shown almost any treatment with nonspecific effects was noninferior to CBT for depression

1. The investigators chose a population and a recruitment strategy that increased the likelihood that patients participating in the trial would get better with the minimal support and contact available in either of the two conditions – BA versus CBT.

The recruited patients were not actively seeking treatment. They were identified from GPs’ records as having had a diagnosis of depression, but were required not to be currently in psychotherapy.

GP recording of a diagnosis of depression has poor concordance with a formal, structured interview-based diagnosis, with considerable overdiagnosis and overtreatment.

A recent Dutch study found that persons meeting interview-based criteria for major depression in the community who do not have a past history of treatment mostly are not found to be depressed upon re-interview.

To be eligible for participation in the study, the patients also had to meet criteria for major depression in a semi-structured research interview (the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition [SCID]). Diagnoses with the SCID obtained under these circumstances are also likely to include a considerable proportion of false positives.

A dirty secret, from someone who has supervised thousands of SCID interviews of medical patients: the developers of the SCID recognized that it yielded a lot of false positives and inflated rates of disorder among patients who are not seeking mental health care.

They attempted to compensate by requiring that respondents not only endorse symptoms, but indicate that the symptoms are a source of impairment. This is the so-called clinical significance criterion. Respondents automatically meet the criterion if they are seeking mental health treatment. Those who are not seeking treatment are asked directly whether the symptoms impair them. This is a particularly unvalidated aspect of the SCID, and such patients typically do not endorse their symptoms as a source of impairment.

When we asked breast cancer patients who otherwise met criteria for depression with the SCID whether the depressive symptoms impaired them, they uniformly said something like ‘No, my cancer impairs me.’ When we conducted a systematic study of the clinical significance criterion, we found that whether or not it was endorsed substantially affected individual and overall rates of diagnosis. Robert Spitzer, who developed the SCID interview along with his wife Janet Williams, conceded to me in a symposium that application of the clinical significance criterion was a failure.

What is the relevance to the COBRA study? I would wager that the authors, like most investigators who use the SCID, did not inquire about the clinical significance criterion, and as a result they had a lot of false positives.

The population being sampled in the recruitment strategy used in COBRA is likely to yield a sample unrepresentative of patients participating in the usual trials of psychotherapy and medication for depression.

2. Most patients participating in COBRA reported already receiving antidepressants at baseline; their adherence and follow-up are unknown, but likely to have been inadequate.

Notoriously, patients receiving a prescription for an antidepressant in primary care actually take the medication inconsistently and for only a short time, if at all. They receive inadequate follow-up and reassessment. Their depression outcomes may actually be poorer than for patients receiving a pill placebo in the context of a clinical trial, where there is blinding and a high degree of positive expectations, attention and support.

Studies, including one by an author of the COBRA study, suggest that augmenting adequately managed antidepressant treatment with psychotherapy is unlikely to improve outcomes.

We’re stumbling upon one of the messier features of COBRA. Most patients had already been prescribed medication at baseline, but their adherence and follow-up are left unreported, and are likely to have been poor. The prescription is likely to have been made up to two years before baseline.

It would not be cost-effective to introduce psychotherapy to such a sample without reassessing whether they were adequately receiving medication. Such a sample would also be highly susceptible to nonspecific interventions providing positive expectations, support, and attention that they are not receiving in their antidepressant treatment. There are multiple ways in which nonspecific effects could improve outcomes – perhaps by improving adherence, but perhaps because of the healing effects of support on mild depressive symptoms.

3. The COBRA authors’ way of dealing with co-treatment with antidepressants blocked readers’ ability to independently evaluate main effects and interactions with BA versus CBT.

The authors used antidepressant treatment as a stratification factor, ensuring that the 70% of patients receiving antidepressants were evenly distributed between the BA and CBT conditions. This strategy made it more difficult to separate out the effects of antidepressants. The problem is compounded by the authors’ failure to provide subgroup analyses based on whether patients had received an antidepressant prescription, as well as their failure to describe the extent to which patients received management of their antidepressants at baseline or during active psychotherapy and follow-up. The authors incorporated data concerning the cost of medication into their economic analyses, but did not report the data in a way that could be scrutinized.

I anticipate requesting these data from the authors to find out more, although they have not responded to my previous query concerning anomalies in the reporting of how long since patients had first received a prescription for antidepressants.

4. The 12-month assessment designated as the primary outcome capitalized on natural recovery patterns, unreliability of initial diagnosis, and simple regression to the mean.

Depression identified in the community and in primary care patient populations is variable in its course, but typically resolves within nine months. Assessing primary outcomes at 12 months increases the likelihood that effects of active ingredients of the two treatments would be lost in a natural recovery process.
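A toy calculation makes the point. Assuming, purely for illustration, that untreated episode durations follow an exponential distribution with the nine-month median mentioned above (the distributional shape is my assumption, not a finding from the study):

```python
import math

# Toy model: if untreated depressive episodes resolved with a median
# duration of nine months (exponential durations assumed purely for
# illustration), how many would be over by the 12-month assessment?
MEDIAN_MONTHS = 9.0
rate = math.log(2) / MEDIAN_MONTHS  # exponential rate matching the median

def resolved_by(t_months: float) -> float:
    """Probability an episode has ended by time t under this toy model."""
    return 1.0 - math.exp(-rate * t_months)

print(f"{resolved_by(9):.0%}")   # 50% by the median, by construction
print(f"{resolved_by(12):.0%}")  # ~60% by the 12-month primary outcome
```

On this sketch, well over half of episodes would be expected to have resolved by the primary outcome point with no treatment at all, which any treatment on offer would then appear to have achieved.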

5. The intensity of treatment offered in the study (an allowable 20 sessions, plus up to four additional booster sessions) exceeded what is available in typical psychotherapy trials and exceeded what was actually accessed by patients.

Allowing this level of intensity of treatment generates a lot of noise in any interpretation of the resulting data. Offering so much treatment encourages patients to drop out, with the loss of their follow-up data. We can’t tell whether they dropped out because they had received what they perceived as sufficient treatment or because they were dissatisfied. This intensity of offered treatment also reduces generalizability to what actually occurs in routine care and complicates comparing and contrasting results of the COBRA study with the existing literature.

 6. The low rate of actual uptake of psychotherapy and retention of patients for follow-up present serious problems for interpreting the results of the COBRA study.

Intent-to-treat analyses with imputation of missing data are simply voodoo statistics when there is so much missing data. Imputation and other multivariate techniques assume that data are missing at random, but as I just noted, this is an improbable assumption. [Readers who want to learn more about intent-to-treat versus per-protocol analyses are referred back to my previous blog post.]

The authors cite past literature in their choice to emphasize the per-protocol analyses. That means they based their interpretation of the results on 135 of the 221 patients originally assigned to BA and 151 of the 219 patients originally assigned to CBT. This is a messy approach and precludes generalizing back to original assignment. That is why intent-to-treat analyses are emphasized in conventional evaluations of psychotherapy.
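A small simulation with invented numbers (not the COBRA data) illustrates why completer-only, per-protocol estimates mislead when dropout is related to outcome:

```python
import random

random.seed(0)

# Toy simulation: outcome-dependent dropout biases a per-protocol
# (completers-only) recovery estimate upward. All rates are made up.
N = 10_000
recovered = [random.random() < 0.5 for _ in range(N)]  # true 50% recovery

def stays_in_study(r: bool) -> bool:
    # Assume non-recovered (dissatisfied) patients drop out three times
    # as often -- data missing NOT at random.
    drop_prob = 0.15 if r else 0.45
    return random.random() > drop_prob

completers = [r for r in recovered if stays_in_study(r)]

true_rate = sum(recovered) / N
per_protocol_rate = sum(completers) / len(completers)
print(f"true recovery rate:    {true_rate:.2f}")
print(f"per-protocol estimate: {per_protocol_rate:.2f}")  # biased upward
```

Under these assumed dropout rates, the completers-only estimate runs roughly ten percentage points above the true rate, and no imputation method that assumes data are missing at random can repair it.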

A skeptical view of what will be done with the COBRA data

The authors’ clear intent was to produce data supporting an argument that more expensive clinical psychologists could be replaced by less trained clinicians providing a simplified treatment. The striking lack of differences between BA and CBT might be seen as strong evidence that BA could replace CBT. Yet I am suggesting that the striking lack of differences could also reflect features built into the design that swamped any differences and limited any generalizability to what would happen if all depressed patients were referred to BA delivered by clinicians with little training versus CBT. I’m arguing that homeopathy would have done as well.

BA is already being implemented in the UK and elsewhere as part of stepped care initiatives for depression. Inclusion of BA is inadequately evaluated, as is the overall strategy of stepped care. See here for an excellent review of stepped care initiatives and a tentative conclusion that they are moderately effective, but that many questions remain.

If the COBRA authors were most committed to improving the quality of depression care in the UK, they would either have designed their study as a fairer test of substituting BA for CBT or have tackled the more urgent task of rigorously evaluating whether stepped care initiatives work.

Years ago, collaborative care programs for depression were touted as reducing overall costs. These programs, which were found to be robustly effective in many contexts, involved placing depression managers in primary care to assist the GPs in improved monitoring and management of treatment. Often the most immediate and effective improvement was that patients got adequate follow-up, where previously they were simply being ignored. Collaborative care programs did not prove to be cheaper, which is not surprising, because better care is often more expensive than ineptly provided inadequate care.

We should be extremely skeptical of experienced investigators who claim that they demonstrate that they can cut costs and maintain quality with a wholesale reduction in the level of training of providers treating depression, a complex and heterogeneous disorder, especially when their expensive study fails to deal with this complexity and heterogeneity.

 

A skeptical look at The Lancet behavioural activation versus CBT for depression (COBRA) study

A skeptical look at:

Richards DA, Ekers D, McMillan D, Taylor RS, Byford S, Warren FC, Barrett B, Farrand PA, Gilbody S, Kuyken W, O’Mahen H, et al. Cost and Outcome of Behavioural Activation versus Cognitive Behavioural Therapy for Depression (COBRA): a randomised, controlled, non-inferiority trial. The Lancet. 2016 Jul 23.

 

All the Queen’s horses and all the Queen’s men (and a few women) can’t put a flawed depression trial back together again.

Were they working below their pay grade? The 14 authors of the study collectively have impressive expertise. They claim to have obtained extensive consultation in designing and implementing the trial. Yet they produced:

  • A study doomed from the start by serious methodological problems that precluded any scientifically valid and generalizable results.
  • Instead, tortured results that pander to policymakers seeking an illusory cheap fix.

 

Why the interests of persons with mental health problems are not served by translating the hype from a wasteful project into clinical practice and policy.

Maybe you were shocked and awed, as I was, by the publicity campaign mounted by The Lancet on behalf of a terribly flawed article in The Lancet Psychiatry about whether locked inpatient wards fail suicidal patients.

It was a minor league effort compared to the campaign orchestrated by the Science Media Centre for a recent article in The Lancet. The study concerned a noninferiority trial of behavioural activation (BA) versus cognitive behaviour therapy (CBT) for depression. The message echoing through social media without any critical response was that behavioural activation for depression delivered by minimally trained mental health workers was cheaper but just as effective as cognitive behavioural therapy delivered by clinical psychologists.

Reflecting the success of the campaign, the immediate reactions to the article are like nothing I have recently seen. Here are the published altmetrics for an article with an extraordinary overall score of 696 (!) as of August 24, 2016.

altmetrics

 

Here is the press release.

Here is the full article reporting the study, which nobody in the Twitter storm seems to have consulted.

some news coverage

Here are supplementary materials.

Here is the well-orchestrated, uncritical response from tweeters, UK academics and policy makers.


The Basics of the study

The study was an open-label, two-armed non-inferiority trial of behavioural activation therapy (BA) versus cognitive behavioural therapy (CBT) for depression, with no non-specific comparison/control treatment.

The primary outcome was depression symptoms measured with the self-report PHQ-9 at 12 months.

Delivery of both BA and CBT followed written manuals for a maximum of 20 60-minute sessions over 16 weeks, but with the option of four additional booster sessions if the patients wanted them. Receipt of eight sessions was considered an adequate exposure to the treatments.

The BA was delivered by

Junior mental health professionals —graduates trained to deliver guided self-help interventions, but with neither professional mental health qualifications nor formal training in psychological therapies—delivered an individually tailored programme re-engaging participants with positive environmental stimuli and developing depression management strategies.

CBT, in contrast, was delivered by

Professional or equivalently qualified psychotherapists, accredited as CBT therapists with the British Association of Behavioural and Cognitive Psychotherapy, with a postgraduate diploma in CBT.

The interpretation provided by the journal article:

Junior mental health workers with no professional training in psychological therapies can deliver behavioural activation, a simple psychological treatment, with no lesser effect than CBT has and at less cost. Effective psychological therapy for depression can be delivered without the need for costly and highly trained professionals.

A non-inferiority trial

An NHS website explains non-inferiority trials:

The objective of non-inferiority trials is to compare a novel treatment to an active treatment with a view of demonstrating that it is not clinically worse with regards to a specified endpoint. It is assumed that the comparator treatment has been established to have a significant clinical effect (against placebo). These trials are frequently used in situations where use of a superiority trial against a placebo control may be considered unethical.

I have previously critiqued [1, 2] noninferiority psychotherapy trials. I will simply reproduce a passage here:

Noninferiority trials (NIs) have a bad reputation. Consistent with a large literature, a recent systematic review of NI HIV trials  found the overall methodological quality to be poor, with a high risk of bias. The people who brought you CONSORT saw fit to develop special reporting standards for NIs  so that misuse of the design in the service of getting publishable results is more readily detected.

Basically, an NI RCT commits investigators and readers to accepting null results as support for a new treatment because it is no worse than an existing one. Suspicions are immediately raised as to why investigators might want to make that point.

Noninferiority trials are very popular among Pharma companies marketing rivals to popular medications. They use noninferiority trials to show that their brand is no worse than the already popular medication. But by not including a nonspecific control group, the trialists don’t bother to show that either of the medications is more effective than placebo under the conditions in which they were administered in these trials. Often, the medication dominating the market had achieved FDA approval for advertising with evidence of being only modestly effective. So, potatoes are noninferior to spuds.
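The dodgy arithmetic is easy to demonstrate. In this sketch, the response counts, sample sizes, and 10-percentage-point margin are all invented; the point is that two arms riding on the same nonspecific response can pass a noninferiority test that says nothing about whether either treatment works:

```python
import math

def noninferior(resp_new: int, resp_ref: int, n: int, margin: float) -> bool:
    """True if the lower bound of the 95% CI for the (new - ref) response-rate
    difference lies above -margin (simple normal approximation)."""
    p_new, p_ref = resp_new / n, resp_ref / n
    diff = p_new - p_ref
    se = math.sqrt(p_new * (1 - p_new) / n + p_ref * (1 - p_ref) / n)
    return diff - 1.96 * se > -margin

# Two inert "treatments" -- say, homeopathy vs. another nonspecific ritual --
# with near-identical response rates driven entirely by placebo factors.
print(noninferior(112, 110, n=200, margin=0.10))  # True: "noninferior"

# The same comparison in a smaller sample is merely inconclusive, not "inferior".
print(noninferior(28, 27, n=50, margin=0.10))     # False
```

Nothing in this calculation touches the question of whether either arm beats doing nothing, which is exactly the gap a noninferiority design leaves open.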

Compounding the problems of a noninferiority trial many times over

Let’s not dwell on this trial being a noninferiority trial, although I will return to the problem of knowing what would happen in the absence of either intervention or with a credible, nonspecific control group. Let’s focus instead on some other features of the trial that seriously compromised an already compromised trial.

Essentially, we will see that the investigators reached out to primary care patients who were mostly already receiving treatment with antidepressants, but unlikely to be receiving the support and positive expectations, or even adhering, as necessary to obtain benefit. By providing these nonspecific factors, any psychological intervention would be likely to prove effective in the short run.

The total amount of treatment offered substantially exceeded what is typically provided in clinical trials of CBT. However, uptake and actual receipt of treatment are likely to be low in a population recruited by outreach rather than actively seeking treatment. So, noise is being introduced by offering so much treatment.

A considerable proportion of primary care patients identified as depressed won’t accept treatment or won’t accept the full intensity available. However, without careful consideration of data that are probably not available for this trial, it will be ambiguous whether the amount of treatment received by particular patients represented dropping out prematurely or simply stopping once they were satisfied with the benefits they had received. Undoubtedly, failures to receive a minimal intensity of treatment, and the overall amounts of treatment received, are substantial and complexly determined, but nonrandom and differing between patients.

Dropping out of treatment is often associated with dropping out of a study – further data not being available for follow-up. These conditions set the stage for considerable challenges in analyzing and generalizing from whatever data are available. Clearly, the assumption of data being missing at random will be violated. But that is the key assumption required by multivariate statistical strategies that attempt to compensate for incomplete data.

12 months – the time point designated for assessment of primary outcomes – is likely to exceed the duration of a depressive episode in a primary care population, which is approximately 9 months. In the absence of a nonspecific active comparison/control or even a waitlist control group, recovery that would’ve occurred in the absence of treatment will be ascribed to the two active interventions being studied.

12 months is likely to exceed substantially the end of any treatment being received, and so the effects of any active treatments are likely to have dissipated. The design allowed for up to four booster sessions. However, access to booster sessions was not controlled: it was not assigned and cannot be assumed to be random. As we will see when we examine the CONSORT flowchart for the study, there was no increase in the number of patients receiving an adequate exposure to psychotherapy from 6 to 12 months. That likely indicates that most active treatment had ended within the first six months.

Focusing on 12-month outcomes, rather than six-month outcomes, increases the unreliability of any analyses because more 12-month outcomes will be missing than were available at six months.

Taken together, the excessively long 12-month follow-up designated as the primary outcome and the unusually large amount of treatment being offered, but not necessarily accepted, create substantial problems: missing data that cannot be compensated for by typical imputation and multivariate methods; difficulties interpreting results in terms of the amount of treatment actually received; and difficulties comparing the primary outcomes to those of typical psychotherapy trials, in which treatment is offered to patients actively seeking it.

The authors’ multivariate analysis strategy was inappropriate, given the amount of missing data and the violation of the assumption that data were missing at random.

Surely the more experienced of the 14 authors of The Lancet should have anticipated these problems and the low likelihood that this study would produce generalizable results.

Recruitment of patients

The article states:

 We recruited participants by searching the electronic case records of general practices and psychological therapy services for patients with depression, identifying potential participants from depression classification codes. Practices or services contacted patients to seek permission for researcher contact. The research team interviewed those that responded, provided detailed information on the study, took informed written consent, and assessed people for eligibility.

Eligibility criteria

Eligible participants were adults aged 18 years or older who met diagnostic criteria for major depressive disorder assessed by researchers using a standard clinical interview (Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition [SCID]). We excluded people at interview who were receiving psychological therapy, were alcohol or drug dependent, were acutely suicidal or had attempted suicide in the previous 2 months, or were cognitively impaired, or who had bipolar disorder or psychosis or psychotic symptoms.

Table 3 Patient Characteristics reveals a couple of things about co-treatment with antidepressants that must be taken into consideration in evaluating the design and interpreting results.


So the investigators did not wait for patients to refer themselves or to be referred by physicians to the trial; they reached out to them. Applying their exclusion criteria, the investigators obtained a sample most of which had been prescribed antidepressants, with no indication that the prescription had ended. The length of time the 70% of patients on antidepressants had been taking them was highly skewed, with a mean of 164 weeks and a median of 19. These figures strain credibility. I have reached out to the authors to ask whether there is an error in the table and await clarification.
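The implausibility of that mean/median pair can be illustrated with a toy calculation. The durations below are entirely hypothetical numbers of my own invention, not the trial's data; they simply show the kind of extreme tail needed to pair a median of 19 weeks with a mean near 164:

```python
import statistics

# Entirely hypothetical illustration, not the trial's data: to combine a
# median of 19 weeks with a mean of roughly 164 weeks, a heavily skewed
# tail of extreme values is required.
typical = [10, 15, 19, 25, 30] * 14   # 70 patients with plausible durations
extreme = [1850] * 6                  # 6 patients at ~35 years on antidepressants
weeks = typical + extreme

print(statistics.median(weeks))            # 19.0
print(round(statistics.mean(weeks), 1))    # 164.3
```

In other words, a median of 19 weeks can only be dragged to a mean of 164 weeks if some patients are recorded as having been on antidepressants for decades, which is exactly why the figures strain credibility.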

We cannot assume that patients whose records indicate they were prescribed an antidepressant were refilling their prescriptions at the time of recruitment, were faithfully adhering, or were even being monitored. The length of time since the initial prescription increases skepticism about whether there was adequate exposure to antidepressants at the time of recruitment to the study.

The inadequacy of antidepressant treatment in routine primary care

Refilling of first prescriptions of antidepressants in primary care, adherence, and monitoring and follow-up by providers are notoriously low.

Guideline-congruent treatment with antidepressants in the United States requires a five-week follow-up visit, which is only infrequently received in routine care.

Rates of improvement in depression associated with prescription of an antidepressant in routine care approximate those achieved with pill placebo in antidepressant trials. The reasons for this are complex, but they center on depression being of mild to moderate severity in primary care. Perhaps more important, the attention, positive expectations, and support provided in routine primary care are lower than what is provided in the blinded pill-placebo condition of clinical trials. In blinded trials, neither the provider nor the patient knows whether active medication or a pill placebo is being administered. The famous NIMH National Collaborative Study found, not surprisingly, that response in the pill-placebo condition was predicted by the quality of the therapeutic alliance between patient and provider.

In The Lancet study, readers are not provided with important baseline characteristics of the patients that are crucial to interpreting the results and their generalizability. We don’t know the baseline or subsequent adequacy of antidepressant treatment or the quality of the routine care being provided for it. Given that antidepressants are not the first-line treatment for mild to moderate depression, we don’t know why these patients were not receiving psychotherapy. We don’t even know whether the recruited patients were previously offered psychotherapy and with what uptake, except that they were not receiving it two months prior to recruitment.

There is a fascinating missing story about why these patients were not receiving psychotherapy at the start of the study and why and with what accuracy they were described as taking antidepressants.

Readers are not told what happened to antidepressant treatment during the trial. To what extent did patients who were not receiving antidepressants begin doing so? As a result of the more frequent contact and support provided in the psychotherapy, to what extent was there improvement in adherence, as well as in the ongoing support and attention from primary care providers?

Depression identified in primary care is a highly heterogeneous condition, more so than among patients recruited from treatment in specialty mental health settings. Much of the depression has only the minimum number of symptoms required for a diagnosis, or one more. The reliability of diagnosis is therefore lower than in specialty mental health settings. Much of the depression and anxiety disorders identified with semi-structured research instruments in populations not selected for having sought treatment resolves without formal intervention.

The investigators were using less than ideal methods to recruit patients from a population in which major depressive disorder is highly heterogeneous and subject to recovery in the absence of treatment by the time point designated for assessment of the primary outcome. They did not sufficiently address the problem of a high level of co-treatment having been prescribed long before the beginning of the study. They did not even assess the extent to which that prescribed treatment had patient adherence or provider monitoring and follow-up. The 12-month follow-up allowed many factors beyond the direct effects of the active ingredients of the two interventions to influence outcomes, in the absence of a control group.

decline in scores

Examination of a table presented in the supplementary materials suggests that most change occurred in the first six months after enrollment and little thereafter. We don’t know the extent to which there was any treatment beyond the first six months or what effect it had. In a population with clinically significant depression drawn from specialty care, some deterioration can be expected after withdrawal of active treatment. In a primary care population, such a graph could be produced in large part by the recovery from depression that would be observed in the absence of active treatment.

 

Cost-effectiveness analyses reported in the study address the wrong question. These analyses only considered the relative cost of the two active treatments, leaving unaddressed the more basic question of whether it is cost-effective to offer either treatment at this intensity. It might be more cost-effective to have a person with even less mental health training contact patients, inquire about adherence, side effects, and clinical outcomes, and prompt patients to accept another appointment with the GP if an algorithm indicates that would be appropriate.

The intensity of treatment being offered and received

The 20 sessions plus four booster sessions of psychotherapy offered in this trial are considerably more than the 12 to 16 sessions offered in the typical RCT for depression. Having more sessions available than is typical introduces complications. Results are not comparable to what is found in trials offering less treatment. In a primary care population not actively seeking psychotherapy for depression, there is the further complication that many patients will not access the full 20 sessions. There will be difficulties interpreting results in terms of intensity of treatment because of the heterogeneity of reasons for getting less treatment. Effectively, offering so much therapy to a group less inclined to accept psychotherapy introduces a lot of noise into attempts to make sense of the data, particularly when cost-effectiveness is an issue.

This excerpt from the CONSORT flowchart demonstrates the multiple problems associated with offering so much treatment to a population that was not actively seeking it and yet needing twelve-month data for interpreting the results of a trial.

CONSORT chart

The number of patients who had no data at six months increased by 12 months. There was apparently no increase in the number of patients receiving an adequate exposure to psychotherapy.

Why the interests of people with mental health problems are not served by translating the results claimed by these investigators into clinical practice

The UK National Health Service (NHS) is seriously underfunding mental health services. Patients referred for psychotherapy from primary care face waiting periods that often exceed the expected length of an episode of depression in primary care. Simply waiting for depression to remit without treatment is not necessarily cost-effective, because of the unneeded suffering, role impairment, and associated social and personal costs of an episode that persists. Moreover, there is a subgroup of depressed patients in primary care who need more intensive or different treatment. Guidelines recommending assessment after five weeks are not usually reflected in actual clinical practice.

There’s a desperate search for ways in which costs can be further reduced in the NHS. The Lancet study is being interpreted to suggest that more expensive clinical psychologists can be replaced by less expensive and less trained mental health workers. Uncritically and literally accepted, the message is that clinical psychologists working half-time on particular common clinical problems can be replaced by less expensive mental health workers achieving the same effects in the same amount of time.

The pragmatic translation of these claims into practice is to replace clinical psychologists with cheaper mental health workers. I don’t think it’s cynical to anticipate the NHS seizing upon an opportunity to reduce costs while ignoring effects on overall quality of care.

Care for the severely mentally ill in the NHS is already seriously compromised for other reasons. Patients experiencing an acute or chronic breakdown in psychological and social functioning often do not get the minimal support and contact time needed to avoid more intensive and costly interventions like hospitalization. I think it would be naïve to expect that the resources freed up by replacing a substantial portion of clinical psychologists with minimally trained mental health workers would be put into addressing the unmet needs of the severely mentally ill.

Although not always labeled as such, some form of BA is integral to stepped care approaches to depression in primary care. Before being prescribed antidepressants or being referred to psychotherapy, patients are encouraged to increase pleasant activities. In Scotland, they may even be given free movie passes for participating in cleanup of parks.

A stepped care approach is attractive, but evaluation of cost effectiveness is complicated by consideration of the need for adequate management of antidepressants for those patients who go on to that level of care.

If we are considering a sample of primary care patients mostly already receiving antidepressants, the relevant comparator is the introduction of a depression care manager.

Furthermore, there are issues in the adequacy of addressing the needs of patients who do not benefit from lower-intensity care. Is the lack of improvement with low levels of care adequately monitored and addressed? Is escalation in level of care adequately supported so that referrals are completed?

The results of The Lancet study don’t tell us very much about the adequacy of care the enrolled patients were receiving, whether BA is as effective as CBT as a stand-alone treatment, or whether nonspecific treatments would have done as well. We don’t even know whether patients assigned to a waitlist control would have shown as much improvement by 12 months, and we have reason to suspect that many would.

I’m sure that the administrators of the NHS are delighted with the positive reception of this study. I think it should be greeted with considerable skepticism. I am disappointed that the huge resources that went into conducting this study were not put into more informative and useful research.

I end with two questions for the 14 authors: Can you recognize the shortcomings of your study and of the interpretation you have offered? Are you at least a little uncomfortable with the uses to which these results will be put?


Study protocol violations, outcomes switching, adverse events misreporting: A peek under the hood

An extraordinary, must-read article is now available open access:

Jureidini, JN, Amsterdam, JD, McHenry, LB. The citalopram CIT-MD-18 pediatric depression trial: Deconstruction of medical ghostwriting, data mischaracterisation and academic malfeasance. International Journal of Risk & Safety in Medicine, vol. 28, no. 1, pp. 33-43, 2016

The authors had access to internal documents written with the belief that they would be left buried in corporate files. However, these documents became publicly available in a class-action product liability suit concerning the marketing of the antidepressant citalopram for treating children and adolescents.

Detailed evidence of ghostwriting by industry sponsors has considerable shock value. But this article has a broader usefulness: it allows us to peek in on the usually hidden processes by which null findings and substantial adverse events are spun into a positive report of the efficacy and safety of a treatment.

We are able to see behind the scenes how an already underspecified protocol was violated, primary and secondary outcomes were switched or dropped, and adverse events were suppressed in order to obtain the kind of results needed for a planned promotional effort and FDA approval for use of the drug in these populations.

We can see how subtle changes in analyses that would otherwise go unnoticed can have a profound impact on clinical and public policy.

In so many other situations, we are left only with our skepticism about results being too good to be true. We are usually unable to evaluate independently investigators’ claims because protocols are unavailable, deviations are not noted, analyses are conducted and reported without transparency. Importantly, there usually is no access to data that would be necessary for reanalysis.

The authors whose work is being criticized are among the most prestigious child psychiatrists in the world. The first author is currently President-elect of the American Academy of Child and Adolescent Psychiatry. The journal is among the top psychiatry journals in the world. A subscription is provided as part of membership in the American Psychiatric Association. Appearing in this journal is thus strategic, because its readership includes many practitioners and clinicians who will simply defer to academics publishing in a journal they respect, without inclination to look carefully.

Indeed, I encourage readers to go to the original article and read it before proceeding further in the blog. Witness the unmasking of how null findings were turned positive. Unless you had been alerted, would you have detected that something was amiss?

Some readers have participated in multisite trials other than as a lead investigator. I ask them to imagine that they had received the manuscript for review and approval and assumed it was vetted by the senior investigators – and only the senior investigators. Would they have subjected it to the scrutiny needed to detect data manipulation?

I similarly ask reviewers for scientific journals whether they would have detected something amiss. Would they have compared the manuscript to the study protocol? Note that when this article was published, they probably would have had to contact the authors or the pharmaceutical company to obtain the protocol.

Welcome to a rich treasure trove

Separate from the civil action that led to these documents and data being released, the federal government later filed criminal charges and false claims act allegations against Forest Laboratories. The pharmaceutical company pleaded guilty and accepted a $313 million fine.

Links to the filing and the announcement from the federal government of a settlement are available in a supplementary blog post at Quick Thoughts. That post also has rich links to the actual emails accessed by the authors, as well as to blog posts by John M. Nardo, MD, that detail the difficulties these authors had publishing the paper we are discussing.

Aside from his popular blog, Dr. Nardo is one of the authors of a reanalysis that was published in The BMJ of a related trial:

Le Noury J, Nardo JM, Healy D, Jureidini J, Raven M, Tufanaru C, Abi-Jaoude E. Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence. BMJ 2015; 351: h4320

My supplementary blog post contains links to discussions of that reanalysis of data obtained from GlaxoSmithKline, the original publication based on these data, the 30 Rapid Responses to the reanalysis in The BMJ, as well as federal criminal complaints and the guilty plea of GlaxoSmithKline.

With Dr. Nardo’s assistance, I’ve assembled a full set of materials that should be valuable in stimulating discussion among senior and junior investigators, as well as in student seminars. I agree with Dr. Nardo’s assessment:

I think it’s now our job to insure that all this dedicated work is rewarded with a wide readership, one that helps us move closer to putting this tawdry era behind us…John Mickey Nardo

The citalopram CIT-MD-18 pediatric depression trial

The original article that we will be discussing is:

Wagner KD, Robb AS, Findling RL, Jin J, Gutierrez MM, Heydorn WE. A randomized, placebo-controlled trial of citalopram for the treatment of major depression in children and adolescents. American Journal of Psychiatry. 2004 Jun 1;161(6):1079-83.

It reports:

An 8-week, randomized, double-blind, placebo-controlled study compared the safety and efficacy of citalopram with placebo in the treatment of children (ages 7–11) and adolescents (ages 12–17) with major depressive disorder.

The results and conclusion:

Results: The overall mean citalopram dose was approximately 24 mg/day. Mean Children’s Depression Rating Scale—Revised scores decreased significantly more from baseline in the citalopram treatment group than in the placebo treatment group, beginning at week 1 and continuing at every observation point to the end of the study (effect size=2.9). The difference in response rate at week 8 between placebo (24%) and citalopram (36%) also was statistically significant. Citalopram treatment was well tolerated. Rates of discontinuation due to adverse events were comparable in the placebo and citalopram groups (5.9% versus 5.6%, respectively). Rhinitis, nausea, and abdominal pain were the only adverse events to occur with a frequency exceeding 10% in either treatment group.

Conclusions: In this population of children and adolescents, treatment with citalopram reduced depressive symptoms to a significantly greater extent than placebo treatment and was well tolerated.

The article ends with an elaboration of what is said in the abstract:

In conclusion, citalopram treatment significantly improved depressive symptoms compared with placebo within 1 week in this population of children and adolescents. No serious adverse events were reported, and the rate of discontinuation due to adverse events among the citalopram-treated patients was comparable to that of placebo. These findings further support the use of citalopram in children and adolescents suffering from major depression.

The study protocol

The protocol for CIT-MD-18, IND Number 22,368, was obtained from Forest Laboratories. It was dated September 1, 1999 and amended April 8, 2002.

The primary outcome measure was the change from baseline to week 8 on the Children’s Depression Rating Scale-Revised (CDRS-R) total score.

Comparison between citalopram and placebo will be performed using three-way analysis of covariance (ANCOVA) with age group, treatment group and center as the three factors, and the baseline CDRS-R score as covariate.

The secondary outcome measures were the Clinical Global Impression severity and improvement subscales, Kiddie Schedule for Affective Disorders and Schizophrenia – depression module, and Children’s Global Assessment Scale.

Comparison between citalopram and placebo will be performed using the same approach as for the primary efficacy parameter. Two-way ANOVA will be used for CGI-I, since improvement relative to Baseline is inherent in the score.

 There was no formal power analysis but:

The primary efficacy variable is the change from baseline in CDRS-R score at Week 8.

Assuming an effect size (treatment group difference relative to pooled standard deviation) of 0.5, a sample size of 80 patients in each treatment group will provide at least 85% power at an alpha level of 0.05 (two-sided).
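That sample-size statement is easy to check. The sketch below is my own back-of-envelope normal-approximation power calculation for a two-sided, two-sample comparison, not any method specified in the protocol:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sample_power(d: float, n_per_group: int) -> float:
    """Approximate power of a two-sided two-sample test at alpha = 0.05
    for standardized effect size d (normal approximation)."""
    se = math.sqrt(2.0 / n_per_group)   # SE of the standardized mean difference
    z_crit = 1.959964                   # two-sided 5% critical value
    # Probability of exceeding the critical value under the alternative;
    # the opposite tail contributes negligibly and is ignored.
    return normal_cdf(d / se - z_crit)

power = two_sample_power(d=0.5, n_per_group=80)
print(round(power, 3))  # 0.885 — consistent with "at least 85% power"
```

So the stated assumptions do yield roughly 88% power, but, as noted above, the published article never reported this calculation or whether the study was powered for the child and adolescent subgroups separately.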

The deconstruction

 Selective reporting of subtle departures from the protocol could easily have been missed or simply excused as accidental and inconsequential, except that there was unrestricted access to communication within Forest Laboratories and to the data for reanalysis.

3.2 Data

The fact that Forest controlled the CIT-MD-18 manuscript production allowed for selection of efficacy results to create a favourable impression. The published Wagner et al. article concluded that citalopram produced a significantly greater reduction in depressive symptoms than placebo in this population of children and adolescents [10]. This conclusion was supported by claims that citalopram reduced the mean CDRS-R scores significantly more than placebo beginning at week 1 and at every week thereafter (effect size = 2.9); and that response rates at week 8 were significantly greater for citalopram (36% ) versus placebo (24% ). It was also claimed that there were comparable rates of tolerability and treatment discontinuation for adverse events (citalopram = 5.6% ; placebo = 5.9% ). Our analysis of these data and documents has led us to conclude that these claims were based on a combination of: misleading analysis of the primary outcome and implausible calculation of effect size; introduction of post hoc measures and failure to report negative secondary outcomes; and misleading analysis and reporting of adverse events.

3.2.1 Mischaracterisation of primary outcome

Contrary to the protocol, Forest’s final study report synopsis increased the study sample size by adding eight of nine subjects who, per protocol, should have been excluded because they were inadvertently dispensed unblinded study drug due to a packaging error [23]. The protocol stipulated: “Any patient for whom the blind has been broken will immediately be discontinued from the study and no further efficacy evaluations will be performed” [10]. Appendix Table 6 of the CIT-MD-18 Study Report [24] showed that Forest had performed a primary outcome calculation excluding these subjects (see our Fig. 2). This per protocol exclusion resulted in a ‘negative’ primary efficacy outcome.

Ultimately however, eight of the excluded subjects were added back into the analysis, turning the (albeit marginally) statistically insignificant outcome (p <  0.052) into a statistically significant outcome (p  <  0.038). Despite this change, there was still no clinically meaningful difference in symptom reduction between citalopram and placebo on the mean CDRS-R scores (Fig. 3).

The unblinding error was not reported in the published article.

Forest also failed to follow their protocol stipulated plan for analysis of age-by-treatment interaction. The primary outcome variable was the change in total CDRS-R score at week 8 for the entire citalopram versus placebo group, using a 3-way ANCOVA test of efficacy [24]. Although a significant efficacy value favouring citalopram was produced after including the unblinded subjects in the ANCOVA, this analysis resulted in an age-by-treatment interaction with no significant efficacy demonstrated in children. This important efficacy information was withheld from public scrutiny and was not presented in the published article. Nor did the published article report the power analysis used to determine the sample size, and no adequate description of this analysis was available in either the study protocol or the study report. Moreover, no indication was made in these study documents as to whether Forest originally intended to examine citalopram efficacy in children and adolescent subgroups separately or whether the study was powered to show citalopram efficacy in these subgroups. If so, then it would appear that Forest could not make a claim for efficacy in children (and possibly not even in adolescents). However, if Forest powered the study to make a claim for efficacy in the combined child plus adolescent group, this may have been invalidated as a result of the ANCOVA age-by-treatment interaction and would have shown that citalopram was not effective in children.

A further exaggeration of the effect of citalopram was to report “effect size on the primary outcome measure” of 2.9, which was extraordinary and not consistent with the primary data. This claim was questioned by Martin et al. who criticized the article for miscalculating effect size or using an unconventional calculation, which clouded “communication among investigators and across measures” [25]. The origin of the effect size calculation remained unclear even after Wagner et al. publicly acknowledged an error and stated that “With Cohens method, the effect size was 0.32,” [20] which is more typical of antidepressant trials. Moreover, we note that there was no reference to the calculation of effect size in the study protocol.
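For context, Cohen's d is simply the between-group difference in means divided by the pooled standard deviation, so a d of 2.9 would require the groups to be separated by almost three standard deviations. The numbers below are hypothetical values chosen for illustration (not the CIT-MD-18 data); they show the scale of inputs that yields a d near the corrected figure of 0.32:

```python
import math

def cohens_d(mean_diff: float, sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Cohen's d: between-group mean difference divided by the pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return mean_diff / pooled_sd

# Hypothetical, illustrative values, NOT the trial's data: a raw
# between-group difference of a few CDRS-R points against a pooled SD
# in the low teens gives a d in the 0.2-0.4 range typical of
# antidepressant trials.
d = cohens_d(mean_diff=4.0, sd1=12.0, sd2=13.0, n1=89, n2=85)
print(round(d, 2))  # 0.32
```

On this conventional calculation, no plausible combination of CDRS-R means and standard deviations comes anywhere near 2.9, which is why Martin et al. concluded the published figure was miscalculated or unconventional.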

3.2.2 Failure to publish negative secondary outcomes, and undeclared inclusion of Post Hoc Outcomes

Wagner et al. failed to publish two of the protocol-specified secondary outcomes, both of which were unfavourable to citalopram. While CGI-S and CGI-I were correctly reported in the published article as negative [10], (see p1081), the Kiddie Schedule for Affective Disorders and Schizophrenia-Present (depression module) and the Children’s Global Assessment Scale (CGAS) were not reported in either the methods or results sections of the published article.

In our view, the omission of secondary outcomes was no accident. On October 15, 2001, Ms. Prescott wrote: “Ive heard through the grapevine that not all the data look as great as the primary outcome data. For these reasons (speed and greater control) I think it makes sense to prepare a draft in-house that can then be provided to Karen Wagner (or whomever) for review and comments” (see Fig. 1). Subsequently, Forest’s Dr. Heydorn wrote on April 17, 2002: “The publications committee discussed target journals, and recommended that the paper be submitted to the American Journal of Psychiatry as a Brief Report. The rationale for this was the following: … As a Brief Report, we feel we can avoid mentioning the lack of statistically significant positive effects at week 8 or study termination for secondary endpoints” [13].

Instead the writers presented post hoc statistically positive results that were not part of the original study protocol or its amendment (visit-by-visit comparison of CDRS-R scores, and ‘Response’, defined as a score of ≤28 on the CDRS-R) as though they were protocol-specified outcomes. For example, ‘Response’ was reported in the results section of the Wagner et al. article between the primary and secondary outcomes, likely predisposing a reader to regard it as more important than the selected secondary measures reported, or even to mistake it for a primary measure.

It is difficult to reconcile what the authors of the original article reported in terms of adverse events with what our “deconstructionists” found in the unpublished final study report. The deconstruction article also notes that a letter to the editor appearing at the time of publication of the original paper called attention to another citalopram study that remained unpublished, but that was known to be a null study with substantial adverse events.

3.2.3 Mischaracterisation of adverse events

Although Wagner et al. correctly reported that “the rate of discontinuation due to adverse events among citalopram-treated patients was comparable to that of placebo”, the authors failed to mention that the five citalopram-treated subjects discontinuing treatment did so due to one case of hypomania, two of agitation, and one of akathisia. None of these potentially dangerous states of over-arousal occurred with placebo [23]. Furthermore, anxiety occurred in one citalopram patient (and none on placebo) of sufficient severity to temporarily stop the drug and irritability occurred in three citalopram (compared to one placebo). Taken together, these adverse events raise concerns about dangers from the activating effects of citalopram that should have been reported and discussed. Instead Wagner et al. reported “adverse events associated with behavioral activation (such as insomnia or agitation) were not prevalent in this trial” [10] and claimed that “there were no reports of mania”, without acknowledging the case of hypomania [10].

Furthermore, examination of the final study report revealed that there were many more gastrointestinal adverse events for citalopram than placebo patients. However, Wagner et al. grouped the adverse event data in a way that in effect masked this possibly clinically significant gastrointestinal intolerance. Finally, the published article also failed to report that one patient on citalopram developed abnormal liver function tests [24].

In a letter to the editor of the American Journal of Psychiatry, Mathews et al. also criticized the manner in which Wagner et al. dealt with adverse outcomes in the CIT-MD-18 data, stating that: “given the recent concerns about the risk of suicidal thoughts and behaviors in children treated with SSRIs, this study could have attempted to shed additional light on the subject” [26] Wagner et al. responded: “At the time the [CIT-MD-18] manuscript was developed, reviewed, and revised, it was not considered necessary to comment further on this topic” [20]. However, concerns about suicidal risk were prevalent before the Wagner et al. article was written and published [27]. In fact, undisclosed in both the published article and Wagner’s letter-to-the-editor, the 2001 negative Lundbeck study had raised concern over heightened suicide risk [10, 20, 21].

A later blog post will discuss the letters to the editor that appeared shortly after the original study in American Journal of Psychiatry. But for now, it would be useful to clarify the status of the negative Lundbeck study at that time.

The letter by Barbe published in AJP remarked:

It is somewhat surprising that the authors do not compare their results with those of another trial, involving 244 adolescents (13–18-year-olds), that showed no evidence of efficacy of citalopram compared to placebo and a higher level of self-harm (16 [12.9%] of 124 versus nine [7.5%] of 120) in the citalopram group compared to the placebo group (5). Although these data were not available to the public until December 2003, one would expect that the authors, some of whom are employed by the company that produces citalopram in the United States and financed the study, had access to this information.

The study authors replied:

It may be considered premature to compare the results of this trial with unpublished data from the results of a study that has not undergone the peer-review process. Once the investigators involved in the European citalopram adolescent depression study publish the results in a peer-reviewed journal, it will be possible to compare their study population, methods, and results with our study with appropriate scientific rigor.

Conflict of interest

The authors of the deconstruction study indicate they do not have any conventional industry or speaker’s bureau support to declare, but they have had relevant involvement in litigation. Their disclosure includes:

The authors are not members of any industry-sponsored advisory board or speaker’s bureau, and have no financial interest in any pharmaceutical or medical device company.

Drs. Amsterdam and Jureidini were engaged by Baum, Hedlund, Aristei & Goldman as experts in the Celexa and Lexapro Marketing and Sales Practices Litigation. Dr. McHenry was also engaged as a research consultant in the case. Dr. McHenry is a research consultant for Baum, Hedlund, Aristei & Goldman.

Concluding remarks

I don’t have many illusions about the trustworthiness of the literature reporting clinical trials, whether pharmaceutical or psychotherapy. But I found this deconstruction article quite troubling. Among the authors’ closing observations are:

The research literature on the effectiveness and safety of antidepressants for children and adolescents is relatively small, and therefore vulnerable to distortion by just one or two badly conducted and/or reported studies. Prescribing rates are high and increasing, so that prescribers who are misinformed by misleading publications risk doing real harm to many children, and wasting valuable health resources.

I recommend that readers go to my supplementary blog and review a very similar case involving the efficacy and harms of paroxetine and imipramine in the treatment of major depression in adolescence. I also recommend another of my blog posts that summarizes action taken by the US government against both Forest Laboratories and GlaxoSmithKline for promotion of misleading claims about the efficacy and safety of antidepressants for children and adolescents.

We should scrutinize studies of the efficacy and safety of antidepressants for children and adolescents, because of the weakness of data from relatively small studies with serious difficulties in their methodology and reporting. But we should certainly not stop there. We should critically examine other studies of psychotherapy and psychosocial interventions.

I previously documented [1, 2] interference by promoters of the lucrative Triple P Parenting program in the implementation of a supposedly independent evaluation, including tampering with plans for data analysis. The promoters then followed up by attempting to block publication of a meta-analysis casting doubt on their claims.

But suppose we are not dealing with the threat of conflicts of interest tied to high financial stakes, as with pharmaceutical companies or a globally promoted psychosocial program. There are still the less obvious conflicts associated with investigator egos and the pressure to produce positive results in order to get re-funded. We should require scrutiny of protocols and of whether they were faithfully implemented, with the resulting data analyzed according to a priori plans. To do that, we need unrestricted access to data and the opportunity to reanalyze it from multiple perspectives.

Results of clinical trials should be examined wherever possible in replications and extensions in new settings. But this frequently requires resources that are unlikely to be available.

We are unlikely ever to see anything for clinical trials resembling replication initiatives such as the Open Science Collaboration’s (OSC) Reproducibility Project: Psychology. The OSC depends on mass replications involving either samples of college students or recruitment from the Internet. Most of the studies involved in the OSC did not have direct clinical or public health implications. In contrast, clinical trials usually do, and they require different approaches to ensure the trustworthiness of claimed findings.

Access to the internal documents of Forest Laboratories revealed a deliberate, concerted effort to produce results consistent with the agenda of vested interests, even where prespecified analyses yielded contradictory findings. There was clear intent. But we don’t need to assume an attempt to deceive and defraud in order to insist on the opportunity to re-examine findings that affect patients and public health. As US Vice President Joseph Biden recently declared, securing advances in biomedicine and public health depends on broad and routine sharing and re-analysis of data.

My usual disclaimer: All views that I express are my own and do not necessarily reflect those of PLOS or other institutional affiliations.

Study: Switching from antidepressants to mindfulness meditation increases relapse

  • A well-designed recent study found that patients with depression in remission who switch from maintenance antidepressants to mindfulness meditation without continuing medication had an increase in relapses.
  • The study is better designed and more transparently reported than a recent British study, but will get none of the British study’s attention.
  • The well-orchestrated promotion of mindfulness raises issues about the lack of checks and balances among investigators’ vested interests, supposedly independent evaluation, and the making of policy.

The study

Huijbers MJ, Spinhoven P, Spijker J, Ruhé HG, van Schaik DJ, van Oppen P, Nolen WA, Ormel J, Kuyken W, van der Wilt GJ, Blom MB. Discontinuation of antidepressant medication after mindfulness-based cognitive therapy for recurrent depression: randomised controlled non-inferiority trial. The British Journal of Psychiatry. 2016 Feb 18.

The study is currently behind a pay wall and does not appear to have a press release. These two factors will not contribute to it getting the attention it deserves.

But the protocol for the study is available here.

Huijbers MJ, Spijker J, Donders AR, van Schaik DJ, van Oppen P, Ruhé HG, Blom MB, Nolen WA, Ormel J, van der Wilt GJ, Kuyken W. Preventing relapse in recurrent depression using mindfulness-based cognitive therapy, antidepressant medication or the combination: trial design and protocol of the MOMENT study. BMC Psychiatry. 2012 Aug 27;12(1):1.

And the trial registration is here

Mindfulness Based Cognitive Therapy and Antidepressant Medication in Recurrent Depression. ClinicalTrials.gov: NCT00928980

The abstract

Background

Mindfulness-based cognitive therapy (MBCT) and maintenance antidepressant medication (mADM) both reduce the risk of relapse in recurrent depression, but their combination has not been studied.

Aims

To investigate whether MBCT with discontinuation of mADM is non-inferior to MBCT+mADM.

Method

A multicentre randomised controlled non-inferiority trial (ClinicalTrials.gov: NCT00928980). Adults with recurrent depression in remission, using mADM for 6 months or longer (n = 249), were randomly allocated to either discontinue (n = 128) or continue (n = 121) mADM after MBCT. The primary outcome was depressive relapse/recurrence within 15 months. A confidence interval approach with a margin of 25% was used to test non-inferiority. Key secondary outcomes were time to relapse/recurrence and depression severity.

Results

The difference in relapse/recurrence rates exceeded the non-inferiority margin and time to relapse/recurrence was significantly shorter after discontinuation of mADM. There were only minor differences in depression severity.

Conclusions

Our findings suggest an increased risk of relapse/recurrence in patients withdrawing from mADM after MBCT.

Translation?


A comment by Deborah Apthorp suggested that the original title, Switching from antidepressants to mindfulness meditation increases relapse, was incorrect. Checking, I realized that the abstract was confusing, but the study did indeed show that mindfulness alone led to more relapses than continued medication plus mindfulness.

Here is what is said in the actual introduction to the article:

The main aim of this multicentre, noninferiority effectiveness trial was to examine whether patients who receive MBCT for recurrent depression in remission could safely withdraw from mADM, i.e. without increased relapse/recurrence risk, compared with the combination of these interventions. Patients were randomly allocated to MBCT followed by discontinuation of mADM or MBCT+mADM. The study had a follow-up of 15 months. Our primary hypothesis was that discontinuing mADM after MBCT would be non-inferior, i.e. would not lead to an unacceptably higher risk of relapse/ recurrence, compared with the combination of MBCT+mADM.

Here is what is said in the discussion:

The findings of this effectiveness study reflect an increased risk of relapse/recurrence for patients withdrawing from mADM after having participated in MBCT for recurrent depression.

So, to be clear, the sequence was that patients were randomized either to MBCT without antidepressants or to MBCT with continuing antidepressants. Patients were then followed up for 15 months. Patients who received MBCT without antidepressants had significantly more relapses/recurrences in the follow-up period than those who received MBCT with antidepressants.

The study addressed the question of whether patients with remitted depression on maintenance antidepressants who were randomized to discontinue their antidepressants after mindfulness-based cognitive therapy (MBCT) had poorer outcomes than those randomized to remain on their antidepressants.

The study found that poorer outcomes – more relapses – were experienced by patients switching to MBCT alone versus those remaining on antidepressants plus MBCT.

Strengths of the study

The patients were carefully assessed with validated semi-structured interviews to verify that they had recurrent past depression, were in current remission, and were taking their antidepressants. This assessment is an advantage over past studies that depended on less reliable primary-care physicians’ records to ascertain eligibility. There’s ample evidence that primary-care physicians often do not make systematic assessments when deciding whether or not to start patients on antidepressants.

The control group. The comparison/control group continued on antidepressants after they were assessed by a psychiatrist who made specific recommendations.

Power analysis. Calculation of sample size for this study was based on a noninferiority design. That meant the investigators wanted to establish, within a particular limit (25%), whether switching to MBCT produced poorer outcomes.

A conventional clinical trial is designed to see if the null hypothesis of no difference between intervention and control groups can be rejected. As a noninferiority trial, this study instead tested whether shifting patients to MBCT alone would result in an unacceptable rise in relapses and recurrences, with the margin set at 25%. Noninferiority trials are explained here.
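The confidence-interval approach to non-inferiority can be sketched in a few lines. The counts below are hypothetical illustrations, not the trial’s actual data, and the simple normal-approximation CI is my own assumption; the published analysis will have used its own pre-specified method.

```python
from math import sqrt

def noninferiority_check(relapses_new, n_new, relapses_ref, n_ref,
                         margin=0.25, z=1.96):
    """Confidence-interval approach to non-inferiority on a risk difference.

    The new strategy (here: discontinuing antidepressants after MBCT) is
    declared non-inferior only if the upper bound of the 95% CI for
    (risk_new - risk_ref) stays below the pre-set margin.
    """
    p_new, p_ref = relapses_new / n_new, relapses_ref / n_ref
    diff = p_new - p_ref
    se = sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    upper = diff + z * se
    return diff, upper, upper < margin

# Hypothetical relapse counts (not the published ones), using the
# study's arm sizes of 128 and 121:
diff, upper, non_inferior = noninferiority_check(60, 128, 40, 121)
```

Note that the verdict turns entirely on the upper confidence bound versus the margin, not on a conventional significance test of the difference itself.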

Change in plans for the study

The protocol for the study originally proposed a more complex design. Patients would be randomized to one of three conditions: (1) continuing antidepressants alone; (2) continuing antidepressants, but with MBCT; or (3) MBCT alone. The problem the investigators encountered was that many patients had a strong preference and did not want to be randomized. So, they conducted two separate randomized trials.

This change in plans was appropriately noted in a modification in the trial registration.

The companion study examined whether adding MBCT to maintenance antidepressants reduced relapses. That study was published first:

Huijbers MJ, Spinhoven P, Spijker J, Ruhé HG, van Schaik DJ, van Oppen P, Nolen WA, Ormel J, Kuyken W, van der Wilt GJ, Blom MB. Adding mindfulness-based cognitive therapy to maintenance antidepressant medication for prevention of relapse/recurrence in major depressive disorder: Randomised controlled trial. Journal of Affective Disorders. 2015 Nov 15;187:54-61.

A copy can be obtained from this depository.

It was a smaller study – 35 patients randomized to antidepressants alone and 33 patients randomized to the combination of MBCT and continued antidepressants. There were no differences in relapse/recurrence over 15 months.

An important limitation on generalizability

The patients were recruited from university-based mental health settings. The minority of patients who move from treatment of depression in primary care to specialty mental health settings include proportionately more with moderate to severe depression and a more defined history of past depression. In contrast, patients being treated for depression in primary care include more whose depression is mild to moderate and whose current depression and past history have not been systematically assessed. There is evidence that primary-care physicians do not make diagnoses of depression based on structured assessment. Many patients deemed depressed and in need of treatment will have milder depression and only meet the vaguer, less validated diagnosis of Depression Not Otherwise Specified.

Declaration of interest

The authors indicated no conflicts of interest to declare for either study.

Added February 29: This may be a true statement for the core Dutch researchers who led the conduct of the study. However, it is certainly not true for the British collaborator, who may have served as a consultant and got authorship as a result. He has extensive conflicts of interest and gains a lot personally and professionally from the promotion of mindfulness in the UK. Read on.

The previous British study in The Lancet

Kuyken W, Hayes R, Barrett B, Byng R, Dalgleish T, Kessler D, Lewis G, Watkins E, Brejcha C, Cardy J, Causley A. Effectiveness and cost-effectiveness of mindfulness-based cognitive therapy compared with maintenance antidepressant treatment in the prevention of depressive relapse or recurrence (PREVENT): a randomised controlled trial. The Lancet. 2015 Jul 10;386(9988):63-73.

I provided my extended critique of this study in a previous blog post:

Is mindfulness-based therapy ready for rollout to prevent relapse and recurrence in depression?

The study protocol claimed it was designed as a superiority trial, but the authors did not recruit the added sample size needed to demonstrate superiority. And they spun null findings, starting in their abstract:

However, when considered in the context of the totality of randomised controlled data, we found evidence from this trial to support MBCT-TS as an alternative to maintenance antidepressants for prevention of depressive relapse or recurrence at similar costs.

What is wrong here? They are discussing null findings as if they had conducted a noninferiority trial with sufficient power to show that differences of a particular size could be ruled out. Lots of psychotherapy trials are underpowered, but that should not be used to declare that treatments can be substituted for each other.

Contrasting features of the previous study versus the present one

Spinning of null findings. According to the trial registration, the previous study was designed to show that MBCT was superior to maintenance antidepressant treatment in preventing relapse and recurrence. A superiority trial tests the hypothesis that an intervention is better than a control condition by a pre-set margin. For a very cool slideshow comparing superiority to noninferiority trials, see here.

Rather than demonstrating that MBCT was superior to routine care with maintenance antidepressant treatment, The Lancet study failed to find significant differences between the two conditions. In an amazing feat of spin, the authors took to publicizing this as a success: MBCT was equivalent to maintenance antidepressants. Equivalence is a stricter criterion that requires more than null findings – any differences must fall within pre-set (registered) margins. Many null findings reflect low power to find significant differences, not equivalence.
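To see why a null finding is not equivalence, consider a toy calculation (my own illustrative numbers, not data from The Lancet trial): a difference can be non-significant while the confidence interval remains far too wide to rule out a clinically meaningful difference.

```python
from math import sqrt

def risk_diff_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Normal-approximation 95% CI for a difference in relapse risks."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff - z * se, diff + z * se

# Hypothetical small trial: 30/100 vs 26/100 relapses.
lo, hi = risk_diff_ci(30, 100, 26, 100)

# The CI crosses zero, so the difference is "not significant"...
assert lo < 0 < hi
# ...but it also extends well beyond a 10% equivalence margin, so the
# null result cannot rule out a clinically meaningful difference.
assert hi > 0.10
```

Establishing equivalence would require the whole interval to sit inside a pre-registered margin, which a small, underpowered trial typically cannot deliver.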

Patient selection. Patients were recruited from primary care on the basis of records indicating they had been prescribed antidepressants two years ago. There was no ascertainment of whether the patients were currently adhering to the antidepressants or whether they were getting effective monitoring with feedback.

Poorly matched, nonequivalent comparison/control group. The guidelines that patients with recurrent depression should remain on antidepressants for two years were developed based on studies in tertiary care. It’s likely that many of these patients were never systematically assessed for the appropriateness of treatment with antidepressants, that follow-up was spotty, and that many patients were not even continuing to take their antidepressants with any regularity.

So, MBCT was being compared to an ill-defined, unknown condition in which some proportion of patients did not need to be taking antidepressants and were not taking them. This routine care also lacked the intensity, positive expectations, attention, and support of the MBCT condition. If an advantage for MBCT had been found – and it was not – it might only have meant that there was nothing specific about MBCT, only the benefit of providing nonspecific elements that were lacking in routine care.

The unknowns. There was no assessment of whether the patients actually practiced MBCT, and so there was further doubt that anything specific to MBCT was relevant. But then again, in the absence of any differences between groups, we may not have anything to explain.

  • Given that we don’t know what proportion of patients were taking an adequate maintenance dose of antidepressants, we don’t know whether any further treatment was needed for them – or for what proportion.
  • We don’t know whether it would have been more cost-effective simply to have a depression care manager recontact patients and determine whether they were still taking their antidepressants and whether they were interested in a supervised tapering.
  • We’re not even told the extent to which primary-care patients provided with MBCT actually practiced it.

A well-orchestrated publicity campaign to misrepresent the findings. Rather than offering an independent critical evaluation of The Lancet study, press coverage offered the investigators’ preferred spin. As I noted in a previous blog:

The headline of a Guardian column written by one of the Lancet article’s first author’s colleagues at Oxford misleadingly proclaimed what the study showed. That misrepresentation was echoed in the Mental Health Foundation’s call for mindfulness to be offered through the UK National Health Service.

The Mental Health Foundation is offering a 10-session online course for £60 and is undoubtedly prepared for an expanded market.

Declaration of interests

WK [the first author] and AE are co-directors of the Mindfulness Network Community Interest Company and teach nationally and internationally on MBCT. The other authors declare no competing interests.

Like most declarations of conflicts of interest, this one alerts us to something we might be concerned about but does not adequately inform us.

We are not told, for instance, something the authors were likely to know: soon after all the hoopla about the study, the Oxford Mindfulness Centre, which is directed by the first author but not mentioned in the declaration of interest, publicized a massive, Wellcome Trust-funded effort to roll out its Mindfulness in the Schools project, which provides mindfulness training to children, teachers, and parents.

A recent headline in the Times says it all.


Confirmation bias in subsequent citing

It is generally understood that much of what we read in the scientific literature is false or exaggerated due to various Questionable Research Practices (QRPs) leading to confirmation bias in what is reported. But there is another kind of confirmation bias, associated with the creation of false authority through citation distortion. It’s well documented that proponents of a particular view selectively cite papers according to whether the conclusions support their position. Not only are positive findings claimed in original reports exaggerated as they progress through citations; negative findings receive less attention or are simply lost.

Huijbers et al. transparently reported that switching to MBCT alone leads to more relapses in patients who have recovered from depression. I confidently predict that these findings will be cited less often than the poorer quality The Lancet study, which was spun to create the appearance that MBCT had outcomes equivalent to remaining on antidepressants. I also predict that the Huijbers et al. MBCT study will often be misrepresented when it is cited.

Added February 29: For whatever reason, perhaps because he served as a consultant, the author of The Lancet study is also an author on this paper, which describes a study conducted entirely in the Netherlands. Note, however, that when it comes to the British The Lancet study, this article cites it as replicating past work when it was a null trial. This is an example of creating false authority by distorted citation in action. I can’t judge whether the Dutch authors simply accepted the conclusions offered in the abstract and press coverage of The Lancet study, or whether The Lancet author influenced their interpretation of it.

I would be very curious whether, in his outpouring of subsequent papers on MBCT, the author of The Lancet paper cites this paper and whether he cites it accurately. Skeptics, join me in watching.

What do I think is going on in the study?

I think it is apparent that the authors have selected a group of patients who have remitted from their depression, but who are at risk for relapse and recurrence if they go without treatment. With such chronic, recurring depression, there is evidence that psychotherapy adds little to medication, particularly when patients are showing a clinical response to the antidepressants. However, psychotherapy benefits from antidepressants being added.

But a final point is important – MBCT was never designed as a primary cognitive behavioral therapy for depression. It was intended as a means for patients to pay attention to cues suggesting they are sliding back into depression and to take appropriate action. It’s unfortunate that it has been oversold as something more than this.

 

What we can learn from a PLOS Medicine study of antidepressants and violent crime

Update October 1 7:58 PM: I corrected an inaccuracy in response to a comment by DJ Jaffe, for which I am thankful.

An impressively large-scale study published in PLOS Medicine of the association between antidepressants and violent crime is being greeted with strong opinions from those who haven’t read it. But even those who attempt to read the article might miss some of the nuances and the ambiguity that its results provide.

In this issue of Mind the Brain, we will explore some of these nuances, which are fascinating in themselves. But the article also provides excellent opportunities to apply the critical appraisal skills needed for correlational observational studies using administrative data sets.

Any time there is a report of a mass shooting in the media, a motley crew of commentators immediately announces that the shooter is mentally ill and has been taking psychotropic medication. Mental illness and drugs are the problem, not guns, we are told. Sprinkled among the commentators are opponents of gun-control, Scientologists, and psychiatrists seeking to make money serving as expert witnesses. They are paid handsomely to argue for the diminished responsibility for the shooter or for product liability suits against Pharma. Rebuttals will be offered by often equally biased commentators, some of them receiving funds from Pharma.

This is not from the Onion, but a comment left at a blog that expresses a commonly held view.


What is generally lost is that most shooters are not mentally ill and are not taking psychotropic medication.

Yet such recurring stories in the media have created a strong impression in the public and even professionals that a large scientific literature exists which establishes a tie between antidepressant use and violence.

Even when there has been some exposure to psychotropic medication, its causal role in the shooting cannot be established either from the facts of the case or the scientific literature.

The existing literature is seriously limited in quality and quantity and contradictory in its conclusions. Ecological studies [1, 2] conclude that the availability of antidepressants may reduce violence on a community level. An “expert review” and a review of reports of adverse events conclude there is a link between antidepressants and violence. However, reports of adverse events submitted to regulatory agencies can be strongly biased, including by recent claims in the media. Reviews of adverse events do not distinguish between correlates of a condition like depression and effects of the drug being used to treat it. Moreover, the authors of these particular reviews were serving as expert witnesses in legal proceedings. Authorship adds to their credibility and publicizes their services.

The recent study in PLOS Medicine should command the attention of anyone interested in the link between antidepressants and violent crime. Already there have been many tweets and at least one media story claiming vindication of the Scientologists as being right all along. I expected the release of the study and the reaction in the media would give me another opportunity to call attention to the entrenched opposing sides in the antidepressant wars, who claim to be driven only by the strength of evidence yet dismiss any evidence contrary to their beliefs, as well as to the gullibility of journalists. But the article and its coverage in the media are developing a very different story.

At the outset, I should say I don’t know if evidence can be assembled for an unambiguous case that antidepressants are strongly linked to violent crime. Give up on ever being able to rely on a randomized trial in which we examine whether participants randomized to receiving an antidepressant rather than a placebo are convicted more often of violent crimes. Most persons receiving antidepressants will not be convicted of a violent crime. The overall base rate of convictions is too low to monitor as an outcome in a randomized trial. We are left having to sort through correlational, observational clinical epidemiological data typically collected for other purposes.

I’m skeptical about there being a link strong enough to send a clear signal through all the noise in the data sets that we can assemble to look for it. But the PLOS Medicine article represents a step forward.

Association does not equal causation (image from Health News Review)

Correlation does not equal causality.

Any conceivable data set in which we can search will pose the challenges of competing explanations from other variables that might explain the association.

  • Most obviously, persons prescribed antidepressants suffer from conditions that may themselves increase the likelihood of violence.
  • The timing of persons seeking treatment with antidepressants may be influenced by circumstances that increase their likelihood of violence.
  • Violent persons are more likely to be under the influence of alcohol and other drugs and to have histories of use of these substances.
  • Persons taking antidepressants and consuming alcohol and other drugs may be prone to adverse effects of the combination.
  • Violent persons have characteristics and may be in circumstances with a host of other influences that may explain their behavior.
  • Violent persons may themselves be facing victimization that increases the likelihood of their committing violence and having a condition warranting treatment with antidepressants.

Etc, etc.

The PLOS Medicine article introduces a number of other interesting possibilities for such confounding.

Statistical controls are never perfect

Studies will always incompletely specify confounds and imperfectly measure them. Keep in mind that complete statistical control requires that all possible confounding factors be identified and measured without error. These ideal conditions are not attainable. Any application of statistics to “control” confounds under conditions that fall short of this ideal risks producing a less accurate estimate of effects than simply examining basic associations. Yet we already know that these simple associations are not sufficient to indicate causality.

The PLOS Medicine article doesn’t provide definitive answers, but it presents data with greater sophistication than has previously been available. The article’s careful writing should make misinterpretation or missing of its main points less likely. And one of the authors – Professor Seena Fazel of the Department of Psychiatry, Oxford University – did an exemplary job of delivering careful messages to any journalist who would listen.

Professor Seena Fazel

Professor Fazel can be found explaining his study in the media at 8:45 in a downloadable BBC World News Health Check mp3.

Delving into the details of the article

The PLOS Medicine article is of course open access and freely available.

Molero, Y., Lichtenstein, P., Zetterqvist, J., Gumpert, C. H., & Fazel, S. (2015). Selective Serotonin Reuptake Inhibitors and Violent Crime: A Cohort Study. PLoS Med, 12(9), e1001875.

Supplementary materials for the study are also available from the web [1, 2, 3], including a completed standardized STROBE checklist of items that should be included in reports of observational studies, additional tables, and details of the variables and how they were obtained.

An incredible sample

Out of Sweden’s total population of 7,917,854 people aged 15 and older in 2006, the researchers identified 856,493 individuals who were prescribed a selective serotonin reuptake inhibitor (SSRI) antidepressant from 2006 to 2009 and compared them to the 7,061,361 Swedish individuals who had not been prescribed this medication in that four-year period.

SSRIs were chosen for study because they represent the bulk of antidepressants being prescribed and because SSRIs are the class of antidepressants about which the question of an association with violence is most often raised. Primary hypotheses were about the SSRIs as a group, but secondary analyses focused on individual SSRIs – fluoxetine, citalopram, paroxetine, sertraline, fluvoxamine, and escitalopram. The analyses at the level of individual SSRI drugs were not expected to have sufficient statistical power to explore associations with violent crimes. Data were also collected on non-SSRI antidepressants and other psychotropic medications, and these data were used to adjust for medications taken concurrently with SSRIs.

With these individuals’ unique identification numbers, the researchers collected information on the particular medications and dates of prescription from the Swedish Prescribed Drug Register. The register provides complete data on all prescribed and dispensed medical drugs from all pharmacies in Sweden since July 2005. The unique identification number also allowed the researchers to obtain information concerning hospitalizations, outpatient visits, reasons for visits, and diagnoses.

These data were then matched against information on convictions for violent crimes for the same period from the Swedish national crime register.

These individuals were followed from January 1, 2006, to December 31, 2009.

During this period, 1.0% of individuals prescribed an SSRI were convicted of a violent crime, versus 0.6% of those not prescribed an SSRI. The article focused on the extent to which prescription of an SSRI affected the likelihood of committing a violent crime and considered other explanations for any association that was found.

A clever analytic strategy

Epidemiologic studies most commonly compare individuals who differ in their exposure to particular conditions in terms of whether they have particular outcomes. Detection of bona fide causal associations can be derailed by other characteristics associated with both antidepressants and violent crime. A classic example of a spurious relationship is the association between coffee drinking and lung cancer: the association is due to smokers lighting up when they take coffee breaks, and taking smoking into account eliminates it. In practice, it can be difficult to identify such confounds, particularly when they are left unmeasured or imperfectly measured.

Such between-individual analyses of people taking antidepressants and those who are not are thus subject to a full range of unmeasured but potentially confounding background variables.

For instance, in an earlier study in the same population, some of these authors found that individuals with a full (adjusted OR 1.5, 95% CI 1.3-1.6) or half (adjusted OR 1.2, 95% CI 1.1-1.4) sibling with depression were themselves more likely to be convicted of violent crime, after controlling for age, sex, low family income and being born abroad. The influence of such familial risk can be misconstrued in a standard between-individual analysis.

This article supplemented between-individual analyses with within-individual stratified Cox proportional hazards regressions. Each individual exposed to antidepressants was considered separately and served as his/her own control. Thus, these within-individual analyses examined differences in violent crimes in the same individuals over time periods differing in whether they had exposure to an antidepressant prescription. Periods of exposure became the unit of analysis, not just individuals.
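To make the design concrete, here is a toy sketch, with entirely invented data and a hypothetical `person_periods` helper, of how one individual's follow-up might be cut into exposed and unexposed person-periods before each person is entered as their own stratum in a stratified Cox model:

```python
from dataclasses import dataclass

# Toy illustration only: each person's follow-up is cut into periods
# that differ in whether an SSRI prescription was active, and outcomes
# are compared across those periods *within* the same person.

@dataclass
class Period:
    person_id: int
    start: int      # day of follow-up
    stop: int
    exposed: bool   # SSRI prescription active during this period
    event: bool     # violent-crime conviction at end of this period

def person_periods(person_id, followup_days, rx_intervals, event_days):
    """Split one person's follow-up at every prescription boundary
    and event day, classifying each resulting period by exposure."""
    cutpoints = sorted({0, followup_days,
                        *(d for iv in rx_intervals for d in iv),
                        *event_days})
    periods = []
    for start, stop in zip(cutpoints, cutpoints[1:]):
        exposed = any(a <= start < b for a, b in rx_intervals)
        periods.append(Period(person_id, start, stop, exposed,
                              stop in event_days))
    return periods

# One made-up person: prescription active days 100-200 of a one-year
# follow-up, with a conviction on day 150 (an exposed period).
pp = person_periods(1, 365, rx_intervals=[(100, 200)], event_days=[150])
for p in pp:
    print(p)
```

The point of the restructuring is that stable characteristics of the person (genetics, upbringing, chronic circumstances) are identical across their own periods and so cannot confound the exposed-versus-unexposed comparison.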

The linked Swedish data sets that were used are unusually rich. It would not be feasible to obtain such data in other countries, and certainly not the United States.

The results as summarized in the abstract

Using within-individual models, there was an overall association between SSRIs and violent crime convictions (hazard ratio [HR] = 1.19, 95% CI 1.08–1.32, p < 0.001, absolute risk = 1.0%). With age stratification, there was a significant association between SSRIs and violent crime convictions for individuals aged 15 to 24 y (HR = 1.43, 95% CI 1.19–1.73, p < 0.001, absolute risk = 3.0%). However, there were no significant associations in those aged 25–34 y (HR = 1.20, 95% CI 0.95–1.52, p = 0.125, absolute risk = 1.6%), in those aged 35–44 y (HR = 1.06, 95% CI 0.83–1.35, p = 0.666, absolute risk = 1.2%), or in those aged 45 y or older (HR = 1.07, 95% CI 0.84–1.35, p = 0.594, absolute risk = 0.3%). Associations in those aged 15 to 24 y were also found for violent crime arrests with preliminary investigations (HR = 1.28, 95% CI 1.16–1.41, p < 0.001), non-violent crime convictions (HR = 1.22, 95% CI 1.10–1.34, p < 0.001), non-violent crime arrests (HR = 1.13, 95% CI 1.07–1.20, p < 0.001), non-fatal injuries from accidents (HR = 1.29, 95% CI 1.22–1.36, p < 0.001), and emergency inpatient or outpatient treatment for alcohol intoxication or misuse (HR = 1.98, 95% CI 1.76–2.21, p < 0.001). With age and sex stratification, there was a significant association between SSRIs and violent crime convictions for males aged 15 to 24 y (HR = 1.40, 95% CI 1.13–1.73, p = 0.002) and females aged 15 to 24 y (HR = 1.75, 95% CI 1.08–2.84, p = 0.023). However, there were no significant associations in those aged 25 y or older. One important limitation is that we were unable to fully account for time-varying factors.

Hazard ratios (HRs) are explained here and are not to be confused with odds ratios (ORs) explained here. Absolute risk (AR) is the most intuitive and easy to understand measure of risk and is explained here, along with reasons that hazard ratios don’t tell you anything about absolute risk.
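A small illustration of why a hazard ratio alone says nothing about absolute risk: the same HR implies very different absolute risk increases depending on the baseline hazard. Only the HR of 1.19 below comes from the article; the baseline hazards and follow-up window are invented for illustration, using the simplest constant-hazard (exponential) model:

```python
import math

# With a constant hazard h, the cumulative risk of an event by time t
# is 1 - exp(-h * t). Multiplying the hazard by the same HR produces
# very different absolute risk differences at different baselines.
def cumulative_risk(hazard, years):
    return 1 - math.exp(-hazard * years)

hr = 1.19        # the article's overall within-individual HR for SSRIs
followup = 4     # years, roughly matching the 2006-2009 study window

rare_base = cumulative_risk(0.0001, followup)          # rare outcome
rare_exposed = cumulative_risk(0.0001 * hr, followup)
common_base = cumulative_risk(0.02, followup)          # common outcome
common_exposed = cumulative_risk(0.02 * hr, followup)

print(f"rare outcome:   {rare_base:.4%} -> {rare_exposed:.4%}")
print(f"common outcome: {common_base:.2%} -> {common_exposed:.2%}")
```

For the rare outcome the same HR of 1.19 adds well under a hundredth of a percentage point of absolute risk; for the common one it adds over a full percentage point.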

Principal findings

  • There was an association between receiving a prescription for antidepressants and violent crime.
  • When age differences were examined, the 15-24 age range was the only one for which the association was significant.
  • No association was found for other age groups.
  • The association held for both males and females analyzed separately in the 15-24 age range. But…

Things not to be missed in the details

Only a small minority of persons prescribed an antidepressant were convicted of a violent crime, but the likelihood of a conviction in persons exposed to antidepressants was increased in this 15 to 24 age range.

There wasn’t a dose-response association between SSRI use and convictions for violent crimes. Even in the 15 to 24 age range, periods of moderate or high exposure to SSRIs were no more associated with violent crimes than periods of no exposure. Rather, the association occurred only in those individuals with low exposure.

A dose-response association would be reflected in the more exposure to antidepressants an individual had, the greater the likelihood of violent crime. A dose-response association is one of the formal criteria for judging whether an observed association provides adequate evidence of a causal relationship between an exposure and a possible consequence.

In the age bracket for which this association between antidepressant use and conviction of a violent crime was significant, antidepressant use was also associated with an increased risk of violent crime arrests, non-violent crime convictions, non-violent crime arrests, and use of emergency inpatient or outpatient treatment for alcohol intoxication or misuse.

Major caveats

The use of linked administrative data sets concerning both antidepressant prescription and violent crimes is a special strength of this study. It allows a nuanced look at an important question with evidence that could not otherwise be assembled. But administrative data have well-known limitations.

The data were not originally captured with the research questions in mind, and so key variables, including data concerning potential confounds, were not necessarily collected. The quality control for the administrative purposes for which these data were collected may differ greatly from what is needed for their use as research data. There may be systematic errors, incomplete data, and inaccurate coding, including of the timing of administrative events.

Administrative data do not always mesh well with the concepts with which we may be most concerned. This study does not directly assess violent behavior, only arrests and convictions. Most violent behavior does not result in an arrest or conviction, and so this is a biased proxy for the behavior itself.

This study does not directly assess diagnosis of depression, only diagnosis by specialists. We know from other studies that in primary and specialty medical settings, there may be no systematic effort to assess clinical depression by interview. The diagnoses that are recorded may serve only to justify a clinical decision made on some basis other than the patient meeting research criteria for depression. Table 1 in the article suggests that only about a quarter of the patients exposed to antidepressants actually had a diagnosis of depression. And throughout the article, no distinction was made between unipolar depression and the depressed phase of a bipolar disorder. This distinction may be important, given the small minority of individuals who were convicted of a violent crime while exposed to an SSRI.

Perhaps one of the greatest weaknesses of this data set is its limited assessment of alcohol and substance use and abuse. For alcohol, we are limited to emergency inpatient or outpatient treatment for alcohol intoxication or misuse. For substance abuse, we have only convictions designated as substance-related. These are poor proxies for far more common actual alcohol and substance use, which for a variety of reasons may not show up in these administrative data. Substance-related convictions are simply too infrequent to serve as a suitable control variable or even proxy for substance use. It is telling that in the 15-24 age range, alcohol intoxication or misuse is associated with convictions for violent crimes with a strength (HR = 1.98, 95% CI 1.76–2.21, p < 0.001) greater than that found for SSRIs.

There may be important cultural differences between Sweden and other countries to which we want to generalize in terms of the determinants of arrest and conviction, but also treatment seeking for depression and the pathways for obtaining antidepressant medication. There may also be differences in institutional response to drug and alcohol use and misuse, including individuals’ willingness and ability to access services.

An unusual strength of this study is its use of within-individual analyses to escape some of the problems of more typical between-individual analyses, which cannot adequately control for stable sources of individual differences. But we can’t rely on these analyses to establish which of two events occurring in quick succession came first. The authors note that they

cannot fully account for time-varying risk factors, such as increased drug or alcohol use during periods of SSRI medication, worsening of symptoms, or a general psychosocial decline.

Findings examining non-fatal injuries from accidents, as well as emergency inpatient or outpatient treatment for alcohol intoxication or misuse, as time-varying confounders are tantalizing, but pursuing them runs up against the limits of the administrative data.

What can we learn from this study?

Readers seeking a definitive answer from the study to the question of whether antidepressants cause violent behavior or even violent crime will be frustrated.

There does not seem to be an increased risk of violent crime among individuals over 25 taking antidepressants.

The risk confined to individuals aged between 15 and 24 is, according to the authors, modest, but not insignificant. It represents a 20 to 40% increase in the low likelihood of being convicted of a violent crime. But it is not necessarily causal. The provocative finding that the association occurred with low exposure, rather than with no exposure or with moderate or high exposure, should give pause and suggests something more complex than simple causality may be going on.

This is an ambiguous but important point. Low exposure could represent non-adherence, inconsistent adherence, or periods in which there was a sudden stopping of medication, the effects of which might generate an association between the exposure and violent crimes. It could also represent the influence of time-dependent variables such as use of alcohol or substances that escaped control in the within-individual analyses.

There are parallels between the results of the present study and what is observed in other data sets. Most importantly, the data have some consistency with reports of suicidal ideation and deliberate self-harm among children and adolescents exposed to antidepressants. The common factor may be an increased sensitivity of younger persons to antidepressants, particularly to their initiation and withdrawal or sudden stopping, a sensitivity reflected in impulsive and risk-taking behavior.

The take away message

Data concerning links between SSRIs and violent crime invite premature and exaggerated declarations of implications for public health and public policy.

At another blog, I’ve suggested that the British Medical Journal requirement that observational studies have a demarcated section addressing these issues encourages authors to go beyond their data in order to increase the likelihood of publication – authors have to make public health and public policy recommendations to show that their data are newsworthy enough for publication. It’s interesting that a media watch group criticized BMJ for using too strong causal language in covering this observational PLOS Medicine article.

I’m sure that the authors of this article felt pressure to address whether a black box warning inserted into the packaging of SSRIs was warranted by these data. I agree with their not recommending this at this time, given the strength of the evidence and the ambiguity in the interpretation of these administrative data. But I agree that the issue of young people being prescribed SSRIs needs more research, and specifically elucidation of why low exposure increases the likelihood of violence versus no, moderate, or high exposure.

The authors do make some clinical recommendations, and their spokesperson Professor Fazel is particularly clear but careful in his interview with BBC World Service Health Check. My summary of what is said in the interview and in other media contacts is:

  • Adolescents and young adults should be prescribed SSRIs only on the basis of careful clinical interviews that ascertain a diagnosis consistent with practice guidelines for prescribing these drugs, and the drugs should be prescribed at a therapeutic level.
  • These patients should be educated about the necessity of taking these medications consistently and advised against withdrawing from or stopping the medication abruptly without the consultation and supervision of a professional.
  • These patients should be advised against taking these medications with alcohol or other drugs, with the explanation that there could be serious adverse reactions.

In general, young persons may be more sensitive to SSRIs, particularly when starting or stopping, and particularly when taken in the presence of alcohol or other drugs.

The importance of more research concerning the nature of this sensitivity is highlighted by the findings of the PLOS Medicine article and the issues these findings point to but do not resolve.

Molero Y, Lichtenstein P, Zetterqvist J, Gumpert CH, Fazel S (2015) Selective Serotonin Reuptake Inhibitors and Violent Crime: A Cohort Study. PLoS Med 12(9): e1001875. doi:10.1371/journal.pmed.1001875

The views expressed in this post represent solely those of its author, and not necessarily those of PLOS or PLOS Medicine.

Is mindfulness-based therapy ready for rollout to prevent relapse and recurrence in depression?

Doubts that much of clinical or policy significance was learned from a recent study published in Lancet

Promoters of Acceptance and Commitment Therapy (ACT) notoriously established a record for academics endorsing a psychotherapy as better than alternatives in the absence of evidence from adequately sized, high-quality studies with suitable active control/comparison conditions. The credibility of designating a psychological intervention as “evidence-based” took a serious hit with the promotion of ACT, before its enthusiasts felt they had attracted enough adherents to be able to abandon claims of “best” or “better than.”

But the tsunami of mindfulness promotion has surpassed anything ACT ever produced, and still with insufficient quality and quantity of evidence.

Could that be changing?

Some might think so on the basis of a recent randomized controlled trial, reported in The Lancet, of mindfulness-based cognitive therapy (MBCT) to reduce relapse and recurrence in depression. The headline of a Guardian column by a colleague at Oxford of the Lancet article’s first author misleadingly proclaimed what the study had shown.

That misrepresentation was echoed in the Mental Health Foundation’s call for mindfulness to be offered through the UK National Health Service.

The Mental Health Foundation is offering a 10-session online course for £60 and is undoubtedly prepared for an expanded market.

[Patient testimonial accompanying the Mental Health Foundation’s call for dissemination.]

The Declaration of Conflict of Interest for the Lancet article mentions that the first author and one other author are “co-directors of the Mindfulness Network Community Interest Company and teach nationally and internationally on MBCT.” The first author noted the marketing potential of his study in comments to the media.

To the authors’ credit, they modified the registration of their trial to reduce the likelihood of its being misinterpreted.

Reworded research question. To ensure that readers clearly understand that this trial is not a direct comparison between antidepressant medication (ADM) and Mindfulness-based cognitive therapy (MBCT), but ADM versus MBCT plus tapering support (MBCT-TS), the primary research question has been changed following the recommendation made by the Trial Steering Committee at their meeting on 24 June 2013. The revised primary research question now reads as follows: ‘Is MBCT with support to taper/discontinue antidepressant medication (MBCT-TS) superior to maintenance antidepressant medication (m-ADM) in preventing depression over 24 months?’ In addition, the acronym MBCT-TS will be used to emphasise this aspect of the intervention.

I would agree and amplify: this trial does nothing to remedy the paucity of evidence from well-controlled trials that MBCT is a first-line treatment for patients experiencing a current episode of major depression. The few studies to date are small and of poor quality, and they are insufficient to recommend MBCT as a first-line treatment of major depression.

I know, you would never guess that from promotions of MBCT for depression, especially not in the current blitz promotion in the UK.

The most salient question is whether MBCT can provide an effective means of preventing relapse in depressed patients who have already achieved remission and seek discontinuation.

Despite a chorus of claims in the social media to the contrary, the Lancet trial does not demonstrate that

  • Formal psychotherapy is needed to prevent relapse and recurrence among patients previously treated with antidepressants in primary care.
  • Any less benefit would have been achieved with a depression care manager, who requires less formal training than an MBCT therapist.
  • Any less benefit would have been achieved with primary care physicians simply tapering antidepressant treatment that may not even have been appropriate in the first place.
  • The crucial benefit to patients assigned to the MBCT condition was their acquisition of skills.
  • Practicing mindfulness is needed or even helpful in tapering from antidepressants.

We are all dodos and everyone gets a prize

Something also lost in the promotion of the trial is that it was originally designed to test the hypothesis that MBCT was better than maintenance antidepressant therapy in preventing relapse and recurrence of depression. That is stated in the registration of the trial, but not in the actual Lancet report of the trial outcome.

Across the primary and secondary outcome measures, the trial failed to demonstrate that MBCT was superior. Essentially, the investigators had a null trial on their hands. But in a triumph of marketing over accurate reporting of a clinical trial, they shifted the question to whether MBCT is inferior to maintenance antidepressant therapy and declared success in demonstrating that it was not.

We saw a similar move in an MBCT trial that I critiqued just recently. The authors there opted for the uninformative conclusion that MBCT was “not inferior” to an ill-defined routine primary care for a mixed sample of patients with depression, anxiety, and adjustment disorders.

An important distinction is being lost here. Null findings in a clinical trial with a sample size set to answer the question of whether one treatment is better than another are not the same as a demonstration that the two treatments are equivalent. The latter question requires a non-inferiority design with a much larger sample size, in order to demonstrate that by some pre-specified criterion the two treatments do not differ from each other in clinically significant terms.

Consider this analogy: we want to test whether yogurt is better than aspirin for a headache. So we do a power analysis tailored to the null hypothesis of no difference between yogurt and aspirin, conduct a trial, and find that yogurt and aspirin do not differ. But if we were actually interested in the question of whether yogurt can be substituted for aspirin in treating headaches, we would have to estimate what size of study would leave us comfortable with the conclusion that treating headaches with yogurt versus aspirin makes no clinically significant difference. That would require a much larger sample size, typically several times the size of a clinical trial designed to test the efficacy of an intervention.
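The arithmetic behind that claim can be sketched with the standard two-proportion sample-size approximation. The response rates and the 5-point non-inferiority margin below are invented for illustration; only the general formula is standard:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard-normal quantile function

def n_per_arm(p1, p2, delta, alpha=0.05, power=0.80, two_sided=True):
    """Approximate per-arm sample size for comparing two proportions;
    delta is the difference to detect (superiority) or the margin to
    rule out (non-inferiority)."""
    a = alpha / 2 if two_sided else alpha
    za, zb = z(1 - a), z(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (za + zb) ** 2 * variance / delta ** 2

# Superiority trial: detect a 70% vs 55% response rate.
n_superiority = n_per_arm(0.70, 0.55, delta=0.15)

# Non-inferiority trial: both treatments truly about 70%, and we must
# rule out the new one being more than 5 points worse (one-sided test).
n_noninferiority = n_per_arm(0.70, 0.70, delta=0.05, two_sided=False)

print(f"superiority:     about {n_superiority:.0f} per arm")
print(f"non-inferiority: about {n_noninferiority:.0f} per arm")
```

With these illustrative numbers the non-inferiority trial needs several times as many patients per arm as the superiority trial, which is exactly why reinterpreting a null superiority trial as evidence of equivalence is illegitimate.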

The often confusing differences between standard efficacy trials and noninferiority and superiority trials are nicely explained here.

Do primary care patients prescribed an antidepressant need to continue?

Patients taking antidepressants should not stop without consulting their physician and agreeing on a plan for discontinuation.

NICE guidelines, like many international guidelines, recommend that patients with recurrent depression continue their medication for at least two years, out of concern for a heightened risk of relapse and recurrence. But these recommendations are based on research conducted in specialty mental health settings with patients with an established diagnosis of depression. The generalization to primary care patients may not be appropriate.

Major depression is typically a recurrent, episodic condition with onset in the teens or early 20s. Many currently depressed adult patients beyond that age would be characterized as having recurrent depression. In a study conducted at primary care practices associated with the University of Michigan, we found that most patients in waiting rooms identified as depressed on the basis of a two-stage screening and formal diagnostic interview had recurrent depression, with the average patient having had over six episodes before our point of contact.

However, depression in primary care may involve less severe symptoms in a given episode and an overall less severe course than in the patients who make it to specialty mental health care. And primary care physicians’ decisions about placing patients on antidepressants are typically not based on a formal, semi-structured interview with symptom counts to ascertain whether patients have the necessary number of symptoms (five for DSM-5) to meet diagnostic criteria.

My colleagues in Germany and I conducted another relevant study in which we randomized patients to antidepressants, behavior therapy, or the patient’s preference between the two. What was unusual was that we relied on primary care physician diagnosis, not formal research criteria. We found that many patients enrolling in the trial would not meet criteria for major depression and, at least by DSM-IV criteria, would be given the highly ambiguous diagnosis of Depression, Not Otherwise Specified. The patients identified by the primary care physicians as requiring treatment for depression were quite different from those typically entering clinical trials evaluating treatment options. You can find out more about the trial here.

It is thus important to note that patients in the Lancet study were not originally prescribed antidepressants on the basis of a formal research diagnosis of major depression. Rather, the decisions of primary care physicians to prescribe antidepressants are typically not based on a systematic interview aimed at a formal diagnosis requiring a minimum number of symptoms to be present. This is a key issue.

The inclusion criteria for the Lancet study were that patients currently be in full or partial remission from a recent episode of depression and have had at least three episodes, counting the recent one. But their diagnosis at the time they were prescribed antidepressants was retrospectively reconstructed and may have been biased by their having received antidepressants.

Patients enrolled in the study were thus a highly select subsample of all patients receiving antidepressants in UK primary care. A complex recruitment procedure, involving not only review of GP records but also advertisement in the community, means that we cannot tell what proportion of patients receiving antidepressants and otherwise meeting criteria would have agreed to be in the study.

The study definitely does not provide a basis for revising guidelines for determining when and if primary care physicians should raise the issue of tapering antidepressant treatment. But that’s a vitally important clinical question.

Questions not answered by the study:

  • We don’t know the appropriateness of the prescription of antidepressants to these patients in the first place.
  • We don’t know what review of the appropriateness of prescription of antidepressants had been conducted by the primary care physicians in agreeing that their patients participate in the study.
  • We don’t know the selectivity with which primary care physicians agreed for their patients to participate. To what extent are the patients to whom they recommended the trial representative of other patients in the maintenance phase of treatment?
  • We don’t know enough about how the primary care physicians treating the patients in the control groups reacted to the advice from the investigator group to continue medication. Importantly, how often were there meetings with these patients, and did that change as a result of participation in this trial? Like every other trial of CBT in the UK that I have reviewed, this one suffers from an ill-defined control group that was nonequivalent in terms of contact time with professionals and support.
  • The question persists whether any benefits claimed for cognitive behavior therapy or MBCT from recent UK trials could have been achieved with nonspecific supportive interventions. In this particular Lancet study, we don’t know whether the same results could have been achieved by simply tapering antidepressants with the assistance of a depression care manager less credentialed than what is required to provide MBCT.

The investigators provided a cost analysis. They concluded that there were no savings in health care costs from moving patients in full or partial remission off antidepressants to MBCT. But the cost analysis did not take into account the added patient time invested in practicing MBCT. Indeed, we don’t even know whether the patients assigned to MBCT actually practiced it with any diligence or will continue to do so after treatment.

The authors promise a process analysis that will shed light on what elements of MBCT contributed to the equivalence of outcomes with maintenance antidepressant medication.

But this process analysis will be severely limited by the inability to control for nonspecific factors such as contact time with the patient and support provided to the primary care physician and patient in tapering medication.

The authors seem intent on arguing that MBCT should be disseminated into the UK National Health Services. But a more sober assessment is that this trial only demonstrates that a highly select group of patients currently receiving antidepressants within the UK health system could be tapered without heightened risk of relapse and recurrence. There may be no necessity or benefit of providing MBCT per se during this process.

The study is not comparable to other noteworthy studies of MBCT to prevent relapse, like Zindel Segal’s complex study. That study started with an acutely depressed patient population defined by careful criteria and treated patients with a well-defined algorithm for choosing and making changes in medications. Randomization to continued medication, MBCT, or pill placebo occurred only in the patients who remitted. It is unclear how much the clinical characteristics of the patients in the present Lancet study overlapped with those in Segal’s study.

What would be the consequences of disseminating and implementing MBCT into routine care based on current levels of evidence?

There are lots of unanswered questions concerning whether MBCT should be disseminated and widely implemented in routine care for depression.

One issue is where the resources for this initiative would come from. There are already long waiting lists for cognitive behavior therapy, generally 18 weeks. Would disseminating MBCT draw therapists away from providing conventional cognitive behavior therapy? Therapists are often drawn to therapies by their novelty and initial, unsubstantiated promises rather than by strength of evidence. And the strength of evidence for MBCT is not such that we could recommend substituting it for CBT for treatment of acute, current major depression.

Another issue is whether most patients would be willing to commit not only the time for training sessions in MBCT but also the time to actually practice it in their everyday lives. Of course, again, we don’t even know from this trial whether actually practicing MBCT matters.

There hasn’t been a fair comparison of MBCT to equivalent time with a depression care manager who would review patients currently receiving antidepressants and advise physicians as to whether and how to taper suitable candidates for discontinuation.

If I were distributing scarce research resources to reduce unnecessary treatment with antidepressants, I would focus on a descriptive, observational study of the clinical status of patients currently receiving antidepressants, the amount of contact time they are receiving with a primary health care professional, and the adequacy of their response in terms of symptom levels, as well as adherence. Results could establish the usefulness of targeting long-term use of antidepressants, the level of adherence of patients to taking the medication, and the adequacy of physicians’ monitoring of symptom levels and adherence. I bet there is a lot of poor-quality maintenance care for depression in the community.

When I was conducting NIMH-funded studies of depression in primary care, I never could get review committees interested in the issue of overtreatment and unnecessarily continued treatment. I recall one reviewer’s snotty comment that these are not pressing public health issues.

That’s too bad, because I think they are key in considering how to distribute scarce resources to study and improve care for depression in the community. Existing evidence suggests that a substantial share of the cost of treating depression with antidepressants in general medical care is squandered on patients who do not meet guideline criteria for receiving antidepressants or who do not receive adequate monitoring.

Amazingly spun mindfulness trial in British Journal of Psychiatry: How to publish a null trial

Since when is “mindfulness therapy is not inferior to routine primary care” newsworthy?

 

Spinning makes null results a virtue to be celebrated…and publishable.

An article reporting a RCT of group mindfulness therapy

Sundquist, J., Lilja, Å., Palmér, K., Memon, A. A., Wang, X., Johansson, L. M., & Sundquist, K. (2014). Mindfulness group therapy in primary care patients with depression, anxiety and stress and adjustment disorders: randomised controlled trial. The British Journal of Psychiatry.

was previously reviewed in Mental Elf. You might want to consider their briefer evaluation before beginning mine. I am going to be critical not only of the article, but the review process that got it into British Journal of Psychiatry (BJP).

I am an Academic Editor of PLOS One,* where we have the laudable goal of publishing all papers that are transparently reported and not technically flawed. Beyond that, we leave decisions about scientific quality to post-publication commentary of the many, not a couple of reviewers whom the editor has handpicked. Yet, speaking for myself, and not PLOS One, I would have required substantial revisions or rejected the version of this paper that got into the presumably highly selective, even vanity journal BJP**.

The article is paywalled, but you can get a look at the abstract here  and write to the corresponding author for a PDF at Jan.sundquist@med.lu.se

As always, examine the abstract carefully  when you suspect spin, but expect that you will not fully appreciate the extent of spin until you have digested the whole paper. This abstract declares

Mindfulness-based group therapy was non-inferior to treatment as usual for patients with depressive, anxiety or stress and adjustment disorders.

“Non-inferior” meaning ‘no worse than routine care?’ How could that null result be important enough to get into a journal presumably having a strong confirmation bias? The logic sounds just like US Senator George Aiken famously proposing getting America out of the war it was losing in Vietnam by declaring America had won and going home.

There are hints of other things going on, like no reporting of how many patients were retained for analysis or whether there were intention-to-treat analyses. And then the weird mention of outcomes being analyzed with “ordinal mixed models.”  Have you ever seen that before? And finally, do the results hold for patients with any of those disorders or only a particular sample of unknown mix and maybe only representing those who could be recruited from specific settings? Stay tuned…

What is a non-inferiority trial and when should one conduct one?

An NHS website explains

The objective of non-inferiority trials is to compare a novel treatment to an active treatment with a view of demonstrating that it is not clinically worse with regards to a specified endpoint. It is assumed that the comparator treatment has been established to have a significant clinical effect (against placebo). These trials are frequently used in situations where use of a superiority trial against a placebo control may be considered unethical.

Noninferiority trials (NIs) have a bad reputation. Consistent with a large literature, a recent systematic review of NI HIV trials  found the overall methodological quality to be poor, with a high risk of bias. The people who brought you CONSORT saw fit to develop special reporting standards for NIs  so that misuse of the design in the service of getting publishable results is more readily detected. You might want to download the CONSORT checklist for NI and apply the checklist to the trial under discussion. Right away, you can see how deficient the reporting is in the abstract of the paper under discussion.

Basically, an NI RCT commits investigators and readers to accepting null results as support for a new treatment because it is no worse than an existing one. Suspicions are immediately raised as to why investigators might want to make that point.
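The logic of an NI test is worth seeing concretely: instead of asking whether the difference between treatments is significantly different from zero, you ask whether the confidence interval for the difference stays above a prespecified non-inferiority margin. A minimal sketch in Python (all numbers hypothetical, not from this trial):

```python
import math

def ni_test(mean_new, mean_ref, sd, n_per_group, margin, z=1.96):
    """95% CI for (new - reference), higher scores assumed better.
    Non-inferiority is claimed if the lower bound stays above -margin."""
    diff = mean_new - mean_ref
    se = sd * math.sqrt(2.0 / n_per_group)
    lower, upper = diff - z * se, diff + z * se
    return lower > -margin, (lower, upper)

# Hypothetical: identical means, modest sample, generous margin.
ok, ci = ni_test(mean_new=10.0, mean_ref=10.0, sd=5.0,
                 n_per_group=100, margin=2.0)
# With a wide enough margin, a perfectly null result "passes" --
# which is exactly why the margin must be justified and prespecified.
```

Note how everything hinges on the margin: the same null data that "demonstrate" non-inferiority against a margin of 2 points would fail against a margin of 1.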

Conflicts of interest could be a reason. Demonstration that the treatment is as good as existing treatments might warrant marketing of the new treatment or dissemination into existing markets. There could be financial rewards, or simply promoters and enthusiasts favoring what they find interesting. Yup, some bandwagons, fads, and fashions in psychotherapy are in large part due to promoters simply seeking the new and different, without evidence that a treatment is better than existing ones.

Suspicions are reduced when the new treatment has other advantages, like greater acceptability or a lack of side effects, or when the existing treatments are so good that an RCT of the new treatment with a placebo-control condition would be unethical.

We should evaluate whether there is an adequate rationale for the authors doing an NI RCT, rather than relying on the conventional test of whether the null hypothesis of no difference between the intervention and a control condition can be rejected. Suitable support would be a strong record of efficacy for a well-defined control condition. It would also help if the trial were pre-registered as NI, quieting concerns that it was declared as such after peeking at the data.

The first things I noticed in the methods section…trouble

  • The recruitment procedure is strangely described, but seems to indicate that the therapists providing mindfulness training were present during recruitment, probably weren’t blinded to group assignment, and conceivably could have influenced it. The study thus does not have clear evidence of an appropriate randomization procedure and initial blinding. Furthermore, the GPs administering concurrent treatment also were not blinded and might take group assignment into account in subsequent prescribing and monitoring of medication.
  • During the recruitment procedure, GPs assessed whether medication was needed and made prescriptions before randomization occurred. We will need to see – we are not told in the methods section – but I suspect a lot of medication is being given to both intervention and control patients. That is going to complicate interpretation of results.
  • In terms of diagnosis, a truly mixed group of patients was recruited. Patients experiencing stress or adjustment reactions were thrown in with patients who had mild or moderate depression or anxiety disorders. Patients were excluded who were considered severe enough to need psychiatric care.
  • Patients receiving any psychotherapy at the start of the trial were excluded, but the authors ignored whether patients were receiving medication.

This appears to be a mildly distressed sample that is likely to show some recovery in the absence of any treatment. The authors’ failure to control for the medication that was received is going to be a big problem later. Readers won’t be able to tell whether any improvement in the intervention condition is due to its more intensive support and encouragement resulting in better adherence to medication.

  • The authors go overboard in defending their use of multiple overlapping measures and overboard in praising the validity of their measures. For instance, the Hospital Anxiety and Depression Scale (HADS) is a fatally flawed instrument, even if still widely used. I consider the instrument dead in terms of reliability and validity, but like Elvis, it is still being cited. (Play Elvis is Dead at http://tinyurl.com/p78pzcn)

Okay, the authors claim these measures are great, and attach clinical importance to cut points that others no longer consider valid. But then, why do they decide that the scales are ordinal, not interval? Basically, they are saying the scales are so bad that the difference between one scale point and the next cannot be considered equal across the scale. This is getting weird. If the scales are as good as the authors claim, why do the authors take the unusual step of treating them as psychometrically inadequate?

I know, I’m getting technical to the point that I risk losing some readers, but the authors are setting readers up to be comfortable with a decision to focus on medians, not mean scores – making it more difficult to detect any differences between the mindfulness therapy and routine care. Spin, spin!
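The practical consequence is easy to demonstrate: two groups of ordinal scores can differ in their means while having identical medians, so a median-focused comparison can miss a shift that a mean-based one would flag. A toy illustration with hypothetical scores (not the trial’s data):

```python
from statistics import mean, median

# Hypothetical HADS-style scores for two groups of nine patients.
control      = [8, 9, 10, 10, 10, 11, 11, 12, 13]
intervention = [4, 5,  6, 10, 10, 11, 12, 13, 13]

# The means differ by more than a point...
print(mean(control), mean(intervention))      # roughly 10.44 vs 9.33
# ...but the medians are identical, so a median-focused analysis
# would report "no difference" between the groups.
print(median(control), median(intervention))  # 10 vs 10
```

The point is not that medians are always wrong, but that choosing the less sensitive summary statistic stacks the deck toward the null result an NI trial needs.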

There are lots of problems with the ill described control condition, treatment as usual (TAU). My standing gripe with this choice is  that TAU varies greatly across settings, and often is so inadequate that at best the authors are comparing whether mindfulness therapy is better than some unknown mix of no treatment and inadequate treatment.

We know enough about mindfulness therapy at this point not to worry about whether it is better than nothing at all; we should be focusing on whether it is better than another active treatment and whether its effectiveness is due to particular factors. The authors state that most of the control patients were receiving CBT, but don’t indicate how they knew that, except for case records. Notoriously, a lot of the therapy done in primary care that is labeled by practitioners as CBT does not pass muster. I would be much more comfortable with some sort of control over what patients were receiving in the control arm, or at least better specification.

Analyses

I’m again trying to avoid getting very technical here, but I point out, for those who have a developed interest in statistics, that there were strange things going on.

  • Particular statistical analyses (depending on group medians, rather than means) are chosen that are less likely to reveal differences between intervention and control groups than the parametric statistics that are typically done.
  • Complicated decisions justify throwing away data and then using multivariate techniques to estimate what the data were. The multivariate techniques require assumptions that are not tested.
  • The power analysis is not conducted to detect differences between groups, but to be able to provide a basis for saying that mindfulness does not differ from routine care. Were the authors really interested in that question rather than whether mindfulness is better than routine care in initially designing a study and its analytic plan? Without pre-registration, we cannot know.
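To see what powering a trial for non-inferiority involves, the standard per-group sample-size formula for comparing two means (assuming, optimistically, a true difference of zero) is n = 2(z₁₋α + z₁₋β)²σ²/Δ², where Δ is the non-inferiority margin. A sketch with hypothetical numbers, not the trial’s:

```python
from math import ceil
from statistics import NormalDist

def ni_sample_size(sd, margin, alpha=0.025, power=0.80):
    """Per-group n for a non-inferiority comparison of two means,
    assuming the true difference between treatments is zero."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # one-sided alpha
    z_b = NormalDist().inv_cdf(power)
    n = 2 * (z_a + z_b) ** 2 * sd ** 2 / margin ** 2
    return ceil(n)

# Hypothetical: SD of 5 scale points, margin of 2 points.
n = ni_sample_size(sd=5.0, margin=2.0)
# Halving the margin quadruples the required n -- which is why a
# margin that was never pre-registered is so hard to trust.
```

Everything hinges on a margin chosen and justified in advance; a study powered this way can only ever answer the non-inferiority question, not the superiority one.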

Results

There are extraordinary revelations in Table 1, baseline characteristics.


  • The intervention and control group initially differed for two of the four outcome variables before they even received the intervention. Thus, intervention and control conditions are not comparable in important baseline characteristics. This is in itself a risk of bias, but also raises further questions about the adequacy of the randomization procedure and blinding.
  • We are told nothing about the distribution of diagnoses across the intervention and control group, which is very important in interpreting results and considering what generalizations can be made.
  • Most patients in both the intervention and control groups were receiving antidepressants, and about a third of patients in either condition were receiving a “tranquilizer” or have missing data for that variable.

Signals that there is something amiss in this study are growing stronger. Given the mildness of disturbance and high rates of prescription of medication, we are likely dealing with a primary care sample where medications are casually distributed and poorly monitored. Yet, this study is supposedly designed to inform us whether adding mindfulness to this confused picture produces outcomes that are not worse.

Table 5 adds to the suspicions. There were comparable, significant changes in both the intervention and control condition over time. But we can’t know if that was due to the mildness of distress or effectiveness of both treatments.


Twice as many patients assigned to mindfulness dropped out of treatment, compared to those assigned to routine care. Readers are given some information about how many sessions of mindfulness patients attended, but not the extent to which they practiced mindfulness.

Discussion

We are told

The main finding of the present RCT is that mindfulness group therapy given in a general practice setting, where a majority of patients with depression, anxiety, and stress and adjustment disorders are treated, is non-inferior to individual-based therapy, including CBT. To the best of our knowledge, this is the first RCT performed in a general practice setting where the effect of mindfulness group therapy was compared with an active control group.

Although a growing body of research has examined the effect of mindfulness on somatic as well as psychiatric conditions, scientific knowledge from RCT studies is scarce. For example, a 2007 review…

It’s debatable whether the statement was true in 2007, but a lot has happened since then. Recent reviews suggest that mindfulness therapy is better than nothing and better than inactive control conditions that do not provide comparable levels of positive expectations and support. Studies are accumulating that indicate mindfulness therapy is not consistently better than active control conditions. Differences become less likely when the alternative treatments are equivalent in positive expectations conveyed to patients and providers, support, and intensity in terms of frequency and amount of contact. Resolving this latter question of whether mindfulness is better than reasonable alternatives is now critical, and this study provides no relevant data.

An Implications section states

Patients who receive antidepressants have a reported remission rate of only 35–40%.41 Additional treatment is therefore needed for non-responders as well as for those who are either unable or unwilling to engage in traditional psychotherapy.

The authors are being misleading to the point of being irresponsible in making this statement in the context of discussing the implications of their study. The reference is to the American STAR*D treatment study, which dealt with a very different, more chronically and unremittingly depressed population.

An appropriately referenced statement about primary care populations like the one this study recruited would point to the lack of diagnosis on which prescription of medication was based, unnecessary treatment with medication of patients who would not be expected to benefit from it, and poor monitoring and follow-up of patients who could conceivably benefit from medication if appropriately monitored. The statement would reflect the poor state of routine care for depression in the community, but would undermine claims that the control group received an active treatment with suitable specification that would allow any generalizations about the efficacy of mindfulness.

MY ASSESSMENT

This RCT has numerous flaws in its conduct and reporting that preclude its making any contribution to the current literature about mindfulness therapy. What is extraordinary is that, as a null trial, it got published in BJP. Maybe its publication in its present form represents incompetent reviewing and editing, or maybe a strategic but inept decision to publish a flawed study with null findings because it concerns the trendy topic of mindfulness and the GPs to whom British psychiatrists want to reach out.

An RCT of mindfulness psychotherapy is attention-getting. Maybe the BJP is willing to sacrifice trustworthiness of the interpretation of results for newsworthiness. BJP will attract readership it does not ordinarily get with publication of this paper.

What is most fascinating is that the study was framed as a noninferiority trial and therefore null results are to be celebrated. I challenge anyone to find similar instances of null results for a psychotherapy trial being published in BJP except in the circumstances that make a lack of effect newsworthy because it suggests that investment in the dissemination of a previously promising treatment is not justified. I have a strong suspicion that this particular paper got published because the results were dressed up as a successful demonstration of noninferiority.

I would love to see the reviews this paper received, almost as much as any record of what the authors intended when they planned the study.

Will this be the beginning of a trend? Does BJP want to encourage submission of noninferiority psychotherapy studies? Maybe the simple explanation is that the editor and reviewers do not understand what a noninferiority trial is and what it can conceivably conclude.

Please, some psychotherapy researcher with a null trial sitting in the drawer, test the waters by dressing the study up as a noninferiority trial and submitting it to BJP.

How bad is this study?

The article provides a non-intention-to-treat analysis of a comparison of mindfulness to an ill-specified control condition that would not qualify as an active condition. The comparison does not allow generalization to other treatments in other settings. The intervention and control conditions had significant differences in key characteristics at baseline. The patient population is ill-described in ways that do not allow generalization to other patient populations. The confounding due to high rates of co-treatment with antidepressants and tranquilizers precludes determination of any effects of the mindfulness therapy. We don’t know if there were any effects, or if both the mindfulness therapy and control condition benefited from the natural decline in distress of a patient population largely without psychiatric diagnoses. Without a control group like a waiting list, we can’t tell if these patients would have improved anyway. I could go on but…

This study was not needed and may be unethical

The accumulation of literature is such that we need less mindfulness therapy research, not more. We need comparisons with well-specified active control groups that can answer the question of whether mindfulness therapy offers any advantage over alternative treatments, not only in efficacy, but in the ability to retain patients so they get an adequate exposure to the treatment. We need mindfulness studies with cleverly chosen comparison conditions that allow determination of whether it is the mindfulness component of mindfulness group therapy that has any effectiveness, rather than the relaxation that mindfulness therapy shares with other treatments.

To conduct research in patient populations, investigators must have hypotheses and methods with the likelihood of making a meaningful contribution to the literature commensurate with all the extra time and effort they are asking of patients. This particular study fails this ethical test.

Finally, the publication of this null trial as a noninferiority trial pushes the envelope in terms of the need for preregistration of design and analytic plans for trials. If authors are going to claim a successful demonstration of non-inferiority, we need to know that is what they set out to do, rather than that they were simply stuck with null findings they could not otherwise publish.

*DISCLAIMER: This blog post presents solely the opinions of the author, and not necessarily PLOS. Opinions about the publishability of papers reflect only the author’s views and not necessarily an editorial decision for a manuscript submitted to PLOS One.

**I previously criticized the editorial process at BJP, calling for the retraction of a horribly flawed meta-analysis of the mental health effects of abortion written by an American antiabortion activist. I have pointed out how another flawed review of the efficacy of long-term psychodynamic psychotherapy represented duplicate publication. But both of these papers were published under the last editor. I still hope that the current editor can improve the trustworthiness of what is published at BJP. I am not encouraged by this particular paper, however.