The PACE PLOS One data will not be released and the article won’t be retracted

PLOS One has bought into discredited arguments about patient consent forms not allowing sharing of anonymized data. PLOS One is no longer at the vanguard of open science through routine data sharing.

mind the brain logo

Two years have passed since I requested release of the PLOS One PACE data, eight months since the Expression of Concern was posted. What can we expect?

expression of concern-page-0

9 dot problem
Solving the 9-dot problem involves paying attention and thinking outside the box.

If we spot some usually unrecognized connections, we can see that the PLOS One editors are biased toward the PACE investigators, favoring them over the other stakeholders in whether the data are released as promised.

Spoiler: The PLOS One Senior Editors completed the pre-specified process of deciding what to do about the data not being shared. They took no action. Months later the Senior Editors reopened the process and invited one of PACE investigator Trudy Chalder’s outspoken co-authors to help them reconsider.

A lot of us weren’t cynical enough to notice.

International trends will continue toward making uploading data into publicly accessible repositories a requirement for publication. PLOS One has slowed down by buying into discredited arguments about patient consent forms not allowing sharing of anonymized data.

PLOS One is no longer at the vanguard of open science through routine data sharing.

The expression of concern

actual display of expression of concern on PLOS article
Actual Expression of Concern on display on PLOS One article.

The editors’ section of the Expression of Concern ends with:

In spite of requests to the authors and Queen Mary University of London, we have not yet received confirmation that an institutional process compatible with the existing PLOS data policy at the time has been developed or implemented for the independent evaluation of requests for data from this study. We conclude that the lack of resolution towards release of the dataset is not in line with the journal’s editorial policy and we are thus issuing this Expression of Concern to alert readers about the concerns raised about this article.

This is followed by the PACE investigators’ response:

Statement from the authors

We disagree with the Expression of Concern about our health economic paper that PLOS ONE has issued and do not accept that it is justified. We believe that data should be made available and have shared data from the PACE trial with other researchers previously, in line with our data sharing policy. This is consistent with the data sharing policies of Queen Mary University of London, and the Medical Research Council, which funded the trial. The policy allows for the sharing of data with other researchers, so long as safeguards are agreed regarding confidentiality of the data and consent as specified by the Research Ethics Committee (REC). We have also pointed out to PLOS ONE that our policy includes an independent appeal process, if a request is declined, so this policy is consistent with the journal’s policy when the paper was published.

During negotiations with the journal over these matters, we have sought further guidance from the PACE trial REC. They have advised that public release, even of anonymised data, is not appropriate. As a consequence, we are unable to publish the individual patient data requested by the journal. However, we have offered to provide key summarised data, sufficient to provide an independent re-analysis of our main findings, so long as it is consistent with the REC decision, on the PLOS ONE website. As such we are surprised by and question the decision by the journal to issue this Expression of Concern.

Check out my critique of their claim to have shared data from the PACE trial with other researchers:

Don’t bother to apply: PACE investigators issue guidance for researchers requesting access to data.

Nothing_to_Declare
Conflict of interest: Nothing to declare?

The PACE authors were thus given an extraordinary opportunity to undermine the editors’ Expression of Concern.

It is just as extraordinary that there is no disclosure of conflict of interest. After all, it is their paper that is receiving an Expression of Concern because of their failure to provide data as promised.

In contrast, when the PLOS One editors placed a discreet Editors’ Note in 2015 in the comment section of the article about the data not being shared when requested, it carried a COI declaration:

Competing interests declared: PLOS ONE Staff

That COI aroused the curiosity of Retraction Watch, which asked PLOS One:

We weren’t sure what the last line was referring to, so contacted Executive Editor Veronique Kiermer. She told us that staff sometimes include their byline under “competing interests,” so the authorship is immediately clear to readers who may be scanning a series of comments.

Commentary from Retraction Watch

PLOS upgrades flag on controversial PACE chronic fatigue syndrome trial; authors “surprised”

Notable excerpts:

A spokesperson for PLOS told us this is the first time the journal has included a statement from the authors in an EOC:

This has been a complex case involving many stakeholders and we wanted to document the different aspects of the case in a fair manner.

And

We asked if the journal plans to retract the paper if the authors fail to provide what it’s asked for; the spokesperson explained:

At this time, PLOS stands by its Expression of Concern. For now, we have exhausted the options to make the data available in accordance with our policy at the time, but PLOS still seeks a positive outcome to this case for all parties. It is our intention to update this notice when a mechanism is established that allows concerns about the article’s analyses to be addressed while protecting patient privacy. PLOS has not given the authors a deadline.

Note: “PLOS has not given the authors a deadline.”

One of the readers who has requested the data is James Coyne, a psychologist at the University Medical Center, Groningen, who submitted his request 18 months ago (and wrote about it on the PLOS blog site). Although some of the data have been released (to one person under the Freedom of Information Act), it’s not nearly enough to conduct an analysis, Coyne told us:

This small data set does not allow recalculation of original primary outcomes but did allow recalculation of recovery data. Release of the PLOS data is crucial for a better understanding of what went on in that trial. That’s why the investigators are fighting so hard.

Eventually, Coyne began suggesting to PLOS that he would organize public protests and scientific meetings attended by journal representatives.

I think it is the most significant issue in psychotherapy today, in terms of data sharing. It’s a flagrant violation of international standards.

The Retraction Watch article cited a 2015 STAT article that was written by Retraction Watch co-founders Ivan Oransky and Adam Marcus. That article was sympathetic to my request:

If the information Coyne is seeking is harmful and distressing to the staff of the university — and that’s the university’s claim, not ours — that’s only because the information is in fact harmful and distressing. In other words, revealing that you have nothing to hide is much less embarrassing than revealing that you’re hiding something.

The STAT article also said:

To be clear, Coyne’s not asking for sex tapes or pictures of lab workers taking bong hits. He’s asking for raw data so that he can evaluate whether what a group of scientists reported in print is in fact what those data show. It’s called replication, and as Richard Smith, former editor of The BMJ (and a member of our board of directors), put it last week, the refusal goes “against basic scientific principles.” But, unfortunately, stubborn researchers and institutions have used legal roadblocks before to prevent scrutiny of science.

The PLOS One Editors’ blog post

The Expression of Concern was accompanied by a blog post on May 2, 2017 from Iratxe Puebla, Managing Editor for PLOS ONE, and Joerg Heber, Editor-in-Chief:

Data sharing in clinical research: challenges and open opportunities

Since we feel we have exhausted the options to make the data available responsibly, and considering the questions that were raised about the validity of the article’s conclusions, we have decided to post an Expression of Concern [5] to alert readers that the data are not available in line with the journal’s editorial policy. It is our intention to update this notice when a mechanism is established that allows concerns about the article’s analyses to be addressed while protecting patient privacy.

This statement seems to suggest that the ball is in the PACE investigators’ court and that the PLOS One editors are prepared to wait. But reading the rest of the blog post, it becomes apparent that PLOS One is wavering on the data sharing policy.

Current challenges and opportunities ahead

During our follow up it became clear that there is little consensus of opinion on the sharing of this particular dataset. Experts from the Data Advisory Board whom we consulted expressed different views on the stringency of the journal reaction. Overall they agreed on the need to consider the risk to confidentiality of the trial participants and on the relevance of developing mechanisms for consideration of data requests by an independent body or committee. Interestingly, the ruling of the FOI Tribunal also indicated that the vote did not reflect a consensus among all committee members.

Fact checking the PLOS One’s Editors’ blog and a rebuttal

John Peter fact-checked the PLOS One editors’ blog. It came up short on a number of points.

“Interestingly, the ruling of the FOI Tribunal also indicated that the vote did not reflect a consensus among all committee members.”

This line is misleading and reveals either ignorance or misunderstanding of the decision in Matthees.

The Information Tribunal (IT) is not a committee. It is part of the courts system of England and Wales.

…the IT’s decisions may be appealed to a higher court. As QMUL chose not to exercise this right but to opt instead to accept the decision, then clearly it considered there were no grounds for appeal. The decision stands in its entirety and applies without condition or caveat.

And

The court had two decisions to make:

First, could and should trial data be released and if so what test should apply to determine whether particular data should be made public? Second, when that test is applied to this particular set of data, do they meet that test?

The unanimous decision on the first question was very clear: there is no legal or ethical consideration which prevents release; release is permitted by the consent forms; there is a strong public interest in the release; making data available advances legitimate scientific debate; and the data should be released.

The test set by this unanimous decision was simple: whether data can be anonymized. Furthermore, again unanimously, the Tribunal stated that the test for anonymization is not absolute. It is whether the risk of identification is reasonably likely, not whether it is remote, and whether patients can be identified without prior knowledge, specialist knowledge or equipment, or resort to criminality.

It was on applying this test to the data requested, on whether they could be properly anonymized, that the IT reached a majority decision.

On the principles, on how these decisions should be made, on the test which should be applied and on the nature of that test, the court was unanimous.

It should also be noted that to share data which have not been anonymized would be in breach of the Data Protection Act. QMUL has shared these data with other researchers. QMUL should either report itself to the Information Commissioner’s Office or accept that the data can be anonymized. In which case, the unanimous decision of the IT is very clear: the data should be shared.

PLOS ONE should apply the IT decision and its own regulations and demand the data be shared or the paper retracted.

Data Advisory Board

The Editors’ blog referred to “Experts from the Data Advisory Board… express[ing] different views on the stringency of the journal reaction.”

That was a source of puzzlement for me. Established procedures make no provision for an advisory board as part of the process or any appeal.

A Google search clarified matters. I had been to this page a number of times before and did not remember seeing this statement. There is no date or any other indication that it was added after the rest of the page.

PLOS has formed an external board of advisors across many fields of research published in PLOS journals. This board will work with us to develop community standards for data sharing across various fields, provide input and advice on especially complex data-sharing situations submitted to the journals, define data-sharing compliance, and proactively work to refine our policy. If you have any questions or feedback, we welcome you to write to us at data@plos.org.

The availability of data for reanalysis and independent probing has lots of stakeholders. Independent investigators, policymakers, and patients all have a stake. I don’t recognize the names on this list and see no indication that consumers affected by what is reported in clinical and health services papers have a role in making decisions about the release of data. But one name stands out.

Who is Malcolm Macleod and what is he doing in this decision-making process?

Malcolm Macleod is quoted in the Science Media Centre reaction to the PACEgate special issue:

 Expert reaction to Journal of Health Psychology’s Special Issue on The PACE Trial

Prof. Malcolm Macleod, Professor of Neurology and Translational Neuroscience, University of Edinburgh, said:

“The PACE trial, while not perfect, provides far and away the best evidence for the effectiveness of any intervention for chronic fatigue; and certainly is more robust than any of the other research cited. Reading the criticisms, I was struck by how little actual meat there is in them; and wondered where some of the authors came from. In fact, one of them lists as an institution a research centre (Soerabaja Research Center) which only seems to exist as an affiliation on papers he wrote criticising the PACE trial.

“Their main criticisms seem to revolve around the primary outcome was changed halfway through the trial: there are lots of reasons this can happen, some justifiable and others not; the main think is whether it was done without knowledge of the outcomes already accumulated in the trial and before data lock – which is what was done here.

“So I don’t think there is really a story here, apart from a group of authors, some of doubtful provenance, kicking up dust about a study which has a few minor wrinkles (as all do) but still provides information reliable enough to shape practice. If you substitute ‘CFS’ for ‘autism’ and ‘PACE trial’ for ‘vaccination’ you see a familiar pattern…”

The declaration of interest is revealing in what it says and what it does not say.

Prof. MacLeod: “Prof Sharpe used to have an office next to my wife’s; and I sit on the PLoS Data board that considered what to do about one of their other studies.

The declaration fails to reveal a recent publication co-authored by Macleod and Trudy Chalder:

Wu S, Mead G, Macleod M, Chalder T. Model of understanding fatigue after stroke. Stroke. 2015 Mar 1;46(3):893-8.

This press release comes from an organization strongly committed to protecting the PACE trial from independent scrutiny. The SMC even organized a letter-writing campaign, headed by Peter White, to petition Parliament to exempt universities from Freedom of Information Act requests. Of course, that would effectively block requests for data.

Why would the PLOS One editors involve such a person in reconsidering what had been a decision in favor of releasing the data?

Connect the dots.

Trends will continue toward making uploading data into publicly accessible repositories a requirement for publication. PLOS One has bought into discredited arguments about patient consent forms not allowing sharing of anonymized data. PLOS One is no longer at the vanguard of open science through routine data sharing.

Better days: When PLOS Blogs honored my post about fatal flaws in the PACE chronic fatigue syndrome follow-up study (2015)

The back story on my receiving this honor was that PLOS Blogs only days before had shut down the blog site because of complaints from someone associated with the PACE trial. I was asked to resign. I refused. PLOS Blogs relented when I said it would be a publicity disaster for PLOS Blogs.

mind the brain logo

screen shot 11th most accessed
A Facebook memory of what I was posting two years ago reminded me of better days when PLOS Blogs honored my post about the PACE trial.

Your Top 15 in ’15: Most popular on PLOS BLOGS Network

I was included in a list of the most popular blog posts in a network that received over 2.3 million visitors reading more than 600 new posts. [It is curious that the sixth and seventh most popular posts were omitted from this list, but that’s another story]

I was mentioned for number 11:

11) Uninterpretable: Fatal flaws in PACE Chronic Fatigue Syndrome follow-up study Mind the Brain 10/29/15

Investigating and sharing potential errors in scientific methods and findings, particularly involving psychological research, is the primary reason Clinical Health Psychologist (and PLOS ONE AE) Jim Coyne blogs on Mind the Brain and elsewhere. This closely followed post is one such example.

Earlier decisions by the investigator group preclude valid long-term follow-up evaluation of CBT for chronic fatigue syndrome (CFS). At the outset, let me say that I’m skeptical whether we can hold the PACE investigators responsible… Read more

The back story was that only days before, I had gotten complaints from readers of Mind the Brain who found they were blocked from leaving comments at my blog site. I checked and found that I couldn’t even access the blog as an author.

I immediately emailed Victoria Costello and asked her what had happened. We agreed to talk by telephone, even though it was already late at night where I was in Philadelphia. She was in the San Francisco PLOS office.

In the telephone conversation, she reminded me that there were some topics about which I was not supposed to blog. Senior management at PLOS had found me in violation of that prohibition and wanted me to stop blogging.

As is often the case with communication with the senior management of PLOS, no specifics had been given.  There was no formal notice or disclosure about what topics I couldn’t blog or who had complained. And there had been no warning when my access to the blog site was cut. Anything that I might say publicly could be met with a plausible denial.

I reminded Victoria that I had never received any formal specification of what I could not blog about, nor of whom the complaint had come from. There had been a vague communication from her about not blogging about certain topics. I knew that complaints from either Gabrielle Oettingen or her family members had led to a request that I stop blogging about the flaws in her book, Rethinking Positive Thinking. That was easy to honor, because I was not planning another post about that dreadful self-help book. Any other prohibition was left so vague that I had no idea I couldn’t blog about the PACE trial. I had known that the authors of the British Psychological Society’s Understanding Psychosis were quite upset with what I had said in heavily accessed blog posts. Maybe that was the source of the other prohibition, but no one made that clear. And I wasn’t sure I wanted to honor it, anyway.

I pressed Victoria Costello for details. She said an editor had complained. When I asked if it was Richard Horton, she paused and mumbled something that I took as an affirmative. Victoria then suggested that it would be best for the blog network and for me if we had a mutually agreed-upon parting of ways. I told her that I would probably comment publicly that the breakup was not mutual, and that it would be a publicity disaster for the blog.

igagged_jpg-scaled500
Why was I even blogging for PLOS Blogs? Victoria Costello had recruited me after I expressed discontent with the censorship I was receiving at Psychology Today. The editors there had complained that some of my blogging about antidepressants might discourage ads from the pharmaceutical companies on which they depended for revenue. The editors had insisted on the right to approve my posts before I uploaded them. In inviting me to PLOS Blogs, Victoria told me that she too was a refugee from blogging at Psychology Today. I wouldn’t have to worry about restrictions on what I could say at Mind the Brain, beyond avoiding libel.

I ended the conversation accepting the prohibition on blogging about the PACE trial. This was despite disagreeing with the rationale that it would be a conflict of interest for me to blog about the trial after requesting the data from the PLOS One paper.

Since then, I have repeatedly requested that PLOS management acknowledge the prohibition on my blogging, or at least put it in writing. My requests were met with repeated refusals from Managing Editor Iratxe Puebla, who always cited my conflict of interest.

In early 2017, I began publicly tweeting about the issue, stimulating curiosity in others about whether there was a prohibition. In July 2017, the entire Mind the Brain site, not just my blog, was shut down.

In early 2018, I will provide more backstory on that shutdown and dispute what was said in the blog post below, along with more about the collusion between PLOS One senior management and the PACE investigators in the data not being available two years after I requested them.

Message for Mind the Brain readers from PLOSBLOGS

blank plos blogs thumb nail
This strange thumbnail is the default for when no preferred image is provided. It could indicate the haste with which this blog was posted.

Posted July 31, 2017 by Victoria Costello in Uncategorized

After five years and over a hundred posts, PLOSBLOGS is retiring its psychology blog, Mind the Brain, from our PLOS-hosted blog network. By mutual agreement with the primary Mind the Brain blogger, James Coyne, Professor Coyne will retain the name of this blog and will take his archive of posts for reuse on his independent website, http://www.coyneoftherealm.com.

According to PLOSBLOGS’ policy for all our retired (inactive) blogs, any and all original posts published on Mind the Brain will retain their PLOS web addresses as intact urls, so links made previously from other sites will not be broken. In addition, PLOS will supply the archive of his posts directly to Prof Coyne so that he may repost them anywhere he may wish.

PLOS honors James Coyne’s voice as an important one in peer-to-peer scientific criticism. As discussed with Professor Coyne in recent days, after careful consideration PLOSBLOGS has concluded that it does not have the staff resources required to vet the sources, claims and tone contained in his posts, to assure they are aligned with our PLOSBLOGS Community Guidelines. This has lead us to the conclusion that Professor Coyne and his content would be better served on his own independent blog platform. We wish James Coyne the best with his future blogging.

—Victoria Costello, Senior Editor, PLOSBLOGS & Communities

Bollocks!

Power pose: II. Could early career investigators participating in replication initiatives hurt their advancement?

Participation in attempts to replicate seriously flawed studies might be seen as bad judgment, when there are many more opportunities to demonstrate independent, critical thinking.

mind the brain logo

This is the second blog post concerning the special issue of Comprehensive Results in Social Psychology devoted to replicating Amy Cuddy’s original power pose study in Psychological Science.

Some things for early career investigators to think about.

Participating in attempts to replicate seriously flawed studies might be seen as bad judgment, when there are many more opportunities to demonstrate independent, critical thinking.

I have long argued that there should be better incentives for early career investigators (ECRs), as well as more senior ones, to participate in efforts to improve the trustworthiness of science.

ECRs should be encouraged, and indeed expected, to engage in post-publication peer review at PubPeer and PubMed Commons, and ways should be developed for such activity to be listed on the CV.

The Pottery Barn rule should be extended so that ECRs can publish critical commentaries in the journals that publish the original flawed papers. Retraction notices should indicate whose complaints led to the retraction.

Rather than being pressured to publish more underpowered, under-resourced studies, ECRs should be rewarded for research parasite activity. They should be assisted in obtaining data sets from already published studies. With those data, they should conduct exploratory, secondary analyses aimed at understanding what went wrong in larger-scale studies that ended up methodologically compromised and with shortfalls in recruitment.

But I wonder if we should counsel ECRs that participating in a multisite replication initiative like the one directed at the power pose effect might not contribute to their career advancement, and may even hurt it.

Mturk
I’ve been critical of the value of replication initiatives as the primary means of addressing the trustworthiness of psychology, particularly in areas with claims of clinical and public health relevance. To add to my other reservations, I can point out that the necessary economy and efficiency of relying on MTurk and other massive administrations of experimental manipulations can force efforts to improve the trustworthiness of psychology into less socially significant, and maybe less representative, areas.

I certainly wouldn’t penalize an early career investigator for involvement in a multisite replication. I appreciate there is room for disagreement with my skepticism about the value of such initiatives. I would recognize the commitment to better research practices that such involvement represents.

But I think early career investigators need to consider that some senior investigators and members of hiring and promotion committees (HPCs) might give a low rating to publications coming from such initiatives in judging a candidate’s potential for original, creative, risk-taking research. That might be so even if these committee members appreciate the need to improve the trustworthiness of psychology.

Here are some conceivable comments that could be made in such a committee’s deliberations.

“Why did this candidate get involved in a modest-scale study so focused on two saliva assessments of cortisol? Even if it is not their area of expertise, shouldn’t they have consulted the literature and seen how uninformative a pair of assessments of cortisol is, given the well-known problems of intra-individual and inter-individual variation in cortisol’s sensitivity to uncontrolled contextual variables?…They should have powered their study to find cortisol differences amidst all the noise.”

“Were they unaware that testosterone levels differ between men and women by a factor of five or six? How do they expect that discontinuity in distributions to be overcome in any statistical analyses combining men and women? What basis was there in the literature to suggest that a brief, seemingly trivial manipulation of posture would have such enduring effects on hormones? Why would they specifically anticipate that differences would be registered in women? Overall, their involvement in this initiative demonstrates a willingness to commit considerable time and resources to ideas that could have been ruled out by a search of the relevant literature.”
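The statistical worry in this hypothetical comment can be illustrated with a quick simulation: when two groups differ roughly five-fold, the pooled distribution’s spread is dominated by the gap between the groups, so a small manipulation effect gets swamped unless sex is modeled explicitly. The numbers below are illustrative values, not actual hormone data.

```python
import random
import statistics

random.seed(0)

# Illustrative levels only: "female-like" values around 1 unit,
# "male-like" values around 5 units (a five-fold difference).
women = [random.gauss(1.0, 0.3) for _ in range(1000)]
men = [random.gauss(5.0, 1.5) for _ in range(1000)]
pooled = women + men

sd_women = statistics.stdev(women)
sd_pooled = statistics.stdev(pooled)

# The pooled SD is several times the within-group SD: most of the
# "variance" in the combined sample is just the male/female gap.
print(f"within-group SD (women): {sd_women:.2f}")
print(f"pooled SD (men + women): {sd_pooled:.2f}")
```

Any analysis that pools the sexes is effectively trying to detect a small shift against this inflated spread, which is the discontinuity problem the comment raises.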


“There seems to be a lemming quality to this large group of researchers pursuing bad hypotheses with inappropriate methods. Why didn’t this investigator have the independence of mind to object? Can we expect a similar going-with-the-herd after fashionable topics in research over the next few years?”

“While I appreciate the motivation of this investigator, I believe there was a violation of the basic principle of ‘stop and think before you undertake a study’ that does not bode well for how they will spend their time when faced with the demands of teaching and administration as well as doing research.”

Readers may think that these comments represent horrible, cruel sentiments and that it would be a great injustice if they influenced hiring and promotion decisions. But anyone who has ever been on a hiring and promotion committee knows that they are full of such horrible comments, and that such processes are not fair or just or even rational.


Why PhD students should not evaluate a psychotherapy for their dissertation project

  • Things some clinical and health psychology students wish they had known before they committed themselves to evaluating a psychotherapy for their dissertation study.
  • A well-designed pilot study addressing feasibility and acceptability issues in conducting and evaluating psychotherapies is preferable to an underpowered study that won’t provide a valid estimate of the efficacy of the intervention.
  • PhD students would often be better off as research parasites – making use of existing published data – rather than attempting to organize their own original psychotherapy study, if their goal is to contribute meaningfully to the literature and patient care.
  • Reading this blog, you will encounter a link to free, downloadable software that allows you to make quick determinations of the number of patients needed for an adequately powered psychotherapy trial.

I so relish the extra boost of enthusiasm that many clinical and health psychology students bring to their PhD projects. They not only want to complete a thesis of which they can be proud, they want their results to be directly applicable to improving the lives of their patients.

Many students are particularly excited about a new psychotherapy about which extravagant claims are being made that it’s better than its rivals.

I have seen lots of fads and fashions come and go: third wave, new wave, and no wave therapies. When I was a PhD student, progressive relaxation was in. Then it died, mainly because it was so boring for the therapists who had to mechanically provide it. Client-centered therapy was fading amid doubts that anyone else could achieve the results of Carl Rogers, or that his three facilitative conditions of unconditional positive regard, genuineness, and congruence were actually distinguishable enough to study. Gestalt therapy was supercool because of the charisma of Fritz Perls, whose showmanship distracted us from the utter lack of evidence for its efficacy.

I hate to see PhD students demoralized when their grand plans prove unrealistic. Inevitably, circumstances force them to compromise in ways that limit the usefulness of their project, and may even threaten their getting done within a reasonable time period. Overly ambitious plans are the formidable enemy of the completed dissertation.

The numbers are stacked against a PhD student conducting an adequately powered evaluation of a new psychotherapy.

This blog post argues against PhD students taking on the evaluation of a new therapy in comparison to an existing one, if they expect to complete their projects and make a meaningful contribution to the literature and to patient care.

I’ll be drawing on some straightforward analysis done by Pim Cuijpers to identify what PhD students are up against when trying to demonstrate that any therapy is better than treatments that are already available.

Pim has literally done dozens of meta-analyses, mostly of treatments for depression and anxiety. He commands a particular credibility, given the quality of this work. The way Pim and his colleagues present a meta-analysis is so straightforward and transparent that you can readily examine the basis of what he says.

Disclosure: I collaborated with Pim and a group of other authors in conducting a meta-analysis of whether psychotherapy was better than a pill placebo. We drew on all the trials allowing a head-to-head comparison, even though nobody had ever really set out to pit the two conditions against each other as a primary aim.

Pim tells me that the brief and relatively obscure letter on which I will draw, New Psychotherapies for Mood and Anxiety Disorders: Necessary Innovation or Waste of Resources?, is among his most unpopular pieces of work. Lots of people don't like its inescapable message. But I think that PhD students who pay attention to it might avoid a lot of pain and disappointment.

But first…

Note how many psychotherapies have been claimed to be effective for depression and anxiety. Anyone trying to make sense of this literature has to contend with claims based on a lot of underpowered trials – too small in sample size to have any reasonable expectation of detecting the effects that investigators claim – and otherwise compromised by methodological limitations.

Some investigators were simply naïve about clinical trial methodology and the difficulties of doing research with clinical populations. They may not have understood statistical power.

But many psychotherapy studies end up in bad shape because the investigators were unrealistic about the feasibility of what they were undertaking and the low likelihood that they could recruit patients in the numbers they had planned in the time they had allotted. After launching the trial, they had to change recruitment strategies, perhaps relax their selection criteria, or even change the treatment so it was less demanding of patients' time. And they had to make difficult judgments about which features of the trial to drop when resources ran out.

Declaring a psychotherapy trial to be a “preliminary” or a “pilot study” after things go awry

The titles of more than a few articles reporting psychotherapy trials contain an apologetic qualifier after a colon: "a preliminary study" or "a pilot study". But these studies weren't intended at the outset to be preliminary or pilot studies. The investigators are making excuses post hoc – after the fact – for not having been able to recruit sufficient numbers of patients and for having had to compromise the design they had originally planned. The best they can hope for is that the paper will somehow be useful in promoting further research.

Too many studies whose effect sizes are entered into meta-analyses should have been left as pilot studies and not treated as tests of the efficacy of treatments. The rampant problem in the psychotherapy literature is that almost no one treats small-scale trials as mere pilot studies. In a recent blog post, I provided readers with some simple screening rules for identifying meta-analyses of psychotherapy studies that they could dismiss from further consideration. One was whether there were sufficient numbers of adequately powered studies. Often there are not.

Readers take the inflated results of small studies seriously, when these estimates should be seen as unrealistic and unlikely to be replicated, given the studies' sample sizes. The large effect sizes claimed are likely the product of p-hacking and the confirmation bias required to get published. With enough alternative outcome variables to choose from, and enough flexibility in analyzing and interpreting data, almost any intervention can be made to look good.

The problem is readily seen in the extravagant claims about acceptance and commitment therapy (ACT), which are heavily dependent on small, under-resourced studies supervised by promoters of ACT – studies that should never have been used to generate effect sizes.

Back to Pim Cuijpers' brief letter. He argues, based on his numerous meta-analyses, that it is unlikely that a new treatment will be substantially more effective than an existing credible, active treatment. There are some exceptions, like relaxation training versus cognitive behavior therapy for some anxiety disorders, but mostly only small differences of no more than d = 0.20 are found between two active, credible treatments. If you search the broader literature, you can find occasional exceptions, like CBT versus psychoanalysis for bulimia, but most of those you find prove to be false positives, usually based on investigator bias in conducting and interpreting a small, underpowered study.

You can see this for yourself with the freely downloadable G*Power program: plug in d = 0.20 and calculate the number of patients needed. To be safe, add more patients to allow for the 25% dropout rate that can be expected across trials. The number you get implies a larger study than has ever been done, including the well-financed NIMH Collaborative trial.
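The same arithmetic can be sketched in Python with statsmodels' power module (my translation of the G*Power steps, not code from the post; the naive dropout padding is my assumption about what the post intends):

```python
# A sketch of the G*Power calculation using statsmodels (assumptions:
# two-sided independent-samples t-test, alpha = .05, 80% power).
from math import ceil

from statsmodels.stats.power import TTestIndPower

# Patients per group needed to detect d = 0.20
n = TTestIndPower().solve_power(effect_size=0.20, alpha=0.05,
                                power=0.80, alternative='two-sided')
n_per_group = ceil(n)  # roughly 394 per group

# Pad for the ~25% dropout rate expected across psychotherapy trials
n_recruited = ceil(n_per_group / 0.75)  # roughly 525-530 recruited per group

print(n_per_group, n_recruited)
```

That is a total of close to 800 randomized patients before dropout padding – larger than virtually any comparative psychotherapy trial ever conducted.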

G*Power analyses

Even more patients would be needed for the ideal design, in which a third comparison group lets the investigator show that the active comparison treatment actually performed better than a nonspecific treatment – that is, that the established treatment was delivered with the same effectiveness it had shown in earlier trials. Otherwise, a defender of the established therapy might argue that the older treatment had not been properly implemented.

So, unless warned off, the PhD student plans a study to show not only that the null hypothesis can be rejected – that the new treatment is no better than the existing one – but also that, in the same study, the existing treatment proved better than a wait list. Oh my, just try to find an adequately powered, properly analyzed comparison of two active treatments plus a control group in the existing published literature. The few examples of three-group designs in which a new psychotherapy came out better than an effectively implemented existing treatment are grossly underpowered.

These calculations have so far all been based on what would be needed to reject the null hypothesis of no difference between the new treatment and the more established one. But if the claim is that the new treatment is superior to the existing treatment, our PhD student needs to conduct a superiority trial in which some criterion is pre-set (such as an advantage greater than a moderate difference, d = 0.30) and the null hypothesis is that the new treatment's advantage is smaller than that. We are now way out in the fantasyland of breakthrough – but never completed – dissertation studies.
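To see what a pre-set margin does to the numbers, here is a rough back-of-envelope sketch (my own framing, not Cuijpers'): under a normal approximation, testing whether the advantage exceeds d = 0.30 when the optimistic truth is d = 0.50 behaves roughly like powering a one-sided trial to detect d = 0.20.

```python
# Back-of-envelope sketch (my assumption, not from the letter): under a
# normal approximation, a superiority-by-margin test behaves like an
# ordinary one-sided test of (assumed true effect - margin).
from math import ceil

from statsmodels.stats.power import TTestIndPower

margin = 0.30          # pre-set superiority criterion (illustrative)
assumed_true_d = 0.50  # optimistic assumption about the new therapy

effective_d = assumed_true_d - margin  # 0.20 is what must be resolved
n_per_group = ceil(TTestIndPower().solve_power(
    effect_size=effective_d, alpha=0.05, power=0.80,
    alternative='larger'))  # one-sided: H1 is that advantage > margin

print(n_per_group)
```

On the order of 300+ patients per group before dropout padding, and that is with a generous assumption about the new therapy's true effect. A proper superiority design would use specialized methods, but the order of magnitude is the point.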

Two take away messages

The first take-away message is that we should be skeptical of claims that a new treatment is better than existing ones, except when the claim arises in a well-designed study with some assurance that it is free of investigator bias. But the claim also has to arise in a trial larger than almost any psychotherapy study that has ever been done. Yup, most comparative psychotherapy studies are underpowered, and we cannot expect claims that one treatment is superior to another to be robust.

But for PhD students planning a dissertation project, the second take-away message is that they should not attempt to show that one treatment is superior to another in the absence of resources they probably don't have.

The psychotherapy literature does not need another study with too few patients to support its likely exaggerated claims.

An argument can be made that it is unfair, and even unethical, to enroll patients in a psychotherapy RCT with an insufficient sample size. Some of the patients will be randomized to a control condition that is not what attracted them to the trial. All of the patients will be denied the satisfaction of having been in a trial that makes a meaningful contribution to the literature and to better care for patients like themselves.

What should the clinical or health psychology PhD student do, besides maybe curb their enthusiasm? One opportunity to make a meaningful contribution to the literature is to conduct small studies testing hypotheses that can lead to improvements in the feasibility or acceptability of treatments to be tested in studies with more resources.

Think of what would have been accomplished if PhD students had determined, in modest studies, that it is tough to recruit and retain patients in an Internet therapy study without some communication conveying that they are involved in a human relationship – without what Pim Cuijpers calls supportive accountability. Patients may stay involved with an Internet treatment when it proves frustrating only because they are supported by, and accountable to, someone beyond their encounter with an impersonal computer. Somewhere out there is a human being who supports them in sticking it out with the Internet psychotherapy and who will be disappointed if they don't.

A lot of resources have been wasted in Internet therapy studies in which patients were never convinced that what they were doing was meaningful or that they had the support of a human being. They drop out, or fail to do diligently any homework expected of them.

Similarly, mindfulness studies are routinely conducted without anyone establishing that patients actually practice mindfulness in everyday life, or what they would need in order to do so more consistently. The assumption is that patients assigned to mindfulness practice it diligently every day. A PhD student could make a valuable contribution to the literature by examining the rates at which patients actually practice mindfulness when they have been assigned to it in a psychotherapy study, along with the barriers to and facilitators of their doing so. A discovery that patients are not consistently practicing mindfulness might explain weaker findings than anticipated. One could even suggest that any apparent effects of practicing mindfulness were actually nonspecific – patients getting caught up in the enthusiasm of being offered a treatment they had sought, but not actually practicing it.

An unintended example: How not to recruit cancer patients for a psychological intervention trial

Sometimes PhD students just can't be dissuaded from undertaking an evaluation of a psychotherapy. I was a member of the PhD committee of a student who at least produced a valuable paper on how not to recruit cancer patients for a trial evaluating problem-solving therapy, even though the project fell far short of an adequately powered study.

The PhD student was aware that claims for the effectiveness of problem-solving therapy reported in the prestigious Journal of Consulting and Clinical Psychology were exaggerated. The developer of problem-solving therapy for cancer patients (and current JCCP Editor) had claimed a huge effect size – 3.8 if only the patient were involved in treatment, and an even better 4.4 if the patient had the opportunity to involve a relative or friend as well. Effect sizes from this trial have subsequently had to be excluded from meta-analyses as extreme outliers (1, 2, 3, 4).

The student adopted the much more conservative assumption that a moderate effect size of .6 would be obtained in comparison with a waitlist control. You can use G*Power to see that 50 patients would be needed per group, 60 if allowance is made for dropouts.
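Her calculation can be checked the same way (a sketch; she used G*Power, and I am assuming 85% power, which is not stated in the post – at 80% power the figure drops to about 45 per group):

```python
# Re-running the student's sample-size calculation (sketch; assumes a
# two-sided t-test at alpha = .05 with 85% power -- my assumption).
from math import ceil

from statsmodels.stats.power import TTestIndPower

n = TTestIndPower().solve_power(effect_size=0.6, alpha=0.05, power=0.85)
n_per_group = ceil(n)                  # close to the post's 50 per group
n_recruited = ceil(n_per_group * 1.2)  # ~20% dropout padding, about 60

print(n_per_group, n_recruited)
```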

Such a basically inert control group, of course, has a greater likelihood of seeming to demonstrate that a treatment is effective than a comparison with another active treatment. But such a control group also does not allow a determination of whether it was the active ingredient of the treatment that made the difference, or just the attention, positive expectations, and support that were not available in the waitlist condition.

But PhD students should have the same option as their advisors to contribute another comparison between an active treatment and a waitlist control to the literature, even if it does not advance our knowledge of psychotherapy. They can take the same low road to a successful career that so many others have traveled.

This particular student was determined to make a different contribution to the literature. Notoriously, studies of psychotherapy with cancer patients often fail to recruit samples that are distressed enough to register any effect. The typical breast cancer patient who seeks to enroll in a psychotherapy or support group trial, for instance, does not have clinically significant distress. The prevalence of positive effects claimed for interventions with cancer patients in published studies likely reflects confirmation bias.

The student wanted to address this issue by limiting enrollment to patients with clinically significant distress. Enlisting colleagues, she set up screening of consecutive cancer patients in the oncology units of local hospitals. Patients were first screened for self-reported distress and, if they were distressed, for whether they were interested in services. Those who met both criteria were then re-contacted to see if they would be willing to participate in a psychological intervention study, without the intervention being identified. As I reported in a previous blog post:

  • Combining results of the two screenings, 423 of 970 patients reported distress, of whom 215 indicated a need for services.
  • Only 36 (4% of 970) patients consented to trial participation.
  • We calculated that 27 patients needed to be screened to recruit a single patient, with 17 hours of time required for each patient recruited.
  • When re-contacted, 41% (n = 87) of the 215 distressed patients who had initially indicated a need for services said that they had no need for psychosocial services, mainly because they felt better or thought that their problems would disappear naturally.
  • Finally, 36 patients were eligible and willing to be randomized, representing 17% of 215 distressed patients with a need for services.
  • This represents 8% of all 423 distressed patients, and 4% of 970 screened patients.
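The funnel is easy to recompute from the reported counts (the numbers are taken straight from the bullets above; the rounding is mine, and comes out at 8.5% where the post rounds to 8%):

```python
# Recomputing the recruitment funnel from the figures reported above.
screened = 970       # consecutive cancer patients screened
distressed = 423     # reported distress at first screening
need_services = 215  # distressed and indicated a need for services
consented = 36       # ultimately willing to be randomized

print(f"distressed: {distressed / screened:.0%}")                       # 44%
print(f"consented, of screened: {consented / screened:.1%}")            # 3.7%
print(f"consented, of distressed: {consented / distressed:.1%}")        # 8.5%
print(f"consented, of those in need: {consented / need_services:.0%}")  # 17%
print(f"patients screened per recruit: {screened / consented:.1f}")     # 26.9
```

So the figure of 27 patients screened per patient recruited follows directly, and the yield shrinks at every step of the funnel.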

So the PhD student's heroic effort did not yield the sample size she had anticipated. But she ended up making a valuable contribution to the literature, one that challenges a basic assumption made about cancer patients in psychotherapy research – that all or most are distressed. She also produced valuable evidence that the minority of cancer patients who report psychological distress are not necessarily interested in psychological interventions.

Fortunately, she had been prepared to collect systematic data about these research questions, not just scramble within a collapsing effort at a clinical trial.

Becoming a research parasite as an alternative to PhD students attempting an under-resourced study of their own

Psychotherapy trials represent an enormous investment of resources – not only the public funding that is often provided for them, but the time, inconvenience, and exposure to ineffective treatments experienced by the patients who participate. Increasingly, funding agencies require that investigators who get money to do a psychotherapy study at some point make their data available for others to use. The 14 prestigious medical journals whose editors make up the International Committee of Medical Journal Editors (ICMJE) each published early in 2016 a declaration that:

there is an ethical obligation to responsibly share data generated by interventional clinical trials because participants have put themselves at risk.

These statements proposed that as a condition for publishing a clinical trial, investigators would be required to share with others appropriately de-identified data not later than six months after publication. Further, the statements proposed that investigators describe their plans for sharing data in the registration of trials.

Of course, a proposal is exactly that – a proposal – and these requirements were intended to take effect only after the document was circulated and ratified. The incomplete and inconsistent adoption of previous proposals, for registering trials in advance and for declaring conflicts of interest, does not encourage much confidence that we will see uniform implementation of this bold proposal anytime soon.

Some editors of medical journals are already expressing alarm over the prospect of data sharing becoming required. The editors of the New England Journal of Medicine were lambasted in social media for raising worries about “research parasites” exploiting the availability of data:

a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Richard Lehman's Journal Review at The BMJ's blog delivered a brilliantly sarcastic response to these concerns, concluding:

I think we need all the data parasites we can get, as well as symbionts and all sorts of other creatures which this ill-chosen metaphor can’t encompass. What this piece really shows, in my opinion, is how far the authors are from understanding and supporting the true opportunities of clinical data sharing.

However, lost in all the outrage that The New England Journal of Medicine editorial generated was a more conciliatory proposal at the end:

How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

The PLOS family of journals has gone on record as requiring that all data for papers published in its journals be publicly available without restriction. A February 24, 2014 announcement, PLOS' New Data Policy: Public Access to Data, declared:

In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.

Many of us are aware of the difficulties in achieving this lofty goal. I am holding my breath and turning blue, waiting for some specific data.

The BMJ has expanded its previous requirements for making data available:

Loder E, Groves T. The BMJ requires data sharing on request for all trials. BMJ. 2015 May 7;350:h2373.

The movement to make data from clinical trials widely accessible has achieved enormous success, and it is now time for medical journals to play their part. From 1 July The BMJ will extend its requirements for data sharing to apply to all submitted clinical trials, not just those that test drugs or devices. The data transparency revolution is gathering pace.

I am no longer heading dissertation committees after one that I am currently supervising is completed. But if any PhD students asked my advice about a dissertation project concerning psychotherapy, I would strongly encourage them to enlist their advisor to identify and help them negotiate access to a data set appropriate to the research questions they want to investigate.

Most well-resourced psychotherapy trials have unpublished data concerning how they were implemented – with what biases, and with which patient groups ending up underrepresented or inadequately exposed to the intensity of treatment presumed necessary for benefit. A story awaits telling. The data available from a published trial are usually far more adequate than any a graduate student could collect with the limited resources available for a dissertation project.

I look forward to the day when such data are put into repositories where anyone can access them.

In this blog post I have argued that PhD students should not take on responsibility for developing and testing a new psychotherapy for their dissertation project. I think that using data from existing published trials is a much better alternative. PhD students may currently find it difficult – but certainly not impossible – to get appropriate data sets. I certainly am not recruiting them to be front-line infantry in advancing the cause of routine data sharing. But they can make an effort to obtain such data, and they deserve all the support they can get from their dissertation committees, both in obtaining data sets and in recognizing realistically when data are not going to be made available, even when availability was promised as a condition of publication. Advisors, please request the data from published trials for your PhD students, and protect them from the heartache of trying to collect such data themselves.

 

Study protocol violations, outcomes switching, adverse events misreporting: A peek under the hood

An extraordinary, must-read article is now available open access:

Jureidini, JN, Amsterdam, JD, McHenry, LB. The citalopram CIT-MD-18 pediatric depression trial: Deconstruction of medical ghostwriting, data mischaracterisation and academic malfeasance. International Journal of Risk & Safety in Medicine, vol. 28, no. 1, pp. 33-43, 2016

The authors had access to internal documents written with the belief that they would be left buried in corporate files. However, these documents became publicly available in a class-action product liability suit concerning the marketing of the antidepressant citalopram for treating children and adolescents.

Detailed evidence of ghostwriting by industry sponsors has considerable shock value. But this article has a broader usefulness: it lets us peek in on the usually hidden processes by which null findings and substantial adverse events are spun into a positive report of a treatment's efficacy and safety.

We are able to see behind the scenes how an already underspecified protocol was violated, primary and secondary outcomes were switched or dropped, and adverse events were suppressed in order to obtain the kind of results needed for a planned promotional effort and for FDA approval of the drug's use in these populations.

We can see how subtle changes in analyses that would otherwise go unnoticed can have a profound impact on clinical and public policy.

In so many other situations, we are left only with our skepticism about results being too good to be true. We are usually unable to evaluate independently investigators’ claims because protocols are unavailable, deviations are not noted, analyses are conducted and reported without transparency. Importantly, there usually is no access to data that would be necessary for reanalysis.

The authors whose work is being criticized are among the most prestigious child psychiatrists in the world. The first author is currently President-elect of the American Academy of Child and Adolescent Psychiatry. The journal is among the top psychiatry journals in the world, and a subscription is provided as part of membership in the American Psychiatric Association. Appearing in this journal is thus strategic: its readership includes many practitioners and clinicians who will simply defer to academics publishing in a journal they respect, without any inclination to look carefully.

Indeed, I encourage readers to go to the original article and read it before proceeding further in the blog. Witness the unmasking of how null findings were turned positive. Unless you had been alerted, would you have detected that something was amiss?

Some readers have participated in multisite trials in roles other than lead investigator. I ask them to imagine that they had received the manuscript for review and approval, assuming it had been vetted by the senior investigators – and only the senior investigators. Would they have subjected it to the scrutiny needed to detect data manipulation?

I similarly ask reviewers for scientific journals whether they would have detected something amiss. Would they have compared the manuscript to the study protocol? Note that when this article was published, they probably would have had to contact the authors or the pharmaceutical company even to obtain the protocol.

Welcome to a rich treasure trove

Separate from the civil action that led to these documents and data being released, the federal government later filed criminal charges and False Claims Act allegations against Forest Laboratories. The pharmaceutical company pleaded guilty and accepted a $313 million fine.

Links to the filing and to the federal government's announcement of a settlement are available in a supplementary blog post at Quick Thoughts. That post also has rich links to the actual emails accessed by the authors, as well as to blog posts by John M. Nardo, MD that detail the difficulties these authors had publishing the paper we are discussing.

Aside from writing his popular blog, Dr. Nardo is one of the authors of a reanalysis, published in The BMJ, of a related trial:

Le Noury J, Nardo JM, Healy D, Jureidini J, Raven M, Tufanaru C, Abi-Jaoude E. Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence. BMJ 2015; 351: h4320

My supplementary blog post contains links to discussions of that reanalysis of data obtained from GlaxoSmithKline, the original publication based on these data, and 30 Rapid Responses to the reanalysis in The BMJ, as well as the federal criminal complaints against, and guilty plea of, GlaxoSmithKline.

With Dr. Nardo's assistance, I've assembled a full set of materials that should be valuable in stimulating discussion among senior and junior investigators, as well as in student seminars. I agree with Dr. Nardo's assessment:

I think it’s now our job to insure that all this dedicated work is rewarded with a wide readership, one that helps us move closer to putting this tawdry era behind us…John Mickey Nardo

The citalopram CIT-MD-18 pediatric depression trial

The original article that we will be discussing is:

Wagner KD, Robb AS, Findling RL, Jin J, Gutierrez MM, Heydorn WE. A randomized, placebo-controlled trial of citalopram for the treatment of major depression in children and adolescents. American Journal of Psychiatry. 2004 Jun 1;161(6):1079-83.

It reports:

An 8-week, randomized, double-blind, placebo-controlled study compared the safety and efficacy of citalopram with placebo in the treatment of children (ages 7–11) and adolescents (ages 12–17) with major depressive disorder.

The results and conclusion:

Results: The overall mean citalopram dose was approximately 24 mg/day. Mean Children’s Depression Rating Scale—Revised scores decreased significantly more from baseline in the citalopram treatment group than in the placebo treatment group, beginning at week 1 and continuing at every observation point to the end of the study (effect size=2.9). The difference in response rate at week 8 between placebo (24%) and citalopram (36%) also was statistically significant. Citalopram treatment was well tolerated. Rates of discontinuation due to adverse events were comparable in the placebo and citalopram groups (5.9% versus 5.6%, respectively). Rhinitis, nausea, and abdominal pain were the only adverse events to occur with a frequency exceeding 10% in either treatment group.

Conclusions: In this population of children and adolescents, treatment with citalopram reduced depressive symptoms to a significantly greater extent than placebo treatment and was well tolerated.

The article ends with an elaboration of what is said in the abstract:

In conclusion, citalopram treatment significantly improved depressive symptoms compared with placebo within 1 week in this population of children and adolescents. No serious adverse events were reported, and the rate of discontinuation due to adverse events among the citalopram-treated patients was comparable to that of placebo. These findings further support the use of citalopram in children and adolescents suffering from major depression.

The study protocol

The protocol for CIT-MD-18, IND Number 22,368, was obtained from Forest Laboratories. It was dated September 1, 1999 and amended April 8, 2002.

The primary outcome measure was the change from baseline to week 8 on the Children’s Depression Rating Scale-Revised (CDRS-R) total score.

Comparison between citalopram and placebo will be performed using three-way analysis of covariance (ANCOVA) with age group, treatment group and center as the three factors, and the baseline CDRS-R score as covariate.

The secondary outcome measures were the Clinical Global Impression severity and improvement subscales, Kiddie Schedule for Affective Disorders and Schizophrenia – depression module, and Children’s Global Assessment Scale.

Comparison between citalopram and placebo will be performed using the same approach as for the primary efficacy parameter. Two-way ANOVA will be used for CGI-I, since improvement relative to Baseline is inherent in the score.

 There was no formal power analysis but:

The primary efficacy variable is the change from baseline in CDRS-R score at Week 8.

Assuming an effect size (treatment group difference relative to pooled standard deviation) of 0.5, a sample size of 80 patients in each treatment group will provide at least 85% power at an alpha level of 0.05 (two-sided).
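That claim is easy to sanity-check (my check, using statsmodels rather than whatever software Forest used):

```python
# Sanity-checking the protocol's claim: with 80 patients per group and
# an assumed effect size of 0.5, what power does a two-sided t-test at
# alpha = .05 actually have?
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.5, nobs1=80,
                              alpha=0.05, alternative='two-sided')
print(f"{power:.2f}")  # about 0.88 -- consistent with "at least 85% power"
```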

The deconstruction

Selective reporting of subtle departures from the protocol could easily have been missed, or simply excused as accidental and inconsequential, were it not for the unrestricted access to communications within Forest Laboratories and to the data for reanalysis.

3.2 Data

The fact that Forest controlled the CIT-MD-18 manuscript production allowed for selection of efficacy results to create a favourable impression. The published Wagner et al. article concluded that citalopram produced a significantly greater reduction in depressive symptoms than placebo in this population of children and adolescents [10]. This conclusion was supported by claims that citalopram reduced the mean CDRS-R scores significantly more than placebo beginning at week 1 and at every week thereafter (effect size = 2.9); and that response rates at week 8 were significantly greater for citalopram (36%) versus placebo (24%). It was also claimed that there were comparable rates of tolerability and treatment discontinuation for adverse events (citalopram = 5.6%; placebo = 5.9%). Our analysis of these data and documents has led us to conclude that these claims were based on a combination of: misleading analysis of the primary outcome and implausible calculation of effect size; introduction of post hoc measures and failure to report negative secondary outcomes; and misleading analysis and reporting of adverse events.

3.2.1 Mischaracterisation of primary outcome

Contrary to the protocol, Forest’s final study report synopsis increased the study sample size by adding eight of nine subjects who, per protocol, should have been excluded because they were inadvertently dispensed unblinded study drug due to a packaging error [23]. The protocol stipulated: “Any patient for whom the blind has been broken will immediately be discontinued from the study and no further efficacy evaluations will be performed” [10]. Appendix Table 6 of the CIT-MD-18 Study Report [24] showed that Forest had performed a primary outcome calculation excluding these subjects (see our Fig. 2). This per protocol exclusion resulted in a ‘negative’ primary efficacy outcome.

Ultimately, however, eight of the excluded subjects were added back into the analysis, turning a marginally non-significant outcome (p = 0.052) into a statistically significant one (p = 0.038). Despite this change, there was still no clinically meaningful difference in symptom reduction between citalopram and placebo on the mean CDRS-R scores (Fig. 3).

The unblinding error was not reported in the published article.

Forest also failed to follow their protocol-stipulated plan for analysis of age-by-treatment interaction. The primary outcome variable was the change in total CDRS-R score at week 8 for the entire citalopram versus placebo group, using a 3-way ANCOVA test of efficacy [24]. Although a significant efficacy value favouring citalopram was produced after including the unblinded subjects in the ANCOVA, this analysis resulted in an age-by-treatment interaction with no significant efficacy demonstrated in children. This important efficacy information was withheld from public scrutiny and was not presented in the published article. Nor did the published article report the power analysis used to determine the sample size, and no adequate description of this analysis was available in either the study protocol or the study report. Moreover, no indication was made in these study documents as to whether Forest originally intended to examine citalopram efficacy in children and adolescent subgroups separately or whether the study was powered to show citalopram efficacy in these subgroups. If so, then it would appear that Forest could not make a claim for efficacy in children (and possibly not even in adolescents). However, if Forest powered the study to make a claim for efficacy in the combined child plus adolescent group, this may have been invalidated as a result of the ANCOVA age-by-treatment interaction and would have shown that citalopram was not effective in children.

A further exaggeration of the effect of citalopram was to report an “effect size on the primary outcome measure” of 2.9, which was extraordinary and not consistent with the primary data. This claim was questioned by Martin et al., who criticized the article for miscalculating effect size or using an unconventional calculation, which clouded “communication among investigators and across measures” [25]. The origin of the effect size calculation remained unclear even after Wagner et al. publicly acknowledged an error and stated that “With Cohen's method, the effect size was 0.32” [20], which is more typical of antidepressant trials. Moreover, we note that there was no reference to the calculation of effect size in the study protocol.
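For context, Cohen's d divides the between-group difference by the pooled standard deviation, so an effect size of 2.9 would require the treatment and placebo groups to be separated by nearly three standard deviations, far beyond anything reported in antidepressant trials. A minimal sketch of the conventional calculation (the illustrative numbers below are hypothetical, not the CIT-MD-18 data):

```python
import math

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Cohen's d: group mean difference over the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical numbers: a 3-point difference in mean symptom reduction with
# group SDs near 9-10 yields d of about 0.32, the scale of the corrected figure.
print(round(cohens_d(10.0, 7.0, 9.0, 10.0, 80, 80), 2))
```

On this conventional definition, a d of 2.9 would imply almost non-overlapping score distributions, which the published group means and standard deviations plainly rule out.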

3.2.2 Failure to publish negative secondary outcomes, and undeclared inclusion of Post Hoc Outcomes

Wagner et al. failed to publish two of the protocol-specified secondary outcomes, both of which were unfavourable to citalopram. While CGI-S and CGI-I were correctly reported in the published article as negative [10], (see p1081), the Kiddie Schedule for Affective Disorders and Schizophrenia-Present (depression module) and the Children’s Global Assessment Scale (CGAS) were not reported in either the methods or results sections of the published article.

In our view, the omission of secondary outcomes was no accident. On October 15, 2001, Ms. Prescott wrote: “I've heard through the grapevine that not all the data look as great as the primary outcome data. For these reasons (speed and greater control) I think it makes sense to prepare a draft in-house that can then be provided to Karen Wagner (or whomever) for review and comments” (see Fig. 1). Subsequently, Forest’s Dr. Heydorn wrote on April 17, 2002: “The publications committee discussed target journals, and recommended that the paper be submitted to the American Journal of Psychiatry as a Brief Report. The rationale for this was the following: … As a Brief Report, we feel we can avoid mentioning the lack of statistically significant positive effects at week 8 or study termination for secondary endpoints” [13].

Instead, the writers presented post hoc statistically positive results that were not part of the original study protocol or its amendment (visit-by-visit comparison of CDRS-R scores, and ‘Response’, defined as a score of ≤28 on the CDRS-R) as though they were protocol-specified outcomes. For example, ‘Response’ was reported in the results section of the Wagner et al. article between the primary and secondary outcomes, likely predisposing a reader to regard it as more important than the selected secondary measures reported, or even to mistake it for a primary measure.

It is difficult to reconcile what the authors of the original article reported in terms of adverse events with what our “deconstructionists” found in the unpublished final study report. The deconstruction article also notes that a letter to the editor appearing at the time of publication of the original paper called attention to another citalopram study that remained unpublished, but that was known to be a null study with substantial adverse events.

3.2.3 Mischaracterisation of adverse events

Although Wagner et al. correctly reported that “the rate of discontinuation due to adverse events among citalopram-treated patients was comparable to that of placebo”, the authors failed to mention that the five citalopram-treated subjects discontinuing treatment did so due to one case of hypomania, two of agitation, and one of akathisia. None of these potentially dangerous states of over-arousal occurred with placebo [23]. Furthermore, anxiety occurred in one citalopram patient (and none on placebo) of sufficient severity to temporarily stop the drug, and irritability occurred in three citalopram patients (compared to one on placebo). Taken together, these adverse events raise concerns about dangers from the activating effects of citalopram that should have been reported and discussed. Instead, Wagner et al. reported that “adverse events associated with behavioral activation (such as insomnia or agitation) were not prevalent in this trial” [10] and claimed that “there were no reports of mania”, without acknowledging the case of hypomania [10].

Furthermore, examination of the final study report revealed that there were many more gastrointestinal adverse events for citalopram than placebo patients. However, Wagner et al. grouped the adverse event data in a way that effectively masked this possibly clinically significant gastrointestinal intolerance. Finally, the published article also failed to report that one patient on citalopram developed abnormal liver function tests [24].

In a letter to the editor of the American Journal of Psychiatry, Mathews et al. also criticized the manner in which Wagner et al. dealt with adverse outcomes in the CIT-MD-18 data, stating that: “given the recent concerns about the risk of suicidal thoughts and behaviors in children treated with SSRIs, this study could have attempted to shed additional light on the subject” [26]. Wagner et al. responded: “At the time the [CIT-MD-18] manuscript was developed, reviewed, and revised, it was not considered necessary to comment further on this topic” [20]. However, concerns about suicidal risk were prevalent before the Wagner et al. article was written and published [27]. In fact, undisclosed in both the published article and Wagner’s letter-to-the-editor, the 2001 negative Lundbeck study had raised concern over heightened suicide risk [10, 20, 21].

A later blog post will discuss the letters to the editor that appeared shortly after the original study in American Journal of Psychiatry. But for now, it would be useful to clarify the status of the negative Lundbeck study at that time.

The letter by Barbe published in AJP remarked:

It is somewhat surprising that the authors do not compare their results with those of another trial, involving 244 adolescents (13–18-year-olds), that showed no evidence of efficacy of citalopram compared to placebo and a higher level of self-harm (16 [12.9%] of 124 versus nine [7.5%] of 120) in the citalopram group compared to the placebo group (5). Although these data were not available to the public until December 2003, one would expect that the authors, some of whom are employed by the company that produces citalopram in the United States and financed the study, had access to this information.

The study authors replied:

It may be considered premature to compare the results of this trial with unpublished data from the results of a study that has not undergone the peer-review process. Once the investigators involved in the European citalopram adolescent depression study publish the results in a peer-reviewed journal, it will be possible to compare their study population, methods, and results with our study with appropriate scientific rigor.

Conflict of interest

The authors of the deconstruction study indicate they do not have any conventional industry or speaker’s bureau support to declare, but they have had relevant involvement in litigation. Their disclosure includes:

The authors are not members of any industry-sponsored advisory board or speaker’s bureau, and have no financial interest in any pharmaceutical or medical device company.

Drs. Amsterdam and Jureidini were engaged by Baum, Hedlund, Aristei & Goldman as experts in the Celexa and Lexapro Marketing and Sales Practices Litigation. Dr. McHenry was also engaged as a research consultant in the case. Dr. McHenry is a research consultant for Baum, Hedlund, Aristei & Goldman.

Concluding remarks

I don’t have many illusions about the trustworthiness of the literature reporting clinical trials, whether pharmaceutical or psychotherapy. But I found this deconstruction article quite troubling. Among the authors’ closing observations are:

The research literature on the effectiveness and safety of antidepressants for children and adolescents is relatively small, and therefore vulnerable to distortion by just one or two badly conducted and/or reported studies. Prescribing rates are high and increasing, so that prescribers who are misinformed by misleading publications risk doing real harm to many children, and wasting valuable health resources.

I recommend that readers go to my supplementary blog and review a very similar case concerning the efficacy and harms of paroxetine and imipramine in the treatment of major depression in adolescence. I also recommend another of my blog posts that summarizes action taken by the US government against both Forest Laboratories and GlaxoSmithKline for promotion of misleading claims about the efficacy and safety of antidepressants for children and adolescents.

We should scrutinize studies of the efficacy and safety of antidepressants for children and adolescents, because of the weakness of data from relatively small studies with serious difficulties in their methodology and reporting. But we should certainly not stop there. We should critically examine other studies of psychotherapy and psychosocial interventions.

I previously documented [1, 2] interference by promoters of the lucrative Triple P Parenting program in the implementation of a supposedly independent evaluation, including tampering with plans for data analysis. The promoters then followed up by attempting to block publication of a meta-analysis casting doubt on their claims.

But suppose we are not dealing with the threat of conflicts of interest associated with high financial stakes, as with pharmaceutical companies or a globally promoted psychosocial program. There are still the less obvious conflicts associated with investigator egos and the pressure to produce positive results in order to secure continued funding. We should require scrutiny of protocols, of whether they were faithfully implemented, and of whether the resulting data were analyzed according to a priori plans. To do that, we need unrestricted access to data and the opportunity to reanalyze them from multiple perspectives.

Results of clinical trials should be examined wherever possible in replications and extensions in new settings. But this frequently requires resources that are unlikely to be available.

We are unlikely ever to see anything for clinical trials resembling replication initiatives such as the Open Science Collaboration’s (OSC) Replication Project: Psychology. The OSC depends on mass replications involving either samples of college students or recruitment from the Internet. Most of the studies involved in the OSC did not have direct clinical or public health implications. In contrast, clinical trials usually do, and they require different approaches to ensure the trustworthiness of claimed findings.

Access to the internal documents of Forest Laboratories revealed a deliberate, concerted effort to produce results consistent with the agenda of vested interests, even where prespecified analyses yielded contradictory findings. There was clear intent. But we don’t need to assume an attempt to deceive and defraud in order to insist on the opportunity to re-examine findings that affect patients and public health. As US Vice President Joseph Biden recently declared, securing advances in biomedicine and public health depends on broad and routine sharing and re-analysis of data.

My usual disclaimer: All views that I express are my own and do not necessarily reflect those of PLOS or other institutional affiliations.