Lessons we need to learn from a Lancet Psychiatry study of the association between exercise and mental health

The closer we look at a heavily promoted study of exercise and mental health, the more its flaws become obvious. There is little support for the most basic claims being made – despite the authors marshaling enormous attention to the study.


Apparently, the editor of Lancet Psychiatry and reviewers did not give the study a close look before it was accepted.

The article was used to raise funds for a startup company in which one of the authors was heavily invested. This was disclosed, but doesn’t let the authors off the hook for promoting a seriously flawed study. Nor should the editor of Lancet Psychiatry or reviewers escape criticism, nor the large number of people on Twitter who thoughtlessly retweeted and “liked” a series of tweets from the last author of the study.

This blog post is intended to raise consciousness about bad science appearing in prestigious journals and to allow citizen scientists to evaluate their own critical thinking skills in terms of their ability to detect misleading and exaggerated claims.

1. Sometimes a disclosure of extensive conflicts of interest alerts us not to pay serious attention to a study. Instead, we should question why the study got published in a prestigious peer-reviewed journal when it had such an obvious risk of bias.

2. We need citizen scientists with critical thinking skills to identify such promotional efforts and alert others in their social network that hype and hokum are being delivered.

3. We need to stand up to authors who use scientific papers for commercial purposes, especially when they troll critics.

Read on and you will see what a skeptical look at the paper and its promotion revealed.

  • The study failed to capitalize on the potential of multiple years of data for developing and evaluating statistical models. Bigger is not necessarily better. Combining multiple years of data was wasteful and served only the purpose of providing the authors bragging rights and the impressive, but meaningless p-values that come from overly large samples.
  • The study relied on an unvalidated and inadequate measure of mental health that confounded recurring stressful environmental conditions in the work or home with mental health problems, even where validated measures of mental health would reveal no effects.
  • The study used an odd measure of history of mental health problems that undoubtedly exaggerated past history.
  • The study confused physical activity with (planned) exercise. The authors amplified their confusion by relying on an exceedingly odd strategy for estimating how much participants exercised: the estimate of time spent in a single activity was used in analyses of total time spent exercising. All other physical activity was ignored.
  • The study made a passing acknowledgment of the problems interpreting simple associations as causal, but then went on to selectively sample the existing literature to make the case that interventions to increase exercise improve mental health.
  • Taken together, a skeptical assessment of this article provides another demonstration that disclosure of substantial financial conflicts of interest should alert readers to a high likelihood of a hyped, inaccurately reported study.
  • The article was paywalled, so anyone interested in evaluating the authors’ claims for themselves had to write to the author or have access to the article through a university library site. I am waiting for the authors to reply to my requests for the supplementary tables that are needed to make full sense of their claims. In the meantime, I’ll just complain about authors with significant conflicts of interest heavily promoting studies that they hide behind paywalls.

I welcome you to examine the author’s thread of tweets. Request the actual article from the author if you want to evaluate my claims independently. This can be great material for a master’s or honors class on critical appraisal, whether in psychology or journalism.

[Screenshot of the article’s title]

Let me know if you think that I’ve been too hard on this study.

A thread of tweets from the last author celebrated the success of a well-orchestrated publicity campaign for a new article concerning exercise and mental health in Lancet Psychiatry.

The thread started:

Our new @TheLancetPsych paper was the biggest ever study of exercise and mental health. it caused quite a stir! here’s my guided tour of the paper, highlighting some of our excitements and apprehensions along the way [thread] 1/n

And it ended with a pitch for the author’s do-good startup company:

Where do we go from here? Over @spring_health – our mental health startup in New York City – we’re using these findings to develop personalized exercise plans. We want to help every individual feel better—faster, and understand exactly what each patient needs the most.

I wasn’t long into the thread before my skepticism was stimulated. The fourth tweet in the thread had a figure that didn’t get any comments about how bizarre it was.

The tweet

It looks like those differences mattered. for example, people who exercised for about 45 minutes seemed to have better mental health than people who exercised for less than 30, or more than 60 minutes. — a sweet spot for mental health, perhaps?

[Figure: graphs from the paper]

Apparently, the author did not comment on an anomaly either: housework appears to be better for mental health than the summary score for all exercise, and looks equal to or better than cycling or jogging. But how did housework slip into the category “exercise”?

I began wondering what the authors meant by “exercise” and whether they’d given the definition serious consideration when constructing their key variable from the survey data.

But then that tweet was followed by another one that generated more confusion, with a graph that seemingly contradicted the figures in the last one:

the type of exercise people did seems important too! People doing team sports or cycling had much better mental health than other sports. But even just walking or doing household chores was better than nothing!

Then a self-congratulatory tweet for a promotional job well done.

for sure — these findings are exciting, and it has been overwhelming to see the whole world talking openly and optimistically about mental health, and how we can help people feel better. It isn’t all plain sailing though…

The author’s next tweet revealed, in a screenshot, a serious limitation of the measure of mental health used in the study.

[Screenshot of the tweet describing the mental health variable]

The author acknowledged the potential problem, sort of:

(1b- this might not be the end of the world. In general, most peple have a reasonable understanding of their feelings, and in depressed or anxious patients self-report evaluations are highly correlated with clinician-rated evaluations. But we could be more precise in the future)

“Not the end of the world?” Since when does the author of a paper in the Lancet family of journals so casually brush off a serious methodological issue? A lot of us who have examined the validity of mental health measures would be skeptical of this dismissal of a potentially fatal limitation.

No validation is provided for this measure. On the face of it, respondents could endorse it on the basis of facing recurring stressful situations that had no consequences for their mental health. This reflects the ambiguity of the term “stress” for both laypersons and scientists: “stress” could variously refer to an environmental situation, a subjective experience of stress, or an adaptational outcome. Waitstaff could consider Thursdays, when the chef is off, a recurring weekly stress. Persons with diagnosable persistent depressive disorder would presumably endorse more days than not as being a mental health challenge. But they would mean something entirely different.

The author acknowledged that the association between exercise and mental health might be bidirectional in terms of causality:

[Screenshot of tweet: lots of reasons to believe the relationship goes both ways]

But then he made a strong claim for increased exercise leading to better mental health:

[Screenshot of tweet claiming that exercise increases mental health]

[Actually, as we will see, the evidence from randomized trials of exercise to improve mental health is modest, and it entirely disappears when one limits oneself to the high-quality studies.]

The author then runs off the rails with the claim that the benefits of exercise exceed the benefits of having a greater-than-poverty-level income.

[Screenshot of tweet: why we are so excited]

I could not resist responding.

Stop comparing adjusted correlations obtained under different circumstances as if they demonstrated what would be obtained in RCT. Don’t claim exercising would have more effect than poor people getting more money.

But I didn’t get a reply from the author.

Eventually, the author got around to plugging his startup company.

I didn’t get it. Just how did this heavily promoted study advance the science of such a “personalized recommendation”?

Important things I learned from others’ tweets about the study

I follow @BrendonStubbs on Twitter and you should too. Brendon often makes wise critical observations of studies that most everyone else is uncritically praising. But he also identifies some studies that I otherwise would miss and says very positive things about them.

He started his own thread of tweets about the study on a positive note, but then he identified a couple of critical issues.

First, he took issue with the author’s claim to have identified a tipping point, below which exercise is beneficial and above which exercise could prove detrimental to mental health.

4/some interpretations are troublesome. Most confusing, are the assumptions that higher PA is associated/worsens your MH. Would we say based on cross sect data that those taking most medication/using CBT most were making their MH worse?

A postdoctoral fellow, @joefirth7, seconded that concern:

I agree @BrendonStubbs: idea of high PA worsening mental health limited to observation studies. Except in rare cases of athletes overtraining, there’s no exp evidence of ‘tipping point’ effect. Cross-sect assocs of poor MH <–> higher PA likely due to multiple other factors…

Ouch! But then Brendon follows up with concerns that the measure of physical activity has not been adequately validated, noting that such self-report measures prove to be invalid.

5/ one consideration not well discussed, is self report measures of PA are hopeless (particularly in ppl w mental illness). Even those designed for population level monitoring of PA https://journals.humankinetics.com/doi/abs/10.1123/jpah.6.s1.s5 … it is also not clear if this self report PA measure has been validated?

As we will soon see, the measure used in this study is quite flawed in its conceptualization and in its odd methodology of requiring participants to estimate the time spent exercising for only one activity, chosen from 75 options.

Next, Brendon points to a particular problem with using self-reported physical activity in persons with mental disorder and gives an apt reference:

6/ related to this, self report measures of PA shown to massively overestimate PA in people with mental ill health/illness – so findings of greater PA linked with mental illness likely bi-product of over-reporting of PA in people with mental illness e.g Validity and Value of Self-reported Physical Activity and Accelerometry in People With Schizophrenia: A Population-Scale Study of the UK Biobank [ https://academic.oup.com/schizophreniabulletin/advance-article/doi/10.1093/schbul/sbx149/4563831 ]

7/ An additional point he makes: anyone working in field of PA will immediately realise there is confusion & misinterpretation about the concepts of exercise & PA in the paper, which is distracting. People have been trying to prevent this happening over 30 years

Again, Brendon provides a spot-on citation clarifying the distinction between physical activity and exercise: Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research

The mysterious, pseudonymous Zad Chow @dailyzad called attention to a blog post they had just uploaded. Let’s take a look at some of its key points.

Lessons from a blog post: Exercise, Mental Health, and Big Data

Zad Chow is quite balanced in dispensing praise and criticism of the Lancet Psychiatry paper. They noted the ambiguity of any causality in a cross-sectional correlation and then investigated the literature on their own.

So what does that evidence say? Meta-analyses of randomized trials seem to find that exercise has large and positive treatment effects on mental health outcomes such as depression.

Study | Randomized trials | Effect size (SMD) with 95% CI
Schuch et al. 2016 | 25 | 1.11 (0.79 to 1.43)
Gordon et al. 2018 | 33 | 0.66 (0.48 to 0.83)
Krogh et al. 2017 | 35 | −0.66 (−0.86 to −0.46)

But, when you only pool high-quality studies, the effects become tiny.

“Restricting this analysis to the four trials that seemed less affected of bias, the effect vanished into −0.11 SMD (−0.41 to 0.18; p=0.45; GRADE: low quality).” – Krogh et al. 2017

Hmm, would you have guessed this from the Lancet Psychiatry author’s thread of tweets?

Zad Chow showed the hype and untrustworthiness of the press coverage in prestigious media with a sampling of screenshots.

[Zad Chow’s screenshots of the press coverage]

I personally checked and don’t see that Zad Chow’s selection of press coverage was skewed. Coverage in the media all seemed to be saying the same thing. I found the distortion continued with uncritical parroting – a.k.a. churnalism – of the claims of the Lancet Psychiatry authors in the Wall Street Journal.

The WSJ repeated a number of the author’s claims that I’ve already thrown into question and added a curiosity:

In a secondary analysis, the researchers found that yoga and tai chi—grouped into a category called recreational sports in the original analysis—had a 22.9% reduction in poor mental-health days. (Recreational sports included everything from yoga to golf to horseback riding.)

And NHS England totally got it wrong:

[Screenshot of NHS coverage getting it wrong]

So, we learned that the broad category “recreational sports” covers yoga and tai chi, as well as golf and horseback riding. This raises serious questions about the lumping and splitting of categories of physical activity in the analyses that are being reported.

I needed to access the article in order to uncover some important things 

I’m grateful for the clues that I got from Twitter, especially from Zad Chow, which I used in examining the article itself.

I got hung up on the title proclaiming that the study involved 1.2 million individuals. When I checked the article, I saw that the authors used three waves of publicly available data to get that number. Having that many participants gave them no real advantage except for bragging rights and the likelihood that modest associations could be expressed in spectacular p-values, like p < 2.2 × 10⁻¹⁶. I don’t understand why the authors didn’t conduct analyses with one wave and cross-validate the results in another.
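To make the arithmetic concrete, here is a minimal sketch of my own (not the authors’ code) showing how a trivially small correlation becomes “significant” once the sample approaches 1.2 million; the r value is hypothetical, chosen only for illustration:

```python
# Minimal sketch: with n = 1,200,000, even r = 0.01 (a negligible association,
# explaining 0.01% of the variance) yields an astronomically small p-value.
import numpy as np
from scipy import stats

n, r = 1_200_000, 0.01
t = r * np.sqrt((n - 2) / (1 - r**2))   # t statistic for a Pearson correlation
p = 2 * stats.t.sf(abs(t), df=n - 2)    # two-tailed p-value

print(f"t = {t:.1f}, p = {p:.1e}")      # t ≈ 11.0, p far smaller than 2.2e-16
```

With samples this size, a spectacular p-value says nothing about whether the association is large enough to matter.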

The obligatory Research in Context box made it sound like a systematic search of the literature had been undertaken. Maybe, but the authors were highly selective in what they chose to comment upon, as seen in its contradiction by Zad Chow’s brief review. The authors would have us believe that the existing literature is quite limited and inconclusive, supporting the need for a study like theirs.

[Screenshot of the article’s Research in Context box]

Caveat lector: a strong confirmation bias likely lies ahead in this article.

Questions accumulated quickly as to the appropriateness of the items available from a national survey undoubtedly constructed for other purposes. Certainly, these items would not have been selected if the original investigators had been interested in the research question at the center of this article.

Participants self-reported a previous diagnosis of depression or depressive episode on the basis of the following question: “Has a doctor, nurse, or other health professional EVER told you that you have a depressive disorder, including depression, major depression, dysthymia, or minor depression?”

Our own work has cast serious doubt on the correspondence between reports of a history of depression in response to a brief question embedded in a larger survey and the results of a structured interview in which respondents’ answers can be probed. We found that answers to such questions were more related to current distress than to actual past diagnoses and treatment of depression. However, the survey question used in the Lancet Psychiatry study added further ambiguity and invalidity with the phrase “or minor depression.” I am not sure under what circumstances a health care professional would disclose a diagnosis of “minor depression” to a patient, but I doubt it would be in a context in which the professional felt treatment was needed.

Despite the skepticism that I was developing about the usefulness of the survey data, I was unprepared for the assessment of “exercise.”

“Other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?” Participants who answered yes to this question were then asked: “What type of physical activity or exercise did you spend the most time doing during the past month?” A total of 75 types of exercise were represented in the sample, which were grouped manually into eight exercise categories to balance a diverse representation of exercises with the need for meaningful cell sizes (appendix).

Participants indicated the number of times per week or month that they did this exercise and the number of minutes or hours that they usually spend exercising in this way each time.

I had already been tipped off by the discussion on Twitter that there would be a thorough confusion of planned exercise and mere physical activity. But now that was compounded. Why was physical activity during employment excluded? What if participants were engaged in a number of different physical activities, like both jogging and bicycling? If so, the survey obtained data for only one of these activities, with the other excluded, and the choice of which one the participant identified as the one to be counted could have been quite arbitrary.

Anyone who has ever constructed surveys would be alert to the problems posed by participants’ awareness that saying “yes” to exercising would require contemplating 75 different options and arbitrarily choosing one of them for a further question about how much time they engaged in this activity. Unless participants were strongly motivated, there was an incentive to simply say no, they didn’t exercise.

I suppose I could go on, but it was my judgment that any validity to what the authors were claiming had been ruled out. As someone once said on an NIH grant review panel: there are no vital signs left, let’s move on to the next item.

But let’s refocus just a bit on the overall intention of these authors. They want to use a large data set to make statements about the association between physical activity and a measure of mental health. They have used matching and statistical controls to equate participants. But that strategy effectively eliminates consideration of crucial contextual variables. Persons’ preferences and opportunities to exercise are powerfully shaped by their personal and social circumstances, including finances and competing demands on their time. Said differently, people are embedded in contexts that a lot of statistical maneuvering has sought to eliminate.

To suggest a small number of the many complexities: how much physical activity participants get in their employment may be an important determinant of their choices for additional activity, as well as how much time is left outside of work. If work typically involves a lot of physical exertion, people may simply be left too tired for additional planned physical activity, a.k.a. exercise, and their physical health may require it less. Environments differ greatly in terms of the opportunities for, and the safety of, engaging in various kinds of physical activities. Team sports require other people being available. Etc., etc.

What I learned from the editorial accompanying the Lancet Psychiatry article

The brief editorial accompanying the article aroused my curiosity as to whether someone assigned to read and comment on this article would catch things that the editor and reviewers apparently missed.

Editorial commentators are chosen to praise, not to bury articles. There are strong social pressures to say nice things. However, this editorial leaked a number of serious concerns.

First

In presenting mental health as a workable, unified concept, there is a presupposition that it is possible and appropriate to combine all the various mental disorders as a single entity in pursuing this research. It is difficult to see the justification for this approach when these conditions differ greatly in their underlying causes, clinical presentation, and treatment. Dementia, substance misuse, and personality disorder, for example, are considered as distinct entities for research and clinical purposes; capturing them for study under the combined banner of mental health might not add a great deal to our understanding.

The problem here of categorisation is somewhat compounded by the repeated uncomfortable interchangeability between mental health and depression, as if these concepts were functionally equivalent, or as if other mental disorders were somewhat peripheral.

Then:

A final caution pertains to how studies approach a definition of exercise. In the current study, we see the inclusion of activities such as childcare, housework, lawn-mowing, carpentry, fishing, and yoga as forms of exercise. In other studies, these activities would be excluded for not fulfilling the definition of exercise as offered by the American College of Sports Medicine: “planned, structured and repetitive bodily movement done to improve or maintain one or more components of physical fitness.” 11 The study by Chekroud and colleagues, in its all-encompassing approach, might more accurately be considered a study in physical activity rather than exercise.

The authors were listening for a theme song with which they could promote their startup company in a very noisy data set. They thought they had a hit. I think they had noise.

The authors’ extraordinary disclosure of interests (see below this blog post) should have precluded publication of this seriously flawed piece of work, either simply by reason of the high likelihood of bias, or because it should have prompted the editor and reviewers to look more carefully at the serious flaws hiding in plain sight.

Postscript: Send in the trolls.

On Twitter, Adam Chekroud announced he felt no need to respond to critics. Instead, he retweeted and “liked” trolling comments directed at critics from the Twitter accounts of his brother, his mother, and even the official Twitter account of a local fried chicken joint, @chickenlodge, which offered free food for retweets and suggested including Adam Chekroud’s Twitter handle if you wanted to be noticed.

[Screenshot of the Chicken Lodge tweet]

Really, Adam, if you can’t stand the heat, don’t go near  where they are frying chicken.

The Declaration of Interests from the article.

[Screenshots of the article’s declaration of interests]

 

Using F1000 “peer review” to promote politics over evidence about delivering psychosocial care to cancer patients

The F1000 platform allowed authors and the reviewers whom they nominated to collaborate in crafting more of their special interest advocacy that they have widely disseminated elsewhere. Nothing original in this article and certainly not best evidence!

 


A newly posted article on the F1000 website raises questions about what the website claims is a “peer-reviewed” open research platform.

Infomercial? The F1000 platform allowed authors and the reviewers whom they nominated to collaborate in crafting more of their special interest advocacy that they have widely disseminated elsewhere. Nothing original in this article and certainly not best evidence!

I challenge the authors and the reviewers they picked to identify something said in the F1000 article that they have not said numerous times before either alone or in papers co-authored by some combination of authors and the reviewers they picked for this paper.

F1000 makes the attractive and misleading claim that versions of articles that are posted on its website reflect the response to reviewers.

Readers should beware of uncritically accepting articles on the F1000 website as having been peer-reviewed in any conventional sense of the term.

Will other special interests groups exploit this opportunity to brand their claims as “peer-reviewed” without the risk of having to tone down their claims in peer review? Is this already happening?

In the case of this article, reviewers were all chosen by the authors and have a history of co-authoring papers with the authors of the target paper in active advocacy of a shared political perspective, one that is contrary to available evidence.

Cynically, future authors might be motivated to divide their team, with some remaining authors and others dropping off to be nominated as reviewers. The reviewers could then suggest content that it had already been agreed would be included, but that was left out so it could be suggested during the review process.

F1000

F1000Research bills itself as

An Open Research publishing platform for life scientists, offering immediate publication of articles and other research outputs without editorial bias. All articles benefit from transparent refereeing and the inclusion of all source data.

Material posted on this website is labeled as having received rapid peer-review:

Articles are published rapidly as soon as they are accepted, after passing an in-house quality check. Peer review by invited experts, suggested by the authors, takes place openly after publication.

My recent Google Scholar alert called attention to an article posted on F1000:

Advancing psychosocial care in cancer patients [version 1; referees: 3 approved]

 Who were the reviewers?

[Screenshot: open peer review of “Advancing psychosocial care in cancer patients”]

Google the names of the authors and reviewers. You will discover a pattern of co-authorship; leadership positions in the International Psycho-Oncology Society, a group promoting the mandating of specialized mental health services for cancer patients; and lots of jointly and separately authored articles making a pitch for increased involvement of mental health professionals in routine cancer care. This article adds almost nothing to what is already multiply available elsewhere in highly redundant publications.

Given a choice of reviewers, these authors would be unlikely to nominate me. Nonetheless, here is my review of the article.

 As I might do in a review of a manuscript, I’m not providing citations for these comments, but support can readily be found by a search of blog posts at my website @CoyneoftheRealm.com and Google Scholar search of my publications. I welcome queries from anybody seeking documentation of these points below.

 Fighting Spirit

The notion that cancer patients having a fighting spirit improves survival is popular in the lay press and in promotions of the power of the mind over cancer, but it has been thoroughly discredited.

Early on, the article identifies fighting spirit as an adaptive coping style. In actuality, fighting spirit was initially thought to predict mortality in a small methodologically flawed study. But that is no longer claimed.

Even one of the authors of the original study, Maggie Watson, expressed relief when her own larger, better designed study failed to confirm the impression that a fighting spirit extended life after a diagnosis of cancer. Why? Dr. Watson was concerned that the concept was being abused in blaming cancer patients who were dying, as if their deaths were due to a personal deficiency of not having enough fighting spirit.

Fighting spirit is rather useless as a measure of psychological adaptation. It confounds severity of cancer and related dysfunction with efforts to cope with cancer.

Distress as the sixth vital sign for cancer patients

Beware of a marketing slogan posing as an empirical statement. Its emptiness is similar to that of “Pepsi is the one.” Can you imagine anyone conducting a serious study in which they conclude “Pepsi is not the one”?

Once again in this article, a vacuous marketing slogan is presented in impressive but pseudo-medical terms. Distress cannot be a vital sign in the conventional sense. The vital signs are objective measurements that do not depend on patient self-report: body temperature, pulse rate, and respiration rate (rate of breathing). (Blood pressure is not considered a vital sign, but is often measured along with the vital signs.)

Pain was declared the fifth vital sign, with physicians mandated by guidelines to provide routine self-report screening of patients, regardless of their reasons for a visit. Pain being the fifth vital sign seems to have been the inspiration for declaring distress the sixth vital sign for cancer patients. However, policymakers declaring pain the fifth vital sign did not result in improved patient levels of pain. Their subsequently making intervention mandatory for any report of pain led to a rise in unnecessary back and knee surgery, with a substantial rise in associated morbidity and loss of function. The next shift, to prescription of opioids that were claimed not to be addictive, was the beginning of the current epidemic of addiction to prescription opioids. Making pain the fifth vital sign killed a lot of patients and turned others into addicts craving drugs on the street after they lost their prescriptions for the opioids that addicted them.

[Slides: pain as the 5th vital sign]

 Cancer as a mental health issue

There is a lack of evidence that cancer carries a greater risk of psychiatric disorder than other chronic and catastrophic illnesses. However, the myth that there is something unique or unusual about cancer’s threat to mental health is commonly cited by mental health professional advocacy groups to justify increased resources for their specialized services.

The article provides an inflated estimate of psychiatric morbidity by counting adjustment disorders as psychiatric disorders. Essentially, a cancer patient who seeks mental health interventions for distress qualifies by virtue of help seeking being defined as impairment.

The conceptual and empirical muddle of “distress” in cancer patients

The article repeats the standard sloganeering definition of distress that the authors and reviewers have circulated elsewhere.

It has been very broadly defined as “a multifactorial, unpleasant, emotional experience of a psychological (cognitive, behavioural, emotional), social and/or spiritual nature that may interfere with the ability to cope effectively with cancer, its physical symptoms and its treatment and that extends along a continuum, ranging from common normal feelings of vulnerability, sadness and fears to problems that can become disabling, such as depression, anxiety, panic, social isolation and existential and spiritual crisis” [5]

[You might try googling this. I’m sure you’ll discover an amazing number of repetitions in similar articles advocating increasing psychosocial services for cancer patients organized around this broad definition.]

Distress is so broadly defined and all-encompassing that there can be no meaningful independent validation of distress measures except by other measures of distress, not by conventional measures of adaptation or mental health. I have discussed that in a recent blog post.

If we restrict “distress” to the more conventional meaning of stress or negative affect, we find that any elevation in distress (usually 35% or so) associated with the diagnosis of cancer tends to follow a natural trajectory of decline without formal intervention. Elevations in distress for most cancer patients are resolved within 3 to 6 months without intervention. A residual 9 to 11% of cancer patients having elevated distress is likely attributable to pre-existing psychiatric disorder.

Routine screening for distress

The slogan “distress is the sixth vital sign” is used to justify mandatory routine screening of cancer patients for distress. In the United States, surgeons cannot close their electronic medical records for a patient and go on to the next patient without recording whether they had screened patients for distress, and if the patient reports distress, what intervention has been provided. Clinicians simply informally asking patients if they are distressed and responding to a “yes” by providing the patient with an antidepressant without further follow up allows surgeons to close the medical records.

As I have done before, I challenge advocates of routine screening of cancer patients for distress to produce evidence that simply introducing routine screening without additional resources leads to better patient outcomes.

Routine screening for distress as uncovering unmet needs among cancer patients

Studies in the Netherlands suggest that there is not a significant increase in need for services from mental health or allied health professionals associated with a diagnosis of cancer. There is some disruption of such services that patients were receiving before diagnosis. It doesn’t take screening and a discussion to suggest to patients that they resume those services at some point if they wish. There is also some increased need for physical therapy and nutritional counseling.

If patients are simply asked whether they want a discussion of the available services (in Dutch: “Zou u met een deskundige willen praten over uw problemen?”, that is, “Would you like to talk with an expert about your problems?”), many patients will decline.

Much of the demand for supportive services like counseling and support groups, especially among breast cancer patients, is not from among the most distressed patients. One of the problems with clinical trials of psychosocial interventions is that most of the patients who seek enrollment are not distressed, unless they are prescreened. This poses a dilemma: if we require elevated distress on a screening instrument, we end up rationing services and excluding many of the patients who would otherwise be receiving them.

I welcome clarification from F1000 about just what they offer over other preprint repositories. When one downloads a preprint from some other repositories, it clearly displays “not yet peer-reviewed.” F1000 carries the advantage of the label “peer-reviewed,” but the label does not seem to be hard earned.

Notes

Slides are from two recent talks at the Dutch International Congress on Insurance Medicine, Thursday, November 9, 2017, Almere, Netherlands:

Will primary care be automated screening and procedures or talking to patients and problem-solving? Invited presentation

and

Why you should not routinely screen your patients for depression and what you should do instead. Plenary Presentation

        

                                  

 

 

 

Is risk of Alzheimer’s Disease reduced by taking a more positive attitude toward aging?

Unwarranted claims that “modifiable” negative beliefs cause Alzheimer’s disease lead to blaming persons who develop Alzheimer’s disease for not having been more positive.

Lesson: A source’s impressive credentials are no substitute for independent critical appraisal of what sounds like junk science and is.

More lessons on how to protect yourself from dodgy claims in press releases of prestigious universities promoting their research.

If you judge the credibility of health-related information based on the credentials of the source, this article  is a clear winner:

Levy BR, Ferrucci L, Zonderman AB, Slade MD, Troncoso J, Resnick SM. A Culture–Brain Link: Negative Age Stereotypes Predict Alzheimer’s Disease Biomarkers. Psychology and Aging. Dec 7 , 2015, No Pagination Specified. http://dx.doi.org/10.1037/pag0000062


As noted in the press release from Yale University, two of the authors are from Yale School of Medicine, another is a neurologist at Johns Hopkins School of Medicine, and the remaining three authors are from the US National Institute on Aging (NIA), including NIA’s Scientific Director.

The press release Negative beliefs about aging predict Alzheimer’s disease in Yale-led study declared:

“Newly published research led by the Yale School of Public Health demonstrates that individuals who hold negative beliefs about aging are more likely to have brain changes associated with Alzheimer’s disease.

“The study suggests that combatting negative beliefs about aging, such as elderly people are decrepit, could potentially offer a way to reduce the rapidly rising rate of Alzheimer’s disease, a devastating neurodegenerative disorder that causes dementia in more than 5 million Americans.

The press release posited a novel mechanism:

“We believe it is the stress generated by the negative beliefs about aging that individuals sometimes internalize from society that can result in pathological brain changes,” said Levy. “Although the findings are concerning, it is encouraging to realize that these negative beliefs about aging can be mitigated and positive beliefs about aging can be reinforced, so that the adverse impact is not inevitable.”

A Google search reveals over 40 stories about the study in the media. Provocative titles of the media coverage suggest a children’s game of telephone or Chinese whispers in which distortions accumulate with each retelling.

Negative beliefs about aging tied to Alzheimer’s (Waltonian)

Distain for the elderly could increase your risk of Alzheimer’s (FinancialSpots)

Lack of respect for elderly may be fueling Alzheimer’s epidemic (Telegraph)

Negative thoughts speed up onset of Alzheimer’s disease (Tech Times)

Karma bites back: Hating on the elderly may put you at risk of Alzheimer’s (LA Times)

How you feel about your grandfather may affect your brain health later in life (Men’s Health News)

Young people pessimistic about aging more likely to develop Alzheimer’s later on (Health.com)

Looking forward to old age can save you from Alzheimer’s (Canonplace News)

If you don’t like old people, you are at higher risk of Alzheimer’s, study says (RedOrbit)

If you think elderly people are icky, you’re more likely to get Alzheimer’s (HealthLine)

In defense of the authors of this article as well as journalists, it is likely that editors added the provocative titles without obtaining approval of the authors or even the journalists writing the articles. So, let’s suspend judgment and write off sometimes absurd titles to editors’ need to establish they are offering distinctive coverage, when they are not necessarily doing so. That’s a lesson for the future: if we’re going to criticize media coverage, better focus on the content of the coverage, not the titles.

However, a number of these stories have direct quotes from the study’s first author. Unless the media coverage is misattributing direct quotes to her, she must have been making herself available to the media.

Was the article such an important breakthrough offering new ways in which consumers could take control of their risk of Alzheimer’s by changing beliefs about aging?

No, not at all. In the following analysis, I’ll show that judging the credibility of claims based on the credentials of the sources can be seriously misleading.

What is troubling about this article and its well-organized publicity effort is that information is being disseminated that is misleading and potentially harmful, with the prestige of Yale and NIA attached.

Before we go any further, you can take your own look at a copy of the article in the American Psychological Association journal Psychology and Aging here, the Yale University press release here, and a fascinating post-publication peer review at PubPeer that I initiated as peer 1.

Ask yourself: if you encountered coverage of this article in the media, would you have been skeptical? If so what were the clues?

Spoiler ahead: The article is yet another example of trusted authorities exploiting entrenched cultural beliefs that the mind-body connection can be harnessed in some mysterious way to combat or prevent physical illness. As Ann Harrington details in her wonderful book, The Cure Within, this psychosomatic hypothesis has a long and checkered history, and gets continually reinvented and misapplied.

We see an example of this in claims that attitude can conquer cancer. What’s the harm of such illusions? If people can be led to believe they have such control, they are set up for blame from themselves and from those around them when they fail to fend off and control the outcome of disease by sheer mental power.

The myth of “fighting spirit” overcoming cancer has survived despite the accumulation of excellent contradictory evidence. Cancer patients are vulnerable to blaming themselves and to being blamed by loved ones when they do not “win” the fight against cancer. They are also subject to unfair exhortations to fight harder as their health situation deteriorates.

[A composite of headlines from the satirical Onion]

 What I saw when I skimmed the press release and the article

  • The first alarm went off when I saw that causal claims were being made from a modest sized correlational study. This should set off anyone’s alarms.
  • The press release and the discussion section of the article refer to this as a “first ever” study. One does not seek nor expect to find robust “first ever” discoveries in such a small data set.
  • The authors do not provide evidence that their key measure of “negative stereotypes” is a valid measure of either stereotyping or likelihood of experiencing stress. They don’t even show it is related to concurrent reports of stress.
  • Like a lot of measures with a negative tone to their items, this one is affected by what Paul Meehl calls the crud factor. Whatever is being measured in this study cannot be distinguished from a full range of confounds that are not even assessed in this study.
  • The mechanism by which effects of this self-report measure somehow get manifested in changes in the brain lacks evidence and is highly dubious.
  • There was no presentation of actual data or basic statistics. Instead, there were only multivariate statistics that require at least some access to basic statistics for independent evaluation.
  • The authors resorted to cheap statistical strategies to fool readers with their confirmation bias: reliance on one-tailed rather than two-tailed tests of significance; use of a discredited backward elimination method for choosing control variables; and exploring too many control/covariate variables, given their modest sample size.
  • The analyses that are reported do not accurately depict what is in the data set, nor generalize to other data sets.

The article

The authors develop their case that stress is a significant cause of Alzheimer’s disease with reference to some largely irrelevant studies by others, but depend on a preponderance of studies that they themselves have done with the same dubious small samples and dubious statistical techniques. Whether you do a casual search with Google scholar or a more systematic review of the literature, you won’t find stress processes of the kind the authors invoke among the usual explanations of the development of the disease.

Basically, the authors are arguing that if you hold views of aging like “Old people are absent-minded” or “Old people cannot concentrate well,” you will experience more stress as you age, and this will accelerate development of Alzheimer’s disease. They then go on to argue that because these attitudes are modifiable, you can take control of your risk for Alzheimer’s by adopting a more positive view of aging and aging people.

The authors used their measure of negative aging stereotypes in other studies, but do not provide the usual evidence of convergent and discriminant validity needed to establish that the measure assesses what is intended. Basically, we should expect authors to show that a measure they have developed is related to existing measures in ways that one would expect (convergent validity), but not related to existing measures with which it should have no association (discriminant validity).

Psychology has a long history of researchers claiming that their “new” self-report measures containing negatively toned items assess distinct concepts, despite high correlations with other measures of negative emotion as well as lots of confounds. I poked fun at this unproductive tradition in a presentation, Negative emotions and health: why do we keep stalking bears, when we only find scat in the woods?

The article reported two studies. The first tested whether participants holding more negative age stereotypes would have significantly greater loss of hippocampal volume over time. The study involved 52 individuals selected from a larger cohort enrolled in the brain-neuroimaging program of the Baltimore Longitudinal Study of Aging.

Readers are given none of the basic statistics that would be needed to interpret the complex multivariate analyses. Ideally, we would be given an opportunity to see how the independent variable, negative age stereotypes, is related to other data available on the subjects, and so we could get some sense if we are starting with some basic, meaningful associations.

Instead the authors present the association between negative age stereotyping and hippocampal volume only in the presence of multiple control variables:

Covariates consisted of demographics (i.e., age, sex, and education) and health at time of baseline-age-stereotype assessment, (number of chronic conditions on the basis of medical records; well-being as measured by a subset of the Chicago Attitude Inventory); self-rated health, neuroticism, and cognitive performance, measured by the Benton Visual Retention Test (BVRT; Benton, 1974).

Readers cannot tell why these variables and not others were chosen. Adding or dropping a few variables could produce radically different results. But there are just too many variables being considered. With only 52 research participants, spurious findings that do not generalize to other samples are highly likely.

I was astonished when the authors announced that they were relying on one-tailed statistical tests. This is widely condemned as unnecessary and misleading.

Basically, every time the authors report a significance level in this article, you need to double the number to get what is obtained with a more conventional two-tailed test. So, if they proudly declare that results are significant p = .046, then the results are actually (non)significant, p= .092. I know, we should not make such a fuss about significance levels, but journals do. We’re being set up to be persuaded the results are significant, when they are not by conventional standards.

So, the authors’ accumulating sins against proper statistical techniques and transparent reporting: no presentation of basic associations; reporting one-tailed tests; use of multivariate statistics inappropriate for a sample that is so small. Now let’s add another one: in their multivariate regressions, the authors relied on a potentially deceptive backward elimination procedure:

Backward elimination, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) that improves the model the most by being deleted, and repeating this process until no further improvement is possible.

The authors assembled their candidate control/covariate variables and used a procedure that checks them statistically and drops some from consideration, based on whether they fail to add to the significance of the overall equation. This procedure is condemned because the variables that are retained in the equation capitalize on chance. Particular variables that could be theoretically relevant are eliminated simply because they fail to add anything statistically in the context of the other variables being considered. In the context of a different set of variables, these same discarded variables would have been retained.

The final regression equation had fewer control/covariates than when the authors started. Statistical significance will be calculated on the basis of the small number of variables remaining, not the number that were picked over, and so the results will artificially appear stronger. Again, this is potentially quite misleading to the unwary reader.
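To see why this capitalizes on chance, here is a minimal simulation of my own (not the authors’ code), assuming ordinary least squares and a lax p < .10 retention rule: with 52 participants and a set of candidate covariates that are pure noise, backward elimination will often leave a model whose surviving covariates look respectable.

```python
# Minimal sketch: backward elimination applied to pure noise. Nothing here is
# related to anything, yet the covariates that survive the procedure tend to
# come out with flattering p-values in a sample of only 52.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2015)
n, k = 52, 10                          # sample size and number of candidate covariates
X = rng.normal(size=(n, k))            # noise "covariates"
y = rng.normal(size=n)                 # noise "outcome"

cols = list(range(k))
while cols:
    fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    pvals = fit.pvalues[1:]            # covariate p-values (skip the intercept)
    worst = int(np.argmax(pvals))
    if pvals[worst] < 0.10:            # lax retention criterion: stop deleting
        break
    cols.pop(worst)                    # drop the weakest candidate and refit

print("covariates retained from pure noise:", cols)
if cols:
    print("their p-values:", np.round(pvals, 3))
```

The p-values reported for the survivors never account for the covariates that were picked over and discarded along the way, which is exactly why the retained model looks stronger than it should.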

The authors nonetheless concluded:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had a significantly steeper decline in hippocampal volume

The second study:

examined whether participants holding more negative age stereotypes would have significantly greater accumulation of amyloid plaques and neurofibrillary tangles.

The outcome was a composite-plaques-and-tangles score and the predictor was the same negative age stereotypes measure from the first study. These measurements were obtained from 74 research participants upon death and autopsy. The same covariates were used in stepwise regression with backward elimination. Once again, the statistical test was one tailed.

Results were:

As predicted, participants holding more-negative age stereotypes, compared to those holding more-positive age stereotypes, had significantly higher composite-plaques-and-tangles scores, t(1,59) = 1.71 p = .046, d = 0.45, adjusting for age, sex, education, self-rated health, well-being, and number of chronic conditions.

Aha! Now we see why the authors committed themselves to a one-tailed test. With a conventional two-tailed test, these results would not be significant. Given a prevailing confirmation bias, aversion to null findings, and obsession with significance levels, this article probably would not have been published without the one-tailed test.
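A quick check with scipy, a sketch assuming only the t(1,59) = 1.71 reported for the second study, shows how much work the one-tailed choice is doing:

```python
# Minimal sketch: the reported t of 1.71 with 59 degrees of freedom is
# "significant" only one-tailed; the conventional two-tailed value is double.
from scipy import stats

t, df = 1.71, 59
p_one_tailed = stats.t.sf(t, df)            # ≈ 0.046, the value the authors report
p_two_tailed = 2 * stats.t.sf(abs(t), df)   # ≈ 0.092, not significant

print(round(p_one_tailed, 3), round(p_two_tailed, 3))
```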

The authors’ stirring overall conclusion from the two studies:

By expanding the boundaries of known environmental influences on amyloid plaques, neurofibrillary tangles, and hippocampal volume, our results suggest a new pathway to identifying mechanisms and potential interventions related to Alzheimer’s disease

PubPeer discussion of this paper [https://pubpeer.com/publications/16E68DE9879757585EDD8719338DCD]

Comments accumulated for a couple of days on PubPeer after I posted some concerns about the first study. All of the comments were quite smart; some directly validated points that I had been thinking about, but others took the discussion in new directions, either statistically or because the commentators knew more about neuroscience.

Using a mechanism available at PubPeer, I sent emails to the first author of the paper, the statistician, and one of the NIA personnel inviting them to make comments also. None have responded so far.

Tom Johnstone, a commentator who exercised the option of identifying himself, noted the reliance on inferential statistics in the absence of reporting basic relationships. He also noted that the criterion used to drop covariates was lax. Apparently familiar with neuroscience, he expressed doubts that the results had any clinical significance or relevance to the functioning of the research participants.

Another commentator complained of the small sample size, the use of one-tailed statistical tests without justification, the “convoluted list of covariates,” and the “taboo” strategy for selecting covariates to be retained in the regression equation. This commentator also noted that the authors had examined the effect of outliers, conducting analyses both with and without the inclusion of the most extreme case. While it didn’t affect the overall results, exclusion dramatically changed the significance level, highlighting the susceptibility of such a small sample to chance variation or sampling error.

Who gets the blame for misleading claims in this article?

There’s a lot of blame to go around. By exaggerating the size and significance of any effects, the first author increases the chance of publication and also of further funding to pursue what is seen as a “tantalizing” association. But it’s the job of editors and peer reviewers to protect the readership from such exaggerations and maybe to protect the author from herself. They failed, maybe because exaggerated findings are consistent with the journal’s agenda of increasing citations by publishing newsworthy rather than trustworthy findings. The study statistician, Martin Slade, obviously knew that misleading, less than optimal statistics were used; why didn’t he object? Finally, I think the NIA staff, particularly Luigi Ferrucci, the Scientific Director of NIA, should be singled out for the irresponsibility of attaching their names to such misleading claims. Why did they do so? Did they not read the manuscript? I will regularly present instances of NIH staff endorsing dubious claims, such as here. The mind-over-disease, psychosomatic hypothesis gets a lot of support not warranted by the evidence. Perhaps NIH officials in general see this as a way of attracting research monies from Congress. Regardless, I think NIH officials have the responsibility to see that consumers are not misled by junk science.

This article at least provided the opportunity for an exercise that should raise skepticism and convince consumers at all levels – other researchers, clinicians, policymakers, those who suffer from Alzheimer’s disease, and those who care for them – that we just cannot sit back and let trusted sources do our thinking for us.

 

Stalking a Cheshire cat: Figuring out what happened in a psychotherapy intervention trial

John Ioannidis, the “scourge of sloppy science,” has documented again and again that the safeguards being introduced into the biomedical literature against untrustworthy findings are usually ineffective. In Ioannidis’ most recent report, his group:

…Assessed the current status of reproducibility and transparency addressing these indicators in a random sample of 441 biomedical journal articles published in 2000–2014. Only one study provided a full protocol and none made all raw data directly available.

As reported in a recent post in Retraction Watch, Did a clinical trial proceed as planned? New project finds out, Psychiatrist Ben Goldacre has a new project with

…The relatively straightforward task of comparing reported outcomes from clinical trials to what the researchers said they planned to measure before the trial began. And what they’ve found is a bit sad, albeit not entirely surprising.

Ben Goldacre specifically excludes psychotherapy studies from this project. But there are reasons to believe that the psychotherapy literature is less trustworthy than the biomedical literature because psychotherapy trials are less frequently registered, adherence to CONSORT reporting standards is less strict, and investigators more routinely refuse to share data when requested.

Untrustworthiness of information provided in the psychotherapy literature can have important consequences for patients, clinical practice, and public health and social policy.

The study that I will review switched outcomes in both of its published reports, had a poorly chosen comparison control group and flawed analyses, and its protocol was registered after the study started. Yet the study will likely provide data for decision-making about what to do with primary care patients with a few unexplained medical symptoms. The recommendation of the investigators is to deny these patients medical tests and workups and instead provide them with an unvalidated psychiatric diagnosis and a treatment that encourages them to believe that their concerns are irrational.

In this post I will attempt to track what should have been an orderly progression from (a) registration of a psychotherapy trial to (b) publishing of its protocol to (c) reporting of the trial’s results in the peer-reviewed literature. This exercise will show just how difficult it is to make sense of studies in a poorly documented psychological intervention literature.

  • I find lots of surprises, including outcome switching in both reports of the trial.
  • The second article reporting results of the trial does not acknowledge registration, minimally cites the first report of outcomes, and hides important shortcomings of the trial. But the authors inadvertently expose crucial new shortcomings without comment.
  • Detecting important inconsistencies between registration and protocols and reports in the journals requires an almost forensic attention to detail to assess the trustworthiness of what is reported. Some problems hide in plain sight if one takes the time to look, but others require a certain clinical connoisseurship, a well-developed appreciation of the subtle means by which investigators spin outcomes to get novel and significant findings.
  • Outcome switching and inconsistent cross-referencing of published reports of a clinical trial will bedevil any effort to integrate the results of the trial into the larger literature in a systematic review or meta-analysis.
  • Two journals – Psychosomatic Medicine and particularly Journal of Psychosomatic Research – failed to provide adequate peer review of articles based on this trial, in terms of trial registration, outcome switching, and allowing multiple reports of what could be construed as primary outcomes from the same trial into the literature.
  • Despite serious problems in their interpretability, results of this study are likely to be cited and influence far-reaching public policies.
  • The generalizability of results of my exercise is unclear, but my findings encourage skepticism more generally about published reports of results of psychotherapy interventions. It is distressing that more alarm bells have not been sounded about the reports of this particular study.

The publicly accessible registration of the trial is:

Cognitive Behaviour Therapy for Abridged Somatization Disorder (Somatic Symptom Index [SSI] 4,6) patients in primary care. Current controlled trials ISRCTN69944771

The publicly accessible full protocol is:

Magallón R, Gili M, Moreno S, Bauzá N, García-Campayo J, Roca M, Ruiz Y, Andrés E. Cognitive-behaviour therapy for patients with Abridged Somatization Disorder (SSI 4, 6) in primary care: a randomized, controlled study. BMC Psychiatry. 2008 Jun 22;8(1):47.

The second report of treatment outcomes in Journal of Psychosomatic Research

Readers can more fully appreciate the problems that I uncovered if I work backwards from the second published report of outcomes from the trial. Published in Journal of Psychosomatic Research, the article is behind a paywall, but readers can write to the corresponding author for a PDF: mgili@uib.es. This person is also the corresponding author for the other paper, in Psychosomatic Medicine, and so readers might want to request both papers.

Gili M, Magallón R, López-Navarro E, Roca M, Moreno S, Bauzá N, García-Campayo J. Health related quality of life changes in somatising patients after individual versus group cognitive behavioural therapy: A randomized clinical trial. Journal of Psychosomatic Research. 2014 Feb 28;76(2):89-93.

The title is misleading in its ambiguity because “somatising” does not refer to an established diagnostic category. In this article, it refers to an unvalidated category that encompasses a considerable proportion of primary care patients, usually those with comorbid anxiety or depression. More about that later.

PubMed, which usually reliably attaches a trial registration number to abstracts, doesn’t do so for this article.

The article does not list the registration, and does not provide the citation when indicating that a trial protocol is available. The only subsequent citations of the trial protocol are ambiguous:

More detailed design settings and study sample of this trial have been described elsewhere [14,16], which explain the effectiveness of CBT reducing number and severity of somatic symptoms.

The above quote is also the sole citation of a key previous paper that presents outcomes for the trial. Only an alert and motivated reader would catch this. No opportunity within the article is provided for comparing and contrasting results of the two papers.

The brief introduction displays a decided puffer fish phenomenon, exaggerating the prevalence and clinical significance of the unvalidated “abridged somatization disorder.” Essentially, the authors invoke the problematic but accepted psychiatric diagnostic categories of somatoform and somatization disorders in claiming validity for a diagnosis with much less stringent criteria. Oddly, the category has different criteria when applied to men and women: men require four unexplained medical symptoms, whereas women require six.

I haven’t previously encountered the term “abridged” in psychiatric diagnosis. Maybe the authors mean “subsyndromal,” as in “subsyndromal depression.” This is a dubious labeling because it suggests that not all characteristics needed for diagnosis are present, some of which may be crucial. Think of it: is a persistent cough subsyndromal lung cancer, or maybe emphysema? References to symptoms being “subsyndromal” often occur in contexts where exaggerated claims about prevalence are being made, with inappropriate, non-evidence-based inferences about treatment of milder cases drawn from the more severe.

A casual reader might infer that the authors are evaluating a psychiatric treatment with wide applicability to as many as 20% of primary care patients. As we will see, the treatment focuses on discouraging any diagnostic medical tests and trying to convince the patient that their concerns are irrational.

The introduction identifies the primary outcome of the trial:

The aim of our study is to assess the efficacy of a cognitive behavioural intervention program on HRQoL [health-related quality of life] of patients with abridged somatization disorder in primary care.

This primary outcome is inconsistent with what was reported in the registration, the published protocol, and the first article reporting outcomes. The earlier report does not even mention the inclusion of a measure of HRQoL, measured by the SF-36. It is listed in the study protocol as a “secondary variable.”

The opening of the methods section declares that the trial is reported in this paper consistent with the Consolidated Standards of Reporting Clinical Trials (CONSORT). This is not true because the flowchart describing patients from recruitment to follow-up is missing. We will see that when it is reported in another paper, some important information is contained in that flowchart.

The methods section reports that only three measures were administered: the Standardized Polyvalent Psychiatric Interview (SPPI), a semistructured interview developed by the authors with minimal validation; a screening measure for somatization administered by primary care physicians to patients whom they deemed appropriate for the trial; and the SF-36.

Crucial details are withheld about the screening and diagnosis of “abridged somatization disorder.” If these details had been presented, a reader would further doubt the validity of this unvalidated and idiosyncratic diagnosis.

Few readers, even primary care physicians or psychiatrists, will know what to make of the Smith’s guidelines (Googling it won’t yield much), which is essentially a matter of simply sending a letter to the referring GP. Sending such a letter is a notoriously ineffective intervention in primary care. It mainly indicates that patients referred to a trial did not get assigned to an active treatment. As I will document later, the authors were well aware that this would be an ineffectual control/comparison intervention, but using it as such guarantees that their preferred intervention would look quite good in terms of effect size.

The two active interventions are individual- and group-administered CBT, which are described as:

Experimental or intervention group: implementation of the protocol developed by Escobar [21,22] that includes ten weekly 90-min sessions. Patients were assessed at 4 time points: baseline, post-treatment, 6 and 12 months after finishing the treatment. The CBT intervention mainly consists of two major components: cognitive restructuring, which focuses on reducing pain-specific dysfunctional cognitions, and coping, which focuses on teaching cognitive and behavioural coping strategies. The program is structured as follows. Session 1: the connection between stress and pain. Session 2: identification of automated thoughts. Session 3: evaluation of automated thoughts. Session 4: questioning the automatic thoughts and constructing alternatives. Session 5: nuclear beliefs. Session 6: nuclear beliefs on pain. Session 7: changing coping mechanisms. Session 8: coping with ruminations, obsessions and worrying. Session 9: expressive writing. Session 10: assertive communication.

There is sparse presentation of data from the trial in the results section, but some fascinating details await a skeptical, motivated reader.

Table 1 displays social demographic and clinical variables. Psychiatric comorbidity is highly prevalent. Readers can’t tell exactly what is going on, because the authors’ own interview schedule is used to assess comorbidity. But it appears that all but a small minority of patients diagnosed with “abridged somatization disorder” have substantial anxiety and depression. Whether these symptoms meet formal criteria cannot be determined. There is no mention of physical comorbidities.

But there is something startling awaiting an alert reader in Table 2.

[Table 2 from Gili et al. (2014): SF-36 scores by treatment group]

There is something very odd going on here, and very likely a breakdown of randomization. Baseline differences between groups in the key outcome measure, the SF-36, are substantially greater than any within-group change. The treatment as usual (TAU) condition has much lower functioning [lower scores mean lower functioning] than the group CBT condition, which in turn is substantially below the individual CBT condition.

If we compare the scores to adult norms, all three groups of patients are poorly functioning, but those “randomized” to TAU are unusually impaired, strikingly more so than the other two groups.

Keep in mind that evaluations of active interventions, in this case CBT, in randomized trials always involve a difference between groups, not just the change observed within a particular group. That’s because a comparison/control group is supposed to be equivalent for nonspecific factors, including natural recovery. This trial is going to be very biased in its evaluation of individual CBT, a group in which patients started much higher in physical functioning and ended up much higher. Statistical controls fail to correct for such baseline differences. We simply do not have an interpretable clinical trial here.
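To make the distinction concrete, here is a minimal simulation, with invented numbers rather than anything from the Gili et al. trial, of why an impressive within-group change tells us nothing once both arms improve for nonspecific reasons and the arms start out unequal.

```python
# Minimal sketch with invented numbers (not data from the Gili et al. trial):
# why within-group change is not a substitute for a between-group effect size.
import numpy as np

rng = np.random.default_rng(42)
n = 60

# Hypothetical SF-36-like scores. Both arms "improve" over time (natural course,
# regression to the mean, nonspecific attention), but the true treatment effect is zero.
control_pre  = rng.normal(40, 10, n)                # TAU arm starts lower (baseline imbalance)
control_post = control_pre + rng.normal(5, 8, n)    # improves ~5 points with no treatment
treated_pre  = rng.normal(55, 10, n)                # CBT arm starts higher
treated_post = treated_pre + rng.normal(5, 8, n)    # improves by the same ~5 points

def cohens_d(a, b):
    """Between-group Cohen's d using a pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# The within-group "effect size" for the treated arm looks respectable...
within_d = (treated_post.mean() - treated_pre.mean()) / treated_pre.std(ddof=1)

# ...but the randomized comparison of change scores shows essentially nothing.
between_d = cohens_d(treated_post - treated_pre, control_post - control_pre)

print(f"within-group d (treated arm):    {within_d:.2f}")
print(f"between-group d (change scores): {between_d:.2f}")
```

With numbers like these, the treated arm shows a “medium” within-group change even though the between-group effect is essentially zero; that is the gap that baseline imbalance and statistical controls cannot paper over.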

The first report of treatment outcomes in Psychosomatic Medicine

Moreno S, Gili M, Magallón R, Bauzá N, Roca M, del Hoyo YL, Garcia-Campayo J. Effectiveness of group versus individual cognitive-behavioral therapy in patients with abridged somatization disorder: a randomized controlled trial. Psychosomatic Medicine. 2013 Jul 1;75(6):600-8.

The title indicates that the patients are selected on the basis of “abridged somatization disorder.”

The abstract prominently indicates the trial registration number (ISRCTN69944771), which can be plugged into Google to reach the publicly accessible registration.

If a reader is unaware of the lack of validation for “abridged somatization disorder,” they probably won’t infer that from the introduction. The rationale given for the study is that

A recently published meta-analysis (18) has shown that there has been ongoing research on the effectiveness of therapies for abridged somatization disorder in the last decade.

Checking that meta-analysis, it only included a single null trial for treatment of abridged somatization disorder. This seems like a gratuitous, ambiguous citation.

I was surprised to learn that in three of the five provinces in which the study was conducted, patients

…Were not randomized on a one-to-one basis but in blocks of four patients to avoid a long delay between allocation and the onset of treatment in the group CBT arm (where the minimal group size required was eight patients). This has produced, by chance, relatively big differences in the sizes of the three arms.

This departure from one-to-one randomization was not mentioned in the second article reporting results of the study, and it seems an outright contradiction of what is presented there. Nor is it mentioned in the study protocol. This allocation strategy may have been the source of the lack of baseline equivalence between the TAU and the two intervention groups.
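Here is a toy simulation, purely illustrative and not the trial’s actual allocation procedure, of how assigning consecutive blocks of four patients to a single arm, instead of randomizing patients one-to-one, inflates the chance of ending up with arms of quite different sizes.

```python
# Illustrative only: block-of-four allocation versus one-to-one randomization.
import random
from collections import Counter

random.seed(1)
arms = ["TAU", "group CBT", "individual CBT"]

def one_to_one(n_patients):
    """Each patient is independently randomized to one of the three arms."""
    return Counter(random.choice(arms) for _ in range(n_patients))

def blocks_of_four(n_patients):
    """Each consecutive block of four patients goes to one randomly chosen arm,
    roughly as described for three of the five provinces."""
    counts = Counter()
    for _ in range(n_patients // 4):
        counts[random.choice(arms)] += 4
    return counts

print("one-to-one randomization:", one_to_one(48))
print("blocks of four:          ", blocks_of_four(48))
# Across many runs, the block scheme produces roughly twice the imbalance in arm
# sizes, and clusters of consecutive (hence similar) patients land in the same arm.
```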

For the vigilant skeptic, the authors’ calculation of sample size is an eye-opener. Sample size estimation was based on the effectiveness of TAU in primary care visits, which has been assumed to be very low (approximately 10%).

Essentially, the authors are justifying a modest sample size because they expect the TAU intervention to be nearly ineffective. How could the authors believe there is equipoise, that the comparison/control and active treatments could be expected to be equally effective? The authors seem to say that they don’t believe this. Yet equipoise is an ethical and practical requirement for a clinical trial for which human subjects are being recruited. In terms of trial design, do the authors really think this poor treatment provides an adequate comparison/control?

In the methods section, the authors also provide a study flowchart, which was required for the other paper to adhere to CONSORT standards but was missing there. Note the flow at the end of the study for the TAU comparison/control condition at the far right. There was substantially more dropout in this group. The authors chose to estimate the missing scores with the Last Observation Carried Forward (LOCF) method, which assumes the last available observation can be substituted for every subsequent one. This is a discredited technique and particularly inappropriate in this context. Think about it: the TAU condition was expected by the authors to be quite poor care. Not surprisingly, more patients assigned to it dropped out. But they might have dropped out while deteriorating, and so carrying forward the last observation obtained is particularly inappropriate. Certainly it cannot be assumed that the smaller number of dropouts from the other conditions occurred for the same reasons. We have a methodological and statistical mess on our hands, but it was hidden from us in the second report.
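For readers unfamiliar with LOCF, a minimal sketch with invented scores shows what the imputation assumes about a patient who drops out while deteriorating.

```python
# Minimal illustration of Last Observation Carried Forward (LOCF), invented scores.
# Higher score = more severe symptoms; the patient worsens and then drops out.
observed = [12, 15, 19, None, None]   # baseline, post-treatment, 6 months, then missing

def locf(series):
    """Replace each missing value with the most recent observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

print(locf(observed))   # -> [12, 15, 19, 19, 19]
# LOCF assumes this patient held steady at 19 for the final two assessments.
# If TAU dropouts were deteriorating, their imputed scores misrepresent their course,
# and with differential dropout across arms the imputed comparison is uninterpretable.
```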

 

[CONSORT flow diagram from Moreno et al. (2013)]

Six measures are mentioned: (1) the Othmer-DeSouza screening instrument used by clinicians to select patients; (2) the Screening for Somatoform Disorders (SOMS), a 39-item questionnaire that includes all bodily symptoms and criteria relevant to somatoform disorders according to either DSM-IV or ICD-10; (3) a Visual Analog Scale of somatic symptoms (Severity of Somatic Symptoms scale) that patients use to assess changes in severity in each of 40 symptoms; (4) the authors’ own SPPI semistructured psychiatric interview for diagnosis of psychiatric morbidity in primary care settings; (5) the clinician-administered Hamilton Anxiety Rating Scale; and (6) the Hamilton Depression Rating Scale.

We are never actually told what the primary outcome is for the study, but it can be inferred from the opening of the discussion:

The main finding of the trial is a significant improvement regardless of CBT type compared with no intervention at all. CBT was effective for the relief of somatization, reducing both the number of somatic symptoms (Fig. 2) and their intensity (Fig. 3). CBT was also shown to be effective in reducing symptoms related to anxiety and depression.

But I noticed something else here, after a couple of readings. The items used to select patients and identify them with “abridged somatization disorder” reference 39 or 40 symptoms, with men needing only four and women only six symptoms for a diagnosis. That means that most pairs of patients receiving the diagnosis will not have a symptom in common. Whatever “abridged somatization disorder” means, patients who received this diagnosis are likely to differ from each other in terms of somatic symptoms, but probably have other characteristics in common. They are basically depressed and anxious patients, but these mood problems are not being addressed directly.

Comparison of this report to the outcomes paper reviewed earlier shows that none of these outcomes are mentioned there as being assessed, and certainly not as outcomes.

Comparison of this report to the published protocol reveals that number and intensity of somatic symptoms are two of the three main outcomes, but this article makes no mention of the third, utilization of healthcare.

Readers can find something strange in Table 2, which presents what seems to be one of the primary outcomes, severity of symptoms. In this table the order is TAU, group CBT, and individual CBT. Note the large difference in baseline symptoms, with group CBT being much more severe. It’s difficult to make sense of the 12-month follow-up because there was differential dropout and reliance on an inappropriate LOCF imputation of missing data. But if we accept the imputation as the authors did, it appears that there were no differences between TAU and group CBT. That is what the authors reported with inappropriate analyses of covariance.

[Table 2 from Moreno et al. (2013): severity of somatic symptoms by treatment group]

The authors’ cheerful take-away message?

This trial, based on a previous successful intervention proposed by Sumathipala et al. (39), presents the effectiveness of CBT applied at individual and group levels for patients with abridged somatization (somatic symptom indexes 4 and 6).

But hold on! In the introduction, the authors’ justification for their trial was:

Evidence for the group versus individual effectiveness of cognitive-behavioral treatment of medically unexplained physical symptoms in the primary care setting is not yet available.

And let’s take a look at Sumathipala et al.

Sumathipala A, Siribaddana S, Hewege S, Sumathipala K, Prince M, Mann A. Understanding the explanatory model of the patient on their medically unexplained symptoms and its implication on treatment development research: a Sri Lanka Study. BMC Psychiatry. 2008 Jul 8;8(1):54.

The article presents speculations based on an observational study, not an intervention study, so there is no successful intervention being reported.

The formal registration 

The registration of psychotherapy trials typically provides sparse details. The curious must consult the more elaborate published protocol. Nonetheless, the registration can often provide grounds for skepticism, particularly when it is compared to any discrepant details in the published protocol, as well as subsequent publications.

The registration declares:

Study hypothesis

Patients randomized to cognitive behavioural therapy significantly improve in measures related to quality of life, somatic symptoms, psychopathology and health services use.

Primary outcome measures

Severity of Clinical Global Impression scale at baseline, 3 and 6 months and 1-year follow-up

Secondary outcome measures

The following will be assessed at baseline, 3 and 6 months and 1-year follow-up:
1. Quality of life: 36-item Short Form health survey (SF-36)
2. Hamilton Depression Scale
3. Hamilton Anxiety Scale
4. Screening for Somatoform Symptoms [SOMS]

Overall trial start date

15/01/2008

Overall trial end date

01/07/2009

The published protocol 

Primary outcome

Main outcome variables:

– SSS (Severity of somatic symptoms scale) [22]: a scale of 40 somatic symptoms assessed by a 7-point visual analogue scale.

– SSQ (Somatic symptoms questionnaire) [22]: a scale made up of 40 items on somatic symptoms and patients’ illness behaviour.

When I searched the protocol for the Severity of Clinical Global Impression scale, the primary outcome declared in the registration, I could find no reference to it.

The protocol was submitted on May 14, 2008 and published on June 22, 2008. This suggests that the protocol was submitted after the start of the trial.

To calculate the sample size we consider that the effectiveness of usual treatment (Smith’s norms) is rather low, estimated at about 20% in most of the variables [10,11]. We aim to assess whether the new intervention is at least 20% more effective than usual treatment.
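To see how much work those assumptions do, here is a rough back-of-the-envelope calculation using the standard normal-approximation formula for comparing two proportions. It is only a sketch, not the authors’ actual computation, but it shows that assuming usual care helps only 20% of patients and demanding a 20-percentage-point advantage keeps the required sample small, whereas a more credible comparator would demand a far larger trial.

```python
# Rough sketch of a two-proportion sample-size calculation (normal approximation);
# not the authors' actual computation. Two-sided alpha = .05, power = .80.
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect p1 vs p2 with a z-test."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    term = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
            + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5)
    return term ** 2 / (p1 - p2) ** 2

print(round(n_per_arm(0.20, 0.40)))   # ~80 per arm if TAU is assumed to help only 20%
print(round(n_per_arm(0.35, 0.40)))   # ~1,500 per arm against a more credible comparator
```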

Comparison group

Control group or standardized recommended treatment for somatization disorder in primary care (Smith’s norms) [10,11]: standardized letter to the family doctor with Smith’s norms that includes: 1. Provide brief, regularly scheduled visits. 2. Establish a strong patient-physician relationship. 3. Perform a physical examination of the area of the body where the symptom arises. 4. Search for signs of disease instead of relying of symptoms. 5. Avoid diagnostic tests and laboratory or surgical procedures. 6. Gradually move the patient to being “referral ready”.

Basically, TAU, the comparison/control group, involves simply sending a letter to referring physicians encouraging them to meet regularly with the patients but discouraging diagnostic tests or medical procedures. Keep in mind that patients for this study were selected by the physicians because they found them particularly frustrating to treat. Despite the authors’ repeated claims about the high prevalence of “abridged somatization disorder,” they relied on a large number of general practice settings to each contribute only a few patients. These patients are very heterogeneous in terms of somatic symptoms, but most share anxiety or depressive symptoms.

There is an uncontrolled selection bias here that makes generalization from results of the study problematic. Just who are these patients? I wonder if these patients have some similarity to the frustrating GOMERs (Get Out Of My Emergency Room) in the classic House of God, a book described by Amazon as “an unvarnished, unglorified, and amazingly forthright portrait revealing the depth of caring, pain, pathos, and tragedy felt by all who spend their lives treating patients and stand at the crossroads between science and humanity.”

Imagine the disappointment of the referring physicians and the patients when consenting to participate in this study simply left the patients back in routine care provided by the same physicians. It’s no wonder that the patients deteriorated and that patients assigned to this treatment were more likely to drop out.

Whatever active ingredients the individual and group CBT may have, they also include some nonspecific factors missing from the TAU comparison group: frequency and intensity of contact, reassurance and support, attentive listening, and positive expectations. These nonspecific factors can readily be confused with active ingredients and may account for any differences between the active treatments and the TAU comparison. What a terrible study.

The two journals providing reports of the study failed in their responsibility to the readership and to the larger audience seeking clinical and public policy relevance. Authors have ample incentive to engage in questionable publication practices, including ignoring and even suppressing registration, switching outcomes, and exaggerating the significance of their results. Journals of necessity must protect authors from their own inclinations, as well as protect readers and the larger medical community from untrustworthy reports. Psychosomatic Medicine and Journal of Psychosomatic Research failed miserably in their peer review of these articles. Neither journal is likely to be the first choice for authors seeking to publish findings from well-designed and well-reported trials. Who knows, maybe the journals’ standards are compromised by the need to attract randomized trials for what is construed, at least by the psychiatric community, as a psychosomatic condition.

Regardless, it’s futile to require registration and posting of protocols for psychotherapy trials if editors and reviewers ignore these resources in evaluating articles for publication.

Postscript: imagine what will be done with the results of this study

You can’t fix with a meta-analysis what investigators bungled by design.

In a recent blog post, I examined a registration for a protocol for a systematic review and meta-analysis of interventions to address medically unexplained symptoms. The review protocol was inadequately described, had undisclosed conflicts of interest, and one of the senior investigators had a history of switching outcomes in his own study and refusing to share data for independent analysis. Undoubtedly, the study we have been discussing meets the vague criteria for inclusion in this meta-analysis. But what outcomes will be chosen, particularly when there should be only one outcome per study? And will it be recognized that these two reports are actually of the same study? Will key problems in the designation of the TAU control group, with its likely inflation of treatment effects, be recognized when effect sizes are calculated?

As you can see, it took a lot of effort to compare and contrast documents that should have been in alignment. Do you really expect those who conduct subsequent meta-analyses to make those multiple comparisons, or will they simply extract multiple effect sizes from the two papers so far reporting results?

Obviously, every time we encounter a report of a psychotherapy trial in the literature, we won’t have the time or inclination to undertake such a cross-comparison of articles, registration, and protocol. But maybe we should be skeptical of authors’ conclusions without such checks.

I’m curious what a casual reader would infer from encountering in a literature search one of the reports of the clinical trial I have reviewed, but not the other.

 

 


Was independent peer review of the PACE trial articles possible?

I ponder this question guided by Le Chevalier C. Auguste Dupin, the first fictional detective, created before anyone was called “detective.”

Articles reporting the PACE trial have extraordinary numbers of authors, acknowledgments, and institutional affiliations. A considerable proportion of all persons and institutions involved in researching chronic fatigue and related conditions in the UK have a close connection to PACE.

This raises issues about

  • Obtaining independent peer review of these articles that is not tainted by reviewer conflict of interest.
  • Just what authorship on a PACE trial paper represents and whether granting of authorship conforms to international standards.
  • The security of potential critics contemplating speaking out about whatever bad science they find in the PACE trial articles, and the security of potential reviewers who are negative and can be found out. Critics within the UK risk isolation and blacklisting by a large group who have investments in what could be exaggerated estimates of the quality and outcome of the PACE trial.
  • Whether grants associated with the multimillion-pound PACE study could have received the independent peer review that is so crucial to assuring that proposals selected to be funded are of the highest quality.

Issues about the large number of authors, acknowledgments, and institutional affiliations become all the more salient as critics [1, 2, 3] find yet again serious flaws in the conduct and the reporting of the Lancet Psychiatry 2015 long-term follow-up study. Numerous obvious Questionable Research Practices (QRPs) survived peer review. That implies at least ineptness in peer review, or even Questionable Publication Practices (QPPs).

The important question becomes: how is the publication of questionable science to be explained?

Maybe there were difficulties finding reviewers with relevant expertise who were not in some way involved in the PACE trial or affiliated with departments and institutions that would be construed as benefiting from a positive review outcome, i.e. a publication?

Or in the enormous smallness of the UK, is independent peer review achieved by persons putting those relationships and affiliations aside to produce an impeccably detached and rigorous review process?

The untrustworthiness of both the biomedical and psychological literatures is well established. Nonpharmacological interventions have fewer safeguards than drug trials, in terms of adherence to preregistration, reporting standards like CONSORT, and enforcement of data sharing.

Open-minded skeptics should be assured of independent peer review of nonpharmacological clinical trials, particularly when there is evidence that persons and groups with considerable financial interests attempt to control what gets published and what is said about their favored interventions. Reviewers with potential conflicts of interest should be excluded from evaluation of manuscripts.

Independent peer review of the PACE trial by those with relevant expertise might not be possible in the UK, where much of the conceivable expertise is in some way directly or indirectly attached to the PACE trial.

A Dutch observer’s astute observations about the PACE articles

My guest blogger, Dutch research biologist Klaas van Dijk, called attention to the exceptionally large number of authors and institutions listed for a pair of PACE trial papers.

Klaas noted

The Pubmed entry for the 2011 Lancet paper lists 19 authors:

B J Angus, H L Baber, J Bavinton, M Burgess, T Chalder, L V Clark, D L Cox, J C DeCesare, K A Goldsmith, A L Johnson, P McCrone, G Murphy, M Murphy, H O’Dowd, PACE trial management group*, L Potts, M Sharpe, R Walwyn, D Wilks and P D White (re-arranged in an alphabetic order).

The actual article from the Lancet website ( http://www.thelancet.com/pdfs/journals/lancet/PIIS0140-6736(11)60096-2.pdf and also http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60096-2/fulltext ) lists 19 authors who are acting ‘on behalf of the PACE trial management group†’. But the end of the paper (page 835) states: “PACE trial group.” This term is not identical to “PACE trial management group”.
In total, another 19 names are listed under “PACE trial group” (page 835): Hiroko Akagi, Mansel Aylward, Barbara Bowman, Jenny Butler, Chris Clark, Janet Darbyshire, Paul Dieppe, Patrick Doherty, Charlotte Feinmann, Deborah Fleetwood, Astrid Fletcher, Stella Law, M Llewelyn, Alastair Miller, Tom Sensky, Peter Spencer, Gavin Spickett, Stephen Stansfeld and Alison Wearden (re-arranged in an alphabetic order).

There is no overlap with the first 19 people who are listed as author of the paper.

So how many people can claim to be an author of this paper? Are all these 19 people of the “PACE trial management group” (not identical to “PACE trial group”???) also some sort of co-author of this paper? Do all these 19 people of the second group also agree with the complete contents of the paper? Do all 38 people agree with the full contents of the paper?

The paper lists many affiliations:
* Queen Mary University of London, UK
* King’s College London, UK
* University of Cambridge, UK
* University of Cumbria, UK
* University of Oxford, UK
* University of Edinburgh, UK
* Medical Research Council Clinical Trials Unit, London, UK
* South London and Maudsley NHS Foundation Trust, London, UK
* The John Radcliffe Hospital, Oxford, UK
* Royal Free Hospital NHS Trust, London, UK
* Barts and the London NHS Trust, London, UK
* Frenchay Hospital NHS Trust, Bristol, UK;
* Western General Hospital, Edinburgh, UK

Do all these affiliations also agree with the full contents of the paper? Am I right to assume that all 38 people (names see above) and all affiliations / institutes (see above) plainly refuse to give critics / other scientists / patients / patient groups (etc.) access to the raw research data of this paper, and am I right in my assumption that it is therefore impossible for all others (including allies of patients / other scientists / interested students, etc.) to conduct re-calculations, check all statements against the raw data, etc.?

Decisions whether to accept manuscripts for publication are made in dark places, based on opinions offered by people whose identities may be known only to editors. Actually, though, in a small country like the UK, peer review may be a lot less anonymous than intended and possibly a lot less independent and free of conflicts of interest. Without a lot more transparency than is currently available concerning the peer review the published papers underwent, we are left to our speculation.

Prepublication peer review is just one aspect of the process of getting research findings vetted, shaped, and made available to the larger scientific community, an overall process that is now recognized as tainted with untrustworthiness.

Rules for granting authorship

Concerns about gift and unwarranted authorship have increased not only because of growing awareness of unregulated and unfair practices, but because of the importance attached to citations and authorship for professional advancement. Journals are increasingly requiring documentation that all authors have made an appropriate contribution to a manuscript and have approved the final version.

Yet operating rules for granting authorship in many institutional settings vary greatly from the stringent requirements of journals. Contrary to the signed statements that corresponding authors have to make in submitting a manuscript to a journal, many clinicians expect authorship in return for access to patients. Many competitive institutions award and withhold authorship based on politics and on good or bad behavior that have nothing to do with the requirements of journals.

Basically, despite the existence of numerous ethical guidelines and explicit policies, authors and institutions can largely do what they want when it comes to granting and withholding authorship.

Persons are quickly disappointed when they are naïve enough to complain about unwarranted authorships or being forced to include authors on papers without appropriate contribution or being denied authorship for an important contribution. They quickly discover that whistleblowers are generally considered more of a threat to institutions and punished more severely than alleged wrongdoers, no matter how strong the evidence may be.

The Lancet website notes

The Lancet is a signatory journal to the Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals, issued by the International Committee of Medical Journal Editors (ICMJE Recommendations), and to the Committee on Publication Ethics (COPE) code of conduct for editors. We follow COPE’s guidelines.

The ICMJE recommends that an author should meet all four of the following criteria:

  • Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work;
  • Drafting the work or revising it critically for important intellectual content;
  • Final approval of the version to be published;
  • Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.”

The intent of these widely endorsed recommendations is that persons associated with a large project have to do a lot to claim their places as authors.

Why the fuss about acknowledgments?

I’ve heard from a number of graduate students and junior investigators that they have had their first manuscripts held up in the submission process because they did not obtain written permission for acknowledgments. Why is that considered so important?

Mention in an acknowledgment is an honor. But it implies involvement in a project and approval of a resulting manuscript. In the past, there were numerous instances where people were named in acknowledgments without having given permission. There was a suspicion, sometimes confirmed, that they had been acknowledged only to improve the prospects of a manuscript getting published. There are other instances where persons were included in acknowledgments without permission with the intent of keeping them out of the review process because of the appearance of a conflict of interest.

The expectation is that anyone contributing enough to a manuscript to be acknowledged has a potential conflict of interest in deciding whether it is suitable for publication.

But, as in other aspects of a mysterious and largely anonymous review process, whether people who were acknowledged in manuscripts were barred from participating in review of a manuscript cannot be established by readers.

What is the responsibility of reviewers to declare conflict of interest?

Reviewers are expected to declare conflicts of interest when accepting a manuscript to review. Often, though, they are presented with a tick box without a clear explanation of what counts as the appearance of a conflict of interest. Reviewers can usually continue considering a manuscript after acknowledging that they have an association with the authors or an institutional affiliation but do not consider it a conflict. Such statements are generally accepted.

Authors excluding from the review process persons they consider to have a negative bias

In submitting a manuscript, authors are offered an opportunity to identify persons who should be excluded because of the appearance of a negative bias. Editors generally take these requests quite seriously. As an editor, I sometimes receive a large number of requests for exclusions from authors who worry about the opinions of particular people.

While we don’t know what went on in prepublication peer review, the PACE investigators have repeatedly and aggressively attempted to manipulate post publication portrayals of their trial in the media. Can we rule out that they similarly try to control potential critics in the prepublication peer review of their papers?

The 2015 Lancet Psychiatry secondary mediation analysis article

Chalder, T., Goldsmith, K. A., Walker, J., White, P. D., Sharpe, M., & Pickles, A. R. (2015). Rehabilitative therapies for chronic fatigue syndrome: a secondary mediation analysis of the PACE trial. The Lancet Psychiatry, 2, 141–152.

The acknowledgments include

We acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, excluding ARP, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, LV Clark, DL Cox, JC DeCesare, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks. This report is independent research partly arising from a doctoral research fellowship supported by the NIHR.

Fifteen of the authors of the 2011 Lancet PACE paper are no longer present, and another author has been added. The PACE Trial Management Group is again acknowledged, but there is no mention of the separate PACE trial group. We can’t tell why there has been a major reduction in the number of authors and acknowledgments or how it came about, or whether people who were dropped participated in review of this paper. But what is obvious is that this is an exceedingly flawed mediation analysis crafted to a foregone conclusion. I’ll say more about that in future blogs, but we can only speculate how such bad publication practices made it through peer review.

This article is a crime against the practice of secondary mediation analysis. If I were a prospective author present in the discussions, I would have fled before it became a crime scene.

I am told I have over 350 publications, but I consider it vulgar for authors to keep track of exact numbers. There are many potential publications not included in this number because I declined authorship when I could not agree with the spin that others were trying to put on the reporting of the findings. In such instances, I exclude myself from review of the resulting manuscript because of the appearance of a conflict of interest. We can ponder how many of the large pool of past PACE authors refused authorship on this paper when it was offered and then declined to participate in subsequent peer review because of the appearance of a conflict of interest.

The 2015 Lancet Psychiatry long-term follow-up article

Sharpe, M., Goldsmith, K. A., Chalder, T., Johnson, A.L., Walker, J., & White, P. D. (2015). Rehabilitative treatments for chronic fatigue syndrome: long-term follow-up from the PACE trial. The Lancet Psychiatry, http://dx.doi.org/10.1016/S2215-0366(15)00317-X

The acknowledgments include

We gratefully acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, L V Clark, D L Cox, J C DeCesare, E Feldman, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks, and the King’s Clinical Trials Unit. We thank Hannah Baber for facilitating the long-term follow-up data collection.

Again, there are authors and acknowledgments missing relative to the earlier paper, and we are in the dark about how and why that happened and whether the missing persons were considered free enough of conflicts of interest to evaluate this article when it was in manuscript form. But as documented in a blog post at Mind the Brain, there were serious, obvious flaws in the conduct and reporting of the follow-up study. It is a crime against best practices for the proper conduct and reporting of clinical trials. And again we can only speculate how it got through peer review.

… And grant reviews?

Where can UK granting agencies obtain independent peer review of past and future grants associated with the PACE trial? To take just one example, the 2015 Lancet Psychiatry secondary mediation analysis was funded in part by an NIHR doctoral research fellowship grant. The resulting paper has many fewer authors than the 2011 Lancet paper. Did everyone who was an author or mentioned in the acknowledgments of that paper exclude themselves from review of the grant? Who, then, would be left?

In Germany and the Netherlands, concerns about avoiding the appearance of conflicts of interest in obtaining independent peer review of grants have led to heavy reliance on expertise from outside the country. This does not imply any improprieties on the part of experts within these countries, but rather the necessity of maintaining a strong appearance that vested interests have not unduly influenced grant review. Perhaps the situation apparent with the PACE trial suggests that journals and grant review panels within the UK might consider similar steps.

Contemplating the evidence against independent peer review

  • We have a mob of people as authors and mentions in acknowledgments. We have a huge conglomerate of institutions acknowledged.
  • We have some papers with blatant questionable research and reporting practices published in prestigious journals after ostensible peer review.
  • We are left in the dark about what exactly happened in peer review, but that the articles were adequately peer reviewed is a crucial part of their credibility.

What are we to conclude?

I think of what Edgar Allan Poe’s wise character, Le Chevalier C. Auguste Dupin, would say. For those of you who don’t know who he is:

Le Chevalier C. Auguste Dupin  is a fictional detective created by Edgar Allan Poe. Dupin made his first appearance in Poe’s “The Murders in the Rue Morgue” (1841), widely considered the first detective fiction story.[1] He reappears in “The Mystery of Marie Rogêt” (1842) and “The Purloined Letter” (1844)…

Poe created the Dupin character before the word detective had been coined. The character laid the groundwork for fictitious detectives to come, including Sherlock Holmes, and established most of the common elements of the detective fiction genre.

I think if we asked Dupin, he would say the danger is that the question is too fascinating to give up, but impossible to resolve without evidence we cannot access. We can blog, we can discuss this important question, but in the end we cannot answer it with certainty.

Sigh.

Busting foes of post-publication peer review of a psychotherapy study

As described in the last issue of Mind the Brain, peaceful post-publication peer reviewers (PPPRs) were ambushed by an author and an editor. They used the usual home team advantages that journals have – they had the last word in an exchange that was not peer-reviewed.

As also promised, I will team up in this issue with Magneto to bust them.

Attacks on PPPRs threaten a desperately needed effort to clean up the integrity of the published literature.

The attacks are getting more common and sometimes vicious. Vague threats of legal action caused an open access journal to remove an article delivering fair and balanced criticism.

In a later issue of Mind the Brain, I will describe an incident in which authors of a published paper had uploaded their data set, but then modified it without notice after PPPRs used the data for re-analyses. The authors then used the modified data for new analyses and claimed the PPPRs were grossly mistaken. Fortunately, the PPPRs retained time-stamped copies of both data sets. You may like to think that such precautions are unnecessary, but just imagine what critics of PPPR would be saying if they had not saved this evidence.

Until journals get more supportive of post publication peer review, we need repeated vigilante actions, striking from Twitter, Facebook pages, and blogs. Unless readers acquire basic critical appraisal skills and take the time to apply them, they will have to keep turning to the social media for credible filters of all the crap that is flooding the scientific literature.

I’ve enlisted Magneto because he is a mutant. He does not have any extraordinary powers of critical appraisal. To the contrary, he unflinchingly applies what we should all acquire. As a mutant, he can apply his critical appraisal skills without the mental anguish and physiological damage that could beset humans appreciating just how bad the literature really is. He doesn’t need to maintain his faith in the scientific literature or the dubious assumption that what he is seeing is just a matter of repeat offender authors, editors, and journals making innocent mistakes.

Humans with critical appraisal skills risk demoralization and too often shirk the task of telling it like it is. Some who used their skills too often were devastated by what they found and fled academia. More than a few are now working in California in espresso bars and escort services.

Thank you, Magneto. And yes, I again apologize for having tipped off Jim Coan about our analyses of his spinning and statistical manipulation of his work to get newsworthy findings. Sure, it was an accomplishment to get a published apology and correction from him and Susan Johnson. I am so proud of Coan’s subsequent condemnation of me on Facebook as the Deepak Chopra of Skepticism that I will display it as an endorsement on my webpage. But it was unfortunate that PPPRs had to endure his nonsensical Negative Psychology rant, especially without readers knowing what precipitated it.

The following commentary on the exchange in Journal of Nervous and Mental Disease makes direct use of your critique. I have interspersed gratuitous insults generated by Literary Genius’ Shakespearean insult generator and Reocities’ Random Insult Generator.

How could I maintain the pretense of scholarly discourse when I am dealing with an author who repeatedly violates basic conventions like ensuring tables and figures correspond to what is claimed in the abstract? Or an arrogant editor who responds so nastily when his slipups are gently brought to his attention and won’t fix the mess he is presenting to his readership?

As a mere human, I needed all the help I could get in keeping my bearings amidst such overwhelming evidence of authorial and editorial ineptness. A little Shakespeare and Monty Python helped.

The statistical editor for this journal is a saucy full-gorged apple-john.

 

Cognitive Behavioral Techniques for Psychosis: A Biostatistician’s Perspective

Domenic V. Cicchetti, PhD, quintessential biostatistician

Domenic V. Cicchetti, you may be, as your website claims,

 A psychological methodologist and research collaborator who has made numerous biostatistical contributions to the development of major clinical instruments in behavioral science and medicine, as well as the application of state-of-the-art techniques for assessing their psychometric properties.

But you must have been out of “the quintessential role of the research biostatistician” when you drafted your editorial. Please reread it. Anyone armed with an undergraduate education in psychology and Google Scholar can readily cut through your ridiculous pomposity, you undisciplined sliver of wild belly-button fluff.

You make it sound like the Internet PPPRs misunderstood Jacob Cohen’s designation of effect sizes as small, medium, and large. But if you read a much-accessed article that one of them wrote, you will find a clear exposition of the problems with these arbitrary distinctions. I know, it is in an open access journal, but what you say about it paying reviewers is sheer bollocks. Do you get paid by Journal of Nervous and Mental Disease? Why otherwise would you be a statistical editor for a journal with such low standards? Surely, someone who has made “numerous biostatistical contributions” has better things to do, thou dissembling swag-bellied pignut.

More importantly, you ignore that Jacob Cohen himself said

The terms ‘small’, ‘medium’, and ‘large’ are relative . . . to each other . . . the definitions are arbitrary . . . these proposed conventions were set forth throughout with much diffidence, qualifications, and invitations not to employ them if possible.

Cohen J. Statistical power analysis for the behavioural sciences. Second edition, 1988. Hillsdale, NJ: Lawrence Erlbaum Associates. p. 532.

Could it be any clearer, Dommie?


You suggest that the internet PPPRs were disrespectful of Queen Mother Kraemer in not citing her work. Have you recently read it? Ask her yourself, but she seems quite upset about the practice of using effects generated from feasibility studies to estimate what would be obtained in an adequately powered randomized trial.

Pilot studies cannot estimate the effect size with sufficient accuracy to serve as a basis of decision making as to whether a subsequent study should or should not be funded or as a basis of power computation for that study.

Okay you missed that, but how about:

A pilot study can be used to evaluate the feasibility of recruitment, randomization, retention, assessment procedures, new methods, and implementation of the novel intervention. A pilot study is not a hypothesis testing study. Safety, efficacy and effectiveness are not evaluated in a pilot. Contrary to tradition, a pilot study does not provide a meaningful effect size estimate for planning subsequent studies due to the imprecision inherent in data from small samples. Feasibility results do not necessarily generalize beyond the inclusion and exclusion criteria of the pilot design.

A pilot study is a requisite initial step in exploring a novel intervention or an innovative application of an intervention. Pilot results can inform feasibility and identify modifications needed in the design of a larger, ensuing hypothesis testing study. Investigators should be forthright in stating these objectives of a pilot study.
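Kraemer’s warning about imprecision is easy to demonstrate with a small simulation; the numbers below are illustrative, not a reanalysis of anyone’s trial. Even when the true between-group effect is a modest d = 0.2, pilot-sized arms of 15 patients yield observed effect sizes scattered from clearly negative to “large.”

```python
# Simulation sketch: spread of observed Cohen's d in pilot-sized samples
# when the true effect is d = 0.2. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
true_d, n_per_arm, n_sims = 0.2, 15, 10_000

def cohens_d(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

estimates = np.array([
    cohens_d(rng.normal(true_d, 1, n_per_arm), rng.normal(0, 1, n_per_arm))
    for _ in range(n_sims)
])

lo, hi = np.percentile(estimates, [2.5, 97.5])
print(f"true d = {true_d}; 95% of pilot estimates fall between {lo:.2f} and {hi:.2f}")
# With 15 patients per arm the estimates run from roughly -0.5 to +0.9, far too
# imprecise to power a subsequent trial or to support claims of efficacy.
```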

Dommie, although you never mention it, surely you must appreciate the difference between a within-group effect size and a between-group effect size.

  1. Interventions do not have meaningful effect sizes, between-group comparisons do.
  2. As I have previously pointed out

 When you calculate a conventional between-group effect size, it takes advantage of randomization and controls for background factors, like placebo or nonspecific effects. So, you focus on what change went on in a particular therapy, relative to what occurred in patients who didn’t receive it.

Turkington recruited a small convenience sample of older patients from community care who averaged over 20 years of treatment. It is likely that they were not getting much support and attention anymore, whether or not they ever were. The intervention in Turkington’s study provided that attention. Maybe some or all of any effects were due simply to compensating for what was missing from inadequate routine care. So, aside from all the other problems, anything going on in Turkington’s study could have been nonspecific.

Recall that in promoting his idea that antidepressants are no better than acupuncture for depression, Irving Kirsch tried to pass off within-group effect sizes as equivalent to between-group effect sizes, despite repeated criticisms. Similarly, long-term psychodynamic psychotherapists tried to use effect sizes from wretched case series for comparison with those obtained in well-conducted studies of other psychotherapies. Perhaps you should send such folks a call for papers so that they can find an outlet in Journal of Nervous and Mental Disease with you as a Special Editor in your quintessential role as biostatistician.

Douglas Turkington’s call for a debate

Professor Douglas Turkington: "The effect size that got away was this big."
Professor Douglas Turkington: “The effect size that got away was this big.”

Doug, as you requested, I sent you a link to my Google Scholar list of publications. But you still did not respond to my offer to come to Newcastle and debate you. Maybe you were not impressed. Nor did you respond to Keith Laws’ repeated requests to debate. Yet you insulted internet PPPR Tim Smits with the taunt,

[screenshot of the taunt]

 

You congealed accumulation of fresh cooking fat.

I recommend that you review the recording of the Maudsley debate. Note how the moderator Sir Robin Murray boldly announced at the beginning that the vote on the debate was rigged by your cronies.

Do you really think Laws and McKenna got their asses whipped? Then why didn’t you accept Laws’ offer to debate you at a British Psychological Society event, after he offered to pay your travel expenses?

High-Yield Cognitive Behavioral Techniques for Psychosis Delivered by Case Managers…

Dougie, we were alerted that bollocks would follow by the “high yield” of the title. Just what distinguishes this CBT approach from any other intervention to justify “high yield,” except your marketing effort? Certainly not the results you obtained from an earlier trial, which we will get to.

Where do I begin? Can you dispute what I said to Dommie about the folly of estimating effect sizes for an adequately powered randomized trial from a pathetically small feasibility study?

I know you were looking for a convenience sample, but how did you get from Newcastle, England to rural Ohio and recruit such an unrepresentative sample of 40-year-olds with 20 years of experience with mental health services? You don’t tell us much about them, not even a breakdown of their diagnoses. But would you really expect that the routine care they were currently receiving was even adequate? Sure, why wouldn’t you expect to improve upon that with your nurses? But what would you be demonstrating?

[randomly generated insult]

 

The PPPR boys from the internet made noise about Table 2, made passing reference to the totally nude Figure 5, and noted how claims in the abstract had no apparent relationship to what was presented in the results section, and how nowhere did you provide means or standard deviations. But they did not get to Figure 2. Notice anything strange?

Despite what you claim in the abstract, none of the outcomes appear significant. Did you really mean standard errors of the mean (SEMs), not standard deviations (SDs)? The people to whom I showed the figure did not think so.

[screenshot of a comment from Mike Miller]

 

And I found this advice on the internet:

If you want to create persuasive propaganda:

If your goal is to emphasize small and unimportant differences in your data, show your error bars as SEM, and hope that your readers think they are SD.

If your goal is to cover up large differences, show the error bars as the standard deviations for the groups, and hope that your readers think they are standard errors.
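The arithmetic behind that advice is simple. A short sketch with invented scores, nothing from Turkington’s figure, shows why the labeling matters: the standard error of the mean is the standard deviation divided by the square root of the sample size, so SEM bars are much tighter and make overlapping groups look neatly separated.

```python
# Sketch of SD versus SEM error bars with invented scores; not Turkington's data.
import numpy as np

rng = np.random.default_rng(7)
group_a = rng.normal(50, 12, 25)   # mean ~50, SD ~12, n = 25
group_b = rng.normal(54, 12, 25)   # mean ~54: heavily overlapping distributions

for name, scores in [("A", group_a), ("B", group_b)]:
    sd = scores.std(ddof=1)
    sem = sd / np.sqrt(len(scores))          # SEM = SD / sqrt(n)
    print(f"group {name}: mean {scores.mean():5.1f}  SD {sd:4.1f}  SEM {sem:4.1f}")

# With n = 25 per group, SEM bars are one fifth the height of SD bars. Bars that
# barely overlap can therefore coexist with distributions of individual patients
# that overlap almost completely, which is why a figure must say which it plots.
```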

Why did you expect to be able to talk about effect sizes of the kind you claim you were seeking? The best meta-analysis suggests an effect size of only .17 with blind assessment of outcome. Did you expect that unblinding assessors would lead to that much more improvement? Oh yeah, you cited your own previous work in support:

That intervention improved overall symptoms, insight, and depression and had a significant benefit on negative symptoms at follow-up (Turkington et al., 2006).

Let’s look at Table 1 from Turkington et al., 2006.

A consistent spinning of results

Table 1 from Turkington et al. (2006)

Don’t you just love those three-digit significance levels that allow us to see that p = .099 for overall symptoms meets the apparent criterion of p < .10 in this large sample? Clever, but it doesn’t work for depression with p = .128. But you have a track record of being sloppy with tables. Maybe we should give you the benefit of the doubt and ignore the table.

But Dougie, this is not some social priming experiment with college students getting course credit. This is a study that took up the time of patients with serious mental disorder. You left some of them in the squalor of inadequate routine care after gaining their consent with the prospect that they might get more attention from nurses. And then, with great carelessness, you put the data into tables that had no relationship to the claims you were making in the abstract, or to your attempts to get more funding for future such ineptitude. If you drove your car the way you write up clinical trials, you’d lose your license, if not go to jail.


The 2014 Lancet study of cognitive therapy for patients with psychosis

Forgive me; I had missed, until Magneto reminded me, that you were an author on the, ah, controversial paper

Morrison, A. P., Turkington, D., Pyle, M., Spencer, H., Brabban, A., Dunn, G., … & Hutton, P. (2014). Cognitive therapy for people with schizophrenia spectrum disorders not taking antipsychotic drugs: a single-blind randomised controlled trial. The Lancet, 383(9926), 1395-1403.

But with more authors than patients remaining in the intervention group at follow-up, it is easy to lose track.

You and your co-authors made some wildly inaccurate claims about having shown that cognitive therapy was as effective as antipsychotics. Why, by the end of the trial, most of the patients remaining in follow-up were on antipsychotic medication. Is that how you obtained your effectiveness?

In our exchange of letters in The Lancet, you finally had to admit

We claimed the trial showed that cognitive therapy was safe and acceptable, not safe and effective.

Maybe you should similarly be retreating from your claims in the Journal of Nervous and Mental Disease article? Or just take refuge in the figures and tables being uninterpretable.

No wonder you don’t want to debate Keith Laws or me.


A retraction for High-Yield Cognitive Behavioral Techniques for Psychosis…?

The Turkington article meets the Committee on Publication Ethics (COPE) guidelines for an immediate retraction (http://publicationethics.org/files/retraction%20guidelines.pdf).

But neither a retraction nor even a formal expression of concern has appeared.

Maybe matters can be left as they now are. On social media, we can point to the many problems of the article like a clogged toilet warning that Journal of Nervous and Mental Disease is not a fit place to publish – unless you are seeking exceedingly inept or nonexistent editing and peer review.

 

 

 

Vigilantes can periodically tweet TripAdvisor-style warnings, like

toilets still not working

 

 

Now, Dommie and Dougie, before you again set upon some PPPRs just trying to do their jobs for little respect or incentive, consider what happened this time.

Special thanks are due to Magneto, but Jim Coyne has sole responsibility for the final content. It does not necessarily represent the views of PLOS blogs or other individuals or entities, human or mutant.

Sordid tale of a study of cognitive behavioral therapy for schizophrenia gone bad

What motivates someone to publish that paper without checking it? Laziness? Naivety? Greed? Now that’s one to ponder. – Neuroskeptic, Science needs vigilantes.

We need to

  • Make the world safe for post-publication peer review (PPPR) commentary.
  • Ensure appropriate rewards for those who do it.
  • Take action against those who try to make life unpleasant for those who toil hard for a more trustworthy scientific literature.

In this issue of Mind the Brain, I set the stage for my teaming up with Magneto to bring some bullies to justice.

The background tale of a modest study of cognitive behavior therapy (CBT) for patients with schizophrenia has been told in bits and pieces elsewhere.

The story at first looked like it was heading for a positive outcome more worthy of a blog post than the shortcomings of a study in an obscure journal. The tale would go

A group organized on the internet called attention to serious flaws in the reporting of a study. We then witnessed the self-correcting of science in action.

If only this story were complete and accurately described scientific publishing today.

Daniel Lakens’ blog post, How a Twitter HIBAR [Had I Been A Reviewer] ends up as a published letter to the editor, recounts the story, beginning with expressions of puzzlement and skepticism on Twitter.

Gross errors were made in a table and a figure. These were bad enough in themselves, but they also suggested that the reported results did not support the claims made in the article.

A Swedish lecturer blogged Through the looking glass into an oddly analyzed clinical paper.

Some of those involved in the Twitter exchange banded together in writing a letter to the editor.

Smits, T., Lakens, D., Ritchie, S. J., & Laws, K. R. (2014). Statistical errors and omissions in a trial of cognitive behavior techniques for psychosis: commentary on Turkington et al. The Journal of Nervous and Mental Disease, 202(7), 566.

Lakens explained in his blog

Now I understand that getting criticism on your work is never fun. In my personal experience, it very often takes a dinner conversation with my wife before I’m convinced that if people took the effort to criticize my work, there must be something that can be improved. What I like about this commentary is that is shows how Twitter is making post-publication reviews possible. It’s easy to get in contact with other researchers to discuss any concerns you might have (as Keith did in his first Tweet). Note that I have never met any of my co-authors in real life, demonstrating how Twitter can greatly extend your network and allows you to meet interesting and smart people who share your interests. Twitter provides a first test bed for your criticisms to see if they hold up (or if the problem lies in your own interpretation), and if a criticism is widely shared, can make it fun to actually take the effort to do something about a paper that contains errors.

Furthermore,

It might be slightly weird that Tim, Stuart, and myself publish a comment in the Journal of Nervous and Mental Disease, a journal I guess none of us has ever read before. It also shows how Twitter extends the boundaries between scientific disciplines. This can bring new insights about reporting standards  from one discipline to the next. Perhaps our comment has made researchers, reviewers, and editors who do research on cognitive behavioral therapy aware of the need to make sure they raise the bar on how they report statistics (if only so pesky researchers on Twitter leave you alone!). I think this would be great, and I can’t wait until researchers from another discipline point out statistical errors in my own articles that I and my closer peers did not recognize, because anything that improves the way we do science (such as Twitter!) is a good thing.

Hindsight: If the internet group had been the original reviewers of the article…

The letter was low-key and calmly pointed out obvious errors. You can see it here. Tim Smits’ blog Don’t get all psychotic on this paper: Had I (or we) Been A Reviewer (HIBAR) describes what had to be left out to keep within the word limit.

Table 2 had lots of problems –

  • The confidence intervals were suspiciously wide.
  • The effect sizes seemed too large for what the modest sample size should yield (the sketch after this list shows roughly what to expect).
  • The table was inconsistent with information in the abstract.
  • Neither the table nor the accompanying text reported any test of significance, or any means and standard deviations.
  • Confidence intervals for two different outcomes were identical, yet one had the same value for its effect size as its lower bound.
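A rough back-of-the-envelope check makes the suspicion concrete. The sketch below assumes a hypothetical 20 patients per arm, not the published group sizes, and uses the standard normal approximation for the sampling variance of Cohen’s d, just to show how wide honest confidence intervals ought to be at feasibility-study scale.

```python
# Rough check with hypothetical group sizes: how wide should a 95% CI
# around an observed Cohen's d be with ~20 patients per arm?
import numpy as np

def approx_ci_for_d(d, n1, n2, z=1.96):
    """Normal-approximation 95% CI for an observed Cohen's d."""
    se = np.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

for d in (0.3, 0.8, 1.5):
    lo, hi = approx_ci_for_d(d, 20, 20)
    print(f"d = {d:.1f}: 95% CI ~ ({lo:.2f}, {hi:.2f})")
# Even a 'large' observed d of 0.8 carries a CI stretching from roughly
# 0.2 to 1.4 at this sample size, so identical CIs across outcomes, or a
# CI whose lower bound equals the point estimate, should have jumped out
# at the reviewers.
```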


Figure 5 was missing labels and definitions on both axes, rendering it uninterpretable. Duh?

The authors of the letter were behaving like a blue-helmeted international peacekeeping force, not warriors attacking bad science.

But you don’t send peacekeeping troops into an active war zone.

In making recommendations, the Internet group did politely introduce the R word:

We believe the above concerns mandate either an extensive correction, or perhaps a retraction, of the article by Turkington et al. (2014). At the very least, the authors should reanalyze their data and report the findings in a transparent and accurate manner.

Fair enough, but I doubt the authors of the letter appreciated how upsetting this reasonable advice was or anticipated what reaction would be coming.

A response from an author of the article and a late night challenge to debate

The first author of the article published a reply

Turkington, D. (2014). The reporting of confidence intervals in exploratory clinical trials and professional insecurity: a response to Ritchie et al. The Journal of Nervous and Mental Disease, 202(7), 567.

He seemed to claim to have re-examined the study data and found that

  • The findings were accurately reported.
  • A table of means and standard deviations was unnecessary because of the comprehensive reporting of confidence intervals and p-values in the article.
  • The missing details from the figure were self-evident.

The group who had assembled on the internet was not satisfied. An email exchange with Turkington and the editor of the journal confirmed that Turkington had not actually re-examined the raw data, but only a summary in statistical tables.

The group requested the raw data. In a subsequent letter to the editor, they would describe Turkington as providing the data in a timely manner, but the exchange between them was anything but cordial. Turkington at first balked, saying that the data were not readily available because the statistician had retired. He nonetheless eventually provided the data, but not before first sending off a snotty email –


Tim Smits declined:

Dear Douglas,

Thanks for providing the available data as quick as possible. Based on this and the tables in the article, we will try to reconstruct the analysis and evaluate our concerns with it.

With regard to your recent invitation to “slaughter” me at Newcastle University, I politely want to decline that invitation. I did not have any personal issue in mind when initiating the comment on your article, so a personal attack is the least of my priorities. It is just from a scientific perspective (but an outsider to the research topic) that I was very confused/astonished about the lack of reporting precision and what appears to be statistical errors. So, if our re-analysis confirms that first perception, then I am of course willing to accept your invitation at Newcastle university to elaborate on proper methodology in intervention studies, since science ranks among the highest of my priorities.

Best regards,

Tim Smits

When I later learned of this email exchange, I wrote to Turkington and offered to go to Newcastle to debate either as Tim Smits’ second or to come alone. Turkington asked me to submit my CV to show that I wasn’t a crank. I complied, but he has yet to accept my offer.

A reanalysis of the data and a new table

Smits, T., Lakens, D., Ritchie, S. J., & Laws, K. R. (2015). Correcting Errors in Turkington et al. (2014): Taking Criticism Seriously. The Journal of Nervous and Mental Disease, 203(4), 302-303.

The group reanalyzed the data and the title of their report leaked some frustration.

We confirmed that all the errors identified by Smits et al. (2014) were indeed errors. In addition, we observed that the reported effect sizes in Turkington et al. (2014) were incorrect by a considerable margin. To correct these errors, Table 2 and all the figures in Turkington et al. (2014) need to be changed.

The sentence in the Abstract where effect sizes are specified needs to be rewritten.

A revised table based on their reanalyses was included:

Given that the recommendation of their first letter had apparently been dismissed, they wrote –

To conclude, our recommendation for the Journal and the authors would now be to acknowledge that there are clear errors in the original Turkington et al. (2014) article and either accept our corrections or publish their own corrigendum. Moreover, we urge authors, editors, and reviewers to be rigorous in their research and reviewing, while at the same time being eager to reflect on and scrutinize their own research when colleagues point out potential errors. It is clear that the authors and editors should have taken more care when checking the validity of our criticisms. The fact that a rejoinder with the title “A Response to Ritchie et al. [sic]” was accepted for publication in reply to a letter by Smits et al. (2014) gives the impression that our commentary did not receive the attention it deserved. If we want science to be self-correcting, it is important that we follow ethical guidelines when substantial errors in the published literature are identified.

Sound and fury signifying nothing

Publication of their letter was accompanied by a blustery commentary from the journal’s statistical editor, full of innuendo and pomposity.

“A harmless hilarity and a buoyant cheerfulness are not infrequent concomitants of genius…” – Charles Caleb Colton

Cicchetti, D. V. (2015). Cognitive Behavioral Techniques for Psychosis: A Biostatistician’s Perspective. The Journal of Nervous and Mental Disease, 203(4), 304-305.

He suggested that the team assembled on the internet

reanalyzed the data of Turkington et al. on the basis that it contained some serious errors that needed to be corrected. They also reported that the statistic that Turkington et al. had used to assess effect sizes (ESs) was an inappropriate metric.

Well, did Turkington’s table contain errors, and was the metric inappropriate? If so, was a formal correction or even retraction needed? Cicchetti reproduced the internet group’s table, but did not immediately offer his opinion. So the uncorrected article stands as published. Interested persons downloading it from behind the journal’s paywall won’t be alerted to the controversy.

Instead of dealing with the issues at hand, Cicchetti launched into an irrelevant lecture about Jacob Cohen’s arbitrary designation of effect sizes as small, medium, or large. Anything he said had already appeared, more clearly and more accurately, in an article by Daniel Lakens, one of the internet group’s authors. Cicchetti cited that article, but only as a basis for libeling the open access journal in which it appeared.

To be perfectly candid, the reader needs to be informed that the journal that published the Lakens (2013) article, Frontiers in Psychology, is one of an increasing number of journals that charge exorbitant publication fees in exchange for free open access to published articles. Some of the author costs are used to pay reviewers, causing one to question whether the process is always unbiased, as is the desideratum. For further information, the reader is referred to the following Web site: http://www.frontiersin.org/Psychology/fees.

Cicchetti further chastised the internet group for disrespecting the saints of power analysis.

As an additional comment, the stellar contributions of Helena Kraemer and Sue Thiemann (1987) were noticeable by their very absence in the Smits et al. critique. The authors, although genuinely acknowledging the lasting contributions of Jacob Cohen to our understanding of ES and power analysis, sought to simplify the entire enterprise

Jacob Cohen is dead and cannot speak. But good Queen Mother Helena is very much alive and would surely object to being drawn into this nonsense. I encourage Cicchetti to ask her what she thinks.

Ah, but what about the table based on the re-analyses of the internet group that Cicchetti had reproduced?

The reader should also be advised that this comment rests upon the assumption that the revised data analyses are indeed accurate because I was not privy to the original data.

Actually, when Turkington sent the internet group the study data, he included Cicchetti in the email.

The internet group experienced one more indignity from the journal that they had politely tried to correct. They had reproduced Turkington’s original table in their letter. The journal sent them an invoice for 106 euros because the table was copyrighted. It took a long email exchange before this billing was rescinded.

Science Needs Vigilantes

Imagine a world where we no longer depend on a few cronies of an editor to decide once and forever the value of a paper. This would replace the present order in which much of the scientific literature is untrustworthy, where novelty and sheer outrageousness of claims are valued over robustness.

Imagine we have constructed a world where post-publication commentary is welcomed and valued, data are freely available for reanalysis, and the rewards are there for performing those re-analyses.

We clearly are not there yet, and certainly not with this flawed article. The sequence of events that I have described has so far not produced a correction of the paper. As it stands, the paper concludes that nurses can and should be given brief training that will allow them to effectively treat patients with severe and chronic mental disorder. This paper encourages actions that may put such patients and society at risk because of ineffectual and neglectful treatment.

The authors of the original paper and the editor responded by dismissing the criticisms, by ridicule, and, in the editor’s case at least, by libeling open access journals. Obviously, we have not reached the point at which those willing to re-examine and, if necessary, re-analyze data are appropriately respected and protected from unfair criticism. The current system of publishing gives authors who have been questioned, and editors who are defensive of their work, no matter how incompetent and inept it may be, the last word. But there is always the force of social media – tweets and blogs.

The critics were actually much too kind and restrained in a critique narrowly based on re-analyses. They ignored so much about

  • The target paper being an underpowered feasibility study passed off as a source of estimates of what a sufficiently sized randomized trial would yield.
  • The continuity between the mischief done in this article and the tricks and spin in the past work of the author Turkington.
  • The laughably inaccurate lecture of the editor.
  • The lowlife journal in which the article was published.

These problems deserve a more unrestrained and thorough trashing. Journals may not yet be self-correcting, but blogs can do a reasonable job of exposing bad science.

Science needs vigilantes, because of the intransigence of those pumping crap into the literature.

Coming up next

In my next issue of Mind the Brain I’m going to team up with Magneto. You may recall I previously collaborated with him and Neurocritic to scrutinize some junk science that Jim Coan and Susan Johnson had published in PLOS One. Their article crassly promoted to clinicians what they claimed was a brain-soothing couples therapy. We obtained an apology and a correction in the journal for undeclared conflict of interest.

But that incident left Magneto upset with me. He felt I did not give sufficient attention to the continuity between how Coan had slipped post hoc statistical manipulations into the PLOS article to get positive results and what he had done in a past paper with Richard Davidson. Worse, I had tipped off Jim Coan about our checking his work. Coan launched a pre-emptive tirade against post-publication scrutiny, his now infamous Negative Psychology rant. He focused his rage on Neuroskeptic, not Neurocritic or me, but the timing was not a coincidence. He then followed up by denouncing me on Facebook as the Deepak Chopra of skepticism.

I still have not unpacked that oxymoronic statement and decided if it was a compliment.

OK, Magneto, I will be less naïve and more thorough this round. I will pass on whatever you uncover.

Check back if you just want to augment your critical appraisal skills with some unconventional ones, or if you just enjoy a spectacle. If you want to arrive at your own opinions ahead of time, email Douglas Turkington at douglas.turkington@ntw.nhs.uk for a PDF of his paywalled article. Tell him I said hello. The offer of a debate still stands.