When psychotherapy trials have multiple flaws…

Multiple flaws pose more threats to the validity of psychotherapy studies than would be inferred when the individual flaws are considered independently.

We can learn to spot features of psychotherapy trials that are likely to lead to exaggerated claims of efficacy for treatments, or to claims that will not generalize beyond the sample being studied in a particular clinical trial. We can look to the adequacy of sample size, and spot what the Cochrane Collaboration has defined as risks of bias in its handy assessment tool.

We can look at the case-mix in the particular sites where patients were recruited. We can examine the adequacy of the diagnostic criteria that were used for entering patients into a trial. We can examine how well the trial was blinded: who assigned patients to particular conditions, and whether the patients, the treatment providers, and the outcome evaluators knew the condition to which particular patients were assigned.

And so on. But what about combinations of these factors?

We typically do not pay enough attention to multiple flaws in the same trial. I include myself among the guilty. We may suspect that flaws are seldom simply additive in their effect, but we don’t consider whether there may even be synergism in their negative effects on the validity of a trial. As we will see in this analysis of a clinical trial, multiple flaws can pose more threats to the validity of a trial than we might infer when the individual flaws are considered independently.

The particular paper we are probing is described in its discussion section as the “largest RCT to date testing the efficacy of group CBT for patients with CFS.” It also takes on added importance because two of the authors, Gijs Bleijenberg and Hans Knoop, are considered leading experts in the Netherlands. The treatment protocol was developed over time by the Dutch Expert Centre for Chronic Fatigue (NKCV, http://www.nkcv.nl; Knoop and Bleijenberg, 2010). Moreover, these senior authors dismiss any criticism and even ridicule critics. This study is cited as support for their overall assessment of their own work.  Gijs Bleijenberg claims:

Cognitive behavioural therapy is still an effective treatment, even the preferential treatment for chronic fatigue syndrome.

But

Not everybody endorses these conclusions, however their objections are mostly baseless.

Spoiler alert

This is a long read blog post. I will offer a summary for those who don’t want to read through it, but who still want the gist of what I will be saying. However, as always, I encourage readers to be skeptical of what I say and to look to my evidence and arguments and decide for themselves.

Authors of this trial stacked the deck to demonstrate that their treatment is effective. They are striving to support the extraordinary claim that group cognitive behavior therapy fosters not only better adaptation, but actually recovery from what is internationally considered a physical condition.

There are some obvious features of the study that contribute to the likelihood of a positive effect, but these features need to be considered collectively, in combination, to appreciate the strength of this effort to guarantee positive results.

This study represents the perfect storm of design features that operate synergistically:

Referral bias – Trial conducted in a single specialized treatment setting known for advocating psychological factors maintaining physical illness.

Strong self-selection bias of a minority of patients enrolling in the trial seeking a treatment they otherwise cannot get.

Broad, overinclusive diagnostic criteria for entry into the trial.

The active treatment condition carried a strong message about how patients should respond to outcome assessment: with reports of improvement.

An unblinded trial with a waitlist control lacking the nonspecific elements (placebo) that confound the active treatment.

Subjective self-report outcomes.

Specifying a clinically significant improvement that required only that a primary outcome score fall below the threshold needed for entry into the trial.

Deliberate exclusion of relevant objective outcomes.

Avoidance of any recording of negative effects.

Despite the prestige attached to this trial in Europe, the US Agency for Healthcare Research and Quality (AHRQ) excludes this trial from providing evidence for its database of treatments for chronic fatigue syndrome/myalgic encephalomyelitis. We will see why in this post.

The take-away message: Although not many psychotherapy trials incorporate all of these factors, most trials have some. We should be more sensitive to when multiple factors occur in the same trial, such as bias in the site for patient recruitment, lack of blinding, lack of balance between active treatment and control condition in terms of nonspecific factors, and subjective self-report measures.

The article reporting the trial is

Wiborg JF, van Bussel J, van Dijk A, Bleijenberg G, Knoop H. Randomised controlled trial of cognitive behaviour therapy delivered in groups of patients with chronic fatigue syndrome. Psychotherapy and Psychosomatics. 2015;84(6):368-76.

Unfortunately, the article is currently behind a pay wall. Perhaps readers could contact the corresponding author Hans.knoop@radboudumc.nl  and request a PDF.

The abstract

Background: Meta-analyses have been inconclusive about the efficacy of cognitive behaviour therapies (CBTs) delivered in groups of patients with chronic fatigue syndrome (CFS) due to a lack of adequate studies. Methods: We conducted a pragmatic randomised controlled trial with 204 adult CFS patients from our routine clinical practice who were willing to receive group therapy. Patients were equally allocated to therapy groups of 8 patients and 2 therapists, 4 patients and 1 therapist or a waiting list control condition. Primary analysis was based on the intention-to-treat principle and compared the intervention group (n = 136) with the waiting list condition (n = 68). The study was open label. Results: Thirty-four (17%) patients were lost to follow-up during the course of the trial. Missing data were imputed using mean proportions of improvement based on the outcome scores of similar patients with a second assessment. Large and significant improvement in favour of the intervention group was found on fatigue severity (effect size = 1.1) and overall impairment (effect size = 0.9) at the second assessment. Physical functioning and psychological distress improved moderately (effect size = 0.5). Treatment effects remained significant in sensitivity and per-protocol analyses. Subgroup analysis revealed that the effects of the intervention also remained significant when both group sizes (i.e. 4 and 8 patients) were compared separately with the waiting list condition. Conclusions: CBT can be effectively delivered in groups of CFS patients. Group size does not seem to affect the general efficacy of the intervention which is of importance for settings in which large treatment groups are not feasible due to limited referral

The trial registration

http://www.isrctn.com/ISRCTN15823716

Who was enrolled into the trial?

Who gets into a psychotherapy trial is a function of the particular treatment setting of the study, the diagnostic criteria for entry, and patient preferences for getting their care through a trial, rather than what is being routinely provided in that setting.

We need to pay particular attention when patients enter psychotherapy trials hoping they will receive a treatment they prefer and not be assigned to the other condition. Patients may be in a clinical trial for the betterment of science, but in some settings, they are willing to enroll because of the probability of getting a treatment they otherwise could not get. This in turn affects their evaluation both of the condition in which they get the preferred treatment and of the condition in which they are denied it. Simply put, they register being pleased if they got what they wanted, or displeased if they did not.

The setting is relevant to evaluating who was enrolled in a trial.

The authors’ own outpatient clinic at the Radboud University Medical Center was the site of the study. The group has an international reputation for promoting the biopsychosocial model, in which psychological factors are assumed to be the decisive factor in maintaining somatic complaints.

All patients were referred to our outpatient clinic for the management of chronic fatigue.

There is thus a clear referral bias, or case-mix bias, but we are not provided a ready basis for quantifying it or even estimating its effects.

The diagnostic criteria.

The article states:

In accordance with the US Center for Disease Control [9], CFS was defined as severe and unexplained fatigue which lasts for at least 6 months and which is accompanied by substantial impairment in functioning and 4 or more additional complaints such as pain or concentration problems.

Actually, the US Centers for Disease Control and Prevention would now reject this trial because these entry criteria are considered obsolete, overinclusive, and not sufficiently exclusive of other conditions that might be associated with chronic fatigue.*

There is a real paradigm shift happening in America. Both the 2015 IOM Report and the Centers for Disease Control and Prevention (CDC) website emphasize post-exertional malaise, i.e., the tendency of people with M.E. to become more ill after any exertion. CBT is no longer recommended by the CDC as treatment.

The only mandatory symptom for inclusion in this study is fatigue lasting 6 months. Most properly, this trial targets chronic fatigue [period] and not the condition, chronic fatigue syndrome.

Current US CDC recommendations (see Box 7-1 of the IOM report) require postexertional malaise for a diagnosis of myalgic encephalomyelitis (ME).

Patients meeting the current American criteria for ME would be eligible for enrollment in this trial, but it’s unclear what proportion of the patients enrolled actually met the American criteria. Because of the over-inclusiveness of the entry diagnostic criteria, it is doubtful whether the results would generalize to an American sample. A look at patient flow into the study will be informative.

Patient flow

Let’s look at what is said in the text, but also in the chart depicting patient flow into the trial for any self-selection that might be revealed.

In total, 485 adult patients were diagnosed with CFS during the inclusion period at our clinic (fig. 1). One hundred and fifty-seven patients were excluded from the trial because they declined treatment at our clinic, were already asked to participate in research incompatible with inclusion (e.g. research focusing on individual CBT for CFS) or had a clinical reason for exclusion (i.e. they received specifically tailored interventions because they were already unsuccessfully treated with individual CBT for CFS outside our clinic or were between 18 and 21 years of age and the family had to be involved in the therapy). Of the 328 patients who were asked to engage in group therapy, 99 (30%) patients indicated that they were unwilling to receive group therapy. In 25 patients, the reason for refusal was not recorded. Two hundred and four patients were randomly allocated to one of the three trial conditions. Baseline characteristics of the study sample are presented in table 1. In total, 34 (17%) patients were lost to follow-up. Of the remaining 170 patients, 1 patient had incomplete primary outcome data and 6 patients had incomplete secondary outcome data.

We see that the investigators invited two thirds of the patients attending the clinic to enroll in the trial. Of these, 41% refused. We don’t know the reason for some of the refusals, but almost a third of the patients approached declined because they did not want group therapy. The authors were left able to randomize 42% of the patients coming to the clinic, or less than two thirds of the patients they actually asked. Of these patients, 83% (170 of 204) received the treatment to which they were randomized and were available for follow-up.
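The proportions above can be checked directly against the counts quoted in the paper’s patient-flow description; a minimal sketch, using only those counts:

```python
# Counts quoted in the patient-flow passage of the paper
diagnosed = 485    # adults diagnosed with CFS at the clinic
invited = 328      # asked to engage in group therapy
randomized = 204   # allocated to one of the three trial conditions
lost = 34          # lost to follow-up (17% of those randomized)

invited_share = invited / diagnosed            # share of clinic patients invited
randomized_of_clinic = randomized / diagnosed  # share of all clinic patients randomized
randomized_of_invited = randomized / invited   # share of invited patients randomized
followed_up = randomized - lost                # patients with follow-up data

print(f"invited: {invited_share:.0%} of clinic")
print(f"randomized: {randomized_of_clinic:.0%} of clinic, "
      f"{randomized_of_invited:.0%} of invited")
print(f"followed up: {followed_up} patients")
```

Only about two in five of the patients seen at the clinic ever reached randomization, which is the crux of the generalizability worry.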

These patients, receiving the treatment to which they were randomized and available for follow-up, are a self-selected minority of the patients coming to the clinic. This self-selection process likely reduced the proportion of patients with myalgic encephalomyelitis. It is estimated that 25% of patients meeting the American criteria are housebound and 75% are unable to work. It’s reasonable to infer that patients meeting the full criteria would opt out of a treatment that requires regular attendance at group sessions.

The trial is thus biased toward ambulatory patients with fatigue, not ME. Their fatigue is likely due to some combination of factors such as multiple co-morbidities, as-yet-undiagnosed medical conditions, drug interactions, and the common mild and subsyndromal anxiety and depressive symptoms that characterize primary care populations.

The treatment being evaluated

Group cognitive behavior therapy for chronic fatigue syndrome, either delivered in a small (4 patients and 1 therapist) or larger (8 patients and 2 therapists) group format.

The intervention consisted of 14 group sessions of 2 h within a period of 6 months followed by a second assessment. Before the intervention started, patients were introduced to their group therapist in an individual session. The intervention was based on previous work of our research group [4,13] and included personal goal setting, fixing sleep-wake cycles, reducing the focus on bodily symptoms, a systematic challenge of fatigue-related beliefs, regulation and gradual increase in activities, and accomplishment of personal goals. A formal exercise programme was not part of the intervention.

Patients received a workbook with the content of the therapy. During sessions, patients were explicitly invited to give feedback about fatigue-related cognitions and behaviours to fellow patients. This aspect was introduced to facilitate a pro-active attitude and to avoid misperceptions of the sessions as support group meetings which have been shown to be insufficient for the treatment of CFS.

And note:

In contrast to our previous work [4], we communicated recovery in terms of fatigue and disabilities as general goal of the intervention.

Some impressions of the intensity of this treatment: it is rather intensive, with patients having considerable opportunities for interaction with providers. This factor alone distinguishes being assigned to the intervention group versus being left in the wait-list control group, and it could prove powerful. It will be difficult to distinguish intensity of contact from any content or active ingredients of the therapy.

I’ll leave for another time a fuller discussion of the extent to which what was labeled as cognitive behavior therapy in this study is consistent with cognitive therapy as practiced by Aaron Beck and other leaders of the field. However, a few comments are warranted. What is offered in this trial does not sound like cognitive therapy as Americans practice it. It seems to emphasize challenging beliefs and pushing patients to get more active, along with psychoeducational activities. I don’t see indications of the supportive, collaborative relationship in which patients are encouraged to work on what they want to work on, engage in outside activities (homework assignments), and get feedback.

What is missing in this treatment is what Beck calls collaborative empiricism, “a systemic process of therapist and patient working together to establish common goals in treatment, has been found to be one of the primary change agents in cognitive-behavioral therapy (CBT).”

Importantly, in Beck’s approach, the therapist does not assume cognitive distortions on the part of the patient. Rather, in collaboration with the patient, the therapist introduces alternatives to the interpretations that the patient has been making and encourages the patient to consider the difference. In contrast, rather than eliciting goal statements from patients, the therapists in this study impose the goal of increased activity. Therapists in this study also seem ready to impose their view that the patients’ fatigue-related beliefs are maladaptive.

The treatment offered in this trial is complex, with multiple components making multiple assumptions that seem quite different from what is called cognitive therapy or cognitive behavioral therapy in the US.

The authors’ communication of recovery from fatigue and disability seems a radical departure not only from cognitive behavior therapy for anxiety and depression and pain, but for cognitive behavior therapy offered for adaptation to acute and chronic physical illnesses. We will return to this “communication” later.

The control group

Patients not randomized to group CBT were placed on a waiting list.

Think about it! What do patients think about having taken on all the inconvenience and burden of a clinical trial in the hope that they would get treatment, and then being assigned to the control group to just wait? Not only are they going to be disappointed and register that disappointment in their subjective responses at outcome assessment; patients may also worry about jeopardizing their right to the treatment they are waiting for if they endorse overly positive outcomes. There is a potential for a nocebo effect, compounding the placebo effect of assignment to the CBT active treatment groups.

What are informative comparisons between active treatments and control conditions?

We need to ask more often what inclusion of a control group accomplishes for the evaluation of a psychotherapy. In doing so, we need to keep in mind that psychotherapies do not have effect sizes; only comparisons of psychotherapies with control conditions have effect sizes.

A pre-post evaluation of psychotherapy from baseline to follow-up includes the effects of any active ingredient in the psychotherapy, a host of nonspecific (placebo) factors, and any changes that would have occurred in the absence of the intervention. These include regression to the mean: patients are more likely to enter a clinical trial now, rather than earlier or later, if there has been an exacerbation of their symptoms.
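The pull of regression to the mean can be made concrete with a toy simulation (the numbers here are invented for illustration, not taken from the trial): patients whose noisy baseline score happens to exceed an entry cutoff will, on average, score closer to their stable underlying level at re-assessment, with no treatment at all.

```python
import random

random.seed(0)

# Toy model: each patient's "true" fatigue level is stable, but any single
# measurement adds noise. Patients tend to enroll when a noisy measurement
# is high (a symptom flare), so re-measurement alone drifts back toward
# the true level -- regression to the mean.
N = 100_000
true_level = 35.0    # stable underlying fatigue score (illustrative)
noise_sd = 6.0       # measurement noise (illustrative)
entry_cutoff = 40.0  # only "flaring" scores lead to enrollment

baseline, followup = [], []
for _ in range(N):
    b = random.gauss(true_level, noise_sd)
    if b >= entry_cutoff:                    # selected on a high score
        baseline.append(b)
        followup.append(random.gauss(true_level, noise_sd))  # no treatment

mean_baseline = sum(baseline) / len(baseline)
mean_followup = sum(followup) / len(followup)
print(f"baseline mean:  {mean_baseline:.1f}")
print(f"follow-up mean: {mean_followup:.1f} (apparent improvement, no treatment)")
```

In a single-arm pre-post comparison this drift is indistinguishable from a treatment effect; only an equivalent control group can subtract it out.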

So, a proper comparison/control condition includes everything that the patients randomized to the intervention group get except for the active treatment. Ideally, the intervention and the comparison/control group are equivalent on all these factors, except the active ingredient of the intervention.

That is clearly not what is happening in this trial. Patients randomized to the intervention group get the intervention, the added intensity and frequency of contact with professionals that the intervention provides, and all the support that goes with it; and the positive expectations that come with getting a therapy that they wanted.

Attempts to evaluate group CBT versus the wait-list control group thus confound the active ingredients of the CBT with all these nonspecific effects. The deck is clearly being stacked in favor of CBT.

This may be a randomized trial, but properly speaking, this is not a randomized controlled trial, because the comparison group does not control for nonspecific factors, which are imbalanced.

The unblinded nature of the trial

In RCTs of psychotropic drugs, the ideal is to compare the psychotropic drug to an inert pill placebo, with providers, patients, and evaluators blinded as to whether the patients received the psychotropic drug or the comparison pill.

While it is difficult to achieve a comparable level of blinding in a psychotherapy trial, more of an effort to achieve blinding is desirable. For instance, in this trial, the authors took pains to distinguish the CBT from what would have happened in a support group. A much more adequate comparison would therefore be CBT versus either a professionally led or peer-led support group with equivalent amounts of contact time. Further blinding would be possible if patients were told only that two forms of group therapy were being compared. If that were the information available to patients contemplating consenting to the trial, it would not have been so obvious from the outset which assignment was preferable.

Subjective self-report outcomes.

The primary outcomes for the trial were the fatigue subscale of the Checklist Individual Strength, the physical functioning subscale of the 36-item Short Form Health Survey (SF-36), and overall impairment as measured by the Sickness Impact Profile (SIP).

Realistically, self-report outcomes are often all that is available in many psychotherapy trials. Commonly these are self-report assessments of anxiety and depressive symptoms, although these may be supplemented by interviewer-based assessments. We don’t have objective biomarkers with which to evaluate psychotherapy.

These three self-report measures are relatively nonspecific, particularly in a population that is not characterized by ME. Self-reported fatigue in a primary care population lacks discriminative validity with respect to pain, anxiety and depressive symptoms, and general demoralization.  The measures are susceptible to receipt of support and re-moralization, as well as gratitude for obtaining a treatment that was sought.

Self-report entry criteria included a score of 35 or higher on the fatigue severity subscale. Yet a score of less than 35 on this scale at follow-up is part of what is defined as a clinically significant improvement, within a composite score combining self-report measures.
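To see how permissive that criterion is on its own, consider a minimal sketch (the function names and the one-point change are mine for illustration; the threshold of 35 is the one reported in the paper):

```python
ENTRY_THRESHOLD = 35  # fatigue severity score required to enter the trial

def meets_entry_criterion(fatigue_score: int) -> bool:
    """Baseline eligibility: fatigue severity at or above the threshold."""
    return fatigue_score >= ENTRY_THRESHOLD

def counts_as_improved(fatigue_score: int) -> bool:
    """The component of 'clinically significant improvement' at issue:
    simply scoring below the entry threshold at follow-up."""
    return fatigue_score < ENTRY_THRESHOLD

# A patient who barely qualified at baseline...
assert meets_entry_criterion(35)

# ...and changes by a single point at follow-up already satisfies
# this component of "clinically significant improvement".
assert counts_as_improved(34)
```

On this component alone, a one-point shift on a noisy self-report scale moves a patient from "ill enough to enroll" to "improved".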

We know from medical trials that differences can be observed with subjective self-report measures that will not be found with objective measures. Thus, mildly asthmatic patients will fail to distinguish in their subjective self-reports between the effective inhalant albuterol, an inert inhalant, and sham acupuncture, though they will rate all three as better than no intervention. However, albuterol shows a strong advantage over the other three conditions on an objective measure, maximum forced expiratory volume in 1 second (FEV1), as assessed with spirometry.

The suppression of objective outcome measures

We cannot let the authors of this trial off the hook for their dependence on subjective self-report outcomes. They instructed patients that recovery is the goal, which implies that it is an attainable goal. We can reasonably be skeptical about claims of recovery based on changes in self-report measures. Were the patients actually able to exercise? What was their exercise capacity, as objectively measured? Did they return to work?

These authors have included such objective measurements in past studies, but did not include them as primary outcomes, nor, in some cases, even report them in the main paper describing the trial.

Wiborg JF, Knoop H, Stulemeijer M, Prins JB, Bleijenberg G. How does cognitive behaviour therapy reduce fatigue in patients with chronic fatigue syndrome? The role of physical activity. Psychol Med. 2010 Jan 5:1

The senior authors’ review fails to mention their three studies using actigraphy that did not find effects for CBT. I am unaware of any studies that did find enduring effects.

Perhaps this is what they mean when they say the protocol has been developed over time – they removed what they found to be threats to the findings that they wanted to claim.

Dismissing of any need to consider negative effects of treatment

Most psychotherapy trials fail to assess any adverse effects of treatment, but this is usually done discreetly, without mention. In contrast, this article states:

Potential harms of the intervention were not assessed. Previous research has shown that cognitive behavioural interventions for CFS are safe and unlikely to produce detrimental effects.

Patients who meet stringent criteria for ME would be put at risk by pressure to exert themselves. By definition they are vulnerable to postexertional malaise (PEM). Any trial of this nature needs to assess that risk. Maybe no adverse effects would be found. If that were so, it would strongly suggest the absence of patients with appropriate diagnoses.

Timing of assessment of outcomes varied between intervention and control group.

I at first did not believe what I was reading when I encountered this statement in the results section.

The mean time between baseline and second assessment was 6.2 months (SD = 0.9) in the control condition and 12.0 months (SD = 2.4) in the intervention group. This difference in assessment duration was significant (p < 0.001) and was mainly due to the fact that the start of the therapy groups had to be frequently postponed because of an irregular patient flow and limited treatment capacities for group therapy at our clinic. In accordance with the treatment manual, the second assessment was postponed until the fourteenth group session was accomplished. The mean time between the last group session and the second assessment was 3.3 weeks (SD = 3.5).

So, outcomes were assessed for the intervention group shortly after completion of therapy, when nonspecific (placebo) effects would be stronger, but a mean of six months later than for patients assigned to the control condition.

Post-hoc statistical controls are not sufficient to rescue the study from this important group difference, and it compounds other problems in the study.

Take away lessons

Pay more attention to how the limitations of any clinical trial may compound each other, leading the trial to provide exaggerated estimates of the effects of treatment or of the generalizability of the results to other settings.

Be wary of loose diagnostic criteria, because a trial may not generalize to the same criteria applied in settings that differ either in patient population or in the availability of different treatments. This is particularly important when a treatment setting has a bias in referrals and only a minority of the patients invited to participate in the trial actually agree and are enrolled.

Ask questions about just what information is obtained by comparing the active treatment group in a study to its control/comparison group. For a start, just what is being controlled, and how might that affect estimates of the effectiveness of the active treatment?

Pay particular attention to the potent combination of a trial being unblinded, a weak comparison/control, and an active treatment that is not otherwise available to patients.

Note

*The means of determining whether the six months of fatigue might be accounted for by other medical factors was specific to the setting. Note that a review of medical records was considered sufficient for an unknown proportion of patients, with no further examination or medical tests.

The Department of Internal Medicine at the Radboud University Medical Center assessed the medical examination status of all patients and decided whether patients had been sufficiently examined by a medical doctor to rule out relevant medical explanations for the complaints. If patients had not been sufficiently examined, they were seen for standard medical tests at the Department of Internal Medicine prior to referral to our outpatient clinic. In accordance with recommendations by the Centers for Disease Control, sufficient medical examination included evaluation of somatic parameters that may provide evidence for a plausible somatic explanation for prolonged fatigue [for a list, see [9]. When abnormalities were detected in these tests, additional tests were made based on the judgement of the clinician of the Department of Internal Medicine who ultimately decided about the appropriateness of referral to our clinic. Trained therapists at our clinic ruled out psychiatric comorbidity as potential explanation for the complaints in unstructured clinical interviews.

Before you enroll your child in the MAGENTA chronic fatigue syndrome study: Issues to be considered

[October 3 8:23 AM Update: I have now inserted Article 21 of the Declaration of Helsinki below, which is particularly relevant to discussions of the ethical problems of Dr. Esther Crawley’s previous SMILE trial.]

Petitions are calling for shutting down the MAGENTA trial. Those who organized the effort and signed the petition are commendably brave, given past vilification of any effort by patients and their allies to have a say about such trials.

Below I identify a number of issues that parents should consider in deciding whether to enroll their children in the MAGENTA trial or to withdraw them if they have already been enrolled. I take a strong stand, but I believe I have adequately justified and documented my points. I welcome discussion to the contrary.

This is a long read but to summarize the key points:

  • The MAGENTA trial does not promise any health benefits for the children participating in the trial. The information sheet for the trial was recently modified to suggest they might benefit. However, earlier versions clearly stated that no benefit was anticipated.
  • There is inadequate disclosure of likely harms to children participating in the trial.
  • An estimate of a health benefit can be derived from the existing literature concerning the effectiveness of the graded exercise therapy intervention with adults. Obtaining funding for the MAGENTA trial depended on a misrepresentation of the strength of evidence that it works in adult populations. I am talking about the PACE trial.
  • Beyond any direct benefit to their children, parents might be motivated by the hope of contributing to science and the availability of effective treatments. However, these possible benefits depend on publication of results of a trial after undergoing peer review. The Principal Investigator for the MAGENTA trial, Dr. Esther Crawley, has a history of obtaining parents’ consent for participation of their children in the SMILE trial, but then not publishing the results in a timely fashion. Years later, we are still waiting.
  • Dr. Esther Crawley exposed children to unnecessary risk without likely benefit in her conduct of the SMILE trial. This clinical trial involved inflicting a quack treatment on children. Parents were not adequately informed of the nature of the treatment and the absence of evidence for any mechanism by which the intervention could conceivably be effective. This reflects on the due diligence that Dr. Crawley can be expected to exercise in the MAGENTA trial.
  • The consent form for the MAGENTA trial involves parents granting permission for the investigator to use children and parents’ comments concerning effects of the treatment for its promotion. Insufficient restrictions are placed on how the comments can be used. There is the clear precedent of comments made in the context of the SMILE trial being used to promote the quack Lightning Process treatment in the absence of evidence that treatment was actually effective in the trial. There is no guarantee that any comments collected from children and parents in the MAGENTA trial would not similarly be misused.
  • Dr. Esther Crawley participated in a smear campaign against parents having legitimate concerns about the SMILE trial. Parents making legitimate use of tools provided by the government such as Freedom of Information Act requests, appeals of decisions of ethical review boards and complaints to the General Medical Council were vilified and shamed.
  • Dr. Esther Crawley has provided direct, self-incriminating quotes in the newsletter of the Science Media Centre about how she was coached and directed by their staff to slam the patient community. She played a key role in a concerted and orchestrated attack on the credibility of not only parents of participants in the MAGENTA trial, but of all patients having chronic fatigue syndrome/myalgic encephalomyelitis, as well as their advocates and allies.

I am not a parent of a child eligible for recruitment to the MAGENTA trial. I am not even a citizen or resident of the UK. Nonetheless, I have considered the issues and lay out some of my considerations below. On this basis, I signed the global support version  of the UK petition to suspend all trials of graded exercise therapy in children and adults with ME/CFS. I encourage readers who are similarly in my situation outside the UK to join me in signing the global support petition.

If I were a parent of an eligible child or a resident of the UK, I would not enroll my child in MAGENTA. I would immediately withdraw my child if he or she were currently participating in the trial. I would request all the child’s data be given back or evidence that it had been destroyed.

I recommend my PLOS Mind the Brain post, What patients should require before consenting to participate in research…  as either a prelude or epilogue to the following blog post.

What you will find here is a discussion of matters that parents should consider before enrolling their children in the MAGENTA trial of graded exercise for chronic fatigue syndrome. The previous blog post [http://blogs.plos.org/mindthebrain/2015/12/09/what-patients-should-require-before-consenting-to-participate-in-research/ ]  is rich in links to an ongoing initiative from The BMJ to promote broader involvement of patients (and implicitly, parents of patients) in the design, implementation, and interpretation of clinical trials. The views put forth by The BMJ are quite progressive, even if there is a gap between their expression of views and their actual implementation. Overall, that blog post presents a good set of standards for patients (and parents) making informed decisions concerning enrollment in clinical trials.

Late-breaking update: See also

Simon McGrath: PACE trial shows why medicine needs patients to scrutinise studies about their health

Basic considerations.

Patients are under no obligation to participate in clinical trials. It should be recognized that any participation typically involves burden, and possibly risk, beyond what is involved in receiving medical care outside of a clinical trial.

It is a deprivation of their human rights and a violation of the Declaration of Helsinki to coerce patients to participate in medical research without freely given, fully informed consent.

Patients cannot be denied any medical treatment or attention to which they would otherwise be entitled if they decline to enroll in a clinical trial.

Issues are compounded when consent from parents is sought for participation of vulnerable children and adolescents for whom they have legal responsibility. Although assent to participate in clinical trials is sought from children and adolescents, it remains for their parents to consent to their participation.

Parents can at any time withdraw their consent for their children’s or adolescents’ participation in trials and have their data removed, without needing the approval of any authority or having to give a reason for doing so.

Declaration of Helsinki

The World Medical Association (WMA) has developed the Declaration of Helsinki as a statement of ethical principles for medical research involving human subjects, including research on identifiable human material and data.

It includes:

In medical research involving human subjects capable of giving informed consent, each potential subject must be adequately informed of the aims, methods, sources of funding, any possible conflicts of interest, institutional affiliations of the researcher, the anticipated benefits and potential risks of the study and the discomfort it may entail, post-study provisions and any other relevant aspects of the study. The potential subject must be informed of the right to refuse to participate in the study or to withdraw consent to participate at any time without reprisal. Special attention should be given to the specific information needs of individual potential subjects as well as to the methods used to deliver the information.

[October 3 8:23 AM Update]: I have now inserted Article 21 of the Declaration of Helsinki which really nails the ethical problems of the SMILE trial:

21. Medical research involving human subjects must conform to generally accepted scientific principles, be based on a thorough knowledge of the scientific literature, other relevant sources of information, and adequate laboratory and, as appropriate, animal experimentation. The welfare of animals used for research must be respected.

There is clearly inadequate scientific justification for testing the quack Lightning Process treatment.

What Is the MAGENTA Trial?

The published MAGENTA study protocol states

This study aims to investigate the acceptability and feasibility of carrying out a multicentre randomised controlled trial investigating the effectiveness of graded exercise therapy compared with activity management for children/teenagers who are mildly or moderately affected with CFS/ME.

Methods and analysis 100 paediatric patients (8–17 years) with CFS/ME will be recruited from 3 specialist UK National Health Service (NHS) CFS/ME services (Bath, Cambridge and Newcastle). Patients will be randomised (1:1) to receive either graded exercise therapy or activity management. Feasibility analysis will include the number of young people eligible, approached and consented to the trial; attrition rate and treatment adherence; questionnaire and accelerometer completion rates. Integrated qualitative methods will ascertain perceptions of feasibility and acceptability of recruitment, randomisation and the interventions. All adverse events will be monitored to assess the safety of the trial.

The first of two treatments being compared is:

Arm 1: activity management

This arm will be delivered by CFS/ME specialists. As activity management is currently being delivered in all three services, clinicians will not require further training; however, they will receive guidance on the mandatory, prohibited and flexible components (see online supplementary appendix 1). Clinicians therefore have flexibility in delivering the intervention within their National Health Service (NHS) setting. Activity management aims to convert a ‘boom–bust’ pattern of activity (lots 1 day and little the next) to a baseline with the same daily amount before increasing the daily amount by 10–20% each week. For children and adolescents with CFS/ME, these are mostly cognitive activities: school, schoolwork, reading, socialising and screen time (phone, laptop, TV, games). Those allocated to this arm will receive advice about the total amount of daily activity, including physical activity, but will not receive specific advice about their use of exercise, increasing exercise or timed physical exercise.

So, the first arm of the trial is a comparison condition consisting of standard care delivered without further training of providers. The treatment is flexibly delivered, expected to vary between settings, and thus largely uncontrolled. The treatment represents a methodologically weak condition that does not adequately control for attention and positive expectations. Control conditions should be equivalent to the intervention being evaluated in these dimensions.

The second arm of the study:

Arm 2: graded exercise therapy (GET)

This arm will be delivered by referral to a GET-trained CFS/ME specialist who will receive guidance on the mandatory, prohibited and flexible components (see online supplementary appendix 1). They will be encouraged to deliver GET as they would in their NHS setting.20 Those allocated to this arm will be offered advice that is focused on exercise with detailed assessment of current physical activity, advice about exercise and a programme including timed daily exercise. The intervention will encourage children and adolescents to find a baseline level of exercise which will be increased slowly (by 10–20% a week, as per NICE guidance5 and the Pacing, graded Activity and Cognitive behaviour therapy – a randomised Evaluation (PACE)12 ,21). This will be the median amount of daily exercise done during the week. Children and adolescents will also be taught to use a heart rate monitor to avoid overexertion. Participants will be advised to stay within the target heart rate zones of 50–70% of their maximum heart rate.5 ,7
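To make the arithmetic behind these targets concrete, here is a minimal sketch. The 20-minute starting baseline, the 15% weekly increase, the five-week horizon, and the age-predicted maximum heart rate of 220 minus age are my illustrative assumptions for the sketch, not values specified in the MAGENTA protocol:

```python
# Illustrative sketch of the GET targets described in the protocol quote above.
# The baseline (20 min/day), 15% weekly increase, and 220-minus-age maximum
# heart rate are assumptions for illustration only, not MAGENTA parameters.

def weekly_targets(baseline_minutes, weekly_increase, weeks):
    """Daily exercise target after each week of a fixed percentage increase."""
    targets = [baseline_minutes]
    for _ in range(weeks):
        targets.append(targets[-1] * (1 + weekly_increase))
    return targets

def target_hr_zone(age, low=0.50, high=0.70):
    """The protocol's 50-70% zone, applied to an assumed 220-minus-age maximum."""
    max_hr = 220 - age
    return (low * max_hr, high * max_hr)

# A 20-minute baseline increased by 15% per week roughly doubles in 5 weeks.
print([round(t, 1) for t in weekly_targets(20.0, 0.15, 5)])

# Heart-rate ceiling for a hypothetical 14-year-old participant.
print(tuple(round(x, 1) for x in target_hr_zone(14)))
```

Even at the lower 10% rate, this compounding schedule doubles the daily amount within about seven to eight weeks, which is part of what concerns patient critics of graded exercise.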

The outcome of the trial will be evaluated in terms of

Quantitative analysis

The percentage recruited of those eligible will be calculated …Retention will be estimated as the percentage of recruited children and adolescents reaching the primary 6-month follow-up point, who provide key outcome measures (the Chalder Fatigue Scale and the 36-Item Short-Form Physical Functioning Scale (SF-36 PFS)) at that assessment point.

Objective data will be collected in the form of physical activity measured by accelerometers. These are

Small, matchbox-sized devices that measure physical activity. They have been shown to provide reliable indicators of physical activity among children and adults.

However, actual evaluation of the outcome of the trial will focus on recruitment and retention and on subjective, self-report measures of fatigue and physical functioning. These subjective measures have been shown to be less valid than objective measures. Scores are vulnerable to participants knowing the condition to which they have been assigned (called ‘being unblinded’) and to their perception of which intervention the investigators prefer.

It is notable that in the PACE trial of CBT and GET for chronic fatigue syndrome in adults, the investigators manipulated participants’ self-reports with praise in newsletters sent out during the trial. The investigators also switched their scoring of the self-report measures and produced results that they later conceded had been exaggerated by the change in scoring [http://www.wolfson.qmul.ac.uk/current-projects/pace-trial#news].

Tom Kindlon, Irish ME/CFS Association Officer

See an excellent commentary by Tom Kindlon at PubMed Commons [What’s that? ]

The validity of using subjective outcome measures as primary outcomes is questionable in such a trial.

The bottom line is that the investigators have a poorly designed study with an inadequate control condition. They have chosen subjective self-reports, which are prone to invalidity and manipulation, over objective measures like actual changes in activity or practical real-world measures like school attendance. Not very good science here. But they are asking parents to sign their children up.

What is promised to parents consenting to have the children enrolled in the trial?

The published protocol to which the investigators supposedly committed themselves stated

What are the possible benefits and risks of participating?
Participants will not benefit directly from taking part in the study although it may prove enjoyable contributing to the research. There are no risks of participating in the study.

Version 7 of the information sheet provided to parents states

Your child may benefit from the treatment they receive, but we cannot guarantee this. Some children with CFS/ME like to know that they are helping other children in the future. Your child may also learn about research.

Survey assessments conducted by the patient community strongly contradict the suggestion that there is no risk of harm with GET.

Alem Matthees, the patient activist who obtained release of the PACE data and participated in its reanalysis, has commented:

“Given that post-exertional symptomatology is a hallmark of ME/CFS, it is premature to do trials of graded exercise on children when safety has not first been properly established in adults. The assertion that graded exercise is safe in adults is generally based on trials where harms are poorly reported or where the evidence of objectively measured increases in total activity levels is lacking. Adult patients commonly report that their health was substantially worsened after trying to increase their activity levels, sometimes severely and permanently, therefore this serious issue cannot be ignored when recruiting children for research.”

See also

Kindlon T. Reporting of harms associated with graded exercise therapy and cognitive behavioural therapy in myalgic encephalomyelitis/chronic fatigue syndrome. Bulletin of the IACFS/ME. 2011;19(2):59-111.

This thorough systematic review reports inadequate harm reporting in clinical trials, but:

Exercise-related physiological abnormalities have been documented in recent studies and high rates of adverse reactions to exercise have been recorded in a number of patient surveys. Fifty-one percent of survey respondents (range 28-82%, n=4338, 8 surveys) reported that GET worsened their health while 20% of respondents (range 7-38%, n=1808, 5 surveys) reported similar results for CBT.
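To see how a pooled figure like the 51% arises from several surveys of different sizes, here is a minimal sketch of a sample-size-weighted average. The four per-survey figures below are hypothetical illustrations chosen to fall within the reported 28-82% range; they are not Kindlon’s actual survey data:

```python
# Sample-size-weighted pooling of per-survey "worsened with GET" percentages.
# The individual survey figures are made up for illustration;
# only the pooling method is the point.

def pooled_rate(surveys):
    """surveys: list of (percent_worsened, n_respondents) pairs."""
    total_n = sum(n for _, n in surveys)
    worsened = sum(pct / 100 * n for pct, n in surveys)
    return 100 * worsened / total_n

hypothetical_get_surveys = [(28, 300), (55, 1200), (48, 800), (82, 150)]
print(round(pooled_rate(hypothetical_get_surveys), 1))
```

Note that the pooled figure is dominated by the largest surveys, which is why an overall percentage can sit well inside the wide range reported by the individual surveys.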

The unpublished results of Dr. Esther Crawley’s SMILE trial

 A Bristol University website indicates that recruitment of the SMILE trial was completed in 2013. The published protocol for the SMILE trial

[Note the ® in the title below, indicating a test of trademarked commercial product. The significance of that is worthy of a whole other blog post. ]

Crawley E, Mills N, Hollingworth W, Deans Z, Sterne JA, Donovan JL, Beasant L, Montgomery A. Comparing specialist medical care with specialist medical care plus the Lightning Process® for chronic fatigue syndrome or myalgic encephalomyelitis (CFS/ME): study protocol for a randomised controlled trial (SMILE Trial). Trials. 2013 Dec 26;14(1):1.

states

The data monitoring group will receive notice of serious adverse events (SAEs) for the sample as whole. If the incidence of SAEs of a similar type is greater than would be expected in this population, it will be possible for the data monitoring group to receive data according to trial arm to determine any evidence of excess in either arm.

Primary outcome data at six months will be examined once data are available from 50 patients, to ensure that neither arm is having a detrimental effect on the majority of patients. An independent statistician with no other involvement in the study will investigate whether more than 20 participants in the study sample as a whole have experienced a reduction of ≥ 30 points on the SF-36 at six months. In this case, the data will then be summarised separately by trial arm, and sent to the data monitoring group for review. This process will ensure that the trial team will not have access to the outcome data separated by treatment arm.

The trial was thus completed a number of years ago, but these valuable data have never been published.

The only publication from the trial so far uses selective quotes from child participants that cannot be independently evaluated. Readers are not told how representative these quotes are, the outcomes for the children being quoted, or the overall outcomes of the trial.

Parslow R, Patel A, Beasant L, Haywood K, Johnson D, Crawley E. What matters to children with CFS/ME? A conceptual model as the first stage in developing a PROM. Archives of Disease in Childhood. 2015 Dec 1;100(12):1141-7.

The “evaluation” of the quack Lightning Treatment in the SMILE trial and quotes from patients have also been used to promote Parker’s products as being used in NHS clinics.

How can I say the Lightning Process is quackery?

Dr. Crawley describes the Lightning Process in the Research Ethics Application Form for the SMILE study as combining the principles of neurolinguistic programming, osteopathy, and clinical hypnotherapy.

That is an amazing array of three different frameworks from different disciplines. You would be hard-pressed to find an example other than the Lightning Process that claims to integrate them. Yet a mechanism proposed to explain a therapeutic intervention cannot be a creative stir-fry of whatever happens to be on hand. For a treatment to be considered science-based, there has to be a solid basis of evidence that these presumably complex processes fit together as assumed and work as assumed. I challenge Dr. Crawley or anyone else to produce a shred of credible, peer-reviewed evidence for the basic mechanism of the Lightning Process.

The entry for Neuro-linguistic programming (NLP) in Wikipedia states

There is no scientific evidence supporting the claims made by NLP advocates and it has been discredited as a pseudoscience by experts.[1][12] Scientific reviews state that NLP is based on outdated metaphors of how the brain works that are inconsistent with current neurological theory and contain numerous factual errors.[13][14]

The respected Skeptic’s Dictionary offers a scathing critique of Phil Parker’s Lightning Process. The critique specifically cites concerns that Crawley’s SMILE trial switched outcomes to increase the likelihood of obtaining evidence of effectiveness.

 The Hampshire (UK) County Council Trading Standards Office filed a formal complaint against Phil Parker for claims made on the Lightning Process website concerning effects on CFS/ME:

The “CFS/ME” page of the website included the statements “Our survey found that 81.3 %* of clients report that they no longer have the issues they came with by day three of the LP course” and “The Lightning Process is working with the NHS on a feasibility study, please click here for further details, and for other research information click here”.

Seeming endorsements on Parker’s website. Two of them, Northern Ireland and NHS Suffolk, subsequently complained that use of their insignias was unauthorized, and they were quickly removed.

The “working with the NHS” refers to the collaboration with Dr. Esther Crawley.

The UK Advertising Standards Authority upheld this complaint, as well as complaints about Parker’s claims of effectiveness with other conditions, including multiple sclerosis, irritable bowel syndrome, and fibromyalgia.

 Another complaint in 2013 about claims on Phil Parker’s website was similarly upheld:

 The claims must not appear again in their current form. We welcomed the decision to remove the claims. We told Phil Parker Group not to make claims on websites within their control that were directly connected with the supply of their goods and services if those claims could not be supported with robust evidence. We also told them not to refer to conditions for which advice should be sought from suitably qualified health professionals.

 As we will see, these upheld charges of quackery occurred when parents of children participating in the SMILE trial were being vilified in the BMJ and elsewhere. Dr. Crawley was prominently featured in this vilification and was quoted in a celebration of its success by the Science Media Centre, which had orchestrated the vilification.


The Research Ethics Committee approval of the SMILE trial and the aftermath

I was not very aware of the CFS/ME literature, and certainly not of all its controversies, when the South West Research Ethics Committee (REC) reviewed the application for the SMILE trial and ultimately approved it on September 8, 2010.

Otherwise, I would have had strong opinions about it; I only started blogging a little afterwards. But I was very concerned about patients being exposed to alternative and unproven medical treatments in other contexts that were not evidence-based, and even more so to treatments for which promoters claimed implausible mechanisms of action. I would not have felt it appropriate to inflict the Lightning Process on unsuspecting children. It is insufficient justification to put them in a clinical trial simply because a particular treatment has not been evaluated.

Prince Charles once advocated organic coffee enemas to treat advanced cancer. His endorsement generated a lot of curiosity from cancer patients, but that would not justify a randomized trial of coffee enemas. By analogy, I don’t think Dr. Esther Crawley had sufficient justification to conduct her trial, especially without warnings that there was no scientific basis to expect the Lightning Process to work or any assurance that it would not hurt the children.

I am concerned about clinical trials that have little likelihood of producing evidence that a treatment is effective, but that seem designed to get those treatments into routine clinical care. It is now appreciated that some clinical trials have little scientific value but serve as “experimercials,” or means of placing products in clinical settings. Pharmaceutical companies notoriously do this.

As it turned out, the SMILE trial succeeded admirably as a promotion for the Lightning Process, earning Phil Parker unknown but substantial fees, both through its use in the SMILE trial and through successful marketing throughout the NHS afterwards.

In short, I would have been concerned about the judgment of Dr. Esther Crawley in organizing the SMILE trial. I would have been quite curious about conflicts of interest and whether patients were adequately informed of how Phil Parker was benefiting.

The ethics review of the SMILE trial gave short shrift to these important concerns.

When the patient community and its advocate, Dr. Charles Shepherd, became aware of the SMILE trial’s approval, there were protests leading to re-evaluations all the way up to the National Patient Safety Agency. Examining an Extract of Minutes from South West 2 REC meeting held on 2 December 2010, I see many objections to the approval being raised and I am unsatisfied by the way in which they were discounted.

Patient, parent, and advocate protests escalated. If some acted inappropriately, this did not undermine the legitimacy of others’ protests. By analogy, I feel strongly about police violence aimed against African-Americans and racist policies that disproportionately target African-Americans for police scrutiny and stopping. I’m upset when agitators and provocateurs become violent at protests, but that does not delegitimize my concerns about the way black people are treated in America.

Dr. Esther Crawley undoubtedly experienced considerable stress and unfair treatment, but I don’t understand why she was not responsive to patient concerns, nor why she failed to honor her responsibility to protect child patients from exposure to unproven and likely harmful treatments.

Dr. Crawley is extensively quoted in a British Medical Journal opinion piece authored by a freelance journalist,  Nigel Hawkes:

Hawkes N. Dangers of research into chronic fatigue syndrome. BMJ. 2011 Jun 22;342:d3780.

If I had been on the scene, Dr. Crawley might well have been describing me in terms of how I would react, including my exercising of appropriate, legally-provided means of protest and complaint:

Critics of the method opposed the trial, first, Dr Crawley says, by claiming it was a terrible treatment and then by calling for two ethical reviews. Dr Shepherd backed the ethical challenge, which included the claim that it was unethical to carry out the trial in children, made by the ME Association and the Young ME Sufferers Trust. After re-opening its ethical review and reconsidering the evidence in the light of the challenge, the regional ethical committee of the NHS reiterated its support for the trial.

There was arguably some smearing of Dr. Shepherd, even in some distancing of him from the action of others:

This point of view, if not the actions it inspires, is defended by Charles Shepherd, medical adviser to and trustee of the ME Association. “The anger and frustration patients have that funding has been almost totally focused on the psychiatric side is very justifiable,” he says. “But the way a very tiny element goes about protesting about it is not acceptable.

This article escalated with unfair comparisons to animal rights activists, with condemnation of appropriate use of channels of complaint – reporting physicians to the General Medical Council.

The personalised nature of the campaign has much in common with that of animal rights activists, who subjected many scientists to abuse and intimidation in the 1990s. The attitude at the time was that the less said about the threats the better. Giving them publicity would only encourage more. Scientists for the most part kept silent and journalists desisted from writing about the subject, partly because they feared anything they wrote would make the situation worse. Some journalists have also been discouraged from writing about CFS/ME, such is the unpleasant atmosphere it engenders.

While the campaigners have stopped short of the violent activities of the animal rights groups, they have another weapon in their armoury—reporting doctors to the GMC. Willie Hamilton, an academic general practitioner and professor of primary care diagnostics at Peninsula Medical School in Exeter, served on the panel assembled by the National Institute for Health and Clinical Excellence (NICE) to formulate treatment advice for CFS/ME.

Simon Wessely and the Principal Investigator of the PACE trial, Peter White, were given free rein to dramatize the predicament posed by the protest. Much later, in the 2016 Lower Tribunal hearing, PACE Co-Investigator Trudie Chalder would cast doubt on whether the harassment was as severe or violent as it had been portrayed. Before that, the financial conflicts of interest of Peter White that were denied in the article would be exposed.

In response to her testimony, the UK Information Officer stated:

Professor Chalder’s evidence when she accepts that unpleasant things have been said to and about PACE researchers only, but that no threats have been made either to researchers or participants.

But in 2012, a pamphlet celebrating the success of the Science Media Centre started by Wessely would be rich in indiscreet quotes from Esther Crawley. The article in the BMJ was revealed to be part of a much larger orchestrated campaign to smear, discredit, and silence patients, parents, advocates, and their allies.

Dr. Esther Crawley’s participation in a campaign organized by the Science Media Centre to discredit patients, parents, advocates and supporters.

The SMC would later organize a letter-writing campaign to Parliament in support of Peter White and his refusal to release the PACE data to Alem Matthees, who had made a request under the Freedom of Information Act. The letter-writing campaign was an effort to get scientific data excluded from the provisions of the Act. The effort failed and the data were subsequently released.

But here is how Esther Crawley described the assistance she received:

The SMC organised a meeting so we could discuss what to do to protect researchers. Those who had been subject to abuse met with press officers, representatives from the GMC and, importantly, police who had dealt with the  animal rights campaign. This transformed my view of  what had been going on. I had thought those attacking us were “activists”; the police explained they were “extremists”.

And

We were told that we needed to make better use of the law and consider using the press in our favour – as had researchers harried by animal rights extremists. “Let the public know what you are trying to do and what is happening to you,” we were told. “Let the public decide.”

And

I took part in quite a few interviews that day, and have done since. I was also inundated with letters, emails and phone calls from patients with CFS/ME all over the world asking me to continue and not “give up”. The malicious, they pointed out, are in a minority. The abuse has stopped completely. I never read the activists’ blogs, but friends who did told me that they claimed to be “confused” and “upset” – possibly because their role had been switched from victim to abuser. “We never thought we were doing any harm…”

The patient community and its allies are still burdened by the damage of this effort and are rebuilding their credibility only slowly. Only now are they beginning to get an audience as suffering human beings with significant, legitimate unmet needs. Only now are they escaping the stigmatization that occurred at that time, with Esther Crawley playing a key role.

Where does this leave us?

Parents are being asked to enroll their children in a clinical trial without clear benefit to the children but with the possibility of considerable risk from the graded exercise. They are being asked by Esther Crawley, a physician who previously inflicted a quack treatment on children with CFS/ME in the guise of a clinical trial, for which she has never published the resulting data. She has played an effective role in damaging the legitimacy and capacity of patients and parents to complain.

Given this history and these factors, why would a parent possibly want to enroll their children in the MAGENTA trial? Somebody please tell me.

Special thanks to all the patient citizen-scientists who contributed to this blog post. Any inaccuracies or excesses are entirely my own, but these persons gave me substantial help. Some are named in the blog, but others prefer anonymity.

 All opinions expressed are solely those of James C Coyne. The blog post in no way conveys any official position of Mind the Brain, PLOS blogs or the larger PLOS community. I appreciate the free expression of  personal opinion that I am allowed.


Relaxing vs Stimulating Acupressure for Fatigue Among Breast Cancer Patients: Lessons to be Learned

  • A chance to test your rules of thumb for quickly evaluating clinical trials of alternative or integrative medicine in prestigious journals.
  • A chance to increase your understanding of the importance of well-defined control groups and blinding in evaluating the risk of bias of clinical trials.
  • A chance to understand the difference between merely evidence-based treatments versus science-based treatments.
  • Lessons learned can be readily applied to many wasteful evaluations of psychotherapy with shared characteristics.

A press release from the University of Michigan about a study of acupressure for fatigue in cancer patients was churnaled (echoed) throughout the media. It was reproduced dozens of times, with little more than an editor’s title change from one report to the next.

Fortunately, the article that inspired all the fuss was freely available from the prestigious JAMA: Oncology. But when I gained access, I quickly saw that it was not worth my attention, based on what I already knew or, as I often say, my prior probabilities. Rules of thumb is a good enough term.

So the article became another occasion for us to practice our critical appraisal skills, including, importantly, being able to make reliable and valid judgments that some attention in the media is worth dismissing out of hand, even when tied to an article in a prestigious medical journal.

The press release is here: Acupressure reduced fatigue in breast cancer survivors: Relaxing acupressure improved sleep, quality of life.

A sampling of the coverage:

[Screenshots of sample media coverage]

As we’ve come to expect, the UK Daily Mail editor added its own bit of spin:

Here is the article:

Zick SM, Sen A, Wyatt GK, Murphy SL, Arnedt J, Harris RE. Investigation of 2 Types of Self-administered Acupressure for Persistent Cancer-Related Fatigue in Breast Cancer Survivors: A Randomized Clinical Trial. JAMA Oncol. Published online July 07, 2016. doi:10.1001/jamaoncol.2016.1867.

Here is the Trial registration:

All I needed to know was contained in a succinct summary at the Journal website:

[Key Points summary from the journal website]

This is a randomized clinical trial (RCT) in which two active treatments:

  • Lacked credible scientific mechanisms.
  • Were predictably shown to be better than routine care that lacked the same positive expectations and support.
  • Had a primary outcome assessed by subjective self-report, which amplified the illusory effectiveness of the treatments.

But wait!

The original research appeared in a prestigious peer-reviewed journal published by the American Medical Association, not a disreputable journal on Beall’s List of Predatory Publishers.

Maybe this means publication in a prestigious peer-reviewed journal is insufficient to erase our doubts about the validity of claims.

The original research was performed with a $2.65 million peer-reviewed grant from the National Cancer Institute.

Maybe NIH is wasting scarce money on useless research.

What is acupressure?

 According to the article

Acupressure, a method derived from traditional Chinese medicine (TCM), is a treatment in which pressure is applied with fingers, thumbs, or a device to acupoints on the body. Acupressure has shown promise for treating fatigue in patients with cancer,23 and in a study24 of 43 cancer survivors with persistent fatigue, our group found that acupressure decreased fatigue by approximately 45% to 70%. Furthermore, acupressure points termed relaxing (for their use in TCM to treat insomnia) were significantly better at improving fatigue than another distinct set of acupressure points termed stimulating (used in TCM to increase energy).24 Despite such promise, only 5 small studies24– 28 have examined the effect of acupressure for cancer fatigue.

You can learn more about acupressure here. It is a derivative of acupuncture that does not involve needles but uses the same acupuncture pressure points, or acupoints.

Don’t be fooled by references to traditional Chinese medicine (TCM) as a basis for claiming a scientific mechanism.

See Chairman Mao Invented Traditional Chinese Medicine.

Chairman Mao is quoted as saying “Even though I believe we should promote Chinese medicine, I personally do not believe in it. I don’t take Chinese medicine.”

 

Alan Levinovitz, author of the Slate article, further argues:

 

In truth, skepticism, empiricism, and logic are not uniquely Western, and we should feel free to apply them to Chinese medicine.

After all, that’s what Wang Qingren did during the Qing Dynasty when he wrote Correcting the Errors of Medical Literature. Wang’s work on the book began in 1797, when an epidemic broke out in his town and killed hundreds of children. The children were buried in shallow graves in a public cemetery, allowing stray dogs to dig them up and devour them, a custom thought to protect the next child in the family from premature death. On daily walks past the graveyard, Wang systematically studied the anatomy of the children’s corpses, discovering significant differences between what he saw and the content of Chinese classics.

And nearly 2,000 years ago, the philosopher Wang Chong mounted a devastating (and hilarious) critique of yin-yang five phases theory: “The horse is connected with wu (fire), the rat with zi (water). If water really conquers fire, [it would be much more convincing if] rats normally attacked horses and drove them away. Then the cock is connected with ya (metal) and the hare with mao (wood). If metal really conquers wood, why do cocks not devour hares?” (The translation of Wang Chong and the account of Wang Qingren come from Paul Unschuld’s Medicine in China: A History of Ideas.)

Trial design

A 10-week randomized, single-blind trial comparing self-administered relaxing acupressure with stimulating acupressure once daily for 6 weeks vs usual care with a 4-week follow-up was conducted. There were 5 research visits: at screening, baseline, 3 weeks, 6 weeks (end of treatment), and 10 weeks (end of washout phase). The Pittsburgh Sleep Quality Index (PSQI) and Long-Term Quality of Life Instrument (LTQL) were administered at baseline and weeks 6 and 10. The Brief Fatigue Inventory (BFI) score was collected at baseline and weeks 1 through 10.

Note that the trial was “single-blind.” It compared two forms of acupressure, relaxing versus stimulating. Only the patient was blinded to which of these two treatments was being provided, and patients clearly knew whether or not they had been randomized to usual care. The providers were not blinded; they were carefully supervised by the investigators, who gave them feedback on their performance.

The combination of providers not being blinded, patients knowing whether they were randomized to routine care, and subjective self-report outcomes together are the makings of a highly biased trial.

Interventions

Usual care was defined as any treatment women were receiving from health care professionals for fatigue. At baseline, women were taught to self-administer acupressure by a trained acupressure educator.29 The 13 acupressure educators were taught by one of the study’s principal investigators (R.E.H.), an acupuncturist with National Certification Commission for Acupuncture and Oriental Medicine training. This training included a 30-minute session in which educators were taught point location, stimulation techniques, and pressure intensity.

Relaxing acupressure points consisted of yin tang, anmian, heart 7, spleen 6, and liver 3. Four acupoints were performed bilaterally, with yin tang done centrally. Stimulating acupressure points consisted of du 20, conception vessel 6, large intestine 4, stomach 36, spleen 6, and kidney 3. Points were administered bilaterally except for du 20 and conception vessel 6, which were done centrally (eFigure in Supplement 2). Women were told to perform acupressure once per day and to stimulate each point in a circular motion for 3 minutes.

Note that the control/comparison condition was an ill-defined usual care in which it is not clear that patients received any attention and support for their fatigue. As I have discussed before, we need to ask just what was being controlled by this condition. There is no evidence presented that patients had similar positive expectations and felt similar support in this condition to what was provided in the two active treatment conditions. There is no evidence of equivalence of time with a provider devoted exclusively to the patients’ fatigue. Unlike patients assigned to usual care, patients assigned to one of the acupressure conditions received a ritual delivered with enthusiasm by a supervised educator.

Note the absurdity of the  naming of the acupressure points,  for which the authority of traditional Chinese medicine is invoked, not evidence. This absurdity is reinforced by a look at a diagram of acupressure points provided as a supplement to the article.

relaxation acupressure points
stimulation acupressure points

Among the many problems with “acupuncture pressure points” is that sham stimulation generally works as well as actual stimulation, especially when the sham is delivered with appropriate blinding of both providers and patients. Another is that targeting places of the body that are not defined as acupuncture pressure points can produce the same results. For more elaborate discussion see Can we finally just say that acupuncture is nothing more than an elaborate placebo?

Worth looking back at credible placebo versus weak control condition

In a recent blog post I discussed an unusual study in the New England Journal of Medicine that compared an established active treatment for asthma to two credible control conditions: an inert spray indistinguishable from the active treatment, and acupuncture. Additionally, the study involved a no-treatment control. For subjective self-report outcomes, the active treatment, the inert spray, and acupuncture were indistinguishable, but all were superior to the no-treatment control condition. However, for the objective outcome measure, the active treatment was more effective than all three comparison conditions. The message is that credible placebo control conditions are superior to control conditions lacking positive expectations, including no treatment and, I would argue, ill-defined usual care that lacks positive expectations. A further message is ‘beware of relying on subjective self-report measures to distinguish between active treatments and placebo control conditions’.

Results

At week 6, the change in BFI score from baseline was significantly greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.6 [1.5] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.1 [1.6] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P  = .29). At week 10, the change in BFI score from baseline was greater in relaxing acupressure and stimulating acupressure compared with usual care (mean [SD], −2.3 [1.4] for relaxing acupressure, −2.0 [1.5] for stimulating acupressure, and −1.0 [1.5] for usual care; P < .001 for both acupressure arms vs usual care), and there was no significant difference between acupressure arms (P > .99) (Figure 2). The mean percentage fatigue reductions at 6 weeks were 34%, 27%, and −1% in relaxing acupressure, stimulating acupressure, and usual care, respectively.
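As a rough, illustrative check on the size of these differences, standardized effect sizes can be computed from the reported means and SDs. This is my own back-of-the-envelope sketch, not an analysis from the article, and it assumes roughly equal group sizes (the report excerpt does not give the ns):

```python
import math

def cohens_d(mean_tx, mean_ctrl, sd_tx, sd_ctrl):
    """Standardized mean difference, pooling SDs under an equal-n assumption."""
    pooled_sd = math.sqrt((sd_tx ** 2 + sd_ctrl ** 2) / 2)
    return (mean_tx - mean_ctrl) / pooled_sd

# Week-6 changes in BFI score from baseline, as reported above.
# Negative d means a larger fatigue reduction than usual care.
d_relaxing = cohens_d(-2.6, -1.1, 1.5, 1.6)      # relaxing acupressure vs usual care
d_stimulating = cohens_d(-2.0, -1.1, 1.5, 1.6)   # stimulating acupressure vs usual care

print(round(d_relaxing, 2), round(d_stimulating, 2))  # → -0.97 -0.58
```

Moderate-to-large standardized differences like these are exactly what one expects when an unblinded ritual delivered with positive expectations is compared to ill-defined usual care on a subjective self-report outcome; they say nothing about a specific effect of acupressure.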

These are entirely expectable results. Nothing new was learned in this study.

The bottom line for this study is that there was absolutely nothing to be gained by comparing an inert placebo condition to another inert placebo condition and to an uninformative condition without clear evidence that the control condition offered control of nonspecific factors: positive expectations, support, and attention. This was a waste of patient time and effort, as well as government funds, and produced results that are potentially misleading to patients. Namely, the results are likely to be misinterpreted as showing that acupressure is an effective, evidence-based treatment for cancer-related fatigue.

How the authors explained their results

Why might both acupressure arms significantly improve fatigue? In our group’s previous work, we had seen that cancer fatigue may arise through multiple distinct mechanisms.15 Similarly, it is also known in the acupuncture literature that true and sham acupuncture can improve symptoms equally, but they appear to work via different mechanisms.40 Therefore, relaxing acupressure and stimulating acupressure could elicit improvements in symptoms through distinct mechanisms, including both specific and nonspecific effects. These results are also consistent with TCM theory for these 2 acupoint formulas, whereby the relaxing acupressure acupoints were selected to treat insomnia by providing more restorative sleep and improving fatigue and the stimulating acupressure acupoints were chosen to improve daytime activity levels by targeting alertness.

How could acupressure lead to improvements in fatigue? The etiology of persistent fatigue in cancer survivors is related to elevations in brain glutamate levels, as well as total creatine levels in the insula.15 Studies in acupuncture research have demonstrated that brain physiology,41 chemistry,42 and function43 can also be altered with acupoint stimulation. We posit that self-administered acupressure may have similar effects.

Among the fallacies of the authors’ explanation is the key assumption that they are dealing with a specific, active treatment effect rather than a nonspecific placebo intervention. Supposed differences between relaxing and stimulating acupressure arise in trials with a high risk of bias due to unblinded providers of treatment and inadequate control/comparison conditions. ‘There is no there there’ to be explained, to paraphrase a quote attributed to Gertrude Stein.

How much did this project cost?

According to the NIH Research Portfolio Online Reporting Tools website, this five-year project involved support by the federal government of $2,265,212 in direct and indirect costs. The NCI program officer for investigator-initiated grant R01CA151445 is Ann O’Mara, who serves in a similar role for a number of integrative medicine projects.

How can expenditure of this money be justified for determining whether so-called stimulating acupressure is better than relaxing acupressure for cancer-related fatigue?

 Consider what could otherwise have been done with these monies.

Evidence-based versus science-based medicine

Proponents of unproven “integrative cancer treatments” can claim on the basis of this study that acupressure is an evidence-based treatment. Future Cochrane Collaboration reviews may even cite this study as evidence for this conclusion.

I normally label myself as an evidence-based skeptic. I require evidence for claims of the efficacy of treatments and am skeptical of the quality of the evidence that is typically provided, especially when it comes from enthusiasts of particular treatments. However, in other contexts, I describe myself as a science-based medicine skeptic. The stricter criterion behind this term is that I not only require evidence of efficacy for treatments, I also require evidence for the plausibility of the claimed mechanism. Acupressure might be defined by some as an evidence-based treatment, but it is certainly not a science-based treatment.

For further discussion of this important distinction, see Why “Science”-Based Instead of “Evidence”-Based?

Broader relevance to psychotherapy research

The efficacy of psychotherapy is often overestimated because of overreliance on RCTs that involve inadequate comparison/control groups. Adequately powered studies of the comparative efficacy of psychotherapy that include active comparison/control groups are infrequent and uniformly provide lower estimates of just how efficacious psychotherapy is. Most psychotherapy research includes subjective patient self-report measures as the primary outcomes, although some RCTs provide independent, blinded interview measures. A dependence on subjective patient self-report measures amplifies the bias associated with inadequate comparison/control groups.

I have raised these issues with respect to mindfulness-based stress reduction (MBSR) for physical health problems and for prevention of relapse and recurrence in patients being tapered from antidepressants.

However, there is a broader relevance to trials of psychotherapy provided to medically ill patients with a comparison/control condition that is inadequate in terms of positive expectations and support, along with a reliance on subjective patient self-report outcomes. The relevance is particularly important to note for conditions in which objective measures are appropriate, but not obtained, or obtained but suppressed in reports of the trial in the literature.

Was independent peer review of the PACE trial articles possible?

I ponder this question guided by Le Chevalier C. Auguste Dupin, the first fictional detective, before anyone was called “detective.”

mccartney too many

Articles reporting the PACE trial have extraordinary numbers of authors, acknowledgments, and institutional affiliations. A considerable proportion of all persons and institutions involved in researching chronic fatigue and related conditions in the UK have a close connection to PACE.

This raises issues about

  • Obtaining independent peer review of these articles that is not tainted by reviewer conflict of interest.
  • Just what authorship on a PACE trial paper represents and whether granting of authorship conforms to international standards.
  • The security of potential critics contemplating speaking out about whatever bad science they find in the PACE trial articles, and of negative reviewers whose identities can be discovered. Critics within the UK risk isolation and blacklisting by a large group invested in what could be exaggerated estimates of the quality and outcomes of the PACE trial.
  • Whether grants associated with the multimillion-pound PACE study could have received the independent peer review that is so crucial to ensuring that proposals selected for funding are of the highest quality.

Issues about the large number of authors, acknowledgments, and institutional affiliations become all the more salient as critics [1, 2, 3] again find serious flaws in the conduct and the reporting of the Lancet Psychiatry 2015 long-term follow-up study. Numerous obvious Questionable Research Practices (QRPs) survived peer review. That implies at least ineptness in peer review, or even Questionable Publication Practices (QPPs).

The important question becomes: how is the publication of questionable science to be explained?

Maybe there were difficulties finding reviewers with relevant expertise who were not in some way involved in the PACE trial or affiliated with departments and institutions that would be construed as benefiting from a positive review outcome, i.e. a publication?

Or in the enormous smallness of the UK, is independent peer review achieved by persons putting those relationships and affiliations aside to produce an impeccably detached and rigorous review process?

The untrustworthiness of both the biomedical and psychological literatures is well established. Nonpharmacological interventions have fewer safeguards than drug trials in terms of adherence to preregistration, reporting standards like CONSORT, and enforcement of data sharing.

Open-minded skeptics should be assured of independent peer review of nonpharmacological clinical trials, particularly when there is evidence that persons and groups with considerable financial interests attempt to control what gets published and what is said about their favored interventions. Reviewers with potential conflicts of interest should be excluded from evaluation of manuscripts.

Independent peer review of the PACE trial by those with relevant expertise might not be possible in the UK, where much of the conceivable expertise is in some way directly or indirectly attached to the PACE trial.

A Dutch observer’s astute observations about the PACE articles

My guest blogger, Dutch research biologist Klaas van Dijk, called attention to the exceptionally large number of authors and institutions listed for a pair of PACE trial papers.

Klaas noted

The Pubmed entry for the 2011 Lancet paper lists 19 authors:

B J Angus, H L Baber, J Bavinton, M Burgess, T Chalder, L V Clark, D L Cox, J C DeCesare, K A Goldsmith, A L Johnson, P McCrone, G Murphy, M Murphy, H O’Dowd, PACE trial management group*, L Potts, M Sharpe, R Walwyn, D Wilks and P D White (re-arranged in an alphabetic order).

The actual article from the Lancet website ( http://www.thelancet.com/pdfs/journals/lancet/PIIS0140-6736(11)60096-2.pdf and also http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60096-2/fulltext ) lists 19 authors who are acting ‘on behalf of the PACE trial management group†’. But the end of the paper (page 835) states: “PACE trial group.” This term is not identical to “PACE trial management group”.
In total, another 19 names are listed under “PACE trial group” (page 835): Hiroko Akagi, Mansel Aylward, Barbara Bowman, Jenny Butler, Chris Clark, Janet Darbyshire, Paul Dieppe, Patrick Doherty, Charlotte Feinmann, Deborah Fleetwood, Astrid Fletcher, Stella Law, M Llewelyn, Alastair Miller, Tom Sensky, Peter Spencer, Gavin Spickett, Stephen Stansfeld and Alison Wearden (re-arranged in an alphabetic order).

There is no overlap with the first 19 people who are listed as author of the paper.

So how many people can claim to be an author of this paper? Are all these 19 people of the “PACE trial management group” (not identical to “PACE trial group”???) also some sort of co-author of this paper? Do all these 19 people of the second group also agree with the complete contents of the paper? Do all 38 people agree with the full contents of the paper?

The paper lists many affiliations:
* Queen Mary University of London, UK
* King’s College London, UK
* University of Cambridge, UK
* University of Cumbria, UK
* University of Oxford, UK
* University of Edinburgh, UK
* Medical Research Council Clinical Trials Unit, London, UK
* South London and Maudsley NHS Foundation Trust, London, UK
* The John Radcliffe Hospital, Oxford, UK
* Royal Free Hospital NHS Trust, London, UK
* Barts and the London NHS Trust, London, UK
* Frenchay Hospital NHS Trust, Bristol, UK
* Western General Hospital, Edinburgh, UK

Do all these affiliations also agree with the full contents of the paper? Am I right to assume that all 38 people (names see above) and all affiliations / institutes (see above) plainly refuse to give critics / other scientists / patients / patient groups (etc.) access to the raw research data of this paper? And am I right to assume that it is therefore impossible for all others (including allies of patients / other scientists / interested students, etc.) to conduct re-calculations, check all statements against the raw data, etc.?

Decisions whether to accept manuscripts for publication are made in dark places, based on opinions offered by people whose identities may be known only to editors. Actually, though, in a small country like the UK, peer review may be a lot less anonymous than intended, and possibly a lot less independent and free of conflicts of interest. Without a lot more transparency than is currently available concerning the peer review the published papers underwent, we are left to our speculation.

Prepublication peer review is just one aspect of the process by which research findings are vetted, shaped, and made available to the larger scientific community, a process now recognized as tainted with untrustworthiness.

Rules for granting authorship

Concerns about gift and unwarranted authorship have increased not only because of growing awareness of unregulated and unfair practices, but because of the importance attached to citations and authorship for professional advancement. Journals are increasingly requiring documentation that all authors have made an appropriate contribution to a manuscript and have approved the final version.

Yet operating rules for granting authorship in many institutional settings vary greatly from the stringent requirements of journals. Contrary to the signed statements that corresponding authors have to make in submitting a manuscript to a journal, many clinicians expect an authorship in return for access to patients. Many competitive institutions award and withhold authorship based on politics and good or bad behavior that have nothing to do with requirements of journals.

Basically, despite the existence of numerous ethical guidelines and explicit policies, authors and institutions can largely do what they want when it comes to granting and withholding authorship.

Persons are quickly disappointed when they are naïve enough to complain about unwarranted authorships, about being forced to include authors who made no appropriate contribution, or about being denied authorship for an important contribution. They quickly discover that whistleblowers are generally considered more of a threat to institutions, and punished more severely, than alleged wrongdoers, no matter how strong the evidence may be.

The Lancet website notes

The Lancet is a signatory journal to the Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals, issued by the International Committee of Medical Journal Editors (ICMJE Recommendations), and to the Committee on Publication Ethics (COPE) code of conduct for editors. We follow COPE’s guidelines.

The ICMJE recommends that an author should meet all four of the following criteria:

  • Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work;
  • Drafting the work or revising it critically for important intellectual content;
  • Final approval of the version to be published;
  • Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.”

The intent of these widely endorsed recommendations is that persons associated with a large project have to do a lot to claim their places as authors.

Why the fuss about acknowledgments?

I’ve heard from a number of graduate students and junior investigators that they have had their first manuscripts held up in the submission process because they did not obtain written permission for acknowledgments. Why is that considered so important?

Mention in an acknowledgment is an honor. But it implies involvement in a project and approval of a resulting manuscript. In the past, there were numerous instances where people were named in acknowledgments without having given permission. There was a suspicion, sometimes confirmed, that they had been acknowledged only to improve the prospects of a manuscript getting published. There are other instances where persons were included in acknowledgments without permission, with the intent that they be excluded from the review process because of the appearance of a conflict of interest.

The expectation is that anyone contributing enough to a manuscript to be acknowledged has a potential conflict of interest in deciding whether it is suitable for publication.

But, as in other aspects of a mysterious and largely anonymous review process, whether people who were acknowledged in manuscripts were barred from participating in review of a manuscript cannot be established by readers.

What is the responsibility of reviewers to declare conflict of interest?

Reviewers are expected to declare conflicts of interest before accepting a manuscript to review. But often they are presented with a tick box without a clear explanation of the criteria for the appearance of a conflict of interest. Reviewers can usually continue considering a manuscript after acknowledging that they have an association with the authors or their institutional affiliations, while stating that they do not consider it a conflict. Such statements are generally accepted.

Authors excluding from the review process persons they consider to have a negative bias

In submitting a manuscript, authors are offered an opportunity to identify persons who should be excluded because of the appearance of a negative bias. Editors generally take these requests quite seriously. As an editor, I sometimes receive a large number of requested exclusions by authors who worry about opinions of particular people.

While we don’t know what went on in prepublication peer review, the PACE investigators have repeatedly and aggressively attempted to manipulate post publication portrayals of their trial in the media. Can we rule out that they similarly try to control potential critics in the prepublication peer review of their papers?

The 2015 Lancet Psychiatry secondary mediation analysis article

Chalder, T., Goldsmith, K. A., Walker, J., White, P. D., Sharpe, M., & Pickles, A. R. Rehabilitative therapies for chronic fatigue syndrome: a secondary mediation analysis of the PACE trial. The Lancet Psychiatry, 2: 141–52.

The acknowledgments include

We acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, excluding ARP, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, LV Clark, DL Cox, JC DeCesare, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks. This report is independent research partly arising from a doctoral research fellowship supported by the NIHR.

Fifteen of the authors of the 2011 Lancet PACE paper are no longer present, and another author has been added. The PACE Trial Management Group is again acknowledged, but there is no mention of the separate PACE trial group. We can’t tell why there has been a major reduction in the number of authors and acknowledgments or how it came about, or whether people who had been dropped participated in review of this paper. But what is obvious is that this is an exceedingly flawed mediation analysis crafted to a foregone conclusion. I’ll say more about that in future blogs, but we can only speculate how these bad publication practices made it through peer review.

This article is a crime against the practice of secondary mediation analysis. If I were a prospective author present in a discussion of it, I would flee before it became a crime scene.

I am told I have over 350 publications, but I consider it vulgar for authors to keep track of exact numbers. There are many potential publications not included in this number because I declined authorship, unable to agree with the spin that others were trying to put on the reporting of the findings. In such instances, I exclude myself from review of the resulting manuscript because of the appearance of a conflict of interest. We can ponder how many of the large pool of past PACE authors refused authorship on this paper when it was offered and likewise declined to participate in subsequent peer review because of the appearance of a conflict of interest.

The 2015 Lancet Psychiatry long-term follow-up article

Sharpe, M., Goldsmith, K. A., Chalder, T., Johnson, A.L., Walker, J., & White, P. D. (2015). Rehabilitative treatments for chronic fatigue syndrome: long-term follow-up from the PACE trial. The Lancet Psychiatry, http://dx.doi.org/10.1016/S2215-0366(15)00317-X

The acknowledgments include

We gratefully acknowledge the help of the PACE Trial Management Group, which consisted of the authors of this paper, plus (in alphabetical order): B Angus, H Baber, J Bavinton, M Burgess, L V Clark, D L Cox, J C DeCesare, E Feldman, P McCrone, G Murphy, M Murphy, H O’Dowd, T Peto, L Potts, R Walwyn, and D Wilks, and the King’s Clinical Trials Unit. We thank Hannah Baber for facilitating the long-term follow-up data collection.

Again, there are authors and acknowledgments missing from the earlier paper, and we’re in the dark about how and why that happened and whether the missing persons were considered free enough of conflicts of interest to evaluate this article when it was in manuscript form. But as documented in a blog post at Mind the Brain, there were serious, obvious flaws in the conduct and reporting of the follow-up study. It is a crime against best practices for the proper conduct and reporting of clinical trials. And again, we can only speculate how it got through peer review.

… And grant reviews?

Where can UK granting agencies obtain independent peer review of past and future grants associated with the PACE trial? To take just one example, the 2015 Lancet Psychiatry secondary mediation analysis was funded in part by an NIHR doctoral research fellowship grant. The resulting paper has many fewer authors than the 2011 Lancet paper. Did everyone who was an author or mentioned in the acknowledgments of that paper exclude themselves from review of the grant? Who, then, would be left?

In Germany and the Netherlands, concerns about avoiding the appearance of conflict of interest in obtaining independent peer review of grants have led to heavy reliance on expertise from outside the country. This does not imply any improprieties by experts within these countries, but rather the necessity of maintaining a strong appearance that vested interests have not unduly influenced grant review. Perhaps the situation apparent with the PACE trial suggests that journals and grant review panels within the UK might consider similar steps.

Contemplating the evidence against independent peer review

  • We have a mob of people as authors and mentions in acknowledgments. We have a huge conglomerate of institutions acknowledged.
  • We have some papers with blatant questionable research and reporting practices published in prestigious journals after ostensible peer review.
  • We are left in the dark about what exactly happened in peer review, but that the articles were adequately peer reviewed is a crucial part of their credibility.

What are we to conclude?

I think of what Edgar Allan Poe’s wise character, Le Chevalier C. Auguste Dupin, would say. For those of you who don’t know who he is:

Le Chevalier C. Auguste Dupin  is a fictional detective created by Edgar Allan Poe. Dupin made his first appearance in Poe’s “The Murders in the Rue Morgue” (1841), widely considered the first detective fiction story.[1] He reappears in “The Mystery of Marie Rogêt” (1842) and “The Purloined Letter” (1844)…

Poe created the Dupin character before the word detective had been coined. The character laid the groundwork for fictitious detectives to come, including Sherlock Holmes, and established most of the common elements of the detective fiction genre.

I think if we asked Dupin, he would say the danger is that the question is too fascinating to give up, but impossible to resolve without evidence we cannot access. We can blog, we can discuss this important question, but in the end we cannot answer it with certainty.

Sigh.

Uninterpretable: Fatal flaws in PACE Chronic Fatigue Syndrome follow-up study

Earlier decisions by the investigator group preclude valid long-term follow-up evaluation of CBT for chronic fatigue syndrome (CFS).

At the outset, let me say that I’m skeptical whether we can hold the PACE investigators responsible for the outrageous headlines that have been slapped on their follow-up study and on the comments they have made in interviews.

The Telegraph screamed

Chronic Fatigue Syndrome sufferers ‘can overcome symptoms of ME with positive thinking and exercise’

Oxford University has found ME is not actually a chronic illness

My own experience critiquing media interpretation of scientific studies suggests that neither researchers nor even journalists necessarily control shockingly inaccurate headlines placed on otherwise unexceptional media coverage. On the other hand, much distorted and exaggerated media coverage starts with statements made by researchers and by press releases from their institutions.

The one specific quote attributed to a PACE investigator is unfortunate because of its potential to be misinterpreted by professionals, persons who suffer from chronic fatigue syndrome, and the people around them affected by their functioning.

“It’s wrong to say people don’t want to get better, but they get locked into a pattern and their life constricts around what they can do. If you live within your limits that becomes a self-fulfilling prophesy.”

It suggests that willfulness causes CFS sufferers’ impaired functioning. This is as ridiculous as application of the discredited concept of “fighting spirit” to cancer patients’ failure to triumph over their life-altering and life-threatening condition. Let’s practice the principle of charity and assume this is not the intention of the PACE investigator, particularly when there is so much more for which we should give them responsibility.

Go here for a fuller evaluation, which I endorse, of the Telegraph coverage of the PACE follow-up study.

Having read the PACE follow-up study carefully, my assessment is that the data presented are uninterpretable. We can temporarily suspend critical thinking and some basic rules for conducting randomized controlled trials (RCTs), follow-up studies, and analyses of the subsequent data. Even if we do, we should still reject some of the interpretations offered by the PACE investigators as unfairly spun to fit what is already a distorted positive interpretation of the results.

It is important to note that the PACE follow-up study can only be as good as the original data it’s based on. And in the case of the PACE study itself, a recent longread critique by UC Berkeley journalism and public health lecturer David Tuller has arguably exposed such indefensible flaws that any follow-up is essentially meaningless. See it for yourself [1, 2, 3 ].

This week’s report of the PACE long term follow-up study and a commentary  are available free at the Lancet Psychiatry website after a free registration. I encourage everyone to download a copy before reading further. Unfortunately, some crucial details of the article are highly technical and some details crucial to interpreting the results are not presented.

I will provide practical interpretations of the most crucial technical details so that they are more understandable to the nonspecialist. Let me know where I fail.

To encourage proceeding with this longread, but to satisfy those who are unwilling or unable to proceed, I’ll reveal my main points:

  • The PACE investigators sacrificed any possibility of meaningful long-term follow-up by breaking protocol and issuing patient testimonials about CBT before accrual was even completed.
  • This already fatal flaw was compounded by a loose recommendation for treatment after the intervention phase of the trial ended. The investigators provide poor documentation of which treatment was taken up by which patients and whether there was crossover in the treatment received during follow-up.
  • Investigators’ attempts to correct methodological issues with statistical strategies lapse into voodoo statistics.
  • The primary outcome self-report variables are susceptible to manipulation, investigator preferences for particular treatments, peer pressure, and confounding with mental health variables.
  • The PACE investigators exploited ambiguities in the design and execution of their trial with self-congratulatory, confirmatory bias.

The Lancet Psychiatry summary/abstract of the article

Background. The PACE trial found that, when added to specialist medical care (SMC), cognitive behavioural therapy (CBT), or graded exercise therapy (GET) were superior to adaptive pacing therapy (APT) or SMC alone in improving fatigue and physical functioning in people with chronic fatigue syndrome 1 year after randomisation. In this pre-specified follow-up study, we aimed to assess additional treatments received after the trial and investigate long-term outcomes (at least 2 years after randomisation) within and between original treatment groups in those originally included in the PACE trial.

Findings Between May 8, 2008, and April 26, 2011, 481 (75%) participants from the PACE trial returned questionnaires. Median time from randomisation to return of long-term follow-up assessment was 31 months (IQR 30–32; range 24–53). 210 (44%) participants received additional treatment (mostly CBT or GET) after the trial; with participants originally assigned to SMC alone (73 [63%] of 115) or APT (60 [50%] of 119) more likely to seek treatment than those originally assigned to GET (41 [32%] of 127) or CBT (36 [31%] of 118; p<0·0001). Improvements in fatigue and physical functioning reported by participants originally assigned to CBT and GET were maintained (within-group comparison of fatigue and physical functioning, respectively, at long-term follow-up as compared with 1 year: CBT –2·2 [95% CI –3·7 to –0·6], 3·3 [0·02 to 6·7]; GET –1·3 [–2·7 to 0·1], 0·5 [–2·7 to 3·6]). Participants allocated to APT and to SMC alone in the trial improved over the follow-up period compared with 1 year (fatigue and physical functioning, respectively: APT –3·0 [–4·4 to –1·6], 8·5 [4·5 to 12·5]; SMC –3·9 [–5·3 to –2·6], 7·1 [4·0 to 10·3]). There was little evidence of differences in outcomes between the randomised treatment groups at long-term follow-up.

Interpretation The beneficial effects of CBT and GET seen at 1 year were maintained at long-term follow-up a median of 2·5 years after randomisation. Outcomes with SMC alone or APT improved from the 1 year outcome and were similar to CBT and GET at long-term follow-up, but these data should be interpreted in the context of additional therapies having being given according to physician choice and patient preference after the 1 year trial final assessment. Future research should identify predictors of response to CBT and GET and also develop better treatments for those who respond to neither.

fem imageNote the contradiction here which will persist throughout the paper, the official Oxford University press release, quotes from the PACE investigators to the media, and media coverage. On the one hand we are told:

Improvements in fatigue and physical functioning reported by participants originally assigned to CBT and GET were maintained…

Yet we are also told:

There was little evidence of differences in outcomes between the randomised treatment groups at long-term follow-up.

Which statement is to be given precedence? To the extent that the features of a randomized trial have been preserved in the follow-up (which, as we will see, is not actually the case), a lack of between-group differences at follow-up should be given precedence over any persistence of change within groups from baseline. That is not a controversial point for interpreting clinical trials.

A statement about group differences at follow-up should precede and qualify any statement about within-group change over follow-up. Otherwise, why bother with an RCT in the first place?

The statement in the Interpretation section of the summary/abstract has an unsubstantiated spin in favor of the investigators’ preferred intervention.

Outcomes with SMC alone or APT improved from the 1 year outcome and were similar to CBT and GET at long-term follow-up, but these data should be interpreted in the context of additional therapies having being given according to physician choice and patient preference after the 1 year trial final assessment.

If we’re going to be cautious and qualified in our statements, there are lots of other explanations for similar outcomes in the intervention and control groups that are more plausible. Simply put and without unsubstantiated assumptions, any group differences observed earlier have dissipated. Poof! Any advantages of CBT and GET are not sustained.

How the PACE investigators destroyed the possibility of an interpretable follow-up study

Neither the Lancet Psychiatry article nor any recent statements by the PACE investigators acknowledge how these investigators destroyed any possibility of analyses of meaningful follow-up data.

Before the intervention phase of the trial was even completed, indeed before accrual of patients was complete, the investigators published a newsletter in December 2008 directed at trial participants. One article appropriately reminds participants of the upcoming two-and-one-half-year follow-up. It then acknowledges difficulty accruing patients, noting that additional funding has been received from the MRC to extend recruitment. And then glowing testimonials about the effects of the intervention appear on p. 3 of the newsletter.

“Being included in this trial has helped me tremendously. (The treatment) is now a way of life for me, I can’t imagine functioning fully without it. I have nothing but praise and thanks for everyone involved in this trial.”

“I really enjoyed being a part of the PACE Trial. It helped me to learn more about myself, especially (treatment), and control factors in my life that were damaging. It is difficult for me to gauge just how effective the treatment was because 2007 was a particularly strained, strange and difficult year for me but I feel I survived and that the trial armed me with the necessary aids to get me through. It was also hugely beneficial being part of something where people understand the symptoms and illness and I really enjoyed this aspect.”

These testimonials are a horrible breach of protocol. Taken together with the acknowledgment of the difficulty accruing patients, the testimonials solicit expressions of gratitude and apply pressure on participants to endorse the trial by providing a positive account of their outcome. Some minimal effort is made to disguise the conditions from which the testimonials come. However, references to a therapist and, in the final quote above, to “control factors in my life that were damaging” leave no doubt that the CBT and GET favored by the investigators are having positive results.

Probably more than in most chronic illnesses, CFS sufferers turn to each other for support in the face of bewildering and often stigmatizing responses from the medical community. These testimonials represent a form of peer pressure for positive evaluations of the trial.

Any investigator group that would deliberately violate protocol in this manner deserves further scrutiny for other violations and threats to the validity of their results. I challenge defenders of the PACE study to cite other precedents for this kind of manipulation of clinical trials participants. What would they have thought if a drug company had done this for the evaluation of their medication?

The breakdown of randomization as further destruction of the interpretability of follow-up results

Returning to the Lancet Psychiatry article itself, note the following:

After completing their final trial outcome assessment, trial participants were offered an additional PACE therapy if they were still unwell, they wanted more treatment, and their PACE trial doctor agreed this was appropriate. The choice of treatment offered (APT, CBT, or GET) was made by the patient’s doctor, taking into account both the patient’s preference and their own opinion of which would be most beneficial. These choices were made with knowledge of the individual patient’s treatment allocation and outcome, but before the overall trial findings were known. Interventions were based on the trial manuals, but could be adapted to the patient’s needs.

Readers who are methodologically inclined might be interested in a paper in which I discuss incorporating patient preference in randomized trials, as well as another paper describing clinical trial conducted with German colleagues  in which we incorporated patient preference in evaluation of antidepressants and psychotherapy for depression in primary care. Patient preference can certainly be accommodated in a clinical trial in ways that preserve the benefits of randomization, but not as the PACE investigators have done.

Following completion of the treatment to which particular patients were randomly assigned, the PACE trial offered a complex negotiation between patient and trial physician about further treatment. This represents a thorough breakdown of the benefits of a controlled randomized trial for the evaluation of treatments. Any focus on the long-term effects of initial randomization is sacrificed by what could be substantial departures from that randomization. Any attempts at statistical corrections will fail.

Of course, investigators cannot ethically prevent research participants from seeking additional treatment. But in the case of PACE, the investigators encouraged departures from the randomized treatment yet did not adequately take into account the decisions that were made. An alternative would have been to continue with the randomized treatment, taking into account and quantifying any cross over into another treatment arm.

Voodoo statistics in dealing with incomplete follow-up data

Between May 8, 2008, and April 26, 2011, 481 (75%) participants from the PACE trial returned questionnaires.

This is a very good rate of retention of participants for follow-up. The serious problem is that none of the following is random:

  • loss to follow-up,
  • whether there was further treatment, and
  • whether there was crossover between the treatment received during follow-up and the treatment received in the actual trial.

Furthermore, any follow-up data is biased by the exhortation of the newsletter.

No statistical controls can restore the quality of the follow-up data to what would’ve been obtained with preservation of the initial randomization. Nothing can correct for the exhortation.

Nonetheless, the investigators tried to correct for loss of participants to follow-up and subsequent treatment. They described their effort in a technically complex passage, which I will subsequently interpret:

We assessed the differences in the measured outcomes between the original randomised treatment groups with linear mixed-effects regression models with the 12, 24, and 52 week, and long-term follow-up measures of outcomes as dependent variables and random intercepts and slopes over time to account for repeated measures.

We included the following covariates in the models: treatment group, trial stratification variables (trial centre and whether participants met the international chronic fatigue syndrome criteria,3 London myalgic encephalomyelitis criteria,4 and DSM IV depressive disorder criteria),18,19 time from original trial randomisation, time by treatment group interaction term, long-term follow-up data by treatment group interaction term, baseline values of the outcome, and missing data predictors (sex, education level, body-mass index, and patient self-help organisation membership), so the differences between groups obtained were adjusted for these variables.

Nearly half (44%; 210 of 479) of all the follow-up study participants reported receiving additional trial treatments after their final 1 year outcome assessment (table 2; appendix p 2). The number of participants who received additional therapy differed between the original treatment groups, with more participants who were originally assigned to SMC alone (73 [63%] of 115) or to APT (60 [50%] of 119) receiving additional therapy than those assigned to GET (41 [32%] of 127) or CBT (36 [31%] of 118; p<0·0001).

In the trial analysis plan we defined an adequate number of therapy sessions as ten of a maximum possible of 15. Although many participants in the follow-up study had received additional treatment, few reported receiving this amount (table 2). Most of the additional treatment that was delivered to this level was either CBT or GET.

The “linear mixed-effects regression models” are rather standard techniques for compensating for missing data by using all of the available data to estimate what is missing. The problem is that this approach assumes the data are missing at random, an untested assumption that is unlikely to be true in this study.
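
To make the missing-at-random point concrete, here is a minimal simulation of my own (an illustrative sketch with made-up numbers, not the PACE data or the PACE analysis). Two arms have identical true follow-up scores, but in one arm the participants with worse outcomes are likelier to skip the questionnaire. The complete-case comparison then manufactures a group difference out of nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# True follow-up fatigue scores: identical distributions in both arms,
# so the true between-group difference is exactly zero.
arm_a = rng.normal(20, 5, n)
arm_b = rng.normal(20, 5, n)

# Arm A: purely random 25% loss to follow-up (missing at random).
observed_a = arm_a[rng.random(n) < 0.75]

# Arm B: participants with worse (higher) fatigue are likelier to skip
# the questionnaire -- missing NOT at random.
keep_prob_b = np.where(arm_b > 20, 0.55, 0.95)
observed_b = arm_b[rng.random(n) < keep_prob_b]

diff = observed_b.mean() - observed_a.mean()
print(f"true difference: 0.0, complete-case estimate: {diff:.2f}")
```

Models that “use all the available data,” such as mixed-effects regressions, repair this only under the missing-at-random assumption; when dropout depends on the unobserved outcome itself, as sketched here, the bias survives the modeling.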

The inclusion of “covariates” is an effort to control for possible threats to the validity of the overall analyses by taking into account what is known about participants. There are numerous problems here. We can’t be assured that the results are any more robust and reliable than what would be obtained without these efforts at statistical control. The best publishing practice is to make the unadjusted outcome variables available and let readers decide. Greatest confidence in results is obtained when there is no difference between the results of the adjusted and unadjusted analyses.

Methodologically inclined readers should consult an excellent recent article by clinical trial expert Helena Kraemer, “A Source of False Findings in Published Research Studies: Adjusting for Covariates.”

The effectiveness of statistical controls depends on certain assumptions being met about patterns of variation within the control variables. There is no indication that any diagnostic analyses were done to determine whether possible candidate control variables should be eliminated in order to avoid a violation of assumptions about the multivariate distribution of covariates. With so many control variables, spurious results are likely. Apparent results could change radically with the arbitrary addition or subtraction of control variables. See here for a further explanation of this problem.

We don’t even know how this set of covariate/control variables, rather than some other set, was established. Notoriously, investigators often try out various combinations of control variables and present only those that make their trial look best. Readers are protected from this questionable research practice only by pre-specification of analyses before investigators know their results. And in an unblinded trial, researchers often know the result trends long before they see the actual numbers.
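
A small simulation can show how much this analytic freedom is worth. In the sketch below (my own toy setup, not the PACE covariates or data) every simulated trial is truly null: the “treatment” has no effect on the outcome, which is driven only by four baseline covariates. An analyst who pre-specifies one model sees roughly the nominal 5% false-positive rate; an analyst free to pick whichever of the 16 possible covariate subsets yields significance sees a clearly inflated rate:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(42)

def treatment_tstat(y, treat, covs):
    """OLS t-statistic for the treatment term, adjusting for covs."""
    X = np.column_stack([np.ones_like(y), treat] + covs)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov_beta[1, 1])

n, n_trials, hits_single, hits_fished = 100, 1000, 0, 0
for _ in range(n_trials):
    treat = np.repeat([0.0, 1.0], n // 2)
    covs = [rng.normal(size=n) for _ in range(4)]
    # Null outcome: driven by the covariates, never by treatment.
    y = sum(0.5 * c for c in covs) + rng.normal(size=n)
    # Every subset of the 4 covariates = 16 candidate "adjusted" models.
    tstats = [
        treatment_tstat(y, treat, [covs[i] for i in subset])
        for k in range(5)
        for subset in combinations(range(4), k)
    ]
    hits_single += abs(tstats[0]) > 1.96               # pre-specified model
    hits_fished += max(abs(t) for t in tstats) > 1.96  # pick the "best" model

rate_single = hits_single / n_trials
rate_fished = hits_fished / n_trials
print(f"pre-specified: {rate_single:.3f}, best of 16 models: {rate_fished:.3f}")
```

The inflation here comes purely from model choice after seeing the data; nothing about the simulated patients changed between the two analysts.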

See JP Simmons’ hilarious demonstration that briefly listening to the Beatles’ “When I’m Sixty-Four” can leave research participants a year and a half younger than listening to “Kalimba” – at least when investigators have free rein to manipulate analyses to get the results they want in a study without pre-registration of analytic plans.

Finally, the efficacy of complex statistical controls is widely overestimated and depends on unrealistic assumptions. First, it is assumed that all relevant variables that need to be controlled have been identified. Second, even when this unrealistic assumption has been met, it is assumed that all statistical control variables have been measured without error. When that is not the case, results can appear significant when they actually are not. See a classic paper by Andrew Phillips and George Davey Smith for further explanation of the problem of measurement error producing spurious findings.
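
The measurement-error point is easy to demonstrate with a toy model (again my own sketch with invented numbers, not the PACE data). A confounder drives both the “exposure” and the outcome, the true effect of the exposure is zero, and we adjust for the confounder. Adjusting for the perfectly measured confounder recovers the null; adjusting for a noisy measurement of the same confounder leaves a substantial spurious “effect”:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

u = rng.normal(size=n)              # true confounder
x = u + rng.normal(size=n)          # exposure: driven by u, NO effect on y
y = u + rng.normal(size=n)          # outcome: driven by u only
u_measured = u + rng.normal(size=n) # confounder measured with error

def adjusted_slope(y, x, control):
    """OLS coefficient on x, controlling for `control`."""
    X = np.column_stack([np.ones_like(x), x, control])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

b_exact = adjusted_slope(y, x, u)           # confounding fully removed
b_noisy = adjusted_slope(y, x, u_measured)  # residual confounding survives
print(f"adjusting for true confounder: {b_exact:.3f}; "
      f"for noisy measure: {b_noisy:.3f}")
```

With these particular (invented) noise levels, roughly a third of the confounded association remains after the noisy adjustment, and at a realistic sample size that residual would sail past significance tests. This is the Phillips and Davey Smith point in miniature.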

What the investigators claim the study shows

In an intact clinical trial, investigators can analyze outcome data with and without adjustments and readers can decide which to emphasize. However, this is far from an intact clinical trial and these results are not interpretable.

The investigators nonetheless make the following claims in addition to what was said in the summary/abstract.

In the results the investigators state

The improvements in fatigue and physical functioning reported by participants allocated to CBT or GET at their 1 year trial outcome assessment were sustained.

This was followed by

The improvements in impairment in daily activities and in perceived change in overall health seen at 1 year with these treatments were also sustained for those who received GET and CBT (appendix p 4). Participants originally allocated to APT reported further improvements in fatigue, physical functioning, and impairment in daily activities from the 1 year trial outcome assessment to long-term follow-up, as did those allocated to SMC alone (who also reported further improvements in perceived change in overall health; figure 2; table 3; appendix p 4).

If the investigators are taking their RCT design seriously, they should give precedence to the null findings for group differences at follow-up. They should not be emphasizing the sustaining of benefits within the GET and CBT groups.

The investigators increase their positive spin on the trial in the opening sentence of the Discussion

The main finding of this long-term follow-up study of the PACE trial participants is that the beneficial effects of the rehabilitative CBT and GET therapies on fatigue and physical functioning observed at the final 1 year outcome of the trial were maintained at long-term follow-up 2·5 years from randomisation.

This is incorrect. The main finding is that any reported advantages of CBT and GET at the end of the trial were lost by long-term follow-up. Because an RCT is designed to focus on between-group differences, the statement about sustained within-group benefits is post hoc.

The Discussion further states

In so far as the need to seek additional treatment is a marker of continuing illness, these findings support the superiority of CBT and GET as treatments for chronic fatigue syndrome.

This makes the unwarranted and self-serving assumption that treatment choice was mainly driven by the need for further treatment, when decision-making was contaminated by investigator preference, as expressed in the newsletter. Note also that CBT was a novel treatment for research participants and more likely to be chosen on the basis of novelty alone, in the face of overall modest improvement rates for the trial and a lack of improvement on objective measures. Whether or not the investigators designate a limited range of self-report measures as primary, participant decision-making may be driven by other, more objective measures.

Regardless, investigators have yet to present any data concerning how decisions for further treatment were made, if such data exist.

The investigators further congratulate themselves with

There was some evidence from an exploratory analysis that improvement after the 1 year trial final outcome was not associated with receipt of additional treatment with CBT or GET, given according to need. However this finding must be interpreted with caution because it was a post-hoc subgroup analysis that does not allow the separation of patient and treatment factors that random allocation provides.

However, why is this analysis singled out as exploratory and to be interpreted with caution because it is a post-hoc subgroup analysis, when similarly post-hoc subgroup analyses are offered elsewhere without such caution?

The investigators finally get around to depicting what should be their primary finding, but do so in a dismissive fashion.

Between the original groups, few differences in outcomes were seen at long-term follow-up. This convergence in outcomes reflects the observed improvement in those originally allocated to SMC and APT, the possible reasons for which are listed above.

The discussion then discloses a limitation of the study that should have informed earlier presentation and discussion of results

First, participant response was incomplete; some outcome data were missing. If these data were not missing at random it could have led to either overestimates or underestimates of the actual differences between the groups.

This minimizes the implausibility of the assumption that data are missing at random, as well as the problems introduced by the complex attempts to control confounds statistically.

And then there is an unsubstantiated statement that is sure to upset persons who suffer from CFS and those who care for them.

the outcomes were all self-rated, although these are arguably the most pertinent measures in a condition that is defined by symptoms.

I could double the length of this already lengthy blog post if I fully discussed this. But let me raise a few issues.

  1. The self-report measures do not necessarily capture subjective experience, only forced choice responses to a limited set of statements.
  2. One of the two primary outcome measures, the physical functioning scale of the SF-36, requires forced-choice responses to a limited set of statements selected for general utility across all mental and physical conditions. Despite its wide use, the SF-36 suffers from problems of internal consistency and confounding with mental health variables. Anyone inclined to get excited about it should examine its items and response options closely. Ask yourself: do differences in scores reliably capture clinically and personally significant changes in the experience and functioning associated with the full range of symptoms of CFS?
  3. The validity of the other primary outcome measure, the Chalder Fatigue Scale, depends heavily on research conducted by this investigator group, and the scale has inadequate validation of its sensitivity to change in objective measures of functioning.
  4. Such self-report measures are inexorably confounded with morale and nonspecific mental health symptoms, carrying a large, unwanted correlation with the tendency to endorse negative self-statements that is not necessarily correlated with objective measures.

Although it was a long time ago, I recall well my first meeting with Professor Simon Wessely. It was at a closed retreat sponsored by NIH to develop a consensus about the assessment of fatigue by self-report questionnaire. I listened to a lot of nonsense that was not well thought out. Then, I presented slides demonstrating a history of failed attempts to distinguish somatic complaints from mental health symptoms by self-report. Much later, this would become my “Stalking bears, finding bear scat in the woods” slide show.

But then Professor Wessely arrived at the meeting late, claiming to be grumbly because of jet lag and flight delays. Without slides and with devastating humor, he upstaged me in completing the demolition of any illusions that we could create more refined self-report measures of fatigue.

I wonder what he would say now.

But alas, people who suffer from CFS have to contend with a lot more than fatigue. Just ask them.

[To be continued later if there is interest in my doing so. If there is, I will discuss the disappearance of objective measures of functioning from the PACE study and you will find out why you should find some 3-D glasses if you are going to search for reports of these outcomes.]