Did a placebo affect allergic reactions to a pin prick or only in the authors’ minds?

Can placebo effects be harnessed to improve treatment outcomes? Stories of a placebo changing bodily function are important in promoting mind-body medicine, but mostly turn out to be false positives. Was this one an exception?


A lesson in critical appraisal: How to screen complicated studies in order to decide whether to put the time and energy into a closer look.

The study:

Howe LC, Goyer JP, Crum AJ. Harnessing the Placebo Effect: Exploring the Influence of Physician Characteristics on Placebo Response. Health Psychology Vol 36(11), Nov 2017, 1074-1082 http://dx.doi.org/10.1037/hea0000499

From the Abstract:

After inducing an allergic reaction in participants through a histamine skin prick test, a health care provider administered a cream with no active ingredients and set either positive expectations (cream will reduce reaction) or negative expectations (cream will increase reaction).

The provider demonstrated either high or low warmth, or either high or low competence.

Results: The impact of expectations on allergic response was enhanced when the provider acted both warmer and more competent and negated when the provider acted colder and less competent.

Conclusion: This study suggests that placebo effects should be construed not as a nuisance variable with mysterious impact but instead as a psychological phenomenon that can be understood and harnessed to improve treatment outcomes.

Why I dismissed this study

The small sample size was set in a power analysis based on the authors’ hope of finding a moderate effect size, not on any existing results. With only 20 participants per cell, most significant findings are likely to be false positives.

The authors had a complicated design with multiple manipulations and time points. They examined two physiological measures, but reported results for only one of them in the paper, the one with stronger results.

The authors did not report a key overall test of whether there was a significant main or interaction effect. Without such a finding, jumping down to significant comparisons between groups is likely to yield a false positive.

The authors did not adjust for multiple comparisons, despite performing a huge number of them.

The authors did not report raw mean differences for comparisons, only differences at two time points controlling for gender, race, and measurements at the first two time points. No rationale was given for these covariates.

The authors used language like ‘marginally significant’ and ‘different, but not significantly so,’ which might suggest they were chasing and selectively reporting significant findings.

The phenomenon under study was a mild allergic reaction in the short term: three time points spanning 9-15 minutes, with data for two earlier time points not reported as outcomes. It is unclear by what mechanism an experimental manipulation could have an observable effect on such a mild reaction in such a short period of time.


Claims of placebo effects figure heavily in discussions of the power of the mind over the body. Yet this power is greatly exaggerated by laypersons and in the lay press and social media. Effects of a placebo manipulation on objective physiological measures, as opposed to subjective self-report measures, are uncommon and usually turn out to be false positives.

A New England Journal of Medicine review of 130 clinical trials found:

Little evidence in general that placebos had powerful clinical effects. Although placebos had no significant effects on objective or binary outcomes, they had possible small benefits in studies with continuous subjective outcomes and for the treatment of pain. Outside the setting of clinical trials, there is no justification for the use of placebos.

I often cite another great NEJM study showing the sharp contrast between positive results obtained with subjective self-report measures and negative results with objective physical-functioning measures.

That is probably the case with a recent report of effects of expectancies and interpersonal relationship on a mild allergic reaction induced by a histamine skin prick test (SPT). The study involved manipulation of the perceived warmth and competence of a provider, as well as whether research participants were told that an inert cream being applied would have a positive or negative effect.

The authors invoke these results in claiming support for the idea that psychological variables do indeed influence a mild allergic reaction. Examining all of the numerous pairwise comparisons would be a long and tedious task. However, I decided from some details of the design and analysis of the study that I would not proceed.

Some notable features of the study:

The key manipulations of high versus low warmth and high versus low competence were in the behavior of a single unblinded experimenter.

The design is described as 2x2x2 with a cell size of n = 20 (19 in one cell).

It is more properly described as 2x2x2x(5) because of the 5 time points after the provider administered the skin prick:

(T1 = 3 min post-SPT; T2 = 6 min post-SPT, with the cream administered directly afterward; T3 = 9 min post-SPT and 3 min post-cream; T4 = 12 min post-SPT and 6 min post-cream; T5 = 15 min post-SPT and 9 min post-cream).

The small number of participants per cell was set in a power analysis based on the hope that a moderate effect size could be shown, not on past results.
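For what it is worth, the arithmetic behind this objection is easy to check. Below is a minimal sketch (not the authors’ own calculation) of the power of a two-sided, two-sample t-test at alpha = .05, using the noncentral t distribution. With 20 participants per cell, a “moderate” effect of d = 0.5 would be detected only about a third of the time:

```python
from scipy import stats

def two_sample_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test for standardized effect d."""
    df = 2 * n_per_group - 2
    nc = d * (n_per_group / 2) ** 0.5          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)    # two-sided critical value
    # probability the noncentral t lands beyond either critical value
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

print(round(two_sample_power(0.5, 20), 2))   # about 0.34
print(round(two_sample_power(0.5, 64), 2))   # about 0.80, the usual n for 80% power
```

And that is for a simple two-group comparison; the cell-level contrasts a 2x2x2 design relies on are, if anything, even less well powered.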

The physiological reaction was measured in terms of size of a wheal (raised bump) and size of the flare (redness surrounding the bump).

Numerous other physiological measures were obtained, including blood pressure and pre-post session saliva samples. It is not stated what was done with these data, but they could have been used to evaluate further the manipulation of experimenter behavior.

No simple correlation between participants’ perceptions of warmth and competence is reported, which would have been helpful in interpreting the 2×2 crossing of warmth and competence.

In the supplementary materials, readers are told that ratings of itchiness and mood were obtained after the skin prick. No effects of the experimental manipulation on these ratings were observed, which would seem not to support the effectiveness of the manipulation.

No overall ANOVA or test for significance of interactions is presented.

Instead, numerous pairwise comparisons are presented without correction for multiplicity.
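Such a correction is easy to apply and would likely have eliminated most of these comparisons. Here is a minimal Holm-Bonferroni step-down sketch; the p-values are invented for illustration and are not taken from the paper:

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm step-down: test p-values smallest first against successively
    looser thresholds alpha/m, alpha/(m-1), ...; stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all remaining (larger) p-values also fail
    return reject

# eight hypothetical pairwise comparisons, two nominally significant at .05
pvals = [0.004, 0.030, 0.095, 0.155, 0.21, 0.34, 0.48, 0.62]
print(holm_bonferroni(pvals))  # only the smallest p-value survives
```

With a design generating dozens of pairwise contrasts, a nominal p of .03 can easily fail to survive even this relatively lenient correction.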

Further comparisons were conducted with a sample that was constructed post hoc:

To better understand the mechanism by which expectations differed, within a setting of high warmth and high competence, we compared the wheal and flare size for the positive and negative expectations conditions to a follow-up sample who received neutral expectations. This resulted in a total sample of N=62.

Differences found using this sample were discussed, despite significance levels of p = .095 and p = .155.

Raw mean scores are neither presented nor discussed. Instead, all comparisons controlled for gender, race, and size of the wheal at Times 1 and 2.

Only the size of the wheal is reported in the body of the paper, but readers are told:

The results on the flare of the reaction were mostly similar (see the supplemental material available online).

Actually, the results reported in the supplemental material were considerably weaker, with claims of differences that were merely marginally significant or significant only at particular time points.

So, what do you think? If you are interested, take a look at the study and let me know if I was premature to dismiss it.

Preorders are being accepted for e-books providing skeptical looks at mindfulness and positive psychology, and arming citizen scientists with critical thinking skills. Right now there is a special offer for free access to a Mindfulness Master Class. But hurry, it won’t last.

I will also be offering scientific writing courses on the web as I have been doing face-to-face for almost a decade. I want to give researchers the tools to get into the journals where their work will get the attention it deserves.

Sign up at my website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. Lots to see at CoyneoftheRealm.com.

Hazards of pointing out bad meta-analyses of psychological interventions


A cautionary tale

Psychology has a meta-analysis problem. And that’s contributing to its reproducibility problem. Meta-analyses are wallpapering over many research weaknesses, instead of being used to systematically pinpoint them. – Hilda Bastian

  • Meta-analyses of psychological interventions are often unreliable because they depend on a small number of poor quality, underpowered studies.
  • It is surprisingly easy to screen the studies being assembled for a meta-analysis and quickly determine that the literature is not suitable because it does not have enough quality studies. Apparently, the authors of many published meta-analyses either did not undertake such a brief assessment or were not deterred by it from proceeding anyway.
  • We can’t tell how many efforts at meta-analyses were abandoned because of the insufficiencies of the available literature. But we can readily see that many published meta-analyses offer summary effect sizes for interventions that can’t be expected to be valid or generalizable.
  • We are left with a glut of meta-analyses of psychological interventions that convey inflated estimates of the efficacy of interventions and on this basis, make unwarranted recommendations that broad classes of interventions are ready for dissemination.
  • Professional organizations and promoters of particular treatments have strong vested interests in portraying their psychological interventions as effective. They will use their resources to resist efforts to publish critiques of their published meta-analyses and even fight the teaching of basic critical skills for appraising meta-analysis.
  • Publication of thorough critiques has little or no impact on the subsequent citation or influence of the meta-analyses they target; the critiques themselves are largely ignored.
  • Debunking bad meta-analyses of psychological interventions can be frustrating at best, and, at worst, hazardous to careers.
  • You should engage in such activities if you feel it is right to do so. It will be a valuable learning experience. And you can only hope that someone at some point will take notice.

Three simple screening questions to decide whether a meta-analysis is worth delving into.

I’m sick and tired of spending time trying to make sense of meta-analyses of psychological interventions that should have been dismissed out of hand. The likelihood of any contribution to the literature was ruled out by repeated, gross misapplication of meta-analysis by some authors or, more often, by the pathetic quality and quantity of the literature available for meta-analysis.

Just recently, Retraction Watch reported the careful scrutiny of a pair of meta-analyses by two psychology graduate students, Paul-Christian Bürkner and Donald Williams. Coverage in Retraction Watch focused on their inability to get credit for the retraction of one of the papers that had occurred because of their critique.

But I was more saddened by their having spent so much time on the second meta-analysis, “A meta-analysis and theoretical critique of oxytocin and psychosis: Prospects for attachment and compassion in promoting recovery.” The authors of this meta-analysis had themselves acknowledged that the literature was quite deficient, but proceeded anyway and published a paper that has already been cited 13 times.

The graduate students, as well as the original authors, could simply have taken a quick look at the study’s Table 1: the seven included studies had from 9 to 35 patients exposed to oxytocin. The study with 35 patients was an outlier. This study also provided only a within-subject effect size, which should not have been entered into the meta-analysis with the results of the other studies.
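Why does mixing a within-subject effect size with between-group effect sizes matter? A paired effect size is scaled by the standard deviation of difference scores, which shrinks as the pre-post correlation rises, so the same underlying mean difference can look much larger in a within-subject design (see Morris & DeShon, 2002, for the standard conversion). A small illustration, with hypothetical correlation values:

```python
import math

def paired_dz(between_d, r):
    """Within-subject (paired) effect size implied by the same mean
    difference as a between-groups d, given pre-post correlation r."""
    return between_d / math.sqrt(2 * (1 - r))

# the same underlying d = 0.5, at increasing (hypothetical) correlations
for r in (0.0, 0.5, 0.8):
    print(r, round(paired_dz(0.5, r), 2))   # 0.35, then 0.5, then 0.79
```

Entering an inflated paired effect size alongside between-group effect sizes biases the pooled estimate upward, which is exactly why that study should have been excluded.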

The six remaining studies had an average sample size of 14 in the intervention group. I doubt that anyone would have undertaken a study of psychotic patients inhaling oxytocin to generate a robust estimate of effect size with only 9, 10, or 11 patients. It’s unclear why the original investigators stopped accruing patients when they did.

Without having specified their sample size ahead of time (there is no evidence that the investigators did), the original investigators could simply have stopped when a peek at the data revealed statistically significant findings, or they could have kept accruing patients when a peek revealed only nonsignificant findings. Or they could have dropped some patients. Regardless, the reported samples are so small that adding only one or two more patients could substantially change the results.

Furthermore, if the investigators were struggling to get enough patients, the study was probably under-resourced and compromised in other ways. Small sample sizes compound the problems posed by poor methodology and reporting. The authors conducting this particular meta-analysis could confirm for only one of the studies that data from all patients who were randomized were analyzed, i.e., that there was an intention-to-treat analysis. Reporting was that bad, and worse. Again, think of the effect of losing the data of one or a few patients from an analysis: it could be decisive for the results, particularly when the loss was not random.

Overall, the authors of the original meta-analysis conceded that the seven studies they were entering into the meta-analyses had a high risk of bias.

It should be apparent that authors cannot take a set of similarly flawed studies and integrate their effect sizes with a meta-analysis and expect to get around the limitations. Bottom line – readers should just dismiss the meta-analysis and get on to other things…

These well-meaning graduate students were wasting their time and talent carefully scrutinizing a pair of meta-analyses that were unworthy of their sustained attention. Think of what they could be doing more usefully. There is so much other bad science out there to uncover.

Everybody – I recommend not putting a lot of effort into analyzing obviously flawed meta-analyses, other than maybe posting a warning notice on PubMed Commons, or ranting in a blog post, or both.

Detecting Bad Meta-Analyses

Over a decade ago, I developed some quick assessment tools by which I can reliably determine that some meta-analyses are not worth our attention. You can see more about the quickly answered questions here.

To start such an assessment, go directly to the table describing the studies that were included in a published meta-analysis.

  1. Ask: “To what extent are the studies dominated by cell sample sizes of less than 35?” Studies of this size have only a power of .50 to detect a moderate-sized effect. So, even if an effect were present, it would be detected only 50% of the time if all studies were being reported.
  2. Next, check to see whether whoever did the meta-analysis rated the included studies for risk of bias and how, if at all, risk of bias was taken into account in the meta-analyses.
  3. Finally, does the meta-analysis adequately deal with the clinical heterogeneity of the included studies? Is there a basis for giving a meaningful interpretation to a single summary effect size?

Combining studies may be inappropriate for a variety of the following reasons: differences in patient eligibility criteria in the included trials, different interventions and outcomes, and other methodological differences or missing information.  Moher et al., 1998

I have found this quick exercise often reveals that meta-analyses of psychological interventions are dominated by underpowered studies of low methodological quality that produce positive effects for interventions at a greater rate than would be expected. There is little reason to proceed to calculate a summary effect size.
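One way to put a number on “a greater rate than would be expected” is an excess-significance check in the spirit of Ioannidis and Trikalinos (2007): treat each trial’s chance of reaching significance as roughly its power, and ask how surprising the observed count of significant results is under a binomial model. The counts below are hypothetical, chosen only to illustrate the calculation:

```python
from scipy import stats

def excess_significance_p(n_studies, n_significant, power):
    """One-sided binomial tail: probability of at least n_significant
    'positive' studies if each study independently has the given power."""
    return 1 - stats.binom.cdf(n_significant - 1, n_studies, power)

# hypothetical literature: 15 trials, each with ~35% power, 12 reporting p < .05
print(excess_significance_p(15, 12, 0.35))   # well below .001 -- implausibly lucky
```

When a literature of underpowered trials is almost uniformly “positive,” the most parsimonious explanations are selective reporting and flexible analysis, not a large true effect.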

The potholed road from a presentation to a publication.

My colleagues and I applied these criteria in a 2008 presentation to a packed audience at the European Health Psychology Conference in Bath. My focus was a similar exercise with four meta-analyses of behavioral interventions for adults (Dixon, Keefe, Scipio, Perri, & Abernethy, 2007; Hoffman, Papas, Chatkoff, & Kerns, 2007; Irwin, Cole, & Nicassio, 2006; and Jacobsen, Donovan, Vadaparampil, & Small, 2007) that appeared in a new section of Health Psychology, Evidence-Based Treatment Reviews.

A sampling of what we found:

Irwin et al. The Irwin et al. meta-analysis had the stated objective of

comparing responses in studies that exclusively enrolled persons who were 55 years of age or older versus outcomes in randomized controlled trials that enrolled adults who were, on average, younger than 55 years of age (p. 4).

A quick assessment revealed that excluding small trials (n < 35) would have eliminated all studies of older adults; five studies included 15 or fewer participants per condition. Of the studies including younger adults, only one of the 15 would have remained.

Hoffman et al. We found that 17 of the 22 included studies fell below n = 35 per group. In response to our request, the authors graciously shared a table of the methodological quality of the included studies.

In 60% of the studies, intervention and control groups were not comparable on key variables at baseline.

Less than half provided adequate information concerning number of patients enrolled, treatment drop-out and reasons for drop-outs.

Only 15% of trials provided intent-to-treat analyses.

In a number of studies, the psychological intervention was part of a multicomponent package, so that its unique contribution could not be determined. Often the psychological intervention was minimal. For instance, one study noted: “a lecture to give the patient an understanding that ordinary physical activity would not harm the disk and a recommendation to use the back and bend it.”

The only studies comparing a psychological intervention to an active control condition were three underpowered studies in which the effects of the psychological component cannot be separated from the rest of the package in which it was embedded. In one of the studies, massage was the psychological intervention, but in another, it was the control condition.

Nonetheless, Hoffman et al. concluded: “The robust nature of these findings should encourage confidence among clinicians and researchers alike.”

As I readily demolished the meta-analyses to the delight of the audience, I remarked something to the effect that I was glad the editor of Health Psychology was not there to hear what I was saying about articles published in the journal he edits.

But Robert Kaplan was there. He invited me for a beer as I left the symposium. He said that such critical probing was sorely lacking in the journal and suggested that my colleagues and I submit an invited article. Eventually it would be published as:

Coyne JC, Thombs BD, Hagedoorn M. Ain’t necessarily so: Review and critique of recent meta-analyses of behavioral medicine interventions in health psychology. Health Psychology. 2010 Mar;29(2):107.

However, Kaplan first had an Associate Editor send the manuscript out for review. The manuscript was rejected based on a pair of reviews that were not particularly informative. One reviewer stated:

The authors level very serious accusations against fellow scientists and claim to have identified significant shortcomings in their published work. When this is done in public, the authors must have done their homework, dotted all the i’s, and crossed all the t’s. Instead, they reveal “we do not redo these meta-analyses or offer a comprehensive critique, but provide a preliminary evaluation of the adequacy of the conduct, reporting and clinical recommendations of these meta-analyses”. To be frank, this is just not enough when one accuses colleagues of mistakes, poor judgment, false inferences, incompetence, and perhaps worse.

In what he would later describe as the only time he did this in his term as editor of Health Psychology, Bob Kaplan overruled the unanimous recommendations of his associate editor and the two reviewers. He accepted a revision of our manuscript in which we tried to be clearer about the bases of our judgments.

According to Google Scholar, our “Ain’t necessarily so…” has been cited 53 times. Apparently it had little effect on the reception of the four meta-analyses. Hoffman et al. has been cited 599 times.

From a well-received workshop to a workshop canceled in order to celebrate a bad meta-analysis.

Mariët Hagedoorn and I gave a well-received workshop at the annual meeting of the Society of Behavioral Medicine the next year. A member of SBM’s Evidence-Based Behavioral Medicine Committee invited us to their committee meeting held immediately after the workshop. We were invited to give the workshop again in two years. I also became a member of the committee. I offered to be involved in future meta-analyses, learning that a number were planned.

I actually thought that I was involved in a meta-analysis of interventions for depressive symptoms among cancer patients. I immediately identified a study of problem-solving therapy for cancer patients with such improbably large effect sizes that it should have been excluded from any meta-analysis as an extreme outlier. The suggestion was appreciated.

But I heard nothing further about the meta-analysis until I was contacted by one of the authors, who said that my permission was needed for me to be acknowledged in the accepted manuscript. I refused. When I finally saw the published version of the manuscript in the prestigious Journal of the National Cancer Institute, I published a scathing critique, which you can read here. My critique has so far been cited once, the meta-analysis eighty times.

Only a couple of months before our workshop was scheduled to occur, I was told it had been canceled in order to clear the schedule for full press coverage of a new meta-analysis. I only learned of this when I emailed the committee concerning the specific timing of the workshop. The reply came from the first author of the new meta-analysis.

I have subsequently made the case, in two blog posts, that that meta-analysis was horribly done and horribly misleading to consumers:

Faux Evidence-Based Behavioral Medicine at Its Worst (Part I)

Faux Evidence-Based Behavioral Medicine Part 2

Some highlights:

The authors boasted of “robust findings” of “substantial rigor” in a meta-analysis that provided “strong evidence for psychosocial pain management approaches.” They claimed their findings supported the “systematic implementation” of these techniques.

The meta-analysis depended heavily on small trials. Of the 38 trials, 19 had fewer than 35 patients in the intervention or control group and so would have been excluded by applying this criterion.

Some of the smaller trials were quite small indeed. One had 7 patients receiving an education intervention; another had 10 patients getting hypnosis; another, 15 patients getting education; another, 15 patients getting self-hypnosis; and still another, 8 patients getting relaxation and 8 patients getting CBT plus relaxation.

Two of what were by far the largest trials should have been excluded because they involved a complex intervention: patients received telephone-based collaborative care, which had a number of components, including support for adherence to medication.

It appears that listening to music, being hypnotized during a medical procedure, and being taught self-hypnosis over 52 sessions all fall under the rubric of skills training. Similarly, interactive educational sessions are considered equivalent to passing out informational materials and simple pamphleteering.

But here’s what most annoyed me about clinical and policy decisions being made on the basis of this meta-analysis:

Perhaps most importantly from a cancer pain control perspective, there was no distinguishing of whether the cancer pain was procedural, acute, or chronic. These types of pain require very different management strategies. In preparation for surgery or radiation treatment, it might be appropriate to relax or hypnotize the patient or provide soothing music. The efficacy could be examined in a randomized trial. But the management of acute pain is quite different and best achieved with medication. Here is where the key gap exists between the known efficacy of medication and the poor control achieved in the community, due to professional and particularly patient attitudes. Control of chronic pain, months after any painful procedures, is a whole different matter, and based on studies of noncancer pain, I would guess that here is another place for psychosocial intervention, but that should be established in randomized trials.

Getting shushed about the sad state of research on couples interventions for cancer patients

One of the psychologists present at the SBM meeting published a meta-analysis of couples interventions in which I was thanked for my input in an acknowledgment. I had not given permission, and the notice was subsequently retracted.

Ioana Cristea, Nilufer Kafescioglu, and I subsequently submitted a critique to Psycho-Oncology. We were initially told it would be accepted as a letter to the editor, but then it was subjected to an extraordinary six uninformative reviews and rejected. The article that we critiqued was given special status as a featured article and distributed free by the otherwise paywalled journal.

A version of our critique was relegated to a blog post.

The complicated politics of meta-analyses supported by professional organizations.

Starting with our “Ain’t necessarily so..” effort, we were taking aim at meta-analyses making broad, enthusiastic claims about the efficacy and readiness for dissemination of psychological interventions. The Society of Behavioral Medicine was enjoying a substantial increase in membership, but, as in other associations dominated by psychologists, the new members were primarily clinicians, not academic researchers. SBM wanted to offer a branding of “evidence-based” to the psychological interventions for which the clinicians were seeking reimbursement. At the time, insurance companies were challenging whether licensed psychologists should be reimbursed for psychological interventions administered to patients without psychiatric diagnoses.

People involved with the governance of SBM at the time cannot help but be aware of an ugly side to the politics back then. A small amount of money had been given by NCI to support meta-analyses and it was quite a struggle to control its distribution. That the SBM-sponsored meta-analyses were oddly published in the APA journal, Health Psychology, rather than SBM’s Annals of Behavioral Medicine reflected the bid for presidency of APA’s Division of Health Psychology by someone who had been told that she could not run for president of SBM. But worse, there was a lot of money and undeclared conflicts of interest in play.

Someone originally involved in the meta-analysis of interventions for depressive symptoms among cancer patients had received a $10 million grant from Pfizer to develop a means of monitoring cancer surgeons’ inquiring about psychological distress and their offering of interventions. The idea (which was actually later mandated) was that cancer surgeons could not close their electronic records until they had indicated that they had asked the patient about psychological distress. If a patient reported distress, the surgeons had to indicate what intervention was offered to the patient. Only then could they close the medical record. Of course, these requirements could be met simply by asking if a breast cancer patient was distressed and offering her an antidepressant without any formal diagnosis or follow-up. These procedures were mandated as part of accreditation of facilities providing cancer care.

Psycho-Oncology, the journal with which we skirmished about the meta-analysis of couples interventions, was the official publication of the International Psycho-Oncology Society, another organization dominated by clinicians seeking reimbursement for services to cancer patients.

You can’t always get what you want.

I nonetheless encourage others, particularly early-career investigators, to take up the tools that I offer. Please scrutinize meta-analyses that would otherwise have clinical and public policy recommendations attached to their findings. You may have trouble getting published, and you will be sorely disappointed if you expect to influence the reception of an already published meta-analysis. You can always post your critiques at PubMed Commons.

You will learn important skills and the politics of trying to publish critiques of papers that are protected as having been “peer reviewed.” If enough of you do this and visibly complain about how ineffectual your efforts have been, we may finally overcome the incumbent advantage and protection from further criticism that goes with getting published.

And bloggers like myself and Hilda Bastian will recognize you and express appreciation.