Hazards of pointing out bad meta-analyses of psychological interventions

 

A cautionary tale

Psychology has a meta-analysis problem. And that’s contributing to its reproducibility problem. Meta-analyses are wallpapering over many research weaknesses, instead of being used to systematically pinpoint them. – Hilda Bastian

  • Meta-analyses of psychological interventions are often unreliable because they depend on a small number of poor quality, underpowered studies.
  • It is surprisingly easy to screen the studies being assembled for a meta-analysis and quickly determine that the literature is not suitable because it does not contain enough quality studies. Apparently, the authors of many published meta-analyses either did not undertake such a brief assessment or were not deterred by it from proceeding anyway.
  • We can’t tell how many efforts at meta-analyses were abandoned because of the insufficiencies of the available literature. But we can readily see that many published meta-analyses offer summary effect sizes for interventions that can’t be expected to be valid or generalizable.
  • We are left with a glut of meta-analyses of psychological interventions that convey inflated estimates of the efficacy of interventions and on this basis, make unwarranted recommendations that broad classes of interventions are ready for dissemination.
  • Professional organizations and promoters of particular treatments have strong vested interests in portraying their psychological interventions as effective. They will use their resources to resist efforts to publish critiques of their published meta-analyses and even fight the teaching of basic critical skills for appraising meta-analysis.
  • Publication of thorough critiques has little or no impact on the subsequent citation or influence of the meta-analyses they target; the critiques themselves are largely ignored.
  • Debunking bad meta-analyses of psychological interventions can be frustrating at best, and, at worst, hazardous to careers.
  • You should engage in such activities if you feel it is right to do so. It will be a valuable learning experience. And you can only hope that someone at some point will take notice.

Three simple screening questions to decide whether a meta-analysis is worth delving into.

I’m sick and tired of spending time trying to make sense of meta-analyses of psychological interventions that should have been dismissed out of hand. Any contribution to the literature was ruled out either by gross misapplication of meta-analysis by some authors or, more often, by the pathetic quality and quantity of the literature available for synthesis.

Just recently, Retraction Watch reported on the careful scrutiny of a pair of meta-analyses by two psychology graduate students, Paul-Christian Bürkner and Donald Williams. The Retraction Watch coverage focused on their inability to get credit for the retraction of one of the papers, a retraction that occurred because of their critique.

But I was more saddened by their having spent so much time on the second meta-analysis, “A meta-analysis and theoretical critique of oxytocin and psychosis: Prospects for attachment and compassion in promoting recovery.” The authors of this meta-analysis had themselves acknowledged that the literature was quite deficient, but they proceeded anyway and published a paper that has already been cited 13 times.

The graduate students, as well as the original authors, could simply have taken a quick look at the study’s Table 1: the seven included studies had from 9 to 35 patients exposed to oxytocin. The study with 35 patients was an outlier, and it provided only a within-subject effect size, which should not have been entered into the meta-analysis alongside the results of the other studies.

The six remaining studies had an average sample size of 14 in the intervention group. I doubt that anyone would have undertaken a study of psychotic patients inhaling oxytocin to generate a robust estimate of effect size with only 9, 10, or 11 patients. It’s unclear why the original investigators stopped accruing patients when they did.

Without having specified their sample size ahead of time (there is no evidence that the investigators did), the original investigators could simply have stopped when a peek at the data revealed statistically significant findings, or they could have kept accruing patients when a peek revealed only nonsignificant findings. Or they could have dropped some patients. Regardless, the reported samples are so small that adding only one or two more patients could substantially change the results.
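To see why such peeking matters, here is a small simulation, entirely my own illustration with made-up numbers rather than a reanalysis of the oxytocin trials. Two groups are drawn from the same distribution, so there is no true effect, yet an investigator who tests after every couple of added patients and stops at the first p < .05 will declare a significant finding far more often than the nominal 5% of the time:

    # Simulation: optional stopping inflates the false-positive rate.
    # Both arms are drawn from the SAME distribution (no true effect); the
    # "investigator" runs a t-test each time 2 more patients per arm accrue,
    # starting at n = 5 per arm, and stops at the first p < .05.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_sims, n_max = 5000, 15
    false_positives = 0

    for _ in range(n_sims):
        a = rng.normal(size=n_max)   # control arm
        b = rng.normal(size=n_max)   # "treatment" arm, identical in truth
        for n in range(5, n_max + 1, 2):
            if ttest_ind(a[:n], b[:n]).pvalue < 0.05:
                false_positives += 1
                break

    print(false_positives / n_sims)  # well above the nominal .05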

Furthermore, if the investigators were struggling to get enough patients, the study was probably under-resourced and compromised in other ways. Small sample sizes compound the problems posed by poor methodology and reporting. The authors conducting this particular meta-analysis could confirm for only one of the studies that data from all patients who were randomized were analyzed, i.e., that there was an intention-to-treat analysis. Reporting was that bad, and worse. Again, think of the effect of losing the data of one or a few patients from the analysis: it could be decisive for the results, particularly when the loss was not random.

Overall, the authors of the original meta-analysis conceded that the seven studies they were entering into the meta-analysis had a high risk of bias.

It should be apparent that authors cannot take a set of similarly flawed studies and integrate their effect sizes with a meta-analysis and expect to get around the limitations. Bottom line – readers should just dismiss the meta-analysis and get on to other things…

These well-meaning graduate students were wasting their time and talent carefully scrutinizing a pair of meta-analyses that were unworthy of their sustained attention. Think of what they could be doing more usefully. There is so much other bad science out there to uncover.

Everybody – I recommend not putting a lot of effort into analyzing obviously flawed meta-analyses, other than maybe posting a warning notice on PubMed Commons, or ranting in a blog post, or both.

Detecting bad meta-analyses

Over a decade ago, I developed some quick assessment tools by which I can reliably determine that some meta-analyses are not worth our attention. You can see more about the quickly answered questions here.

To start such an assessment, go directly to the table describing the studies that were included in a published meta-analysis.

  1. Ask: “To what extent are the studies dominated by cell sample sizes of less than 35?” Studies of this size have only a power of about .50 to detect a moderate-sized effect. So, even if an effect were present, it would be detected only 50% of the time, even if all studies were reported (see the power calculation sketched below).
  2. Next, check whether whoever did the meta-analysis rated the included studies for risk of bias and how, if at all, risk of bias was taken into account in the meta-analysis.
  3. Finally, does the meta-analysis adequately deal with the clinical heterogeneity of the included studies? Is there a basis for giving a meaningful interpretation to a single summary effect size?

Combining studies may be inappropriate for a variety of the following reasons: differences in patient eligibility criteria in the included trials, different interventions and outcomes, and other methodological differences or missing information. – Moher et al., 1998
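To make the arithmetic behind question 1 concrete, here is a minimal power calculation in Python. It is only a sketch of the general claim above; the specific values (n = 35 per group, a moderate effect of d = 0.5, two-sided alpha = .05) are illustrative assumptions rather than numbers taken from any particular meta-analysis:

    # Power of a two-arm trial with 35 patients per group to detect a moderate
    # standardized effect (Cohen's d = 0.5) at the conventional two-sided alpha = .05.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    power = analysis.power(effect_size=0.5, nobs1=35, alpha=0.05,
                           ratio=1.0, alternative='two-sided')
    print(round(power, 2))  # roughly 0.5: a coin flip's chance of detecting a real effect

    # For comparison, the per-group sample size needed for the customary 80% power
    n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05,
                                    ratio=1.0, alternative='two-sided')
    print(round(n_needed))  # about 64 patients per group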

I have found that this quick exercise often reveals that meta-analyses of psychological interventions are dominated by underpowered studies of low methodological quality that report positive effects for interventions at a greater rate than their power would lead one to expect. In such cases, there is little reason to proceed to calculating a summary effect size.

The potholed road from a presentation to a publication.

My colleagues and I applied these criteria in a 2008 presentation to a packed audience at the European Health Psychology Conference in Bath. My focus was a similar exercise with four meta-analyses of behavioral interventions for adults (Dixon, Keefe, Scipio, Perri, & Abernethy, 2007; Hoffman, Papas, Chatkoff, & Kerns, 2007; Irwin, Cole, & Nicassio, 2006; and Jacobsen, Donovan, Vadaparampil, & Small, 2007) that had appeared in a new section of Health Psychology, Evidence-Based Treatment Reviews.

A sampling of what we found:

Irwin et al. The Irwin et al. meta-analysis had the stated objective of

comparing responses in studies that exclusively enrolled persons who were 55 years of age or older versus outcomes in randomized controlled trials that enrolled adults who were, on average, younger than 55 years of age (p. 4).

A quick assessment revealed that exclusion of small trials (n < 35) would have eliminated all of the studies of older adults; five of those studies included 15 or fewer participants per condition. Of the 15 studies of younger adults, only one would have remained.

Hoffman et al. We found that 17 of the 22 included studies fell below n = 35 per group. In response to our request, the authors graciously shared a table of the methodological quality of the included studies.

In 60% of the studies, the intervention and control groups were not comparable on key variables at baseline.

Less than half provided adequate information concerning number of patients enrolled, treatment drop-out and reasons for drop-outs.

Only 15% of trials provided intent-to-treat analyses.

In a number of studies, the psychological intervention was part of a multicomponent package, so that its unique contribution could not be determined. Often the psychological intervention was minimal. For instance, one study noted: “a lecture to give the patient an understanding that ordinary physical activity would not harm the disk and a recommendation to use the back and bend it.”

The only comparisons of a psychological intervention to an active control condition came from three underpowered studies in which the effects of the psychological component could not be separated from the rest of the package in which it was embedded. In one of these studies, massage was the psychological intervention; in another, it was the control condition.

Nonetheless, Hoffman et al. concluded: “The robust nature of these findings should encourage confidence among clinicians and researchers alike.”

As I readily demolished the meta-analyses, to the delight of the audience, I remarked something to the effect that I was glad the editor of Health Psychology was not there to hear what I was saying about articles published in the journal he edited.

But Robert Kaplan was there. He invited me for a beer as I left the symposium. He said that such critical probing was sorely lacking in the journal, and he asked that my colleagues and I submit an invited article. Eventually it would be published as:

Coyne JC, Thombs BD, Hagedoorn M. Ain’t necessarily so: Review and critique of recent meta-analyses of behavioral medicine interventions in health psychology. Health Psychology. 2010 Mar;29(2):107.

However, Kaplan first had an Associate Editor send out the manuscript for review. The manuscript was rejected based on a pair of reviews that were not particularly informative. One reviewer stated:

The authors level very serious accusations against fellow scientists and claim to have identified significant shortcomings in their published work. When this is done in public, the authors must have done their homework, dotted all the i’s, and crossed all the t’s. Instead, they reveal “we do not redo these meta-analyses or offer a comprehensive critique, but provide a preliminary evaluation of the adequacy of the conduct, reporting and clinical recommendations of these meta-analyses”. To be frank, this is just not enough when one accuses colleagues of mistakes, poor judgment, false inferences, incompetence, and perhaps worse.

In what he would later describe as the only time he did this in his term as editor of Health Psychology, Bob Kaplan overruled the unanimous recommendations of his associate editor and the two reviewers. He accepted a revision of our manuscript in which we tried to be clearer about the bases of our judgments.

According to Google Scholar, our “Ain’t necessarily so…” has been cited 53 times. Apparently it had little effect on the reception of the four meta-analyses. Hoffman et al. has been cited 599 times.

From a well-received workshop to a workshop canceled in order to celebrate a bad meta-analysis.

Mariet Hagedoorn and I gave a well-received workshop at the annual meeting of the Society of Behavioral Medicine (SBM) the next year. A member of SBM’s Evidence-Based Behavioral Medicine Committee invited us to the committee meeting held immediately after the workshop. We were invited to give the workshop again in two years. I also became a member of the committee and offered to be involved in future meta-analyses, learning that a number were planned.

I actually thought that I was involved in a meta-analysis of interventions for depressive symptoms among cancer patients. I immediately identified a study of problem-solving therapy for cancer patients with effect sizes so improbably large that it should have been excluded from any meta-analysis as an extreme outlier. The suggestion was appreciated.

But I heard nothing further about the meta-analysis until I was contacted by one of the authors, who said that my permission was needed for me to be acknowledged in the accepted manuscript. I refused. When I finally saw the published version of the manuscript in the prestigious Journal of the National Cancer Institute, I published a scathing critique, which you can read here. My critique has so far been cited once, the meta-analysis eighty times.

Only a couple of months before our workshop was scheduled to occur, I was told it had been canceled in order to clear the schedule for full press coverage of a new meta-analysis. I learned of this only when I emailed the committee concerning the specific timing of the workshop. The reply came from the first author of the new meta-analysis.

I have subsequently made the case, in two blog posts, that that meta-analysis was horribly done and horribly misleading to consumers:

Faux Evidence-Based Behavioral Medicine at Its Worst (Part I)

Faux Evidence-Based Behavioral Medicine Part 2

Some highlights:

The authors boasted of “robust findings” of “substantial rigor” in a meta-analysis that provided “strong evidence for psychosocial pain management approaches.” They claimed their findings supported the “systematic implementation” of these techniques.

The meta-analysis depended heavily on small trials. Of the 38 trials, 19 had fewer than 35 patients in the intervention or control group and so would have been excluded by applying this criterion.

Some of the smaller trials were quite small. One had 7 patients receiving an education intervention; another had 10 patients getting hypnosis; another, 15 patients getting education; another, 15 patients getting self-hypnosis; and still another, 8 patients getting relaxation and 8 patients getting CBT plus relaxation.

Two of what were by far the largest trials should have been excluded because they involved a complex intervention: patients received telephone-based collaborative care, which had a number of components, including support for adherence to medication.

It appears that listening to music, being hypnotized during a medical procedure, and being taught self-hypnosis over 52 sessions all fall under the rubric of skills training. Similarly, interactive educational sessions are considered equivalent to passing out informational materials and simply pamphleteering.

But here’s what most annoyed me about clinical and policy decisions being made on the basis of this meta-analysis:

Perhaps most importantly from a cancer pain control perspective, there was no distinction made as to whether the cancer pain was procedural, acute, or chronic. These types of pain call for very different management strategies. In preparation for surgery or radiation treatment, it might be appropriate to relax or hypnotize the patient or provide soothing music. The efficacy could be examined in a randomized trial. But the management of acute pain is quite different and best achieved with medication. Here is where the key gap exists between the known efficacy of medication and the poor control achieved in the community, due to professional and particularly patient attitudes. Control of chronic pain, months after any painful procedures, is a whole different matter, and based on studies of noncancer pain, I would guess that here is another place for psychosocial intervention, but that should be established in randomized trials.

Getting shushed about the sad state of research on couples interventions for cancer patients

One of the psychologists present at the SBM meeting published a meta-analysis of couples interventions in which I was thanked for my input in an acknowledgment. I did not give permission, and the acknowledgment was subsequently retracted.

Ioana Cristea, Nilufer Kafescioglu, and I subsequently submitted a critique to Psycho-Oncology. We were initially told it would be accepted as a letter to the editor, but it was then subjected to an extraordinary six uninformative reviews and rejected. The article that we critiqued was given special status as a featured article and distributed free by the otherwise paywalled journal.

A version of our critique was relegated to a blog post.

The complicated politics of meta-analyses supported by professional organizations.

Starting with our “Ain’t necessarily so…” effort, we were taking aim at meta-analyses making broad, enthusiastic claims about the efficacy and readiness for dissemination of psychological interventions. The Society of Behavioral Medicine was enjoying a substantial increase in membership, but, as in other associations dominated by psychologists, the new members were mostly clinicians rather than academic researchers. SBM wanted to offer a branding of “evidence-based” for the psychological interventions for which those clinicians were seeking reimbursement. At the time, insurance companies were challenging whether licensed psychologists should be reimbursed for psychological interventions that were not administered to patients with psychiatric diagnoses.

People involved with the governance of SBM at the time cannot help but be aware of an ugly side to the politics back then. A small amount of money had been given by the NCI to support meta-analyses, and there was quite a struggle over control of its distribution. That the SBM-sponsored meta-analyses were oddly published in the APA journal Health Psychology, rather than in SBM’s Annals of Behavioral Medicine, reflected a bid for the presidency of APA’s Division of Health Psychology by someone who had been told that she could not run for president of SBM. But worse, there was a lot of money, and there were undeclared conflicts of interest, in play.

Someone originally involved in the meta-analysis of interventions for depressive symptoms among cancer patients had received a $10 million grant from Pfizer to develop a means of monitoring cancer surgeons’ inquiring about psychological distress and their offering of interventions. The idea (which was actually later mandated) was that cancer surgeons could not close their electronic records until they had indicated that they had asked the patient about psychological distress. If the patient reported distress, the surgeons had to indicate what intervention was offered. Only then could they close the medical record. Of course, these requirements could be met simply by asking whether a breast cancer patient was distressed and offering her an antidepressant without any formal diagnosis or follow-up. These procedures were mandated as part of the accreditation of facilities providing cancer care.

Psycho-Oncology, the journal with which we skirmished about the meta-analysis of couples interventions, was the official publication of the International Psycho-Oncology Society, another organization dominated by clinicians seeking reimbursement for services to cancer patients.

You can’t always get what you want.

I nonetheless encourage others, particularly early-career investigators, to take up the tools that I offer. Please scrutinize meta-analyses that otherwise would have clinical and public policy recommendations attached to their findings. You may have trouble getting published, and you will be sorely disappointed if you expect to influence the reception of an already published meta-analysis. You can always post your critiques at PubMed Commons.

You will learn important skills, as well as about the politics of trying to publish critiques of papers that are protected as having been “peer reviewed.” If enough of you do this and visibly complain about how ineffectual your efforts have been, we may finally overcome the incumbent advantage and the protection from further criticism that come with getting published.

And bloggers like myself and Hilda Bastian will recognize you and express appreciation.

 

 

Neurobalm: the pseudo-neuroscience of couples therapy

Special thanks to Professor Keith Laws, blogger at LawsDystopiaBlog, and especially to the pseudonymous Neurocritic for their helpful comments. Any excesses or inaccuracies are entirely my own responsibility.

 

You may be more able to debunk bad neuroscience than you think.

In my last blog post, I began critically examining whether emotionally focused couples therapy (EFT) could be said to soothe the brains of wives who had received it.

Claims were made in a peer-reviewed article available here and amplified in a University of Ottawa press release that EFT was a particularly potent form of couples therapy. An fMRI study supposedly demonstrated how EFT changed the way the brain encoded threatening situations.

True love creates resilience, turning off fear and pain in the brain

OTTAWA, May 1, 2014— New research led by Dr. Sue Johnson of the University of Ottawa’s School of Psychology confirms that those with a truly felt loving connection to their partner seem to be calmer, stronger and more resilient to stress and threat.

In the first part of the study, which was recently published in PLOS ONE, couples learned how to reach for their lover and ask for what they need in a “Hold Me Tight” conversation. They learned the secrets of emotional responsiveness and connection.

The second part of the study, summarized here, focused on how this also changed their brain. It compared the activation of the female partner’s brain when a signal was given that an electric shock was pending before and after the “Hold Me Tight” conversation.

The experiment explored three different conditions. In the first, the subject lay alone in a scanner knowing that when she saw a red X on a screen in front of her face there was a 20% chance she would receive a shock to her ankles. In the second, a male stranger held her hand throughout the same procedure. In the third, her partner held her hand. Subjects also pressed a screen after each shock to rate how painful they perceived it to be.

Before the “Hold Me Tight” conversation, even when the female partner was holding her mate’s hand, her brain became very activated by the threat of the shock — especially in areas such as the inferior frontal gyrus, anterior insula, frontal operculum and orbitofrontal cortex, where fear is controlled. These are all areas that process alarm responses. Subjects also rated the shock as painful under all conditions.

However, after the partners were guided through intense bonding conversations (a structured therapy titled Emotionally Focused Couple Therapy or EFT), the brain activation and reported level of pain changed —under one condition. While the shock was again described as painful in the alone and in the stranger hand holding conditions (albeit with some small change compared to before), the shock was described as merely uncomfortable when the husband offered his hand. Even more interesting, in the husband hand-holding condition, the subject’s brain remained calm with minimal activation in the face of threat.

These results support the effectiveness of EFT and its ability to shape secure bonding. The physiological effects are exactly what one would expect from more secure bonding. This study also adds to the evidence that attachment bonds and their soothing impact are a key part of adult romantic love. Results shed new light on other positive findings on secure attachment in adults, suggesting the mechanisms by which safe haven contact fosters more stability and less reactivity to threat.

You can find my succinct deconstruction of the press release here.

I invite you to carefully read the article, or my last blog post and this one. This should prepare you to detect some important signs that this press release is utter nonsense, designed to mislead and falsely impress the clinicians to whom EFT workshops and trainings are marketed. For instance, where in the procedures described in the PLOS One article is there any indication of the “Hold Me Tight” conversation? But that is just the start of the nonsense.

The PLOS One article ends with the claim that this “experiment” was conducted with a rigor comparable to a randomized clinical trial. Reading the article or these blog posts, you should also be able to see that this claim too is utter nonsense.

In my last blog post, I showed a lack of compelling evidence that EFT is better than any other couples treatment. To the extent that EFT has been evaluated at all, the studies are quite small and all were supervised by promoters of EFT. Couples in the EFT studies are recruited to be less maritally dissatisfied than those in other couples therapy research, and there is some evidence that improvement in marital functioning does not persist after therapy ends.

I called attention to the neuroscientist Neurocritic’s caution against expecting fMRI studies to reveal much about the process or effectiveness of psychotherapy that we do not know already.

Of course, we should expect some effects of psychotherapy to be apparent in pre-post therapy fMRI studies. But we should also expect the same of bowling, or of watching a TV series for an equivalent amount of time. Does an fMRI really tell us much more than what we can observe in couples’ behavior or what they report after therapy? And without a comparison group, such studies are not particularly revealing.

The larger problem looming in the background is authors intentionally or unintentionally intimidating readers with glib interpretations of neuroscience. Few readers feel confident in their ability to interpret such claims, especially the therapists to whom author Susan Johnson’s workshops are promoted.

This blog post could surprise you.

Maybe it will reassure you that you possess basic critical faculties with which you can debunk the journal article, if you are willing to commit the time and energy to reading and rereading it with skepticism.

I would settle, however, for leaving you thoroughly confused and skeptical about the claims in the PLOS One article. There are lots of things that do not make sense and that should be confusing if you think about them.

Confusion is a healthy reaction, particularly if the alternative is gullibility and being persuaded by pseudoscience.

I begin by ignoring that this was specifically an fMRI study.  Instead, I will look at some numbers and details of the study that you can readily discover. Maybe you would have had to look some things up on the Internet, but many of you could replicate my efforts.

In the text below, I have inserted some numbers in brackets. If you click on them, you will be taken to a secondary blog site where there are some further explanations.

The 23 wives for whom data were reported are an unrepresentative and highly select subsample of the 666 wives in couples who expressed interest in response to advertisements for the study.

With such a small number of participants:

  • Including or excluding one or two participants can change the results [1]. There is some evidence this could have occurred after initial results were known [2].
  • Any positive significant findings are likely to be false positives, and, of necessity, any findings that do reach significance will be large in magnitude, even when they are false [3]. (A simulation sketched below illustrates the inflation.)
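A quick simulation illustrates the second of these points. The numbers are my own illustrative assumptions, not data from the PLOS ONE study: suppose a modest true within-subject effect of d = 0.3 in a sample of 23. Significance is then reached only a minority of the time, and the estimates that do cross the p < .05 threshold are systematically exaggerated:

    # Simulation: with a small sample, only exaggerated estimates reach p < .05.
    # Assumes a true within-subject effect of d = 0.3 measured in n = 23 wives
    # (the effect size and design are illustrative, not taken from the study).
    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(1)
    n, true_d, n_sims = 23, 0.3, 10000
    significant_ds = []

    for _ in range(n_sims):
        x = rng.normal(loc=true_d, scale=1.0, size=n)        # difference scores
        if ttest_1samp(x, 0.0).pvalue < 0.05:
            significant_ds.append(x.mean() / x.std(ddof=1))  # observed effect size

    print(len(significant_ds) / n_sims)    # power: well under 50%
    print(float(np.mean(significant_ds)))  # mean "significant" d, far above the true 0.3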

The sample was restricted to couples experiencing only mild to moderate marital dissatisfaction. So the study sample was less dissatisfied with their marriages and thus not comparable to couples recruited by other research groups for couples intervention studies.

Given the selection procedure, it was impossible for the authors to obtain a sample of couples with the mean levels of marital dissatisfaction that they reported for baseline assessments.

They stated that they recruited couples with the criterion that their marital dissatisfaction scores initially be between 80 and 96 on the DAS. They then report that the initial mean DAS score was 81.2 (SD = 14.0). Impossible. [4]
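A back-of-the-envelope check, my own arithmetic assuming the stated 80-96 eligibility window was actually enforced, shows why. For 23 scores confined to an interval 16 points wide, the sample standard deviation cannot exceed roughly half that width:

    # Maximum possible sample SD for n = 23 scores bounded between 80 and 96.
    # The SD of bounded values is maximized by piling them at the two endpoints.
    import numpy as np

    n, lo, hi = 23, 80, 96
    extreme = np.array([lo] * (n // 2) + [hi] * (n - n // 2))  # 11 at 80, 12 at 96
    print(extreme.std(ddof=1))  # about 8.2 -- the largest SD the range allows

The reported SD of 14.0 is far beyond that bound, so either the inclusion criterion or the descriptive statistics (or both) cannot be right.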

Yup, and this throws into doubt all the other results that are reported, especially when the authors find they need to explain results that did not emerge, as expected, as simple differences between pre- and post-EFT fMRI, but only in a complex interaction between pre/post fMRI and initial DAS scores.

Couples therapy was continued until some vaguely defined clinical goal had been achieved. None of the details one would expect in a scientific paper were presented about how it was decided that this was enough therapy.

We were not told who decided, by what criteria, or with what interrater reliability the judgments were made. We do know that Susan Johnson, CEO of the nonprofit and profit-making companies promoting EFT, supervised all of the therapy and the study.

Basically, Dr. Johnson was probably able to prolong the therapy and delay the follow-up fMRI assessment until she believed that the wives’ responses would make the therapy look good. And with no further follow-up, she implies that “how the brain processes threat” had been changed, without any evidence as to whether the changes in fMRI persisted or were transient.

This might be fine for the pseudo-magic of a workshop presentation, but it is unacceptable for a peer-reviewed article from which readers are supposed to be able to arrive at an independent judgment. It is also far removed from the experimental control of a clinical trial, in which the timing of follow-up assessments is fixed.

Randomized clinical trials take this kind of control away from investigators and put it into the design, so that the phenomenon being studied can prove the investigators incorrect.

The amount of therapy that these wives received (M = 22.9 sessions, range = 13-35) was substantially more than what was provided in past EFT outcome studies. Whatever therapeutic gains were observed in this sample could not be expected to generalize to past studies. [5]

Despite the therapy that they had received and despite the low levels of marital dissatisfaction with which they had begun, the average couple finishing the study still qualified for entering it. [6]

There is no explanation given for why only the wives’ data are presented. No theoretical or clinical rationale is given for not studying husbands or presenting their data as well. [7]

A great deal is made of whether particular results are statistically significant or not. However, keep in mind that there was a very small sample size and the seemingly sharp distinction between significant and nonsignificant is arbitrary. Certainly, the size of most differences between results characterized as significant versus nonsignificant is not itself statistically significant. [8]

And, as we will see, much is made of small differences that did not occur for all wives, only for those with the lowest initial marital satisfaction.

The number of statistical tests the authors conducted was many times the number of women in the study. The authors do not indicate all the analyses they ran and selectively reported only a subset of them, so there was considerable room for capitalizing on chance.

Multiple statistical tests in a small sample, without adjustment for there being so many tests, is a common complaint about small fMRI studies, but this study is a particularly bad example. Happy cherrypicking!
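To put numbers on “capitalizing on chance”: the figure of well over 100 tests comes from the discussion further below, and the independence assumption here is mine; fMRI tests are typically correlated, which changes the exact numbers but not the point:

    # Expected haul from a fishing expedition: if 100 tests are run at alpha = .05
    # and every null hypothesis is true, how many "findings" turn up by chance?
    # (Assumes independent tests for simplicity.)
    n_tests, alpha = 100, 0.05

    expected_false_positives = n_tests * alpha
    prob_at_least_one = 1 - (1 - alpha) ** n_tests

    print(expected_false_positives)     # 5 spurious "significant" results expected
    print(round(prob_at_least_one, 3))  # 0.994 -- near certainty of at least one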

The article and Johnson’s promotional materials make much of differences that were observed in fMRI data collected before and after therapy. But the article never reports results from actually testing these differences. This is an important discovery. Let’s stop and explore it.

The article leads off its presentation of the fMRI results with

The omnibus test of EFT and handholding on all voxels activated in the original Coan et al. handholding study indicated a significant interaction between EFT, handholding and DAS, F (2, 72.6) = 3.6, p= .03 (Alone x EFT x DAS b= 10.3, SE =3.7; Stranger x EFT x DAS b = 2.5, SE =3.3).

What is oddly missing here is any test of the simple interaction between EFT (before versus after therapy) and handholding, i.e., EFT x handholding. The authors do not tell us whether the overall effects of handholding (partner versus alone versus stranger) differed before versus after completion of EFT, but that is the difference they want to discuss.

Basically, the authors only report interactions between EFT and handholding as qualified by level of initial marital satisfaction.

So? The authors proposed the simple hypothesis that receiving EFT will affect fMRI results in a situation involving threat of pain. They are about to do a very large number of multiple statistical tests and they want to reassure the reader that they are not capitalizing on chance.

For reassurance, they need an interaction between EFT and handholding in the omnibus test. Apparently they did not get it. What they end up doing is going back and forth between whatever few statistical tests are significant from the well over 100 tests that they conducted for pre-/post-fMRI findings. When most of those tests proved nonsignificant, they moved to a more complex interaction of the fMRI results with wives’ initial level of marital satisfaction.

This is a classic fishing expedition, with a high probability that many of the fish should be thrown back as false positives. And the authors do not even have the fishing license that they hoped significant omnibus results would provide.

The article makes repeated references to following up and replicating an earlier study by one of the authors, Jim Coan. That study involved only 16 women selected for higher marital satisfaction, so much so that they were called “supercouples” in press coverage of the study. You can find Neurocritic’s critique of that study here.

The levels of marital satisfaction in the two small samples were discontinuous with each other: any couple eligible for one study would be disqualified from the other by a wide margin. Most of the general population of married people would fall in between these two studies in level of marital satisfaction. And any reference, such as these authors make, to findings for women with low marital satisfaction in the Coan study is bunk. The highly select sample in the Coan study did not include any women with low marital satisfaction.

The two samples are very different, but neither study presented data in a way that allowed direct comparison with the other. Both studies departed from transparent, conventional presentation of data. Maybe the results for the original Coan study were weak as well and were simply covered up. That is suggested in the Neurocritic blog post.

But the problem is worse than that. The authors claim that they preselected the regions of interest (ROIs) based on the results that Coan obtained with his sample of 16 women. If you take the trouble to examine Table 1 of this article and compare it to Coan’s results, you will see that some of the areas of the brain they examined did not produce significant results in Coan’s study. More evidence of a fishing expedition.

It is apparent that the authors changed their hypotheses after seeing the data. They did not expect changes in the stranger condition and scrambled to explain these results. If you jump to the Discussion section concerning fMRI results for the stranger condition, you get a lot of amazing post-hoc gobbledygook as the authors try to justify the results they obtained. They should simply have admitted that their hypothesis was not confirmed.

Figure 2. Point estimates of percent signal change graphed as a function of EFT (pre vs. post) by handholding (alone, stranger, partner) and DAS score.

The graphic representations in Figures 2 and 4 were produced by throwing away two-thirds of the available data [9]. Yup. Each line represents results for two wives. It is unclear what interpretation is possible, except that it appears that, after throwing away all these data, differences between pre- and post-therapy were not apparent for the group that started with higher marital satisfaction. The line is nearly flat in the partner condition, which the authors consider so important.

We do not want to make too much of these graphs because they are based on so few wives. But they do seem to suggest that not much was happening for women with higher marital satisfaction to begin with. And this may be particularly true for the responses when they were holding the hand of their partner. Yikes!


In looking at the graphical representations of the self-report data in Figure 1 and the fMRI data in Figures 3 and 5, pay particular attention to the bracketing +/- zones (the error bars), not just the heights of the bars. Some of the brackets overlap or nearly overlap, and you can see that small differences are being discussed.

And, oh, the neuroscience….

It is helpful to know something about fMRI studies to go much further in evaluating this one. But I can provide you with some light weaponry for dispensing with common nonsense.

First, beware of multiple statistical tests from small samples. The authors reassure us that their omnibus test reduced that threat, but they did not report the relevant results, and they probably did not obtain the results they needed for reassurance. And the results they expected from the omnibus test would not have been much reassurance anyway; they would still be largely capitalizing on chance. The authors also claim that they were testing regions of interest (ROIs), but if you take a careful look, they were testing other regions of the brain as well, and they generally did not replicate much of Coan’s findings from his small study.

Second, beware of suggestions that particular complex mental functions are localized in single regions of the brain, so that a difference in that mental function can be inferred from a specific finding for that region. The tendency of investigators to lapse into such claims has been labeled the new phrenology, phrenology being the 19th-century pseudoscience of reading character from bumps on the skull. The authors of this study lead us into this trap when they attempt to explain, in the discussion section, findings they did not expect.

Third, beware of glib interpretations of activation in a particular region of the brain as meaning that certain mental processes are occurring. It is often hard to tell what activation means. More activity can mean that more mental activity is occurring, or it can mean that the same mental activity requires more effort.

Fourth, beware of investigators claiming that changes in activation observed in fMRI data represent changes in the structure of the brain or in mental processes (in this case, the authors’ claim that the processing of threat had been changed). They are simply changes in activity; they may or may not persist, and they may or may not be compensated for by other changes. Keep in mind that the brain is complex and its functions are interconnected.

Overall, the fMRI results were weak, inconsistent, and obscured by the authors’ failure to report simple pre-post differences in any straightforward fashion. And what is presented really does not allow direct comparison between the earlier Coan study and the present one.

The authors started with the simple hypothesis that fMRI assessments conducted before and after EFT would show changes in wives’ responses to the threat of pain, depending on whether their hand was being held by their partner, a stranger, or no one. Results were inconsistent, and the authors were left struggling with findings that, after a course of EFT, among other things, the wives were more comfortable with their hands being held by a stranger and less comfortable being alone. And, overall, effects that they expected to follow simply from the wives getting EFT were actually limited to the wives who had the lowest marital satisfaction to begin with.

We could continue our analysis by getting into the specific areas of brain functioning for which significant results were or were not obtained. That is a dubious business because so many of the results are likely to be due to chance. If we nonetheless continue, we have to confront post-hoc gobbledygook efforts to explain results like:

In the substantia nigra/red nucleus, threat-related activity was generally greater during stranger than partner handholding, F (1, 47.4) = 6.5, p = .01. In the vmPFC, left NAcc, left pallidum, right insula, right pallidum, and right planum polare, main effects of EFT revealed general decreases from pre- to post- therapy in threat activation, regardless of whose hand was held, all Fs (1, 41.1 to 58.6) > 3.9, all ps < .05.

Okay, now we are talking about seemingly serious neuroscience and fMRI, and you are confused. But you ought to be confused. Even a neuroscientist would be confused, because the authors are not providing a transparent presentation of their findings, only a lot of razzle-dazzle designed to shock and awe, not to inform.

Magneto, the BS-fighting superhero summoned by Neurocritic

In an earlier blog post concerning the PLOS One study, Neurocritic detected nonsense and announced that Magneto, a BS-fighting superhero, was being summoned. But even mighty Magneto was thwarted by the confused presentation of ambiguous results and by the absence of knowledge of what other results had been examined but suppressed because they did not support the story the authors wanted to tell.

I’m not sure that I understand this formulation, or that a dissociation between behavioral self-report and dACC activity warrants a reinterpretation of EFT’s therapeutic effects. Ultimately, I don’t feel like a BS-fighting superhero either, because it’s not clear whether Magneto has effectively corrected the misperceptions and overinterpretations that have arisen from this fMRI research.

Some of you may be old enough to recall Ronald Reagan doing advertisements for General Electric on television. He would always end with “Progress is our most important product.” We have been trying to make sense of neuroscience data being inappropriately used to promote psychotherapy, and have had to deal with all the confusion, contradictory results, and outright cover-up in an article in PLOS One. To paraphrase Reagan, “Confusion is our most important product.” If you are not confused, you don’t sufficiently grasp what is being done in the PLOS One article, the press coverage, and the promotional video.