Sex and the single amygdala: A tale almost saved by a peek at the data

So sexy! Was bringing up ‘risky sex’ merely a strategy to publish questionable and uninformative science?

My continuing question: Can skeptics who are not specialists, but who are science-minded and have some basic skills, learn to quickly screen and detect questionable science in the journals and media coverage?

“You don’t need a weatherman to know which way the wind blows.” – Bob Dylan

I hope so. One goal of my blogging is to arouse readers’ skepticism and provide them some tools so that they can decide for themselves what to believe, what to reject, and what needs a closer look or a check against trusted sources.

Skepticism is always warranted in science, but it is particularly handy when confronting the superficial application of neuroscience to every aspect of human behavior. Neuroscience is increasingly being brought into conversations to sell ideas and products when it is neither necessary nor relevant. Many claims about how the brain is involved are false or exaggerated not only in the media, but in the peer-reviewed journals themselves.

A while ago I showed how a neuroscientist and a workshop guru teamed up to try to persuade clinicians with functional magnetic resonance imaging (fMRI) data that a couples therapy was more sciencey than the rest. Although I took a look at some complicated neuroscience, a lot of my reasoning [1, 2, 3] merely involved applying basic knowledge of statistics and experimental design. I raised sufficient skepticism to dismiss the neuroscientist and psychotherapy guru’s claims, even putting aside the excellent specialist insights provided by Neurocritic and his friend Magneto.

In this issue of Mind the Brain, I’m pursuing another tip from Neurocritic about some faulty neuroscience in need of debunking.

The paper

Victor, E. C., Sansosti, A. A., Bowman, H. C., & Hariri, A. R. (2015). Differential Patterns of Amygdala and Ventral Striatum Activation Predict Gender-Specific Changes in Sexual Risk Behavior. The Journal of Neuroscience, 35(23), 8896-8900.

Unfortunately, the paper is behind a pay wall. If you can’t get it through a university library portal, you can send a request for a PDF to the corresponding author, elizabeth.victor@duke.edu.

The abstract

Although the initiation of sexual behavior is common among adolescents and young adults, some individuals express this behavior in a manner that significantly increases their risk for negative outcomes including sexually transmitted infections. Based on accumulating evidence, we have hypothesized that increased sexual risk behavior reflects, in part, an imbalance between neural circuits mediating approach and avoidance in particular as manifest by relatively increased ventral striatum (VS) activity and relatively decreased amygdala activity. Here, we test our hypothesis using data from seventy 18- to 22-year-old university students participating in the Duke Neurogenetics Study. We found a significant three-way interaction between amygdala activation, VS activation, and gender predicting changes in the number of sexual partners over time. Although relatively increased VS activation predicted greater increases in sexual partners for both men and women, the effect in men was contingent on the presence of relatively decreased amygdala activation and the effect in women was contingent on the presence of relatively increased amygdala activation. These findings suggest unique gender differences in how complex interactions between neural circuit function contributing to approach and avoidance may be expressed as sexual risk behavior in young adults. As such, our findings have the potential to inform the development of novel, gender-specific strategies that may be more effective at curtailing sexual risk behavior.

My thought processes

Hmm, sexual risk behavior, meaning number of partners? How many new partners during a follow-up period constitutes “risky,” and does it matter whether safe sex was practiced? Well, ignoring these issues and calling it “sexual risk behavior” allows the authors to claim relevance to hot topics like HIV prevention….

But let’s cut to the chase: I’m always skeptical about a storyline depending on a three-way statistical interaction. These effects are highly unreliable, particularly in a sample size of only N = 70. I’m suspicious when investigators stake their claims ahead of time on a three-way interaction rather than something simpler. I will be looking for evidence that they started with this hypothesis in mind, rather than cooking it up after peeking at the data.

Three-way interactions involve dividing a sample into eight boxes, in this case, 2 x 2 x 2. Such interactions can be mind-boggling to interpret, and this one is no exception:

Although relatively increased VS activation predicted greater increases in sexual partners for both men and women, the effect in men was contingent on the presence of relatively decreased amygdala activation and the effect in women was contingent on the presence of relatively increased amygdala activation.
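To feel the problem, consider a quick back-of-the-envelope simulation (with made-up dichotomized variables, not the authors’ data) of how 70 people scatter across the eight cells of a 2 x 2 x 2 design:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 70

# Hypothetical dichotomized factors standing in for gender and
# median-split VS and amygdala activation (invented for illustration)
gender = rng.integers(0, 2, n)
vs_high = rng.integers(0, 2, n)
amygdala_high = rng.integers(0, 2, n)

# Count how many participants land in each of the 2 x 2 x 2 = 8 cells
cells = {}
for g, v, a in zip(gender, vs_high, amygdala_high):
    cells[(g, v, a)] = cells.get((g, v, a), 0) + 1

for cell in sorted(cells):
    print(cell, cells[cell])
```

With only about nine participants expected per cell, and fewer still given that just 24 of the 70 were men, any cell-level pattern is at the mercy of one or two individuals.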

And then the “simple” interpretation?

These findings suggest unique gender differences in how complex interactions between neural circuit function contributing to approach and avoidance may be expressed as sexual risk behavior in young adults.

And the public health implications?

As such, our findings have the potential to inform the development of novel, gender-specific strategies that may be more effective at curtailing sexual risk behavior.

Just how should these data inform public health strategies beyond what we knew before we stumbled upon this article? Really, should we stick people’s heads in a machine and gather fMRI data before offering them condoms? Should we encourage computer dating services to post, along with a recent headshot, recent fMRI images showing that prospective dates do not have their risky behavior center in the amygdala activated? Or encourage young people to get their heads examined with an fMRI before deciding whether it’s wise to sleep with somebody new?

So it’s difficult to see the practical relevance of these findings, but let’s stick around and consider the paragraph that Neurocritic singled out.

The paragraph

The majority of the sample reported engaging in vaginal sex at least once in their lifetime (n = 42, 60%). The mean number of vaginal sexual partners at baseline was 1.28 (SD =0.68). The mean increase in vaginal sexual partners at the last follow-up was 0.71 (SD = 1.51). There were no significant differences between men and women in self-reported baseline or change in self-reported number of sexual partners (t=0.05, p=0.96; t=1.02, p= 0.31, respectively). Although there was not a significant association between age and self-reported number of partners at baseline (r = 0.17, p= 0.16), younger participants were more likely to report a greater increase in partners over time (r =0.24, p =0.04). Notably, distribution analyses revealed two individuals with outlying values (3 SD from M; both subjects reported an increase in 8 partners between baseline and follow up). Given the low rate of sexual risk behavior reported in the sample, these outliers were not excluded, as they likely best represent young adults engaging in sexual risk behavior.

What triggers skepticism?

This paragraph is quite revealing if we just ponder it a bit.

First, notice there is only a single significant correlation (p = 0.04), and it comes from a supplementary analysis. Differences between men and women were examined, and none were found, in either baseline number of sexual partners or change in number of partners over the observation period. Undeterred, the authors went on to correlate age with change in number of partners over time and, bingo, there was their p = 0.04.

Whoa! Age was never mentioned in the abstract. We are now beyond the 2 x 2 x 2 interaction mentioned in the abstract and rooting through another dimension, younger versus older.

But, worse, getting that significance required retaining two participants with eight new sexual partners each during the follow-up period. The decision to retain these participants was made after the pattern of results was examined with and without inclusion of these outliers. The authors say so and essentially say they decided because it made a better story.

The only group means and standard deviations reported include these two participants. Even with them included, the average number of new sexual partners over the follow-up was less than one. We have no idea whether that one new partner was risky or not. It’s a safer assumption that having eight new partners is risky, but even that we don’t know for sure.

Keep in mind for future reference: Investigators are supposed to make decisions about outliers without reference to the fate of the hypothesis being studied. And knowing nothing about this particular study, most authorities would say if two people out of 70 are way out there on a particular variable that otherwise has little variance, you should exclude them.
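A toy simulation (with invented numbers, not the study’s data) shows how much leverage two such outliers can exert on a correlation in a sample of this size:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 68 participants, 0-2 new partners, no real link with age
age = rng.uniform(18, 22, 68)
partners = rng.integers(0, 3, 68).astype(float)

def r(x, y):
    """Pearson correlation of two arrays."""
    return np.corrcoef(x, y)[0, 1]

r_without = r(age, partners)

# Now add two young outliers reporting 8 new partners each
age_all = np.append(age, [18.2, 18.5])
partners_all = np.append(partners, [8.0, 8.0])
r_with = r(age_all, partners_all)

print(f"r without outliers: {r_without:.2f}")
print(f"r with outliers:    {r_with:.2f}")
```

The correlation without the outliers hovers near zero, while the two added points drag it sharply in one direction, which is exactly the situation in which an inclusion/exclusion decision decides the story.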

It is considered a Questionable Research Practice to make decisions about inclusion/exclusion based on what story the outcome of this decision allows the authors to tell. It is p-hacking and significance chasing.

And note the distribution of numbers of vaginal sex partners. Twenty-eight participants had none at the end of the study. Most accumulated fewer than one new partner during the follow-up, and even that mean was distorted by the two participants with eight partners each. Hmm, it is going to be hard to get multivariate statistics to work appropriately when we get to the fancy neuroscience data. We could go off on discussions of multivariate normal or Poisson distributions, or just think a bit.

We can do a little detective work and determine that one outlier was a male, the other a female. (*1) Let’s go back to our eight little boxes of participants involved in the interpretation of the three-way interaction. It’s going to make a great difference exactly which boxes the deviant male and female are dropped into, or whether they are left out.

And think about sampling issues. What if, for reasons having nothing to do with the study, neither of these outliers had shown up? Or if only one of them had shown up, the results would be skewed in a particular direction, depending on whether that participant was the male or the female.

Okay, if we were wasting our time continuing to read the article after finding what we did in the abstract, we are certainly wasting more of our time by continuing after reading this paragraph. But let’s keep poking around as an educational exercise.

The rest of the methods and results sections

We learn from the methods section that there was an ethnically diverse sample with a highly variable follow-up (M = 188.72 d, SD = 257.15; range = 0 d to 3.9 years). And there were only 24 men in the paper’s sample of 70 participants.

We don’t know whether these two outliers accumulated their eight sexual partners within a week of the first assessment or were only captured by extending the study to almost 4 years. That matters somewhat, but we also have to worry whether this was an appropriate sample – with so few participants in it in the first place and even fewer who had sex by the end of the study – and an appropriate length of follow-up for such a study. The mean follow-up of about six months and the huge standard deviation suggest there is not a lot of evidence of risky behavior, at least in terms of casual vaginal sex.

This is all getting very funky.

So I wondered about the larger context of the study, with increasing doubts that the authors had gone to all this trouble just to test an a priori hypothesis about risky sex.

We are told that the larger context is the ongoing “Duke Neurogenetics Study (DNS), which assesses a wide range of behavioral and biological traits.” The extensive list of inclusions and exclusions suggests a much more ambitious study. If we had more time, we could go look up the Duke Neurogenetics Study and see if that’s the case. But I have a strong suspicion that the study was not organized around the specific research questions of this paper. (*2) I really can’t tell without preregistration of this particular paper, but I certainly have questions about how much Hypothesizing After the Results Are Known (HARKing) went into the refining of hypotheses and measures, and into the decisions about which data to report.

Further explorations of the results section

I remind readers that I know little about fMRI data. Put that aside, and we can still discover some interesting things reading through the brief results section.

Main effects of task

As expected, our fMRI paradigms elicited robust affect-related amygdala and reward-related VS activity across the entire parent sample of 917 participants (Fig. 1). In our substudy sample of 70 participants, there were no significant effects of gender (t(70) values < 0.88, p values >0.17) or age (r values < 0.22; p values > 0.07) on VS or amygdala activity in either hemisphere.

Hmm, let’s focus on the second sentence first. The authors tell us absolutely nothing is going on in terms of differences in amygdala and reward-related VS activity in relation to age and gender in the sample of 70 participants in the current study. In fact, we don’t even need to know what “amygdala and reward-related VS activity” is to wonder why the first sentence of this paragraph directs us to a graph not of the 70 participants, but of a larger sample of 917 participants. And when we go to Figure 1, we see some wild wowie zowie, hit-the-reader-between-the-eyes differences (in technical terms, intraocular trauma) for women. And claims of p < 0.000001 twice. But wait! One might think significance of that magnitude would have to come from the 917 participants, except the labeling of the X-axis must come from the substudy of the 70 participants for whom data concerning number of sex partners were collected. Maybe the significance comes from the anchoring of one of the graph lines by the one way-out outlier.

Note that the outlier woman with eight partners anchors the blue line for High Left Amygdala. Without inclusion of that single woman, the nonsignificant trends between women with High Left Amygdala versus women with Low Left Amygdala would be reversed.

The authors make much of the differences between Figure 1, showing results for women, and Figure 2, showing results for men. The comparison seems dramatic except that, once again, the one outlier sends the red line for Low Left Amygdala off from the blue line for High Left Amygdala. Otherwise, there is no story to tell. Mind-boggling, but I think we can safely conclude that something is amiss in these Frankenstein graphs.

Okay, we should stop beating a corpse of an article. There are no vital signs left.

Alternatively, we could probe the section on Poisson regressions and minimally note some details. There is the flash of some strings of zeros in the p values, but it seems complicated, and then we are warned off with “no factors survive Bonferroni correction.” And then in the next paragraph, we get to exploring dubious interactions. And there is the final insult of the authors bringing in a two-way interaction trending toward significance among men, p = .051.

But we were never told how all this would lead, as promised at the end of the abstract, “to the development of novel, gender-specific strategies that may be more effective at curtailing sexual risk behavior.”

Rushing through the discussion section, we note the disclosure that

The nature of these unexpected gender differences is unclear and warrants further consideration.

So, the authors confess that they did not start with expectations of finding a gender difference. They had nothing to report from a subset of data from an ambitious project put together for other purposes, with a follow-up (and even an experimental task) ill-suited to the research question. They made a decision to include two outliers, salvaged some otherwise weak and inconsistent differences, and then constructed a story that depended on their inclusion. Bingo, they get past reviewers’ confirmation bias and into print.

Readers might have been left with just their skepticism about the three-way interaction described in the abstract. However, the authors implicated themselves by disclosing their examination of the distribution and their reasons for including the outliers. Then they further disclosed that they did not start with a hypothesis about gender differences.

Why didn’t the editor and reviewers at Journal of Neuroscience (impact factor 6.344) do their job and cry foul? Questionable research practices (QRPs) are brought to us courtesy of questionable publication practices (QPPs).

And then we end with the confident conclusion:

These limitations notwithstanding, our current results suggest the importance of considering gender-specific patterns of interactions between functional neural circuits supporting approach and avoidance in the expression of sexual risk behavior in young adults.

Yet despite this vague claim, the authors still haven’t explained how this research could be translated to practice.

Takeaway points for the future

Without a tip from Neurocritic, I might not have zeroed in on the dubious complex statistical interaction on which the storyline in the abstract depended. I also benefited from the authors, for whatever reason, telling us that they had peeked at the data, and telling us further in the discussion that they had not anticipated the gender difference. With current standards for transparency and no preregistration of such studies, it would have been easy to miss what was done, because the authors did not need to alert us. Until more and better standards are enforced, we just need to be extra skeptical of claims about the application of neuroscience to everyday life.

Trust your skepticism.

Apply whatever you know about statistics and experimental methods. You probably know more than you think you do.

Beware of modest-sized neuroscience studies for which authors develop storylines from patterns they discover in their data, not from a priori hypotheses suggested by theory. If you keep looking around in the scientific literature and media coverage of it, I think you will find a lot of these QRPs and QPPs.

Don’t go into a default believe-it mode just because an article is peer-reviewed.

Notes

  1. If both the outliers were of the same gender, it would have been enough for that gender to have had significantly more sex partners than the other.
  2. Later we are told in the Discussion section that the particular stimuli for which fMRI data were available were not chosen for relevance to the research question claimed for this paper.

We did not measure VS and amygdala activity in response to sexually provocative stimuli but rather to more general representations of reward and affective arousal. It is possible that variability in VS and amygdala activity to such explicit stimuli may have different or nonexistent gender-specific patterns that may or may not map onto sexual risk behaviors.

Special thanks to Neurocritic for suggesting this blog post and for feedback, as well as to Neuroskeptic, Jessie Sun, and Hayley Jach for helpful feedback. However, @CoyneoftheRealm bears sole responsibility for any excesses or errors in this post.

Neurobalm: the pseudo-neuroscience of couples therapy

Special thanks to Professor Keith Laws, blogger at LawsDystopiaBlog, and especially the pseudonymous Neurocritic for their helpful comments. But any excesses or inaccuracies are entirely my own responsibility.

 

You may be more able to debunk bad neuroscience than you think.

In my last blog post, I began critically examining whether emotionally focused couples therapy (EFT) could be said to soothe the brains of wives who had received it.

Claims were made in a peer-reviewed article available here and amplified in a University of Ottawa press release that EFT was a particularly potent form of couples therapy. An fMRI study supposedly demonstrated how EFT changed the way the brain encoded threatening situations.

True love creates resilience, turning off fear and pain in the brain

OTTAWA, May 1, 2014— New research led by Dr. Sue Johnson of the University of Ottawa’s School of Psychology confirms that those with a truly felt loving connection to their partner seem to be calmer, stronger and more resilient to stress and threat.

In the first part of the study, which was recently published in PLOS ONE, couples learned how to reach for their lover and ask for what they need in a “Hold Me Tight” conversation. They learned the secrets of emotional responsiveness and connection.

The second part of the study, summarized here, focused on how this also changed their brain. It compared the activation of the female partner’s brain when a signal was given that an electric shock was pending before and after the “Hold Me Tight” conversation.

The experiment explored three different conditions. In the first, the subject lay alone in a scanner knowing that when she saw a red X on a screen in front of her face there was a 20% chance she would receive a shock to her ankles. In the second, a male stranger held her hand throughout the same procedure. In the third, her partner held her hand. Subjects also pressed a screen after each shock to rate how painful they perceived it to be.

Before the “Hold Me Tight” conversation, even when the female partner was holding her mate’s hand, her brain became very activated by the threat of the shock — especially in areas such as the inferior frontal gyrus, anterior insula, frontal operculum and orbitofrontal cortex, where fear is controlled. These are all areas that process alarm responses. Subjects also rated the shock as painful under all conditions.

However, after the partners were guided through intense bonding conversations (a structured therapy titled Emotionally Focused Couple Therapy or EFT), the brain activation and reported level of pain changed —under one condition. While the shock was again described as painful in the alone and in the stranger hand holding conditions (albeit with some small change compared to before), the shock was described as merely uncomfortable when the husband offered his hand. Even more interesting, in the husband hand-holding condition, the subject’s brain remained calm with minimal activation in the face of threat.

These results support the effectiveness of EFT and its ability to shape secure bonding. The physiological effects are exactly what one would expect from more secure bonding. This study also adds to the evidence that attachment bonds and their soothing impact are a key part of adult romantic love. Results shed new light on other positive findings on secure attachment in adults, suggesting the mechanisms by which safe haven contact fosters more stability and less reactivity to threat.

You can find my succinct deconstruction of the press release here.

I invite you to carefully read the article, or my last blog post and this one. This should prepare you to detect some important signs that this press release is utter nonsense, designed to mislead and falsely impress clinicians to whom EFT workshops and trainings are marketed. For instance, where in the procedures described in the PLOS One article is there any indication of the “Hold Me Tight” conversation? But that is just the start of the nonsense.

The PLOS One article ends with the claim that this “experiment” was conducted with a rigor comparable to a randomized clinical trial. Reading the article or these blog posts, you should also be able to see that this claim too is utter nonsense.

In my last blog post, I showed a lack of compelling evidence that EFT was better than any other couples treatment. To the extent that EFT has been evaluated at all, the studies are quite small and all supervised by promoters of EFT. Couples in the EFT studies are recruited to be less maritally dissatisfied than in other couples therapy research, and there is some evidence that improvement in marital functioning does not persist after therapy ends.

I called attention to the neuroscientist Neurocritic’s caution against expecting fMRI studies to reveal much about the process or effectiveness of psychotherapy that we do not know already.

Of course, we should expect some effects of psychotherapy to be apparent in pre-post therapy fMRI studies. But we should also expect the same of bowling or watching a TV series for an equivalent amount of time. Are we really getting much more from an fMRI than what we can observe in couples’ behavior or what they report after therapy? And without a comparison group, such studies are not particularly revealing.

The larger problem looming in the background is authors intentionally or unintentionally intimidating readers with glib interpretations of neuroscience. Few readers feel confident in their ability to interpret such claims, especially the therapists to whom author Susan Johnson’s workshops are promoted.

This blog post could surprise you.

Maybe it will reassure you that you possess basic critical faculties with which you can debunk the journal article – if you are willing to commit the time and energy to reading and rereading it with skepticism.

I would settle, however, for leaving you thoroughly confused and skeptical about the claims in the PLOS One article. There are lots of things that do not make sense and that should be confusing if you think about them.

Confusion is a healthy reaction, particularly if the alternative is gullibility and being persuaded by pseudoscience.

I begin by ignoring that this was specifically an fMRI study. Instead, I will look at some numbers and details of the study that you can readily discover. Maybe you would have had to look some things up on the Internet, but many of you could replicate my efforts.

In the text below, I have inserted some numbers in brackets. If you click on them, you will be taken to a secondary blog site where there are some further explanations.

The 23 wives for whom data were reported are an unrepresentative and highly select subsample of the 666 wives in couples who expressed interest in response to advertisements for the study.

With such a small number of participants–

  • Including or excluding one or two participants can change results [1]. There is some evidence this could have occurred after initial results were known [2].
  • Any positive significant findings are likely to be false, and of necessity, significant findings will be large in magnitude, even when false positives [3].
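That second point is the statistical “winner’s curse,” and it is easy to demonstrate with a simulation (illustrative numbers only): when a small study of a modest true effect happens to reach significance, the observed effect is necessarily inflated.

```python
import numpy as np

rng = np.random.default_rng(2)

n, true_d, sims = 23, 0.2, 20_000

# Simulate many small studies (n = 23, like the wives sample) of a
# modest true effect of 0.2 SD units
samples = rng.normal(true_d, 1.0, size=(sims, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
t = means / ses

# Keep only the studies reaching |t| > 2 (roughly p < .05, two-sided)
sig = np.abs(t) > 2.0

print(f"share of studies reaching significance: {sig.mean():.2f}")
print(f"true effect: {true_d}")
print(f"mean observed effect among significant studies: {means[sig].mean():.2f}")
```

Only a minority of such studies reach significance at all, and those that do report an effect roughly double the true one. Significance filtering in small samples guarantees exaggeration.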

The sample was restricted to couples experiencing only mild to moderate marital dissatisfaction. So, the study sample was less dissatisfied with their marriages, i.e., not comparable to those recruited by other research groups for couples intervention studies.

Given the selection procedure, it was impossible for the authors to obtain a sample of couples with the mean levels of marital dissatisfaction that they reported for baseline assessments.

They stated that they recruited couples with the criterion that their marital dissatisfaction initially be between 80 and 96 on the DAS. They then report that the initial mean DAS score was 81.2 (SD = 14.0). Impossible. [4]
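The arithmetic behind that “Impossible” is worth spelling out. If every score lies between 80 and 96, the standard deviation is largest when scores pile up at the two endpoints, and even then it cannot exceed half the range:

```python
# Sanity check: maximum possible SD for scores confined to [80, 96]
lo, hi = 80, 96

# The population SD is maximized by a 50/50 split between the endpoints,
# giving half the range; the sample SD (n - 1 denominator) is only
# marginally larger for a sample of 23.
max_sd = (hi - lo) / 2

print(f"maximum possible SD: {max_sd}")  # 8.0, far below the reported 14.0
```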

Yup, and this throws into doubt all the other results that are reported, especially when the expected differences between pre- and post-EFT fMRI did not occur, and effects emerged only in a complex interaction between pre/post fMRI and initial DAS scores.

Couples therapy was continued until some vaguely defined clinical goal had been achieved. None of the details one would expect in a scientific paper were presented for how it was decided that this was enough therapy.

We were not told who decided, by what criteria, or with what interrater reliability the judgments were made. We do know Susan Johnson, CEO of the nonprofit and profit-making companies promoting EFT supervised all therapy and the study.

Basically, Dr. Johnson was probably able to prolong the therapy and the follow-up fMRI assessment until she believed that the wives’ responses would make the therapy look good. And with no further follow-up, she implies that “how the brain processes threat” had been changed, without any evidence as to whether the changes in fMRI persisted or were transient.

This might be fine for the pseudo-magic of a workshop presentation, but it is unacceptable for a peer-reviewed article from which readers are supposed to be able to arrive at an independent judgment. And it is far removed from the experimental control of a clinical trial, in which the timing of follow-up assessments is fixed.

Randomized clinical trials take this kind of control away from investigators and put it into the design and the phenomenon being studied so that maybe investigators can be proved incorrect.

The amount of therapy that these wives received (M = 22.9, range = 13–35) was substantially more than what was provided in past EFT outcome studies. Whatever therapeutic gains were observed in the sample could not be expected to generalize to past studies. [5]

Despite the therapy that they had received and despite the low levels of marital dissatisfaction with which they had begun, the average couple finishing the study still qualified for entering it. [6]

There is no explanation given for why only the wives’ data are presented. No theoretical or clinical rationale is given for not studying husbands or presenting their data as well. [7]

A great deal is made of whether particular results are statistically significant or not. However, keep in mind that there was a very small sample size and the seemingly sharp distinction between significant and nonsignificant is arbitrary. Certainly, the size of most differences between results characterized as significant versus nonsignificant is not itself statistically significant. [8]
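That last point, that the difference between “significant” and “nonsignificant” is not itself significant, deserves a worked example (with made-up numbers):

```python
import math

# Two estimated effects with the same standard error
se = 1.0
effect_a, effect_b = 2.2 * se, 1.2 * se  # z = 2.2 ("significant"), z = 1.2 ("not")

# Test of the *difference* between the two effects
diff = effect_a - effect_b
se_diff = math.sqrt(se**2 + se**2)
z_diff = diff / se_diff

print(f"z for difference: {z_diff:.2f}")  # about 0.71, nowhere near 1.96
```

One effect clears the p < .05 bar and the other does not, yet a direct test of their difference comes nowhere near significance. Sorting results into “significant” and “nonsignificant” bins and narrating the contrast is not a statistical comparison.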

And, we will see, much is being made of small differences that did not occur for all wives, only those initially with the lowest marital satisfaction.

The number of statistical tests conducted was many times the number of women in the study. The authors do not indicate all the analyses they conducted and selectively reported a subset of them, leaving considerable room for capitalizing on chance.

Multiple statistical tests in a small sample, without adjustment for there being so many tests, are a common complaint about small fMRI studies, but this study is a particularly bad example. Happy cherrypicking!
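A quick simulation (purely illustrative) shows why uncorrected multiple testing is such a gift to cherrypickers: even when nothing at all is going on, 100 tests reliably throw up a handful of “significant” results.

```python
import numpy as np

rng = np.random.default_rng(3)

n_tests, n_subjects = 100, 23

# 100 independent null "voxel" tests on 23 subjects: no true effects anywhere
data = rng.normal(0.0, 1.0, size=(n_tests, n_subjects))
t = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n_subjects))

uncorrected_hits = int((np.abs(t) > 2.07).sum())  # ~ p < .05 for df = 22
bonferroni_hits = int((np.abs(t) > 4.0).sum())    # ~ p < .05 / 100

print(f"false positives without correction: {uncorrected_hits}")
print(f"false positives with Bonferroni:    {bonferroni_hits}")
```

Around five uncorrected false positives are expected by chance alone, plenty of raw material for a storyline; a Bonferroni threshold wipes them out.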

The article and Johnson’s promotional materials make much of differences that were observed from fMRI data collected before and after therapy. But the article never reports results from actually testing these differences. This is an important discovery. Let’s stop and explore it.

The article leads off its presentation of the fMRI results with

The omnibus test of EFT and handholding on all voxels activated in the original Coan et al. handholding study indicated a significant interaction between EFT, handholding and DAS, F (2, 72.6) = 3.6, p= .03 (Alone x EFT x DAS b= 10.3, SE =3.7; Stranger x EFT x DAS b = 2.5, SE =3.3).

What is oddly missing here is any test of the simple interaction between EFT (before versus after therapy) and handholding, i.e., EFT x handholding. The authors do not tell us whether the overall effects of handholding (partner versus stranger versus alone) differed before versus after completion of EFT, but that is the difference they want to discuss.

Basically, the authors only report interactions between EFT and handholding as qualified by level of initial marital satisfaction.

So? The authors proposed the simple hypothesis that receiving EFT would affect fMRI results in a situation involving threat of pain. They are about to do a very large number of statistical tests and they want to reassure the reader that they are not capitalizing on chance.

For reassurance, they needed an interaction between EFT and handholding in the omnibus test. Apparently they did not get it. What they end up doing is jumping back and forth among whichever few of the well over 100 tests they conducted on pre-/post-fMRI findings came out significant. When most of those tests proved nonsignificant, they retreated to a more complex interaction: fMRI results qualified by wives’ initial level of marital satisfaction.

This is a classic fishing expedition, with a high probability that many of the fish should be thrown back as false positives. And the authors do not even have the fishing license that significant omnibus results would have provided.

The article makes repeated references to following up and replicating an earlier study by one of the authors, Jim Coan. That study involved only 16 women selected for high marital satisfaction, so much so that they were called “supercouples” in press coverage of the study. You can find Neurocritic’s critique of that study here.

The levels of marital satisfaction in the two small samples were discontinuous with each other: any couple eligible for one study would be disqualified from the other by a wide margin. Most of the general population of married people would fall between these two studies in level of marital satisfaction. And any reference, such as these authors make, to findings for women with low marital satisfaction in the Coan study is bunk. Coan’s highly select sample did not include any women with low marital satisfaction.

The two samples are very different, but neither study presented data in a way that allowed direct comparison with the other. Both studies departed from transparent, conventional presentation of data. Maybe the results of the original Coan study were weak as well and were simply covered up; that is suggested in the Neurocritic blog post.

But the problem is worse than that. The authors claim that they preselected their regions of interest (ROIs) based on the results Coan obtained with his sample of 16 women. If you take the trouble to compare Table 1 of this article with Coan’s results, you will see that some of the brain areas they examined did not produce significant results in Coan’s study. More evidence of a fishing expedition.

It is apparent that the authors changed their hypotheses after seeing the data. They did not expect changes in the stranger condition and scrambled to explain these results. If you jump to the Discussion section concerning fMRI results for the stranger condition, you get a lot of amazing post-hoc gobbledygook as the authors try to justify the results they obtained. They should simply have admitted that their hypothesis was not confirmed.

Figure 2. Point estimates of percent signal change graphed as a function of EFT (pre vs. post) by handholding (alone, stranger, partner) and DAS score.

The graphic representations in Figures 2 and 4 were produced by throwing away two thirds of the available data [9]. Yup. Each line represents results for just two wives. It is unclear what interpretation is possible, except that even after all this data was discarded, differences between pre- and post-therapy were not apparent for the group that started with higher marital satisfaction. Their line is nearly flat in the partner condition, which the authors consider so important.

We do not want to make too much of these graphs because they are based on so few wives. But they do seem to suggest that not much was happening for women with higher marital satisfaction to begin with. And this may be particularly true for the responses when they were holding the hand of their partner. Yikes!


In looking at the graphical representations of the self-report data in Figure 1 and the fMRI data in Figures 3 and 5, pay particular attention to the error bars, not just the heights of the bars. Some of the error bars overlap, or nearly so, which shows just how small the differences under discussion are.

And, oh, the neuroscience….

It is helpful to know something about fMRI studies to go much further in evaluating this one. But I can provide you with some light weaponry for dispensing with common nonsense.

First, beware of multiple statistical tests in small samples. The authors reassure us that their omnibus test reduced that threat, but they did not report the relevant results, and they probably did not obtain the results they needed for reassurance. Even the results they expected from the omnibus test would not have offered much reassurance; they would still have been largely capitalizing on chance. The authors also claim that they were testing regions of interest (ROIs), but on careful inspection they were testing other regions of the brain as well, and they generally did not replicate much of Coan’s findings from his small study.

Second, beware of suggestions that particular complex mental functions are localized in single regions of the brain, such that a difference in that mental function can be inferred from a specific finding for that region. The tendency of investigators to lapse into such claims has been labeled the new phrenology, phrenology being the 19th-century pseudoscience of reading character from bumps on the skull. The authors of this study lead us into this trap when they attempt to explain unexpected findings in the discussion section.

Third, beware of glib interpretations that activation of a particular brain region means certain mental processes are occurring. It is often hard to tell what activation means: more activity can mean more mental activity is occurring, or it can mean the same mental activity requires more effort.

Fourth, beware of investigators claiming that changes in activation observed in fMRI data represent changes in the structure of the brain or in mental processes (in this case, the authors’ claim that processing of threat had been changed). They are simply changes in activity; they may or may not persist, and they may or may not be compensated for by other changes. Keep in mind that the brain is complex and its functions are interconnected.

Overall, the fMRI results were weak, inconsistent, and obscured by the authors’ failure to report simple pre-post differences in any straightforward fashion. And what is presented really does not allow direct comparison between the earlier Coan study and the present one.

The authors started with the simple hypothesis that fMRI assessments conducted before and after EFT would show changes in wives’ response to threat of pain, depending on whether their hand was held by their partner, a stranger, or no one. Results were inconsistent, and the authors were left struggling with findings that, after a course of EFT, the wives were, among other things, more comfortable having their hands held by a stranger and less comfortable being alone. And the results they expected to follow simply from the wives getting EFT were actually limited to wives who got EFT but who had the lowest marital satisfaction to begin with.

We could continue our analysis by getting into the specific areas of brain functioning for which significant results were or were not obtained. That is a dubious business, because so many of the results are likely to be due to chance. If we nonetheless continue, we have to confront post-hoc gobbledygook efforts to explain results like

In the substantia nigra/red nucleus, threat-related activity was generally greater during stranger than partner handholding, F (1, 47.4) = 6.5, p = .01. In the vmPFC, left NAcc, left pallidum, right insula, right pallidum, and right planum polare, main effects of EFT revealed general decreases from pre- to post- therapy in threat activation, regardless of whose hand was held, all Fs (1, 41.1 to 58.6) > 3.9, all ps < .05.

Okay, now we have started talking about seemingly serious neuroscience and fMRI, and you are confused. But you ought to be confused. Even a neuroscientist would be confused, because the authors are not providing a transparent presentation of their findings, only a lot of razzle-dazzle designed to shock and awe, not to inform.

Magneto, the BS-fighting superhero summoned by Neurocritic

In an earlier blog post concerning the PLOS One study, Neurocritic detected nonsense and announced that Magneto, a BS-fighting superhero, was being summoned. But even mighty Magneto was thwarted by the confused presentation of ambiguous results and by not knowing what other results had been examined but suppressed because they did not support the story the authors wanted to tell.

I’m not sure that I understand this formulation, or that a dissociation between behavioral self-report and dACC activity warrants a reinterpretation of EFT’s therapeutic effects. Ultimately, I don’t feel like a BS-fighting superhero either, because it’s not clear whether Magneto has effectively corrected the misperceptions and overinterpretations that have arisen from this fMRI research.

Some of you may be old enough to recall Ronald Reagan doing advertisements for General Electric on television. He would always end with “Progress is our most important product.” We have been trying to make sense of neuroscience data being inappropriately used to promote psychotherapy, and have had to deal with all the confusion, contradictory results, and outright cover-up in an article in PLOS One. To paraphrase Reagan, “Confusion is our most important product.” If you are not confused, you don’t sufficiently grasp what is being done in the PLOS One article and in the press coverage and promotional video.