Trusted source? The Conversation tells migraine sufferers that child abuse may be at the root of their problems

Patients and family members face a challenge obtaining credible, evidence-based information about health conditions from the web.

Migraine sufferers have a particularly acute need because their condition is often inadequately self-managed without access to the best available treatment approaches. Demoralized by the failure of past efforts to get relief, some sufferers may give up consulting professionals and desperately seek solutions on the Internet.

A lot of both naïve and exploitative quackery awaits them.

Even well-educated patients cannot always distinguish the credible from the ridiculous.

One search strategy is to rely on websites that have proven themselves as trusted sources.

The Conversation has promoted itself as such a trusted source, but its brand is tarnished by recent nonsense we will review concerning the role of child abuse in migraines.

Despite some excellent material that has appeared in other articles in The Conversation, I’m issuing a reader’s advisory:

The Conversation cannot be trusted because this article shamelessly misinforms migraine sufferers that child abuse could be at the root of their problems.

The Conversation article concludes with a non sequitur that steers sufferers and their primary care physicians away from consulting the medical specialists who are best able to improve management of a complex condition.

 

The Conversation article tells us:

Within a migraine clinic population, clinicians should pay special attention to those who have been subjected to maltreatment in childhood, as they are at increased risk of being victims of domestic abuse and intimate partner violence as adults.

That’s why clinicians should screen migraine patients, and particularly women, for current abuse.

This blog post identifies clickbait, manipulation, misapplied buzz terms, and misinformation in The Conversation article.

Perhaps the larger message of this blog post is that people with complex medical conditions, and those who provide formal and informal care for them, should not rely solely on what they find on the Internet. This close reading of The Conversation article demonstrates why.

Hopefully, The Conversation will issue a correction, as they promise to do at the website when errors are found.

We are committed to responsible and ethical journalism, with a strict Editorial Charter and codes of conduct. Errors are corrected promptly.

The Conversation article –

Why emotional abuse in childhood may lead to migraines in adulthood

A clickbait title offered a seductive integration of a trending, emotionally laden social issue – child abuse – with a serious medical condition – migraines – for which management is often not optimal. A widely circulated estimate is that 60% of migraine sufferers do not get appropriate medical attention, in large part because they do not understand the treatment options available and may actually stop consulting physicians.

Some quick background about migraine from another, more credible source:

Migraines are different from other headaches. People who suffer migraines experience other debilitating symptoms:

  • visual disturbances (flashing lights, blind spots in the vision, zig zag patterns etc).
  • nausea and / or vomiting.
  • sensitivity to light (photophobia).
  • sensitivity to noise (phonophobia).
  • sensitivity to smells (osmophobia).
  • tingling / pins and needles / weakness / numbness in the limbs.

Persons with migraines differ greatly among themselves in terms of the frequency, intensity, and chronicity of their symptoms, as well as their triggers for attacks.

Migraine is triggered by an enormous variety of factors – not just cheese, chocolate and red wine! For most people there is not just one trigger but a combination of factors which individually can be tolerated. When these triggers occur altogether, a threshold is passed and a migraine is triggered. The best way to find your triggers is to keep a migraine diary. Download your free diary now!

Turning back to The Conversation article: What is the link between emotional abuse and migraines?

Without immediately providing a clicklink so that readers can check sources themselves, The Conversation authors say they are drawing on “previous research, including our own…” to declare there is indeed an association between past abuse and migraines.

Previous research, including our own, has found a link between experiencing migraine headaches in adulthood and experiencing emotional abuse in childhood. So how strong is the link? What is it about childhood emotional abuse that could lead to a physical problem, like migraines, in adulthood?

In invoking the horror of childhood emotional abuse, the authors imply that they are talking about something infrequent – outside the realm of most people’s experience. If “childhood emotional abuse” is commonplace, how could it be horrible and devastating?

In their pursuit of clickbait sensationalism, the authors have only succeeded in trivializing a serious issue.

Only a minority of people endorsing items about past childhood emotional abuse currently meet criteria for a diagnosis of posttraumatic stress disorder. Their needs are not met by throwing them into a larger pool of people who do not meet these criteria and making recommendations based on evidence derived from the combined group.

The Conversation authors employ a manipulative puffer fish strategy [1 and 2]. They take what is presumably an infrequent condition and attach horror to it. But they then wildly inflate the presumed prevalence by switching to a definition that arises in a very different context:

Any act or series of acts of commission or omission by a parent or other caregiver that results in harm, potential for harm, or threat of harm to a child.

So we are now talking about “any act or series of acts” that results in “harm, potential for harm, or threat of harm”? The authors then assert that yes, whatever they are talking about is indeed that common. But the clicklink supporting this claim takes the reader behind a paywall, where a consumer can’t venture without access to a university library account.

Most readers are left with the authors’ assertion as an authority they can’t check. I have access to a med school library, and I checked. The link is to a secondary source. It is not a systematic review of the full range of available evidence. Instead, it is a selective search for evidence favoring particular speculations. Disconfirming evidence is mostly ignored. Yet this article actually contradicts other assertions of The Conversation authors. For instance, the paywalled article says that there is actually little evidence that cognitive behavior therapy is effective for people whose only indication for therapy is having reported abuse in early childhood.

Even if you can’t check The Conversation authors’ claims, know that adults’ retrospective reports of childhood adversity are not particularly reliable or valid, especially in studies relying on adults’ checklist endorsements of broad categories, as this research does.

When we are dealing with claims that depend on adult retrospective reports of childhood adversity, we are dealing with a literature with serious deficiencies. This literature grossly overinterprets common endorsement of particular childhood experiences as strong evidence of exposure to horrific conditions. It has a strong confirmation bias: positive findings are highlighted, negative findings do not get cited much, and serious limitations in methodology and inconsistencies in findings are generally ignored.

[This condemnation is worthy of a blog post or two itself. But ahead I will provide some documentation.]

The Conversation authors explain the discrepancy between estimates based on administrative data – one in eight children suffering abuse or neglect before age 18 – and much higher estimates from retrospective adult reports by claiming that much abuse goes unreported.

The discrepancy may be because so many cases of childhood abuse, particularly cases of emotional or psychological abuse, are unreported. This specific type of abuse may occur within a family over the course of years without recognition or detection.

This could certainly be true. But a lack of reporting could also indicate that few experiences reached a threshold prompting a report. I’m willing to be convinced otherwise, but let’s see the evidence.

The link between emotional abuse and migraines

The Conversation authors provide links only to their own research for their claim:

While all forms of childhood maltreatment have been shown to be linked to migraines, the strongest and most significant link is with emotional abuse. Two studies using nationally representative samples of older Americans (the mean ages were 50 and 56 years old, respectively) have found a link.

The first link is to an article that is paywalled except for its abstract. The abstract shows the study does not involve a nationally representative sample of adults. The study compared patients with tension headaches to patients with migraines, without a no-headache control group. There is thus no opportunity to examine whether persons with migraines recall more emotional abuse than persons who do not suffer headaches. Any significant associations in a huge sample disappeared after controlling for self-reported depression and anxiety.

My interpretation: there is nothing robust here. Results could be due to crude measurement or to confounding of retrospective self-reports by current self-reported anxious or depressive symptoms. We can’t say much without a no-headache control group.

The second of the authors’ studies is also paywalled, but we can see from the abstract:

We used data from the Adverse Childhood Experiences (ACE) study, which included 17,337 adult members of the Kaiser Health Plan in San Diego, CA who were undergoing a comprehensive preventive medical evaluation. The study assessed 8 ACEs including abuse (emotional, physical, sexual), witnessing domestic violence, growing up with mentally ill, substance abusing, or criminal household members, and parental separation or divorce. Our measure of headaches came from the medical review of systems using the question: “Are you troubled by frequent headaches?” We used the number of ACEs (ACE score) as a measure of cumulative childhood stress and hypothesized a “dose–response” relationship of the ACE score to the prevalence and risk of frequent headaches.

Results — Each of the ACEs was associated with an increased prevalence and risk of frequent headaches. As the ACE score increased the prevalence and risk of frequent headaches increased in a “dose–response” fashion. The risk of frequent headaches increased more than 2-fold (odds ratio 2.1, 95% confidence interval 1.8-2.4) in persons with an ACE score ≥5, compared to persons with an ACE score of 0. The dose–response relationship of the ACE score to frequent headaches was seen for both men and women.

The Conversation authors misrepresent this study. It is about self-reported frequent headaches, not the subgroup of these patients reporting migraines. Yet in the first of their own studies just cited, the authors contrasted tension headaches with migraine headaches, with no controls.

So the data did not allow examination of the association between adult retrospective reports of childhood emotional abuse and migraines. There is no mention of self-reported depression and anxiety, which wiped out any relationship between childhood adversity and headaches in the first study; I would expect a survey of ACEs to include such self-reports. And the ACE score equates parental divorce and separation (the same common situation, likely to occur together and so counted twice) with sexual abuse in calculating an overall score.

The authors make a big deal of the “dose-response” they found. But this dose-response could just represent uncontrolled confounding: the more ACEs endorsed, the greater the likelihood that respondents faced other social, personal, economic, and neighborhood deprivations. The higher the ACE score, the greater the likelihood that other background characteristics are coming into play.
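
To make that worry concrete, here is a toy simulation in Python. Every number in it is invented for illustration and none comes from the ACE study: a single unmeasured background factor that raises both the ACE count and headache risk produces a tidy “dose-response” gradient even though, in the simulated world, ACEs have no effect on headaches at all.

```python
# Toy simulation: confounding alone can manufacture a "dose-response"
# gradient. All parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
deprivation = rng.uniform(0, 1, n)                    # unmeasured confounder
aces = rng.binomial(8, 0.05 + 0.45 * deprivation)     # 0-8 endorsed ACEs
headache = rng.random(n) < 0.10 + 0.30 * deprivation  # ACEs play no causal role

for score in range(6):
    prevalence = headache[aces == score].mean()
    print(f"ACE score {score}: headache prevalence = {prevalence:.2f}")
```

The prevalence climbs steadily with the ACE score, and an analyst who never measured the background factor would see a clean gradient inviting a causal story.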

The only other evidence the authors cite is yet another of their own papers, available only as a conference abstract. The abstract states:

Results: About 14.2% (n = 2,061) of the sample reported a migraine diagnosis. Childhood abuse was recalled by 60.5% (n =1,246) of the migraine sample and 49% (n = 6,088) of the non-migraine sample. Childhood abuse increased the chances of a migraine diagnosis by 55% (OR: 1.55; 95% CI 1.35 – 1.77). Of the three types of abuse, emotional abuse had a stronger effect on migraine (OR: 1.52; 95% CI 1.34 – 1.73) when compared to physical and sexual abuse. When controlled for depression and anxiety, the effect of childhood abuse on migraine (OR: 1.32; 95% CI 1.15 – 1.51) attenuated but remained significant. Similarly, the effect of emotional abuse on migraine decreased but remained significant (OR: 1.33; 95% CI 1.16 – 1.52), when controlled for depression and anxiety.
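
As a quick arithmetic sanity check, the crude odds ratio can be rebuilt from the counts in this abstract. A minimal sketch in Python; note that the non-migraine group size is not reported and is inferred here from the 49% figure, so the output is only approximate:

```python
# Rebuild the crude odds ratio from the counts reported in the abstract.
# The non-migraine denominator is inferred from the reported 49%.
abused_migraine, migraine_total = 1_246, 2_061
abused_no_migraine = 6_088
no_migraine_total = round(6_088 / 0.49)  # ~12,424 (inferred, approximate)

odds_migraine = abused_migraine / (migraine_total - abused_migraine)
odds_no_migraine = abused_no_migraine / (no_migraine_total - abused_no_migraine)
print(f"crude OR ~ {odds_migraine / odds_no_migraine:.2f}")  # ~1.59
```

That lands near the reported 1.55; the small gap plausibly reflects rounding or adjustment. The point is that nothing more than these raw proportions, derived from crude checklist items, drives the headline association.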

The rates of childhood abuse seem curiously high for both the migraine and non-migraine samples. If you dig a bit on the web for details of the National Longitudinal Study of Adolescent Health, you can find out how crude the measurement is. The broad question assessing emotional abuse covers the full range of normal to abnormal situations without distinguishing among them.

How often did a parent or other adult caregiver say things that really hurt your feelings or made you feel like you were not wanted or loved? How old were you the first time this happened? (Emotional abuse).

An odds ratio of 1.33 is not going to attract much attention from an epidemiologist, particularly when it is obtained from such messy data.

I conclude that the authors have made only a weak case for the following statement: While all forms of childhood maltreatment have been shown to be linked to migraines, the strongest and most significant link is with emotional abuse.

Oddly, if we jump ahead to the closing section of The Conversation article, the authors concede:

Childhood maltreatment probably contributes to only a small portion of the number of people with migraine.

But, as we will see, they make recommendations that assume a strong link has been established.

Why would emotional abuse in childhood lead to migraines in adulthood?

This section throws out a number of trending buzz terms, stringing them together in a way meant to impress and intimidate consumers rather than allow them to independently evaluate what is being said.


The section also sits below a stock blue picture of the brain. In web searches, the picture is associated with social media posts where the brain is superficially invoked in discussions where neuroscience is not relevant.


The section starts out:

The fact that the risk goes up in response to increased exposure is what indicates that abuse may cause biological changes that can lead to migraine later in life. While the exact mechanism between migraine and childhood maltreatment is not yet established, research has deepened our understanding of what might be going on in the body and brain.

We could get lost in a quagmire trying to figure out the evidence for the loose associations packed into a five-paragraph section. Instead, I’ll make some observations that can be followed up by interested readers.

The authors acknowledge that no mechanism has been established linking migraines and child maltreatment. The link for this statement takes the reader to the authors’ own paywalled article, which is explicitly labeled “Opinion Statement”.

The authors ignore a huge literature that acknowledges great heterogeneity among sufferers of migraines but points to some rather strong evidence for treatments based on particular mechanisms identified among carefully selected patients. For instance, a paper published in The New England Journal of Medicine with well over 1,500 citations:

Goadsby PJ, Lipton RB, Ferrari MD. Migraine—current understanding and treatment. New England Journal of Medicine. 2002 Jan 24;346(4):257-70.

Speculations concerning the connections between childhood adversity, migraines, and the HPA axis are loose. The Conversation authors treat these connections as obvious; they need to be better documented with evidence.

For instance, if we try to link “childhood adversity” to the HPA axis, we need to consider the lack of specificity of “childhood adversity” as defined by retrospective endorsement of Adverse Childhood Experiences (ACEs). Suppose we rely on individual checklist items or cumulative scores based on the number of endorsements. We can’t be sure that we are dealing with actual rather than assumed exposure to traumatic events, or that there are any consistent correlates in current measures derived from the HPA axis.

Any non-biological factor defined so vaguely is not going to be a candidate for mapping into causal processes or biological measurements.

An excellent recent Mind the Brain article by my colleague blogger Shaili Jain interviews Dr. Rachel Yehuda, who had a key role in researching the HPA axis in stress. Dr. Yehuda says endocrinologists would cringe at the kind of misrepresentations being made in The Conversation article.

A recent systematic review concludes that the evidence for specific links between childhood maltreatment and inflammatory markers is limited and of poor quality.

Coelho R, Viola TW, Walss‐Bass C, Brietzke E, Grassi‐Oliveira R. Childhood maltreatment and inflammatory markers: a systematic review. Acta Psychiatrica Scandinavica. 2014 Mar 1;129(3):180-92.

The Conversation article glosses over gross inconsistencies in the evidence that these biological correlates represent biomarkers. There are as yet no biomarkers for migraines in the sense of a biological measurement that reliably distinguishes persons with migraines from other patient populations with whom they may be confused. See an excellent, funny blog post by Hilda Bastian.

Notice the rhetorical trick in The Conversation authors’ assertion that

Migraine is considered to be a hereditary condition. But, except in a small minority of cases, the genes responsible have not been identified.

Genetic denialists like Oliver James or Richard Bentall commonly phrase the question in this manner, as a matter of hereditary versus non-hereditary. But complex traits like height, intelligence, or migraine involve combinations of variations in a number of genes, not a single gene or even a few genes. For an example of the kind of insights that sophisticated genetic studies of migraines are yielding, see:

Yang Y, Ligthart L, Terwindt GM, Boomsma DI, Rodriguez-Acevedo AJ, Nyholt DR. Genetic epidemiology of migraine and depression. Cephalalgia. 2016 Mar 9:0333102416638520.

The Conversation article ends with some signature nonsense speculation about epigenetics:

However, stress early in life induces alterations in gene expression without altering the DNA sequence. These are called epigenetic changes, and they are long-lasting and may even be passed on to offspring.

Interested readers can find these claims demolished in Epigenetics Ain’t Magic by PZ Myers, a biologist who attempts to rescue an extremely important developmental concept from its misuse.

Or Carl Zimmer’s Growing Pains for Field of Epigenetics as Some Call for Overhaul.

What does this mean for doctors treating migraine patients?

The Conversation authors startle readers with an acknowledgment that contradicts what they have been saying earlier in their article:

Childhood maltreatment probably contributes to only a small portion of the number of people with migraine.

It is therefore puzzling when they next say:

But because research indicates that there is a strong link between the two, clinicians may want to bear that in mind when evaluating patients.

Cognitive behavior therapy is misrepresented as an established effective treatment for migraines. A recent systematic review and meta-analysis had to combine migraines with other chronic headaches in order to get ten studies to consider.

The conclusion of this meta-analysis:

Methodology inadequacies in the evidence base make it difficult to draw any meaningful conclusions or to make any recommendations.

The Conversation article notes that the FDA has approved anti-epileptic drugs such as valproate and topiramate for treatment of migraines. However, the article’s claim that the efficacy of these drugs is due to their effects on epigenetics is quite inconsistent with what is said in the larger literature.

Clinicians specializing in treating fibromyalgia or irritable bowel syndrome would be troubled by the authors’ lumping these conditions with migraines and suggesting that a psychiatric consultation is the most appropriate referral for patients who are having difficulty achieving satisfactory management.

See for instance the links contained in my blog post, No, irritable bowel syndrome is not all in your head.

The Conversation article closes with:

Within a migraine clinic population, clinicians should pay special attention to those who have been subjected to maltreatment in childhood, as they are at increased risk of being victims of domestic abuse and intimate partner violence as adults.

That’s why clinicians should screen migraine patients, and particularly women, for current abuse.

It’s difficult to see how this recommendation is relevant to what has preceded it. Routine screening is not evidence-based.

The authors should know that the World Health Organization formerly recommended screening primary care women for intimate abuse but withdrew the recommendation because of a lack of evidence that it improved outcomes for women facing abuse and a lack of evidence that no harm was being done.

I am sharing this blog post with the authors of The Conversation article. I am requesting a correction from The Conversation. Let’s see what they have to say.

Meanwhile, patients seeking health information are advised to avoid The Conversation.

Lucrative pseudoscience at the International Positive Psychology Association meeting

A plenary session dripping with crank science may be an outlier, but it’s on a continuum with the claims of mainstream positive psychology.

Follow the conference attendees who are following the money – does it take you to science?

Imagine…

Imagine a PhD student going to her first positive psychology conference, drawn by the opportunity to hear research-oriented psychologists such as Richard Davidson and Jonathan Haidt in one place. But at the first plenary session she attends, Rollin McCraty is talking to an enthralled audience about “the science of what connects us.” McCraty says the heart radiates a measurable magnetic field which carries emotional state information and can be detected by the nervous systems of people nearby.

Puzzled, she googles McCraty and comes to websites and articles making even more bizarre claims, like

 There is compelling evidence to suggest that the heart’s energy field (energetic heart) is coupled to a field of information that is not bound by the classical limits of time and space.

And even better

This evidence comes from a rigorous experimental study conducted to investigate the proposition that the body receives and processes information about a future event before the event actually happens (McCraty et al 2004a, b). The study’s results provide surprising data showing that both the heart and brain receive and respond to pre-stimulus information about a future event. Even more tantalizing are indications that the heart receives intuitive information before the brain, and that the heart sends a different pattern of afferent signals to the brain which modulates the frontal cortex.

“…about a future event before the event actually happens”? Wow, this puts Daryl Bem’s claim of precognition to shame. But this claim cannot possibly prepare our PhD student for

A Tidal Wave of Kindness

In the fall of 2013, the IHM [Institute of HeartMath, where McCraty is Director of Research] launched the Global Coherence Initiative. The ambitious goals of this campaign are unprecedented: to quantify the impact of human emotion on the earth’s electromagnetic field and tip the global equation toward greater peace. While this may sound like a utopian fantasy, Dr. McCraty points out that science once again supports this possibility. “If the earth’s fields are a carrier, we are all coupled to this field, all the signals are out there,” he says. “So every emotion we experience is coupled to that field. This creates a global humanity field, if you will.” According to Dr. McCraty, this field is continually fed by our feelings, both positive and negative. The goal is to shift the balance toward the positive. “Any time we’re putting out love and kindness, that energy is not wasted,” he adds.

This is crank science far beyond the satire of Alan Sokal’s hoax article, Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity. But we’re not done yet:

Current IHM research demonstrating the interconnectedness between people has Dr. McCraty very excited. Two studies going on in northern California and Saudi Arabia are monitoring HRV 24/7 to help quantify the interconnectivity between people and how it is affected by nervous system dynamics, the earth’s magnetic fields, solar flares, and even radio frequencies.

At the reception that evening, our PhD student desperately searches for familiar faces of other research-oriented PhD students. She manages to find only a few among the oppressively bubbly crowd. And none of her colleagues actually went to the McCraty plenary. Some dismissed him as just pushing the merchandise of the very commercial HeartMath.

Who was attending the International Positive Psychology Association meeting?

Advertisements for the conference offered a long list of who should attend.

But any research-oriented attendees were disappointed if they sought first-ever reports of breakthrough yet reproducible science. Personal coaching and organizational and executive consulting themes predominated in the preconference workshops and presentations.

Elements of a trade show blended into a revivalist meeting. Hordes of “certified” life coaches and wannabes were seeking new contacts, positive psychology products, and dubious certificates to hang in their offices. These coaches had paid out of pocket, without scholarships, for degrees from “approved” master of arts in positive psychology programs (MAPPs) costing as much as $60,000 a year. Many were hungry. But there are inspiring – positive psychology is about inspiring – stories on the Internet of big bucks being made immediately.


  • MAPP programs typically require no background in behavioral science and provide very little training in critical appraisal of research, or even in ethics.
  • Graduates of MAPP programs generally lack the ability to determine independently whether claims are evidence-based. They are suckers for anything that superficially sounds and looks sciencey. They are as vulnerable as marital and family therapists who can be readily seduced by claims about therapies that are “soothing the brain” hawked by unscrupulous “neuroscientists” and self-promoters.

Indeed, just go to some coaching websites and see claims of being able to provide clients with wondrous transformations that take little effort from them.

Positive psychology merchandise. Get certified as a trainer now.

The science is often superficial and even quackish. Yet, to compete effectively in a crowded field, positive psychology coaches brandish a label of ‘we are more sciencey than the rest’.


McCraty’s HeartMath promises that big time science backs its claims of effectiveness.

Over the years we have received numerous reports that coherence training has improved performance in a wide range of cognitive capacities, both short and long-term. These include tasks requiring eye-hand coordination, speed and accuracy, and coordination in various sports as well as cognitive tasks involving executive functions associated with the frontal cortex such as maintaining focus and concentration, problem solving, self-regulation, and abstract thinking.

A study of California correctional officers with high workplace stress found reductions in total cholesterol, glucose, and both systolic and diastolic blood pressure (BP), as well as significant reductions in overall stress, anger, fatigue and hostility with projected savings in annual heath care costs of $1179 per employee (McCraty et al 2009).

Unfortunately, McCraty et al 2009 turns out to be a rather dodgy source:

McCraty R, Atkinson, M., Tomasino, D., & Bradley, R. T (2009) The coherent heart: Heart-brain interactions, psychophysiological coherence, and the emergence of system-wide order. Integral Review 5: 10–115.

But why stop there?

Hospitals implementing HM programs have seen increased personal, team and organizational coherence. The measures most often assessed are staff retention and employee satisfaction. Cape Fear Valley hospital system in Cape Fear, North Carolina, reduced nurse turnover from 24% to 13%, and Delnor Community Hospital in Chicago saw a similar reduction from 27% to 14% – as well as a dramatic improvement in employee satisfaction, results that have been sustained over an eight year period. Similarly, Duke University’s Health System reduced turnover from 38% to 5% in its emergency services division. An analysis of the combined psychometric data from 3,129 matched pre-post HM coherence trainings found that fatigue, anxiety, depression and anger were reduced by almost half. Another workplace study conducted in a large chain of retail stores with in-store pharmacies that employed 220 pharmacists across multiple locations found a reduction in medical errors ranging from 40 to 71%, depending on the store location (HeartMath 2009).

Specific statistics, yes, but, alas, these claims are neither independently peer reviewed nor even transparently presented. They call upon our faith in HeartMath.

If your methods are so powerful, HeartMath, submit your evidence for legitimate peer review.

Shame on me for not doing a systematic review of this literature.

When I posted a critical comment about McCraty on my Facebook wall, I was quickly chastised by a “friend” whom I do not actually know:

Have you read the body of research published by HeartMath? Which articles have you critically reviewed and found flawed? Can you discuss that in detail? Do you know what the Global Coherence Project is? Do you know those methods, their datasets? Are you dismissing this on the idea alone, or on the details of their generated body of scientific work? Are you an expert on electrical fields generated by the human body? Do you know all the work on heart rate variability and its associations with human health and communication? Which part of that body of work are you taking issue with?

Dear Facebook “friend,” don’t you realize that the burden of proof lies on the quacks who wish us to believe ridiculous claims with zero obvious scientific basis? Evidence, please. No plausible mechanism means not worth a serious investigation. And by the way, does anyone know ‘their methods, their data sets,’ outside of HeartMath?

There is so much junk out there and so little time to evaluate it. Skeptics should not waste their time when a quick screen for a plausible mechanism turns up none. That eliminates the bulk of the nonsense bombarding us, even nonsense from successful academic positive psychology gurus. Sure, we might miss some dramatic breakthroughs, but prior probabilities are on our side.

The positive psychology – corporate – military complex

Touchy question in the positive psychology community: was US Defense Department grant money used to reward psychologists for involvement in the CIA torture program and to reward those who protected them from ethical sanctions? There has not been much discussion of this on the tightly controlled Friends of Positive Psychology listserv, only swift denials. But can others get in on the money? Can Rollin McCraty help? A good reason to go to his talk. But first, some background.

Psychologist Stephen Soldz, PhD, and colleagues produced a report, American Psychological Association’s Secret Complicity with the White House and US Intelligence Community in Support of the CIA’s “Enhanced” Interrogation Program. The report contained a number of linked emails that included Paul Ekman, James Mitchell and… Marty Seligman.

Blogger Vaughan Bell states

 To be clear, I am not suggesting that Ekman and Seligman were directly involved in CIA interrogations or torture. Seligman has gone as far as directly denying it on record.

But there is something else interesting which links Ekman, Seligman and Mitchell: lucrative multi-million dollar US Government contracts for security programmes based on little evidence.

Seligman was reportedly awarded a $31 million US Army no-bid contract to develop ‘resilience training’ for soldiers to prevent mental health problems. This was surprising to many as he had no particular experience in developing clinical interventions. It was deployed as the $237 million Comprehensive Soldier Fitness programme, the results of which have only been reported in some oddly incompetent technical reports and are markedly under-whelming. Nicholas Brown’s analysis of the first three evaluative technical reports is particularly good where he notes the tiny effect sizes and shoddy design. A fourth report has since been published (pdf) which also notes “small effect sizes” and doesn’t control for things like combat exposure.

Money from the ineffective Comprehensive Soldier Fitness Program has been an enormous bonanza for positive psychologists – and even for critics willing to mute what they say. Is Rollin McCraty a useful way in? Aside from being Director of Research at the Institute of HeartMath (IHM), McCraty is also its Director of Military Training – the HeartMath website tells us he is working with Major Robert A. Bradley (USAF, Ret.), Director of Veterans Outreach.

HeartMath once had a million-dollar grant from the US Navy. Their grant portfolio has apparently shrunk to a few thousand dollars. But HeartMath offers training and certification in nice-sounding programs. Can hungry MAPP-graduate attendees get trained, obtain certificates suitable for framing, and make big bucks through HeartMath? The hell with the science; there are sciencey claims that must sell.

We cannot tell how much profit HeartMath is making. We can only get the financial details on their not-for-profit institute, not their for-profit wing. This split between for-profit and nonprofit wings, and the secrecy that goes with it, is a common organizational structure for entrepreneurial training enterprises.

An outlier, but on a continuum with positive psychology (pseudo) science?

Rollin McCraty may be an outlier, but he still lies on a continuum with the most recognized scientists of positive psychology.

Barbara Fredrickson is considered a rock star in the positive psychology community. She has an endowed chair, lots of grant money, and numerous publications in journals where you would never find McCraty. Yet her papers are often tied to her heavily marketed commercial products, though without the requisite declaration of conflict of interest. Some of her claims have not fared so well, with strong hints of shaky and even pseudoscience.

Positivity ratio. Fredrickson and Losada (2005) infamously applied a mathematical model drawn from nonlinear dynamics and claimed that a ratio of positive to negative affect of exactly 2.9013 separated flourishing people from those who are merely languishing. Nick Brown, Alan Sokal, and Harris Friedman examined this claim and found

no theoretical or empirical justification for the use of differential equations drawn from fluid dynamics, a subfield of physics, to describe changes in human emotions over time; furthermore, we demonstrate that the purported application of these equations contains numerous fundamental conceptual and mathematical errors.

In response, Fredrickson partially retracted her claim, but the positivity ratio lives on at a website where visitors can take a 2-minute test to determine whether they are flourishing or languishing and watch YouTube videos.

Meaning is healthier than happiness. Fredrickson and colleagues claimed to have used functional genomics to settle the classical philosophical question of whether we pursue meaning (eudaimonism) or happiness (hedonism) in our lives. These claims were echoed in the popular press as

People who are happy but have little-to-no sense of meaning in their lives have the same gene expression patterns as people who are enduring chronic adversity.

My colleagues and I (including Nick Brown and Harris Friedman) took a critical look and reanalyzed Fredrickson and colleagues’ data. We concluded

that Fredrickson et al.’s article is conceptually deficient, but more crucially that their statistical analyses are fatally flawed, to the point that their claimed results are in fact essentially meaningless.

The journal where the article originally appeared, PNAS, has so far resisted a number of calls, including one from Neuroskeptic, for retraction of the original article.

Better health and relationships through loving kindness meditation. Much like McCraty, some of Fredrickson’s work makes strong claims about transforming people’s lives by changing cardiac vagal tone. She and colleagues claimed to have shown that practicing loving-kindness meditation (LKM) generates an “upward spiral” of mutual enhancement among positive emotions, social connectedness, and physical health. So,

“Advice about how people might improve their physical health . . . can now be expanded to include self-generating positive emotions.”

My group – again with Nick and Harris, but also James Heathers – took a closer look and reanalyzed the data. We found that the study was actually a badly reported clinical trial with null results, that evidence concerning the association of cardiac vagal tone with established, valid parameters of physical health was contradictory, and that cardiac vagal tone was certainly not a suitable proxy outcome for health in a clinical trial, especially for persons of the age included in Fredrickson’s trial.

Nonetheless, the first hit when I googled “Fredrickson loving kindness meditation” was another Fredrickson commercial website, Love 2.0, offering a book and other products with an eye-catching question:

What if everything you know about love is wrong?

It’s time to upgrade your view of love. Love 2.0 offers new lenses for seeing and more fully appreciating micro-moments of connection. Dr. Barbara Fredrickson gives you the lab-tested tools to unlock more love in your life.

Any wonder why the attendees at the International Positive Psychology Association meeting had trouble distinguishing between science and nonsense like what McCraty offered?

 

 

How to critique claims of a “blood test for depression”

Special thanks to Ghassan El-baalbaki and John Stewart for their timely assistance. Much appreciated.

“I hope it is going to result in licensing, investing, or any other way that moves it forward…If it only exists as a paper in my drawer, what good does it do?” – Eva Redei, PhD, first author.

Media coverage of an article in Translational Psychiatry uniformly passed on the authors’ extravagant claims in a press release from Northwestern University that declared that a simple blood test for depression had been found. That is, until I posted a critique of these claims at my secondary blog. As seen on Twitter, the tide of opinion suddenly shifted and considerable skepticism was expressed.

I am now going to present a thorough critique of the article itself. More importantly, I will be pointing to how, with some existing knowledge and basic tools, many of you can learn to critically examine the credibility of such claims, which will inevitably arise in the future. Biomarkers for depression are a hot topic, and John Ioannidis has suggested that this means exaggerated claims about flawed studies are more likely to result than real progress.

The article can be downloaded here and the Northwestern University press release here. When I last blogged about this article, I had not seen the 1:58 minute video that is embedded in the press release. I encourage you to view it before my critique and then view it again if you believe that it has any remaining credibility. I do not know where the dividing line is between unsubstantiated claims about scientific research and sheer quackery, but this video tests the boundaries, when evaluated in light of the evidence actually presented in the article.

I am sure that many journalists, medical and mental health professionals, and laypersons were intimidated by the mention of “blood transcriptomic biomarkers” in the title of this peer-reviewed article. Surely the published article had survived evaluation by an editor and reviewers with relevant expertise. What is there for an unarmed person to argue about?

Start with the numbers and basic statistics

Skepticism about the study is encouraged by a look at the small numbers of patients involved. The study was limited to:

  • 64 total participants: 32 depressed patients from a clinical trial and 32 controls.
  • 5 patients were lost from baseline to follow-up.
  • 5 more were lost from the 18-week blood draws, leaving
  • 22 remaining patients –
  • 9 classified as in remission, 13 not in remission.

The authors were interested in differences in 20 blood transcriptomic biomarkers in 2 comparisons: the 32 depressed patients versus 32 controls, and the 9 patients who remitted at the end of the trial versus the 13 who did not. The authors committed themselves to looking for a clinically significant difference or effect size, which, they tell readers, is defined as .45. We can use a program readily available on the web for a power analysis, which indicates the likelihood of obtaining a statistically significant result (p < .05) for any one of these biomarkers, if differences existed between depressed patients and controls or between the patients who improved in the study versus those who did not. Before even putting these numbers into the calculator, we would expect the likelihood to be low because of the size of the sample.

We find that there is only a power of 0.426 for finding one of these individual biomarkers significant, even if it really distinguishes between depressed patients and controls, and a power of 0.167 for finding a significant difference in the comparison of the patients who improved versus those who did not.
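
For readers who want to check these numbers themselves, here is a minimal sketch in Python using the statsmodels library. It assumes a two-sided, independent-samples t-test at alpha = .05; a web calculator built on slightly different assumptions will give slightly different values.

```python
# Power to detect the authors' own "clinically significant" effect size
# (d = 0.45) in the two comparisons reported in the paper.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()

# 32 depressed patients vs. 32 controls
p1 = power.power(effect_size=0.45, nobs1=32, alpha=0.05, ratio=1.0)

# 9 remitted vs. 13 non-remitted patients (nobs2 = nobs1 * ratio)
p2 = power.power(effect_size=0.45, nobs1=9, alpha=0.05, ratio=13 / 9)

print(f"32 vs 32: power = {p1:.3f}")  # ~0.43
print(f"9 vs 13:  power = {p2:.3f}")  # ~0.17
```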

Bottom line is that this is much too small a sample to address the questions in which the authors are interested – less than 50-50 odds of identifying a biomarker that actually distinguished between depressed patients and controls, and less than 1 in 6 of finding a biomarker actually distinguishing those patients who improved versus those who did not. So, even if the authors really have stumbled upon a valid biomarker, they are unlikely to detect it in these samples.

But there are more problems. For instance, it takes a large difference between groups to achieve statistical significance with such small numbers, so any significant result will be quite large. Yet, with such small numbers, statistical significance is unstable: dropping or adding a few or even a single patient or control or reclassifying a patient as improved or not improved will change the results. And notice that there was some loss of patients to follow-up and to determining whether they improved or not. Selective loss to follow-up is a possible explanation of any differences between the patients considered improved and those who are not considered improved. Indeed, near the end of the discussion, the authors note that patients who were retained for a second blood draw differed in gene transcription from those who did not. This should have tempered claims of finding differences in improved versus unimproved patients, but it did not.

So what I am getting at is that this small sample is likely to produce strong results that will not be replicated in other samples. But it gets still worse –

Samples of 32 depressed patients and 32 controls chosen because they match on age, gender, and race – as they were selected in the current study – can still differ on lots of variables. The depressed patients are probably more likely to be smokers and to be neurotic. So the authors may only be isolating blood transcriptomic biomarkers associated with innumerable such variables, not depression.

There can be single, unmeasured variables that are the source of any differences, or some combination of multiple variables that do not make much difference by themselves but do so when they are together present in a sample. So, in such a small sample, a few differences affecting a few people can matter greatly. And it does no good to simply do a statistical test between the two groups, because any such test is likely to be underpowered and miss influential differences that are not by themselves so extremely strong that they meet conditions for statistical significance in a small sample.

The authors might be tempted to apply some statistical controls – they actually did in a comparison of the nine versus 13 patients – but that would only compound the problem. Use of statistical controls requires much larger samples, and would likely produce spurious – erroneous – results in such a small sample. Bottom line is that the authors cannot rule out lots of alternative explanations for any differences that they find.

The authors nonetheless claim that 9 of the 20 biomarkers they examined distinguish depressed patients and 3 of these distinguish patients who improve. This is statistically improbable and unlikely to be replicated in subsequent studies.

And then there is the sampling issue. We are going to come back to that later in the blog, but just consider how random or systematic differences can arise between this sample of 32 patients versus 32 controls and what might be obtained with another sampling of the same or a different population. The problem is even more serious when we get down to the 9 versus 13 comparison of patients who completed the trial. A different intervention or a different sample or better follow-up could produce very different results.

So, just looking at the number of available patients and controls, we are not expecting much good science to come out of this study, which is pursuing significance levels to define results. I think that many persons familiar with these issues would simply dismiss this paper out of hand after looking at these small numbers.

The authors were aware of the problems in examining 20 biomarkers in such small comparisons. They announced that they would commit themselves to adjusting significance levels for multiple comparisons. With such low ratios of participants in the comparison groups to variables examined, this remains a dubious procedure. However, when this correction eliminated any differences between the improved and unimproved patients, they simply ignored having done this procedure and went on to discuss the results as significant. If you return to the press release and the video, you can see no indication that the authors had applied a procedure that eliminated their ability to claim results as significant. By their own standards, they are crowing about being able to distinguish ahead of time the patients who will improve versus those who will not, when they did not actually find any biomarkers that did so.
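
To see how punishing a proper correction is here, consider a sketch with purely hypothetical p-values (the paper does not report its full list): a Bonferroni correction across 20 biomarkers divides the overall alpha of .05 by 20, so only results with p < .0025 survive.

```python
# Bonferroni correction across 20 biomarker tests: the per-test threshold
# drops to 0.05 / 20 = 0.0025. All p-values below are hypothetical.
from statsmodels.stats.multitest import multipletests

p_values = [0.002, 0.02, 0.04, 0.045] + [0.20] * 16  # 20 illustrative tests
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")

print(reject[:4])  # [ True False False False] -- only p = .002 survives
print(p_adj[:4])   # [0.04 0.4  0.8  0.9 ]
```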

What does the existing literature tell us we should expect?

Our skepticism aroused, we might next want to go to Google Scholar and search for topics such as genetics depression, biomarkers depression, blood test depression, etc. [Hint: when you put a set of terms into the search box and click, pull down the menu on the far right to get an advanced search.]

I could say this takes 25 minutes, because that is how much time I spent, but that would be misleading. I recall a jazz composer who claimed to write a song in 25 minutes. When the interviewer expressed skepticism, the composer said, “Yeah, 25 minutes and 25 years of experience.” I had the advantage of knowing what I was looking for.

The low heritability of liability for MDD implies an important role for environmental risk factors. Although genotype X environment interaction cannot explain the so-called ‘missing heritability’,52 it can contribute to small effect sizes. Although genotype X environment studies are conceptually attractive, the lessons learned from the most studied genotype X environment hypothesis for MDD (5HTTLPR and stressful life event) are sobering.

And

Whichever way we look at it, and whether risk variants are common or rare, it seems that the challenge for MDD will be much harder than for the less prevalent more heritable psychiatric disorders. Larger samples are required whether we attempt to identify associated variants with small effect across average backgrounds or attempt to enhance detectable effects sizes by selection of homogeneity of genetic or environmental background. In the long-term, a greater understanding of the etiology of MDD will require large prospective, longitudinal, uniformly and broadly phenotyped and genotyped cohorts that allow the joint dissection of the genetic and environmental factors underlying MDD.

[Update suggested on Twitter by Nese Direk, MD] A subsequent, even bigger search for the elusive depression gene reported:

We analyzed more than 1.2 million autosomal and X chromosome single-nucleotide polymorphisms (SNPs) in 18 759 independent and unrelated subjects of recent European ancestry (9240 MDD cases and 9519 controls). In the MDD replication phase, we evaluated 554 SNPs in independent samples (6783 MDD cases and 50 695 controls)…Although this is the largest genome-wide analysis of MDD yet conducted, its high prevalence means that the sample is still underpowered to detect genetic effects typical for complex traits. Therefore, we were unable to identify robust and replicable findings. We discuss what this means for genetic research for MDD.

So, there is not much encouragement for the present tiny study.

baseline gene expression may contain too much individual variation to identify biomarkers associated with a given disease, as was suggested by the studies’ authors.

Furthermore, it noted that other recent studies had identified markers that either performed poorly in replication studies or were simply not replicated.

Again, not much encouragement for the tiny present study.

[According to Wiktionary, omics refers to related measurements or data from such interrelated fields as genomics, proteomics, transcriptomics, and other fields.]

The IOM report came about because of numerous concerns expressed by statisticians and bioinformatics scientists concerning the marketing of gene expression-based tests by Duke University. The complaints concerned the lack of an orderly process for validating such tests and the likelihood that these tests would not perform as advertised. In response, the IOM convened an expert panel, which noted that many of the studies that became the basis for promoting commercial tests were small, methodologically flawed, and relied on statistics that were inappropriate for the size of the samples and the particular research questions.

The committee came up with some strong recommendations for discovering, validating, and evaluating such tests in clinical practice. By these evidence-based standards, the efforts of the authors of the Translational Psychiatry article are woefully inadequate, and it is irresponsible to jump from such a preliminary, modest-sized study, without replication in an independent sample, to the claims they are making to the media and possible financial backers.

Given that the editor and reviewers of Translational Psychiatry nonetheless accepted this paper for publication, they should be required to read the IOM report. And all of the journalists who passed on ridiculous claims about this article should read it too.

If we google the same search terms, we come up with lots of press coverage of work previously claimed as breakthroughs. Almost none of it panned out in replication, despite the initial fanfare. Failures to replicate are much less newsworthy than false discoveries, but once in a while a statement of resignation makes it into the media. For instance,

Depression gene search disappoints

 

 


Looking for love biomarkers in all the wrong places

The existing literature suggests that the investigators have a difficult task looking for what is probably a weak signal with a lot of false positives in the context of a lot of noise. Their task would be simpler if they had a well-defined, relatively homogeneous sample of depressed patients. That is so these patients would be relatively consistent in whatever signal they each gave.

Given those criteria, the investigators chose probably the worst possible sample. They obtained their small sample of 32 depressed patients from a clinical trial comparing face-to-face versus Internet cognitive behavioral therapy in a sample recruited from primary medical care.

Patients identified as depressed in primary care are a very mixed group. Keep in mind that the diagnostic criteria require that five of nine symptoms be present for at least two weeks. Many depressed patients in primary care have only five or six symptoms, which are mild and ambiguous. For instance, most women experience sleep disturbance in the weeks after giving birth to an infant, but probing them readily reveals that their sleep is being disturbed by the infant. Similarly, one cardinal symptom of depression is the loss of the ability to experience pleasure, but that is a confusing item for primary care patients who do not understand that the loss is supposed to be in the capacity to experience pleasure, rather than in being able to do things that previously gave them pleasure.

And two weeks is not a long time. It is conceivable that symptoms can be maintained that long in a hostile, unsupportive environment but immediately dissipate when the patient is removed from that environment.

Primary care physicians, if they even adhere to diagnostic criteria, are stuck with the challenge of making a diagnosis based on patients having the minimal number of symptoms, with the required symptoms often being very mild and ambiguous in themselves.

So, depression in primary care is inherently noisy, unable to give a clear signal of a single biomarker or a few. It is likely that if a biomarker ever became available, many patients now considered depressed would not have the biomarker. And what would we make of patients who had the biomarker but did not report symptoms of depression? Would we overrule them and insist that they were really depressed? Or what about patients who exhibited classic symptoms of depression but did not have the biomarker? Would we tell them they are merely miserable and not depressed?

The bottom line is that depression in primary care can be difficult to diagnose, and to do so requires a careful interview, or maybe the passage of time. In Europe, many guidelines discourage aggressive treatment of mild to moderate depression, particularly with medication. Rather, the suggestion is to wait a few weeks with vigilant monitoring of symptoms, encouraging the patient to try less intensive interventions, like increased social involvement or behavioral activation. Only with the failure of those interventions to make a difference, and the failure of symptoms to resolve with the passage of time, should a diagnosis and initiation of treatment be considered.

Most researchers agree that rather than looking to primary care, we should look to more severe depression in tertiary care settings, like inpatient or outpatient psychiatry. Then maybe go back and see the extent to which these biomarkers are found in a primary care population.

And then there is the problem of how the investigators defined depression. They did not make a diagnosis with a gold-standard, semi-structured interview, like the Structured Clinical Interview for DSM Disorders (SCID), administered by trained clinicians. Instead, they relied on a rigid, simple interview, the Mini International Neuropsychiatric Interview, more like a questionnaire, that was administered by bachelor-level research assistants. This would hardly pass muster with the Food and Drug Administration (FDA). The investigators had available scores on the interviewer-administered Hamilton Depression Scale (HAM-D) to measure improvement, but instead relied on the self-report Patient Health Questionnaire (PHQ-9). The reason why they chose this instrument is not clear, but it would again not pass muster with the FDA.

Oh, and finally, the investigators talk about a possible biomarker predicting improvement in psychotherapy. But most of the patients in this study were also receiving antidepressant medication. This means we do not know if the improvement was due to the psychotherapy or the medication, but the general hope for a biomarker is that it can distinguish which patients will respond to one versus the other treatment. The bottom line is that this sample is hopelessly confounded when it comes to predicting response to the psychotherapy.

Why get upset about this study?

I could go on about other difficulties in the study, but I think you get the picture: this is not a credible study, and it cannot serve as the basis for a search for a blood-based biomarker for depression. It is simply absurd to present it as such. But why get upset?

  1. Publication of such low-quality research, and high-profile attempts to pass it off as strong evidence, damage the credibility of all evidence-based efforts to establish the efficacy of diagnostic tools and treatments. This study adds to the sense that much of what we read in the scientific journals, and what is echoed in the media, is simply exaggerated or outright false.
  2. Efforts to promote this article are particularly pernicious in suggesting that primary care physicians can make diagnoses of depression without careful interviewing of patients. The physicians do not need to talk to the patients; they can simply draw blood or give out questionnaires.
  3. Implicit in the promotion of their results as evidence for a blood test for depression is the assumption that depression is a biological phenomenon, strongly influenced by genetic expression, not the environment. Aside from being patently wrong and inconsistent with available evidence, this assumption leads to an overreliance on biomedical treatments.
  4. Wide dissemination of the article's and press release's claims serves to reinforce laypersons' and clinicians' belief in the validity of commercially available blood tests of dubious value. These tests can cost as much as $475 per administration, and there is no credible evidence, by IOM standards, that they perform better than simply talking to patients.

At present, there is no strong evidence that antidepressants are on average superior in their effects on typical primary care patients relative to, say, interpersonal psychotherapy (IPT). IPT assumes that regardless of how depression comes about, patient improvement can come about through understanding and renegotiating significant interpersonal relationships. All the trash talk of these authors contradicts this evidence-based assumption. Namely, they are suggesting that we may soon be approaching an era in which even the mild and moderate depression of primary care can be diagnosed and treated without talking to the patient. I say bollocks, and shame on the authors, who should know better.

Reanalysis: No health benefits found for pursuing meaning in life versus pleasure

NOTE: After I wrote this blog post, I received via PNAS the reply from Steve Cole and Barbara Fredrickson to our article. I did not have time to thoroughly digest it, but will address it in a future blog post. My preliminary impression is that their reply is, ah…a piece of work. For a start, they attack our mechanical bitmapping of their data as an unvalidated statistical procedure. But calling it a statistical procedure is like Sarah Palin calling Africa a country. And they again assert the validity of their scoring of a self-report questionnaire without documentation. As seen below, I had already offered to donate $100 to charity if they can produce the unpublished analyses that justified this idiosyncratic scoring. The offer stands. They claim that our factor analyses were inappropriate because the sample size was too small, but we used their data, which they claimed to have factor analyzed. Geesh. But more on their reply later.

Our new PNAS article questions the reliability of results and interpretations in a high profile previous PNAS article.

Fredrickson, Barbara L., Karen M. Grewen, Kimberly A. Coffey, Sara B. Algoe, Ann M. Firestine, Jesusa M. G. Arevalo, Jeffrey Ma, and Steven W. Cole. "A functional genomic perspective on human well-being." Proceedings of the National Academy of Sciences 110, no. 33 (2013): 13684-13689.

 

[Image from the Oakland Journal: http://theoaklandjournal.com/oaklandnj/health-happiness-vs-meaning/]

Was the original article a matter of “science” made for press release? Our article poses issues concerning the gullibility of the scientific community and journalists regarding claims of breakthrough discoveries from small studies with provocative, but fuzzy theorizing and complicated methodologies and statistical analyses that apparently even the authors themselves do not understand.

  • Multiple analyses of the original data do not find separate factors indicating striving for pleasure versus purpose
  • Random number generators yield the best predictors of gene expression from the original data

[Warning: numbers ahead. This blog post contains some excerpts from the results section that include lots of numbers and require some sophistication to interpret. I encourage readers to at least skim these sections, to allow independent evaluation of some of the things that I will say in the rest of the blog.]

A well-orchestrated media blitz for the PNAS article had triggered my skepticism. The Economist, CNN, The Atlantic Monthly and countless newspapers seemingly sang praise in unison for the significance of the article.

Maybe the research reported in PNAS was, as one of the authors, Barbara Fredrickson, claimed, a major breakthrough in behavioral genomics, a science-based solution to an age-old philosophical problem of how to lead one's life. Or, as she later claimed in a July 2014 talk in Amsterdam, the PNAS article provided an objective basis for moral philosophy.

Maybe it showed

People who are happy but have little to no sense of meaning in their lives—proverbially, simply here for the party—have the same gene expression patterns as people who are responding to and enduring chronic adversity.

Skeptical? Maybe you are paying too much attention to your conscious mind. What does it know? According to author Steve Cole

What this study tells us is that doing good and feeling good have very different effects on the human genome, even though they generate similar levels of positive emotion… “Apparently, the human genome is much more sensitive to different ways of achieving happiness than are conscious minds.”

Or maybe this PNAS article was an exceptional example of the kind of nonsense, pure bunk, you can find in a prestigious journal.

Assembling a Team

I blogged about the PNAS article. People whom I have yet to meet expressed concerns similar to mine. We began collaborating, overcoming considerable differences in personal style but taking advantage of complementary skills and background.

It all started with a very tentative email exchange with Nick Brown. He brought on his co-author from his American Psychologist article demolishing the credibility of a precise positivity ratio, Harris Friedman. Harris in turn brought on Doug McDonald to examine Fredrickson and Cole's claims that factor analysis supported their clean distinction between two forms of well-being with opposite effects on health.

Manoj Samanta found us by way of my blog post and then a Google search that took him to Nick and Harris's article with Alan Sokal. Manoj had cited my post in his own blog; when Nick saw it, he contacted him. Manoj was working in genomics, attempting to map the common genomic basis for the evolution of electric organs in fish from around the world, but was a physicist in recovery. He was delighted to work with a couple of guys who had co-authored a paper with his hero from grad school, Alan Sokal. Manoj interpreted Fredrickson and Cole's seemingly unnecessarily complicated approach to genomic analysis. Nick set off to deconstruct and reproduce Cole's regression analyses predicting genomic expression. He discovered that Cole's procedure generated statistically significant (but meaningless) results from over two-thirds of the thousands of ways of splitting the psychometric data. Even using random numbers produced huge numbers of junk results.

The final group was Nick, Doug, Manoj, Harris, and myself. Others came and went from our email exchanges, some accepting our acknowledgment in the paper, while others asked us explicitly not to acknowledge them.

The team gave the article an extraordinarily careful look, noting its fuzzy theorizing and conceptual deficiencies, but we did much more than that. We obtained the original data and asked the authors of the original paper about their complex analytic methods. We then reanalyzed the data, following their specific advice. We tried alternative analyses and even re-ran the same analyses with randomly generated data. Overall, our hastily assembled group performed and interpreted thousands of analyses, more than many productive labs do in a year.

The embargo on our paper in PNAS is now off.

I can report our conclusion that

Not only is Fredrickson et al.’s article conceptually deficient, but more crucially statistical analyses are fatally flawed, to the point that their claimed results are in fact essentially meaningless.

A summary of our PNAS article is available here and the final draft is here.

Fuzzy thinking creates theoretical and general methodological problems

Fredrickson et al. claimed that two types of striving for well-being, eudaimonic and hedonic, have distinct and opposite effects on physical health by way of "molecular signaling pathways" or genomic expression, despite an unusually high correlation between two supposedly different variables. I had challenged the authors about the validity of their analyses in my earlier blog post and then in a letter to PNAS, but got blown off. Their reply dismissed my concerns, citing analyses that they have never shown, either in the original article or the reply.

In our article, we noted a subtlety in the distinction between eudaimonia and hedonia.

Eudaimonic well-being, generally defined (including by Fredrickson et al.) in terms of tendencies to strive for meaning, appears to be trait-like, since such striving for meaning is typically an ongoing life strategy.

Hedonic well-being, in contrast, is typically defined in terms of a person's (recent) affective experiences, and is state-like; regardless of the level of meaning in one's life, everyone experiences "good" and "bad" days.

The problem is

If well-being is a state, then a person’s level of well-being will change over time and perhaps at a very fast rate.  If we only measure well-being at one time point, as Fredrickson et al. did, then unless we obtain a genetic sample at the same time, the likelihood that the well-being score will actually accurately reflect level of genomic expression will be diminished if not eliminated.

In an interview with David Dobbs, Steven Cole seems to suggest an irreversibility to the changes that eudaimonic and hedonic strivings produce:

“Your experiences today will influence the molecular composition of your body for the next two to three months,” he tells his audience, “or, perhaps, for the rest of your life. Plan your day accordingly.”

Hmm. Really? Evidence?

Eudaimonic and hedonic well-being constructs may have a long history in philosophy, but empirically separating them is an unsolved problem. And taken together, the two constructs by no means capture the complexity of well-being.

Is a scientifically adequate taxonomy of well-being on which to do research even possible? Maybe, but doubts are raised when one considers the overcrowded field of well-being concepts available in the literature—

General well-being, subjective well-being, psychological well-being, ontological well-being, spiritual well-being, religious well-being, existential well-being, chaironic well-being, emotional well-being, and physical well-being—along with the various constructs which are treated as essentially synonymous with well-being, such as self-esteem, life-satisfaction, and, lest we forget, happiness.

No one seems to be paying attention to this confusing proliferation of similar constructs and how they are supposed to relate to each other. But in the realm of negative emotion, the problem is well known and variously referred to as the "big mush" or "crud factor". Actually, there is a good deal of difficulty separating positive well-being concepts from their obverse, negative well-being concepts.

Fredrickson and colleagues found that eudaimonia and especially hedonic well-being were strongly but negatively related to depression. Their measure of depression qualified as a covariate or confound for their analyses, but somehow disappeared from further consideration. If it had been retained, it would have further reduced the analyses to gobbledygook. Technically speaking, the residual of hedonia-controlling-for-(highly correlated)-eudaimonia-and-depression does not even have a family resemblance to hedonia and is probably nonsense.

Fredrickson et al. measured well-being with what they called the Short Flourishing Scale, better known in the literature as the Mental Health Continuum-Short Form (MHC-SF).

We looked and we were not able to identify any published evidence of a two factor solution in which distinct eudaimonic and hedonic well-being factors adequately characterized MHC-SF data.

The closest thing we could find was

Keyes et al. (10) referred to these groupings of hedonic and eudaimonic items as “clusters,” an ostensibly neutral term that seems to deliberately avoid the word “factor.”

However, his split of the MHC-SF items into hedonic and eudaimonic categories appears to have been made mainly to allow the arbitrary classification of persons as "languishing" versus "flourishing." Yup, positive psychology is now replacing the stigma of conventional psychology's deficiency model of depressed versus not depressed with a strength model of languishing versus flourishing.

In contrast to the rest of the MHC-SF literature, Fredrickson et al. referred – implicitly in their original PNAS paper, and then explicitly in their reply to my PNAS letter – to a factor analysis yielding two distinct factors ("Hedonic" and "Eudaimonic"), corresponding to Keyes' languishing versus flourishing diagnoses (i.e., items SF1–SF3 for Hedonic and SF4–SF14 for Eudaimonic).

The data from Fredrickson et al. were mostly in the public domain. After getting further psychometric data from Fredrickson's lab, we set off on a thorough reanalysis that should have revealed whatever basis for their claims there might be.

In exploratory factor analyses, which we ran using different extraction (e.g., principal axis, maximum likelihood) and rotation (orthogonal, oblique) methods, we found two factors with eigenvalues greater than 1 with all items producing a loading of .50 on at least one factor.

That’s lots of analyses, but results were consistent:

Examination of factor loading coefficients consistently showed that the first factor was comprised of elevated loadings from 11 items (SF1, SF2, SF3, SF4, SF5, SF9, SF10, SF11, SF12, SF13, and SF14), while the second factor housed high loadings from 3 items (SF6, SF7, and SF8).
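For readers who want to see what such a check involves, here is a minimal sketch of the exploratory analyses in Python, using the factor_analyzer package. The DataFrame `items`, holding the 14 MHC-SF responses as columns SF1–SF14, is hypothetical, and the extraction options only approximate the methods named above; this illustrates the technique, not our actual scripts:

```python
# Sketch: exploratory factor analysis of hypothetical MHC-SF item data,
# varying extraction and rotation methods as described above.
import pandas as pd
from factor_analyzer import FactorAnalyzer

def explore_factors(items: pd.DataFrame) -> None:
    # 'principal' and 'ml' approximate the principal-axis and
    # maximum-likelihood extractions mentioned in the text.
    for method in ("principal", "ml"):
        # varimax is an orthogonal rotation; oblimin is oblique.
        for rotation in ("varimax", "oblimin"):
            fa = FactorAnalyzer(n_factors=2, method=method, rotation=rotation)
            fa.fit(items)
            loadings = pd.DataFrame(
                fa.loadings_, index=items.columns, columns=["F1", "F2"]
            )
            print(f"\n{method} extraction, {rotation} rotation")
            print(loadings.round(2))  # which items load on which factor?
```

Whatever the extraction and rotation, the question is the same: do the items split SF1–SF3 versus SF4–SF14, as claimed?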


If this is the factor structure Fredrickson and colleagues claim, eudaimonic well-being would have to be the last three items (SF6–SF8). But look at them: they seem to reflect living in a particular kind of environment that is safe and supportive of people like the respondent. Actually, these results seem to lend support to my complaint that positive psychology is mainly for rich people: to flourish, one must live in a special environment. If you languish, it is your fault.


Okay, we did not find much support for the claims of Fredrickson and colleagues, but we gave them another chance with a confirmatory factor analysis (CFA). With this analysis, we would not be looking for the best solution, only learning whether a one-factor or a two-factor model is defensible.

For the one-factor model, goodness-of-fit statistics indicated grossly inadequate fit (χ2 = 227.64, df = 77, GFI = .73, CFI = .83, RMSEA = .154).  Although the equivalent statistics for the correlated two-factor model were slightly better, they still came out as poor (χ2 = 189.40, df = 76, GFI = .78, CFI = .87, RMSEA = .135).

Thus, even though our findings tended to support the view that well-being is best represented as at least a two dimensional construct, we did not confirm Fredrickson et al.’s claim (6) that the MHC-SF produces two factors conforming to hedonic and eudaimonic well-being.
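Purely as illustration, the corresponding CFA can be sketched with the semopy package. The lavaan-style model string encodes Fredrickson et al.'s claimed SF1–SF3 / SF4–SF14 split; `items` is the same hypothetical DataFrame as in the EFA sketch, and the actual reanalysis used other software:

```python
# Sketch: confirmatory factor analysis of the claimed two-factor split.
import semopy

TWO_FACTOR = """
Hedonic =~ SF1 + SF2 + SF3
Eudaimonic =~ SF4 + SF5 + SF6 + SF7 + SF8 + SF9 + SF10 + SF11 + SF12 + SF13 + SF14
Hedonic ~~ Eudaimonic
"""

model = semopy.Model(TWO_FACTOR)
model.fit(items)                       # 'items' as in the EFA sketch above
fit_stats = semopy.calc_stats(model)   # table of fit indices for the model
print(fit_stats.T)                     # inspect chi2, GFI, CFI, RMSEA, etc.
```

Dropping the second latent variable gives the one-factor comparison; the fit indices are then read against conventional cutoffs, as in the excerpt above.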

Hey Houston, we’ve got a problem.

As Ryff and Singer (15) put it, “Lacking evidence of scale validity and reliability, subsequent work is pointless” (p. 276).

Maybe we should have thrown in the towel. But if Fredrickson and colleagues could nonetheless proceed to multivariate analyses relating the self-report data to genomic expression, we decided that we would follow the same path.

[Cartoon: Hilda Bastian]

Relating self-report data to genomic expression: Random can be better

Fredrickson et al.'s analytic approach to genomic expression seemed unnecessarily complicated. They repeated regression analyses 53 times (a procedure we came to call RR53), regressing each of 53 genes of interest on eudaimonic and hedonic well-being and a full range of confounding/control variables. Recall that they had only 80 participants. This approach left them lots of room for capitalizing on chance.
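As best I can reconstruct it from the paper's supporting information, the RR53 logic looks something like the sketch below, written in Python rather than the authors' R. The names `df`, `CTRA_GENES`, and `COVARIATES` are hypothetical placeholders for the participant-level data, and the final t-test is a simplification of their fold-difference contrast:

```python
# Sketch of the RR53 logic: one regression per gene, then a one-sample
# t-test across the 53 per-gene coefficients for each well-being predictor.
import statsmodels.api as sm
from scipy import stats

def rr53(df, genes, predictors, covariates):
    coefs = {p: [] for p in predictors}
    X = sm.add_constant(df[predictors + covariates])
    for gene in genes:                       # 53 separate regressions
        fit = sm.OLS(df[gene], X).fit()
        for p in predictors:
            coefs[p].append(fit.params[p])   # the per-gene "fold difference"
    # The headline test treats the 53 coefficients as independent draws,
    # which they are not; that is the flaw our reanalysis exposes below.
    return {p: stats.ttest_1samp(coefs[p], 0.0) for p in predictors}
```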

So, why not simply regress

the scores for hedonic and eudaimonic well-being on the average expression of the 53 genes of interest, after changing the sign of the values of those genes that were expected to be down-regulated. [?]

After all, the authors had said

[T]he goal of this study is to test associations between eudaimonic and hedonic well-being and average levels of expression of specific sets of genes” (p. 1)

We started with our simpler approach.

We conducted a number of such regressions, using different methods of evaluating the “average level of expression” of the 53 CTRA genes of interest (e.g., taking the mean of their raw values, or the mean of their z-scores), but in all cases the model ANOVA was not statistically significant.
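In code, this "naive" approach is nearly a one-liner. A minimal sketch, using the same hypothetical names as the RR53 sketch above, with `DOWN_REGULATED` marking the genes expected to be down-regulated:

```python
# Sketch of the simpler approach: one regression of a well-being score
# on the sign-corrected average expression of the 53 genes.
import statsmodels.api as sm

def naive_regression(df, genes, down_regulated, wellbeing_score):
    signed = df[genes].copy()
    signed[down_regulated] *= -1                  # flip expected down-regulated genes
    avg = signed.mean(axis=1).rename("avg_ctra")  # mean of raw values; z-scores also work
    fit = sm.OLS(df[wellbeing_score], sm.add_constant(avg)).fit()
    return fit.f_pvalue                           # overall model ANOVA p-value
```

One regression per well-being score, one overall F-test each: nothing to capitalize on.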

Undaunted, we next applied the RR53 regression procedure to see whether it could, in contrast to our simpler “naive” approach, yield such highly significant results with the factors we had derived.

You can read the more technical description of our procedures in our article and its supplementary materials, but our results were

The t-tests for the regression coefficients corresponding to the predictor variables of interest, namely hedonic and eudaimonic well-being, were almost all non-significant (p > .05 in 104 out of 106 cases; mean p = .567, SD = 0.251), and in the two remaining cases (gene FOSL1, for both “hedonic,” p = .047, and “eudaimonic,” p = .030), the overall model ANOVA was not statistically significant (p = .146).

We felt that drawing any substantive conclusions from these coefficients would be inappropriate.

Nonetheless, we continued….

We…created two new variables, which we named PWB (corresponding to items SF1–SF5 and SF9–SF14) and EPSE (corresponding to items SF6–SF8).  When we applied Fredrickson et al.’s regression procedure using these variables as the two principal predictor variables of interest (replacing the Hedonic and Eudaimonic factor variables), we discovered that the “effects” of this factor pair were about twice as high as those for the Hedonic and Eudaimonic pair (PWB: up-regulation by 13.6%, p < .001; EPSE: down-regulation by 18.0%, p < .001; see Figures 3 and 4 in the Supporting Information).

Wow, if we accept statistical significance over all other considerations, we actually did better than Fredrickson et al.

Taken seriously, it suggests that the participants’ genes are not only expressing “molecular well-being” but even more vigorously, some other response that we presume Fredrickson et al. might call “molecular social evaluation.”

Or we might conclude that living in a particular kind of environment is good for your genomic expression.

But we were skeptical about whether we could give substantive interpretations of any kind and so we went wild, using the RR53 procedure with every possible way of splitting up the self-report data. Yup, that is a lot of analyses.

Excluding duplicates due to symmetry, there are 8,191 possible such combinations.  Of these, we found that 5,670 (69.2%) gave statistically significant results using the method described on pp. 1–2 of Fredrickson et al.’s Supporting Information (7) (i.e., the t-tests of the fold differences corresponding to the two elements of the pair of pseudo-factors were both significant at the .05 level), with 3,680 of these combinations (44.9% of the total) having both components significant at the .001 level.

Furthermore, 5,566 combinations (68.0%) generated statistically significant pairs of fold difference values that were greater in magnitude than Fredrickson et al.’s (6, figure 2A) Hedonic and Eudaimonic factors.
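The 8,191 figure is just combinatorics: there are (2^14 - 2)/2 unordered ways to split 14 items into two non-empty groups. A sketch of the enumeration, reusing the hypothetical `rr53` helper and data names sketched earlier:

```python
# Sketch: enumerate every split of the 14 items into two pseudo-factors,
# score each as an item mean, and count how often RR53 calls both "significant".
ITEMS = [f"SF{i}" for i in range(1, 15)]

def all_splits(items):
    n = len(items)
    # Keeping the last item on the right-hand side removes mirror duplicates,
    # leaving 2**(n-1) - 1 = 8191 unordered splits.
    for mask in range(1, 2 ** (n - 1)):
        left = [it for i, it in enumerate(items[:-1]) if mask >> i & 1]
        right = [it for it in items if it not in left]
        yield left, right

significant = 0
for left, right in all_splits(ITEMS):
    df["pf1"] = df[left].mean(axis=1)    # pseudo-factor scores (hypothetical df)
    df["pf2"] = df[right].mean(axis=1)
    results = rr53(df, CTRA_GENES, ["pf1", "pf2"], COVARIATES)
    if all(r.pvalue < .05 for r in results.values()):
        significant += 1
print(significant, "of 8191 splits significant at .05")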

While one possible explanation of these results is that differential gene expression is associated with almost any factor combination of the psychometric data, with the study participants’ genes giving simultaneous “molecular expression” to several thousand factors which psychologists have not yet identified, we suspected that there might be a more parsimonious explanation.

But we did not stop there. Bring on the random number generator.

As a further test of the validity of the RR53 procedure, we replaced Fredrickson et al.'s psychometric data (6) with random numbers (i.e., every item/respondent cell was replaced by a random integer in the range 0–5) and re-ran the R program. We did this in two different ways. First, we replaced the psychometric data with normally-distributed random numbers, such that the item-level means and standard deviations were close to the equivalent values for the original data. With these pseudo-data, 3,620 combinations of pseudo-factors (44.2%) gave a pair of fold difference values having t-tests significantly different from zero at the .05 level; of these, 1,478 (18.0% of the total) were both statistically significant at the .001 level. (We note that, assuming independence of up- and down-regulation of genes, the probability of the latter result occurring by chance with random psychometric data if the RR53 regression procedure does indeed identify differential gene expression as a function of psychometric factors, ought to be—literally—one in a million, i.e. 0.001², rather than somewhere between one in five and one in six.)

Second, we used uniformly-distributed random numbers (i.e., all "responses" were equally likely to appear for any given item and respondent). With these "white noise" data, we found that 2,874 combinations of pseudo-factors (35.1%) gave a pair of fold difference values having t-tests statistically significantly different from zero at the .05 level, of which 893 (10.9% of the total) were both significant at the .001 level.

Finally, we re-ran the program once more, using the same uniformly distributed random numbers, but this time excluding the demographic data and control genes; thus, the only non-random elements supplied to the RR53 procedure were the expression values of the 53 CTRA genes. Despite the total lack of any information with which to correlate these gene expression values, the procedure generated 2,540 combinations of pseudo-factors (31.0%) with a pair of fold difference values having t-tests statistically significantly different from zero at the .05 level, of which 235 (2.9% of the total) were both significant at the .001 level.

Thus, in all cases, we obtained far more statistically significant results using Fredrickson et al.'s methods (6) than would be predicted by chance alone for truly independent variables (i.e., .05² × 8191 ≈ 20), even when the psychometric data were replaced by meaningless random numbers.

To try to identify the source of these puzzling results, we ran simple bivariate correlations on the gene expression variables, which revealed moderate to strong correlations between many of them, suggesting that our significant results were mainly the product of shared variance across criterion variables. We therefore went back to the original psychometric data, and "scrambled" the CTRA gene expression data, reassigning each cell value for a given gene to a participant selected at random, thus minimizing any within-participants correlation between these values. When we re-ran the regressions with these data, the number of statistically significant results dropped to just 44 (.54%).
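The two randomization checks quoted above can each be sketched in a few lines, again under the same hypothetical naming; the reanalysis itself re-ran Cole's R program, so this only conveys the shape of the idea:

```python
# Sketch: feed the RR53 machinery random psychometric data, and separately
# scramble the gene-expression columns to break their shared variance.
import numpy as np

rng = np.random.default_rng(0)

# (1) "White noise" responses: every item/respondent cell becomes a random
# integer in the range 0-5, as in the passage quoted above.
noise = df.copy()
for item in ITEMS:
    noise[item] = rng.integers(0, 6, size=len(df))

# (2) Scramble each gene's expression values across participants, minimizing
# within-participant correlations between the criterion variables.
scrambled = df.copy()
for gene in CTRA_GENES:
    scrambled[gene] = rng.permutation(df[gene].to_numpy())
```

Run the enumeration on `noise` and significant pseudo-factor pairs abound; run it on `scrambled` and they all but vanish, which is what points to shared variance among the genes, not the psychometrics, as the engine of the "findings."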

The punchline

To summarize: even when fed entirely random psychometric data, the RR53 regression procedure generates large numbers of results that appear, according to these authors' interpretation, to establish a statistically significant relationship between self-reported well-being and gene expression. We believe that this regression procedure is, simply put, totally lacking in validity. It appears to be nothing more than a mechanism for producing apparently statistically significant effects from non-significant regression coefficients, driven by a high degree of correlation between many of the criterion variables.

Despite exhaustive efforts, we could not replicate the authors' simple factor structure differentiating hedonic versus eudaimonic well-being, upon which their genomic analyses so crucially depended. Then we showed that the complicated RR53 procedure turned random nonsense into statistically significant results. Poof, there is no there there (as Gertrude Stein once said about Oakland, California) in their paper, no evidence of "molecular signaling pathways that transduce positive psychological states into somatic physiology," just nonsense.

How, in the taxonomy of bad science, do we classify this slipup and the earlier one in American Psychologist? Poor methodological habits, run-of-the-mill scientific sloppiness, innocent probabilistic error, injudicious hype, or simply unbridled enthusiasm with an inadequate grasp of methods and statistics?

Play nice and avoid the trap of negative psychology?

Our PNAS article exposed the unreliability of the results and interpretation offered in a paper claimed to be a game-changing breakthrough in our understanding of how positive psychology affects health by way of genomic expression. Science is slow and incomplete in self-correcting. But corrections, even of outright nonsense, seldom garner the attention the original error received. It is just not as newsworthy to find that claims of minor adjustments in everyday behavior modifying gene expression are nonsense as it was to make the unsustainable claims in the first place.

Given the rewards offered by media coverage and even prestigious journals, authors can be expected to be incorrigible in giving in to the urge to orchestrate media attention for ill-understood results generated by dubious methods applied in small samples. But the rest of the scientific community and journalists need to keep in mind that most breakthrough discoveries are false, unreplicable, or at least wildly exaggerated.

The authors were offered a chance to respond to my muted and tightly constrained letter to PNAS. Cole and Fredrickson made references to analyses they have never presented and offered misinterpretations of the literature that I cited. I consider their response disingenuous and dismissive of any dialogue. I am willing to apologize for this assessment if they produce the factor analyses of the self-report data to which they pointed. I will even donate $100 to the American Cancer Society if they can produce it. I doubt they will.

Concerns about the unreliability of the scientific and biomedical literature have risen to the threshold of precipitating concern from the director of the NIH, Francis Collins. On the other hand, a backlash has called out critics for encouraging a "negative psychology" and warned us to temper our criticism. Cited as evidence of the excesses of critics are "'voodoo correlation' claims, 'p-hacking' investigations, websites like Retraction Watch, Neuroskeptic, [and] a handful of other blogs devoted to exposing bad science," and we are cautioned that "moral outrage has been conflated with scientific rigor." We are told we are damaging the credibility of science with criticism and that we should engage authors in clarification rather than criticize them. But I think our experience with this PNAS article demonstrates just how much work it takes to deconstruct outrageous claims based on methods and results that authors poorly understand but nonetheless promote in social media campaigns. Certainly, there are grounds for skepticism based on prior probabilities, and to be skeptical is not cynical. But is it not cynical to construct the pseudoscience of a positivity ratio and then a faux objective basis for moral philosophy?