Flawed meta-analysis reveals just how limited the evidence mapping meditation onto specific regions of the brain really is

The article put meaningless but reassuring effect sizes into the literature, where these numbers will be widely and uncritically cited.


“The only totally incontrovertible conclusion is that much work remains to be done…”.


Authors of a systematic review and meta-analysis of functional neuroanatomical studies (fMRI and PET) of meditation were exceptionally frank in acknowledging the problems of relating the practice of meditation to differences in specific regions of the brain. However, they did not adequately deal with problems hiding in plain sight. These problems should have discouraged them from integrating this literature into a meta-analysis and from expressing the strength of the association between meditation and the brain as a small set of moderate effect sizes.


An amazing set of severely underpowered studies, with evidence that null findings are being suppressed.

Many in the multibillion-dollar mindfulness industry are naive about, or simply indifferent to, what constitutes quality evidence. Their false confidence that “meditation changes the brain” can be bolstered by selective quotes from this review seemingly claiming that the associations are well established and practically significant. Readers who are more sophisticated may nonetheless be misled by this review, unless they read beyond the abstract and with appropriate skepticism.

Read on. I suspect you will be as surprised as I was by the small quantity and poor quality of the literature relating the practice of meditation to specific areas of the brain. The colored pictures of the brain widely used to illustrate discussions of meditation are premature and misleading.

As noted in another article:

Brightly coloured brain scans are a media favourite as they are both attractive to the eye and apparently easy to understand but in reality they represent some of the most complex scientific information we have. They are not maps of activity but maps of the outcome of complex statistical comparisons of blood flow that unevenly relate to actual brain function. This is a problem that scientists are painfully aware of but it is often glossed over when the results get into the press.

The article is

Fox KC, Dixon ML, Nijeboer S, Girn M, Floman JL, Lifshitz M, Ellamil M, Sedlmeier P, Christoff K. Functional neuroanatomy of meditation: A review and meta-analysis of 78 functional neuroimaging investigations. Neuroscience & Biobehavioral Reviews. 2016 Jun 30;65:208-28.

Abstract.

Keep in mind how few readers go beyond an abstract in forming an impression of what an article shows. Far more readers “know” what the meta-analysis found solely from reading the abstract than from reading both the article and the supplementary material.

Meditation is a family of mental practices that encompasses a wide array of techniques employing distinctive mental strategies. We systematically reviewed 78 functional neuroimaging (fMRI and PET) studies of meditation, and used activation likelihood estimation to meta-analyze 257 peak foci from 31 experiments involving 527 participants. We found reliably dissociable patterns of brain activation and deactivation for four common styles of meditation (focused attention, mantra recitation, open monitoring, and compassion/loving-kindness), and suggestive differences for three others (visualization, sense-withdrawal, and non-dual awareness practices). Overall, dissociable activation patterns are congruent with the psychological and behavioral aims of each practice. Some brain areas are recruited consistently across multiple techniques—including insula, pre/supplementary motor cortices, dorsal anterior cingulate cortex, and frontopolar cortex—but convergence is the exception rather than the rule. A preliminary effect-size meta-analysis found medium effects for both activations (d = 0.59) and deactivations (d = −0.74), suggesting potential practical significance. Our meta-analysis supports the neurophysiological dissociability of meditation practices, but also raises many methodological concerns and suggests avenues for future research.

The positive claims in the abstract

“…Found reliably dissociable patterns of brain activation and deactivation for four common styles of meditation.”

“Dissociable activation patterns are congruent with the psychological and behavioral aims of each practice.”

“Some brain areas are recruited consistently across multiple techniques”

“A preliminary effect-size meta-analysis found medium effects for both activations (d = 0.59) and deactivations (d = −0.74), suggesting potential practical significance.”

“Our meta-analysis supports the neurophysiological dissociability of meditation practices…”

 And hedges and qualifications in the abstract

“Convergence is the exception rather than the rule”

“[Our meta-analysis] also raises many methodological concerns and suggests avenues for future research.”

Why was this systematic review and meta-analysis undertaken now?

A figure provided in the article showed a rapid accumulation of studies of mindfulness in the brain in the past few years, with over 100 studies now available.

However, the authors' systematic search yielded “78 functional neuroimaging (fMRI and PET) studies of meditation, and used activation likelihood estimation to meta-analyze 257 peak foci from 31 experiments involving 527 participants.” Only about a third of the studies identified in the search provided usable data.

What did the authors want to accomplish?

Taken together, our central aims were to: (i) comprehensively review and meta-analyze the existing functional neuroimaging studies of meditation (using the meta-analytic method known as activation likelihood estimation, or ALE), and compare consistencies in brain activation and deactivation both within and across psychologically distinct meditation techniques; (ii) examine the magnitude of the effects that characterize these activation patterns, and address whether they suggest any practical significance; and (iii) articulate the various methodological challenges facing the emerging field of contemplative neuroscience (Caspi and Burleson, 2005; Thompson, 2009; Davidson, 2010; Davidson and Kaszniak, 2015), particularly with respect to functional neuroimaging studies of meditation.

Said elsewhere in the article:

Our central hypothesis was a simple one: meditation practices distinct at the psychological level (Ψ) may be accompanied by dissociable activation patterns at the neurophysiological level (Φ). Such a model describes a ‘one-to-many’ isomorphism between mind and brain: a particular psychological state or process is expected to have many neurophysiological correlates from which, ideally, a consistent pattern can be discerned (Cacioppo and Tassinary, 1990).

The assumption is that meditating versus non-meditating brains should be characterized by distinct, observable neurophysiological patterns. There should also be distinct, enduring changes in the brains of people who have been practicing meditation for some time.

I would wager that many meditation enthusiasts believe that links to specific regions are already well established. Confronted with evidence to the contrary, they would suggest that links between the experience of meditating and changes in the brain are predictable and are waiting to be found. It is that kind of confidence that leads to the significance chasing and confirmatory bias currently infecting this literature.

Types of meditation available for study

Quantitative analyses focused on four types of meditation. Additional types of meditation did not have sufficient studies and so were examined only qualitatively. Some studies of the four provided within-group effect sizes, whereas other studies provided between-group effect sizes.

Focused attention (7 studies)

Directing attention to one specific object (e.g., the breath or a mantra) while monitoring and disengaging from extraneous thoughts or stimuli (Harvey, 1990, Hanh, 1991, Kabat-Zinn, 2005, Lutz et al., 2008b, Wangyal and Turner, 2011).

Mantra recitation (8 studies)

Repetition of a sound, word, or sentence (spoken aloud or silently in one’s head) with the goals of calming the mind, maintaining focus, and avoiding mind-wandering.

Open monitoring (10 studies)

Bringing attention to the present moment and impartially observing all mental contents (thoughts, emotions, sensations, etc.) as they naturally arise and subside.

Loving-kindness/compassion (6 studies)

Loving-kindness meditation involves:

Generating feelings of kindness, love, and joy toward themselves, then progressively extending these feelings to imagined loved ones, acquaintances, strangers, enemies, and eventually all living beings (Harvey, 1990, Kabat-Zinn, 2005, Lutz et al., 2008a).

Similar, but not identical, compassion meditation:

Takes this practice a step further: practitioners imagine the physical and/or psychological suffering of others (ranging from loved ones to all humanity) and cultivate compassionate attitudes and responses to this suffering.

In addition to these four types of meditation, three others can be identified, but so far have only limited studies of the brain: Visualization, Sense-withdrawal and Non-dual awareness practices.

A dog’s breakfast: A table of the included studies quickly reveals a meta-analysis in deep trouble

[Table of studies included in the meta-analysis]

This is not a suitable collection of studies to enter into a meta-analysis with any expectation that a meaningful, generalizable effect size will be obtained.

Most studies (14) furnish only pre-post, within-group effects for mindfulness practiced by long-time practitioners. Of these 14 studies, two are outliers with 20 and 31 practitioners; otherwise, sample sizes range from 4 to 14.

There are 11 studies furnishing between-group comparisons between experienced and novice meditators. The number of participants in the smaller cell is key for the power of between-group effect sizes, not the overall sample size. In these 11 studies, this ranged from 10 to 22.

It is well known that one should not combine within-group and between-group effect sizes in a meta-analysis. Pre-post, within-group differences capture not only the effects of the active ingredients of an intervention, but also nonspecific effects of the conditions under which data are gathered, including regression to the mean. These within-group differences will typically overestimate between-group differences. Adding a comparison group and calculating between-group differences has the potential to control for nonspecific effects, if the comparison condition is appropriate.
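To make the distinction concrete, here is a minimal, purely hypothetical simulation (mine, not the authors'; the cell size, drift, and effect values are all assumptions) of how a pre-post within-group effect size can look substantial even when meditation itself does nothing, while the between-group effect size, which controls for nonspecific change, hovers around zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12              # assumed cell size, typical of the studies in the table
drift = 0.6         # assumed nonspecific change everyone shows (habituation, regression to the mean)
true_effect = 0.0   # assume meditation itself does nothing

def pre_post_change(extra_effect):
    pre = rng.normal(0, 1, n)
    post = pre + rng.normal(drift + extra_effect, 1, n)
    return post - pre   # change scores

change_med = pre_post_change(true_effect)   # "meditators"
change_ctl = pre_post_change(0.0)           # comparison group

# Within-group d: mean change divided by the SD of the change scores
d_within = change_med.mean() / change_med.std(ddof=1)

# Between-group d: difference in mean change divided by the pooled SD of change scores
pooled_sd = np.sqrt((change_med.var(ddof=1) + change_ctl.var(ddof=1)) / 2)
d_between = (change_med.mean() - change_ctl.mean()) / pooled_sd

print(f"within-group d:  {d_within:.2f}")   # inflated by nonspecific change
print(f"between-group d: {d_between:.2f}")  # hovers near zero
```

Averaging the two kinds of effect size, as this meta-analysis effectively does, mixes estimates contaminated by nonspecific change with estimates that are not.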

The effect sizes based on between-group differences in these studies have their own problems as estimates of the effects of meditation on the brain. Participants were not randomized to the groups, but were selected because they were already either experienced or novice meditators. These two groups could differ on many variables that cannot be controlled: meditation could be confounded with other lifestyle variables, such as sleeping better or having a better diet. There might also be pre-existing differences in the brain that made it easier for the experienced meditators to commit to long-term practice. The authors acknowledge these problems late in the article, but only after discussing the effect sizes they obtained as having substantive importance.

There is good reason to be skeptical that these poorly controlled between-group differences are directly comparable to whatever changes would occur in experienced meditators’ brains in the course of practicing meditation.

It has been widely appreciated that neuroimaging studies are typically grossly underpowered, and that the result is low reproducibility of findings. Having too few participants makes false negatives likely, because the study cannot detect effects of plausible size. A small sample means that only a stronger association can reach statistical significance.

Yet the positive (i.e., significant) findings that are obtained will of necessity be larger, likely exaggerated, and unlikely to be reproducible in a larger sample.
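A small, hypothetical simulation (my own; the true effect, group size, and number of studies are assumptions) illustrates this significance filter: among tiny studies, the ones that happen to reach p < .05 report effect sizes far larger than the true effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n_per_group, n_studies = 0.3, 10, 5000   # modest true effect, 10 participants per group

all_d, significant_d = [], []
for _ in range(n_studies):
    a = rng.normal(true_d, 1, n_per_group)       # "meditators"
    b = rng.normal(0.0, 1, n_per_group)          # comparison group
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = (a.mean() - b.mean()) / pooled_sd
    all_d.append(d)
    if stats.ttest_ind(a, b).pvalue < 0.05:      # only "positive" studies survive
        significant_d.append(d)

print(f"true effect:                    d = {true_d}")
print(f"average over all studies:       d = {np.mean(all_d):.2f}")
print(f"average over significant ones:  d = {np.mean(significant_d):.2f}")
print(f"proportion reaching p < .05:    {len(significant_d) / n_studies:.0%}")
```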

Another problem with such small cell sizes is that it cannot be ruled out that effects are due to one or more participants' differences in brain size or anatomy. One outlier, or a small subgroup of outliers, could drive all significant findings in an already small sample. The assumption that statistical techniques can smooth out these interindividual differences depends on having much larger samples.

It has been noted elsewhere:

Brains are different so the measure in corresponding voxels across subjects may not sample comparable information.

How did the samples get so small? Neuroanatomical studies are expensive, but why did Lazar et al. (2000) have 5 rather than 6 participants, or Davanger et al. only 4? Were data from some participants dropped after a peek at the results? Were studies compromised by the authors' inability to recruit the intended number of participants, forcing them to relax entry criteria? What selection bias is there in these small samples? We just don't know.

I am reminded of the contentious debate that occurred when psychoanalysts insisted on mixing uncontrolled case series with randomized trials in the same meta-analyses of psychotherapy. My colleagues and I showed that this introduces great distortion into the literature. Undoubtedly, the same is occurring in these studies of meditation, but there is so much else wrong with this meta-analysis.

The authors acknowledge that in calculating effect sizes, they combined studies measuring cerebral blood flow (positron emission tomography; PET) and blood oxygenation level (functional magnetic resonance imaging; fMRI). Furthermore, the meta-analyses combined studies that varied in the experimental tasks for which neuroanatomical data were obtained.

One problem is that even studies examining a similar form of meditation might be comparing a meditation practice to very different baseline or comparison tasks and conditions. However, collapsing across numerous different baselines or control conditions is a common (in fact, usually inevitable) practice in meta-analyses of functional neuroimaging studies…

So, there are other important sources of heterogeneity between these studies.

A generic forest plot. This article did not provide one.

It’s a pity that the authors did not provide a forest plot [How to read a forest plot] graphically showing the confidence intervals around the effect sizes being entered into the meta-analysis.

But the authors did provide a funnel plot that I found shocking [Recommendations for examining and interpreting funnel plots]. I have never seen one like it, except when someone has constructed an artificial funnel plot to make a point.

[Funnel plot from the article]

Notice two things about this funnel plot. Rather than a smooth, unbroken distribution, studies with effect sizes between -.45 and +.45 are entirely missing. Studies with smaller sample sizes have the largest effect sizes, whereas the smallest effect sizes all come from the larger samples.
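This pattern is exactly what suppression of null findings produces. Here is a purely illustrative simulation (mine, with an assumed true effect of d = 0.2 and per-group samples of 5 to 46, echoing the range mentioned in the review) that plots a funnel built only from the "significant" studies: the middle hollows out and the smallest studies cluster at the largest effects.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
true_d = 0.2                                    # assumed modest true effect
sample_sizes = rng.integers(5, 47, size=300)    # per-group n between 5 and 46

published_d, published_n = [], []
for n in sample_sizes:
    a = rng.normal(true_d, 1, n)
    b = rng.normal(0.0, 1, n)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = (a.mean() - b.mean()) / pooled_sd
    if stats.ttest_ind(a, b).pvalue < 0.05:     # null findings never get written up
        published_d.append(d)
        published_n.append(n)

plt.scatter(published_d, published_n)
plt.axvline(true_d, linestyle="--", label="true effect")
plt.xlabel("observed effect size (d)")
plt.ylabel("sample size per group")
plt.title("Funnel plot when null findings are suppressed")
plt.legend()
plt.show()
```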

For me, this adds to the overwhelming evidence that something has gone wrong in this literature and that any effect sizes should be ignored. There must have been considerable suppression of null findings, so the large effects from smaller studies will not generalize. Yet the authors find the differences between small and larger studies encouraging:

This suggests, encouragingly, that despite potential publication bias or inflationary bias due to neuroimaging analysis methods, nonetheless studies with larger samples tend to converge on similar and more reasonable (medium) effect sizes. Although such a conclusion is tentative, the results to date (Fig. 6) suggest that a sample size of approximately n = 25 is sufficient to reliably produce effect sizes that accord with those reported in studies with much larger samples (up to n = 46).

I and others have long argued that psychotherapy studies with samples this small should be treated as pilot feasibility studies and not used to generate effect sizes. The same logic applies to this literature.

Distinctive patterns of regional activation and deactivation

The first part of the results section is devoted to studies examining particular forms of meditation. In assessing the apparent consistency of results, one needs to keep in mind the small number of studies being examined and the considerable differences among them. For instance, the results presented for focused attention combine three between-group comparisons with four within-group studies, ranging from pre-post meditation differences in experienced Tibetan Buddhist practitioners to differences between novice and experienced practitioners of mindfulness-based stress reduction (MBSR). In almost all cases, statistically significant differences are found in both activation and deactivation regions that make sense in terms of the functions known to be associated with them. There is a high ratio of significant findings to the number of participants and comparisons, little note of anomalous brain regions reaching significance, and little discussion of anomalies.

Meta-analysis of focused attention studies resulted in 2 significant clusters of activation, both in prefrontal cortex (Table 3; Fig. 2). Activations were observed in regions associated with the voluntary regulation of thought and action, including the premotor cortex (BA 6; Fig. 2b) and dorsal anterior cingulate cortex (BA 24; Fig. 2a). Slightly sub-threshold clusters were also observed in the dorsolateral prefrontal cortex (BA 8/9; Fig. 2c) and left mid-insula (BA 13; Fig. 2e); we display these somewhat sub-threshold results here because of the obvious interest of these findings in practices that involve top-down focusing of attention, typically focused on respiration. We also observed clusters of deactivation in regions associated with episodic memory and conceptual processing, including the ventral posterior cingulate cortex (BA 31; Fig. 2d) and left inferior parietal lobule (BA 39; Fig. 2f).

How can such meaningful, practically significant findings be obtained when so many conditions militate against finding them? John Ioannidis once remarked that in hot areas of research, consistency of positive findings from small studies often reflects only the strength of the bias with which they are sought. The strength of findings will decrease when larger, more methodologically sophisticated studies become available, conducted by investigators who are less committed to obtaining confirmation.

The article concludes:

Many have understandably viewed the nascent neuroscience of meditation with skepticism (Andresen, 2000; Horgan, 2004), but recent years have seen an increasing number of high-quality, controlled studies that are suitable for inclusion in meta-analyses and that can advance our cumulative knowledge of the neural basis of various meditation practices (Tang et al., 2015). With nearly a hundred functional neuroimaging studies of meditation now reported, we can conclude with some confidence that different practices show relatively distinct patterns of brain activity, and that the magnitude of associated effects on brain function may have some practical significance. The only totally incontrovertible conclusion, however, is that much work remains to be done to confirm and build upon these initial findings.

An “increasing number of high-quality, controlled studies that are suitable for inclusion in meta-analyses”? “Conclude with some confidence”? “Relatively distinct patterns”? “Some practical significance”?

In all of this premature enthusiasm about findings relating the practice of meditation to activation of particular regions of the brain and deactivation of others, we should not lose track of some other issues.

Although the authors talk about mapping one-to-one relationships between psychological states and regions of the brain, none of the studies is of sufficient size to document such relationships, given the expected magnitude of the associations, based on what is typically found between psychological states and other biological variables.

Many differences between techniques could be artifactual, due to the technique altering breathing, involving verbalization, or requiring focused attention. Observed differences in the brain regions activated and deactivated might simply reflect these features without being related to psychological functioning.

Even if an association were found, it would be a long way from establishing that the association reflects a causal mechanism, rather than being merely correlational or even artifactual. Think of the analogy of observing a relationship between the amount of sweat produced while exercising and weight loss, and then concluding that the weight loss was due to sweating it out.

We still have not established that meditation has more psychological and physical health benefits than other active interventions with presumably different mechanisms. After lots of studies, we still don’t know whether mindfulness meditation is anything more than a placebo. While I was finishing up this blog post, I came across a new study:

The limited prosocial effects of meditation: A systematic review and meta-analysis. 

Although we found a moderate increase in prosociality following meditation, further analysis indicated that this effect was qualified by two factors: type of prosociality and methodological quality. Meditation interventions had an effect on compassion and empathy, but not on aggression, connectedness or prejudice. We further found that compassion levels only increased under two conditions: when the teacher in the meditation intervention was a co-author in the published study; and when the study employed a passive (waiting list) control group but not an active one. Contrary to popular beliefs that meditation will lead to prosocial changes, the results of this meta-analysis showed that the effects of meditation on prosociality were qualified by the type of prosociality and methodological quality of the study. We conclude by highlighting a number of biases and theoretical problems that need addressing to improve quality of research in this area. [Emphasis added].


Jane Brody promoting the pseudoscience of Barbara Fredrickson in the New York Times

Journalists’ coverage of positive psychology and health is often shabby, even in prestigious outlets like The New York Times.

Jane Brody’s latest installment on the health benefits of being positive relied heavily on the work of Barbara Fredrickson that my colleagues and I have thoroughly debunked.

All of us need to recognize that studies of the effects of positive psychology interventions are often disguised randomized controlled trials.

With that insight, we need to evaluate this research in terms of reporting standards like CONSORT and declarations of conflict of interests.

We need to be more skeptical about the ability of small changes in behavior to profoundly improve health.

When in doubt, assume that much of what we read in the media about positivity and health is false or at least exaggerated.

Jane Brody starts her article in The New York Times by describing how most mornings she is “grinning from ear to ear, uplifted not just by my own workout but even more so” by her interaction with toddlers on the way home from where she swims. When I read Brody’s “Turning Negative Thinkers Into Positive Ones,” I was not left grinning from ear to ear. I was left profoundly bummed.

I thought real hard about what was so unsettling about Brody’s article. I now have some clarity.

I don’t mind suffering even pathologically cheerful people in the morning. But I do get bothered when they serve up pseudoscience as the real thing.

I had expected to be served up Brody’s usual recipe of positive psychology pseudoscience, concocted to coerce readers into heeding her Barnum advice about how they should lead their lives. “Smile or die!” Apologies to my friend Barbara Ehrenreich for putting the retitling of her book outside North America to use here. I invoke the phrase because Jane Brody makes the case that unless we do what she says, we risk hurting our health and shortening our lives. So we had better listen up.

What bummed me most this time was that Brody was drawing on the pseudoscience of Barbara Fredrickson that my colleagues and I have worked so hard to debunk. We took the trouble of obtaining data sets for two of her key papers for reanalysis. We were dismayed by the quality of the data. To start with, we uncovered carelessness at the level of data entry that undermined her claims. But her basic analyses and interpretations did not hold up either.

Fredrickson publishes exaggerated claims about dramatic benefits of simple positive psychology exercises. Fredrickson is very effective in blocking or muting the publication of criticism and getting on with hawking her wares. My colleagues and I have talked to others who similarly met considerable resistance from editors in getting detailed critiques and re-analyses published. Fredrickson is also aided by uncritical people like Jane Brody to promote her weak and inconsistent evidence as strong stuff. It sells a lot of positive psychology merchandise to needy and vulnerable people, like self-help books and workshops.

If taken seriously, Fredrickson’s research concerns the health effects of a behavioral intervention. Yet her findings are presented in a way that does not readily allow their integration with the rest of the health psychology literature. It would be difficult, for instance, to integrate Fredrickson’s randomized trials of loving-kindness meditation with other research, because she makes it almost impossible to isolate effect sizes in a way that could be combined with other studies in a meta-analysis. Moreover, Fredrickson has published contradictory claims from the same data set multiple times without acknowledging the duplicate publication. [Please read on. I will document all of these claims before the post ends.]

The need of self-help gurus to generate support for the dramatic claims in their lucrative positive psychology self-help products is never acknowledged as a conflict of interest. It should be.

Just imagine if someone had a contract based on a book prospectus promising that the claims of their last pop psychology book would be surpassed. Such books inevitably paint life too simply, with small changes in behavior having profound and lasting effects unlike anything obtained in the randomized trials of clinical and health psychology. Readers ought to be informed that the pressure to meet the demands of a lucrative book contract could generate a strong confirmation bias. Caveat emptor, caveat auditor, but how about at least informing readers and letting them decide whether following the money influences their interpretation of what they read?

Psychology journals almost never require disclosures of conflicts of interest of this nature. I am campaigning to make that practice routine, with nondisclosure of such financial benefits treated as tantamount to scientific misconduct. I am calling for readers to take to social media when these disclosures do not appear in scientific journals, where they should be featured prominently, and to hold editors responsible for non-enforcement. I can cite Fredrickson’s work as a case in point, but there are many other examples, inside and outside of positive psychology.

Back to Jane Brody’s exaggerated claims for Fredrickson’s work.

I lived for half a century with a man who suffered from periodic bouts of depression, so I understand how challenging negativism can be. I wish I had known years ago about the work Barbara Fredrickson, a psychologist at the University of North Carolina, has done on fostering positive emotions, in particular her theory that accumulating “micro-moments of positivity,” like my daily interaction with children, can, over time, result in greater overall well-being.

The research that Dr. Fredrickson and others have done demonstrates that the extent to which we can generate positive emotions from even everyday activities can determine who flourishes and who doesn’t. More than a sudden bonanza of good fortune, repeated brief moments of positive feelings can provide a buffer against stress and depression and foster both physical and mental health, their studies show.

“Research…demonstrates” (?). Brody is feeding stupid-making pablum to readers. Fredrickson’s kind of research may produce evidence one way or the other, but it is too strong a claim, an outright illusion, to even begin suggesting that it “demonstrates” (proves) what follows in this passage.

Where, outside of tabloids and self-help products, does one find the immodest claim that one or a few poor-quality studies “demonstrate” anything?

Negative feelings activate a region of the brain called the amygdala, which is involved in processing fear and anxiety and other emotions. Dr. Richard J. Davidson, a neuroscientist and founder of the Center for Healthy Minds at the University of Wisconsin — Madison, has shown that people in whom the amygdala recovers slowly from a threat are at greater risk for a variety of health problems than those in whom it recovers quickly.

Both he and Dr. Fredrickson and their colleagues have demonstrated that the brain is “plastic,” or capable of generating new cells and pathways, and it is possible to train the circuitry in the brain to promote more positive responses. That is, a person can learn to be more positive by practicing certain skills that foster positivity.

We are knee-deep in neuro-nonsense. Try asking a serious neuroscientist about the claims that this duo have “demonstrated that the brain is ‘plastic,’” or that practicing certain positivity skills changes the brain with the health benefits they claim via Brody, or that they are studying “amygdala recovery” associated with reduced health risk.

For example, Dr. Fredrickson’s team found that six weeks of training in a form of meditation focused on compassion and kindness resulted in an increase in positive emotions and social connectedness and improved function of one of the main nerves that helps to control heart rate. The result is a more variable heart rate that, she said in an interview, is associated with objective health benefits like better control of blood glucose, less inflammation and faster recovery from a heart attack.

I will dissect this key claim about loving-kindness meditation and vagal tone/heart rate variability shortly.

Dr. Davidson’s team showed that as little as two weeks’ training in compassion and kindness meditation generated changes in brain circuitry linked to an increase in positive social behaviors like generosity.

We will save discussing Richard Davidson for another time. But really, Jane, just two weeks to better health? Where is the generosity center in brain circuitry? I dare you to ask a serious neuroscientist and embarrass yourself.

“The results suggest that taking time to learn the skills to self-generate positive emotions can help us become healthier, more social, more resilient versions of ourselves,” Dr. Fredrickson reported in the National Institutes of Health monthly newsletter in 2015.

In other words, Dr. Davidson said, “well-being can be considered a life skill. If you practice, you can actually get better at it.” By learning and regularly practicing skills that promote positive emotions, you can become a happier and healthier person. Thus, there is hope for people like my friend’s parents should they choose to take steps to develop and reinforce positivity.

In her newest book, “Love 2.0,” Dr. Fredrickson reports that “shared positivity — having two people caught up in the same emotion — may have even a greater impact on health than something positive experienced by oneself.” Consider watching a funny play or movie or TV show with a friend of similar tastes, or sharing good news, a joke or amusing incidents with others. Dr. Fredrickson also teaches “loving-kindness meditation” focused on directing good-hearted wishes to others. This can result in people “feeling more in tune with other people at the end of the day,” she said.

Brody ends with 8 things Fredrickson and others endorse to foster positive emotions. (Why only 8 recommendations? Why not come up with 10 and make them commandments?) These include “Do good things for other people” and “Appreciate the world around you.” Okay, but do Fredrickson and Davidson really show that engaging in these activities has immediate and dramatic effects on our health? I have examined their research and I doubt it. The larger problem, though, is the suggestion that physically ill people facing shortened lives risk being blamed for being bad people. They obviously did not do these 8 things, or else they would be healthy.

If Brody were selling herbal supplements or coffee enemas, we would readily label the quackery. We should do the same for advice about psychological practices that are promised to transform lives.

Brody’s sloppy links to support her claims: Love 2.0

Journalists who talk of “science” and respect their readers will provide links to their actual sources in the peer-reviewed scientific literature, so that motivated readers can independently review the evidence. That goes especially for an outlet as prestigious as The New York Times.

Jane Brody is outright promiscuous in the links that she provides, which are often to secondary or tertiary sources. The first link provided for her discussion of Fredrickson’s Love 2.0 is actually to a somewhat negative review of the book. https://www.scientificamerican.com/article/mind-reviews-love-how-emotion-afftects-everything-we-feel/

Fredrickson builds her case by expanding on research that shows how sharing a strong bond with another person alters our brain chemistry. She describes a study in which best friends’ brains nearly synchronize when exchanging stories, even to the point where the listener can anticipate what the storyteller will say next. Fredrickson takes the findings a step further, concluding that having positive feelings toward someone, even a stranger, can elicit similar neural bonding.

This leap, however, is not supported by the study and fails to bolster her argument. In fact, most of the evidence she uses to support her theory of love falls flat. She leans heavily on subjective reports of people who feel more connected with others after engaging in mental exercises such as meditation, rather than on more objective studies that measure brain activity associated with love.

I would go even further than the reviewer. Fredrickson builds her case by drawing very selectively on the literature, choosing only a few studies that fit. Even then, the studies fit only with considerable exaggeration and distortion of their findings. She exaggerates the relevance and strength of her own findings. In other cases, she says things that have no basis in anyone’s research.

I came across Love 2.0: How Our Supreme Emotion Affects Everything We Feel, Think, Do, and Become (Unabridged) that sells for $17.95. The product description reads:

We all know love matters, but in this groundbreaking book positive emotions expert Barbara Fredrickson shows us how much. Even more than happiness and optimism, love holds the key to improving our mental and physical health as well as lengthening our lives. Using research from her own lab, Fredrickson redefines love not as a stable behemoth, but as micro-moments of connection between people – even strangers. She demonstrates that our capacity for experiencing love can be measured and strengthened in ways that improve our health and longevity. Finally, she introduces us to informal and formal practices to unlock love in our lives, generate compassion, and even self-soothe. Rare in its scope and ambitious in its message, Love 2.0 will reinvent how you look at and experience our most powerful emotion.

There is a mishmash of language games going on here. Fredrickson’s redefinition of love is not based on her research. Her claim that love is ‘really’ micro-moments of connection between people, even strangers, is a weird redefinition. Attempt to read her book, if you have time to waste.

You will quickly see that much of what she says makes no sense for long-term relationships that are solid but beyond the honeymoon stage. Ask partners in long-term relationships, and they will undoubtedly report lacking lots of such “micro-moments of connection.” I doubt it is adaptive for people seeking to build long-term relationships to adopt the yardstick that if lots of such micro-moments don’t keep coming all the time, the relationship is in trouble. But it is Fredrickson who is selling the strong claims, and the burden is on her to produce the evidence.

If you try to take Fredrickson’s work seriously, you wind up seeing that she has a rather superficial view of close relationships and can’t seem to distinguish them from what goes on between strangers in drunken one-night stands. But that is supposed to be revolutionary science.

We should not confuse much of what Fredrickson emphatically states with testable hypotheses. Many statements sound more like marketing slogans, what Joachim Kruger and his student Thomas Mairunteregger identify as the McDonaldization of positive psychology. Like a Big Mac, Fredrickson’s Love 2.0 requires a lot of imagination to live up to its advertisement.

Fredrickson’s love the supreme emotion vs ‘Trane’s Love Supreme

Where Fredrickson’s selling of love as the supreme emotion is not simply an advertising slogan, it is a bad summary of the research on love and health. John Coltrane makes no empirical claim about love being supreme, but listening to him is effective self-soothing after taking Love 2.0 seriously and trying to figure it out. Simply enjoy, and don’t worry about what it does for your positivity ratio or your micro-moments, shared or alone.

Fredrickson’s study of loving-kindness meditation

Jane Brody, like Fredrickson herself, depends heavily on a study of loving-kindness meditation in proclaiming the wondrous, transformative health benefits of being loving and kind. After obtaining Fredrickson’s data set and reanalyzing it, my colleagues – James Heathers, Nick Brown, and Harris Friedman – and I arrived at a very different interpretation of her study. As we first encountered it, the study was:

Kok, B. E., Coffey, K. A., Cohn, M. A., Catalino, L. I., Vacharkulksemsuk, T., Algoe, S. B., . . . Fredrickson, B. L. (2013). How positive emotions build physical health: Perceived positive social connections account for the upward spiral between positive emotions and vagal tone. Psychological Science, 24, 1123-1132.

Consolidated Standards of Reporting Trials (CONSORT) are widely accepted for at least two reasons. First, clinical trials should be clearly identified as such in order to ensure that the results are recognized and available in systematic searches to be integrated with other studies. CONSORT requires that RCTs be clearly identified in titles and abstracts. Once RCTs are labeled as such, the CONSORT checklist becomes a handy tally of what needs to be reported.

It is only in the supplementary material that the Kok and Fredrickson paper is identified as a clinical trial. Only in that supplement is the primary outcome identified, even in passing. No means are reported anywhere in the paper or supplement. Results are presented in terms of what Kok and Fredrickson term “a variant of a mediational, parallel process, latent-curve model.” Basic statistics needed for its evaluation are left to readers’ imagination. Figure 1 in the article depicts the awe-inspiring parallel-process mediational model that guided the analyses. We showed the figure to a number of statistical experts, including Andrew Gelman. While some elements were readily recognizable, the overall figure was not, especially the mysterious large dot (a causal pathway roundabout?) near the top.

So, not only might the study not be detected as an RCT, it also lacks the information needed to calculate effect sizes.
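For readers unfamiliar with what a meta-analyst needs, here is a generic sketch (not specific to this trial; the numbers at the bottom are made up purely for illustration) of the between-group effect size calculation that becomes impossible when a report contains no group means or standard deviations.

```python
import math

def cohens_d(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl):
    """Standardized mean difference from the summary statistics a trial report should contain."""
    pooled_sd = math.sqrt(((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2)
                          / (n_tx + n_ctl - 2))
    return (mean_tx - mean_ctl) / pooled_sd

# Hypothetical numbers only; the Kok and Fredrickson paper reports none of these quantities.
print(round(cohens_d(3.1, 1.2, 26, 2.8, 1.1, 26), 2))
```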

Furthermore, if studies are labeled as RCTs, we immediately seek protocols published ahead of time that specify the basic elements of design and analyses and primary outcomes. At Psychological Science, studies with protocols are unusual enough to get the authors awarded a badge. In the clinical and health psychology literature, protocols are increasingly common, like flushing a toilet after using a public restroom. No one runs up and thanks you, “Thank you for flushing/publishing your protocol.”

If Fredrickson and her colleagues are going to be using the study to make claims about the health benefits of loving kindness meditation, they have a responsibility to adhere to CONSORT and to publish their protocol. This is particularly the case because this research was federally funded and results need to be transparently reported for use by a full range of stakeholders who paid for the research.

We identified a number of other problems and submitted a manuscript based on a reanalysis of the data. Our manuscript was promptly rejected by Psychological Science. The associate editor, Batja Mesquita, noted that two of my co-authors, Nick Brown and Harris Friedman, had co-authored a paper resulting in a partial retraction of Fredrickson’s positivity ratio paper.

Brown NJ, Sokal AD, Friedman HL. The Complex Dynamics of Wishful Thinking: The Critical Positivity Ratio. American Psychologist. 2013 Jul 15.

I won’t go into the details, except to say that Nick and Harris, along with Alan Sokal, unambiguously established that Fredrickson’s positivity ratio of 2.9013 positive to negative experiences was a fake fact. Fredrickson had been promoting the number as an “evidence-based guideline,” a ratio acting as a “tipping point beyond which the full impact of positive emotions becomes unleashed.” Once Brown and his co-authors overcame strong resistance to getting their critique published, their paper garnered a lot of attention in social and conventional media. There is a hilariously funny account available at Nick Brown Smelled Bull.

Batja Mesquita argued that the previously published critique discouraged her from accepting our manuscript. To do so, she would be participating in “a witch hunt,” and:

 The combatant tone of the letter of appeal does not re-assure me that a revised commentary would be useful.

Welcome to one-sided tone policing. We appealed her decision, but Editor Eric Eich indicated that there was no appeal process at Psychological Science, contrary to the requirements of the Committee on Publication Ethics (COPE).

Eich relented after I shared an email to my coauthors in which I threatened to take the whole issue into social media, where there would be no peer review in the traditional, outdated sense of the term. Numerous revisions of the manuscript were submitted, some of them in response to reviews by Fredrickson and Kok, who did not want a paper published. A year passed before our paper was accepted and appeared on the website of the journal. You can read our paper here. I think you can see that the fatal problems are obvious.

Heathers JA, Brown NJ, Coyne JC, Friedman HL. The elusory upward spiral: A reanalysis of Kok et al. (2013). Psychological Science. 2015 May 29:0956797615572908.

In addition to the original paper not adhering to CONSORT, we noted:

  1. There was no effect of whether participants were assigned to the loving-kindness meditation or the no-treatment control group on the key physiological variable, cardiac vagal tone. This is a thoroughly disguised null trial.
  2. Kok and Fredrickson claimed that there was an effect of meditation on cardiac vagal tone, but any appearance of an effect was due to reduced vagal tone in the control group, which cannot readily be explained.
  3. Kok and Fredrickson essentially interpreted changes in cardiac vagal tone as a surrogate outcome for more general changes in physical health. However, other researchers have noted that observed changes in cardiac vagal tone are not consistently related to changes in other health variables and are susceptible to variations in experimental conditions that have nothing to do with health.
  4. No attention was given to whether participants assigned to the loving-kindness meditation actually practiced it with any frequency or fidelity. The article nonetheless reported that such data had been collected.

Point 2 is worth elaborating. Participants in the control condition received no intervention. Their assessment of cardiac vagal tone/heart rate variability was essentially a test/retest reliability test of what should have been a stable physiological characteristic. Yet, participants assigned to this no-treatment condition showed as much change as the participants who were assigned to meditation, but in the opposite direction. Kok and Fredrickson ignored this and attributed all differences to meditation. Houston, we have a problem, a big one, with unreliability of measurement in this study.
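To see how easily a noisy measure of a supposedly stable trait generates apparent "change" in an untreated group, here is a small illustrative simulation (my own, not from our reanalysis; the group size and noise level are assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, noise_sd = 26, 10_000, 0.8        # assumed group size and measurement error

apparent_change = []
for _ in range(reps):
    trait = rng.normal(0, 1, n)            # stable physiological characteristic
    baseline = trait + rng.normal(0, noise_sd, n)
    followup = trait + rng.normal(0, noise_sd, n)
    change = followup - baseline
    apparent_change.append(abs(change.mean() / change.std(ddof=1)))

apparent_change = np.array(apparent_change)
print(f"median apparent |change| (d) with no intervention: {np.median(apparent_change):.2f}")
print(f"runs where apparent |d| exceeds 0.3:               {(apparent_change > 0.3).mean():.0%}")
```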

We could not squeeze all of our critique into the word limit, but James Heathers, who is an expert on cardiac vagal tone/heart rate variability, elaborated elsewhere.

  • The study was underpowered from the outset, and the sample size decreased from 65 to 52 because of missing data.
  • Cardiac vagal tone is unreliable except with careful control of the conditions in which measurements are obtained, multiple measurements on each participant, and a much larger sample size. None of these conditions was met.
  • There were numerous anomalies in the data, including some participants included without baseline data, improbable baseline or follow-up scores, and improbable changes. These alone would invalidate the results.
  • Despite not reporting basic statistics, the article was full of graphs, impressive to the uninformed, but useless to readers attempting to make sense of what was done and with what results.

We later learned that the same data had been used for another published paper. There was no cross-citation and the duplicate publication was difficult to detect.

Kok, B. E., & Fredrickson, B. L. (2010). Upward spirals of the heart: Autonomic flexibility, as indexed by vagal tone, reciprocally and prospectively predicts positive emotions and social connectedness. Biological Psychology, 85, 432–436. doi:10.1016/j.biopsycho.2010.09.005

Pity the poor systematic reviewer and meta-analyst trying to make sense of this RCT and integrate it with the rest of the literature concerning loving-kindness meditation.

This was not our only experience of obtaining data for a paper crucial to Fredrickson’s claims and then having difficulty publishing our findings. We obtained data for claims that she and her colleagues had solved the classical philosophical problem of whether we should pursue pleasure or meaning in our lives. Pursuing pleasure, they argued, will adversely affect genomic transcription.

We found we could redo the extremely complicated analyses and replicate the original findings, but there were errors in the original data entry that entirely shifted the results when corrected. Furthermore, we could replicate the original findings when we substituted data from a random number generator for the data collected from study participants. After struggles similar to those we experienced with Psychological Science, we succeeded in getting our critique published.

The original paper

Fredrickson BL, Grewen KM, Coffey KA, Algoe SB, Firestine AM, Arevalo JM, Ma J, Cole SW. A functional genomic perspective on human well-being. Proceedings of the National Academy of Sciences. 2013 Aug 13;110(33):13684-9.

Our critique

Brown NJ, MacDonald DA, Samanta MP, Friedman HL, Coyne JC. A critical reanalysis of the relationship between genomics and well-being. Proceedings of the National Academy of Sciences. 2014 Sep 2;111(35):12705-9.

See also:

Nickerson CA. No Evidence for Differential Relations of Hedonic Well-Being and Eudaimonic Well-Being to Gene Expression: A Comment on Statistical Problems in Fredrickson et al. (2013). Collabra: Psychology. 2017 Apr 11;3(1).

A partial account of the reanalysis is available in:

Reanalysis: No health benefits found for pursuing meaning in life versus pleasure. PLOS Blogs Mind the Brain

Wrapping it up

Strong claims about health effects require strong evidence.

  • Evidence produced in randomized trials needs to be reported according to established conventions like CONSORT, with clear labeling of duplicate publications.
  • When research is conducted with public funds, these responsibilities are increased.

I have often identified health claims in high profile media like The New York Times and The Guardian. My MO has been to trace the claims back to the original sources in peer reviewed publications, and evaluate both the media reports and the quality of the primary sources.

I hope that I am arming citizen scientists for engaging in these activities independent of me and even to arrive at contradictory appraisals to what I offer.

  • I don’t think I can expect to get many people to ask for data and perform independent analyses and certainly not to overcome the barriers my colleagues and I have met in trying to publish our results. I share my account of some of those frustrations as a warning.
  • I still think I can offer some take away messages to citizen scientists interested in getting better quality, evidence-based information on the internet.
  • Assume most of the claims readers encounter about psychological states and behavior being simply changed and profoundly influencing physical health are false or exaggerated. When in doubt, disregard the claims and certainly don’t retweet or “like” them.
  • Ignore journalists who do not provide adequate links for their claims.
  • Learn to identify generally reliable sources and take journalists off the list when they have made extravagant or undocumented claims.
  • Appreciate the financial gains to be made by scientists who feed journalists false or exaggerated claims.

Advice to citizen scientists who are cultivating more advanced skills:

Some key studies that Brody invokes in support of her claims being science-based are poorly conducted and reported clinical trials that are not labeled as such. This is quite common in positive psychology, but you need to cultivate skills to even detect that is what is going on. Even prestigious psychology journals are often lax in labeling studies as RCTs and in enforcing reporting standards. Authors’ conflicts of interest are ignored.

It is up to you to

  • Identify when the claims you are being fed should have been evaluated in a clinical trial.
  • Be skeptical when the original research is not clearly identified as a clinical trial but nonetheless compares participants who received the intervention with those who did not.
  • Be skeptical when CONSORT is not followed and there is no published protocol.
  • Be skeptical of papers published in journals that do not enforce these requirements.

Disclaimer

I think I have provided enough details for readers to decide for themselves whether I am unduly influenced by my experiences with Barbara Fredrickson and her data. She and her colleagues have differing accounts of her research and of the events I have described in this blog.

As a disclosure, I receive money for writing these blog posts, less than $200 per post. I am also marketing a series of e-books,  including Coyne of the Realm Takes a Skeptical Look at Mindfulness and Coyne of the Realm Takes a Skeptical Look at Positive Psychology.

Maybe I am just making a fuss to attract attention to these enterprises. Maybe I am just monetizing what I have been doing for years virtually for free. Regardless, be skeptical. But to get more information and get on a mailing list for my other blogging, go to coyneoftherealm.com and sign up.

Misleading systematic review of mindfulness studies used to promote Benson-Henry Institute for Mind-Body Medicine services

A seriously flawed overview (a “systematic review” of systematic reviews and meta-analyses) of the effects of mindfulness on health and well-being alerts readers to how skeptical they need to be of what they are told about the benefits of mindfulness.

Especially when the information comes from those benefiting enormously from promoting the practice.

The glowing evaluation of the benefits of mindfulness presented in a PLOS One review is contradicted by a more comprehensive and systematic review which was cited but summarily dismissed. As we will see, the PLOS One article sidesteps substantial confirmation bias and untrustworthiness in the mindfulness literature.

The review was prepared by authors associated with the Benson-Henry Institute for Mind-Body Medicine, which is tied to Massachusetts General Hospital and Harvard Medical School. The institute directly markets mindfulness treatment to patients and training to professionals and organizations. Its website provides links to research articles such as this one, which are used to market a wide range of programs.


Recently, PLOS One published corrections to five articles from this group concerning previous statements that the authors had no conflicts of interest to declare. The corrections acknowledged extensive conflicts of interest.

The Competing Interests statement is incorrect. The correct Competing Interests statement is: The following authors hold or have held positions at the Benson-Henry Institute for Mind-Body Medicine at Massachusetts General Hospital, which is paid by patients and their insurers for running the SMART-3RP and related relaxation/mindfulness clinical programs, markets related products such as books, DVDs, CDs and the like, and holds a patent pending (PCT/US2012/049539 filed August 3, 2012) entitled “Quantitative Genomics of the Relaxation Response.”

While the review we will be discussing was not corrected, it should have been.

The same conflicts of interest should have been disclosed to readers evaluating the trustworthiness of what is being presented to them.

Probing this review will demonstrate just how hard it is to uncover the bias and distortion that are routinely provided by promoters of mindfulness wanting to demonstrate an evidence base for what they offer.

The article is

Gotink, R.A., Chu, P., Busschbach, J.J., Benson, H., Fricchione, G.L. and Hunink, M.M., 2015. Standardised mindfulness-based interventions in healthcare: an overview of systematic reviews and meta-analyses of RCTs. PLOS One, 10(4), p.e0124344.

The abstract offers the conclusion:

The evidence supports the use of MBSR and MBCT to alleviate symptoms, both mental and physical, in the adjunct treatment of cancer, cardiovascular disease, chronic pain, depression, anxiety disorders and in prevention in healthy adults and children.

This evaluation is more emphatically stated near the end of the article:

This review provides an overview of more trials than ever before and the intervention effect has thus been evaluated across a broad spectrum of target conditions, most of which are common chronic conditions. Study settings in many countries across the globe contributed to the analysis, further serving to increase the generalizability of the evidence. Beneficial effects were mostly seen in mental health outcomes: depression, anxiety, stress and quality of life improved significantly after training in MBSR or MBCT. These effects were seen both in patients with medical conditions and those with psychological disorders, compared with many types of control interventions (WL, TAU or AT). Further evidence for effectiveness was provided by the observed dose-response relationship: an increase in total minutes of practice and class attendance led to a larger reduction of stress and mood complaints in four reviews [18,20,37,54].

Are you impressed? “More than ever before”? “Generalizability of the evidence”? Really?

And in wrap up summary comments:

Although there is continued scepticism in the medical world towards MBSR and MBCT, the evidence indicates that MBSR and MBCT are associated with improvements in depressive symptoms, anxiety, stress, quality of life, and selected physical outcomes in the adjunct treatment of cancer, cardiovascular disease, chronic pain, chronic somatic diseases, depression, anxiety disorders, other mental disorders and in prevention in healthy adults and children.

Compare and contrast these conclusions with a more balanced and comprehensive review.

The US Agency for Healthcare Research and Quality (AHRQ) commissioned a report from the Johns Hopkins University Evidence-based Practice Center.

The 439 page report is publicly available:

Goyal M, Singh S, Sibinga EMS, Gould NF, Rowland-Seymour A, Sharma R, Berger Z, Sleicher D, Maron DD, Shihab HM, Ranasinghe PD, Linn S, Saha S, Bass EB, Haythornthwaite JA. Meditation Programs for Psychological Stress and Well-Being. Comparative Effectiveness Review No. 124. (Prepared by Johns Hopkins University Evidence-based Practice Center under Contract No. 290-2007-10061–I.) AHRQ Publication No. 13(14)-EHC116-EF. Rockville, MD: Agency for Healthcare Research and Quality; January 2014.

A companion, less detailed article was also published in JAMA Internal Medicine:

Goyal, M., Singh, S., Sibinga, E.M., Gould, N.F., Rowland-Seymour, A., Sharma, R., Berger, Z., Sleicher, D., Maron, D.D., Shihab, H.M. and Ranasinghe, P.D., 2014. Meditation programs for psychological stress and well-being: a systematic review and meta-analysis. JAMA Internal Medicine, 174(3), pp.357-368.

Consider how the conclusions of this article were characterized in the Benson-Henry PLOS One article. The article is briefly mentioned without detailing its methods and conclusions.

Recently, Goyal et al. published a review of mindfulness interventions compared to active control and found significant improvements in depression and anxiety[7].

And

A recent review compared meditation to only active control groups, and although lower, also found a beneficial effect on depression, anxiety, stress and quality of life. This review was excluded in our study for its heterogeneity of interventions [7].

What the Goyal et al. JAMA Internal Medicine article actually said:

After reviewing 18 753 citations, we included 47 trials with 3515 participants. Mindfulness meditation programs had moderate evidence of improved anxiety (effect size, 0.38 [95% CI, 0.12-0.64] at 8 weeks and 0.22 [0.02-0.43] at 3-6 months), depression (0.30 [0.00-0.59] at 8 weeks and 0.23 [0.05-0.42] at 3-6 months), and pain (0.33 [0.03- 0.62]) and low evidence of improved stress/distress and mental health–related quality of life. We found low evidence of no effect or insufficient evidence of any effect of meditation programs on positive mood, attention, substance use, eating habits, sleep, and weight. We found no evidence that meditation programs were better than any active treatment (ie, drugs, exercise, and other behavioral therapies).

The review also notes that evidence of the effectiveness of mindfulness interventions is largely limited to trials in which it is compared to no treatment, wait list, or a usually ill-defined treatment as usual (TAU).

In our comparative effectiveness analyses (Figure 1B), we found low evidence of no effect or insufficient evidence that any of the meditation programs were more effective than exercise, progressive muscle relaxation, cognitive-behavioral group therapy, or other specific comparators in changing any outcomes of interest. Few trials reported on potential harms of meditation programs. Of the 9 trials reporting this information, none reported any harms of the intervention.

This solid JAMA Internal Medicine review explains why its conclusions may differ from past reviews:

Reviews to date report a small to moderate effect of mindfulness and mantra meditation techniques in reducing emotional symptoms (eg, anxiety, depression, and stress) and improving physical symptoms (eg, pain).7– 26 These reviews have largely included uncontrolled and controlled studies, and many of the controlled studies did not adequately control for placebo effects (eg, waiting list– or usual care–controlled studies). Observational studies have a high risk of bias owing to problems such as self-selection of interventions (people who believe in the benefits of meditation or who have prior experience with meditation are more likely to enroll in a meditation program and report that they benefited from one) and use of outcome measures that can be easily biased by participants’ beliefs in the benefits of meditation. Clinicians need to know whether meditation training has beneficial effects beyond self-selection biases and the nonspecific effects of time, attention, and expectations for improvement.27,28

Basically, this article insists that mindfulness be evaluated in a head-to-head comparison to an active treatment. Failure to provide such a comparison means not being able to rule out that apparent effects of mindfulness are nonspecific, i.e., not due to any active ingredient of the practice.

An accompanying editorial commentary raised troubling issues about the state of the mindfulness literature. It noted that limiting inclusion to RCTs with an active control condition and a patient population experiencing mental or physical health problems left only 3% (47/18,753) of the citations that had been retrieved. Furthermore:

The modest benefit found in the study by Goyal et al begs the question of why, in the absence of strong scientifically vetted evidence, meditation in particular and complementary measures in general have become so popular, especially among the influential and well educated…What role is being played by commercial interests? Are they taking advantage of the public’s anxieties to promote use of complementary measures that lack a base of scientific evidence? Do we need to require scientific evidence of efficacy and safety for these measures?

How did the Benson-Henry review arrive at a more favorable assessment?

The issue that dominated the solid Goyal et al. systematic review and meta-analysis is not prominent in the Benson-Henry review. The latter article hardly mentions the importance of whether mindfulness is compared to an active treatment. It does not mention whether any difference in effect size for mindfulness can be expected when the comparison is an active treatment.

The Benson-Henry review stated that it excluded systematic reviews and meta-analyses if they did not focus on MBCT or MBSR. One has to search the supplementary materials to find that Goyal et al. was excluded because it did not calculate separate effect sizes for mindfulness-based stress reduction (MBSR).

However, the Benson-Henry review included narrative systematic reviews that did not calculate effect sizes at all. Furthermore, the excluded Goyal et al. JAMA Internal Medicine article summarized MBSR separately from other forms of meditation, and the more comprehensive AHRQ report provided detailed forest plots of effect sizes for MBSR with specific outcomes and patient populations.

Hmm, keeping out evidence that does not fit with the sell-job story?

We need to keep in mind the poor manner in which MBSR was specified, particularly in the early studies that dominate the reviews covered by the Benson-Henry article. Many of the treatments were not standardized and certainly not manualized. They sometimes, but not always, incorporated psychoeducation, other cognitive behavioral techniques, and varying types of yoga.

The Benson-Henry authors claimed to have performed quality assessments of the included reviews using a checklist based on the validated PRISMA guidelines. However, PRISMA evaluates the quality of reporting in reviews, not the quality of how a review was done. The checklist used by the Benson-Henry authors was highly selective in terms of which PRISMA items it chose to include, left unvalidated, and simply eccentric. For instance, one item evaluated a review favorably if it interpreted studies “independent of funding source.”

A lack of independence of a study from its funding source is generally considered a high risk of bias.  There is ample documentation of  industry-funded studies and reviews exaggerating the efficacy of interventions supported by industry.

Our group received the Bill Silverman Prize from the Cochrane Collaboration for identifying funding source as an overlooked source of bias in many meta-analyses and, in particular, in Cochrane reviews. The Benson-Henry checklist scores a review’s ignoring of funding source as a virtue, not a vice! These authors are letting trials and reviews from promoters of mindfulness off the hook for potential conflict of interest, including their own studies and this review.

Examination of the final sample of reviews included in the Benson-Henry analysis reveals that some are narrative reviews and could not contribute effect sizes. Some are older reviews that depend on a less developed literature. While optimistic about the promise of mindfulness, the authors of these reviews frequently complained about the limits on the quantity and quality of available studies, calling for larger and better quality studies. When integrated and summarized by the Benson-Henry authors, these reviews were given a more positive glow than the original authors conveyed.

Despite claims of being an “overview of more trials than ever before”, the Benson-Henry analysis excluded all but 23 reviews. Some of those included do not appear to be recent or rigorous, particularly when contrasted with the quality and rigor of the excluded Goyal et al.:

Shennan C, Payne S, Fenlon D (2011) What is the evidence for the use of mindfulness-based interventions in cancer care? A review. Psycho-Oncology 20: 681–697.

Veehof MM, Oskam MJ, Schreurs KMG, Bohlmeijer ET (2011) Acceptance-based interventions for the treatment of chronic pain: A systematic review and meta-analysis. Pain 152: 533–542

Coelho HF, Canter PH, Ernst E (2007) Mindfulness-Based Cognitive Therapy: Evaluating Current Evidence and Informing Future Research. J Consult Clin Psychol 75: 1000–1005.

Ledesma D, Kumano H (2009) Mindfulness-based stress reduction and cancer: A meta-analysis. Psycho-Oncology 18: 571–579.

Ott MJ, Norris RL, Bauer-Wu SM (2006) Mindfulness meditation for oncology patients: A discussion and critical review. Integr Cancer Ther 5: 98–108.

Burke CA (2009) Mindfulness-Based Approaches with Children and Adolescents: A Preliminary Review of Current Research in an Emergent Field. J Child Fam Stud.

Do we get the most authoritative reviews of mindfulness from  Holist Nurs Pract, Integr Cancer Ther, and Psycho-Oncology?

To cite just one example of the weakness of evidence being presented as strong, take the bold Benson-Henry conclusion:

Further evidence for effectiveness was provided by the observed dose-response relationship: an increase in total minutes of practice and class attendance led to a larger reduction of stress and mood complaints in four reviews [18,20,37,54].

“Observed dose-response relationship”? This claim is based on Ott et al. [18], Smith et al. [20], Burke [37], and Proulx [54], which makes the evidence neither recent nor systematic. I am confident that other examples will not hold up if scrutinized.

Further contradiction of the too-perfect picture of mindfulness therapy conveyed by the Benson-Henry review.

A more recent PLOS One review of mindfulness studies exposed the confirmation bias in the published mindfulness literature. It suggested that a too-perfect picture of uniformly positive studies has been created.

Coronado-Montoya, S., Levis, A.W., Kwakkenbos, L., Steele, R.J., Turner, E.H. and Thombs, B.D., 2016. Reporting of positive results in randomized controlled trials of mindfulness-based mental health interventions. PLOS One, 11(4), p.e0153220.

A systematic search yielded 124 RCTs of mindfulness-based treatments:

108 (87%) of 124 published trials reported >1 positive outcome in the abstract, and 109 (88%) concluded that mindfulness-based therapy was effective, 1.6 times greater than the expected number of positive trials based on effect size d = 0.55 (expected number of positive trials = 65.7). Of 21 trial registrations, 13 (62%) remained unpublished 30 months post-trial completion.

Furthermore:

None of the 21 registrations, however, adequately specified a single primary outcome (or multiple primary outcomes with an appropriate plan for statistical adjustment) and specified the outcome measure, the time of assessment, and the metric (e.g., continuous, dichotomous). When we removed the metric requirement, only 2 (10%) registrations were classified as adequate.

And finally:

There were only 3 trials that were presented unequivocally as negative trials without alternative interpretations or caveats to mitigate the negative results and suggest that the treatment might still be an effective treatment.

What we have is a picture of trials of mindfulness-based treatment having an excess of positive studies, given the study sample sizes. Selective reporting of positive outcomes likely contributed to this excess of positive findings in the published literature. Most of the trials were not preregistered, so it is unclear whether the positive outcomes that were reported had been hypothesized as the primary outcomes of interest. Most of the trials that were preregistered remained unpublished 30 months after the trials were completed.
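To make the logic behind that comparison concrete, here is a minimal sketch in Python, assuming purely hypothetical per-group sample sizes rather than the actual 124 trials: the expected number of positive trials is simply the sum of each trial’s statistical power to detect the assumed true effect (d = 0.55, the value used by Coronado-Montoya and colleagues). When far more trials report positive results than this expectation, selective reporting or other biases are the likely explanation.

# Sketch of the excess-significance logic; the per-group sample sizes below are
# hypothetical and chosen only for illustration.
from statsmodels.stats.power import TTestIndPower

assumed_d = 0.55                             # assumed true effect size
hypothetical_ns = [20, 25, 30, 40, 60, 80]   # hypothetical per-group sample sizes

power_calc = TTestIndPower()
expected_positives = sum(
    power_calc.power(effect_size=assumed_d, nobs1=n, alpha=0.05, ratio=1.0)
    for n in hypothetical_ns
)

print(f"Expected number of positive trials: {expected_positives:.1f}")
# If, say, all six hypothetical trials reported positive results, the observed
# count would far exceed this expectation.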

The Goyal et al. study originally planned to conduct quantitative analyses of publication biases, but abandoned the effort when the authors could not find sufficient numbers of the 47 studies reporting most of the outcomes they evaluated.

Conclusion

The Benson-Henry review produces a glowing picture of the quality of RCTs evaluating MBSR and the consistency of positive findings across diverse outcomes and populations. This is consistent with the message the authors want to promote in marketing their products to patients, clinicians, and institutions. In this blog post I have uncovered substantial problems internal to the Benson-Henry review in terms of the studies that were included and the manner in which they were evaluated. But now we also have external evidence: two reviews without obvious conflicts of interest came to markedly different appraisals of a literature that lacks appropriate control groups and seems to be reporting findings with a distinct confirmation bias.

I could have gone further, but what I found about the Benson-Henry review seems sufficient for a serious challenge to the validity of its conclusions. Investigation of the claims made about dose-response relationships between amount of mindfulness practice and outcomes should encourage probing of other specific claims.

The larger issue is that we should not rely on promoters of MBSR products to provide unbiased estimates of their efficacy. This issue recalls very similar problems in the evaluation of Triple P Parenting programs. Evaluations in which promoters were involved produce markedly more positive results than independent evaluations. Exposure by my colleagues and me led to over 50 corrections and corrigenda to articles that previously had declared no conflicts of interest. But the process did not occur without fierce resistance from those whose livelihood was being challenged.

A correction to the Benson-Henry PLOS One review is in order to clarify the obvious conflicts of interest of the authors. But the problem is not limited to reviews or original studies from the Benson-Henry Institute for Mind-Body Medicine. It is time that authors be required to answer more explicit questions about conflict of interest. Ruling out a conflict of interest should be based on authors explicitly declaring that they have none, rather than on their simply not disclosing a conflict and then being able to claim the omission was an oversight.

Postscript: Who was watching at PLOS One to keep out infomercials from promoters associated with Massachusetts General Hospital and Harvard Medical School? To avoid the appearance of a conflict of interest, should the Academic Editor have recused himself from serving as editor?

This is another flawed paper for which I’d love to see the reviews.

I will soon be offering e-books providing skeptical looks at mindfulness and positive psychology, as well as scientific writing courses on the web as I have been doing face-to-face for almost a decade.

Sign up at my new website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. Lots to see at CoyneoftheRealm.com.

 

Hazards of pointing out bad meta-analyses of psychological interventions

 

A cautionary tale

Psychology has a meta-analysis problem. And that’s contributing to its reproducibility problem. Meta-analyses are wallpapering over many research weaknesses, instead of being used to systematically pinpoint them. – Hilda Bastian

  • Meta-analyses of psychological interventions are often unreliable because they depend on a small number of poor quality, underpowered studies.
  • It is surprisingly easy to screen the studies being assembled for a meta-analysis and quickly determine that the literature is not suitable because it does not have enough quality studies. Apparently, the authors of many published meta-analyses did not undertake such a brief assessment or were undeterred by it from proceeding anyway.
  • We can’t tell how many efforts at meta-analyses were abandoned because of the insufficiencies of the available literature. But we can readily see that many published meta-analyses offer summary effect sizes for interventions that can’t be expected to be valid or generalizable.
  • We are left with a glut of meta-analyses of psychological interventions that convey inflated estimates of the efficacy of interventions and on this basis, make unwarranted recommendations that broad classes of interventions are ready for dissemination.
  • Professional organizations and promoters of particular treatments have strong vested interests in portraying their psychological interventions as effective. They will use their resources to resist efforts to publish critiques of their published meta-analyses and even fight the teaching of basic critical skills for appraising meta-analysis.
  • Publication of thorough critiques has little or no impact on the subsequent citation or influence of meta-analyses; the critiques themselves are largely ignored.
  • Debunking bad meta-analyses of psychological interventions can be frustrating at best, and, at worst, hazardous to careers.
  • You should engage in such activities if you feel it is right to do so. It will be a valuable learning experience. And you can only hope that someone at some point will take notice.

3 Simple screening questions to decide whether a meta-analysis is worth delving into.

I’m sick and tired of spending time trying to make sense of meta-analyses of psychological interventions that should have been dismissed out of hand. The likelihood of any contribution to the literature was ruled out by repeated, gross misapplication of meta-analysis by some authors or, more often, by the pathetic quality and quantity of the literature available for meta-analysis.

Just recently, Retraction Watch reported the careful scrutiny of a pair of meta-analyses by two psychology graduate students, Paul-Christian Bürkner and Donald Williams. Coverage in Retraction Watch focused on their inability to get credit for the retraction of one of the papers, a retraction that had occurred because of their critique.

But I was more saddened by their having spent so much time on the second meta-analysis, “A meta-analysis and theoretical critique of oxytocin and psychosis: Prospects for attachment and compassion in promoting recovery.” The authors of this meta-analysis had themselves acknowledged that the literature was quite deficient, but they proceeded anyway and published a paper that has already been cited 13 times.

The graduate students, as well as the original authors, could simply have taken a quick look at the study’s Table 1: the seven included studies had from 9 to 35 patients exposed to oxytocin. The study with 35 patients was an outlier. This study also provided only a within-subject effect size, which should not have been entered into the meta-analysis with the results of the other studies.
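To see why mixing the two metrics matters, here is a minimal numeric sketch in Python. The values are made up for illustration and are not data from any of the oxytocin studies; the assumed pre-post correlation in particular is hypothetical.

import math

sd_outcome = 10.0     # hypothetical standard deviation of the symptom measure
mean_change = 5.0     # hypothetical mean pre-post improvement under treatment
r_prepost = 0.7       # hypothetical correlation between pre and post scores

# Between-group style d: the difference standardized on the outcome SD
d_between_metric = mean_change / sd_outcome

# Within-subject d: the same difference standardized on the SD of change
# scores, which shrinks as the pre-post correlation rises
sd_change = sd_outcome * math.sqrt(2 * (1 - r_prepost))
d_within_metric = mean_change / sd_change

print(f"d on the between-group metric:  {d_between_metric:.2f}")   # 0.50
print(f"d on the within-subject metric: {d_within_metric:.2f}")    # about 0.65

The same raw improvement thus looks larger on the within-subject metric, which is one reason the two should not be pooled as if they were interchangeable.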

The six remaining studies had an average sample size of 14 in the intervention group. I doubt that anyone would have undertaken a study of psychotic patients inhaling oxytocin to generate a robust estimate of effect size with only 9, 10, or 11 patients. It’s unclear why the original investigators stopped accruing patients when they did.

Without having specified their sample size ahead of time (there is no evidence that the investigators did), the original investigators could simply have stopped when a peek at the data revealed statistically significant findings, or they could have kept accruing patients when a peek revealed only nonsignificant findings. Or they could have dropped some patients. Regardless, the reported samples are so small that adding only one or two more patients could substantially change the results.
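A minimal simulation, purely illustrative and not a claim about what these particular investigators did, shows how much this kind of repeated peeking inflates the chance of a “significant” finding when there is no true effect at all.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def peeking_trial(max_n=35, first_look=9, step=2, alpha=0.05):
    """Accrue patients in two arms with no true effect, test after each batch,
    and stop as soon as p < alpha."""
    treatment = rng.normal(0.0, 1.0, max_n)
    control = rng.normal(0.0, 1.0, max_n)
    for n in range(first_look, max_n + 1, step):
        _, p = stats.ttest_ind(treatment[:n], control[:n])
        if p < alpha:
            return True
    return False

n_simulations = 5000
false_positive_rate = sum(peeking_trial() for _ in range(n_simulations)) / n_simulations
print(f"'Significant' results under the null with optional stopping: {false_positive_rate:.1%}")
# Well above the nominal 5% expected from a single pre-specified analysis.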

Furthermore, if the investigators were struggling to get enough patients, the study was probably under-resourced and compromised in other ways. Small sample sizes compound the problems posed by poor methodology and reporting. The authors conducting this particular meta-analysis could confirm for only one of the studies that data from all patients who were randomized were analyzed, i.e., that there was an intention-to-treat analysis. Reporting was that bad, and worse. Again, think of the effects of losing data from one or a few patients in the analysis: it could be decisive for the results, particularly when the loss was not random.

Overall, the authors of the original meta-analysis conceded that the seven studies they were entering into the meta-analysis had a high risk of bias.

It should be apparent that authors cannot take a set of similarly flawed studies, integrate their effect sizes with a meta-analysis, and expect to get around the limitations. Bottom line: readers should simply dismiss the meta-analysis and get on to other things…

These well-meaning graduate students were wasting their time and talent carefully scrutinizing a pair of meta-analyses that were unworthy of their sustained attention. Think of what they could be doing more usefully. There is so much other bad science out there to uncover.

Everybody: I recommend not putting a lot of effort into analyzing obviously flawed meta-analyses, other than perhaps posting a warning notice on PubMed Commons or ranting in a blog post, or both.

Detecting Bad Meta-Analyses

Over a decade ago, I developed some quick assessment tools by which I can reliably determine that some meta-analyses are not worth our attention. You can see more about the quickly answered questions here.

To start such an assessment, go directly to the table describing the studies that were included in a published meta-analysis.

  1. Ask: “To what extent are the studies dominated by cell sample sizes less than 35?” Studies of this size have only a power of .50 to detect a moderate-sized effect. So, even if an effect were present, it would be detected only 50% of the time, if all studies were reported. (A quick calculation illustrating this figure appears below the list.)
  2. Next, check whether whoever did the meta-analysis rated the included studies for risk of bias and how, if at all, risk of bias was taken into account in the meta-analysis.
  3. Finally, does the meta-analysis adequately deal with the clinical heterogeneity of the included studies? Is there a basis for giving a meaningful interpretation to a single summary effect size?

Combining studies may be inappropriate for a variety of the following reasons: differences in patient eligibility criteria in the included trials, different interventions and outcomes, and other methodological differences or missing information. – Moher et al., 1998
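For readers who want to check the power figure in the first screening question, here is a quick calculation using the statsmodels library; a two-sided test at alpha = .05 and a medium effect of d = 0.5 are assumed.

from statsmodels.stats.power import TTestIndPower

# Power of a two-group comparison with 35 patients per arm to detect d = 0.5
power = TTestIndPower().power(effect_size=0.5, nobs1=35, alpha=0.05, ratio=1.0)
print(f"Power with n = 35 per group and d = 0.5: {power:.2f}")  # roughly 0.5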

I have found that this quick exercise often reveals that meta-analyses of psychological interventions are dominated by underpowered studies of low methodological quality that produce positive effects for interventions at a greater rate than would be expected. In such cases there is little reason to proceed to calculate a summary effect size.

The potholed road from a presentation to a publication.

My colleagues and I applied these criteria in a 2008 presentation to a packed audience at the European Health Psychology Conference in Bath. My focus was a similar exercise applied to four meta-analyses of behavioral interventions for adults (Dixon, Keefe, Scipio, Perri, & Abernethy, 2007; Hoffman, Papas, Chatkoff, & Kerns, 2007; Irwin, Cole, & Nicassio, 2006; and Jacobsen, Donovan, Vadaparampil, & Small, 2007) that appeared in a new section of Health Psychology, Evidence-Based Treatment Reviews.

A sampling of what we found:

Irwin et al. The Irwin et al. meta-analysis had the stated objective of

comparing responses in studies that exclusively enrolled persons who were 55 years of age or older versus outcomes in randomized controlled trials that enrolled adults who were, on average, younger than 55 years of age(p. 4).

A quick assessment revealed that exclusion of small trials (n < 35) would have eliminated all studies of older adults; five studies included 15 or fewer participants per condition. For the studies including younger adults, only one of the 15 studies would have remained.

Hoffman et al. We found that 17 of the 22 included studies fell below n = 35 per group. In response to our request, the authors graciously shared a table of the methodological quality of the included studies.

In 60% of the studies, intervention and control groups were not comparable on key variables at baseline.

Less than half provided adequate information concerning number of patients enrolled, treatment drop-out and reasons for drop-outs.

Only 15% of trials provided intent-to-treat analyses.

In a number of studies, the psychological intervention was part of the multicomponent package so that its unique contribution could not be determined. Often the psychological intervention was minimal. For instance, one study noted: “a lecture to give the patient an understanding that ordinary physical activity would not harm the disk and a recommendation to use the back and bend it.”

The only studies comparing a psychological intervention to an active control condition were three underpowered studies in which the effects of the psychological component could not be separated from the rest of the package in which it was embedded. In one of the studies, massage was the psychological intervention, but in another, it was the control group.

Nonetheless, Hoffman et al. concluded, “The robust nature of these findings should encourage confidence among clinicians and researchers alike.”

As I readily demolished the meta-analyses to the delight of the audience, I remarked something to the effect that I was glad the editor of Health Psychology was not there to hear what I was saying about articles published in the journal he edits.

But Robert Kaplan was there. He invited me for a beer as I left the symposium. He said that such critical probing was sorely lacking in the journal. He invited my colleagues and me to submit an invited article. Eventually it would be published as:

Coyne JC, Thombs BD, Hagedoorn M. Ain’t necessarily so: Review and critique of recent meta-analyses of behavioral medicine interventions in health psychology. Health Psychology. 2010 Mar;29(2):107.

However, Kaplan first had an Associate Editor send out the manuscript for review. The manuscript was rejected based on a pair of reviews that were not particularly informative. One reviewer stated:

The authors level very serious accusations against fellow scientists and claim to have identified significant shortcomings in their published work. When this is done in public, the authors must have done their homework, dotted all the i’s, and crossed all the t’s. Instead, they reveal “we do not redo these meta-analyses or offer a comprehensive critique, but provide a preliminary evaluation of the adequacy of the conduct, reporting and clinical recommendations of these meta-analyses”. To be frank, this is just not enough when one accuses colleagues of mistakes, poor judgment, false inferences, incompetence, and perhaps worse.

In what he would later describe as the only time he did this in his term as editor of Health Psychology, Bob Kaplan overruled the unanimous recommendations of his associate editor and the two reviewers. He accepted a revision of our manuscript in which we tried to be clearer about the bases of our judgments.

According to Google Scholar, our “Ain’t necessarily so…” has been cited 53 times. Apparently it had little effect on the reception of the four meta-analyses. Hoffman et al. has been cited 599 times.

From a well-received workshop to a workshop canceled in order to celebrate a bad meta-analysis.

Mariet Hagedoorn and I gave a well-received workshop at the annual meeting of the Society of Behavioral Medicine the next year. A member of SBM’s Evidence-based Behavioral Medicine Committee invited us to the committee meeting held immediately after the workshop. We were invited to give the workshop again in two years. I also became a member of the committee. I offered to be involved in future meta-analyses, learning that a number were planned.

I actually thought that I was involved in a meta-analysis of interventions for depressive symptoms among cancer patients. I immediately identified a study of problem-solving therapy for cancer patients that had such improbably large effect sizes that it should be excluded from any meta-analysis as an extreme outlier. The suggestion was appreciated.

But I heard nothing further about the meta-analysis until I was contacted by one of the authors, who said that my permission was needed for me to be acknowledged in the accepted manuscript. I refused. When I finally saw the published version of the manuscript in the prestigious Journal of the National Cancer Institute, I published a scathing critique, which you can read here. My critique has so far been cited once, the meta-analysis eighty times.

Only a couple of months before our workshop was scheduled to occur, I was told that it had been canceled in order to clear the schedule for full press coverage of a new meta-analysis. I learned of this only when I emailed the committee concerning the specific timing of the workshop. The reply came from the first author of the new meta-analysis.

I have subsequently made the case, in two blog posts, that that meta-analysis was horribly done and horribly misleading to consumers:

Faux Evidence-Based Behavioral Medicine at Its Worst (Part I)

Faux Evidence-Based Behavioral Medicine Part 2

Some highlights:

The authors boasted of “robust findings” of “substantial rigor” in a meta-analysis that provided “strong evidence for psychosocial pain management approaches.” They claimed their findings supported the “systematic implementation” of these techniques.

The meta-analysis depended heavily on small trials. Of the 38 trials, 19 studies had less than 35 patients in the intervention or control group and so would be excluded with application of this criterion.

Some of the smaller trials were quite small. One had 7 patients receiving an education intervention;  another had 10 patients getting hypnosis; another, 15 patients getting education; another, 15 patients getting self hypnosis; and still another, 8 patients getting relaxation and eight patients getting CBT plus relaxation.

Two of what were by far the largest trials should have been excluded because they involved a complex intervention. Patients received telephone-based collaborative care, which had a number of components, including support for adherence to medication.

It appears that listening to music, being hypnotized during a medical procedure, and being taught self hypnosis over 52 sessions, are all under the rubric of skills training. Similarly, interactive educational sessions are considered equivalent to passing out informational materials and simply pamphleteering.

But here’s what most annoyed me about clinical and policy decisions being made on the basis of this meta-analysis:

Perhaps most importantly from a cancer pain control perspective, there was no distinguishing of whether the cancer pain was procedural, acute, or chronic. These types of pain take very different management strategies. In preparation for surgery or radiation treatment, it might be appropriate to relax or hypnotize the patient or provide soothing music. The efficacy could be examined in a randomized trial. But the management of acute pain is quite different and best achieved with medication. Here is where the key gap exists between the known efficacy of medication and the poor control in the community, due to professional and particularly patient attitudes. Control of chronic pain, months after any painful procedures, is a whole different matter, and based on studies of noncancer pain, I would guess that here is another place for psychosocial intervention, but that should be established in randomized trials.

Getting shushed about the sad state of research on couples interventions for cancer patients

One of the psychologists present at the SBM meeting published a meta-analysis of couples interventions in which I was thanked for my input in an acknowledgment. I did not give permission, and this acknowledgment was subsequently retracted.

Ioana Cristea, Nilufer Kafescioglu, and I subsequently submitted a critique to Psycho-Oncology. We were initially told it would be accepted as a letter to the editor, but then it was subjected to an extraordinary six uninformative reviews and rejected. The article that we critiqued was given special status as a featured article and distributed free by the otherwise paywalled journal.

A version of our critique was relegated to a blog post.

The complicated politics of meta-analyses supported by professional organizations.

Starting with our “Ain’t necessarily so…” effort, we were taking aim at meta-analyses making broad, enthusiastic claims about the efficacy and readiness for dissemination of psychological interventions. The Society of Behavioral Medicine was enjoying a substantial increase in membership, but as in other associations dominated by psychologists, the new members were clinicians, not primarily academic researchers. SBM wanted to offer a branding of “evidence-based” to the psychological interventions for which the clinicians were seeking reimbursement. At the time, insurance companies were challenging whether licensed psychologists should be reimbursed for psychological interventions that were not administered to patients with psychiatric diagnoses.

People involved with the governance of SBM at the time cannot help but be aware of an ugly side to the politics back then. A small amount of money had been given by NCI to support meta-analyses and it was quite a struggle to control its distribution. That the SBM-sponsored meta-analyses were oddly published in the APA journal, Health Psychology, rather than SBM’s Annals of Behavioral Medicine reflected the bid for presidency of APA’s Division of Health Psychology by someone who had been told that she could not run for president of SBM. But worse, there was a lot of money and undeclared conflicts of interest in play.

Someone originally involved in the meta-analysis of interventions for depressive symptoms among cancer patients had received a $10 million grant from Pfizer to develop a means of monitoring cancer surgeons’ inquiring about psychological distress and their offering of interventions. The idea (which was actually later mandated) was that cancer surgeons could not close their electronic records until they had indicated that they had asked the patient about psychological distress. If a patient reported distress, the surgeons had to indicate what intervention was offered to the patient. Only then could they close the medical record. Of course, these requirements could be met simply by asking whether a breast cancer patient was distressed and offering her an antidepressant without any formal diagnosis or follow-up. These procedures were mandated as part of accreditation of facilities providing cancer care.

Psycho-Oncology, the journal with which we skirmished about the meta-analysis of couples interventions, was the official publication of the International Psycho-Oncology Society, another organization dominated by clinicians seeking reimbursement for services to cancer patients.

You can’t always get what you want.

I nonetheless encourage others, particularly early-career investigators, to take up the tools that I offer. Please scrutinize meta-analyses that would otherwise have clinical and public policy recommendations attached to their findings. You may have trouble getting published, and you will be sorely disappointed if you expect to influence the reception of an already published meta-analysis. You can always post your critiques at PubMed Commons.

You will learn important skills, and you will learn about the politics of trying to publish critiques of papers that are protected as having been “peer reviewed.” If enough of you do this and visibly complain about how ineffectual your efforts have been, we may finally overcome the incumbent advantage and protection from further criticism that goes with getting published.

And bloggers like myself and Hilda Bastian will recognize you and express appreciation.

 

 

BMC Medicine gets caught up in Triple P Parenting promoters’ war on critics and null findings

Undeclared conflicts of interest constitute scientific misconduct.

Why we should be as concerned about conflicts of interest in evaluations of nonpharmacological treatments, like psychotherapy, as we are about those in drug trials.

Whack! Triple P promoters (3P) Cassandra L Tellegen and Kate Sofronoff struck again against critics and null findings, this time in BMC Medicine. As usual, there was an undisclosed financial conflict of interest.

Until recently, promoters of the multimillion-dollar enterprise controlled perception of their brand of treatment. They authored most reports of implementations and also systematic reviews and meta-analyses. They did not report financial conflicts of interest and denied any conflict when explicitly queried.

The promoters were able to insist on the official website:

No other parenting program in the world has an evidence base as extensive as that of Triple P. It is number one on the United Nations’ ranking of parenting programs, based on the extent of its evidence base.

At least two of the developers of 3P and others making money from it published a systematic review and meta-analysis they billed as comprehensive:

Sanders, M. R., Kirby, J. N., Tellegen, C. L., & Day, J. J. (2014). The Triple P-Positive Parenting Program: A systematic review and meta-analysis of a multi-level system of parenting support. Clinical Psychology Review, 34(4), 337-357.

Promoters of 3P are still making extravagant claims, but there has been a noticeable change in the view from elsewhere. An independently conducted meta-analysis in BMC Medicine demonstrated that previous evaluations depended heavily on flawed, mostly small studies that very often had undeclared conflicts of interest. I echoed and amplified the critique of the 3P Parenting literature, first in blog posts [1, 2] and then in an invited commentary in BMC Medicine.

The sordid history of the promoters’ “comprehensive” meta-analysis was revealed  and its overwhelming flaws were scrutinized.

Over 30 errata, addenda, and corrigenda have been attached to previously published 3P articles, and more keep accumulating. Just try Google Scholar with “triple P parenting” and “erratum” or “addendum” or “corrigendum.” We will be seeing more errata as more editors are contacted.

There were reports in social media of how studies with null findings had previously been sandbagged in anonymous peer review and of how authors were pressured by peer reviewers to spin results. Evidence surfaced of 3P founder Matt Sanders attempting to influence the reporting of a supposedly independently conducted evaluation. It is unclear how frequently this occurs, but it represents a weakening of the important distinction between independent evaluations and those with conflicts of interest.

The Belgian government announced defunding of 3P programs. Doubts about whether 3P was the treatment of choice were raised in 3P’s home country. 3P is a big-ticket item in Australia, with New South Wales alone spending $6.6 million on it.

A detailed critique called into question the positive results claimed for one of the largest and most influential population-based 3P interventions, and the undisclosed conflicts of interest of the authors and the editorial board of the journal in which it appeared, Prevention Science, were exposed.

Are we witnessing the decline effect in the evaluation of 3P? Applied to intervention studies, the term refers to the recurring pattern in which promoters of an intervention initially produce glowing reports of efficacy and effectiveness, and weaker results then accumulate from larger, more sophisticated studies not conducted by the promoters.

But the 3P promoters viciously and unethically fought back. Paid spokespersons took to the media to denounce independently conducted negative evaluations. Critics were threatened in their workplaces, and letters of complaint were written to their universities. Programs were threatened with withdrawal of 3P resources if the critics were not silenced. Publications with undisclosed conflicts of interest authored by paid promoters of 3P continue to appear, despite the errata and addenda apologizing for what had occurred in the past.

In this issue of Mind the Brain, I review the commentary in BMC Medicine. I raise the larger issue of whether the 3P promoters’ recurring undeclared conflicts of interest represent actionable scientific misconduct. And I deliver a call to action.

My goal is to get BMC Medicine to change its policies concerning disclosure of conflict of interest and its sanctions for nondisclosure. I am not accusing the editorial board of BMC Medicine of wrongdoing.

The journal was the first to publish serious doubts about the effectiveness of 3P. Scottish GP Phil Wilson and colleagues went there after his meta-analysis was trashed in anonymous peer review at Elsevier’s Clinical Psychology Review (CPR). He faced retaliation in his workplace after he was contacted directly by the founder of 3P immediately after his submission to CPR. Matt Sanders sent him papers published after the end date Wilson had set for the papers included in his meta-analysis. Bravo to BMC Medicine for nevertheless getting Wilson’s review into print. But the BMC Medicine editors have been repeatedly duped by 3P promoters, and they now have the opportunity to serve as a model for academic publishing in mounting an effective response.

Stepping Stones Triple P: the importance of putting the findings into context

The BMC Medicine commentary by Tellegen and Sofronoff  is available here. The commentary first appeared without a response from the authors who were being criticized, but that has now been rectified.

Tellegen and Sofronoff chastised the authors of a recent randomized trial, also published in BMC Medicine, that evaluated the intervention with parents of children with borderline to mild intellectual disability (BMD).

Firstly, the authors present a rationale for conducting the study that does not accurately represent the current state of evidence for SSTP. Secondly, the authors present an impoverished interpretation of the findings within the paper.

The “current state of evidence for SSTP” about which Tellegen and Sofronoff complain refers to a systematic review and meta-analysis authored by Tellegen and Matt Sanders. I previously described how

  • An earlier version of this review was circulated on the Internet labeled as under review at Monographs of the Society for Research in Child Development. It is inappropriate to distribute manuscripts indicating that they are “under review” at particular journals. APA guidelines explicitly forbid it. This may have led to the manuscript’s rejection.
  • The article nonetheless soon appeared in Clinical Psychology Review in a version that differed little from the manuscript previously available on the Internet, suggesting weak peer-review.
  • The article displays numerous instances of meta analysis malpractice. It is so bad and violates so many standards, that I recommend its use in seminars as an example of bad practices.
  • This article had no declared conflicts of interests.

Tellegen and Sofronoff’s charge of “impoverished interpretation of the findings within the paper” refers to the investigators’ failing to cite four quite low-quality studies that were not randomized trials but were treated as equivalent to RCTs in Tellegen and Sanders’ own meta-analysis.

In their response to the commentary from 3P, three of the authors of the original trial (Sijmen A Reijneveld, Marijke Kleefman, and Daniëlle EMC Jansen) calmly and effectively dismissed these criticisms. They responded a lot more politely than I would have.

The declarations of conflict of interest of 3P promoters in BMC Medicine: Is you is or ain’t you is making money?

An earlier commentary in BMC Medicine, whose authors included 3P developer Matt Sanders and Kate Sofronoff (an author of the commentary under discussion), stated in the text:

Triple P is not owned by its authors, but by The University of Queensland. Royalty payments from dissemination activities, principally the sale of books, are paid by the publisher (Triple P International) to the University of Queensland’s technology transfer company (UniQuest), and distributed to the university’s Faculty of Social and Behavioural Sciences, School of Psychology, Parenting and Family Support Centre and contributory authors in accordance with the university’s intellectual property policy. None of the program authors own shares in Triple P International, the company licensed by the University of Queensland to disseminate the program worldwide.

What is one to make of this? It seems to answer “no” to the usual question of whether authors own stock or shares in a company. It does not say directly what happens to the royalties from the sale of books. Keep in mind that the multimillion-dollar enterprise of 3P involves selling lots of books, training materials, workshops, and government contracts. But a reader would have to go to the University of Queensland’s intellectual property policy to make sense of this disclaimer.

The formal COI statement in the article does not clarify much, but should arouse curiosity and skepticism –

…Royalties stemming from this dissemination work are paid to UniQuest, which distributes payments to the University of Queensland Faculty of Social and Behavioural Sciences, School of Psychology, Parenting and Family Support Centre, and contributory authors in accordance with the University’s intellectual property policy.

No author has any share or ownership in Triple P International. MS is the founder and lead author of the Triple P-Positive Parenting Program, and is a consultant to Triple P International. JP has no competing interests. JK is a co-author of Grandparent Triple P. KT is a co-author of many of the Triple P interventions and resources for families of children up to 12 years of age. AM is a co-author of several Triple P interventions for young children including Fuss-Free Mealtime Triple P. TM is a co-author of Stepping Stones Triple P for families of children with disabilities. AR is a co-author of Teen Triple P for parents of adolescents, and is Head of Training at Triple P International. KS has no competing interests.

The authors seem to be acknowledging receiving money as “contributory authors,” but there is still a lot of beating around the bush. Again, one needs to know more about the university’s intellectual property policy. Okay, take the trouble to go to the website for the University of Queensland to determine just how lucrative the arrangements are. You will surely say “Wow!” if you keep in mind the multimillion-dollar nature of the 3P enterprise.

The present commentary in BMC Medicine seems to improve transparency –

The Triple P – Positive Parenting Program is owned by The University of Queensland (UQ). The University through its main technology transfer company, UniQuest Pty Ltd, has licensed Triple P International Pty Ltd to publish and disseminate the program worldwide. Royalties stemming from published Triple P resources are distributed to the Faculty of Health and Behavioural Sciences at UQ, Parenting and Family Support Centre, School of Psychology at UQ, and contributory authors. No author has any share or ownership in Triple P International Pty Ltd. Cassandra Tellegen and Kate Sofronoff are employees of the UQ and members of the Triple P Research Network

But the disclosure remains evasive and misleading. One has to look elsewhere to find out that there is only a single share of Triple P International Pty Ltd, owned by Mr Des McWilliam. He was awarded an honorary doctorate by the University of Queensland in 2009. The citation … acknowledged that

Mr McWilliam’s relationship with Triple P had provided grant leveraging, both nationally and internationally, for ongoing research by the PFSC and had supported ongoing international trials of the program.

Interesting, but there is still an undeclared COI whose disclosure is required for adherence to the International Committee of Medical Journal Editors (ICMJE) guidelines, to which BMC Medicine subscribes. Just as Matt Sanders is married to Patricia Sanders, Cassandra L Tellegen is married to James Kirby, a psychologist who has written at least 12 articles with Sanders on 3P and a 3P workbook for grandparents. Aha, both Sanders and Tellegen are married to persons financially benefiting from 3P programs. All in the family. And spousal relationships are reportable conflicts of interest.

I don’t know about you, but I’m getting damn sick and tired of all the shuck ‘n jiving from triple P parenting when they’re required to disclose conflicts of interest.

Why get upset about conflicts of interest in evaluations of nonpharmacological trials and reviews?

My colleagues and I played a role in improving the tracking of conflicts of interest as they travel from industry-supported clinical trials into meta-analyses. Our criticism prompted the Cochrane Collaboration to close a loophole: investigator conflict of interest had not been identified as a formal risk of bias. Prior to the change, results of an industry-sponsored pharmacological trial could be entered into a meta-analysis where their origins were no longer apparent. The Collaboration awarded us the Bill Silverman Prize for pointing out the problem.

It is no longer controversial that, in the evaluation of pharmacological interventions, financial conflicts of interest are associated with inflated claims of efficacy. But the issue is ignored in evaluating nonpharmacological interventions, like psychotherapies or social programs such as 3P.

Undeclared conflicts of interest in nonpharmacological trials threaten the trustworthiness of the psychological literature.

Readers are almost never informed about conflicts of interest in trials evaluating psychotherapies or in their integration into meta-analyses. Yet “investigator allegiance,” a.k.a. undeclared conflict of interest, is one of the most robust predictors of effect size. Indeed, knowing the allegiance of an investigator predicts the direction of results more reliably than knowing the particular psychotherapy being evaluated.

As reviewed in my numerous blog posts [1,2,3], there is no doubt that evaluations of 3P are inflated by a strong confirmation bias associated with undeclared conflicts of interest.

But the problem is bigger than that when it comes to 3P. Millions of dollars are being invested on the basis of claims that improvements in parenting skills resulting from parents’ participation in 3P are a solution to pressing larger social problems. Money wasted on 3P is diverted from other solutions. And parents’ participation in 3P programs is often not voluntary: they enroll to avoid other adverse outcomes, such as removal of their children from the home. That is not a fair choice when 3P may not provide them any benefit, and certainly not what it is advertised as providing.

We should learn from the results of President George W. Bush committing hundreds of millions of dollars to promote stable and healthy marriages. The evidence for the programs selected for implementation came almost entirely from small-scale, methodologically flawed studies conducted by their developers, who typically did not declare conflicts of interest when they published. Later evaluations showed the programs to be grossly ineffective. An independent evaluation showed that positive findings for the particular programs did not occur more often than would be expected by chance. What a waste, but I doubt President Bush cared. As part of a larger package, he was able to slash welfare payments to the poor and shorten the allowable time for unemployment payments.

Politicians will accept ineffective social programs if the programs allow them to claim that they are not just doing nothing but are offering solutions. And ineffective social programs are particularly attractive when they cost less than a serious effort to address the social problems.

What I’m asking of BMC Medicine: A model response

  • Consistent with Committee on Publication Ethics (COPE) recommendations, persons with conflicts of interest should not be invited to write commentaries. I’m not sure that wanting to respond to null findings for their prized product is a justifiable override of this restriction. But if a commentary is deemed justified, there needs to be no ambiguity about the declaration of conflict of interest by the authors.
  • If journals have a policy of commentaries not undergoing peer review, it should be indicated on each and every commentary that this is the case. That would be consistent with COPE recommendations concerning non-peer-reviewed papers in journals identifying themselves as peer-reviewed.
  • Consistent with the opinion of many universities, failure to declare conflicts of interest constitutes scientific misconduct.
  • Scientific misconduct is grounds for retraction. Saying “Sorry, we forgot” in an erratum is an inadequate response. We need some sort of expanded Pottery Barn rule by which journals do not simply allow authors to publish an apology when the journal discovers an undeclared conflict of interest.
  • Articles for which authors declare conflicts of interest should be subject to particular editorial scrutiny, given the common association of conflicts of interest and spinning of results and other confirmatory bias.
  • Obviously, 3P promoters have had problems figuring out what conflicts of interest they have to declare. How about requiring all articles to include a statement that I first saw in a BMJ article, something like:

I have read all ICMJE standards and on that basis declare the following:

If authors are going to lie, let’s make it obvious and more actionable.

Please listen up, PLOS One

I am grateful to PLOS One for carefully investigating my charges that the authors of an article had substantial undeclared conflicts of interest.

The situation was outrageous. Aside from the conflicts of interest, the article was – as I documented in my blog post – neurobalm. The appearance of positive results was obtained by selective reporting of the data from analyses redone after previous analyses did not produce positive results. A misleading video was released on the internet accompanied by soft music and claims to demonstrate scientific evidence in PLOS One that a particular psychotherapy “soothed the threatened brain.” Yup, that was also in the title of the PLOS One article. The highly spun article was part of a marketing of workshops to psychotherapists who likely had little or no research training.

I volunteer as an Academic Editor for PLOS One and I resent the journal being caught up in misleading clinicians – and the patients they treat.

Upon investigation, the journal added an elaborate conflict of interest statement to the article. I’m impressed with the diligence with which the investigation was conducted.

Yet the absence of a previous statement meant that the authors had denied any conflicts of interest in response to a standard query from the journal during the submission process. I think their failure to make an appropriate disclosure is scientific misconduct. Retraction should be considered.

Given the strong association between conflicts of interest or investigator allegiance and the outcomes of psychosocial research, revelation of the undisclosed conflict of interest should have at least precipitated a careful re-review with heightened suspicion of spin and bias. And not by an editor who had not been informed of the conflict of interest and had missed the flaws the first time the article was reviewed. Editors are human; they get defensive when embarrassed.

Disclaimer: The opinions I express here are my own, and not necessarily those of the PLOS One or other members of the editorial board. Thankfully, at Mind the Brain, bloggers are free to speak out for themselves without censorship or even approval from the sponsoring journal. Remember what happened at Psychology Today and how I came to blog here.

 

 

Failing grade for highly cited meta-analysis of positive psychology interventions

The many sins of Sin and Lyubomirsky

I recently blogged about Linda Bolier and colleagues’ meta-analysis of positive psychology interventions [PPIs] in BMC Public Health. It is the new kid on the block. Sin and Lyubomirsky’s meta-analysis is accepted as the authoritative summary of the evidence and has been formally identified by Web of Science as among the top 1% of papers in psychology and psychiatry in terms of citations for 2009, with 187 citations according to Web of Science and 487 according to Google Scholar.

This meta-analysis ends on a resoundingly positive note:

Do positive psychology interventions effectively boost well-being and ameliorate depression? The overwhelming evidence from our meta-analysis suggests that the answer is ‘‘yes.’’ The combined results of 49 studies revealed that PPIs do, in fact, significantly enhance WB, and the combined results of 25 studies showed that PPIs are also effective for treating depressive symptoms. The magnitude of these effects is medium-sized (mean r =.29 for WB, mean r= .31 for depression), indicating that not only do PPIs work, they work well.

According to Sin and Lyubomirsky, the strength of evidence justifies PPIs being disseminated and implemented in the community:

The field of positive psychology is young, yet much has already been accomplished that practitioners can effectively integrate into their daily practices. As our metaanalysis confirms, positive psychology interventions can materially improve the wellbeing of many.

The authors also claimed to have dispensed with concerns that clinically depressed persons may be less able to benefit from PPIs.  Hmm…

In this blog post I will critically review Sin and Lyubomirsky’s meta-analysis, focusing on effects of PPIs on depressive symptoms, as I did in the earlier blog post concerning Bolier and colleagues’ meta-analysis. As the title of this blog post suggests, I found the Sin and Lyubomirsky meta-analysis misleading, falling far short of accepted standards for doing and reporting meta-analyses. I hope to convince you that authors who continue to cite this meta-analysis are either naïve, careless, or eager to promote PPIs in defiance of the available evidence. And I will leave you with the question of what its uncritical acceptance and citation says about the positive psychology community’s standards.

Read on and I will compare and contrast the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses, and you will get a chance to see how to grade a meta-analysis using the validated checklist, AMSTAR.

[If you are interested in using AMSTAR yourself to evaluate the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses independently, this would be a good place to stop and get the actual checklist and the article explaining it.]

The Sin and  Lyubomirsky meta-analysis

The authors indicate the purpose of the meta-analysis was to

Provide guidance to clinical practitioners by answering the following vital questions:

  • Do PPIs effectively enhance WB and ameliorate depression relative to control groups and, if so, with what magnitude?
  • Which variables—with respect to both the characteristics of the participants and the methodologies used—moderate the effectiveness of PPIs?

Similar to Bolier and colleagues, this meta-analysis focused primarily on interventions

aimed at increasing positive feelings, positive behaviors, or positive cognitions, as opposed to ameliorating pathology or fixing negative thoughts or maladaptive behavior patterns.

However, Sin and  Lyubomirsky’s  meta-analysis was less restrictive than Bolier et al in including interventions such as mindfulness, life review therapy, and forgiveness therapy.  These approaches were not developed explicitly within the positive psychology framework, even if they’ve been appropriated by positive psychology.

Positive psychologists have a bad habit of selectively claiming older interventions as their own, as they did with specific interventions from Aaron T Beck’s cognitive therapy for depression. We need to ask if what is considered effective in “positive psychology interventions” is new and distinctly positive psychology or if what is effective is mainly what is old and borrowed from elsewhere.

Sin and Lyubomirsky’s meta-analysis also differs from Bolier et al in including nonrandomized trials, although that was nowhere explicitly acknowledged. Sin and Lyubomirsky included studies in which what was done to student participants depended on what classrooms they were in, not on their individually being randomized. Lots of problems are introduced. For instance, any pre-existing differences associated with students being in particular classrooms are attributed to the participants having gotten PPIs. One should not combine studies with randomization by individual with studies in which interventions depended on being in particular classrooms – unless, perhaps, a check has been made statistically of whether they can be considered in the same class of interventions.

[I know, I’m getting into technical details that casual readers of the meta-analysis might want to ignore, but the validity of the authors’ conclusions depends on such details. Time and time again, we will see Sin and Lyubomirsky not providing them.]

Using AMSTAR

If authors have done a meta-analysis and want to submit it to a journal like PLOS One, they must accompany their submission with a completed PRISMA checklist. That allows the editor and reviewers to determine whether the authors have provided the basic details needed for them, and for future readers, to evaluate what was actually done. PRISMA is a checklist about transparency in reporting; it does not evaluate the appropriateness or competency of what authors do. Authors can do a meta-analysis badly and still score points on PRISMA, because readers at least have the details to see for themselves.

In contrast, AMSTAR evaluates both what is reported and what was done. So, authors don’t get points simply for reporting how they did the meta-analysis if what they did was inappropriate. And unlike a lot of checklists, the items of AMSTAR have been externally validated.

One final thing before we start: you can add up the number of items for which a meta-analysis meets AMSTAR criteria, but a higher score does not indicate that one meta-analysis is better than another. That’s because some items are more important than others in terms of what the authors of a meta-analysis have done and whether they’ve given enough details to readers. So, two meta-analyses may get the same moderate score using AMSTAR, but may differ in whether the items they failed to meet are fatal to the meta-analysis being able to make a valid contribution to the literature.

Some of the problems of Sin and Lyubomirsky’s meta-analysis revealed by AMSTAR

5. Was a list of studies (included and excluded) provided?

While a list of the included studies was provided, there was no list of excluded studies. It is confusing, for instance, why Barbara Fredrickson et al.’s (2008) study of loving kindness meditation with null findings is never mentioned. The study is never identified as a randomized trial in the original article, but it is subsequently cited by Barbara Fredrickson and many others within positive psychology as such. That’s a serious problem with the positive psychology literature: you never know whether an experimental manipulation is a randomized trial or whether a study will later be cited as evidence of the effectiveness of positive psychology interventions.

Most of the rest of the psychological intervention literature adheres to CONSORT, and one of the first requirements is that articles indicate either in their title or abstract that a randomized trial is being discussed. So, when it comes to a meta-analysis of PPIs, it is particularly important to know what studies were excluded so that readers can judge how that might have affected the effect size that was obtained.

6. Were the characteristics of the included studies provided?

Sin and Lyubomirsky’s Table 1 is incomplete and misleading in reporting characteristics of the included studies. It doesn’t indicate whether or not studies involved randomization. It is misleading in indicating which studies selected for depression, because it lumps together studies of mildly distressed students, selected on the basis of self-report questionnaires and not necessarily clinically depressed, with studies of patients with more severe depression who met criteria for formal clinical diagnoses. The table indicates sample size, but it is not total sample size that matters most; it is the size of the smallest group, whether intervention or control. A number of positive psychology studies have a big imbalance in the size of the intervention versus the control group. So, there may be a seemingly sufficient number of participants in the study, but the size of the control group would make the study underpowered, with a suspicion that effect sizes were exaggerated.

7. Was the scientific quality of the included studies assessed and documented?

Sin and Lyubomirsky made no effort to evaluate the quality of the included studies! That is a serious, fatal flaw.

On this basis alone, I would judge either that the meta-analysis somehow evaded adequate peer review or that the editor of Journal of Clinical Psychology and the reviewers of this particular paper were incompetent. Certainly this problem would not have been missed at PLOS One, and I would hope that other journals would have readily picked it up.

Bolier and colleagues explained their rating system and presented its application in evaluating the individual trials included in the meta-analysis. Readers had the opportunity to examine the rating system and its application. We were able to see that the studies evaluating positive psychology interventions tend to be of low quality. We can also see that the studies producing the largest effect sizes tend to be those of the lowest quality and small size.

I was somewhat critical of Bolier and colleagues in an earlier blog, because they liberalized the quality rating scales in order to even be able to conduct a meta-analysis. Nonetheless, they were transparent enough to allow me to make that independent evaluation. Because we have their ratings available, we can extrapolate to the studies included in Sin and Lyubomirsky and be warned that this analysis is likely to provide an overly positive evaluation of PPIs. But we have to go outside of what Sin and Lyubomirsky provide.

8. Was the scientific quality of the included studies used appropriately in formulating conclusions?

AMSTAR indicates

The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations.

Sin and Lyubomirsky could not take quality into account in interpreting their meta-analysis because they did not rate quality. And so they didn’t give readers a chance to use quality ratings to evaluate the evidence independently for themselves. We are now further into the realm of fatal flaws. We know from other sources that much of the “evidence” for positive psychology interventions comes from small, underpowered studies likely to produce exaggerated estimates of effects. If this is not taken into account, conclusions are invalid.

9. Were the methods used to combine the findings of studies appropriate?

AMSTAR indicates

For the pooled results, a test should be done to ensure the studies were combinable, to assess their homogeneity (i.e. Chi-squared test for homogeneity, I²). If heterogeneity exists a random effects model should be used and/or the clinical appropriateness of combining should be taken into consideration (i.e. is it sensible to combine?).

Sin and Lyubomirsky used an ordinary chi-squared test and found

the set of effect sizes was heterogeneous (χ²(23) = 146.32, one-tailed p < 2 × 10⁻¹⁹), indicating that moderators may account for the variation in effect sizes.

[I’ll try to be as non-technical as possible in explaining a vital point. Do try to struggle through this, rather than simply accepting my conclusion that this one statistic alone indicates a meta-analysis seriously in trouble. Think of it like a warning message on your car dashboard that should compel you to immediately pull over to the side of the road, shut off the engine, and call a tow truck.]

Tests for heterogeneity basically tell you whether there are enough similarities between the effect sizes for individual studies to warrant combining them. A test for heterogeneity examines whether the variation among study effect sizes exceeds what would be expected by chance. The Cochrane Collaboration specifically warns against relying on an ordinary chi-squared significance test of heterogeneity, because it is low powered in situations where the studies vary greatly in sample size, with some of them being small. The Cochrane Collaboration presents alternatives derived from the chi-square that quantify inconsistency in effect sizes, such as Q and I². Sin and Lyubomirsky didn’t report either of these, but instead relied on the standard chi-square test, which is prone to miss inconsistency between studies.

But don’t worry, the results are so wild that serious problems are indicated anyway. Look above at the significance of the chi-square that Sin and Lyubomirsky report. Have you ever seen anything so highly significant: p < .0000000000000000002?
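For readers who want a sense of just how bad that is: the chi-square Sin and Lyubomirsky report can be converted into the I² statistic the Cochrane Collaboration recommends, using I² = (Q − df)/Q and treating their chi-square as Q. This is my own back-of-envelope calculation in Python, not anything reported in their article:

# Back-of-envelope I^2 from the chi-square Sin and Lyubomirsky report.
# I^2 = max(0, (Q - df) / Q) * 100; values above 75% are conventionally
# described as "considerable" heterogeneity.
Q, df = 146.32, 23
i_squared = max(0.0, (Q - df) / Q) * 100
print(f"I^2 = {i_squared:.0f}%")  # roughly 84%

An I² in the mid-80s means that most of the variation across studies reflects real differences rather than sampling error, which is exactly the situation in which a single pooled effect size is uninformative.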

Rather than panicking like they should have, Sin and Lyubomirsky simply proceeded to examine moderators of effect size and concluded that most of them did not matter for depressive symptoms, including initial depression status of participants and whether participants individually volunteered to be in the study, rather than being assigned because they were in a particular classroom.

Sin and Lyubomirsky’s moderator analyses are not much help in figuring out what was going wrong. If they had examined quality of the studies and sample size, they would’ve gotten on the right path. But they really don’t have many studies, and so they can’t carefully examine these factors. They were basically left with a very serious warning not to proceed, but proceeded anyway. Once again, where the hell were the editor and reviewers when they could have saved Sin and Lyubomirsky from embarrassing themselves and misleading readers?

10. Was the likelihood of publication bias assessed?

AMSTAR indicates

An assessment of publication bias should include a combination of graphical aids (e.g., funnel plot, other available tests) and/or statistical tests (e.g., Egger regression test).

Bolier and colleagues provided a funnel plot of effect sizes that gave a clear indication that small studies with negative or null effects were somehow missing from the studies they had selected for the meta-analysis. Readers with some familiarity with meta-analysis can interpret it for themselves.

Sin and Lyubomirsky did no such thing. Instead they used Rosenthal’s failsafe N to give readers the false reassurance that hundreds of unpublished null studies of PPIs would have to be lurking in drawers in order for their glowing assessment to be unseated. Perhaps they should be forgiven for using failsafe N, because they acknowledged Rosenthal as a consultant. But outside of psychology, experts on meta-analysis reject failsafe N as providing false reassurance.
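To make concrete the kind of assessment AMSTAR has in mind, here is a minimal sketch of Egger’s regression test, which regresses each study’s standardized effect on its precision; an intercept reliably different from zero signals funnel-plot asymmetry. The effect sizes and standard errors below are made-up illustrative numbers, not data from Sin and Lyubomirsky:

# Minimal sketch of Egger's regression test for funnel-plot asymmetry.
# The d and se values are hypothetical, chosen so that smaller (noisier)
# studies have larger effects, the classic small-study-effect pattern.
import numpy as np
import statsmodels.api as sm

d  = np.array([0.90, 0.75, 0.60, 0.40, 0.35, 0.25, 0.20, 0.15])
se = np.array([0.45, 0.40, 0.35, 0.20, 0.18, 0.12, 0.10, 0.08])

precision = 1.0 / se          # predictor
snd = d / se                  # standardized effect (outcome)

X = sm.add_constant(precision)
fit = sm.OLS(snd, X).fit()
print(f"Egger intercept = {fit.params[0]:.2f}, p = {fit.pvalues[0]:.3f}")

A positive intercept, which numbers like these would produce, is the statistical counterpart of the asymmetric funnel plot that Bolier and colleagues showed.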

11. Was the conflict of interest stated?

AMSTAR indicates

Potential sources of support should be clearly acknowledged in both the systematic review and the included studies.

Lyubomirsky had already published The How of Happiness:  A New Approach to Getting the Life You Want. Its extravagant claims prompted a rare display of negativity from within the positive psychology community, an insightful negative review from the editor of Journal of Happiness Studies.

Conflict of interest in the authors of the actual studies – many of whom are also involved in the sale of positive psychology products – was ignored. We certainly know from analyses of studies conducted by pharmaceutical companies that the prospect of financial gain tends to lead to exaggerated effect sizes. Indeed, my colleagues and I were awarded the Bill Silverman award from the Cochrane Collaboration for alerting it to its lack of attention to conflict of interest as a formal indicator of risk of bias. The Collaboration is now in the process of revising its risk of bias tool to incorporate conflict of interest as a consideration.

Conclusion

Sin and Lyubomirsky provide a biased and seriously flawed assessment of the efficacy of positive psychology interventions. Anyone who uncritically cites this paper is either naïve, careless, or bent on presenting a positive evaluation of positive psychology interventions in defiance of the available evidence. Whatever limitations I pointed out in the meta-analysis of Bolier and colleagues, I prefer it to this one. Yet just watch. I predict Sin and Lyubomirsky will continue to be cited without acknowledging Bolier and colleagues. If so, it will add to lots of other evidence of the confirmatory bias and lack of critical thinking within the positive psychology community.

Postscript

Presumably, if you’re reading this postscript, you’ve read through my scathing analysis. But I noticed something was wrong in my initial 15-minute casual reading of the meta-analysis, undertaken after completing my blog post about Linda Bolier and colleagues. Among the things I noted were

  1. In their introduction, Sin and Lyubomirsky made positive statements about the efficacy of PPIs based on two underpowered, flawed studies (Fava et al., 2005; Seligman et al., 2006 ) that were outliers in Bolier and colleagues’ analyses. Citing these two studies as positive evidence suggests both prejudgment and a lack of application of critical skills that foreshadowed what followed.
  2. Their method section gave no indication of attention to quality of studies they were going to review. Bad, bad.
  3. Their method section declared that they would use one-tailed tests for the significance of effect sizes. Since the 1950s, psychologists have consistently relied on two-tailed tests. Unwary readers might not appreciate that a one-tailed p < .05 corresponds to p < .10 with the more customary two-tailed test for the very same results (see the short check after this list). Reliance on one-tailed tests is almost always an indication of a bias towards finding significant results or an attempt to mislead readers.
  4. The article included no forest plot that would’ve allowed a quick assessment of the distribution of effect sizes, whether they differed greatly, and whether some were outliers. As I showed in an earlier blog post, Bolier and colleagues’ inclusion of a forest plot, along with details in their Table 1, allowed a quick assessment that the overall effect size for positive psychology interventions was strongly influenced by outlier small studies of poor methodological quality.
  5. The wild chi-square concerning heterogeneity was glossed over.
  6. The resounding positive assessment of positive psychology interventions that opens the discussion was subsequently contradicted by acknowledgment of some, but not the most serious, limitations of the meta-analysis. Other conclusions in the discussion section were not based on any results of the meta-analysis.
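As promised in item 3, here is a quick numerical check of the one-tailed point, using a standard normal test statistic purely for illustration:

# A result that just clears a one-tailed .05 threshold corresponds to
# a two-tailed p of about .10 for the very same test statistic.
from scipy.stats import norm

z = 1.645                     # critical value for a one-tailed .05 test
print(norm.sf(z))             # one-tailed p, about .05
print(2 * norm.sf(z))         # two-tailed p for the same z, about .10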

I speak only for myself, and not for the journal PLOS One or the other Academic Editors. I typically take 15 minutes or so to decide whether to send a paper out for review. My perusal of this one would have led to sending it back to the authors, requesting that they attempt to adhere to basic standards for conducting and reporting meta-analyses before even considering resubmission to me. If they did resubmit, I would check again before even sending it out to reviewers. We need to protect reviewers and subsequent readers from meta-analyses that are not only poorly conducted, but that lack transparency and serve to promote interventions with undisclosed conflicts of interest.

 

 

Positive psychology interventions for depressive symptoms

I recently talked with a junior psychiatrist about whether she should undertake a randomized trial of positive psychology interventions with depressed primary care patients. I had concerns about whether positive psychology interventions would be acceptable to clinically depressed primary care patients, or off-putting and even detrimental.

Going back to my first publication almost 40 years ago, I’ve been interested in the inept strategies that other people adopt to try to cheer up depressed persons. The risk of positive psychology interventions is that depressed primary care patients would perceive the exercises as more ineffectual pressures on them to think good thoughts, be optimistic and snap out of their depression. If depressed persons try these exercises without feeling better, they are accumulating more failure experiences and further evidence that they are defective, particularly in the context of glowing claims in the popular media of the power of simple positive psychology interventions to transform lives.  Some depressed people develop acute sensitivity to superficial efforts to make them feel better. Their depression is compounded by their sense of coercion and invalidation of what they are so painfully feeling. This is captured in the hilarious Ren & Stimpy classic


 

Happy Helmet Joy Joy song video

 

Something borrowed, something blue

By positive psychology interventions, my colleague and I didn’t have in mind techniques that positive psychology borrowed from cognitive therapy for depression. Ambitious positive psychology school-based interventions like the UK Resilience Program incorporate these techniques. They have been validated for use with depressed patients as part of Beck’s cognitive therapy, but are largely ineffective when used with nonclinical populations that are not sufficiently depressed to register an improvement. Rather, we had in mind interventions and exercises that are distinctly positive psychology.

Dr. Joan Cook, Dr. Beck, and Jim Coyne

I surveyed the positive psychology literature to get some preliminary impressions, forcing myself to read the Journal of Positive Psychology and even the Journal of Happiness Studies. I sometimes had to take breaks and go see dark movies as an antidote, such as A Most Wanted Man and The Drop, both of which I heartily recommend. I will soon blog about the appropriateness of positive psychology exercises for depressed patients. But this post concerns a particular meta-analysis that I stumbled upon. It is open access and downloadable anywhere in the world. You can obtain the article and form your own opinions before considering mine or double check mine:

Bolier, L., Haverman, M., Westerhof, G. J., Riper, H., Smit, F., & Bohlmeijer, E. (2013). Positive psychology interventions: a meta-analysis of randomized controlled studies. BMC Public Health, 13(1), 119.

I had thought this meta-analysis just might be the comprehensive, systematic assessment of the literature for which I was searching. I was encouraged that it excluded positive psychology interventions borrowed from cognitive therapy. Instead, the authors sought studies that evaluated

the efficacy of positive psychology interventions such as counting your blessings [29,30], practicing kindness [31], setting personal goals [32,33], expressing gratitude [30,34] and using personal strengths [30] to enhance well-being, and, in some cases, to alleviate depressive symptoms [30].

But my enthusiasm was dampened by the wishy-washy conclusion prominently offered in the abstract:

The results of this meta-analysis show that positive psychology interventions can be effective in the enhancement of subjective well-being and psychological well-being, as well as in helping to reduce depressive symptoms. Additional high-quality peer-reviewed studies in diverse (clinical) populations are needed to strengthen the evidence-base for positive psychology interventions.

Can be? With apologies to Louis Jordan, is they or ain’t they effective? And just why is additional high-quality research needed to strengthen conclusions? Because there are only a few studies or because there are many studies, but mostly of poor quality?

I’m so disappointed when authors devote the time and effort that meta-analysis requires and then beat around the bush with such wimpy, noncommittal conclusions.

A first read alerted me to some bad decisions that the authors had made from the outset. Further reads showed me how effects of these decisions were compounded by the poor quality of the literature of which they had to make sense.

I understand the dilemma the authors faced. The positive psychology intervention literature has developed in collective defiance of established standards for evaluating interventions intended to benefit people, and especially interventions to be sold to people who trust they are beneficial. To have something substantive to say about positive psychology interventions, the authors of this meta-analysis had to lower their standards for selecting and interpreting studies. But they could have done a better job of integrating acknowledgement of problems in the quality of this literature into their evaluation of it. Any evaluation should come with a prominent warning label about the poor quality of studies and evidence of publication bias.

The meta-analysis

Meta-analyses involve (1) systematic searches of the literature; (2) selection of studies meeting particular criteria; and (3) calculation of standardized effect sizes to allow integration of results of studies with different measures of the same construct. Conclusions are qualified by (4) quality ratings of the individual studies and by (5) calculation of the overall statistical heterogeneity of the study results.
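For readers unfamiliar with steps (3) and (5), here is a minimal sketch, with made-up effect sizes, of how standardized effect sizes are typically pooled with inverse-variance weights and how the Q and I² heterogeneity statistics fall out of that pooling. It illustrates the general machinery, not Bolier and colleagues’ actual analysis:

# Minimal fixed-effect meta-analysis on hypothetical study results:
# inverse-variance pooling plus the Q and I^2 heterogeneity statistics.
import numpy as np
from scipy.stats import chi2

d  = np.array([0.80, 0.15, 0.30, -0.05, 0.25])   # per-study Cohen's d (made up)
se = np.array([0.40, 0.12, 0.20,  0.15, 0.10])   # per-study standard errors (made up)

w = 1.0 / se**2                                  # inverse-variance weights
d_pooled = np.sum(w * d) / np.sum(w)             # fixed-effect pooled estimate
se_pooled = np.sqrt(1.0 / np.sum(w))

Q = np.sum(w * (d - d_pooled)**2)                # Cochran's Q
df = len(d) - 1
p_het = chi2.sf(Q, df)                           # test of homogeneity
i2 = max(0.0, (Q - df) / Q) * 100                # inconsistency beyond sampling error

print(f"pooled d = {d_pooled:.2f} (SE {se_pooled:.2f}), Q = {Q:.1f}, p = {p_het:.3f}, I^2 = {i2:.0f}%")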

The authors searched

PsychInfo, PubMed and the Cochrane Central Register of Controlled Trials, covering the period from 1998 (the start of the positive psychology movement) to November 2012. The search strategy was based on two key components: there should be a) a specific positive psychology intervention, and b) an outcome evaluation.

They also found additional studies by crosschecking references of previous evaluations of positive psychology interventions.

To be selected, a study had to

  • Be developed within the theoretical tradition of positive psychology.
  • Be a randomized controlled study.
  • Measure outcomes of subjective well-being (such as positive affect), personal well-being (such as hope), or depressive symptoms (such as the Beck Depression Inventory).
  • Have results reported in a peer-reviewed journal.
  • Provide sufficient statistics to allow calculation of standardized effect sizes.

I’m going to focus on evaluation of interventions in terms of their ability to reduce depressive symptoms. But I think my conclusions hold for the other outcomes.

The authors indicated their way of assessing the quality of studies (0 to 6) was based on a count derived from an adaptation of the risk of bias items of the Cochrane collaboration. I’ll discuss their departures from the Cochrane criteria later, but these authors’ six criteria were

  • Adequacy of concealment of randomization.
  • Blinding of subjects to which condition they had been assigned.
  • Baseline comparability of groups at the beginning of the study.
  • Whether there was an adequate power analysis or  at least 50 participants in the analysis.
  • Completeness of follow up data: clear attrition analysis and loss to follow up < 50%.
  • Handling of missing data: the use of intention-to-treat analysis, as opposed to analysis of only completers.

The authors used two indicators to assess heterogeneity

  • The Q-statistic. When significant, it calls for rejection of the null hypothesis of homogeneity and indicates that the true effect size probably does vary from study to study.
  • The I²-statistic, a percentage indicating how much of the study-to-study dispersion of effect sizes reflects real differences rather than sampling error.

[I know, this is getting technical, but I will try to explain as we go. Basically, the authors estimated the extent to which the effect size they obtained could generalize back to the individual studies. When individual studies vary very much, an overall effect size for a set of studies can be very different from that for any individual intervention. So without figuring out the nature of this heterogeneity and resolving it, the effect sizes do not adequately represent individual studies or interventions.]

One way of reducing heterogeneity is to identify outlier studies that have much larger or smaller effect sizes than the rest. These studies can simply be removed from consideration or sensitivity analyses can be conducted, in which analyses are compared that retain or remove outlier studies.
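A sensitivity analysis of this kind can be as simple as re-pooling the studies with each one dropped in turn; if removing a single study shifts the pooled estimate substantially, that study is an outlier driving the result. A minimal sketch with hypothetical numbers, using the same inverse-variance pooling as in the earlier sketch:

# Leave-one-out sensitivity check on hypothetical effect sizes:
# re-pool with each study removed and see how far the estimate moves.
import numpy as np

d  = np.array([1.20, 0.20, 0.25, 0.10, 0.15])   # made up; first study is an outlier
se = np.array([0.50, 0.12, 0.15, 0.10, 0.11])

def pooled_d(d, se):
    w = 1.0 / se**2
    return np.sum(w * d) / np.sum(w)

print(f"all studies: d = {pooled_d(d, se):.2f}")
for i in range(len(d)):
    keep = np.arange(len(d)) != i
    print(f"without study {i + 1}: d = {pooled_d(d[keep], se[keep]):.2f}")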

The authors expected big differences across the studies and so adopted, as the criterion for keeping a study, a Cohen’s d (standardized difference between intervention and control group) of up to 2.5 standard deviations. That is huge. The average psychological intervention for depression differs from a waitlist or no-treatment group by .62, but from another active treatment by only d = .20. How could these authors think that even an effect size of 1.0 with largely nonclinical populations could be expected for positive psychology interventions? They are at risk of letting in a lot of exaggerated and nonreplicable results. But stay tuned.

The authors also examined the likelihood that there was publication bias in the studies they were able to find, using funnel plots, Orwin’s fail-safe number, and the Trim and Fill method. I will focus on the funnel plot because it is graphic, but the other approaches provide similar results. The authors of this meta-analysis state

A funnel plot is a graph of effect size against study size. When publication bias is absent, the observed studies are expected to be distributed symmetrically around the pooled effect size.

Hypothetical funnel plot indicating bias

 

Results

At the end of the next two sections, I will conclude that the authors were overly generous in their evaluation of positive psychology interventions. The quality of the available studies precludes deciding whether positive psychology interventions are effective. But don’t accept this conclusion without my documenting the reasons for it. Please read on.


The systematic search identified 40 articles presenting results of 39 studies. The overall quality ratings of the studies were quite low [see Table 1 in the article]. There was a mean score of 2.5 (SD = 1.25). Twenty studies were rated as low quality (<3), 18 as medium quality (3-4), and one received a rating of 5. The studies with the lowest quality had the largest effect sizes (Table 4).

Fourteen effect sizes were available for depressive symptoms. The authors report an overall small effect size of positive psychology interventions on depressive symptoms of .23. Standards for evaluating effect sizes are arbitrary, but this one would generally be considered small.

There were multiple indications of publication bias, including the funnel plot of these effect sizes, and it was estimated that 5 negative findings were missing. According to the authors

Funnel plots were asymmetrically distributed in such a way that the smaller studies often showed the more positive results (in other words, there is a certain lack of small insignificant studies).

When the effect sizes for the missing studies were imputed (estimated), the adjusted overall effect size for depressive symptoms was reduced to a nonsignificant .19.

To provide some perspective, let’s examine what an effect size of approximately .20 means. There is a 56% probability (as opposed to a 50/50 probability) that a person assigned to a positive psychology intervention would be better off than a person assigned to the control group.

Created by Kristoffer Magnusson. http://rpsychologist.com/d3/cohend/
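The 56% figure can be reproduced, approximately, from the common-language effect size formula, which converts a Cohen’s d into the probability that a randomly chosen intervention participant scores better than a randomly chosen control participant, assuming normal distributions. This is my own check of the number, not a calculation taken from the meta-analysis:

# Common-language effect size: P(randomly chosen treated person does better
# than randomly chosen control person) = Phi(d / sqrt(2)) under normality.
from math import sqrt
from scipy.stats import norm

d = 0.20
cl = norm.cdf(d / sqrt(2))
print(f"{cl:.0%}")   # about 56%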

But let’s take a closer look at a forest plot of the studies with depressive symptoms as an outcome.

As can be seen in the figure below, each study has a horizontal line in the forest plot and most have a square box in the middle. The line represents the 95% confidence interval for the standard mean difference between the positive psychology intervention and its control group, and the darkened square is the mean difference.

Forest plot (click to enlarge)

Note that two studies, Fava (2005) and Seligman, study 2 (2006) have long lines with an arrow at the right, but no darkened squares. The arrow indicates the line for each extends beyond what is shown in the graph. The long line for each indicates wide confidence intervals and imprecision in the estimated effect. Implications? Both studies are extreme outliers with large, but imprecise estimates of effect sizes. We will soon see why.

There are also vertical lines in the graph. One is marked 0,00 and indicates no difference between the intervention and control group. If the line for an individual study crosses it, the difference between the intervention and control group was not significant.

Among the things to notice are:

  • Ten of the 14 effect sizes available for depressive symptoms cross the 0,00 line, indicating that the individual effect sizes were not significant.
  • The four lines that don’t cross this line, and therefore represent significant effects, were Fava (2005), Hurley, Mongrain, and Seligman (2006, study 2).

Checking Table 2 for characteristics of the studies, we find that Fava compared 10 people receiving the positive psychology intervention to a control group of 10. Seligman had 11 people in the intervention group and 9 in the control group. Hurley is listed as comparing 94 people receiving the intervention to 99 controls. But I checked the actual study, and these numbers represent a substantial loss of participants from the 151 intervention and 164 control participants who started the study. Hurley lost 39% of participants by the Time 2 assessment and analyzed only completers, without intent-to-treat analyses or imputation (which would have been inappropriate anyway because of the high proportion of missing data).

I cannot make sense of Mongrain’s studies being counted as positive. A check with Table 1 indicates that 4 studies with Mongrain as an author were somehow combined. Yet, when I looked them up, one study reports no significant differences between intervention and control conditions for depression, with the authors explicitly indicating that they failed to replicate Seligman et al (2006). A second study reports

In terms of depressive symptoms, no significant effects were found for time or time x condition. Thus, participant reports of depressive symptoms did not change significantly over time, or over time as a function of the condition that they were assigned to.

A third study reported significant effects for completers, but nonsignificant effects in multilevel modeling analyses that attempted to compensate for attrition. The fourth study again failed to find that the decline in depressive symptoms over time was a function of the group to which participants were assigned, in multilevel analyses attempting to compensate for attrition.

So, Mongrain’s studies should not be counted as having a positive effect size for depressive symptoms unless perhaps we accept a biased completer analysis over multilevel modeling. We are left with Fava and Seligman’s quite small studies and Hurley’s study relying on completer analyses without adjustment for substantial attrition.

By the authors’ ratings, the quality of these studies was poor. Fava and Seligman both scored 1 out of 6 in the quality assessments. Hurley scored 2. Mongrain scored 4, and the other negative studies had a mean score of 2.6. So, any claim from individual studies that positive psychology interventions have an effect on depressive symptoms depends on two grossly underpowered studies and another study relying on analysis of only completers in the face of substantial attrition. And the positive studies tend to be of lower quality.

But the literature concerning positive psychology interventions is worse than it first looks.

The authors’ quality ratings are too liberal.

  • Item 3, Baseline comparability of groups at the beginning of the study, is essential if effect sizes are to be meaningful. But it becomes meaningless if such grossly underpowered studies are included. For instance, it would take a large difference in baseline characteristics of Fava’s 8 intervention versus 8 control participants to be significant. That there were no significant differences in the baseline characteristics provides very weak assurance that individual or combined baseline characteristics did not account for any differences that were observed.
  • Item 4, Whether there was an adequate power analysis or at least 50 participants in the analysis can be met in either of 2 ways. But we don’t have evidence that the power analyses were conducted prior to the conduct of the trial and having at least 50 participants does not reduce bias if there is substantial attrition.
  • Item 5, Completeness of follow up data: clear attrition analysis and loss to follow up < 50%, allows studies with substantial loss to follow-up to score positive. Hurley’s loss of over a third of participants who were randomized rules out generalization of results back to the original sample, much less an effect size that can be integrated with those of other studies that did not lose so many participants.

The authors of this meta-analysis chose to “adapt,” rather than simply accept, the validated Cochrane Collaboration risk of bias assessment. Seen here, one Cochrane criterion is whether the randomization procedure is described in sufficient detail to decide that the intervention and control group would be comparable except for group assignment. These studies typically did not provide sufficient details of any care having been taken to ensure this, or any details whatsoever except that the study was randomized.

Another criterion is whether there is evidence of selective outcome reporting. I would not score any of these studies as demonstrating that all outcomes were reported. The issue is that authors can assess participants with a battery of psychological measures, and then pick those that differed significantly between groups to be highlighted.

The Cochrane Collaboration includes a final criterion, “other sources of bias.” In doing meta-analyses of psychological intervention studies, considering investigator allegiance is crucial, because the intervention for which the investigator is rooting almost always does better. My group’s agitation about financial conflicts of interest won us the Bill Silverman award from the Cochrane Collaboration. The Collaboration is now revising its other-sources-of-bias criterion so that conflicts of interest are taken into account. Some authors of articles about positive psychology interventions profit immensely from marketing positive psychology merchandise. I am not aware of any of the studies included in the meta-analysis having disclosures of conflict of interest.

If you think I am being particularly harsh in my evaluation of positive psychology interventions, you need only consult my numerous other blog posts about meta-analyses and see the consistency with which I apply standards. And I have not even gotten to my pet peeves in evaluating intervention research – overly small cell size and “control groups” that are not clear on what is being controlled.

The number of participants in some of these studies is so small that the intended effects of randomization cannot be assured and any positive findings are likely to be false positives. If the number of participants in either the intervention or control group is less than 35, there is less than a 50% probability of detecting a moderate-sized positive effect, even if it is actually there. Put differently, there is a more than 50% probability that any significant finding will be a false positive. Inclusion of studies with so few participants undermines the validity of the other quality ratings. We cannot tell why Fava or Seligman did not have one more or one fewer participant. These are grossly underpowered studies, and adding or dropping a single participant in either group could substantially change results.
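A rough check of the power claim in the preceding paragraph, assuming “moderate” means d = 0.5 and a two-sided independent-samples t-test at alpha = .05; this is my own illustration using statsmodels, not a calculation from the studies themselves:

# Power of a two-sample t-test for a "moderate" effect (d = 0.5) at alpha = .05.
# Power falls below 50% once group sizes drop much under the low 30s.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_group in (10, 20, 30, 35, 50):
    power = analysis.power(effect_size=0.5, nobs1=n_per_group, alpha=0.05, ratio=1.0)
    print(f"n = {n_per_group} per group: power = {power:.2f}")

With 10 per group, roughly the size of the Fava and Seligman cells, power to detect a moderate effect is under 20%.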

Then there is the question of control groups. While some studies simply indicate waitlist, others had an undefined treatment as usual or no treatment, and a number of others indicate “placebo,” apparently following Seligman et al. (2005):

Placebo control exercise: Early memories. Participants were asked to write about their early memories every night for one week.

As Mongrain correctly noted, this is not a “placebo.” Seligman et al. and the studies modeled after it failed to include any elements of positive expectation, support, or attention that are typically provided in conditions labeled “placebo.” Mongrain and her colleagues attempted to provide such elements in their control condition, and perhaps this contributed to their negative findings.

A revised conclusion for this meta-analysis

Instead of the wimpy conclusion of the authors presented in their abstract, I would suggest acknowledgment that

The existing literature does not provide robust support for the efficacy of positive psychology interventions for depressive symptoms. The absence of evidence is not necessarily evidence of an absence of an effect. However, more definitive conclusions await better quality studies with adequate sample sizes and suitable control of possible risk of bias. Widespread dissemination of positive psychology interventions, particularly with glowing endorsements and strong claims of changing lives, is premature in the absence of evidence they are effective.

Can the positive psychology intervention literature be saved from itself?

Studies of positive psychology interventions are conducted, published, and evaluated in a gated community where vigorous peer review is neither sought nor apparently effective in identifying and correcting major flaws in manuscripts before they are published. Many within the positive psychology movement find this supportive environment an asset, but it has failed to produce a quality literature demonstrating that positive interventions can indeed contribute to human well-being. Positive psychology intervention research has been insulated from widely accepted standards for doing intervention research. There is little evidence that any of the manuscripts reporting the studies were submitted with completed CONSORT checklists, which are now required by most journals. There’s little evidence of awareness of the Cochrane risk of bias assessment or of steps being taken to reduce bias.

In what other area of intervention research are claims for effectiveness so dependent on such small studies of such low methodological quality published in journals in which there is only limited independent peer review and such strong confirmatory bias?

As seen on its Friends of Positive Psychology listserv, the positive psychology community is averse to criticism, even constructive criticism from within its ranks. There is dictatorial one-person rule on the listserv. Dissenters routinely vanish without any due process or notice to the rest of the listserv community, much like disappearances under a Latin American dictatorship.

There are many in the positive psychology movement who feel that the purpose of positive psychology research is to uphold the tenets of the movement and to show, not test, the effectiveness of its interventions for changing lives. Investigators who want to evaluate positive psychology interventions need to venture beyond the safety and support of the Journal of Positive Psychology and the Journal of Happiness Studies to seek independent peer review, informed by widely accepted standards for evaluating psychological interventions.