Power pose: II. Could participating in replication initiatives hurt early career investigators’ advancement?

Participation in attempts to replicate seriously flawed studies might be seen as bad judgment, when there are many more opportunities to demonstrate independent, critical thinking.


This is the second blog post concerning the special issue of Comprehensive Results in Social Psychology devoted to replicating Amy Cuddy’s original power pose study in Psychological Science.

Some things for early career investigators to think about.


I have long argued that there should be better incentives for early career (as well as more senior) investigators (ECRs) participating in efforts to improve the trustworthiness of science.

ECRs should be encouraged, and indeed expected, to engage in post-publication peer review on PubPeer and PubMed Commons, and ways should be developed in which such activity can be listed on the CV.

The Pottery Barn rule should be extended so that ECRs can publish critical commentaries in the journals that publish the original flawed papers. Retraction notices should indicate whose complaints led to the retraction.

Rather than being pressured to publish more underpowered, under-resourced studies, ECRs should be rewarded for research parasite activity. They should be assisted in obtaining data sets from already published studies. With that data, they should conduct exploratory, secondary analyses aimed at understanding what went wrong in larger-scale studies that left them methodologically compromised and with shortfalls in recruitment.

But I wonder if we should counsel ECRs that participating in multisite replication initiatives like the one directed at the power pose effect might not contribute to their career advancement and may even hurt it.

I’ve been critical of the value of replication initiatives as the primary means of addressing the trustworthiness of psychology, particularly in areas with claims of clinical and public health relevance. To add to the other reservations I have, I can point out that the necessary economy and efficiency of reliance on MTurk and other massive administrations of experimental manipulations can force efforts to improve the trustworthiness of psychology into less socially significant and perhaps less representative areas.

I certainly wouldn’t penalize an early career investigator for involvement in a multisite replication. I appreciate that there is room for disagreement with my skepticism about the value of such initiatives. I would recognize the commitment to better research practices that involvement would represent.

But I think early career investigators need to consider that some senior investigators and members of hiring and promotion committees (HPCs) might give a low rating to publications coming from such initiatives in judging a candidate’s potential for original, creative, risk-taking research. That might be so even if these committee members appreciate the need to improve the trustworthiness of psychology.

Here are some conceivable comments that could be made in such a committee’s deliberations.

“Why did this candidate get involved in a modest scale study so focused on two saliva assessments of cortisol? Even if it is not their area of expertise, shouldn’t they have consulted the literature and seen how uninformative a pair of assessments of cortisol is, given the well-known problems of intra-individual and inter-individual variation in cortisol’s sensitivity to uncontrolled contextual variables?…They should have powered their study to find cortisol differences amidst all the noise.”

“Were they unaware that testosterone levels differ between men and women by a factor of five or six? How do they expect that discontinuity in distributions to be overcome in any statistical analyses combining men and women? What basis was there in the literature to suggest that a brief, seemingly trivial manipulation of posture would have such enduring effects on hormones? Why did they specifically anticipate that differences would be registered in women? Overall, their involvement in this initiative demonstrates a willingness to commit considerable time and resources to ideas that could have been ruled out by a search of the relevant literature.”


“There seems to be a lemming quality to this large group of researchers pursuing some bad hypotheses with inappropriate methods. Why didn’t this investigator have the independence of mind to object? Can we expect similar going along with the herd after fashionable research topics over the next few years?”

“While I appreciate the motivation of this investigator, I believe there was a violation of the basic principle of ‘stop and think before you undertake a study’ that does not bode well for how they will spend their time when faced with the demands of teaching and administration as well as doing research.”

Readers may think that these comments represent horrible, cruel sentiments and that it would be a great injustice if they influenced hiring and promotion decisions. But anyone who has ever been on a hiring and promotion committee knows that they are full of such horrible comments and that such processes are not fair or just or even rational.


Why PhD students should not evaluate a psychotherapy for their dissertation project

  • Things some clinical and health psychology students wish they had known before they committed themselves to evaluating a psychotherapy for their dissertation study.
  • A well designed pilot study addressing feasibility and acceptability issues in conducting and evaluating psychotherapies is preferable to an underpowered study which won’t provide a valid estimate of the efficacy of the intervention.
  • PhD students would often be better off as research parasites – making use of existing published data – rather than attempting to organize their own original psychotherapy study, if their goal is to contribute meaningfully to the literature and patient care.
  • Reading this blog, you will encounter a link to free, downloadable software that allows you to make quick determinations of the number of patients needed for an adequately powered psychotherapy trial.

I so relish the extra boost of enthusiasm that many clinical and health psychology students bring to their PhD projects. They not only want to complete a thesis of which they can be proud, they want their results to be directly applicable to improving the lives of their patients.

Many students are particularly excited about a new psychotherapy about which extravagant claims are being made that it’s better than its rivals.

I have seen lots of fads and fashions come and go: third wave, new wave, and no wave therapies. When I was a PhD student, progressive relaxation was in. Then it died, mainly because it was so boring for therapists who had to mechanically provide it. Client-centered therapy was fading with doubts that anyone else could achieve the results of Carl Rogers or that his three facilitative conditions of unconditional positive regard, genuineness, and empathy were actually distinguishable enough to study. Gestalt therapy was supercool because of the charisma of Fritz Perls, who distracted us with his showmanship from the utter lack of evidence for its efficacy.

I hate to see PhD students demoralized when their grand plans prove unrealistic.  Inevitably, circumstances force them to compromise in ways that limit any usefulness to their project, and maybe even threaten their getting done within a reasonable time period. Overly ambitious plans are the formidable enemy of the completed dissertation.

The numbers are stacked against a PhD student conducting an adequately powered evaluation of a new psychotherapy.

This blog post argues against PhD students taking on the evaluation of a new therapy in comparison to an existing one, if they expect to complete their projects and make a meaningful contribution to the literature and to patient care.

I’ll be drawing on some straightforward analysis done by Pim Cuijpers to identify what PhD students are up against when trying to demonstrate that any therapy is better than treatments that are already available.

Pim has literally done dozens of meta-analyses, mostly of treatments for depression and anxiety. He commands a particular credibility, given the quality of this work. The way Pim and his colleagues present a meta-analysis is so straightforward and transparent that you can readily examine the basis of what he says.

Disclosure: I collaborated with Pim and a group of other authors in conducting a meta-analysis as to whether psychotherapy was better than a pill placebo. We drew on all the trials allowing a head-to-head comparison, even though nobody ever really set out to pit the two conditions against each other as their first agenda.

Pim tells me that the brief and relatively obscure letter, New Psychotherapies for Mood and Anxiety Disorders: Necessary Innovation or Waste of Resources?, on which I will draw is among his most unpopular pieces of work. Lots of people don’t like its inescapable message. But I think that if PhD students pay attention to it, they might avoid a lot of pain and disappointment.

But first…

Note how many psychotherapies have been claimed to be effective for depression and anxiety. Anyone trying to make sense of this literature has to contend with claims based on a lot of underpowered trials – too small in sample size to reasonably detect the effects that investigators claim – that are otherwise compromised by methodological limitations.

Some investigators were simply naïve about clinical trial methodology and the difficulties of doing research with clinical populations. They may not have understood statistical power.

But many psychotherapy studies end up in bad shape because the investigators were unrealistic about the feasibility of what they were undertaking and the low likelihood that they could recruit patients in the numbers that they had planned in the time that they had allotted. After launching the trial, they had to change strategies for recruitment, maybe relax their selection criteria, or even change the treatment so it was less demanding of patients’ time. And they had to make difficult judgments about which features of the trial to drop when resources ran out.

Declaring a psychotherapy trial to be a “preliminary” or a “pilot study” after things go awry

The titles of more than a few articles reporting psychotherapy trials contain the apologetic qualifier after a colon: “a preliminary study” or “a pilot study”. But the studies weren’t intended at the outset to be preliminary or pilot studies. The investigators are making excuses post-hoc – after the fact – for not having been able to recruit sufficient numbers of patients and for having had to compromise their design from what they had originally planned. The best they can hope is that the paper will somehow be useful in promoting further research.

Too many studies from which effect sizes are entered into meta-analyses should have been left as pilot studies and not considered tests of the efficacy of treatments. The rampant problem in the psychotherapy literature is that almost no one treats small-scale trials as mere pilot studies. In a recent blog post, I provided readers with some simple screening rules to identify meta-analyses of psychotherapy studies that they could dismiss from further consideration. One was whether there were sufficient numbers of adequately powered studies. Often there are not.

Readers take the inflated claims from small studies seriously, when these estimates should be seen as unrealistic and unlikely to be replicated, given the studies’ sample sizes. The large effect sizes that are claimed are likely the product of p-hacking and the confirmation bias required to get published. With enough alternative outcome variables to choose from and enough flexibility in analyzing and interpreting data, almost any intervention can be made to look good.

The problem is readily seen in the extravagant claims about acceptance and commitment therapy (ACT), which are heavily dependent on small, under-resourced studies supervised by promoters of ACT that should not have been used to generate effect sizes.

Back to Pim Cuijpers’ brief letter. He argues, based on his numerous meta-analyses, that it is unlikely that a new treatment will be substantially more effective than an existing credible, active treatment. There are some exceptions, like relaxation training versus cognitive behavior therapy for some anxiety disorders, but mostly only small differences of no more than d = .20 are found between two active, credible treatments. If you search the broader literature, you can find occasional exceptions like CBT versus psychoanalysis for bulimia, but most that you find prove to be false positives, usually based on investigator bias in conducting and interpreting a small, underpowered study.

You can see this for yourself by downloading the free G*Power program and plugging in d = 0.20 to calculate the number of patients needed for a study. To be safe, add more patients to allow for the expectable 25% dropout rate that has occurred across trials. The number you get would require a larger study than has ever been done in the past, including the well-financed NIMH Collaborative trial.

[Figure: G*Power analyses]
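If you would rather script the calculation than point and click, here is a minimal sketch of the same sample size calculation using Python’s statsmodels. The d = 0.20 difference and the 25% dropout allowance come from the text above; the 80% power and two-sided alpha of .05 are conventional assumptions of mine, not figures from Cuijpers’ letter.

```python
# Minimal sketch of the G*Power-style calculation described above.
# Assumptions: two-sided alpha = .05 and 80% power (conventional choices, not from the text).
from math import ceil
from statsmodels.stats.power import TTestIndPower

d = 0.20        # plausible difference between two active, credible treatments (from the text)
dropout = 0.25  # expectable dropout rate mentioned in the text

n_per_group = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80,
                                          alternative='two-sided')
print(ceil(n_per_group))                  # completers needed per group: roughly 394
print(ceil(n_per_group / (1 - dropout)))  # enrollment per group allowing for dropout: roughly 525
```

Roughly 394 completers per group before any allowance for dropout: as the text notes, that is larger than the comparative psychotherapy trials that have actually been conducted.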

Even more patients would be needed for the ideal situation in which a third comparison group allowed the investigator to show that the established comparison treatment had actually performed better than a nonspecific treatment, delivered with the same effectiveness it had shown in earlier trials. Otherwise, a defender of the established therapy might argue that the older treatment had not been properly implemented.

So, unless warned off, the PhD student plans a study to show not only that the null hypothesis can be rejected that the new treatment is no better than the existing one, but also that in the same study the existing treatment was better than a wait list. Oh my, just try to find an adequately powered, properly analyzed comparison of two active treatments plus a control group in the existing published literature. The few examples of three-group designs in which a new psychotherapy came out better than an effectively implemented existing treatment are grossly underpowered.

These calculations so far have all been based on what would be needed to reject the null hypothesis of no difference between the new treatment and a more established one. But if the claim is that the new treatment is superior to the existing treatment, our PhD student now needs to conduct a superiority trial in which some criterion is pre-set (such as greater than a moderate difference, d = .30) and the null hypothesis is that the advantage of the new treatment is less than that. We are now way out into the fantasyland of breakthrough, but never-completed, dissertation studies.
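To put a rough number on that fantasyland: if the pre-set margin is d = .30 and we optimistically assume that the new treatment’s true advantage is d = .50 (my assumption, not a figure from any trial), the effect that has to be detected against the margin is only the 0.20 gap between the two. A rough sketch:

```python
# Rough sample size for showing superiority by a pre-set margin: test H0: d <= 0.30
# against H1: d > 0.30, assuming (hypothetically) a true advantage of d = 0.50.
# The effective effect to detect is the 0.20 gap between the assumed truth and the margin.
from math import ceil
from statsmodels.stats.power import TTestIndPower

true_d, margin = 0.50, 0.30   # true_d is an illustrative assumption
n_per_group = TTestIndPower().solve_power(effect_size=true_d - margin, alpha=0.025,
                                          power=0.80, alternative='larger')
print(ceil(n_per_group))      # on the order of 400 patients per group
```

Again hundreds of patients per group, and that is under an optimistic assumption about the true advantage.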

Two take away messages

The first take-away message is that we should be skeptical of claims that a new treatment is better than past ones, except when the claim occurs in a well-designed study with some assurance that it is free of investigator bias. But the claim also has to arise in a trial that is larger than almost any psychotherapy study that has ever been done. Yup, most comparative psychotherapy studies are underpowered, and we cannot expect claims that one treatment is superior to another to be robust.

But for PhD students doing a dissertation project, the second take-away message is that they should not attempt to show that one treatment is superior to another in the absence of resources they probably don’t have.

The psychotherapy literature does not need another study with too few patients to support its likely exaggerated claims.

An argument can be made that it is unfair and even unethical to enroll patients in a psychotherapy RCT with an insufficient sample size. Some of the patients will be randomized to a control condition that is not what attracted them to the trial. All of the patients will be denied the experience of having been in a trial that makes a meaningful contribution to the literature and to better care for patients like themselves.

What should the clinical or health psychology PhD student do, besides maybe curb their enthusiasm? One opportunity to make meaningful contributions to the literature is to conduct small studies testing hypotheses that could lead to improvements in the feasibility or acceptability of treatments to be tested in studies with more resources.

Think of what would have been accomplished if PhD students had determined in modest studies that it is tough to recruit and retain patients in an Internet therapy study without some communication to the patients that they are involved in a human relationship – without them having what Pim Cuijpers calls supportive accountability. Patients may stay involved with an Internet treatment when it proves frustrating only because they have support from, and accountability to, someone beyond their encounter with an impersonal computer. Somewhere out there, there is a human being who supports them in sticking it out with the Internet psychotherapy and who will be disappointed if they don’t.

A lot of resources have been wasted in Internet therapy studies in which patients have not been convinced that what they’re doing is meaningful or that they have the support of a human being. They drop out or fail to diligently do any homework expected of them.

Similarly, mindfulness studies are routinely being conducted without anyone establishing that patients actually practice mindfulness in everyday life or what they would need in order to do so more consistently. The assumption is that patients assigned to the mindfulness condition diligently practice mindfulness daily. A PhD student could make a valuable contribution to the literature by examining the rates at which patients actually practice mindfulness when they have been assigned to it in a psychotherapy study, along with barriers to and facilitators of their doing so. A discovery that patients are not consistently practicing mindfulness might explain weaker findings than anticipated. One could even suggest that any apparent effects of practicing mindfulness were actually nonspecific: patients getting caught up in the enthusiasm of being offered a treatment they had sought, without actually practicing mindfulness.

An unintended example: How not to recruit cancer patients for a psychological intervention trial

Sometimes PhD students just can’t be dissuaded from undertaking an evaluation of a psychotherapy. I was a member of a PhD committee of a student who at least produced a valuable paper concerning how not to recruit cancer patients for a trial evaluating problem-solving therapy, even though the project fell far short of an adequately powered study.

The PhD student was aware that claims of the effectiveness of problem-solving therapy reported in the prestigious Journal of Consulting and Clinical Psychology were exaggerated. The developer of problem-solving therapy for cancer patients (and current JCCP Editor) claimed a huge effect size – 3.8 if only the patient were involved in treatment and an even better 4.4 if the patient had an opportunity to involve a relative or friend as well. Effect sizes from this trial have subsequently had to be excluded from meta-analyses as an extreme outlier (1, 2, 3, 4).
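One way to appreciate how extreme those claimed effect sizes are: under conventional assumptions (normally distributed outcomes with equal variances), a standardized effect size translates directly into the probability that a randomly chosen treated patient does better than a randomly chosen control. A short sketch, with d = 0.5 and d = 0.8 included as the usual "moderate" and "large" benchmarks for comparison:

```python
# Common-language effect size: P(randomly chosen treated patient beats a randomly
# chosen control) = Phi(d / sqrt(2)) under normality with equal variances.
from math import sqrt
from scipy.stats import norm

for d in (0.5, 0.8, 3.8, 4.4):
    print(f"d = {d}: treated beats control {norm.cdf(d / sqrt(2)):.1%} of the time")
# d = 0.5 and 0.8 give roughly 64% and 71%; d = 3.8 and 4.4 imply better than 99.6%,
# i.e., essentially no overlap between the treated and control distributions.
```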

The student adopted the much more conservative assumption that a moderate effect size of .6 would be obtained in comparison with a waitlist control. You can use G*Power to see that 50 patients would be needed per group, 60 if allowance is made for dropouts.

Such a basically inert control group, of course, has a greater likelihood of seeming to demonstrate that a treatment is effective than when the comparison is another active treatment. Of course, such a control group also does not allow a determination of whether it was the active ingredient of the treatment that made the difference, or just the attention, positive expectations, and support that were not available in the waitlist control group.

But PhD students should have the same option as their advisors to contribute another comparison between an active treatment and a waitlist control to the literature, even if it does not advance our knowledge of psychotherapy. They can take the same low road to a successful career that so many others have traveled.

This particular student was determined to make a different contribution to the literature. Notoriously, studies of psychotherapy with cancer patients often fail to recruit samples that are distressed enough to register any effect. The typical breast cancer patient, for instance, who seeks to enroll in a psychotherapy or support group trial does not have clinically significant distress. The prevalence of positive effects claimed for interventions with cancer patients in published studies likely reflects confirmation bias.

The student wanted to address this issue by limiting the patients whom she enrolled in the study to those with clinically significant distress. Enlisting colleagues, she set up screening of consecutive cancer patients in the oncology units of local hospitals. Patients were first screened for self-reported distress and, if they were distressed, whether they were interested in services. Those who met both criteria were then re-contacted to see if they would be willing to participate in a psychological intervention study, without the intervention being identified. As I reported in the previous blog post:

  • Combining results of the two screenings, 423 of 970 patients reported distress, of whom 215 patients indicated a need for services.
  • Only 36 (4% of 970) patients consented to trial participation.
  • We calculated that 27 patients needed to be screened to recruit a single patient, with 17 hours of time required for each patient recruited.
  • 41% (n = 87) of the 215 distressed patients who had screened positive for a need for services nonetheless indicated that they had no need for psychosocial services, mainly because they felt better or thought that their problems would disappear naturally.
  • Finally, 36 patients were eligible and willing to be randomized, representing 17% of 215 distressed patients with a need for services.
  • This represents 8% of all 423 distressed patients, and 4% of 970 screened patients.
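The yield figures in these bullets follow directly from the raw counts; for readers who want to check the arithmetic, a trivial sketch:

```python
# Recruitment funnel arithmetic from the counts reported above.
screened = 970      # consecutive patients screened
distressed = 423    # reported distress at the first screening
need = 215          # distressed and indicated a need for services
randomized = 36     # ultimately eligible and willing to be randomized

print(f"patients screened per patient recruited: {screened / randomized:.0f}")   # about 27
print(f"consented: {randomized / screened:.1%} of those screened")               # about 4%
print(f"randomized: {randomized / need:.1%} of those reporting a need")          # about 17%
print(f"randomized: {randomized / distressed:.1%} of all distressed patients")   # about 8.5%
```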

So, the PhD student’s heroic effort did not yield the sample size that she had anticipated. But she ended up making a valuable contribution to the literature that challenges some of the basic assumptions being made about cancer patients in psychotherapy research – that all or most are distressed. She also ended up producing some valuable evidence that the minority of cancer patients who report psychological distress are not necessarily interested in psychological interventions.

Fortunately, she had been prepared to collect systematic data about these research questions, not just scramble within a collapsing effort at a clinical trial.

Becoming a research parasite as an alternative to PhD students attempting an under-resourced study of their own

Psychotherapy trials represent an enormous investment of resources, not only in the public funding that is often provided for them, but in the time, inconvenience, and exposure to ineffective treatments experienced by patients who participate in the trials. Increasingly, funding agencies require that investigators who get money to do a psychotherapy study at some point make their data available for others to use. The 14 prestigious medical journals whose editors make up the International Committee of Medical Journal Editors (ICMJE) each published earlier in 2016 a declaration that:

there is an ethical obligation to responsibly share data generated by interventional clinical trials because participants have put themselves at risk.

These statements proposed that as a condition for publishing a clinical trial, investigators would be required to share with others appropriately de-identified data not later than six months after publication. Further, the statements proposed that investigators describe their plans for sharing data in the registration of trials.

Of course, a proposal is only exactly that, a proposal, and these requirements were intended to take effect only after the document was circulated and ratified. The incomplete and inconsistent adoption of previous proposals for registering trials in advance and for investigators making declarations of conflicts of interest does not encourage a lot of enthusiasm that we will see uniform implementation of this bold proposal anytime soon.

Some editors of medical journals are already expressing alarm over the prospect of data sharing becoming required. The editors of the New England Journal of Medicine were lambasted in social media for raising worries about “research parasites” exploiting the availability of data:

a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Richard Lehman’s Journal Review at The BMJ’s blog delivered a brilliantly sarcastic response to these concerns that concludes:

I think we need all the data parasites we can get, as well as symbionts and all sorts of other creatures which this ill-chosen metaphor can’t encompass. What this piece really shows, in my opinion, is how far the authors are from understanding and supporting the true opportunities of clinical data sharing.

However, lost in all the outrage that The New England Journal of Medicine editorial generated was a more conciliatory proposal at the end:

How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

The PLOS family of journals has gone on record as requiring that all data for papers published in their journals be publicly available without restriction. A February 24, 2014 announcement, PLOS’ New Data Policy: Public Access to Data, declared:

In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.

Many of us are aware of the difficulties in achieving this lofty goal. I am holding my breath and turning blue, waiting for some specific data.

The BMJ has expanded its previous requirements for data availability:

Loder E, Groves T. The BMJ requires data sharing on request for all trials. BMJ. 2015 May 7;350:h2373.

The movement to make data from clinical trials widely accessible has achieved enormous success, and it is now time for medical journals to play their part. From 1 July The BMJ will extend its requirements for data sharing to apply to all submitted clinical trials, not just those that test drugs or devices. The data transparency revolution is gathering pace.

I am no longer heading dissertation committees after one that I am currently supervising is completed. But if any PhD students asked my advice about a dissertation project concerning psychotherapy, I would strongly encourage them to enlist their advisor to identify and help them negotiate access to a data set appropriate to the research questions they want to investigate.

Most well-resourced psychotherapy trials have unpublished data concerning how they were implemented, with what biases, and with which patient groups ending up underrepresented or inadequately exposed to the intensity of treatment presumed to be needed for benefit. A story awaits telling. The data available from a published trial are usually much more adequate than any a graduate student could collect with the limited resources available for a dissertation project.

I look forward to the day when such data is put into a repository where anyone can access it.

In this blog post I have argued that PhD students should not take on responsibility for developing and testing a new psychotherapy for their dissertation project. I think that using data from existing published trials is a much better alternative. However, PhD students may currently find it difficult, though certainly not impossible, to get appropriate data sets. I certainly am not recruiting them to be front-line infantry in advancing the cause of routine data sharing. But they can make an effort to obtain such data, and they deserve all the support they can get from their dissertation committees in obtaining data sets and in recognizing realistically when data are not being made available, even when availability was promised as a condition of publication. Advisors, please request the data from published trials for your PhD students and protect them from the heartache of trying to collect such data themselves.


Hazards of pointing out bad meta-analyses of psychological interventions


A cautionary tale

Psychology has a meta-analysis problem. And that’s contributing to its reproducibility problem. Meta-analyses are wallpapering over many research weaknesses, instead of being used to systematically pinpoint them. – Hilda Bastian

  • Meta-analyses of psychological interventions are often unreliable because they depend on a small number of poor quality, underpowered studies.
  • It is surprisingly easy to screen the studies being assembled for a meta-analysis and quickly determine that the literature is not suitable because it does not have enough quality studies. Apparently, the authors of many published meta-analyses did not undertake such a brief assessment or were undeterred by it from proceeding anyway.
  • We can’t tell how many efforts at meta-analyses were abandoned because of the insufficiencies of the available literature. But we can readily see that many published meta-analyses offer summary effect sizes for interventions that can’t be expected to be valid or generalizable.
  • We are left with a glut of meta-analyses of psychological interventions that convey inflated estimates of the efficacy of interventions and on this basis, make unwarranted recommendations that broad classes of interventions are ready for dissemination.
  • Professional organizations and promoters of particular treatments have strong vested interests in portraying their psychological interventions as effective. They will use their resources to resist efforts to publish critiques of their published meta-analyses and even fight the teaching of basic critical skills for appraising meta-analysis.
  • Publication of thorough critiques has little or no impact on the subsequent citation or influence of meta-analyses. Furthermore, such critiques are largely ignored.
  • Debunking bad meta-analyses of psychological interventions can be frustrating at best, and, at worst, hazardous to careers.
  • You should engage in such activities if you feel it is right to do so. It will be a valuable learning experience. And you can only hope that someone at some point will take notice.

Three simple screening questions to decide whether a meta-analysis is worth delving into.

I’m sick and tired of spending time trying to make sense of meta-analyses of psychological interventions that should have been dismissed out of hand. The likelihood of any contribution to the literature was ruled out by repeated, gross misapplication of meta-analysis by some authors or, more often, by the pathetic quality and quantity of the literature available for meta-analysis.

Just recently, Retraction Watch reported the careful scrutiny of a pair of meta-analyses by two psychology graduate students, Paul-Christian Bürkner and Donald Williams. Coverage in Retraction Watch focused on their inability to get credit for the retraction of one of the papers that had occurred because of their critique.

But I was more saddened by their having spent so much time on the second meta-analysis, “A meta-analysis and theoretical critique of oxytocin and psychosis: Prospects for attachment and compassion in promoting recovery.” The authors of this meta-analysis had themselves acknowledged that the literature was quite deficient, but they proceeded anyway and published a paper that has already been cited 13 times.

The graduate students, as well as the original authors, could simply have taken a quick look at the study’s Table 1: the seven included studies had from 9 to 35 patients exposed to oxytocin. The study with 35 patients was an outlier. That study also provided only a within-subject effect size, which should not have been entered into the meta-analysis with the results of the other studies.
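Why does mixing in a within-subject effect size matter? A pre-post effect size is standardized by the standard deviation of change scores, which shrinks as the pre-post correlation grows, so the same treatment effect produces a larger-looking number than a between-groups d. A small sketch of the standard conversion makes this concrete; the d = 0.40 and r = .70 values are illustrative assumptions of mine, not figures from the oxytocin studies.

```python
# Illustration of why within-subject (pre-post) effect sizes should not be pooled
# with between-groups effect sizes. All numbers here are assumed for illustration.
import math

d_between = 0.40   # hypothetical between-groups standardized mean difference
r = 0.70           # assumed correlation between pre and post scores

# Standardizing the same mean difference by the SD of change scores gives
# d_z = d / sqrt(2 * (1 - r)), which grows as r increases.
d_within = d_between / math.sqrt(2 * (1 - r))
print(f"between-groups d: {d_between:.2f}")
print(f"within-subject d_z at r = .70: {d_within:.2f}")   # about 0.52, a ~30% inflation
```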

The six remaining studies had an average sample size of 14 in the intervention group. I doubt that anyone would expect a study of psychotic patients inhaling oxytocin with only 9, 10, or 11 patients to generate a robust estimate of effect size. It’s unclear why the original investigators stopped accruing patients when they did.

Without having specified their sample size ahead of time (there is no evidence that the investigators did), the original investigators could simply have stopped when a peek at the data revealed statistically significant findings, or they could have kept accruing patients when a peek revealed only nonsignificant findings. Or they could have dropped some patients. Regardless, the reported samples are so small that adding only one or two more patients could substantially change the results.
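The consequences of such peeking are easy to simulate. A hedged sketch, assuming a two-arm study with no true effect, interim tests after every five added patients per arm, and stopping at the first p < .05 (the schedule of looks and the maximum n are my own illustrative choices, not details from the included studies):

```python
# Simulation of "peek and stop": with no true effect, test repeatedly as patients
# accrue and stop at the first p < .05. The looks and maximum n are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims = 5000
looks = range(10, 36, 5)   # analyze at n = 10, 15, ..., 35 per arm

false_positives = 0
for _ in range(n_sims):
    a, b = rng.normal(size=35), rng.normal(size=35)   # two arms, no true difference
    if any(stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05 for n in looks):
        false_positives += 1

print(f"false positive rate with peeking: {false_positives / n_sims:.2f}")  # well above .05
```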

Furthermore, if the investigators were struggling to get enough patients, the study was probably under-resourced and compromised in other ways. Small sample sizes compound the problems posed by poor methodology and reporting. The authors conducting this particular meta-analysis could confirm for only one of the studies that data from all patients who were randomized were analyzed, i.e., that there was an intention-to-treat analysis. Reporting was that bad, and worse. Again, think of the effect of losing the data of one or a few patients from an analysis: it could be decisive for the results, particularly when the loss was not random.

Overall, the authors of the original meta-analysis conceded that the seven studies they were entering into the meta-analysis had a high risk of bias.

It should be apparent that authors cannot take a set of similarly flawed studies and integrate their effect sizes with a meta-analysis and expect to get around the limitations. Bottom line – readers should just dismiss the meta-analysis and get on to other things…

These well-meaning graduate students were wasting their time and talent carefully scrutinizing a pair of meta-analyses that were unworthy of their sustained attention. Think of what they could be doing more usefully. There is so much other bad science out there to uncover.

Everybody – I recommend not putting a lot of effort into analyzing obviously flawed meta-analyses, other than maybe posting a warning notice on PubMed Commons or ranting in a blog post or both.

Detecting Bad Meta-Analyses

Over a decade ago, I developed some quick assessment tools by which I can reliably determine that some meta-analyses are not worth our attention. You can see more about the quickly answered questions here.

To start such an assessment, go directly to the table describing the studies that were included in a published meta-analysis.

  1. Ask: “To what extent are the studies dominated by cell sample sizes less than 35?” Studies of this size have only a power of .50 to detect a moderate-sized effect. So, even if an effect were present, it would only be detected 50% of the time if all studies were reported. (A quick calculation illustrating this appears below.)
  2. Next, check to see whether whoever did the meta-analysis rated the included studies for risk of bias and how, if at all, risk of bias was taken into account in the meta-analysis.
  3. Finally, does the meta-analysis adequately deal with clinical heterogeneity of the included studies? Is there a basis for giving a meaningful interpretation to a single summary effect size?

Combining studies may be inappropriate for a variety of the following reasons: differences in patient eligibility criteria in the included trials, different interventions and outcomes, and other methodological differences or missing information.  Moher et al., 1998

I have found this quick exercise often reveals that meta-analyses of psychological interventions are dominated by underpowered studies of low methodological quality that produce positive effects for interventions at a greater rate than would be expected. There is little reason to proceed to calculate a summary effect size.
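As promised under question 1, here is the power calculation behind the n < 35 screening rule, scripted rather than run in G*Power; “moderate” is taken as d = 0.5, and a two-sided alpha of .05 is assumed.

```python
# Power of a two-arm comparison with 35 patients per cell to detect a moderate
# effect (d = 0.5), assuming a two-sided alpha of .05.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.5, nobs1=35, alpha=0.05,
                              ratio=1.0, alternative='two-sided')
print(f"{power:.2f}")   # roughly 0.5: a coin flip's chance of detecting a real moderate effect
```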

The potholed road from a presentation to a publication.

My colleagues and I applied these criteria in a 2008 presentation to a packed audience at the European Health Psychology Conference in Bath. My focus was a similar exercise with four meta-analyses of behavioral interventions for adults (Dixon, Keefe, Scipio, Perri, & Abernethy, 2007; Hoffman, Papas, Chatkoff, & Kerns, 2007; Irwin, Cole, & Nicassio, 2006; and Jacobsen, Donovan, Vadaparampil, & Small, 2007) that had appeared in a new section of Health Psychology, Evidence-Based Treatment Reviews.

A sampling of what we found:

Irwin et al. The Irwin et al. meta-analysis had the stated objective of

comparing responses in studies that exclusively enrolled persons who were 55 years of age or older versus outcomes in randomized controlled trials that enrolled adults who were, on average, younger than 55 years of age (p. 4).

A quick assessment revealed that exclusion of small trials (n < 35) would have eliminated all studies of older adults; five studies included 15 or fewer participants per condition. Of the 15 studies including younger adults, only one would have remained.

Hoffman et al. We found that 17 of the 22 included studies fell below n = 35 per group. In response to our request, the authors graciously shared a table of the methodological quality of the included studies.

In 60% of the studies, intervention and control groups were not comparable on key variables at baseline.

Less than half provided adequate information concerning number of patients enrolled, treatment drop-out and reasons for drop-outs.

Only 15% of trials provided intent-to-treat analyses.

In a number of studies, the psychological intervention was part of a multicomponent package, so that its unique contribution could not be determined. Often the psychological intervention was minimal. For instance, one study noted: “a lecture to give the patient an understanding that ordinary physical activity would not harm the disk and a recommendation to use the back and bend it.”

The only studies comparing a psychological intervention to an active control condition were three underpowered studies in which the effects of the psychological component could not be separated from the rest of the package in which it was embedded. In one of the studies, massage was the psychological intervention, but in another, it was the control condition.

Nonetheless, Hoffman et al. concluded: “The robust nature of these findings should encourage confidence among clinicians and researchers alike.”

As I readily demolished the meta-analyses to the delight of the audience, I remarked something to the effect that I was glad the editor of Health Psychology was not there to hear what I was saying about articles published in the journal he edits.

But Robert Kaplan was there. He invited me for a beer as I left the symposium. He said that such critical probing was sorely lacking in the journal and proposed that my colleagues and I submit an invited article. Eventually it would be published as:

Coyne JC, Thombs BD, Hagedoorn M. Ain’t necessarily so: Review and critique of recent meta-analyses of behavioral medicine interventions in health psychology. Health Psychology. 2010 Mar;29(2):107.

However, Kaplan first had an Associate Editor send out the manuscript for review. The manuscript was rejected based on a pair of reviews that were not particularly informative. One reviewer stated:

The authors level very serious accusations against fellow scientists and claim to have identified significant shortcomings in their published work. When this is done in public, the authors must have done their homework, dotted all the i’s, and crossed all the t’s. Instead, they reveal “we do not redo these meta-analyses or offer a comprehensive critique, but provide a preliminary evaluation of the adequacy of the conduct, reporting and clinical recommendations of these meta-analyses”. To be frank, this is just not enough when one accuses colleagues of mistakes, poor judgment, false inferences, incompetence, and perhaps worse.

In what he would later describe as the only time he did this in his term as editor of Health Psychology, Bob Kaplan overruled the unanimous recommendations of his associate editor and the two reviewers. He accepted a revision of our manuscript in which we tried to be clearer about the bases of our judgments.

According to Google Scholar, our “Ain’t necessarily so…” has been cited 53 times. Apparently it had little effect on the reception of the four meta-analyses. Hoffman et al. has been cited 599 times.

From a well-received workshop to a workshop canceled in order to celebrate a bad meta-analysis.

Mariët Hagedoorn and I gave a well-received workshop at the annual meeting of the Society of Behavioral Medicine (SBM) the next year. A member of SBM’s Evidence-Based Behavioral Medicine Committee invited us to the committee meeting held immediately after the workshop. We were invited to give the workshop again in two years. I also became a member of the committee. I offered to be involved in future meta-analyses, learning that a number were planned.

I actually thought that I was involved in a meta-analysis of interventions for depressive symptoms among cancer patients. I immediately identified a study of problem-solving therapy for cancer patients with such improbably large effect sizes that it should be excluded from any meta-analysis as an extreme outlier. The suggestion was appreciated.

But I heard nothing further about the meta-analysis until I was contacted by one of the authors, who said that my permission was needed for me to be acknowledged in the accepted manuscript. I refused. When I finally saw the published version of the manuscript in the prestigious Journal of the National Cancer Institute, I published a scathing critique, which you can read here. My critique has so far been cited once, the meta-analysis eighty times.

Only a couple of months before our workshop was scheduled to occur, I was told it had been canceled in order to clear the schedule for full press coverage of a new meta-analysis. I learned of this only when I emailed the committee concerning the specific timing of the workshop. The reply came from the first author of the new meta-analysis.

I have subsequently made the case, in two blog posts, that that meta-analysis was horribly done and horribly misleading to consumers:

Faux Evidence-Based Behavioral Medicine at Its Worst (Part I)

Faux Evidence-Based Behavioral Medicine Part 2

Some highlights:

The authors boasted of “robust findings” of “substantial rigor” in a meta-analysis that provided “strong evidence for psychosocial pain management approaches.” They claimed their findings supported the “systematic implementation” of these techniques.

The meta-analysis depended heavily on small trials. Of the 38 trials, 19 had fewer than 35 patients in the intervention or control group and so would be excluded by applying this criterion.

Some of the smaller trials were quite small. One had 7 patients receiving an education intervention; another had 10 patients getting hypnosis; another, 15 patients getting education; another, 15 patients getting self-hypnosis; and still another, 8 patients getting relaxation and 8 patients getting CBT plus relaxation.

Two of what were by far the largest trials should have been excluded because they involved complex interventions. Patients received telephone-based collaborative care, which had a number of components, including support for adherence to medication.

It appears that listening to music, being hypnotized during a medical procedure, and being taught self-hypnosis over 52 sessions all fall under the rubric of skills training. Similarly, interactive educational sessions are considered equivalent to passing out informational materials and simple pamphleteering.

But here’s what most annoyed me about clinical and policy decisions being made on the basis of this meta-analysis:

Perhaps most importantly from a cancer pain control perspective, there was no distinguishing of whether the cancer pain was procedural, acute, or chronic. These types of pain take very different management strategies. In preparation for surgery or radiation treatment, it might be appropriate to relax or hypnotize the patient or provide soothing music. The efficacy could be examined in a randomized trial. But the management of acute pain is quite different and best achieved with medication. Here is where the key gap exists between the known efficacy of medication and the poor control in the community, due to professional and particularly patient attitudes. Control of chronic pain, months after any painful procedures, is a whole different matter, and based on studies of noncancer pain, I would guess that here is another place for psychosocial intervention, but that should be established in randomized trials.

Getting shushed about the sad state of research on couples interventions for cancer patients

One of the psychologists present at the SBM meeting published a meta-analysis of couples interventions in which I was thanked for my input in an acknowledgment. I did not give permission, and this notice was subsequently retracted.

Ioana Cristea, Nilufer Kafescioglu, and I subsequently submitted a critique to Psycho-Oncology. We were initially told it would be accepted as a letter to the editor, but then it was subjected to an extraordinary six uninformative reviews and rejected. The article that we critiqued was given special status as a featured article and distributed free by the otherwise paywalled journal.

A version of our critique was relegated to a blog post.

The complicated politics of meta-analyses supported by professional organizations.

Starting with our “Ain’t necessarily so…” effort, we were taking aim at meta-analyses making broad, enthusiastic claims about the efficacy and readiness for dissemination of psychological interventions. The Society of Behavioral Medicine was enjoying a substantial increase in membership, but as in other associations dominated by psychologists, the new members were clinicians, not primarily academic researchers. SBM wanted to offer a branding of “evidence-based” to the psychological interventions for which the clinicians were seeking reimbursement. At the time, insurance companies were challenging whether licensed psychologists should be reimbursed for psychological interventions that were not administered to patients with psychiatric diagnoses.

People involved with the governance of SBM at the time cannot help but be aware of an ugly side to the politics back then. A small amount of money had been given by NCI to support meta-analyses and it was quite a struggle to control its distribution. That the SBM-sponsored meta-analyses were oddly published in the APA journal, Health Psychology, rather than SBM’s Annals of Behavioral Medicine reflected the bid for presidency of APA’s Division of Health Psychology by someone who had been told that she could not run for president of SBM. But worse, there was a lot of money and undeclared conflicts of interest in play.

Someone originally involved in the meta-analysis of interventions for depressive symptoms among cancer patients had received a $10 million grant from Pfizer to develop a means of monitoring cancer surgeons’ inquiring about psychological distress and their offering of interventions. The idea (which was actually later mandated) was that cancer surgeons could not close their electronic records until they had indicated that they had asked the patient about psychological distress. If the patient reported distress, the surgeons had to indicate what intervention was offered to the patient. Only then could they close the medical record. Of course, these requirements could be met simply by asking if a breast cancer patient was distressed and offering her an antidepressant without any formal diagnosis or follow-up. These procedures were mandated as part of the accreditation of facilities providing cancer care.

Psycho-Oncology, the journal with which we skirmished about the meta-analysis of couples interventions, was the official publication of the International Psycho-Oncology Society, another organization dominated by clinicians seeking reimbursement for services to cancer patients.

You can’t always get what you want.

I nonetheless encourage others, particularly early career investigators, to take up the tools that I offer. Please scrutinize meta-analyses that would otherwise have clinical and public policy recommendations attached to their findings. You may have trouble getting published, and you will be sorely disappointed if you expect to influence the reception of an already published meta-analysis. You can always post your critiques at PubMed Commons.

You will learn important skills, and you will learn about the politics of trying to publish critiques of papers that are protected as having been “peer reviewed.” If enough of you do this and visibly complain about how ineffectual your efforts have been, we may finally overcome the incumbent advantage and protection from further criticism that go with getting published.

And bloggers like myself and Hilda Bastian will recognize you and express appreciation.


Keeping zombie ideas about personality and health awalkin’: A teaching example

Reverse engineer my criticisms of this article and you will discover a strategy to turn your own null findings into a publishable paper.

Here’s a modest little study with null findings, at least before it got all gussied up for publication. It has no clear-cut clinical or public health implications. Yet it is valuable as a teaching example showing how such studies get published. That’s why I found it interesting enough to blog about at length.


van de Ven, M. O., Witteman, C. L., & Tiggelman, D. (2013). Effect of Type D personality on medication adherence in early adolescents with asthma. Journal of Psychosomatic Research, 75(6), 572-576. [Abstract available here and full text here]

As I often do, I am going to get quite critical in this blog post, maybe even making some readers wince. But if you hang in there, you will see some strategies for publishing negative results as if they were positive that are widely used throughout personality, health, and positive psychology. Your critical skills will be sharpened, but you will also be able to reverse engineer my criticisms to get papers with null findings published.

Read on and you’ll see things that the reviewers at the Journal of Psychosomatic Research apparently did not see, nor the editors, but should have. I have emailed the editors inviting them to join in this discussion, and I expect them to respond. I have had lots of dealings with them and actually find them to be quite reasonable fellows. But peer review is imperfect, and one of the good things about blogging is that I have the space to call it out when it fails us.

The study examined whether some measures of negative emotion predicted adherence in early adolescents with asthma. A measure of negative affectivity (sample item: “I often make a fuss about unimportant things”) and what was termed social inhibition (sample item “I would rather keep other people at a distance”) were examined separately and when combined in a categorical measure of Type D personality (the D in Type D stands for distress).

Type D personality studies were once flourishing, even getting coverage in Time and Newsweek and discussion by Dr. Oz. The claim was that a Type D personality predicted death among congestive heart failure patients so well that clinicians should begin screening for it. Type D was supposed to be a stable personality trait, so it was not clear what clinicians could do with the information from screening. But I will be discussing in a later blog post why the whole area of research can itself be declared dead because of fundamental, inescapable problems in the conception and measurement of Type D. When I do that, I will draw on an article co-authored with Niels de Voogd, “Are we witnessing the decline effect in the Type D personality literature?”

John Ioannidis provided an approving commentary on my paper with Niels, with the provocative title of “Scientific inbreeding and same-team replication: Type D personality as an example.” Among the ideas attributable to Ioannidis are that most positive findings are false, as well as that most “discoveries” are subsequently proven to be false or at least exaggerated. He calls for greater value being given to replication, rather than discovery.

Yet in his commentary on our paper, he uses the Type D personality literature as a case example of how the replication process can go awry. A false credibility for a hypothesis is created by false replications. He documented significant inbreeding among investigators of Type D personality: a quite small number of connected investigators are associated with studies with statistically improbable positive findings. And then he introduced some concepts that can be used to understand the processes by which this small group could have undue influence on replication attempts by others:

… Obedient replication, where investigators feel that the prevailing school of thought is so dominant that finding consistent results is perceived as a sign of being a good scientist and there is no room for dissenting results and objections; or obliged replication, where the proponents of the original theory are so strong in shaping the literature and controlling the publication venues that they can largely select and mold the results, wording, and interpretation of studies eventually published.

Ioannidis’ commentary also predicted that regardless of any merits, our arguments would be studiously ignored and even suppressed by proponents of Type D personality. Vested interests use the review process to do that with articles that are inconvenient and embarrassing. Reviewing manuscripts has its advantages in terms of controlling the content of what is ultimately published.

Don’t get me wrong. Niels and I really did not expect everyone to immediately stop doing Type D research just because we published this article. After all, a lot of data have already been collected. In Europe, where most Type D personality data get collected, PhD students are waiting to publish their Type D articles in order to complete their dissertations.

We were very open to having Type D personality researchers point out why we were wrong, very wrong, and even stupidly wrong. But that is not what we are seeing. Instead, it is as if our article never appeared, with little trace of it in terms of citations, even in, ah, the Journal of Psychosomatic Research, where our article and Ioannidis’ commentary appeared. According to ISI Web of Science, our article has been cited a whopping 6 times overall as of April 2014. And there have been lots more Type D studies published since our article first appeared.

Anyway, the authors of the study under discussion adopted what has become known as the “standardized method” (that means that they don’t have to justify it) for identifying “categorical” Type D personality. They took their two continuous measures of negative affectivity and social inhibition and split (dichotomized) them. They then crossed them, creating a four cell, 2 x 2 matrix.

[Chart 1: the two dichotomized scales crossed to form a four-cell, 2 x 2 matrix]

Next, they selected out the high/high quadrant for comparison with the three other groups combined as one.

[Chart 2: the high/high quadrant (Type D) versus the other three quadrants combined]

So, the authors made the “standardized” assumption that only the difference between a high/high group and everyone else was interesting. That means that persons who are low/low will be treated just the same as persons who are high in negative affectivity and low in social inhibition. Those who were low in negative affectivity but high in social inhibition are simply treated the same as those who are low on both variables. The authors apparently did not even bother to check– no one usually does– whether some of the people who were high in negative affectivity and low in social inhibition actually had higher scores on negative affectivity than those assigned to the high/high group.
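To make concrete what this “standardized method” does, and how easy the skipped check would have been, here is a minimal sketch in Python. It is my illustration, not the authors’ code: the simulated scores, the variable names, and the cutoff of 10 (the usual DS14 threshold) are all assumptions.

```python
# A minimal sketch of the "standardized" Type D classification: dichotomize
# two continuous scales at a cutoff, cross them, and compare the high/high
# quadrant against everyone else. Purely illustrative, on simulated data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "na": rng.normal(10, 5, n).clip(0, 28),  # negative affectivity (assumed 0-28 range)
    "si": rng.normal(10, 5, n).clip(0, 28),  # social inhibition
})

cutoff = 10  # assumed threshold applied to both scales
df["type_d"] = (df["na"] >= cutoff) & (df["si"] >= cutoff)  # high/high quadrant only

# The check the authors apparently skipped: do some high-NA / low-SI
# participants score higher on NA than some people labeled Type D?
non_d_high_na = df[(df["na"] >= cutoff) & (df["si"] < cutoff)]
print("max NA among non-Type D:", non_d_high_na["na"].max())
print("min NA among Type D:    ", df.loc[df["type_d"], "na"].min())
# If the first number exceeds the second (it typically does), the categorical
# split discards information that the continuous scores carry.
```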

I have been doing my own studies and reviews of personality and abnormal behavior for decades. I am not aware of any other example where personality types are created in which the high/high group is compared to everybody else lumped together. As we will see in a later blog, there are lots of reasons not to do this, but for Type D personality, it is the “standardized” method.

Adherence was measured twice in this study. At one point we readers are told that negative emotion variables were also assessed twice, but the second assessment never comes up again.

The abstract concludes that

categorical Type D personality predicts medication adherence of adolescents with asthma over time, [but] dimensional analyses suggest this is due to negative affectivity only, and not to the combination of negative affectivity and social inhibition.

Let’s see how Type D personality was made to look like a predictor and what was done wrong to achieve this.

[Table 2 from the article: bivariate correlations among the study variables]

Some interesting things about Table 2 that reviewers apparently missed:

  • At time T1, adherence was not related to negative affectivity, social inhibition, or Type D personality. There is not much prediction going on here.
  • At time T2, adherence was related to the earlier measured negative affectivity, but not to social inhibition or Type D personality.

Okay, if the authors were searching for significant associations, we have one, only one, here. But why should we ignore the failure of personality variables to predict adherence measured at the same time and concentrate on the prediction of later adherence? Basically, the authors have examined 2×3=6 associations, and seem to be getting ready to make a fuss about the one that proved significant, even though it was not predicted in advance to stand alone.

Most likely this statistical significance is due to chance – it certainly was not replicated in same-time assessments of negative affectivity and adherence at T1. But this association seems to be the only basis for claiming that one of these negative emotion variables is actually a predictor.
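A rough benchmark: if those six tests were independent and each run at the conventional .05 level, the chance of at least one of them coming up “significant” by luck alone would be 1 − (0.95)^6 ≈ 0.26, about one in four. The measures here are correlated rather than independent, so the exact figure will differ, but the point stands: a single unpredicted significant correlation out of six is weak evidence of anything.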

  • Adherence at time T2 is strongly predicted by adherence at time T1.

The authors apparently don’t consider this particularly interesting, but it is the strongest association in the data set. They want instead to predict change in adherence from T1 to T2 from trait negative emotion. But why should change in the relatively stable adherence be predicted by negative emotion when negative emotion does not predict adherence measured at the same time?

We need to keep in mind that these adolescents have been diagnosed with asthma for a while. They are being assessed for adherence at two arbitrary time points. There is no indication that something happened in between those points that might strongly affect their adherence. So, we are trying to predict fluctuations in a relatively stable adherence from a trait, not any upward or downward spiral.

Next, some things we are not told that might further change our opinions about what the authors say is going on in their study.

Like pulling a rabbit out of a hat, the authors suddenly tell us that they measured self-reported depressive symptoms. The introduction explains this article is about negative affectivity, social inhibition or Type D personality, but only mentions depression in passing. So, depression was never given the explanatory status that the authors give to these other three variables. Why not?

Readers should have been shown the correlation of depression with the other three negative emotion variables. We could expect from a large literature that the correlation is quite high, probably as high as their respective reliabilities allow—as good, or as bad as it gets.

There is no particular reason why this study could not have focused on depressive symptoms as predictors of later adherence, but maybe that story would not have been so interesting, in terms of results.

Actually, most of the explanations offered in the introduction as to why measures of negative emotion should be related to adherence would seem to apply to depression. Just go back to the explanations and substitute depression for whatever variable is being discussed. See, doesn’t depression work as well?

One of the problems in using measures of negative emotion to predict other things is that these measures are so highly correlated with each other that we can’t count on them to measure only the variable we are trying to emphasize and not something else.

Proponents of Type D personality like these authors want to assert that their favored variable does something that depression does not do in terms of predictions. But in actual data sets, it may prove tough to draw such distinctions because depressive symptoms are so highly correlated with components of Type D.

Some previous investigators of negative emotion have thrown up their hands in despair, complaining about the “crud factor” or “big mess” of intercorrelated measures of negative emotion ruining their ability to test their seemingly elegant ideas about supposedly distinctly different negative emotion variables. When one of the first Type D papers was published,   an insightful commentary complained that the concept was entering an already crowded field of negative emotion variables and asked whether we really needed another one.

In this study, the authors measured depressive symptoms with the self-report Hospital Anxiety and Depression Scale (HADS). The name of the scale suggests that it separately measures anxiety and depression. Who can argue with the authority of a scale’s name? But using a variety of simple and complicated statistical techniques, like different variants of factor analysis, investigators have not been able to show consistently that the separate subscales for anxiety and depression actually measure something different from each other – or that the two scales should not be combined into a general measure of negative emotion/distress.

So talk about measuring “depressive symptoms” with the HADS is wrong, or at least inaccurate. But there are a lot of HADS data sets out there, and so it would be inconvenient to acknowledge what we said in the title of another Journal of Psychosomatic Research article,

The Hospital Anxiety and Depression Scale (HADS) is dead, but like Elvis, there will still be citings.

Back to this article, if readers had gotten to see the basic correlations of depression with the other variables in Table 2, we might have seen how high the correlation of depression was with negative affectivity. This would have sent us off in a very different direction than the authors took.

To put my concerns in simple form,  data that are available to the authors but hidden from the readers’ view probably do not allow making the clean kind of distinctions that the authors would need to make if they are going to pursue their intended storyline.

Depressive symptoms are like the heel in rigged American wrestling matches, a foil for Type D personality.

Type D personality is the face, intended to win against depressive symptoms.

But, uh, measures of depressive symptoms show up all the time in studies of Type D personality. Think of such studies as if they were rigged American wrestling matches. Depressive symptoms are the heel (or rudo in lucha libre) who always shows up looking like a mean and threatening contender, but almost always loses to the face, Type D personality. Read on and find out how supposedly head-to-head comparisons are rigged so this dependably happens.

The authors eventually tell us that they assessed (1) asthma duration, (2) asthma control, and (3) asthma severity. But we were not allowed to examine whether any of these variables were related to the other variables in Table 2. So, we cannot see whether it is appropriate to consider them as “control variables” or, more accurately, confounds.

There is good reason to doubt that these asthma variables are suitable “control variables” or candidates for a confounding variable in predicting adherence.

First, for asthma control to serve as a “control variable” we must assume that it is not an effect of adherence. If it is, it makes no sense to try to eliminate asthma control’s influence on adherence with statistics. It sure seems logical that if these teenagers adhere well to what they are supposed to do to deal with their asthma, asthma control will be better.

Simply put, if we can reasonably suspect that asthma control is a daughter of adherence, we cannot keep treating it as if it is the mother that needs to be controlled in order to figure out what is going on. So there is first a theoretical or simple logical objection to treating asthma control as a “control” variable.

Second, authors are not free to simply designate whatever variables they would like as control variables and throw them into multiple regression equations to control a confound. This is done all the time in the published literature, but it is WRONG!

Rather, authors are supposed to check first and determine whether two conditions are met. The candidate variable should be significantly related to the predictor variables: in the case of this study, asthma control should be shown to be associated with one or all of the negative emotion variables. The authors would also have to show that it was related to subsequent adherence. If both conditions are not met, the variable should not be included as a control variable.
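For readers who want to see what such a check might look like in practice, here is a minimal sketch. It is my own illustration rather than anything from the article; the data are simulated and the column names are placeholders for the real study variables.

```python
# A minimal sketch of the two checks described above before entering a
# "control variable": (1) is it related to the putative predictor, and
# (2) is it related to the outcome? Simulated data stand in for the study.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(3)
n = 150
df = pd.DataFrame({
    "negative_affectivity_t1": rng.normal(size=n),
    "adherence_t2": rng.normal(size=n),
    "asthma_duration": rng.normal(size=n),
    "asthma_control": rng.normal(size=n),
    "asthma_severity": rng.normal(size=n),
})

for candidate in ["asthma_duration", "asthma_control", "asthma_severity"]:
    r_pred, p_pred = stats.pearsonr(df[candidate], df["negative_affectivity_t1"])
    r_out, p_out = stats.pearsonr(df[candidate], df["adherence_t2"])
    eligible = (p_pred < .05) and (p_out < .05)
    print(f"{candidate}: r with predictor = {r_pred:.2f} (p = {p_pred:.3f}), "
          f"r with outcome = {r_out:.2f} (p = {p_out:.3f}) -> "
          f"{'meets' if eligible else 'fails'} both conditions for a covariate")
```

These correlations are exactly what the authors could have reported in a sentence or a footnote, and what reviewers should have asked to see.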

Reviewers should have insisted on seeing these associations among asthma duration, control, severity, and adherence. While the reviewers were at it, they should have required that the correlations be available to other readers, if the article is to be published.

We need to move on. I am already taxing readers’ patience with what is becoming a long read. But if I have really got you hooked into thinking about the appropriateness of controlling for particular confounds, you can digress to a wonderful slide show telling more.

So far, we have examined a table of basic correlations, missing some things that we really need in order to decide what is going on here, and the authors’ storyline seems to be getting into trouble. But multivariate analyses will be brought in to save the effort.

The magic of misapplied multivariate regression.

The authors deftly save their storyline and get a publishable paper with “significant” findings in two brief paragraphs:

The decrease in adherence between T1 and T2 was predicted by categorical Type D personality (Table 3), and this remained significant after controlling for demographic and clinical information and for depressive symptoms. Adolescents with a Type D personality showed a larger decrease in adherence rates from T1 to T2 than adolescents without a Type D personality.

And

The results of testing the dimensions NA and SI separately as well as their interaction showed that there was a main effect of NA on changes in adherence over time (Table 4), and this remained significant after controlling for demographic and clinical information and for depressive symptoms. Higher scores on NA at T1 predicted a stronger decrease in adherence over time. Neither SI nor the interaction between NA and SI predicted changes in adherence.

Wow! But before we congratulate the authors and join in the celebration, we should note a few things. From now on in the article, they are going to be discussing their multivariate regressions, not the basically null findings obtained with the simple bivariate correlations. But these regression equations do not undo the basic findings from the bivariate correlations. Type D personality did not predict adherence; it only appears to do so in the context of some arbitrary and ill-chosen covariates. But now the authors can claim that Type D won the match fair and square, without cheating.

But don’t get down on these authors. They probably even believe in their results. They were merely following the strong precedent of what almost everybody else seems to do in the published literature. And they did not get caught by the reviewers or editors of the Journal of Psychosomatic Research.

Whatever happened to depressive symptoms as a contender for predicting adherence? They were not let into the ring until after Type D personality and its components had secured the match. These other variables got to do all the predicting they could do, and only then were depressive symptoms let into the ring. That is what happens when you have highly correlated variables and manipulate the match by picking which one goes first.

And there is a second trick guaranteeing that Type D will win over depressive symptoms. Recall that to be called Type D personality, research subjects had to be high on negative affectivity and high on social inhibition. Scoring high on two (imperfectly reliable) measures of negative emotion usually bests scoring high on only one (imperfectly reliable) measure. But if the authors had used two measures of depressive symptoms, they could have had a more even match.
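To see how much the order of entry matters when predictors are highly correlated, here is a small simulation. It is purely illustrative, with made-up effect sizes, and is not a reanalysis of the study; statsmodels is assumed to be available.

```python
# A minimal simulation of why the first variable entered in a hierarchical
# regression soaks up the variance shared with a highly correlated rival.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
latent = rng.normal(size=n)                              # shared negative emotion
type_d_score = latent + rng.normal(scale=0.7, size=n)    # stand-in for a Type D component
depression = latent + rng.normal(scale=0.7, size=n)      # stand-in for HADS depression
adherence = -0.3 * latent + rng.normal(size=n)           # outcome driven by the shared factor

def r2(*predictors):
    """R-squared from an OLS regression of adherence on the given predictors."""
    X = sm.add_constant(np.column_stack(predictors))
    return sm.OLS(adherence, X).fit().rsquared

print("Type D alone:                    ", round(r2(type_d_score), 3))
print("Depression alone:                ", round(r2(depression), 3))
print("Added by depression after Type D:",
      round(r2(type_d_score, depression) - r2(type_d_score), 3))
print("Added by Type D after depression:",
      round(r2(type_d_score, depression) - r2(depression), 3))
# Whichever correlated measure goes first gets credit for the shared variance;
# the late entrant appears to add nothing, even though alone it predicts about
# as well.
```

In a single model containing both predictors, the coefficients themselves do not depend on order; it is the “controlling for” story told about incremental variance that does.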

The big question: so what?

Type D personality is not so much a theory as a tried-and-true method for getting flawed analyses published. Look at what the authors of this paper said about it in the introduction and in their discussion. They really did not present a theory, but rather cited precedent and made some unsubstantiated speculations about why past results may have been obtained.

Any theory about Type D personality and adherence really does not make predictions with substantial clinical and public health implications. Think about it: if this study had worked out as the authors intended, what difference would it have made? Type D personality is supposedly a stable trait, and so the authors could not have proposed psychological interventions to change it. That has been done and does not work in other contexts.

What, then, could authors have proposed, other than that more research is needed? Should the mothers of these teenagers be warned that their adolescents had Type D personality and so might have trouble with their adherence? Why not just focus on the adherence problems, if they are actually there, and not get caught up in blaming the teens’ personality?

But Type D has been thung.

Because the authors have been saying in lots of articles that they have been studying Type D, it is tough to get heard saying “No, pal, you have been studying statistical mischief. Type D does not exist except as statistical mischief.” Type D has been thung, and who can undo that?

Thing (v). to thing, thinging.   1. To create an object by defining a boundary around some portion of reality separating it from everything else and then labeling that portion of reality with a name.

One of the greatest human skills is the ability to thing. We are thinging beings. We thing all the time.

And

Yes, yes, you might think, but we are not really “thinging.” After all trees, branches and leaves already existed before we named them. We are not creating things we are just labeling things that already exist. Ahhh…but that is the question. Did the things that we named exist before they were named? Or more precisely, in what sense did they exist before they were named, and how did their existence change after they were named?

…And confused part-whole relationships become science and publishable.

Once we have convincingly thung Type D personality, we can fool ourselves and convince others about there being a sharp distinction with the similarly thung “depressive symptoms.”

Boundaries between concepts are real because we make them so, just like the boundary between Canada and the United States, even if particular items are arbitrarily assigned to one questionnaire or the other. Without our thinging, we would not so easily forget that the various items come from the same “crud factor” or “big mess” and could have been lumped or split in other ways.

Part-whole relationships become entities interacting with entities in the most sciencey and publishable ways. See for instance

Romppel, et al. (2012). Type D personality and persistence of depressive symptoms in a German cohort of cardiac patients. Journal of Affective Disorders, 136(3), 1183-1187.

which compares the effectiveness of Type D as a screening tool with that of established measures of depressive symptoms, measured with the (ugh) HADS, for predicting subsequent HADS depression.

Lo and behold, Type D personality works and we have a screening measure on our hands! Aside from the other advantages that I noted for Type D as a predictor, the negative affectivity items going into the Type D categorization are phrased as if they refer to enduring characteristics, whereas items on the HADS are phrased to refer to the last week.

Let us get out of the mesmerizing realm of psychological assessment. Suppose we ask a question about whether someone ate meatballs last week or whether they generally eat meatballs. Which question would you guess better predicts meatball consumption over the next year?

And then there is

Michal, et al. (2011). Type D personality is independently associated with major psychosocial stressors and increased health care utilization in the general population. Journal of Affective Disorders, 134(1), 396-403.

Which finds in a sample of 2495 subjects that

Individuals with Type D had an increased risk for clinically significant depression, panic disorder, somatization and alcohol abuse. After adjustment for these mental disorders Type D was still robustly associated with all major psychosocial stressors. The strongest associations emerged for feelings of social isolation and for traumatic events. After comprehensive adjustment Type D still remained associated with increased help seeking behavior and utilization of health care, especially of mental health care.

The main limitation is the reliance on self-report measures and the lack of information about the medical history and clinical diagnosis of the participants.

Yup, the study relied on self-report questionnaires in multivariate analyses, not interview-based diagnoses, and its measure of “depression” or “depressive symptoms” asked about the last 2 weeks.

Keeping zombie ideas awalkin’

How did this study of negative emotion and adherence get published with basically null findings? With chutzpah, and by the authors following the formulaic Type D personality strategy for getting published. This study did not really obtain significant findings, but the precedent of many studies of Type D personality allowed the authors to claim a conceptual replication, even if not an empirical one. And these claims were very likely evaluated by members of the Type D community making similar claims. In his commentary, Ioannidis pointed to how null Type D findings are gussied up as having “approached significance” or, better, as being “independently related to blah, blah, when x, y, and z are controlled.”

Strong precedents are often confused with validity, and the availability of past claims relaxes the standards for making subsequent claims.

The authors were only doing what authors try to do: their damnedest to get their article published. Maybe the reviewers, coming from the Type D community and able to cite the authority of hundreds of studies, were only doing what that community tries to do: keep the cheering going for the power of Type D personality and add another study to the hundreds. But where were the editors of the Journal of Psychosomatic Research?

Just because the journal published our paper, for which we remain grateful, I do not assume that it will require authors who submit new papers to agree with us. But you would think that, if the editors are committed to the advancement of science, they would request that authors of manuscripts at least relate their findings to the existing conversation, particularly the one in the Journal of Psychosomatic Research. Authors should dispute our paper before going about their business. If that does not happen in this journal, how can we expect it to happen elsewhere?

 

Advice to Junior Academics on How to Get Involved With Twitter

I’m not a good role model for junior academics whom I encourage to get involved with Twitter. I have been experimenting with turning exchanges on Twitter or my Facebook wall into blog posts, which I increasingly turn into articles. When my articles are newly published, I promote them with the full range of social media. All this takes a considerable commitment of time.

It is too early to evaluate whether this is really worth it, but so far I find it quite satisfying. Yet, most novices would consider it an unacceptable investment of their time to try to follow what I do. Many are concerned about social media consuming too much time with uncertain payoffs.

So, I turned to a more junior colleague to offer that advice. She has been quite successful at getting involved in Twitter, obtaining its rewards, and not letting it consume the rest of her life. I gave her a series of questions to answer, and then invited her to provide some brief tips and tricks for junior people. Looking over her responses, I’m impressed by how solid and useful the advice is.

Gozde Ozakinci, PhD, is a lecturer in health psychology at the University of St Andrews, Scotland. She obtained her BA in Psychology at Bogazici University, Istanbul, her M.Sc in Health Psychology at University College London, and her PhD at Rutgers-The State University of New Jersey, USA. Her main research interests are in emotional regulation and health behaviour change. She works with a diverse group of clinical and non-clinical populations, from cancer patients to medical students. She also teaches behavioural sciences to undergraduate medical students and health psychology topics to M.Sc health psychology students. When not on Twitter, she can be found doing DIY around the house, consuming coffee (preferably Turkish) and enjoying walks in Scotland (preferably not in the rain). More information about her research can be found here. Twitter: @gozde786

So, how did you get past the idea that Twitter is a waste of time?

I was reluctant to get involved with Twitter, thinking it was the same as Facebook, which I use mostly to keep in touch with family and friends. I thought I didn’t need another potentially time-sucking social media outlet. But I quickly realized Twitter is very different – something I can get much out of professionally. I dip in and out during the day and each time I come away with a nugget of information that I find useful. I feel that with Twitter, my academic world expanded to include many colleagues I wouldn’t otherwise meet. I am now able to keep my finger on the academic pulse better. The information shared on Twitter is so much more current than what you would find in journals or at conferences.

For instance, academics I follow post their latest articles on Twitter, articles that would otherwise probably take me months to learn about. I can then ask questions of the authors themselves and chat with them. I think we all love to talk about our work! The blog posts I find through Twitter make me feel connected to my colleagues and to the current issues that face us, and they let me take part in conversations that matter to me, from evaluating evidence to more general issues in higher education.

How did you take the plunge and get started on Twitter?

I got hooked on Twitter right away, when I realised that I could get access to information that I would have heard either too late or sometimes never. It was like suddenly my academic daily life became a lot bigger. I could interact with many more colleagues from all over the world on a daily basis, rather than just the people in the office or collaborators over email/meetings.

Importantly, I didn’t get discouraged when people didn’t follow me back. If I really wanted people who didn’t follow me back to comment or pay attention to something I wanted to have a conversation on, then I just tagged them in my tweet. The day that Clare Gerada, the past president of the Royal College of General Practitioners, followed me back and commented that we had common research interests was a good day!

The other thing that helped me is that I have broad academic interests, so I follow people from different backgrounds and tweet about various topics, from cancer to politics. So, I’m not restricted to my own area at all. That means that many people can find something of interest in what I put out there. I think this is important.

Did you start with a clear goal?

I guess in the beginning, I didn’t have clear goals but they developed over time in a natural way:

  1. Wanting to be a part of a conversation on academic topics rather than watching people I admire from the sidelines.
  2. Being a source of rigorous evidence on a variety of topics and encouraging discussion (not sure how much I manage the discussion part).
  3. Being a source of encouragement/support for early career scientists (I even got invited to a talk at another university on my health psychology career because of colleagues I met on Twitter!).

How did you get your initial selection of people to follow? 

I started by checking out who followed whom. Like I checked out your list! I was surprised to see how many people that I wanted to get to know academically were on Twitter. Some of them were leaders in their field. I also started following editors of journals, journals themselves, and bloggers in science communication in general (Dean Burnett, Suzi Gage, etc.). I also found a wonderful group of women scientists who blogged and tweeted: Athene Donald, Dorothy Bishop and Uta Frith, for instance. They became role models of sorts to me. They were good scientists who cared about women in science, not because we were women but because we did what we did well. That was very empowering to me. They also found the time to tweet and write blog posts, showing me what an important tool we have in modern communication.

I also follow major sources of news such as the NY Times, National Public Radio and Slate that I feel many of my followers don’t follow. So if I tweet something from there, it attracts their attention, as it’s a source they wouldn’t normally hear from.

Was there some trial and error for you? Moments of doubt whether it was worth it?

It was VERY slow the first 6 months to get followers and at times for no apparent reason that I could fathom, there would be periods of losing 4-5 followers in a row and stagnation. I still get that and I can’t figure out why.

I found that daily engagement with Twitter is necessary. It’s not difficult for me as it makes me feel connected to the wider academic world. But you can’t take a holiday from Twitter for a month and hope that people will still be interested in following you or you’ll find new followers upon your return.

You might ask ‘why should I care about having followers? Isn’t it all a bit vain?’. Well, I see it as having something to say and sharing it with others. I tried not to get obsessed about number of followers in the beginning (although it was hard!) as I soon realized that with daily tweets/conversations and retweets, people started to follow me anyway. But I guess, the message would be ‘don’t give up and keep tweeting and following people you’re interested in’.

Can you provide junior persons some tips and tricks for getting involved with Twitter?

Don’t just get a twitter account. USE IT! You have to engage with it before it starts to pay off. Don’t worry about how many people follow you. It takes time to establish a critical mass of followers and also a certain level of engagement with other people. Don’t give up. And don’t be shy. Think about Twitter as another dissemination tool. We are in science because we do something valuable and we need to share that knowledge.

You don’t know who to follow? Everybody knows someone on Twitter, so search for them. Once you’ve found them, start looking at their followers.

Start following those who interest you. And don’t be afraid of unfollowing them if you don’t find their tweets interesting. And don’t be discouraged if they don’t follow you back. I follow almost double the number of people I have as followers. This doesn’t bother me as I get fed by their tweets.

Initiate a conversation. If you think you have something interesting to say to a person you follow but who doesn’t follow you back, just tag their handle and you may get them to engage in a conversation with you.

Keep in mind that social media has been rightfully called a great equalizer. So it doesn’t matter at what stage of your career you’re at. You can have a conversation with people you admire and also with people at the other end of the world whom you’ve never met.

If you find something interesting that you want to share, make sure you use the hashtag associated with it. Add your own comment to your retweets. I used to be shy about doing that, but it adds another dimension to the communication you want to initiate, rather than just a simple retweet.

Tweet at conferences using the conference hashtag. It’s a great way of meeting people as they will pick up your tweets and you theirs. It brings an engagement with the conference that I found very refreshing.

Start reading the blogs of people who advertise them on Twitter. This is a good strategy for getting at a researcher’s thinking at the time.

Personal versus professional use. I use Twitter mainly for keeping on top of my field, but I also tweet about my personal interests (about 20% of the time). It’s a balance you have to find. But people usually don’t want to hear all your inane thoughts.

Follow Gozde @gozde786 and Jim @CoyneoftheRealm on Twitter. Think about our differences in strategy. Check out differences in whom we follow and who follows us. Freely take suggestions for whom you should follow from our lists. Compare our tweets. What differences are apparent in what we are trying to accomplish? What is best for you? Join in by favoriting or replying to our tweets. Feel free to leave comments about this blog and your experience with Twitter below.

Junior researchers face a choice: a high or low road to success?

November 8, 2013.

 

This is a presentation from the International Psycho-Oncology Society Conference in Rotterdam, November 8, 2013, invited by the Early Career Professionals Special Interest Group.* I am grateful for such a relaxed opportunity to speak my mind about some issues that junior researchers in psycho-oncology, like those in many fields, are facing. Senior members of the field have failed you. We need you to undo some of the damage that is being done.

As you enter the field, recognize that you are different from cohorts of researchers who have come before you. On the one hand, you are more methodologically and statistically sophisticated. You are also more digitally savvy, although I am sometimes bewildered by how little you as yet take advantage of the resources of the Internet and social media.

On the other hand, you face new accountability and pressures in terms of the monitoring of the impact factor of journals in which you publish, as well as having to adhere to reporting standards and preregister your clinical trials before you even run the first patient. Researchers who came before you had it easier in these respects.

I’ve done this kind of talk before, and I recognize there is an expected obsolescence to what I present. I recall, way back when I was a junior person in the field, senior faculty warned me not to start using email because it was a total waste of time and inferior to communicating by snail mail. I am sure that much of the advice being offered to you is just as valuable and soon to be obsolete. And, similarly, many of the tools and strategies you will need to acquire will at first seem a waste of time.

Five years ago, I would have encouraged you to get more comfortable communicating about your work and even self-promoting. I would have suggested you use the now-obsolete means of listservs to do so. I would have encouraged you to challenge the gross inadequacies of peer review by writing letters to the editor, which also have the advantage of cultivating critical skills better than journal clubs do. Both listservs and letters to the editor are now obsolete, but the ideas behind these recommendations still hold, maybe even more strongly. You just have to pursue these goals differently and certainly with different tools.

As for myself, I’ve undergone a lot of changes in the past five years. Some of my best recent papers have been written with authors gathered from the Internet, often without my first meeting all of them. I was honored that one of these papers won the Cochrane Collaboration’s Bill Silverman Prize, which I guess makes my co-authors and me certified disruptive innovators.

I now tweet, blog, use Facebook, and champion open access publishing. Later in this talk, I will provide the exciting details of the launch of a trial of PubMed Commons. I had been afraid of having to observe an embargo on discussing this. But fortunately the shutdown of the US federal government ended, and PubMed Commons was launched just in time for me to talk about it in this presentation.

Tweeting and blogging are not distractions or alternatives to writing peer-reviewed papers, they can become the means of doing so. Tweets may grow into blog posts, then a series of blog posts, and eventually even a peer-reviewed journal article. No guarantees, but looking back, that’s how a number of my peer-reviewed papers have developed.

On the other hand, the process can work in reverse. Blogging and tweeting about recent and forthcoming papers is a very important part of how to be a scholar and how to promote yourself in the current digital moment.

Here are some examples of me self-consciously and experimentally promoting recent papers with blogging.

My first bit of advice to junior investigators is to figure out where such action is occurring. The form and format it takes is constantly shifting. Observe, experiment, and get involved, consistent with your own comfort level. Remain lurking if you’d like, reading blogs and occasionally expressing approval by clicking “like” or “favorite,” until you are ready to get more involved.

I invite all of you to join me in participating in disruptive innovation. On the other hand, I realize this is not for everyone, and so I will spell out an alternative low road.

The state of the field, being what it is, offers clear opportunities for you to conform and play the game according to the rules that work. Many of you will do so, and some of you can rise to the top of a mediocre field.

My second bit of advice is that if everyone likes your work, you can be certain that you are not doing anything important. That sage advice I got from Andrew Oswald.

The behavioral and social sciences are a mess. We have four or five times the rate of positive findings relative to some of the hard sciences, and I don’t think it is because our theories and methods are more advanced.

The field of psycho-oncology is a particular mess, as seen in rampant confirmation bias and in many of our widely acclaimed papers presenting evidence that is interpreted in ways that are exaggerated or outright false. The bulk of intervention studies in psycho-oncology are underpowered, and the flaws in their designs carry a high risk of bias. Studies consistently obtain significant results at an impressive, but statistically improbable, rate.

At the heart of the special problems of the field is the consistent subordination of a commitment to evidence-based science to the vested interests of those who want to promote and secure opportunities for the clinical services of their professions, regardless of what the evidence suggests. This is most notably seen in the relentless promotion of screening for distress in the absence of evidence that it actually improves patient outcomes.

Ultimately, data will provide the basis for deciding whether screening for distress is a cost-effective way of improving patient outcomes and whether it represents the best use of scarce resources. I think that the evidence will be negative. But I am more worried about the lasting effects on the credibility and integrity of a field in which editing and peer review have been so distorted by the felt need to demonstrate that screening has benefit.

Many celebrated findings in the field of psycho-oncology are really null findings, if you carefully look at them.

These include

  • Spiegel, D., Kraemer, H., Bloom, J., & Gottheil, E. (1989). Effect of psychosocial treatment on survival of patients with metastatic breast cancer. The Lancet, 334(8668), 888-891.
  • Fawzy, F. I., Fawzy, N. W., Hyun, C. S., Elashoff, R., Guthrie, D., Fahey, J. L., & Morton, D. L. (1993). Malignant melanoma: effects of an early structured psychiatric intervention, coping, and affective state on recurrence and survival 6 years later. Archives of General Psychiatry, 50(9), 681.
  • Antoni, M. H., Lehman, J. M., Klibourn, K. M., Boyers, A. E., Culver, J. L., Alferi, S. M., … & Carver, C. S. (2001). Cognitive-behavioral stress management intervention decreases the prevalence of depression and enhances benefit finding among women under treatment for early-stage breast cancer. Health Psychology, 20(1), 20.
  • Andersen, B. L., Yang, H. C., Farrar, W. B., Golden‐Kreutz, D. M., Emery, C. F., Thornton, L. M., … & Carson, W. E. (2008). Psychologic intervention improves survival for breast cancer patients. Cancer, 113(12), 3450-3458.

There are important negative trials of supportive expressive therapy and expressive writing being kept hidden in file drawers. Just search clinicaltrials.gov.

Zombie ideas and tooth fairy science still hold sway in the literature and win media attention. I have in mind the notion that psychological interventions can extend the lives of cancer patients and exaggerated ideas about the mind holding sway over the body and defeating cancer.

You are entering a system of publication and awards that is not working fairly. Papers appear in ostensibly peer reviewed journals without adequate review. There’s rampant cronyism in opportunities to publish and widespread sweetheart deals as to whether authors have to address concerns raised by reviewers. There is sandbagging of critics and negative findings. It is an embarrassment to the field that authors of flawed ideas are able to suppress commentary on their work and censor criticism.

The respected, high impact Journal of Clinical Oncology is particularly bad when it comes to psychosocial studies. It shows consistently flawed peer review, the influence of sloppy editorial oversight, and serious restrictions on commenting on its miscarriages of the review process. Feeble post publication peer review is continually handicapped and silenced. I believe that journal has an ethical responsibility to identify to its readers which articles have evaded peer review and to announce that authors of published papers can exercise a veto over any criticism or negative commentary.

If you want to take the low road, you have lots of opportunities to succeed.

  • Pick a trendy topic.
  • Don’t be critical of the dominant views, even if you see through the hype and hokum.
  • Use biological measures, particularly ones that can be derived from saliva, even if they have no or unknown clinical significance.
  • Report positive findings, even if you have to spin and torture and suppress data.
  • No matter what your results, in your discussion section claim they confirm the dominant view and reaffirm that view, even if it is irrelevant or contradicted by your findings.

When you design studies, have lots of endpoints that you can always ignore later. Pick the one to report that makes your study look best. A lot of the positive findings in the literature cannot really be replicated, but you can always appear to do so by pushing aside the results of primary analyses and favoring unplanned secondary and subgroup analyses. If necessary, construct some post hoc new outcome measures you didn’t even envision when you originally designed your study. Prominent examples of these strategies can readily be found in the published literature.
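The arithmetic behind that advice is easy to check with a simulation. The sketch below is purely illustrative, with made-up numbers and no connection to any real trial: two arms, ten endpoints, no true effect anywhere, and the “best” endpoint reported.

```python
# A minimal simulation of the "many endpoints" strategy: with 10 outcome
# measures and no true effect at all, how often does cherry-picking the
# best-looking endpoint yield p < .05?
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_per_arm, n_endpoints, n_trials = 50, 10, 2000

wins = 0
for _ in range(n_trials):
    control = rng.normal(size=(n_per_arm, n_endpoints))
    treated = rng.normal(size=(n_per_arm, n_endpoints))  # no true effect anywhere
    pvals = [stats.ttest_ind(treated[:, j], control[:, j]).pvalue
             for j in range(n_endpoints)]
    wins += min(pvals) < 0.05  # report only the most flattering endpoint

print(f"'Successful' null trials: {wins / n_trials:.0%}")  # roughly 40%
```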

Many of you will do all this, wittingly or unwittingly following the advice and example of your advisors, but you can become more proficient in pursuing this low road.

Alternatively, I invite at least some of you to take the high road and join me and participate in disruptive innovation. Again, it’s not for everyone.

Blog, and if you’re not ready to consistently post your own, join in a group blog. I highly recommend groups like Mental Elf, where you can take turns offering critical commentary on recently published papers.

If you are not ready to blog, you can tweet. You can selectively follow those on Twitter who show they can offer you both fresh new ideas with which you would not otherwise come into contact, as well as a filtering out of much that is hype, hokum and sheer nonsense.

Now I can announce the PubMed Commons revolution is upon us. Here are some links that explain what it is and how it works.

As long as you have a paper published in PubMed, even a letter to the editor, you can secure an invitation to comment on any article that has appeared in PubMed. You can have others “like” or add a response to your comment, and you to theirs as part of a continuing process of post publication peer review. With PubMed Commons, we’re taking post publication peer review out of the hands of editors who so often have aggressively and vainly taken control of a process that should be left with readers.

I’m asking you to join with me in pursuing a larger goal of creating a literature that is an honest and reliable guide for other researchers, clinicians, patients, the media, and policymakers as to the best evidence. Let’s work together to create a system where the review process is transparent and persists for the useful life of a work. For this last point, I give thanks to Michael Eisen, cofounder of PLOS and disruptive innovator extraordinaire.

*Special thanks to

Claire Wakefield and Michelle Peate, University of NSW, Sydney, Australia

Kirsten Douma and Inge Henselmans, Academic Medical Center Amsterdam, The Netherlands

Wendy Lichtenthal, Memorial Sloan-Kettering Cancer Center, New York City

(CC-BY-NC-SA)