Failure to replicate as an opportunity for learning

I am currently reading Kuhn’s “The Structure of Scientific Revolutions” and often find myself travelling back in time to a family dinner at home.

Many years ago, while still an undergrad in Argentina, I returned home from a tiring day in the lab complaining that the experiment ‘hadn’t worked’. My dad looked at me dismissively and said:

“The experiment worked, you just don’t know what variables you didn’t control”.

My dad is not a scientist, but he is an avid reader of popular science books. He had studied chemistry for a couple of years before leaving university to start his own business, so it was hard not to attack him with my mashed potatoes. But this was probably the most important lesson I learned in my entire science career:

Failure to replicate exposes those unknown/unthought-of variables that determine the result of a given experiment.

As a PhD student later on, I was expected to replicate previous findings before moving on with my own work that built on them. In many cases I replicated successfully; in others I didn't. I even had to replicate work from within the lab. When I failed, we uncovered nuances in how each of us was actually 'doing' the work. In some cases, replicability came down to those nuances that were not written down in the lab protocols and recipes. But in all cases we learned something from those failures.

I expect the same from myself and my students, though they (and many of my colleagues) consider re-doing what has already been done a waste of time. I don't. And here is why:

Let's say that someone has described the expression pattern of a protein in the brain of species A and I want to see whether the expression pattern is the same in the brain of species B. I follow the same protocol and find a difference between the two species. Now, how do I decide whether that difference is a species-specific difference or the result of something I am doing differently from the original authors that I did not account for? Well, the only way of knowing is by trying to replicate the original findings in species A. If I can replicate them, then I can more confidently argue that it is a species-specific difference (at least with respect to that specific protocol). If I can't, then I have further defined the boundaries within which those original findings are valid. Win-win.

by Sam UL cc-by-nc-sa on flickr

This brings up another reaction to the results of experiments: how hard do we work at trying to get an experiment to 'work' when we expect it won't? For example: if I expect (based on the published literature) that a protein is not expressed in a particular brain region, I may quite quickly accept a negative result. But if I did not have this pre-knowledge, I might make many different attempts before I am convinced it is not there. So the readiness with which we accept a negative or positive result is influenced by that pre-knowledge. But how deeply do we go into that published literature to examine how well justified that "pre-knowledge" is? As I go through the literature, I often find manuscripts making claims where I would like to see how the inter-lab or inter-individual variability has been accounted for, or at least considered, or what it took to accept a positive or negative result.

Every now and then someone in my lab can't replicate my own results. I welcome that. In the majority of cases, we can easily identify what the variable is; in others we uncover something we had not thought of that may be influencing our results. After all, one can only control the variables that one thinks of a priori. How is one to control for variables one does not think of? Well, those become obvious when someone fails to replicate.

So why are scientists so reactive to these failures to replicate? After all, it is quite likely that the group failing to replicate also did not think of those variables until they got their results. A few months ago PLOS, FigShare and Science Exchange launched the Reproducibility Initiative, which, as they say, will help correct the literature out there, but I think it will also better define the conditions that make an experiment work one way or another.

So, back to my dad. All experiments work, even those that give us an unexpected result. What I learned from my dad is that being a good scientist is not about dismissing "bad experiments" and discarding the results, but about looking deeper into what variables might have led to a different result. In many cases, it might be a bad chemical batch; in others it might uncover a crucial variable that defines the boundaries of validity of a result.

I call that progress.

Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago: The University of Chicago Press.
*** Update 16/11/12: I noticed that I had mistakenly used an image with full copyright, despite having narrowed my original search to CC-licenced content. I apologise for this oversight. I have now removed the original image and replaced it with one with a suitable licence.


5 thoughts on “Failure to replicate as an opportunity for learning”

  1. Hi Fabiana. What you call “pre-knowledge” is pretty close to what many just call “hypotheses”. Though hypotheses also imply a certain amount of logical inference, not just knowledge itself, and a lot of neuro journals don’t seem to care that much whether you REALLY DID hypothesize your findings.

    Anyway, as you say, we (at least in psychology and neuroscience) tend to stop looking when hypotheses are borne out. I once saw a cartoon in which the supervisor says to the RA or grad student: “Quick! Let’s publish it before it changes again!” And, as you say, if our predictions are not borne out, we keep looking. We try different types of analyses, hoping that SOMETHING will work.

    We all know this violates the Bonferroni principle: taking many tries increases the risk of false positives — yet most of us do it anyway, at least to some degree, mainly because we want to publish our findings. You point to a very important alternative reason for avoiding these “fishing” trips: because they tend to paint over important differences that were never predicted or could not have been predicted in the first place. Seems that in the physical sciences, most new knowledge comes as a surprise. Who would have predicted quantum mechanics?!

    Now if only one could publish non-replication results as easily in psychology and neuroscience journals! That would make for quite a difference in how we go about our analyses.


    1. Hi Marc, thanks for the comment –
      I was thinking of preknowledge as those “facts we hold to be” whatever those are: Neurons in region A of the brain express glutamate receptors. Not necessarily a hypothesis.

      ” if our predictions are not borne out, we keep looking. We try different types of analyses, hoping that SOMETHING will work.”

I think this is the thing that I learned – when I don't get what I expected, things still worked. The challenge is to figure out what that pesky variable is. So for me it is not about trying the same thing over and over (though yes, repeating enough times to be confident in a negative result) but about searching for that pesky variable (if it is justifiable). But I agree, there needs to be more room for the publication of replication and negative results.

I can also think of several examples of serendipitous findings in neuroscience, though not at the level of quantum mechanics (I just submitted one for publication last week), and also of findings that got buried because there was no technology at the time to make any sense of them. Few things are as fun as browsing through late-1800s and early-1900s literature to find the topic for a new grant!


  2. Interestingly, a student I co-supervise came to my office today, puzzled because her data were quite different from what she expected based on what was published. We traced back her steps – her numbers looked really odd to me, outside of what I would expect for a biological system. Anyway, it turns out there is a glitch in her (commercial) software which had not come up when she did her initial setup/calibration (and it is a bit of an odd one, too). Makes me wonder how widespread this glitch is, and whether other researchers have (or have not!) noticed it before they published!


    1. But that anecdote shows exactly why editors prefer to overlook non-hypothesized findings, doesn’t it? I mean, they have a point. Who really cares if the software has a glitch. Yes, it will be a pain, and people will need to reanalyze things. But we’re not going to find anything astonishing that way.

      Error, sloppiness, mess, randomness…..these are often why our hypotheses (or foreknowledge) don’t get borne out in the data. But it would be damn difficult to identify criteria for really meaningful non-hypothesized findings!

