I get emails.
Many of those emails are for boner pills and Hot Singles who live in the city where my VPN is set up.
Others are more interesting.
They come from other scientists or journalists who wish to know if a paper (attached) is fraudulent or problematic, and if a rack of criticisms of that paper (also attached) is fair.
The problem with making this assessment is straightforward: not all errors are created equal.
My responses to these emails are always similar, so to save myself time, I have edited two of the most recent ones into a single public document.
Given the recent fuss at Harvard Business School, it is also a document that you, too, can enjoy. See if you can follow along with that fuss using the distinctions outlined below.
Accept my calumny, you rats.
*************************************************
"So, is this fraud or what?"
When I’m sent a paper like this, there are three domains of evidence we must step through. I think of them as buckets that fill, overflow, and then fill another, and so on. The potential contents of the buckets are, of course, bullshit.
Here is the bucket sequence:
* THE ANOMALY BUCKET – obviously, whether or not there are anomalous elements to the data presented. Options are many. Some numbers may be added up incorrectly. Some statistics are impossible. Some data is collected according to an impossible protocol. And so it goes. Alongside the accuracy of the paper itself, we can include its circumstances, its provided data, its implications, etc. This leads to…
* THE MANIPULATION BUCKET – having filled the Anomaly bucket, we must now look for evidence, direct or indirect, that these anomalies are intentional. Sometimes this is easy - for instance, impossible data in Excel is simply anomalous, but leaving in the formulas which generate the impossible data is obvious manipulation. Don’t laugh, this has happened. It’s usually not easy, though. This leads to…
* THE CIRCUMSTANCE BUCKET – if the Manipulation bucket has overflowed, the final bucket is determining why. This is a very subjective bucket, because we must speculate about intent - if the anomalies exist and show evidence of manipulation, why are they the way they are? What can we glean about the motivations of the bullshitter?
As might be implied, these occur in order.
Anomalies are common, and many, maybe even most, are simply some combination of screw-ups, overconfidence, goofs, etc.
Let’s say that in a paper with four different labs and forty images, an image ends up in the wrong box. Let’s say Figure 17C was actually a repeat of Figure 4A. Or it was a repeat of Figure 12 of the same workgroup’s paper from 12 months ago.
These are inevitable. Scientists have poor data hygiene in general. However, there is a gulf between ‘just an anomaly’ and ‘manipulation causing an anomaly’, and often we cannot cross it. Is the p-value described as p=0.01 in the abstract and later as p=0.10 in the text because the researcher is sloppy, or because someone changed it on purpose? Usually, we cannot tell. Sometimes, though, we can: say Figure 17C is actually a zoomed-in, flipped, and horizontally distorted section of Figure 4A. This is very unlikely to happen by accident, and vanishingly unlikely to happen multiple times by accident. To continue the example, let’s say we find the same in Figures 17A and 17B *as well as* 17C.
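To make ‘vanishingly unlikely by accident’ a bit more concrete: here is a minimal sketch of how an allegation like this can be checked, assuming the two panels are available as image files. The file names, the crop box, and the flip-then-stretch transformation below are hypothetical placeholders, not details from any actual case; real image-forensics work is considerably messier.

```python
# Minimal sketch: does a mirrored, rescaled crop of "Figure 4A" reproduce "Figure 17C"?
# File names and the crop box are hypothetical placeholders.
import numpy as np
from PIL import Image, ImageOps

def zscore(img):
    """Greyscale image as a zero-mean, unit-variance array."""
    arr = np.asarray(img, dtype=float)
    return (arr - arr.mean()) / (arr.std() + 1e-9)

fig_4a = Image.open("figure_4a.png").convert("L")
fig_17c = Image.open("figure_17c.png").convert("L")

# The region of 4A suspected of reappearing in 17C (left, top, right, bottom).
crop = fig_4a.crop((120, 80, 360, 240))

# Apply the alleged transformation: mirror, then stretch to 17C's dimensions.
candidate = ImageOps.mirror(crop).resize(fig_17c.size)

# Pearson correlation between the transformed crop and the suspect panel.
r = float((zscore(candidate) * zscore(fig_17c)).mean())
print(f"correlation after flip and rescale: r = {r:.3f}")
# Unrelated panels almost never line up like this; r near 1, in several panels
# at once, is where 'anomaly' starts looking like 'manipulation'.
```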
Having established this, we have the harder task of discerning circumstances. This requires us to read the tea leaves to some extent, but having done so successfully, we often get information that tells us why this manipulation is the way it is. In my hypothetical example, all the figures in panel 17 are a problem, and I chose this because the later figures in a paper are noticeably more likely to contain anomalies, as they are correspondingly more likely to represent extra work that a particularly annoying Reviewer #2 asked for. “Paper includes controls for A, B, C, D, E, and F… but what about G?” Condition G is fabricated and appended to the end of the existing paper. The goal is, of course, to shut Reviewer #2 up and get the paper published.
I rely on this far more than the police procedural ‘motive, means, and opportunity’ for a ‘crime’.
Because everyone has:
the motive (publishing lots of science for fame and funding),
the means (manipulating the high-level data and images typically presented in a paper has been trivial since Babbage was complaining about scientific bullshit in the 19th C; manipulating whole data sets is much more challenging),
and the opportunity (submissions are open, and anyone can technically publish anything anywhere)…
…at every point, and anyone pretending they don’t is a goddamn simpleton.
Anyone can lie about anything, anywhere, and there are good reasons to do so. This Magnum PI silliness is a bad frame for investigating scientific integrity.
Let’s walk through all three buckets, and the nature of the decisions we have to make in them, with an example.
REDACTED - TWO PAPERS
Anomalies – In paper B, there are points where the data should be vertically organized in pairs, because both measurements are taken at identical times. Vector graphics do not struggle to plot things on the same x-coordinates! So the included elements of REDACTED have a strange variety of offsets and missing data that is very hard to explain. However, graphics programs may not be perfect, and scientific authors frequently do stupid and irreproducible things to make figures, like trying to clean the data by hand in an additional step between analysis and graphing, or taking a nice high-resolution figure and compressing it without mercy until its fine details are uninterpretable.
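If the figure really is vector graphics, the ‘paired measurements should share x-coordinates’ claim is directly checkable. Here is a minimal sketch, under the assumption (and it is an assumption) that the figure is an SVG whose data points are drawn as circle elements; the file name and tolerance are placeholders.

```python
# Minimal sketch: count point markers in an SVG figure that have no partner
# at (near-)identical x positions. File name and tolerance are placeholders.
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
tree = ET.parse("figure_2.svg")

# Collect the x-coordinate of every plotted point marker, assuming <circle> markers.
xs = sorted(float(c.attrib["cx"]) for c in tree.iter(SVG_NS + "circle"))

TOL = 0.5  # plotting-precision tolerance, in SVG user units (assumed)
unpaired = []
i = 0
while i < len(xs):
    if i + 1 < len(xs) and abs(xs[i + 1] - xs[i]) <= TOL:
        i += 2  # two markers at the same x: a matched pair
    else:
        unpaired.append(xs[i])
        i += 1

print(f"{len(unpaired)} of {len(xs)} markers have no partner at the same x position")
# A handful of unpaired points might just be missing data; a strange variety of
# offsets across the whole figure is what needs explaining.
```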
Manipulation – this is harder, but repeated elements in REDACTED as alleged here (outlined in the final points of the rejoinder) are the classic hallmark of recklessness at a minimum, and outright dishonesty at worst. Repeated elements are present here, but we – that is, you and I, very much non-experts in [the specific scientific subfield] – do not know how similar these elements should be at a baseline. There is overlap between graphical elements, yes, but how much SHOULD there be in real datasets? I have no idea. In cyclical data, this is an analytical question which you could answer with comparative data from other papers or with simulation. As per the above, it is often hard to establish this.
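Since simulation is mentioned above as one way to get that baseline, here is a minimal sketch of what it could look like: generate pairs of independent, noisy cyclical traces and ask how often they ever look as similar as the allegedly repeated elements. The signal model, noise level, window size, and similarity threshold are all assumptions that would have to come from the actual subfield.

```python
# Minimal sketch: how often do INDEPENDENT cyclical traces contain segments as
# similar as the allegedly repeated elements? All parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def fake_trace(n=200, noise=0.3):
    """One simulated cyclical trace: a sine wave with random phase plus noise."""
    t = np.linspace(0, 4 * np.pi, n)
    return np.sin(t + rng.uniform(0, 2 * np.pi)) + rng.normal(0, noise, n)

def max_segment_correlation(a, b, win=50):
    """Highest Pearson correlation between any aligned window of two traces."""
    best = -1.0
    for start in range(len(a) - win):
        best = max(best, np.corrcoef(a[start:start + win], b[start:start + win])[0, 1])
    return best

# Baseline: similarity between pairs of traces known to be independent.
sims = np.array([max_segment_correlation(fake_trace(), fake_trace()) for _ in range(1000)])
threshold = 0.99  # similarity of the allegedly duplicated elements (assumed)
print(f"independent traces exceed r = {threshold} in "
      f"{100 * np.mean(sims > threshold):.2f}% of simulations")
# If essentially no independent pair ever gets that close, 'they just look alike
# at baseline' stops being a plausible explanation.
```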
Circumstances – this is much easier to answer. If both of the above are true, I am 99% sure this is an ‘end run’. An end run is a particular form of scientific publication malfeasance. A quick timeline makes them easy to understand.
Paper A, which is real and important, is submitted to Fancy Journal on Day 0. The paper is then preprinted on Day 20.
Workgroup B sees this preprint and thinks ‘I wish we published that’. Then they say, ‘screw it, we will publish that!’
Then, they write Paper B, which is either reckless, terrible, or fabricated. It is submitted to Some Other Journal on Day 68, and because Workgroup B has chosen a journal where they have a good editorial relationship, Paper B is accepted quite quickly (i.e. by Day 160) and published on Day 169.
Paper A eventually catches up – accepted on Day 211, and published on Day 246. It was submitted first, but is published second.
In isolation from any kind of additional fuckery, two papers being published on the same result is quite common - the answer to ‘what should we do next?’ in a scientific field is suggested to multiple parties by a common literature. An end run takes advantage of this commonality. There are three potential ways for it to be dishonest:
(a) as it appears to be in this case, the second competing paper is bad.
(b) the original manuscript can be deliberately delayed or strategically rejected in peer review to make the strategy viable.
(c) the publication timeline of the competing paper can be artificially shortened by manipulation of peer review.
That’s an ‘end run’, a term taken from American football, where an entire defensive line is occupied so the ball carrier on offense just runs around the whole mess and scores a touchdown. This is the only thing I know about football, which I am willing to learn about only inasmuch as it relates to research integrity.
What needs to happen, of course, is straightforward: the raw data should be released immediately. There are no relevant privacy concerns, access to the data in this journal is very much a condition of publication, and the allegations are serious. At a minimum, it is unlikely there will be a happy match between some of the anomalous graphical elements and the data underneath them. What that means in context is still unclear, so the resolution should be interesting. Suffice to say, in fields with heavily contested results like this, access to the raw data should be absolutely mandatory.
Good luck. You’ll need it.