How healthy is the scientific enterprise?
A challenging question. Even simple questions about scientific integrity have complicated answers.
The answer is, variously:
less healthy than it should be
much sicker than people naively assume, and some of those people are scientists
substantially sicker than the narrative of Continuous Progress would assume
nowhere near as sick as to require the ‘Etch-A-Sketch scenario’ (which amounts to ‘turn the whole enterprise over, shake it until the picture disappears, and then draw it again’)
What are the symptoms of the sickness?
At this point, I feel bleak at the prospect of typing them out again.
The problems with overpublication, ‘publish or perish’ culture, abusive lab environments, analytical flexibility, p-hacking, clinical trial registration games, grant front-running, intellectual capture, nonsense journals, fake journals, peer review manipulation, moral entrepreneurship, etc. precede the present discussions of paper mills and active falsification/fabrication cases.
These are not new problems. Most of them were well outlined before 1970. We have simply mutated them through digital life until they became actively worse.
But they are also systemic. Example: it is hard to fund research according to strict parameters for quality, because any marker you could define for scientific success (‘X papers in Y journal’, ‘Z publications per year’, etc.) breaks down immediately under the weight of Goodhart’s Law. We have too much competition for too little money. You cannot wish this away, nor can any immediate policy change simplify the adjudication of who gets the cash.
But you also cannot look for another source of funding, a separate system. The NIH budget is about 30B p.a. The NSF, 10B p.a. Pick your own country and revel in the amount of cash involved. There is no-one alive who can make a realistic addition to the broader supply of funds, nor would they if they existed. Obviously private philanthropy very much exists, and is often reasonably well focused and generous in its disbursements. But it is not even the same game as hard government funding.
And, if we are sick, and we have systemic problems, what should we do about them?
At least four times, by my count, I have tried to write out and codify how I would start an institute to combat these problems. Specifically, a formal organization under a 501(c)(3) structure designed to address the problem.
To my lasting irritation, I have never been happy with any of those documents. They fail to capture the scale, the urgency, the extreme wooliness of the whole bastard thing.
There are two parts to doing this.
The easy part.
The easy part is the broad mission: making sure that the world understands that the scientific enterprise has weaknesses, by uncovering and publicizing the Bad Things they cause.
There is still years of this to do, and an astonishing array of nonsense to uncover. The collective urgency around the problem is still far too small, and it will be too small for a while yet.
As a consequence, any organization responsible for uncovering a lot of fraud, waste, and malfeasance, then flaying it open in the sun like Prometheus, to be pecked apart by birds, is easy to justify.
How this might be achieved — the methods and techniques we use for error detection — is becoming pleasingly well-rounded, due to efforts by me (a small bit) and other people (a bigger bit) who would also not use a metaphor about Prometheus because they are nicer and presumably less histrionic. This is valuable, but there is still so very much more to do. The scientific needle has moved, the public needle has not, and the government needle is rusted shut onto the dial at ‘everything is fine’.
The problem with this ‘investigatory’ scientific work is that it is a hostile environment for scientists to work in - you risk being marginalized, sued, and stonewalled. It is not a good career move, so much of the work is done by ‘insider-outsiders’ - people who understand the problem well enough to work on it, but who cannot be punished by the traditional withdrawal of opportunities. There are enough of us to make good inroads, but there should be many more.
That’s the easy part.
The hard part.
The hard part is: choosing what to concentrate on, a focus that makes sense. This is the point I have always struggled to make land when writing a proposal to people who might fund such a thing.
It is hard to say to well-meaning donors or interested organizations ‘let’s get funding to go out into the world and cause Manifest Scientific Trouble in a diffuse kind of way’.
It is psychologically unsatisfying, it is unfocused, it is hard to cost.
It would work, of course, especially done at scale. But it is hard to codify, and I don’t expect people to be as interested in the broader problem as I am. And it is hard to give the entire context briefly enough that someone who is busy and uninterested (which is everyone, when it comes to other people’s problems) will take it in.
However.
Thinking about it yesterday, I think I have an answer.
There is a single constrained problem in research integrity which would forcefully make the point that science suffers from ineffective oversight, would root out scientific malfeasance, AND would stop people from being hurt or killed.
Let us discard all other research areas in favour of one: medicine.
And let us also discard all other papers which might contain research misconduct except for cuckoo bird papers.
Just one type of paper.
In one area.
I’ll show you why.
The Cuckoo-Bird Problem
All bad research is a waste of time and money, and undermines public trust in science. But bad medical research is the most immediate threat to human health.
In that domain, medical questions about treatment — ‘should we use protocol X?’ ‘is drug Y effective?’ — are not answered by single studies, they are answered by an accumulation of evidence.
These bodies of evidence directly drive medical treatment. They go into large meta-analyses, field-wide reviews, and Cochrane Library reviews - important documents which rely on several studies aggregated together. These documents are read by members of national governments and major medical associations, and directly inform best medical practice.
They are also not perfect. One major reason is that they comb through all the available evidence on a topic, and assemble it together. This assemblage often includes studies which are poorly conducted, problematic, or fake.
When this happens, untrustworthy studies which are included in these analyses often tip the balance of evidence one way or another. Either of these outcomes is a disaster - it is as much a problem to conclude an ineffective treatment is worthwhile as it is to conclude an effective treatment is not.
However, the researchers who write these reviews have no formal training in how to spot fraudulent research. They do assess individual studies for risk of bias (and there are other formal criteria studies have to meet to be included), and these criteria are often diligently applied during the process of meta-analysis.
BUT.
Meta-analyses do not exclude studies for being fake.
This is not because they do not wish to, but because spotting fraud is a skill we have not yet codified. And no-one has ever paid me (or anyone else) to write a curriculum on how to do it.
Instead, the quality of evidence relies on the law of large numbers - one study of 200 people is a good estimate, but eight studies on 4000 people is a better estimate.
But medical research fraud breaks the law of large numbers.
Here is why.
(1) a study is published which supports a specific medical hypothesis: say, ‘laproxolol lowers post-surgical embolism risk’
(2) people pay attention to that study, and start to do their own studies; the original authors get funding to do follow-up studies, perhaps in different subgroups (men over 60, women after vascular surgery, etc. etc.)
(3) broader evidence starts to emerge, and researchers start to write about the ‘landscape’ of the drug, the overall position of its effectiveness
(4) people publishing fraudulent studies start to pay attention - one great way not to get caught making up research is to fabricate an uncontroversial result and hide it in a basket of other similar results - so, they do that
(5) all the studies available are combined to produce a meta-analysis
This is why the LLN breaks down. The more prominence an effort to establish a meta-effect size has, the more attention it attracts, and by the point a meta-analysis includes several honest results, it is increasingly likely to have attracted a smaller number of dishonest ones.
…and a few dishonest results can be enough to change the conclusion of the whole group, and hence the recommendation, of the eventual study.
Again: bad or fake studies do not have to make some grand or surprising assertion, they just need to change the conclusion of the collected evidence. Good evidence becomes mixed, good evidence becomes great, it doesn’t matter - all relevant changes are a disaster.
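To make the mechanism concrete, here is a toy sketch (all numbers invented) of fixed-effect, inverse-variance pooling — the simplest form of meta-analytic aggregation — showing how two fabricated trials with implausibly tight confidence intervals can drag a genuinely null pooled estimate away from zero:

```python
import math

def pooled_effect(effects, ses):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
    weights = [1 / se**2 for se in ses]
    est = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, se

# Eight honest trials of a genuinely null effect: estimates scatter around zero.
honest = [0.05, -0.04, 0.02, -0.01, 0.03, -0.06, 0.01, 0.00]
honest_se = [0.15] * 8

est, se = pooled_effect(honest, honest_se)
# Pooled estimate sits at zero: no effect.

# Add two cuckoos: strongly positive results with suspiciously small SEs.
effects = honest + [0.60, 0.55]
ses = honest_se + [0.10, 0.10]

est2, se2 = pooled_effect(effects, ses)
# The two fakes pull the pooled estimate well away from zero, because their
# small (fabricated) standard errors give them outsized weight in the pool.
```

Two dishonest results against eight honest ones are enough here to shift the pooled estimate by several standard errors — the ‘collected evidence’ now says the drug works.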
These incredibly dangerous studies have a name, because I gave them one a few years ago during the ivermectin debacle: ‘cuckoo bird studies’.
Cuckoos practice brood parasitism:
“[a] phenomenon and behavioural pattern of certain animals … that rely on others to raise their young. … The brood parasite manipulates a host, either of the same or of another species, to raise its young as if it were its own, usually using egg mimicry, with eggs that resemble the host's.”
Cuckoos lay eggs that grow into larger, hungrier infant birds in the nests of other species. They grow up to be bigger than their parents, who then struggle to feed their own chicks and themselves. Likewise, fraudulent studies laid into a background of honest studies ‘poison the nest’ and change their collective conclusions.
In the entire scientific landscape, these dishonest studies are the most significant, because they have the highest proximal ability to kill people.
Would you like an example? Of course you would.
THE DECREASE TRIALS
From the British Medical Journal:
In 2011, the European [1] guidelines recommending initiation of beta-blockade for many patients undergoing non-cardiac surgery were discovered to be based on a family of reports that had suffered from ‘data fabrication’ and ‘academic misconduct’.[2]
An attempt by the relevant university to investigate the research found that the study leadership ‘made a number of incorrect or contradictory statements’ which were ‘not very credible’[2] . That 2011 university investigation published a press release describing data as ‘fictitious’ and ‘unreliable’[3].
The 2012 second and final university report[4] confirmed that one of the studies[5] cited directly by the guidelines[1] with respect to beta blockade was ‘negligent’[4] and ‘scientifically incorrect’[4] . Curiously, the penultimate paragraph of the published investigation[4] read: ‘The report of the investigative committee on academic integrity dated 8 November 2011 has done considerable harm to the reputation of the research group involved. This 2012 follow-up investigation has not been able to limit this harm’. Clinical guidelines[1] under the same leadership had already based recommendations on these studies, which the University had now found to be ‘negligent’[4] and ‘scientifically incorrect’[4] .
Meta-analysis of the remaining (credible) trials now shows initiation of beta-blockade in preparation for non-cardiac surgery to be associated with a 27% increase in perioperative mortality.[6]
Across Europe, there are ~760 000 deaths/year following non-cardiac surgery, based on information published by the guideline leadership (1.9%[7] of 40 million operations/year[8]). Using the calculation recommended by the guideline leadership[7] upon these all-cause mortality data suggests up to 160 000 excess deaths per year. On that calculation, the toll for a full 5-year guideline lifespan[9] might reach 800 000, although there may be many clinicians who, through failing to adhere to these guidelines, may have paradoxically saved lives
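The quoted arithmetic can be reproduced in a few lines. The inputs are the BMJ's own figures; only the variable names are mine:

```python
# Reproducing the BMJ back-of-envelope calculation (their figures, not mine).
operations_per_year = 40_000_000   # non-cardiac operations per year in Europe [8]
mortality_rate = 0.019             # 1.9% all-cause mortality after surgery [7]
relative_increase = 0.27           # 27% excess perioperative mortality [6]

deaths_per_year = operations_per_year * mortality_rate       # 760,000
baseline_deaths = deaths_per_year / (1 + relative_increase)  # ~598,000 without the guideline
excess_per_year = deaths_per_year - baseline_deaths          # ~162,000 excess deaths/year
excess_over_5_years = excess_per_year * 5                    # ~808,000 over a 5-year guideline lifespan
```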
For those of you who don’t read cardiology journals all day, let me translate.
The DECREASE trials were a family of studies run from a university in the Netherlands. They reported that patients undergoing major surgery died a lot less (note: a lot less) if they were given beta-blockers.
Surgery is, of course, stressful. One of the things beta-blockers do very well (and usually safely) is reduce the ability of the sympathetic nervous system to act on the heart and blood vessels.
So, in this context, that means slower heart rate, lower blood pressure, and less electrical heart malfunction = fewer post-operative cardiovascular problems.
Unfortunately, the same mechanism also increases the rate of strokes, and, uh, death.
A large trial that compared beta-blockers vs. no beta-blockers concluded the drugs killed more people than they saved. DECREASE claimed that the lowered risk of post-surgical heart problems was much more important than the elevated risk of strokes.
Unfortunately, this was also ‘negligent’ and ‘scientifically incorrect’.
Because the DECREASE trials - particularly the first one, DECREASE I - were included in a meta-analysis of all the studies available, the European Society of Cardiology did not explicitly recommend against beta-blockade for several years.
This killed a lot of people.
The estimate of 800,000 people seems high, but the underlying figures are real. The true figure is likely some reasonable proportion of that, and could be refined further from the number of strokes, the number of prescriptions issued, or the changes in mortality rates seen after individual surgeries.
What would that number be? 50,000? 400,000? And, of course, how many lesser complications? Strokes may not be fatal.
Pause for a second and appreciate the sheer number of dead people here. I referred to the upper bound of this once in a talk as ‘about half a Pol Pot’.
It was later remarked to me that this was a distasteful comparison. It was not intended to be. I was multiplying the estimate by two.
There is no number that isn’t genocidal.
I wish I was being hyperbolic. Cuckoo-bird studies are parasitic in the straightforward sense - they kill the host.
Some additional points, all of them important:
The most important point here is: the problems with the original DECREASE study were reasonably straightforward to detect. The statistical methods to do so were not complicated. I have done similar work for a decade. I have just never had the resources to focus that work on a single problem.
This is one single, solitary example. A bad one, certainly, but far from the only one (please see the appendix below for additional serious examples). We have no cogent estimate of how many cuckoo-birds there are. We do not yet know how bad this is.
And finally, I have not even attempted to outline the amount of time, effort, and money that dead-end work like this can waste. It is enough to think about the illness created and the lives lost.
CONCLUSION
What I am proposing is an organisation that would proactively detect exactly these sorts of papers. I am open to it being broader, but small organizations typically start in a strong niche.
The mechanics are simple:
ingest information from the public and/or review omnibus analyses
inspect outlier results
determine if they have problems
if detected, determine if those problems hurt or kill people
tell the world
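As a taste of what ‘determine if they have problems’ can involve, here is a minimal sketch of one published screening technique, the GRIM test, which checks whether a reported mean could possibly arise from integer-valued data at the stated sample size. The numbers below are invented for illustration; real screening uses a battery of such checks, not this one alone.

```python
def grim_consistent(mean, n, decimals=2):
    """GRIM test: can `mean`, reported to `decimals` places, arise from
    `n` integer-valued responses? Any achievable mean is k/n for integer k,
    so we check the nearby integer candidates."""
    k = round(mean * n)
    for candidate in (k - 1, k, k + 1):
        if candidate >= 0 and round(candidate / n, decimals) == round(mean, decimals):
            return True
    return False

# A mean of 5.19 from 28 integer responses is impossible:
# 145/28 rounds to 5.18 and 146/28 rounds to 5.21; nothing rounds to 5.19.
grim_consistent(5.19, 28)   # -> False: flag the paper for closer inspection
grim_consistent(5.18, 28)   # -> True: consistent, no flag
```

A single inconsistent mean is not proof of fraud — it may be a typo — but a paper full of them is exactly the kind of outlier the pipeline above exists to escalate.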
It is possible that there are many examples of these studies. We do not know how many, because no-one has ever looked for them.
To do this, I need about a million dollars that no government will give me. Realistically, it should be two, but we need to start.
This would be enough to start: buy stuff that goes beep, train analysts, establish investigative operations, start to identify problems, and begin to put blood in the water.
This is not cheap. I think it is a small investment to stop people from being hurt or killed by research fraud. It is simply unclear who it might come from.
I can obviously expand on this at great length, and with documents that are recognisable as budgets, strategy, et al. but hopefully this is enough for anyone who is interested in the problem to want to talk about it.
CODA
I will be writing up further examples in coming days. There will be stacks of bodies. See below for some previous examples.
FURTHER READING:
They should scare the shit out of you. If you think ‘Jesus, is it as bad as he’s saying?’ then I understand your hesitation because it seems like the sort of thing that We Do Not Let Happen, but it absolutely IS as bad as I am saying.
However, I should also point out this is not a new problem, and I did not discover it. I am simply the person who thinks dealing with it should be codified and treated as a discrete task that will stop people from being dead.
Several additional examples are given below, all from high-profile outlets.
A more recent example is that of Yoshihiro Sato, a Japanese bone-health researcher. Sato, who died in 2016, fabricated data in dozens of trials of drugs or supplements that might prevent bone fracture. He has 113 retracted papers, according to a list compiled by the website Retraction Watch. His work has had a wide impact: researchers found that 27 of Sato’s retracted RCTs had been cited by 88 systematic reviews and clinical guidelines, some of which had informed Japan’s recommended treatments for osteoporosis.[3]
Some of the findings in about half of these reviews would have changed had Sato’s trials been excluded, says Alison Avenell, a medical researcher at the University of Aberdeen, UK. She, along with medical researchers Andrew Grey, Mark Bolland and Greg Gamble, all at the University of Auckland in New Zealand, have pushed universities to investigate Sato’s work and monitored its influence. “It probably diverted people from being given more effective treatment for fracture prevention,” Avenell says.
Originally published in Nature.com; https://www.nature.com/articles/d41586-023-02299-w
Open link: https://readwise.io/reader/shared/01hackp0jb3mesdda2ygahef2f/
Millions of patients may, as a consequence, be receiving wrong treatments. One example concerns steroid injections given to women undergoing elective Caesarean sections to deliver their babies. These injections are intended to prevent breathing problems in newborns. There is a worry that they might cause damage to a baby’s brain, but the practice was supported by a review, published in 2018, by Cochrane, a charity for the promotion of evidence-based medicine. However, when …[scientists] looked at this review, they found it included three studies that they had noted as unreliable. A revised review, published in 2021, which excluded these three, found the benefits of the drugs for such cases to be uncertain.
Originally published in the Economist: https://www.economist.com/science-and-technology/2023/02/22/there-is-a-worrying-amount-of-fraud-in-medical-research
The results showed that high dose mannitol greatly reduced death and disability six months after the head injury. A Cochrane systematic review that included these trials concluded: “high dose mannitol seems to be preferable to conventional dose mannitol in the acute management of comatose patients with severe head injury.”[4] However, one of the trials was accompanied by an editorial that questioned the reliability and validity of the results, calling for further multicentre studies.[5] A subsequent investigation by the Cochrane Collaboration was unable to confirm that the studies took place.
Originally published in the British Medical Journal: https://www.bmj.com/content/334/7590/392
This deserves funding. Prioritising what needs to be looked at is no minor feat. Most people get lost in quite useless anti-quackery.
As a chemist, I also recognise that medicine has more potential because better evidence standards already exist compared to chemicals (safety or effects), and more measurable, immediate, and large-scale effects are at play around saving lives (except for a few exceptions like the ozone layer).
To support your case even further: let’s take a look at the biggest industries (2021, Yahoo et al., market value) to determine which areas of our lives could be disturbed the most by fake studies.
Financial services 22.5trn
Construction 12.5trn
Commercial Real estate 9.6trn
E-commerce 9.09trn
Health insurance 8.45trn
IT 5trn
Food 5trn
Oil& Gas 4.5trn
Automotive 3trn
Telecommunications 1.74trn
Pharma 1.6 trn
Except for the following industries -
Food
Pharma
- none of these would be easily affected by unnoticed research misconduct, nor affect as many lives, since they would rather immediately see that some finding is wrong (if they don’t just believe what is said but look for it*), since no complex human organism is involved & results are rather easily visible and measurable & don’t need too much time to unfold.
Oil & Gas, with environmental & health effects, as well as Automotive and Health Insurance, could be the exceptions, if conflicts of interest take the reins & they start to finance fabricated studies (or manipulate testing software*) to get out of their duties & stay in business. It has happened before. My gut about global natural disaster statistics tells me falling buildings aren’t killing people on a genocidal level.
So judging by this and the amount of money that goes into medical practice (health insurance), this prioritisation is of top relevance even in the off-chance it is not the top priority for where to identify research misconduct (see if some trendy ubiquitous chemical kills or does whatever hormonal damage to humanity or the ozone layer or atmosphere before we find out, because we only monitor half an inch of grass on this planet).
One must also never forget the opportunity costs of lost progress. Research misconduct in virus and pandemic research is ultimately included in this proposal. Climate research isn’t included, but one crew can’t do everything, nor can we drop everything and only focus on the climate, which could benefit from more resources for progress & some research integrity, on another note. All in all this pitch is superb; only the “how” can be discussed:
If there are no immediate hints from someone on what study might be fake:
1) Wouldn’t it be more efficient to start with systematic reviews or studies that made it into guidelines or quasi guideline documents?
2) Wouldn’t it then be efficient, for the purpose of the institute, to prioritise the fraud-check to studies/systematic reviews in guidelines that could affect most people (deaths/QALYs) if the results were distorted by fraud?
3) Probably a huge open door: the cases in the links relate to studies by single authors. This suggests checking systematic reviews for fraud-robustness by filtering out one author at a time and seeing where the results go. This could help decide which studies should be investigated.
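That fraud-robustness check can be sketched in a few lines — a leave-one-out re-pooling under a simple fixed-effect model, with all numbers invented for illustration:

```python
import math

def pooled(effects, ses):
    """Fixed-effect inverse-variance pooled estimate."""
    w = [1 / s**2 for s in ses]
    return sum(wi * e for wi, e in zip(w, effects)) / sum(w)

def leave_one_out(studies):
    """For each (label, effect, se) study, re-pool the rest and report the shift."""
    effects = [e for _, e, _ in studies]
    ses = [s for _, _, s in studies]
    full = pooled(effects, ses)
    shifts = {}
    for i, (label, _, _) in enumerate(studies):
        rest_e = effects[:i] + effects[i + 1:]
        rest_s = ses[:i] + ses[i + 1:]
        shifts[label] = pooled(rest_e, rest_s) - full
    return full, shifts

# Invented numbers: trial C reports a large effect with a suspiciously small SE.
trials = [("A", 0.02, 0.15), ("B", -0.03, 0.15), ("C", 0.55, 0.08), ("D", 0.01, 0.15)]
full, shifts = leave_one_out(trials)
# Dropping C moves the pooled estimate far more than dropping any other trial,
# flagging it (or its authors) for closer inspection.
```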
4) Because cuckoo studies don’t tend to be outliers, how reliable are your methods for detecting them? Do these methods detect the entirety of trials that have been found, across history, to be fraudulent or problematic? This sounds hard to do, since real but poorly made RCTs could be the one or two studies that the results depend on, & it could need raw data, who knows what, or even lengthy individual patient-level data reviews to check for weird diagnosis summaries and readjudication of diagnostic outcomes until the statistics hide the side effects or make the results look good.
4.1) What I really want to ask is how your institute can hold the long breath it takes to do that, since it takes years to get trial data, via request, leak or court order, if it’s even possible. Or is this excluded from the institute’s scope for now? Which would be entirely reasonable as well.
4.nonsense) Give us a brief CV of fraud detection. I want to envy your skill and experience while looking at something like a Google Scholar profile, only about scientific fraud detection & initiated digital toilet-paper retractions.
Please go run about and bother some philanthropists until they give you the money for this institute. There are US tax returns on philanthropic activities and studies that checked who pays whom to promote science. Maybe you could phone everyone on that database.
First of all, thank you for writing this!
As to the solution, we have long been advocating the development and implementation of preventive measures that can be very effective.
For example, we approached three major publishing houses (from editor to VP level) and asked them to consider the following experiment. We suggested identifying a journal that receives a large number of submissions and that would agree to modify the Instructions for Authors by requesting authors’ consent to allow a potential assessment of their laboratory notebooks if the manuscript is accepted and published. This proposal has not been accepted by any publisher and the reason was not the cost of the assessments. Despite the low probability of a paper being subjected to an assessment (which could be 1 in 1,000), explicit reinforcement contingencies were thought to endanger the submission rate that could put the publishing business model at risk.
This was not surprising, because feedback control, even under conditions of partial reinforcement, can be very powerful. One well-known example is tax systems, where not every tax return gets audited, but the probability of being audited is nevertheless sufficiently high to keep most (certainly not all) taxpayers law-abiding.