Part 2: Hindawi.
Introduction
Recap
In 2020, the scientific publisher Hindawi started getting hit by a scam - across multiple journals, dozens of guest editors accepted thousands of poorly written, nonsensical, and clearly fraudulent articles.
In January, 2021 Wiley completed the acquisition of Hindawi for ~$300M.
In August, 2022 - which is 20 months later - staff at Wiley finally noticed that (a) they had bought a nonsense-publishing machine, and (b) they had then overseen the publication of many thousands of additional nonsense papers after the acquisition.
Approximately 11,300 of these papers have been retracted so far.
This is the biggest failure of scientific governance in human history.
This is Part 2.
What’s in Part 2?
Today, we will be dealing with some of the more Hindawi-specific factors in this massive failure.
Later, in Part 3, we will deal with Wiley’s role.
In Part 4, I will make recommendations and lose my temper completely.
WHAT HAPPENED
Summary
Paper mills and special issue scams were well known, widely reported on, and had affected other publishers before August, 2022.
Multiple public reports of research integrity issues at Hindawi journals pre-date either of the publishers taking notice of the problem. Specifically, reports which showed obvious signs of paper mill activity were in the public domain well before anyone realised they had a problem. The data in the paper mill papers themselves is also completely unbelievable.
The Hindawi mass retraction has been under-reported, puzzlingly incomplete, and very slow. It is actually still going on. Right now. Two years later.
One at a time now.
(1) Paper mills and special issue scams were well established, widely reported on, and had affected other publishers before August, 2022.
A ‘paper mill’ is an underground commercial organization that fabricates scientific papers to order: they invent fake experiments with fake data, turn those into fake papers, sell authorship slots on those papers to real scientists who don’t have the time or ability to write their own, and in doing so have a red hot go at trying to ruin the trustworthiness of science entirely.
Paper mills aren’t exactly public knowledge, so it might be tempting to think that the problem they posed to Wiley/Hindawi was obscure, difficult to detect, or required specialised knowledge to understand.
If that was the case, it would be a mitigating factor in failing to notice the activity of paper mills in the few hundred journals that were bought when Hindawi was acquired.
But: knowledge and discussion of paper mills is long established, and a mature research integrity program that worked to protect journal group assets would be capable of detecting the problem.
(Also, if they require specialized knowledge to understand, you’d hope that knowledge would be present in a well-paid research integrity team at a large publisher, the exact people who are paid to know that sort of thing.)
Anyway. Here are some examples of the long-established discussion:
*******************************************************************************************
December, 2014. Charles Seife writes an article for Scientific American which includes the first description I’m aware of that has hallmarks of a paper mill. There are earlier reports of fake papers, but this is the first description of the prototypical ‘paper mill language’ that informs the below.
July, 2018. Jana Christopher describes her experience with paper mill products at FEBS Letters.
February, 2019. Jennifer Byrne and colleagues describe their experience with paper mill products in Biomarker Insights.
July, 2019. RetractionWatch reports on the 123mi.ru paper mill, which claims to have fabricated more than 10,000 papers for paying researchers.
December, 2019. Jamie Trapp outlines the precise mechanics of what a special issue scam looks like, and why a paper mill would pursue one. Published in Physical and Engineering Sciences in Medicine.
February, 2020. Byrne and Christopher collaborate to produce a guide to the business model, hallmarks, and handling of paper mill manuscripts. Published in FEBS Letters.
February, 2020. Elizabeth Bik describes the precise hallmarks of a single paper mill that specialises in fraudulent biology, including the types of data and figures which are normally most straightforward to detect.
February, 2020. Science Magazine reports on the above.
December, 2020. The Journal of Nanoparticle Research describes their experience with a fake special issue, including the mechanics of how paper mills impersonate well-known researchers to get their fabrications accepted. “The Journal of Nanoparticle Research victim of an organized rogue editor network!”
February, 2021. Roland Seifert publishes a series of observations about hallmarks of paper mill activity. In particular, he stresses that soft targets for paper mills quickly become known in the fraudulent papers industry, and that it is important to find and retract mill products immediately so the problem does not grow. Published in Naunyn-Schmiedeberg's Archives of Pharmacology.
June, 2021. RetractionWatch reports on special issue scams (also called ‘guest editor’ scams, as guest editors handle special issues), and mentions this is the fourth time in the public record that this scam has been detected.
July, 2021. RetractionWatch reports that ~400 papers from Elsevier journals are being considered for retraction due to special issue scams.
July, 2021. Guillaume Cabanac, Cyril Labbé, & Alexander Magazinov release the Problematic Paper Screener, which can detect ‘tortured phrases’. Tortured phrases are a common and hilarious feature of paper mill papers. To defeat plagiarism detectors, paper mills use software to automatically swap certain words for other words, to make the text they copy sufficiently unique that it isn’t detectable. (A sketch of what this kind of screening amounts to follows this list.)
Unfortunately for them, the software is really bad, and makes hilariously inaccurate substitutions. Some memorable examples: “Bosom peril” is not “breast cancer”, and "butt-centric waterway" is not “anal canal”.
BUTT-CENTRIC WATERWAY!
*******************************************************************************************
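The mechanics are simple enough to sketch. Below is a minimal illustration in Python - mine, not the actual Problematic Paper Screener, which is far more sophisticated - of what screening for tortured phrases amounts to. The phrase list is a tiny sample drawn from examples in this post; the real screener curates thousands of fingerprints.

```python
# Minimal tortured-phrase screen. Illustrative only: the real Problematic
# Paper Screener uses a large curated fingerprint list, not four entries.
import re

# Known tortured phrase -> the established term it mangles.
TORTURED_PHRASES = {
    "bosom peril": "breast cancer",
    "butt-centric waterway": "anal canal",
    "assist vector machine": "support vector machine",
    "counterfeit consciousness": "artificial intelligence",
}

def screen_text(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, established term) pairs found in text."""
    lowered = text.lower()
    return [
        (phrase, term)
        for phrase, term in TORTURED_PHRASES.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]

if __name__ == "__main__":
    sample = "Because of its tiny size, the placement of the bosom peril is curable."
    for phrase, term in screen_text(sample):
        print(f"Tortured phrase: {phrase!r} (established term: {term!r})")
```

The point being: this is a text search, not rocket science. Any publisher that wanted to find this stuff could.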
The list above features work by many of the well-known people and publications in this space - these are the people who publish the new papers, build the statistical tools, speak at the conferences, etc. I know all these people.
While I’m writing this, a few of them are actually speaking at the World Conference on Research Integrity. They’re hardly trying to hide, because none of this is a secret. In fact, some of them are probably speaking about this. The above list represents an ongoing attempt to direct as much attention towards the problem as possible.
I’m also not including the many academic blog posts that I think were crucial in getting Wiley to finally pay attention to the problem in 2023. We’ll see them next episode in Part 3.
Finally, I have also stopped a full year before the staff at Wiley noticed their Hindawi special issue program was overrun with paper mill output, because otherwise this would simply be pages of hyperlinks.
(2) Multiple public reports of research integrity issues at Hindawi journals pre-date either publisher taking notice of the problem. Specifically, reports which showed obvious signs of paper mill activity were in the public domain well before Hindawi realised they had a problem.
You need to understand just how uninterested publishers are in research accuracy.
We know this because they’ll just flat out say it in public documents.
Q. What is the best practice in responding to PubPeer comments and/or emails sent from anonymous users? Are publishers and editors expected to post responses to these sites with outcome details of any investigations? And are editors expected to respond to concerns raised via social media that are relevant to our affiliated journals (especially when the editor is mentioned in the post)?
A. Wiley does not actively review PubPeer comments, however, we do investigate those claims if they are raised with the journal or with Wiley directly. More often than not, we are made aware of concerns raised on PubPeer through social media. In those cases, we review the claim and engage the journal in an investigation. We treat anonymous claims in the same fashion; we evaluate for their legitimacy, investigate with the journal (and this may involve authors or their institutions), and we take the appropriate corrective action. We do not post outcomes to PubPeer, but all corrective actions we take are public, whether they be a correction, retraction, withdrawal, or expression of concern.
Hindawi’s approach is very similar to Wiley’s approach as detailed above and we do not currently actively monitor commenting sites such as PubPeer, but we do investigate concerns initially raised on PubPeer when raised to us directly. We have previously provided comments and updates on PubPeer, however, this functionality is currently restricted and not available to publishers without payment. For transparency, where an issue has been raised on PubPeer initially, Hindawi cites the PubPeer comment in any related correction or retraction notice.
This, to me, is astonishing.
There is a website, which is free, where researchers post research integrity problems - and the research integrity staff responsible for the content at those publishers don’t even look at it.
They are 100% non-proactive. They only do something about the concerns raised if they are actively harassed about it.
And that, honestly, is the central mentality that leads you to a place where you buy 300 million dollars’ worth of garbage.
To get some help with this, I reached out to the lads at PubPeer myself. Brandon and Boris were very helpful in this regard, and here are the ballistics:
Before August, 2022 (which you might remember is the official time reported as ‘when Wiley noticed there was a problem’) there were more than two thousand individual comments left on Hindawi papers.
Below is a selection from the relevant journals.
Published December, 2019.
Comment February, 2021.
Duplicate images used for separate results described elsewhere.
https://pubpeer.com/publications/36EB7E814777A2C92F3497CA63E6CF/#1
Published May, 2020.
Comment August, 2021.
Tortured phrases.
https://pubpeer.com/publications/CC02DB051C1BB7A03011491CB150EE/
Published June, 2020.
Comment March, 2022.
Potential plagiarism. Tortured phrases. Authors have previous form.
https://pubpeer.com/publications/2076FC74848179CC66E28C97915927/#1
Published July, 2020.
Comment May, 2022.
Completely incoherent maths.
https://pubpeer.com/publications/1C0AFA2E862BB9E1D60E7BEC0FABA8/#1
Published November, 2020.
Comment November, 2021.
Irrelevant citation patterns. Nonsense maths.
https://pubpeer.com/publications/8DA7C073DA7ED5F86693113DDF6422/#1
Of course, these could be dismissed as a small and fairly random group of comments. But I just stopped after five, because I can’t read 2000 comments, and I only have so much time.
If you know the first thing about research integrity, these sorts of reports are describing, in public, the three big hallmarks of paper mills.
nonsense language
nonsense maths
nonsense citations
And, yes, maybe you’d have to be knee-deep in the muck of global scientific horseshit to spot the patterns.
But also, why wouldn’t the research integrity team at a multi-billion dollar company be exactly that? They are the professionals. They have jobs working as research integrity experts, in teams with other research integrity experts.
We, as I am continually reminded, are the amateurs.
Let me make this worse. You don’t even have to go onto weird websites where people like me point out bad science. You don’t even need to read the papers.
All the data you need to spot the fact that something isn’t right, is written on the front page of the papers themselves. It’s not hidden, it’s obvious.
I’ll confine myself to just two examples.
https://osf.io/preprints/psyarxiv/6mbgv
Professor Dorothy Bishop at Oxford did an analysis of how long it took these papers to get published. The analysis is quite straightforward - every paper lists the day it was submitted to the journal, the day it was received a second time after being revised, the day the editor accepted the paper given the revisions made to it, and the date it was published on the website.
One issue of Wireless Communications and Mobile Computing from 2022, edited mostly by Hamurabi Gamboa Rosales, took an average of about 20 days to go from initial submission to revised resubmission. This is not just unlikely, it’s impossible.
The easiest way to explain this is with an analogy.
Say there’s a pothole outside your house, and you call the council. You tell them ‘there’s a big hole in the road outside my house!’ The person at the other end, rather than tiredly telling you to fill out a form - which is what councils do all over the world, in my experience - instead yells ‘MOTHER OF GOD! WE’RE RIGHT ON IT!’
Twenty minutes later, a bitumen truck comes HURTLING around the corner of your street at full send, with the road workers hanging out the back of it, the driver leaning on the horn and yelling ‘GET OUT OF THE WAY! POTHOLE!’
They pull up outside your house, and you see the brakes go hot. But the guys don’t even wait for it to stop, they jump off while it’s slowing down, and they grab pry bars and a burner and a kettle of bitumen, and they start hammering out the edges, pour the bitumen and start slamming it with hammers almost at the same time. In about six minutes, the hole is filled and flattened, and they admire their work for about four hundred milliseconds and SCREAM off the way they came. No sooner has the truck disappeared than your phone rings - and it’s the council worker from before.
‘POTHOLE! *pant* *pant* FIXED! Happy to be of service!’ *click*
That’s how likely the entire editorial process taking 20 days is.
Three times that, 60 days, would be lightning fast. Here’s what has to happen:
the author makes a submission to the journal
the editor of that journal, in this case a guest editor, assigns that paper to be analyzed by at least two external reviewers - whom they have to find, which typically means sending a lot of emails
having found at least two people who accept the job of peer reviewing the paper, those reviewers then go through the paper to try to improve it (or, sometimes, reject it)
the peers send their findings to the editor
the editor, having received all those findings, writes to the authors with recommended changes
the authors make those changes, which can sometimes involve new experiments and observations
and then they resubmit the same paper for the editor’s approval a second time
That whole process doesn’t happen in 20 days SIXTY-TWO TIMES.
And the data needed to do it? Again, the dates of submission, revision, and acceptance are written right on the front of every single paper. They’re just under the title. Right in the middle of the page.
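To show how little analysis that takes, here’s a minimal sketch in Python. To be clear, this is my illustration, not Professor Bishop’s actual code, and the dates are hypothetical stand-ins for the Received / Revised / Accepted dates printed on each paper.

```python
# Turnaround-time check, sketched with hypothetical dates. The real analysis
# reads the Received / Revised / Accepted dates off each paper's front page.
from datetime import date
from statistics import mean

papers = [
    # (received, revised, accepted) - hypothetical examples
    (date(2022, 1, 3), date(2022, 1, 21), date(2022, 1, 26)),
    (date(2022, 1, 5), date(2022, 1, 24), date(2022, 1, 30)),
    (date(2022, 1, 9), date(2022, 1, 30), date(2022, 2, 3)),
]

sub_to_rev = [(revised - received).days for received, revised, _ in papers]
print(f"Mean days from submission to revised resubmission: {mean(sub_to_rev):.1f}")

# Sixty days for this step would already be lightning fast; a whole issue
# averaging ~20 days is a red flag, not an efficiency triumph.
for received, revised, _ in papers:
    gap = (revised - received).days
    if gap < 30:
        print(f"Suspicious: received {received}, revised {revised} ({gap} days)")
```

Run that over a whole special issue and the pattern falls straight out.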
https://deevybee.blogspot.com/2022/10/what-is-going-on-in-hindawi-special.html
Another piece of readily available information is the email address of the corresponding author, which is the author on a paper who agrees to handle any future emails about the paper. I’ll quote here from a blog post by Nick Wise, another researcher with a strong interest in paper mills.
The most intriguing fact about the papers in the special issue however, is that only 4 authors give corresponding email addresses that match their affiliation. These 4 include the only 3 papers with non-Chinese authors. Of the other 58, 1 uses an email address from Guangzhou University, 6 use email addresses from Changzhou University, and 51 use email addresses from Ma’anshan University. All of the Ma’anshan addresses are of the form 1940XXXX@masu.edu.cn and many are nearly sequential, suggesting that someone somewhere purchased a block of sequential email addresses (you do not need to be at Ma’anshan University to have an @masu email address).
If that seems complicated, let me simplify: in this special issue, 51 out of 62 corresponding authors all used email addresses - which were nearly sequential - from a university they didn’t work at.
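That pattern is also trivially detectable. Here’s a minimal sketch - mine, with hypothetical addresses following the 1940XXXX@masu.edu.cn pattern Nick describes - of a near-sequential-email check.

```python
# Near-sequential corresponding-author email check. Addresses are
# hypothetical, patterned on the 1940XXXX@masu.edu.cn form described above.
import re

emails = [
    "19401101@masu.edu.cn",
    "19401102@masu.edu.cn",
    "19401105@masu.edu.cn",
    "author@unrelated-university.edu",
]

# Collect purely numeric local parts for the suspect domain, in order.
numeric = sorted(
    int(m.group(1))
    for e in emails
    if (m := re.fullmatch(r"(\d+)@masu\.edu\.cn", e))
)

# Count consecutive pairs that sit within a small gap of each other.
close_pairs = sum(1 for a, b in zip(numeric, numeric[1:]) if b - a <= 5)
print(f"{len(numeric)} numeric @masu.edu.cn addresses, {close_pairs} near-sequential pairs")
```

Fifty-one nearly sequential addresses from one university the authors don’t work at is not a coincidence anyone should need a computer to spot - but even the computer version is ten lines.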
Basically, the moment the story broke and Dorothy and Nick knew where to look, they immediately found completely nonsensical and obvious problems staring them in the face, and wrote up compelling analyses of those problems in their spare time.
In other words: absolutely no-one at either publishing group or any journal did any diligence about any of these papers, ever.
(3) The Hindawi mass retraction has been under-reported, puzzlingly incomplete, and slow.
After all this time (bang on two years since the issue blew up like a satchel charge), it’s trivial even now to find un-retracted nonsense papers within the incredible mass of nonsense that is the Hindawi journal archives.
Yes. 11,000-odd retractions, and they haven’t got them all.
A Breast Cancer Image Classification Algorithm with 2c Multiclass Support Vector Machine
Strong nonsense text.
“There are two kinds of calcification: microcalcification and microcalcification [19]. Microcalcification is characterized by a large quantity of calcium and is an asymptomatic sign of benign calcification, while microcalcification is characterized by a very little amount of calcium, which is less than 0.5 mm and is suggestive of malignant calcification [20].”
“The first methodology is based on the extraction of a collection of handmade highlights encoded by two coding models and generated by assist vector machines”.
‘Assist’ vector machines is a tortured phrase - the real text is ‘SUPPORT vector machines’, but - again - the software designed to obscure plagiarism replaced it with a synonym.
Oh, and:
“Because of its tiny size, the placement of the bosom tumor is curable, and it can improve patient observation.”
How did I find this? Just by searching for ‘bosom’ in the Hindawi archives. This is the most famous tortured phrase, thanks to the now-infamous (and very good) article Bosom Peril Is Not Breast Cancer.
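If you want to repeat that search programmatically, the public Crossref REST API will do it. A sketch below: the endpoints are real, but I’m assuming Hindawi’s member record is the top hit for the query ‘hindawi’, and that these papers are still registered under that member ID rather than Wiley’s.

```python
# Search Crossref for Hindawi-registered papers mentioning "bosom".
# Assumption: Hindawi is the top hit for the member query, and the papers
# are still registered under that member ID.
import requests

# Look up Hindawi's Crossref member ID.
members = requests.get(
    "https://api.crossref.org/members", params={"query": "hindawi"}
).json()
member_id = members["message"]["items"][0]["id"]

# Search that member's works for the tortured phrase.
works = requests.get(
    "https://api.crossref.org/works",
    params={
        "query": "bosom",
        "filter": f"member:{member_id}",
        "select": "DOI,title",
        "rows": 10,
    },
).json()

for item in works["message"]["items"]:
    title = item.get("title") or ["(no title)"]
    print(item["DOI"], "-", title[0])
```

No subscription, no special tooling. Anyone can do this.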
Let’s find another one.
Evaluation Index of School Sports Resources Based on Artificial Intelligence and Edge Computing
“The phenomenon of black screen in the middle of the caton improves the data computing ability, reduces the dependence on the performance of the user terminal equipment itself, builds a smart sports resource platform, and combines artificial intelligence (AI) to create smart communities and smart stadiums, and realizes event services and operations in important competition venues, intelligent travel, safety prevention, control, and so on.”
What?
“The conditions are open to the public, the society is the platform to meet the needs of bodybuilders, and reasonable opening hours and charging standards are formulated.”
WHAT?
This survived the purge?
A Combined Deep CNN: LSTM with a Random Forest Approach for Breast Cancer Diagnosis
Again, salted with nonsense.
“Followed by Mujarad’s examination that identified cancer malignant growth utilizing multifacet perceptron with a prescient exactness upsides of 65.21%.”
God forbid we should lack prescient exactness upsides…
“unpredictable boondocks estimation” …
This is presumably ‘random forest estimators’?
You can do this yourself. Pick any journal that used to be a Hindawi journal, and scroll to a few years ago, open a special issue. Find anything that hasn’t been retracted and look at it.
You’ll find a paper that’s confusing, with wildly under-reported methods, and - even if you’re not an expert in the paper’s topic - definitely out of scope.
In fact, if you look hard enough, you’ll also STILL FIND WHOLE ISSUES OF NONSENSE. There are still contaminated special issues, two years later.
That’s where I found that last paper, in a special issue called Complexity and Robustness Trade-Off for Traditional and Deep Models 2022. In other words, it’s about deep learning.
This special issue also includes updated benzene models, putting COVID on the blockchain, and a skills survey of junior software developers. All of which have nothing to do with deep learning.
In short, in 20 months, the combined forces of two publishers failed to notice this problem.
And in another two years, they have failed to finish fixing it.
Yes, I gave myself the luxury of a pull quote there.
The reason you’re reading this a few months after I wrote it is actually encapsulated in the above: I wanted to see if the above articles would migrate to the Wiley website unchanged. Click any of the links above, and it’ll take you from a Hindawi link through a proxy and into the Wiley website.
And the articles are now listed there.
They literally imported this nonsense and made it their own.
The only word for it is: pathetic.
Conclusion
I cannot over-emphasize how utterly asleep both organizations must have been, or just what a galactic failure of governance this represents.
One thing that also isn’t clear to me is the time spent on due diligence before the acquisition was complete. We don’t get to know the nature of the diligence process, nor how long it took. So when I say ‘20 months’ above, that’s only the time period POST-acquisition. It was actually longer.
This is what people like me mean when we say flippant and hurtful things like ‘academic publishing has jumped the shark’, or ‘the world would be better off without academic journals’. Those may or may not be true, but they betray the frustration of seeing just how utterly terrible these companies are at research integrity oversight.
At present, the global scientific enterprise relies more or less entirely on these companies to function.
And this is what they give us.
Next time, in Part 3, the role of Wiley.