I tried to get this published for a while. It bounced around during the rankings wars, but never found a happy home.
So I have combined all the edits of it here, for you.
Then I put the bile back in, just for me.
Oh yeah: I have a Substack now. Here’s my thought process: fuck it, could be fun.
*********************************************
In February 2022, Michael Thaddeus, a mathematics professor at Columbia University, published a detailed analysis - on a Columbia University web page, no less - accusing his university of distorting its US News rankings.
(A quick second, first, to appreciate the glorious power move that is publishing criticism of your own employer, on its own servers - this is what tenure is for, regardless of what the frightened little rabbits of the academic world might think. If youth is wasted on the young, then tenure is wasted on the old.)
The investigation was important. US News rankings are the single most influential comparator of US universities. There is cachet in being in the top 500, or the top 100, or, if you are Columbia, tied for #2.
The piece concludes, in part:
“Almost any numerical standard, no matter how closely related to academic merit, becomes a malignant force as soon as universities know that it is the standard. A proxy for merit, rather than merit itself, becomes the goal.”
Basically, they were fiddling the numbers.
Thaddeus checked some of the figures Columbia provided, and not even under the sunniest interpretation could they be considered correct.
Educators and parents of prospective college students were justifiably upset. They were presumably more upset when Thaddeus turned his attention to the engineering school in particular - which revealed a similar kind of ranking distortion.
Soon afterwards, US News investigated, and initially removed Columbia from its ranking system entirely after the university failed to substantiate the data it had previously submitted.
This was the end of the beginning. Then, the arse went right out of the entire enterprise.
In November 2022, the law schools at Yale, Berkeley, and Harvard announced they were refusing to participate in the ranking system at all. The law school rankings carry similar weight to the broader college rankings. And Yale is #1, Harvard is #4. Eventually, almost all the law schools in the top 14 quit providing data for the ranking games.
There is no evidence that this was prompted by anything as problematic as Columbia - more that there were plenty of other reasons to find the way the rankings are conducted objectionable. In the case of the law schools, it is largely a disagreement about priorities: the rankings penalise schools whose graduates take public service jobs and reward schools that spend as much money per student as possible. The administrators decided they wanted to educate lawyers, and not cock about with a pointless grading exercise.
This was an unusual move. Subservience to the ranking system of a newspaper is far more historically normal - there have been several well-publicised examples of people trying to fiddle it over the last decade.
(My favourite: as recently as April last year, a complaint was filed with the New Jersey Superior Court alleging that Rutgers placed its MBA graduates into sham jobs to inflate their post-degree employment statistics, which are another cornerstone metric for university rankings. Can’t get your graduates jobs? Make some up!)
And now it feels like the rankings are starting to wither a bit. The yearly list is normally released in March, but the latest version was pushed back twice as US News strove to implement “substantive changes” to its methodology - changes which themselves inspired new concerns about the underlying data. At last, the new and allegedly improved law and medical school rankings have been released, with much less fanfare than in previous years.
Ranking tomfuckery is not a secret in higher ed. It is not even an open secret. I myself have taught a class that was rigidly capped at 19 undergraduates (so it would qualify for the smallest possible class-size bracket - “less than 20”). I would have taken more students, but was only given 19 chairs. I was working at Northeastern at the time, a university notorious for being obsessed with rankings games like this… and equally obsessed with keeping it quiet.
This greasy fealty to an empty exercise of paper excellence is a prime example of what scientists call Goodhart’s Law, which we can paraphrase as: “any measure that becomes a target stops being a good measure”. Ask your friendly neighbourhood scientist if they’ve heard of it: the answer will probably be yes.
The reason is simple: the entire higher education system, and the research infrastructure within it, is riddled with Goodhartian nonsense. Distorting educational or scientific objectives to fit a capricious external ranking system is usually just called Tuesday.
The academic version of the US News rankings is called the Impact Factor, a method of ranking the importance of academic journals via a simple formula which divides the recent citations to that journal by the number of articles published within it. Approximately, an average citation rate per article, calculated over the last two calendar years.
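(If you want the arithmetic spelled out, here is a toy sketch in Python. The function name and every number in it are mine, invented purely for illustration - the real calculation is done privately, on unpublished criteria, as we will get to shortly.)

```python
# Toy sketch of the impact factor arithmetic - invented numbers, not the
# real (private, unpublished) calculation.
def impact_factor(citations_to_last_two_years: int, citable_items_last_two_years: int) -> float:
    """Citations received this year to items published in the previous two
    calendar years, divided by the number of citable items published in
    those two years."""
    return citations_to_last_two_years / citable_items_last_two_years

# A journal publishes 250 citable items over two years; those items pick up
# 1,500 citations this year.
print(impact_factor(1500, 250))  # 6.0
```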
The Impact Factor is both extremely important and deeply fucking stupid.
Important because formal assessment of academic work - say, the evaluation of a grant application an academic submits to a government funding body - often relies on it heavily. Many researchers have lost their positions because they could not win this funding, and many of those rejections came down to their work being ‘published in lower impact journals’.
But, simultaneously, silly. Reams of research on the impact factor have concluded that it is a bad metric for ranking journals and a truly terrible metric for assessing the importance of any given piece of work. At a minimum, it is completely non-reproducible - there’s no way to check the calculation, because it’s done by a private company according to unpublished criteria. Of course, it also makes no mathematical sense: if a journal publishes five academic papers, and four are never cited, but one is cited 100 times, the journal has an impact factor of 20 - a figure which describes none of the papers in it.
This is only partially hypothetical. Impact factors are driven strongly by Pareto-like laws, where a small number of very heavily cited papers drive the number. I analysed this once - kill the top 4% of most-cited papers in a journal, and the IF drops by about 20%. Kill the top 10%, and it drops by 20 to 40%.
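(For the curious, here is roughly what that kind of check looks like - a toy sketch with a synthetic, heavy-tailed citation distribution, not the actual data from the analysis above, so the exact percentages it prints will not match those figures.)

```python
# Toy sketch: how much of a journal's impact factor rides on its most-cited
# papers. The citation counts are synthetic (Pareto-ish), so the exact
# percentages will not reproduce the analysis described above.
import random

random.seed(1)
papers = 500
citations = sorted((int(random.paretovariate(1.5)) for _ in range(papers)), reverse=True)

baseline_if = sum(citations) / papers

for top_fraction in (0.04, 0.10):
    k = int(papers * top_fraction)
    remaining = citations[k:]                     # pretend the top k papers were never published
    trimmed_if = sum(remaining) / len(remaining)  # numerator and denominator both shrink
    drop = 100 * (1 - trimmed_if / baseline_if)
    print(f"Remove the top {top_fraction:.0%} most-cited papers: IF falls by ~{drop:.0f}%")
```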
Journals, like universities, take their rankings-worship to an extreme. With a colleague, I published an example of this, describing the meteoric rise of the British Journal of Sports Medicine up the Impact Factor rankings.
Again, an Impact Factor is the number of recent citations to a journal divided by the number of research items it publishes. Interestingly, some of the things a journal publishes (e.g. correspondence, editorials, and other similar short items without data) are not ‘research items’. However, citations to these shorter documents still count towards the impact factor.
These less important short publications are usually quite uncommon in research journals - call them old-fashioned, but academics have generally settled on a system where academic research journals publish academic research. But one issue of the BJSM contained only three empirical research papers and no fewer than eight separate editorials.
This is extremely unusual. The normal number of editorials per issue is one, or none. In fact, in 2017, only ~25% of all the items BJSM published were actually research. A typical figure would be well over 80%.
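(To make the numerator/denominator asymmetry concrete, here is a toy comparison. Every number below is invented and has nothing to do with BJSM’s actual figures.)

```python
# Toy sketch of the front-matter trick: citations to editorials count in the
# numerator, but editorials are not "citable items" in the denominator.
# All numbers are invented.
def impact_factor(total_citations: int, citable_items: int) -> float:
    return total_citations / citable_items

# Journal A: 100 research papers, each cited 5 times, no editorials.
print(impact_factor(100 * 5, 100))        # 5.0

# Journal B: 25 research papers (cited 5 times each) plus 75 editorials
# (cited twice each). Only the 25 papers count in the denominator.
print(impact_factor(25 * 5 + 75 * 2, 25)) # 11.0
```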
Likewise, it is a long-observed regularity in academic work that review papers (which summarise progress in a broad research area) and meta-analyses (which aggregate the statistics from multiple papers in order to reach a broader conclusion about a scientific outcome, usually shittily) are well cited. Those, too, were preferred for publication.
The irony here is quite obvious: according to the impact factor, a successful journal publishes as little actual research as possible.
I should emphasise here: this is not illegal, and some people may not even regard it as unethical. These people are wrong, of course, and short-sighted little shitbirds, but rankings bollocks is at least a logical outcome of maximising local benefit in a system that contains perverse incentives. It also has the corrosive tendency to turn serious research publications into the scientific equivalent of People magazine.
This, too, is the problem with the broader university ranking fixation, because all Goodhartian systems demand that work turn into meta-work - in other words, rather than administering the university, they administer the ranking metrics. Administrators spend many thousands of hours preparing reams of data using creative accounting methods, invent job schemes to raise the number of students who are employed after graduation, strategically position their institutions to increase their reputation metrics, artificially control class sizes, and more.
But then, somehow, in this case, they found a spine. And they rebelled.
Can the employees they administer come up with the same kind of clarity? They should be able to - far more has been written about how research ranking games are counterproductive than about the administrative ones, because researchers research things! Including their own rankings.
This criticism goes back many years. For instance, this, from 1998:
The source of much anxiety about Journal Impact Factors comes from their misuse in evaluating individuals, e.g. during the Habilitation process. In many countries in Europe, I have found that in order to shortcut the work of looking up actual (real) citation counts for investigators the journal impact factor is used as a surrogate to estimate the count. I have always warned against this use. There is wide variation from article to article within a single journal as has been widely documented…
In other words, the modal use of the impact factor - evaluating individuals by the fanciness of their work, as determined by the impact factor of wherever they publish - is explicitly described as misuse.
This criticism carries some weight, because its author, Eugene Garfield, invented the impact factor.
It’s hard not to be bleak about this: if 50 years of criticism didn’t help, what will?
Honestly, no idea - but the shine coming off the rankings business elsewhere can’t hurt.
What troubles me is this: we can point these problems out all day, but there seems to be almost no real will to do anything about them. I have no idea what to do about that. As in, I can’t even imagine what combating the malignant impact of Goodhartian scenarios would look like. Maybe if there were some viable means of addressing them, they’d get some attention.
Fascinating.