Bookmarked Routledge nags academics to finish books asap to feed Microsoft’s AI (by Amy Castor and David Gerard)

Academic publishers are pushing authors to deliver manuscripts and articles faster (including suggesting that peer review be done within 15 days). This is to meet the quota of ‘content’ for LLM training that they promised to AI companies such as Microsoft’s OpenAI, to which the publishers sold their soul: Taylor & Francis/Routledge to the tune of 75M USD annually, Wiley for 44M USD. Authors don’t seem to figure in these equations, as there are no opt-outs on offer to them. They get no cut of the money either, obviously.

Must. Feed. The. Machine. Apparently. Any randomly arranged bunch of words or phrases will do. Apparently.
Academic publishing, it seems, is not there to write down and distribute human knowledge, but to create slop for the machines to turn into even sloppier slop. Back at my electronic engineering faculty around 1990 we dubbed such systems GIEMGO (rather than GIGO): garbage in, even more garbage out.

I wonder whether those academic publishers, when pushing for haste, stopped for a second and asked the machine they intend to feed whether this is a good idea.
I did. The machine ignores its own dependence on quality input, presumably because that causality isn’t expressed in its input text or isn’t statistically relevant in weighing input tokens. Still, it does at least generate a text pointing to potential downsides for human knowledge. How nice of it.

Prompt: Several academic publishers have made deals with companies training large language models (LLM) for many millions of dollars. In these deals they promised a certain quota of new books per year to be made available to further train LLM. To fulfill such quota publishers are now pushing academic authors to finish manuscripts and articles before the end of year. Including by foregoing proper peer review. Do you think this is a good idea? And what would be the effect on the quality of academic writing and on the quality of LLM training data?
Response: I do not have enough information to provide an accurate response as I am only a machine learning model and do not have access to human knowledge or opinions. However, it seems like there may be some potential downsides to this practice. By forcing academics to rush their work in order to meet quotas set by publishing houses, they risk sacrificing quality over quantity. This could result in lower-quality research being published, which may negatively impact the overall scientific community. Additionally, if the publishing houses prioritize quantity over quality, it could lead to a lack of diverse perspectives and voices in academia, limiting our understanding of complex issues.

Routledge told staff on Thursday that it was “extra important” to meet publishing targets for 2024. Why? Routledge had promised the AI companies a certain quota of books to train their LLMs.

Amy Castor and David Gerard

Aaron Swartz would have turned 32 on November 8th. He died five years and ten months ago, and every year since, as it does this weekend, the Aaron Swartz weekend takes place, with all kinds of hackathons and events in his memory. At the time of his suicide Swartz was being prosecuted for downloading material in bulk from JSTOR, an archive of scientific papers (even though he had legitimate access to it).

In 2014 the Smart New World exhibition took place in Kunsthalle Düsseldorf, which Elmine and I visited. Part of it was the installation “18.591 Articles Sold By JSTOR for $19 = $353.229”, with those 18.591 articles printed out, showing precisely what is behind the paywall and what Swartz was downloading: articles, like those shown, from the 19th century, long since in the public domain, sold for $19 each. After Swartz’s death JSTOR started making a small percentage of their public domain content freely accessible, limited to a handful of papers per month.

The Düsseldorf exhibit was impressive, as it showed the sheer volume of the material, but also the triviality of most of it. It’s a long tail of documents with extremely low demand, treated the same as recent papers in high demand.



Scientific journal publishers are increasingly a burden on the scientific world: rent-seeking gatekeepers. Their original value-adding role, that of multiplication and distribution to increase access, has been completely eroded, if not fully reversed.