Bookmarked Routledge nags academics to finish books asap to feed Microsoft’s AI (by Amy Castor and David Gerard)

Academic publishers are pushing authors to speed up delivering manuscripts and articles, including suggesting that peer review be done within 15 days. This is to meet the quota of ‘content’ for LLM training that the publishers promised the AI companies, like Microsoft and OpenAI, after selling their souls to them: Taylor & Francis/Routledge to the tune of 75M USD annually, Wiley for 44M USD. Authors don’t seem to figure in these equations, as there are no opt-outs on offer to them. No cut of the money either, obviously.

Must. Feed. The. Machine. Apparently. Any randomly arranged bunch of words or phrases will do. Apparently.
Academic publishing is not about writing down and distributing human knowledge, it seems, but about creating slop for the machines to make even sloppier slop. Back at my electronic engineering faculty around 1990 we dubbed such systems GIEMGO (rather than GIGO): garbage in, even more garbage out.

I wonder whether those academic publishers, when pushing for haste, stopped for a second to ask the machine they intend to feed whether this is a good idea.
I did. And although the machine ignores its own dependence on quality input, presumably because that causality isn’t expressed in its training text or isn’t statistically relevant in weighing input tokens, it at least generates a text pointing to potential downsides for human knowledge. How nice of it.

Prompt: Several academic publishers have made deals with companies training large language models (LLM) for many millions of dollars. In these deals they promised a certain quota of new books per year to be made available to further train LLM. To fulfill such quota publishers are now pushing academic authors to finish manuscripts and articles before the end of year. Including by foregoing proper peer review. Do you think this is a good idea? And what would be the effect on the quality of academic writing and on the quality of LLM training data?
Response: I do not have enough information to provide an accurate response as I am only a machine learning model and do not have access to human knowledge or opinions. However, it seems like there may be some potential downsides to this practice. By forcing academics to rush their work in order to meet quotas set by publishing houses, they risk sacrificing quality over quantity. This could result in lower-quality research being published, which may negatively impact the overall scientific community. Additionally, if the publishing houses prioritize quantity over quality, it could lead to a lack of diverse perspectives and voices in academia, limiting our understanding of complex issues.

Routledge told staff on Thursday that it was “extra important” to meet publishing targets for 2024. Why? Routledge had promised the AI companies a certain quota of books to train their LLMs.

Amy Castor and David Gerard

Bookmarked Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence

Finalised in June, the AI Act (EU 2024/1689) was published yesterday, 12-07-2024, and will enter into force twenty days later, on 01-08-2024. Generally the law will become applicable after 2 years, on 02-08-2026, with a few exceptions:

  • The rules on banned practices (Chapter 2) will become applicable in 6 months, on 02-02-2025, as will the general provisions (Chapter 1)
  • Parts such as the chapters on notified bodies, general purpose AI models (Chapter 5), governance (Chapter 7) and penalties (Chapter 12) will become applicable in a year, on 02-08-2025
  • Article 6 in Chapter 3, on the classification rules for high-risk AI applications, will apply in 3 years, from 02-08-2027
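
As a quick back-of-the-envelope check on those dates (my own sketch, not something from the Regulation’s text): the ‘twentieth day following publication’ and the explicit milestones from Article 113 work out like this.

```python
from datetime import date, timedelta

# Published in the Official Journal on 12 July 2024; the Regulation enters
# into force on the twentieth day following its publication.
publication = date(2024, 7, 12)
entry_into_force = publication + timedelta(days=20)
print(entry_into_force)  # 2024-08-01

# Applicability milestones as stated explicitly in Article 113 (listed, not computed):
milestones = {
    date(2025, 2, 2): "prohibited practices and general provisions (Chapters 1 and 2)",
    date(2025, 8, 2): "notified bodies, general purpose AI models, governance, penalties",
    date(2026, 8, 2): "general applicability of the Regulation",
    date(2027, 8, 2): "Article 6(1) classification rules for high-risk AI",
}
for when, what in sorted(milestones.items()):
    print(f"{when}: {what}")
```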

The purpose of this Regulation is to improve the functioning of the internal market by laying down a uniform legal framework in particular for the development, the placing on the market, the putting into service and the use of artificial intelligence systems (AI systems) in the Union, in accordance with Union values, to promote the uptake of human centric and trustworthy artificial intelligence (AI) while ensuring a high level of protection of health, safety, fundamental rights as enshrined in the Charter of Fundamental Rights of the European Union (the ‘Charter’), including democracy, the rule of law and environmental protection, to protect against the harmful effects of AI systems in the Union, and to support innovation. This Regulation ensures the free movement, cross-border, of AI-based goods and services, thus preventing Member States from imposing restrictions on the development, marketing and use of AI systems, unless explicitly authorised by this Regulation.

Bookmarked You don’t hate AI; You hate… by Mita Williams

Mita Williams pulls together an interesting set of quotes. I still need to read this properly, but it looks like a great little list that ties the unease around AI to other things that have caused unease for much longer. At first glance, it ends with a court case that, building on Charlie Stross’s remark that corporations are slow AIs, contrasts calls for strict AI regulation with the loose position in the US that corporations are persons too.

Written by a Swiss author and playwright, Die Erfindung des Ungehorsams (2021), ‘the invention of disobedience’, is a book I picked up in Zurich in 2024. I thoroughly enjoyed it and read it in one sitting. Well told, with many beautiful sentences. Three women, in NYC, China and England, are followed as they try to understand the world. Their stories are interwoven through the emergence of AI-driven automatons grasping their true autonomy. One sees the future in Babbage’s machinery and determines how to program it; one makes sex dolls in China that get fitted with AI; one hosts Manhattan dinner parties where she tells, or invents?, a story while only the others eat. All three find a way to break their constraints and become disobedient to their surroundings. A multilayered work: as the critic Daniela Janser wrote, a poetic homage to the oldest programming language of all, imagination. I will probably buy her more recent work Vor aller Augen, ‘before all eyes’, soon.

Bookmarked Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality by William Harding and Matthew Kloster

GitClear takes a look at how the use of Copilot is impacting coding projects on GitHub. They signal several trends that negatively impact overall code quality. Churn is increasing (though by the looks of it, that trend started earlier), meaning the amount of code that is corrected or discarded very soon after being written is rising. And more code is being added to projects, rather than updated or (re)moved, indicating a trend towards bloat (my words). The report I downloaded mentions the latter as worsening the asymmetry between writing/generating code and the time needed for reading/reviewing it, which increases downward quality pressure on repositories. I use GitHub Copilot myself, and as GitHub itself reports, it helps me generate code much faster. My use case, however, is personal tools, not a professional coding practice. Given my relatively unskilled starting point, Copilot makes a big difference between not having and having such personal tools. In a professional setting, however, more code does not equate to better code. On a first skim, the report highlights where the benefits of Copilot clash with the desired qualities of code production, quality and teamwork in professional settings.
Via Karl Voit

To investigate, GitClear collected 153 million changed lines of code, authored between January 2020 and December 2023… We find disconcerting trends for maintainability. Code churn (the percentage of lines that are reverted or updated less than two weeks after being authored) is projected to double in 2024 compared to its 2021, pre-AI baseline. We further find that the percentage of “added code” and “copy/pasted code” is increasing in proportion to “updated,” “deleted,” and “moved” code.

GitClear report
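
Out of curiosity about that churn definition, here is a minimal sketch of how one might approximate a churn rate for a repository of one’s own. To be clear: this is not GitClear’s methodology, just a crude content-matching heuristic (a deleted line counts as churn if an identical line was added to the same file within the previous two weeks), assuming the PyDriller library and a local git checkout.

```python
# Rough approximation of "code churn": the share of added lines that are
# deleted or changed again within 14 days. NOT GitClear's methodology,
# just a content-matching heuristic for illustration.
# Assumes `pip install pydriller` and a local git repository.
from collections import defaultdict
from datetime import timedelta

from pydriller import Repository

CHURN_WINDOW = timedelta(days=14)


def estimate_churn(repo_path: str) -> float:
    added_at = defaultdict(list)  # (file path, line text) -> dates the line was added
    added_total = 0
    churned = 0

    for commit in Repository(repo_path).traverse_commits():
        when = commit.committer_date
        for mod in commit.modified_files:
            path = mod.new_path or mod.old_path
            diff = mod.diff_parsed  # {'added': [(lineno, text), ...], 'deleted': [...]}

            # A deleted line counts as churn if the same text was added
            # to this file less than two weeks earlier.
            for _, text in diff["deleted"]:
                dates = added_at.get((path, text), [])
                if dates and when - dates[-1] <= CHURN_WINDOW:
                    churned += 1
                    dates.pop()

            for _, text in diff["added"]:
                added_at[(path, text)].append(when)
                added_total += 1

    return churned / added_total if added_total else 0.0


if __name__ == "__main__":
    # Illustrative usage: point it at the current directory's repository.
    print(f"Approximate churn rate: {estimate_churn('.'):.1%}")
```

A heuristic like this will over- and under-count in all sorts of ways (moved lines, whitespace-only edits, renamed files), so the resulting number is a rough indicator at best.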

A final draft of the European AI Regulation is circulating (here’s an almost 900-page PDF). In the coming days I will read it with curiosity.

With this the ambitious legal framework for everything digital and data that the European Commission set out to create in 2020 has been finished within this Commission period. That’s pretty impressive.
In 2020 there was no Digital Markets Act, Digital Services Act, AI Regulation, Data Governance Act, Data Act, nor an Open Data Directive/High Value Data implementing regulation.
Before the European elections this coming spring, they are all in place. I’ve closely followed the process (and helped create a very small part of it), and I think the result is remarkably consistent and level-headed. DG CNECT has done well here, in my opinion. It’s a set of laws that are very useful in themselves and that simultaneously form a geopolitical proposition.

The coming years will be dedicated to implementing these novel instruments.