Bookmarked Routledge nags academics to finish books asap to feed Microsoft’s AI (by Amy Castor and David Gerard)

Academic publishers are pushing authors to deliver manuscripts and articles faster (including suggesting that peer review be done within 15 days). This is to meet the quota of ‘content’ for LLM training that the publishers promised the AI companies they sold their souls to, like Microsoft and OpenAI: Taylor & Francis/Routledge to the tune of 75M USD annually, Wiley for 44M USD. Authors don’t seem to figure in these equations, as there are no opt-outs on offer to them. No cut of the money either, obviously.

Must. Feed. The. Machine. Apparently. Any randomly arranged bunch of words or phrases will do. Apparently.
Academic publishing, it seems, is not about writing down and distributing human knowledge, but about creating slop for the machines to make even sloppier slop from. Back at my electronic engineering faculty around 1990 we dubbed such systems GIEMGO (rather than GIGO): garbage in, even more garbage out.

I wonder if those academic publishers, in pushing for haste, stopped for a second to ask the machine they intend to feed whether this is a good idea?
I did. And although the machine ignores its own dependence on quality input, presumably because that causality isn’t expressed in its training text or isn’t statistically relevant in weighing input tokens, it at least generates a text pointing to potential downsides for human knowledge. How nice of it.

Prompt: Several academic publishers have made deals with companies training large language models (LLMs) for many millions of dollars. In these deals they promised a certain quota of new books per year to be made available to further train LLMs. To fulfill such quotas, publishers are now pushing academic authors to finish manuscripts and articles before the end of the year, including by forgoing proper peer review. Do you think this is a good idea? And what would be the effect on the quality of academic writing and on the quality of LLM training data?
Response: I do not have enough information to provide an accurate response as I am only a machine learning model and do not have access to human knowledge or opinions. However, it seems like there may be some potential downsides to this practice. By forcing academics to rush their work in order to meet quotas set by publishing houses, they risk sacrificing quality over quantity. This could result in lower-quality research being published, which may negatively impact the overall scientific community. Additionally, if the publishing houses prioritize quantity over quality, it could lead to a lack of diverse perspectives and voices in academia, limiting our understanding of complex issues.

Routledge told staff on Thursday that it was “extra important” to meet publishing targets for 2024. Why? Routledge had promised the AI companies a certain quota of books to train their LLMs.

Amy Castor and David Gerard

Bookmarked Latticework: Unifying annotation and freeform text editing for augmented sensemaking by Matthew Siu and Andy Matuschak

Back in early February I got a chance to work with a beta tool for sensemaking in my notes. See my impressions at the time. Matthew Siu and Andy Matuschak watched me for an hour as I used their prototype tool to start shaping a workshop design from various inputs. I was intrigued and enthusiastic, but a few weeks later I stopped using it due to some tech glitches. Today Maarten den Braber pointed me in an e-mail to Latticework, a write-up from last June describing the project as it stood at the end. It’s an interesting read, which I annotated (if you read those annotations, start at the bottom of the page to read them from the top of the article, or use Hypothes.is to see them in context; there’s no way to link to the overview directly for non-users, I think).

I re-installed the plugin in Obsidian, and will work with it some more. Here’s hoping the original glitches no longer occur.

We had a strong personal motivation for this project: we often find ourselves stuck in our own creative work. Latticework’s links might make you think of citations and primary sources—tools for finding the truth in a rigorous research process. But our work on Latticework was mostly driven by the problems of getting emotionally stuck, of feeling disconnected from our framing of the project or our work on it.

Matthew Siu and Andy Matuschak

When I was at university and my electronic engineering student association got an internet connection at the very end of the 80s, we named our servers. In the early 90s we had Utelscin (a mix of the (sub)domain names for the uni, faculty and association), and Bettie. Bettie was the mail server, short for Bettie Serveert, ‘Bettie serves’, after a Dutch alternative rock band (the band in turn was named after the title of a book on tennis by Dutch tennis player Betty Stöve).

Just now I was going through some papers on language and thinking by Dr. Evelina Fedorenko at MIT’s EvLab, where I came across a statement that they name the lab’s hardware after scientists and engineers in history who did not get sufficient credit for their contributions. I like that.

Screenshot of the EvLab website stating they name hardware after scientists, with links to those names

Maybe we should do something like that in our company too, for undercredited people in the fields we are active in.

In reply to It is bigger than a tiny little textbox by Dave Winer

What is bigger than a tiny little textbox, like the ones we get on social platforms, yet smaller than a full-blown CMS, like the editing back-end of my WordPress site? asks Dave Winer. My current answer to that is: where I’m writing this reply now.

In mid-2022 Dave Winer talked about two-way RSS, which by the end of 2023 had morphed into textcasting. Now he’s looking at an editor that would work like that.

In my personal feed reader I added a form to post responses. You see Dave Winer’s posting that I’m responding to, and the response form.

The editor I am writing this in is a simple web form underneath an entry in my feed reader (see the image above). It allows me to respond while I’m reading feeds, and then move on to reading the next item.

The editor allows me to set a title, keep the title of the thing I’m responding to, or have no title. It can cater to different types of response (bookmark, favourite, reply). It can send to several WordPress sites (my blog, my company’s, the Dutch IndieWeb community site, and my company’s internal team site), as a post or a page.

Me writing this post in the response form in my feed reader.

But it doesn’t just post to websites. It can also post an online annotation to my Hypothes.is account (the ‘H.’ response option at the top), and it can post to my local Obsidian markdown notes (the ‘obs’ site option underneath the edit boxes).
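
To give a flavour of what that takes: the Hypothes.is side is a single call to their public annotations API, and the Obsidian side is just writing a Markdown file into the vault folder. A minimal Python sketch, assuming a personal API token; the function names and vault path are placeholders for illustration, not the actual code behind my form:

```python
import requests
from pathlib import Path

H_TOKEN = "…"  # personal Hypothes.is API token (assumed available)

def annotate(uri: str, text: str, tags: list[str]) -> str:
    """Create a page-level Hypothes.is annotation via their API."""
    resp = requests.post(
        "https://api.hypothes.is/api/annotations",
        headers={"Authorization": f"Bearer {H_TOKEN}"},
        json={"uri": uri, "text": text, "tags": tags},
    )
    resp.raise_for_status()
    return resp.json()["id"]

def post_to_obsidian(vault: str, title: str, markdown: str) -> Path:
    """'Posting' to local Obsidian notes is simply writing a Markdown
    file into the vault; Obsidian picks it up by itself."""
    note = Path(vault) / f"{title}.md"
    note.write_text(markdown, encoding="utf-8")
    return note
```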

It accepts categories and tags as the same thing: the receiving site or location determines whether one of the keywords is a category locally and treats the rest as tags.
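
As a sketch of how that receiving-side rule could look (a hypothetical helper, assuming the site knows its own category list; not my actual code):

```python
def split_keywords(keywords: list[str], local_categories: set[str]):
    """Treat any keyword the receiving site recognises as a category;
    everything else becomes a tag."""
    categories = [kw for kw in keywords if kw in local_categories]
    tags = [kw for kw in keywords if kw not in local_categories]
    return categories, tags

# e.g. split_keywords(["indieweb", "blogging"], {"blogging"})
# -> (["blogging"], ["indieweb"])
```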

It doesn’t use RSS except as the source of the item I respond to; it uses the Micropub standard to talk to websites. It could use RSS or OPML too. It accepts HTML and posts it as Markdown to my notes. I just started tinkering with my feed reader and response form again, so I can take Dave’s question into account while doing that.
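
For those unfamiliar with Micropub: a reply like this one boils down to a single authenticated, form-encoded POST to the target site’s Micropub endpoint. A minimal sketch, where the endpoint URL and token are placeholders (in reality the endpoint is discovered from the site’s rel="micropub" link and the token obtained through IndieAuth):

```python
import requests

MICROPUB_ENDPOINT = "https://example.com/micropub"  # placeholder
ACCESS_TOKEN = "…"  # placeholder, obtained via IndieAuth

def micropub_reply(content: str, in_reply_to: str, keywords: list[str]) -> str:
    """Create a reply post as a Micropub h-entry."""
    data = [
        ("h", "entry"),
        ("content", content),
        ("in-reply-to", in_reply_to),
    ]
    # multiple values go as repeated category[] fields in form encoding
    data += [("category[]", kw) for kw in keywords]
    resp = requests.post(
        MICROPUB_ENDPOINT,
        data=data,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    )
    resp.raise_for_status()
    return resp.headers.get("Location", "")  # URL of the created post
```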

Now, the question: What’s between a tiny little text box and a full-blown content management system?
The question we intend to answer.
That’s what textcasting is for, to identify the essential features. This editor supports them.

Dave Winer

Bookmarked Commission opens non-compliance investigations against Alphabet, Apple and Meta under the Digital Markets Act (by European Commission)

With the large horizontal legal framework for the single digital market and the single market for data mostly in force and applicable, the EC is initiating its first actions. This announcement focuses on app store aspects: on steering (third parties being able to offer users other ways of paying for services than e.g. Apple’s App Store), on (un-)installing any app and the freedom to change settings, as well as on providers preferencing their own services above those of others. Five investigations for suspected non-compliance involving Google (Alphabet), Apple, and Meta (Facebook) have been announced. Amazon and Microsoft are also being investigated, to clarify aspects that may give rise to suspicions of non-compliance.

The investigation into Facebook concerns their ‘pay or consent’ model, Facebook’s latest attempt to circumvent their GDPR obligation that consent be freely given. It was clear that this move, even if it allows them to steer clear of the GDPR (which is still very uncertain), would create issues under the Digital Markets Act (DMA).

In the same press release the EC announces that Facebook Messenger is getting a six-month extension of the period in which to comply with interoperability obligations.

The Commission suspects that the measures put in place by these gatekeepers fall short of effective compliance of their obligations under the DMA. … The Commission has also adopted five retention orders addressed to Alphabet, Amazon, Apple, Meta, and Microsoft, asking them to retain documents which might be used to assess their compliance with the DMA obligations, so as to preserve available evidence and ensure effective enforcement.

European Commission

Bookmarked Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality by William Harding and Matthew Kloster

GitClear takes a look at how the use of Copilot is impacting coding projects on GitHub. They signal several trends that impact overall code quality negatively. Churn is increasing (though by the looks of it, that trend started earlier), meaning the amount of code that is very quickly corrected or discarded is rising. And more code is being added to projects, rather than updated or (re)moved, indicating a trend towards bloat (my words). The report I downloaded mentions the latter as worsening the asymmetry between writing/generating code and the time needed to read/review it, which increases downward quality pressure on repositories. I use GitHub Copilot myself, and like GitHub itself reports, it helps me generate code much faster. My use case however is personal tools, not a professional coding practice. Given my relatively unskilled starting point, Copilot makes a big difference between not having and having such personal tools. In a professional setting, however, more code does not equate to better code. On first skim, the report highlights where the benefits of Copilot clash with desired qualities of code production, quality and teamwork in professional settings.
Via Karl Voit

To investigate, GitClear collected 153 million changed lines of code, authored between January 2020 and December 2023… We find disconcerting trends for maintainability. Code churn — the percentage of lines that are reverted or updated less than two weeks after being authored — is projected to double in 2024 compared to its 2021, pre-AI baseline. We further find that the percentage of “added code” and “copy/pasted code” is increasing in proportion to “updated,” “deleted,” and “moved” code.

Gitclear report
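
The churn definition quoted above is easy to turn into arithmetic. A minimal sketch, assuming you have already extracted, for every authored line, when it was written and when (if ever) it was next updated or reverted — the hard part that GitClear’s tooling does; this is not their code:

```python
from datetime import datetime, timedelta

def churn_rate(line_events: list[tuple[datetime, datetime | None]]) -> float:
    """Share of lines reverted or updated less than two weeks after
    being authored (the report's definition of code churn).

    line_events: one (authored_at, next_changed_at or None) per line.
    """
    window = timedelta(days=14)
    total = len(line_events)
    churned = sum(
        1 for authored, changed in line_events
        if changed is not None and changed - authored < window
    )
    return churned / total if total else 0.0
```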