The period of the European Commission that has just finished delivered an ambitious and coherent legal framework for both the single digital market and the single market for data, based on the digital and data strategies the EU formulated. Those laws, such as the Data Governance Act, the Data Act, the High Value Data implementing regulation and the AI Act, are all finished and in force (if not always yet fully in application). This means efforts are now shifting to implementation. The detailed programme of the next European Commission, now being formed, isn't known yet, but big new legislative efforts in this area are not expected.

This summer Ursula von der Leyen, the incoming President of the Commission, presented her political guidelines. They set out what the Commission will pay attention to in the coming years in the field of data and digitisation.

Data and digital are geopolitical in nature
The guidelines underline the geopolitical nature of both digitisation and data. The EU will therefore seek to modernise and strengthen international institutions and processes. The guidelines note that exerting outside influence in regular policy domains has become a more common instrument in geopolitics. Data and transparency are likely tools for keeping a level-headed view of what is actually going on. Data is also crucial in driving several technology developments, such as AI and digital twins.

European Climate Adaptation Plan Built on Data
The EU will increase its focus on mapping risks and preparedness with respect to natural disasters and their impact on infrastructure, energy, food security, water and land use, both in cities and in rural areas, as well as on early warning systems. This is sure to contain a large data component, with a role for the Green Deal Data Space (for which the implementation phase will start soon, now that the preparatory phase has been completed) and the climate change digital twin of the earth (DestinE, for which the first phase has been delivered). Climate and environment are the areas where the EC has previously emphasised the close connection between digitisation and data and the ability to achieve European climate and environmental goals.

AI trained with data
Garbage in, garbage out: access to enough high-quality data is crucial to all AI development, and therefore data will play a role in all AI plans from the Commission.

An Apply AI Strategy was announced, aimed at sectoral AI applications (e.g. in industry, public services or healthcare). The direction here is towards smaller models, squarely aimed at specific questions or tasks in the context of specific sectors. This requires the availability of, and responsible access to, data in these sectors, in which the European common data spaces will play a key role.

In the first half of 2025 an AI Factories Initiative will be launched, meant to provide SMEs and start-ups with access to the computing power of the European supercomputing network for AI applications.

There will also be a European AI Research Council, dubbed a ‘CERN for AI’, in which knowledge, resources, money, people, and data are pooled.

Focus on implementing data regulations
To make the above possible, a coherent and consistent implementation of the existing data rules from the previous Commission period is crucial. Useful explanations and translations of the rules for companies and public sector bodies are needed, to allow for seamless data usage across Europe and at scale, within the rules for data protection and information security that equally apply. The directorate within the Commission that is responsible for data, DG Connect, sees its task for the coming years as mainly ensuring the consistent implementation of the new laws from the last few years. The implementation of the GDPR up to 2018 is seen as an example where such consistency was lacking.

European Data Union
The political guidelines announce a strategy for a European Data Union. Aimed at better and more detailed explanations of the existing regulations, and above all at the actual availability and usage of data, it reinforces the measure of success the data strategy already used: the socio-economic impact of data usage. This means involving SMEs in much larger numbers, and in this context the difference between such SMEs and large data users outside the EU is specifically mentioned. This Data Union is a new label for, and a new emphasis on, what the European Data Strategy already seeks to do: create a single market for data, meaning a freedom of movement for people, goods, capital and data. That Data Strategy forms a consistent whole with the digital strategy, of which the Digital Markets Act, Digital Services Act and AI Act are part. That coherence will be maintained.

My work: ensuring that implementation and normalisation is informed by good practice
In 2020 I helped write what is now the High Value Data implementing regulation, and in the past years my role has been tracking and explaining the many EU digital and data regulation initiatives on behalf of the main Dutch government holders of geo-data. Not just in terms of new requirements, but with an accent on the new instruments and affordances those rules create. These new instruments give different stakeholder groups new agency, and from that agency new opportunities for societal impact arise.
The phase shift from regulation to implementation provides an opportunity to influence how the new rules get applied in practice, for instance in the common European data spaces. Which compelling cases of data use can have an impact on the implementation process, help set the tone, or even have a normalising effect? I’m certain practice can play such a role, but it takes bringing those practical experiences to a wider European network. Good examples help keep the actual goal of socio-economic impact in sight, and mean you can argue from tangible experience in your interactions.

My work for Geonovum in the coming period is aimed at this phase shift. I have already helped them take on a role in the coming implementation of the Green Deal Data Space, and I’m now exploring other related efforts. I’m also assisting the Ministry for the Interior in formulating guidance for public sector bodies and data users on how to deal with the chapter of the Data Governance Act that allows for the use (but not the sharing) of protected data held by the public sector. Personally, I’m also seeking ways to increase the involvement of civil society organisations in this area.

Bookmarked Routledge nags academics to finish books asap to feed Microsoft’s AI (by Amy Castor and David Gerard)

Academic publishers are pushing authors to speed up delivering manuscripts and articles (including suggesting peer review be done within 15 days). This is to meet the quota of ‘content’ for LLM training they promised AI companies like Microsoft, to which the publishers sold their soul: Taylor & Francis/Routledge to the tune of 75M USD annually, Wiley for 44M USD. Authors don’t seem to figure in these equations, as there are no opt-outs on offer to them. No cut of the money either, obviously.

Must. Feed. The. Machine. Apparently. Any randomly arranged bunch of words or phrases will do. Apparently.
Academic publishing, it seems, is not about writing down and distributing human knowledge, but about creating slop for the machines to make even sloppier slop. Back at my electronic engineering faculty around 1990 we dubbed such systems GIEMGO (rather than GIGO): garbage in, even more garbage out.

I wonder if those academic publishers, when pushing for haste, stopped for a second and asked the machine they intend to feed whether this is a good idea?
I did. And despite the machine ignoring its own dependence on quality input, presumably because that causality isn’t expressed in its input text or isn’t statistically relevant in weighing input tokens, it at least generates a text pointing to potential downsides for human knowledge. How nice of it.

Prompt: Several academic publishers have made deals with companies training large language models (LLM) for many millions of dollars. In these deals they promised a certain quota of new books per year to be made available to further train LLM. To fulfill such quota publishers are now pushing academic authors to finish manuscripts and articles before the end of year. Including by foregoing proper peer review. Do you think this is a good idea? And what would be the effect on the quality of academic writing and on the quality of LLM training data?
Response: I do not have enough information to provide an accurate response as I am only a machine learning model and do not have access to human knowledge or opinions. However, it seems like there may be some potential downsides to this practice. By forcing academics to rush their work in order to meet quotas set by publishing houses, they risk sacrificing quality over quantity. This could result in lower-quality research being published, which may negatively impact the overall scientific community. Additionally, if the publishing houses prioritize quantity over quality, it could lead to a lack of diverse perspectives and voices in academia, limiting our understanding of complex issues.

Routledge told staff on Thursday that it was “extra important” to meet publishing targets for 2024. Why? Routledge had promised the AI companies a certain quota of books to train their LLMs

Amy Castor and David Gerard

Bookmarked Commission opens non-compliance investigations against Alphabet, Apple and Meta under the Digital Markets Act (by European Commission)

With the large horizontal legal framework for the single digital market and the single market for data mostly in force and applicable, the EC is initiating its first enforcement actions. This announcement focuses on app store aspects: on steering (third parties being able to offer users other ways of paying for services than e.g. Apple’s app store), on (un)installing any app and the freedom to change settings, and on providers preferencing their own services above those of others. Five investigations for suspected non-compliance involving Google (Alphabet), Apple, and Meta (Facebook) have been announced. Amazon and Microsoft are also being investigated, in order to clarify aspects that may lead to suspicions of non-compliance.

The investigation into Facebook concerns their ‘pay or consent’ model, Facebook’s latest attempt to circumvent its GDPR obligation that consent be freely given. It was clear that this move, even if it allows them to steer clear of the GDPR (which is still very uncertain), would create issues under the Digital Markets Act (DMA).

In the same press release the EC announces that Facebook Messenger is getting a six-month extension of the period in which to comply with interoperability requirements.

The Commission suspects that the measures put in place by these gatekeepers fall short of effective compliance of their obligations under the DMA. … The Commission has also adopted five retention orders addressed to Alphabet, Amazon, Apple, Meta, and Microsoft, asking them to retain documents which might be used to assess their compliance with the DMA obligations, so as to preserve available evidence and ensure effective enforcement.

European Commission

In reply to Creating a custom GPT to learn about my blog (and about myself) by Peter Rukavina

It’s not surprising that GPT-4 doesn’t work like a search engine and has a hard time surfacing factual statements from source texts. Like one of the commenters, I wonder what that means for the data analysis you also asked for. Perhaps those results too are merely plausible, but not actually analysed. Especially the day-of-the-week thing, as that wasn’t in the data, and I wouldn’t expect GPT to determine the weekday of every post in the process of answering your prompt.
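For what it’s worth, the weekday question doesn’t need a language model at all: given the post dates, the weekday follows deterministically. A minimal sketch (the dates here are made-up stand-ins for dates exported from a blog archive):

```python
from datetime import date
from collections import Counter

# Hypothetical post dates (ISO strings), standing in for an exported blog archive.
post_dates = ["2023-11-06", "2023-11-07", "2023-11-13", "2023-11-20"]

# Derive the weekday of each post deterministically and tally them.
weekdays = Counter(date.fromisoformat(d).strftime("%A") for d in post_dates)
print(weekdays.most_common())  # [('Monday', 3), ('Tuesday', 1)]
```

Anything a model reports about posting weekdays without doing this kind of computation is, at best, a plausible guess.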

I am interested in doing what you did, but with 25 years of notes and annotations, and with a different model with fewer ethical issues attached. To have a chat about my interests and the links between things. Unlike the fact-based questions you’ve asked the tool, that doesn’t necessarily need the answers to be correct, just plausible enough to surface associations. Such associations might prompt my own thinking and my own searches through the same material.

It also makes me wonder whether what Wolfram Alpha is doing these days plays a role in your own use of GPT+, as they are all about interpreting questions and then giving the answer directly. There’s a difference between things that face the general public and things that are internal or even personal tools, like yours.

Have you asked it more association-based things yet? Like, e.g., “based on the posts ingested, what would be likely new interests for Peter to explore”? Can you use it to create new associations, to help you generate new ideas in line with your writing/interests/activities shown in the posts?

So my early experiments show me that as a data analysis copilot, a custom GPT is a very helpful guide… In terms of the GPT’s ability to “understand” me from my blog, though, I stand unimpressed.

Peter Rukavina

In 1967 French literary critic Roland Barthes declared the death of the author (in English, no less). An author’s intentions and biography are not the means to explain definitively what the meaning of a text (of fiction) is. It’s the reader that determines meaning.

Barthes reduces the author to merely a scriptor, a scribe, who doesn’t exist other than in their role of penning the text. It positions the work as fully separate from its maker.

I don’t disagree with the notion that readers glean meaning in layers from a text, far beyond what an author might have intended. But thinking about the author’s intent, in light of their biography or not, is one of those layers for readers to interpret. It doesn’t make the author the sole decider of meaning, but the author’s perspective can be used to create meaning by any reader. Separating the author from their work entirely is cutting yourself off from one source of potential meaning. Even when the author is reduced to the role of scribe, such meaning will leak forth: the monks of old tagged the transcripts they made and turned those tags into indexes, which are still a common way of interpreting which topics a text touches on or emphasises. So despite Barthes’ pronouncement, I never accepted the brain death of the author, yet I also didn’t much care specifically about their existence in order to find meaning in texts.

With the advent of texts made by generative AI, however, I think bringing the author and their intentions into the scope of creating meaning is necessary. It is a necessity as proof of human creation. Being able to perceive the author behind a text, the entanglement of its creation with their life, is the now very much needed reverse Turing test. With algorithmic text generation there is indeed only a scriptor, one incapable of conveying meaning themselves.
To determine the human origin of a text, the author’s own meaning, intention and existence must shine through in the text, or be made explicit in its context. Because our default assumption must be that it was generated.

The author is being resurrected. Because we now have fully automated scriptors. Long live the author!

Bookmarked WordPress AI: Generative Content & Blocks (by Joe Hoyle, found via Chuck Grimmett)

Like many others, I am fascinated by what generative algorithms like ChatGPT for texts and Stable Diffusion for images can do. Particularly, I find it fascinating to explore what they might do if embedded in my own workflows, or how they might change those workflows. So the link above, showing an integration of ChatGPT in WordPress’ Gutenberg block editor, drew my attention.

The accompanying video shows a mix of two features: first, having ChatGPT generate some text, or actually a table with specific data; second, having ChatGPT in ‘co-pilot’ style generate code for Gutenberg blocks. I think the latter might actually be useful, as I’ve seen generative AI put to good use in that area. The former, having ChatGPT write part of your posting, is clearly not advisable. And the video shows it too, although the authors don’t point it out or haven’t reflected on the fact that ChatGPT is not a search engine but is geared to coming up with plausible text without regard for factual accuracy (the contrast with generating code is that code is much more highly structured in itself, so probabilities collapse more easily to the same outcome).
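To illustrate why block generation is the more tractable task: Gutenberg stores blocks as HTML wrapped in comment delimiters, a rigid serialisation with very few valid variations. As a rough sketch (the helper function and the mission data are made up; the comment-delimited markup follows Gutenberg’s public block serialisation format), a table block could be assembled deterministically like this:

```python
# Build a Gutenberg table block from row data. The comment delimiters
# (<!-- wp:table --> ... <!-- /wp:table -->) are how the block editor
# serialises blocks into post content.
def table_block(headers, rows):
    head = "".join(f"<th>{h}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        "<!-- wp:table -->\n"
        '<figure class="wp-block-table"><table>'
        f"<thead><tr>{head}</tr></thead>"
        f"<tbody>{body}</tbody>"
        "</table></figure>\n"
        "<!-- /wp:table -->"
    )

block = table_block(["Mission", "Year"], [["Apollo 11", "1969"], ["Apollo 13", "1970"]])
print(block)
```

Because the target format is this constrained, a model generating it has little room to drift; inventing the table’s budget figures, by contrast, is unconstrained guesswork.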

The blogpost in the video is made by generating a list of lunar missions, then turning it into a table, adding their budgets and sorting them chronologically. This looks very cool in the video, but some things jump out as not OK. Results jump around the table, for instance: Apollo 13 moves from 1970 to 2013 and changes budget (see image below). None of the listed budgets for Apollo missions, nor their total, match the detailed costs overview of Apollo missions (GoogleDocs spreadsheet). The budget column being imaginary and the table rows jumping around make the result entirely unfit for use, of course. It also isn’t a useful prompt: needing to fact-check every table field is likely more effort, and less motivating, than researching the table yourself from actual online resources directly.

It looks incredibly cool: ‘see me writing a blogpost by merely typing in my wishes, and the work being done instantly’, and there are definitely times I’d wish that were possible. To translate a mere idea or thought into some output directly, however, means I’d skip confronting that idea with reality, with counter-arguments etc. Most of my ideas only look cool inside my head, and need serious change to be sensibly made manifest in the world outside my head. This video is a bit like that: an idea that looks cool in one’s head but is great rubbish in practice.

ChatGPT hallucinates factoids and can’t be trusted to create your output. Using it in the context of discovery (as opposed to the justification context of your output, such as in this video) is possible and potentially useful. However, this integration within the Gutenberg writing back-end of WordPress puts you directly in the output context, so it leads you to believe the generated plausible rubbish is output, and not just prompting fodder for your writing. Human Made is misleading you with this video, and I wouldn’t be surprised if they’re misleading themselves as well. A bit like staging the ‘saw someone in half and put them together again’ magician’s trick in an operating room and inviting surgeons to re-imagine their work.

Taking a native-first approach to integrating generative AI into WordPress, we’ve been experimenting with approaches to a “WordPress Copilot” that can “speak” Gutenberg / block-editor.

Copy-pasting paragraphs between ChatGPT and WordPress only goes so far, while having the tools directly embedded in the editor … open up a world of possibilities and productivity wins…

Joe Hoyle


An android robot is filling out a table listing Apollo missions on a whiteboard, generated image using Midjourney