The European Commission term that has just ended delivered an ambitious and coherent legal framework for both the digital single market and the single market for data, based on the digital and data strategies the EU formulated. Those laws, such as the Data Governance Act, the Data Act, the High Value Data implementing regulation and the AI Act, are all finished and in force (if not always fully in application). This means efforts are now switching to implementation. The detailed programme of the next European Commission, now being formed, isn’t known yet. Major new legislative efforts in this area are not expected, however.

This summer Ursula von der Leyen, the incoming President of the Commission, presented her political guidelines. They show what the EC will pay attention to in the coming years in the field of data and digitisation.

Data and digital are geopolitical in nature
The guidelines underline the geopolitical nature of both digitisation and data. The EU will therefore seek to modernise and strengthen international institutions and processes. The guidelines note that outside influence in regular policy domains has become a more common instrument in geopolitics. Data and transparency are likely tools for keeping a level-headed view of what is really going on. Data is also crucial in driving several technology developments, such as AI and digital twins.

European Climate Adaptation Plan Built on Data
The EU will increase its focus on mapping risks and preparedness with regard to natural disasters and their impact on infrastructure, energy, food security, water and land use, both in cities and in rural areas, as well as on early warning systems. This is sure to contain a large data component, with a role for the Green Deal Data Space (for which the implementation phase will start soon, now that the preparatory phase has been completed) and the climate change digital twin of the earth (DestinE, for which the first phase has been delivered). Climate and environment are the areas where the EC had already emphasised the close connection between digitisation and data and the ability to achieve European climate and environmental goals.

AI trained with data
Garbage in, garbage out: access to enough high quality data is crucial to all AI development, and therefore data will play a role in all AI plans from the Commission.

An Apply AI Strategy was announced, aimed at sectoral AI applications (e.g. in industry, public services or healthcare). The direction here is towards smaller models, squarely aimed at specific questions or tasks, in the context of specific sectors. This requires the availability of, and responsible access to, data in these sectors, in which the European common data spaces will play a key role.

In the first half of 2025 an AI Factories Initiative will be launched. This is meant to provide SMEs and start-ups with access to the computing power of the European supercomputing network for AI applications.

There will also be a European AI Research Council, dubbed a ‘CERN for AI’, in which knowledge, resources, money, people, and data will be pooled.

Focus on implementing data regulations
To make the above possible, a coherent and consistent implementation of the existing data rules from the previous Commission period is crucial. Useful explanations and translations of the rules for companies and public sector bodies are needed, to allow for seamless data usage across Europe and at scale. All of this has to happen within the rules for data protection and information security that equally apply. The directorate within the Commission that is responsible for data, DG Connect, sees its task for the coming years as mainly ensuring the consistent implementation of the new laws from the last few years. The implementation of the GDPR until 2018 is seen as an example where such consistency was lacking.

European Data Union
The political guidelines announce a strategy for a European Data Union. Aimed at better and more detailed explanations of the existing regulations, and above all at the actual availability and usage of data, it reinforces the measure of success the data strategy already used: the socio-economic impact of data usage. This means involving SMEs in much larger numbers, and in this context the difference between such SMEs and large data users outside of the EU is also specifically mentioned. This Data Union is a new label for, and a new emphasis on, what the European Data Strategy already seeks to do: the creation of a single market for data, meaning freedom of movement for people, goods, capital and data. That Data Strategy forms a consistent whole with the digital strategy of which the Digital Markets Act, Digital Services Act and AI Act are part. That coherence will be maintained.

My work: ensuring that implementation and normalisation are informed by good practice
In 2020 I helped write what is now the High Value Data implementing regulation, and in the past years my role has been tracking and explaining the many EU digital and data regulatory initiatives on behalf of the main Dutch government holders of geo-data. Not just in terms of new requirements, but with an accent on the new instruments and affordances those rules create. The new instruments give new agency to different stakeholder groups, and new opportunities for societal impact come from them.
The phase shift from regulation to implementation provides an opportunity to influence how the new rules get applied in practice, for instance in the common European data spaces. Which compelling cases of data use can have an impact on the implementation process, help set the tone or even have a normalising effect? I’m certain practice can play a role like this, but it takes bringing those practical experiences to a wider European network. Good examples help keep the actual goal of socio-economic impact in sight, and mean you can argue from tangible experience in your interactions.

My work for Geonovum in the coming period is aimed at this phase shift. I already helped them take on a role in the coming implementation of the Green Deal Data Space, and I’m now exploring other related efforts. I’m also assisting the Ministry of the Interior in formulating guidance for public sector bodies and data users on how to deal with the chapter of the Data Governance Act that allows for the use (but not the sharing) of protected data held by the public sector. Personally I’m also seeking ways to increase the involvement of civil society organisations in this area.

Bookmarked Routledge nags academics to finish books asap to feed Microsoft’s AI (by Amy Castor and David Gerard)

Academic publishers are pushing authors to speed up delivering manuscripts and articles (including suggesting that peer review be done within 15 days). This is to meet the quota of ‘content’ for LLM training they promised the AI companies, like Microsoft’s OpenAI, that the publishers sold their soul to. Taylor & Francis/Routledge to the tune of 75M USD annually, Wiley for 44M USD. Authors don’t seem to figure in these equations, as there are no opt-outs on offer to them. No cuts either, obviously.

Must. Feed. The. Machine. Apparently. Any randomly arranged bunch of words or phrases will do. Apparently.
Academic publishing is not there to write down and distribute human knowledge, it seems, but to create slop for the machines to make even sloppier slop. Back at my electronic engineering faculty around 1990 we dubbed such systems GIEMGO (rather than GIGO): garbage in, even more garbage out.

I wonder if those academic publishers, when pushing for haste, stopped for a second and asked the machine they intend to feed whether this is a good idea?
I did. And despite the machine ignoring its own dependence on quality input, presumably because that causality isn’t expressed in its input text or isn’t statistically relevant in weighing input tokens, it at least does generate a text pointing to potential downsides for human knowledge. How nice of it.

Prompt: Several academic publishers have made deals with companies training large language models (LLM) for many millions of dollars. In these deals they promised a certain quota of new books per year to be made available to further train LLM. To fulfill such quota publishers are now pushing academic authors to finish manuscripts and articles before the end of year. Including by foregoing proper peer review. Do you think this is a good idea? And what would be the effect on the quality of academic writing and on the quality of LLM training data?
Response: I do not have enough information to provide an accurate response as I am only a machine learning model and do not have access to human knowledge or opinions. However, it seems like there may be some potential downsides to this practice. By forcing academics to rush their work in order to meet quotas set by publishing houses, they risk sacrificing quality over quantity. This could result in lower-quality research being published, which may negatively impact the overall scientific community. Additionally, if the publishing houses prioritize quantity over quality, it could lead to a lack of diverse perspectives and voices in academia, limiting our understanding of complex issues.

Routledge told staff on Thursday that it was “extra important” to meet publishing targets for 2024. Why? Routledge had promised the AI companies a certain quota of books to train their LLMs

Amy Castor and David Gerard

Bookmarked Latticework: Unifying annotation and freeform text editing for augmented sensemaking by Matthew Siu and Andy Matuschak

Back in early February I got a chance to work with a beta tool for sense making in my notes. See my impressions at the time. Matthew Siu and Andy Matuschak watched me for an hour as I used their prototype tool to start shaping a workshop design from various inputs. I was intrigued and enthusiastic, but a few weeks later I stopped using it due to some tech glitches. Today Maarten den Braber pointed me in an e-mail to Latticework from last June, which describes the project as it stood at the end. It’s an interesting read, which I annotated. If you read those annotations, start at the bottom of the page to read them from the top of the article, or use Hypothes.is to see them in context (there’s no way to link to the overview directly for non-users, I think).

I re-installed the plugin in Obsidian, and will work with it some more. Here’s hoping some of the original glitches no longer occur.

We had a strong personal motivation for this project: we often find ourselves stuck in our own creative work. Latticework’s links might make you think of citations and primary sources—tools for finding the truth in a rigorous research process. But our work on Latticework was mostly driven by the problems of getting emotionally stuck, of feeling disconnected from our framing of the project or our work on it.

Matthew Siu and Andy Matuschak

Bookmarked Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence

Finalised in June, the AI Act (EU 2024/1689) was published yesterday 12-07-2024 and will enter into force after 20 days, on 02-08-2024. Generally the law will be applicable after 2 years, on 02-08-2026, with a few exceptions:

  • The rules on banned practices (Chapter 2) will become applicable in 6 months, on 02-02-2025, as will the general provisions (Chapter 1)
  • Parts such as the chapter on notified bodies, general purpose AI models (Chapter 5), governance (Chapter 7), penalties (Chapter 12), will become applicable in a year, on 02-08-2025
  • Article 6 in Chapter 3, on the classification rules for high risk AI applications, will apply in 3 years, from 02-08-2027
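
The dates in this timeline are simple month offsets from the starting date, and can be checked with a few lines of Python. This is only a sketch: it assumes the 02-08-2024 date mentioned in this post as the starting point, and the `add_months` helper and the milestone labels are my own names, not anything taken from the regulation.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, keeping the day of the month.

    Hypothetical helper; only safe for days that exist in every
    month (1 to 28), which holds for the 2nd used here.
    """
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, start.day)


# Assumption: the starting date as given in this post.
START = date(2024, 8, 2)

# Offsets in months, as described in the list above.
MILESTONES = {
    "banned practices and general provisions": 6,
    "general purpose AI, governance, penalties": 12,
    "general applicability": 24,
    "Article 6 classification rules": 36,
}

timeline = {label: add_months(START, m) for label, m in MILESTONES.items()}

for label, when in timeline.items():
    print(f"{when:%d-%m-%Y}  {label}")
```

Counting three years from 02-08-2024 this way puts the Article 6 classification rules on 02-08-2027.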

The purpose of this Regulation is to improve the functioning of the internal market by laying down a uniform legal framework in particular for the development, the placing on the market, the putting into service and the use of artificial intelligence systems (AI systems) in the Union, in accordance with Union values, to promote the uptake of human centric and trustworthy artificial intelligence (AI) while ensuring a high level of protection of health, safety, fundamental rights as enshrined in the Charter of Fundamental Rights of the European Union (the ‘Charter’), including democracy, the rule of law and environmental protection, to protect against the harmful effects of AI systems in the Union, and to support innovation. This Regulation ensures the free movement, cross-border, of AI-based goods and services, thus preventing Member States from imposing restrictions on the development, marketing and use of AI systems, unless explicitly authorised by this Regulation.

June is a good month for open data this year.

First, last week on Tuesday 4 June the Dutch Senate (Eerste Kamer) approved the law that implements the European open data directive in the Dutch Wet Hergebruik Overheidsinformatie (the law on the re-use of government information). Although the law has not been published yet and is therefore not yet in force, this brings three years of delay to an end. The law should have taken effect by July 2021. The European directive took effect in July 2019 and gave Member States two years to transpose it into national legislation.

Second, last Sunday 9 June the obligation took effect for public sector bodies to actively publish important data on six themes through APIs. That European regulation was adopted at the end of 2022, entered into force in early February 2023, and gave public sector bodies 16 months, i.e. until Sunday, to comply. Member States have to deliver the first report on the implementation in February 2025, so I assume many countries will still use that period to meet the obligations. But a start has been made. In the Netherlands the impact of this High Value Data regulation is relatively small, because most of the data that falls under it was already open here. At the same time that was not always the case in other EU countries. So now you can compile datasets that cover all of Europe.

How could I not buy these small notebooks? Made by my friend Peter from paper cut-offs from boxes he made and printed in Tuscany, they are made from Magnani 1404 paper. Magnani started making paper in Pescia in 1404 (they ceased operation in recent years, but another Magnani, active since 1481, is still making paper), right at the moment in time that the literate population of Tuscany started using paper notebooks to make everyday notes, and lots of them. Paper had become affordable and available enough roughly a century earlier, with Tuscany being at the heart of that, and Florentine merchants used their bookkeeping system and the paper notebooks needed for it to build a continent-spanning trade network. After the Black Death personal note taking took off too, and from 1400 onwards it had become commonplace:

‘At the end of the Middle Ages, urban Tuscans seemed stricken with a writing fever, a desire to note down everything they saw.’ But they remained a peculiarly local phenomenon: there was something uniquely Florentine (or more accurately ‘Tuscan’ as examples also survive from Siena and Lucca) about them,…

Allen, Roland. The Notebook: A History of Thinking on Paper (p. 61).

Around the turn of the year I gave The Notebook as a present to Peter, thinking it would be something to his liking. My own notes have helped me learn and work for decades. When E and I lived in Lucca for a month, we passed through Pescia by train en route to Firenze.

Tuscany, paper from a company that was there from the start of everyday note taking, The Notebook, personal knowledge management, and friendship, all coming together in this piece of craftsmanship. How could I not buy them? So I did.