ELLIS as the CERN for AI

In an open letter (PDF) a range of institutions call upon their respective European governments to create ELLIS, the European Lab for Learning and Intelligent Systems. It’s an effort to fortify against brain drain, and instead attract top talent to Europe. It points to Europe’s currently weak position in AI, caught between what is happening in the USA and in China, which adds a geo-political dimension. The letter calls not so much for an institution with a large headcount, but for a commitment to long term funding to attract and keep the right people. Similar reasons led to the founding of CERN, now a global center for physics (and a key driver of things like open access to research and open research data), and more recently the European Molecular Biology Laboratory.

At the core the signatories see France and Germany as most likely to act to start this intergovernmental initiative. It seems this nicely builds upon the announcement by French president Macron in late March to invest heavily in AI, and keep / attract the right people for it. He too definitely sees the European dimension to this, and even puts European and enlightenment values at the core of it, although he acted within his primary scope of agency, France itself.

(via this Guardian article)

Time for an RSS Revival

Wired is calling for an RSS revival.

RSS is the most important piece of internet plumbing for following new content from a wide range of sources. It allows you to download new updates from your favourite sites automatically and read them at your leisure. Dave Winer, forever dedicated to the open web, created it.
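To make the plumbing a bit more concrete, here is a minimal sketch in Python of what a reader does with a feed: parse the XML and keep only the items it has not seen before. The sample feed, the example.org links and the function name new_items are all made up for illustration, and a real reader would of course fetch the feed over HTTP and cope with more feed variants than this.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed (hypothetical content, for illustration only).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.org/1</link>
      <guid>https://example.org/1</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.org/2</link>
      <guid>https://example.org/2</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (title, link) pairs for items not yet in seen_guids.

    seen_guids is a set that is updated in place, so calling this again
    with the same feed returns nothing new.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # Fall back to the link when a feed has no guid element.
        guid = item.findtext("guid") or item.findtext("link")
        if guid not in seen_guids:
            fresh.append((item.findtext("title"), item.findtext("link")))
            seen_guids.add(guid)
    return fresh
```

Calling new_items with a set that already contains the first post’s guid returns only the second post, and a repeat call returns an empty list. That is all ‘following’ a site really amounts to.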

I used to be a very heavy RSS user. I tracked hundreds of sources on a daily basis. Not as news but as a way to stay informed about the activities and thoughts of people I was interested in. At some point, that stopped working. Popular RSS readers were discontinued, most notably Google’s RSS reader, many people migrated to the Facebook timeline, platforms like Twitter stopped providing RSS feeds to make you visit their platform, and many people stopped blogging. But with FB in the spotlight, there is some interest in refocusing on the open web, and with it on RSS.

Currently I am repopulating from scratch my RSS reading ‘antenna’, following around 100 people again.

Wired in its call for an RSS revival suggests a few RSS readers. I, as I always have, use a desktop RSS reader, which currently is ReadKit. The FB timeline presents stuff to you based on their algorithmic decisions. As mentioned I definitely would like to have smarter ways of shaping my own information diet, but then with me in control and not the one being commoditised.

So it’s good to read that RSS Reader builders are looking at precisely that.
“Machines can have a big role in helping understand the information, so algorithms can be very useful, but for that they have to be transparent and the user has to feel in control. What’s missing today with the black-box algorithms is where they look over your shoulder, and don’t trust you to be able to tell what’s right,” says Edwin Khodabakchian, cofounder and CEO of RSS reader Feedly (which currently has 14 million users). That is more or less precisely my reasoning as well.

Suggested Reading: GDPR, Fintech, China and more

Some links I think worth reading today.

GDPR as De Facto Norm: Sonos Speakers

Just received an email from Sonos (the speaker system for streaming) about the changes they are making to their privacy statement. Like with FB in my previous posting this is triggered by the GDPR starting to be enforced from the end of May.

The mail reads in part

We’ve made these changes to comply with the high demands made by the GDPR, a law adopted in the European Union. Because we think that all owners of Sonos equipment deserve these protections, we are implementing these changes globally.

This is precisely the hoped-for effect, I think. Setting high standards in a key market will lift those standards globally. It is usually more efficient to work internally according to one standard than to maintain two or more in parallel. Good to see it happening, as it is a starting point for the positioning of Europe as a distinct player in global data politics, with ethics by design as the distinctive proposition. GDPR isn’t written as a source of red tape and compliance costs, but to level the playing field and enable companies to compete by building on data protection compliance (by demanding ‘data protection by design’ and following ‘state of the art’, which are both rising thresholds). Non-compliance in turn is becoming the more costly option (if GDPR really gets enforced, that is).

Facebook GDPR Changes Unimpressive

It seems, from a preview for journalists, that the GDPR changes that Facebook will be making to its privacy controls, and especially the data controls a user has, are rather unimpressive. I had hoped that with the new option to select ranges of your data for download, you would also be able to delete specific ranges of data. This would be a welcome change as current options are only deleting every single data item by hand, or deleting everything by deleting your account. Under the GDPR I had expected more control over data on FB.

It also seems they still keep the design imbalanced, favouring ‘let us do anything’ as the simplest route for users to click through, and presenting other options very low key, and the account deletion option still not directly accessible in your settings.

They may or may not be deemed to have done enough towards implementing GDPR by the data protection authorities in the EU after May 25th, but that’s of little use to anyone now.

So my intention to delete my FB history still means the full deletion of my account. Which will be effective end of this week, when the 14 day grace period ends.

Available Energy Data in The Netherlands

Which energy data is available as open data in the Netherlands, asked Peter Rukavina. He wrote about postal codes on Prince Edward Island where he lives, and in the comments I mentioned that postal codes can be used to provide granular data on e.g. energy consumption, while still aggregated enough to not disclose personally identifiable data. This as I know he is interested in energy usage and production data.

He then asked:

What kind of energy consumption data do you have at a postal code level in NL? Are your energy utilities public bodies?
Our electricity provider, and our oil and propane companies are all private, and do not release consumption data; our water utility is public, but doesn’t release consumption data and is not subject (yet) to freedom of information laws.

Let’s provide some answers.

Postal codes

Dutch postal codes have the structure ‘1234 AB’, where 12 denotes a region, 1234 denotes a village or neighbourhood, and AB a street or a section of a street. This makes them very useful as geographic references in working with data. Our postal code begins with 3825, which places it in the Vathorst neighbourhood, as shown on this list. In the image below you see the postal code 3825 demarcated on Google maps.
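As a small illustration of why that structure is handy when working with data, here is a Python sketch that splits a postal code into the three levels described above. The function name parse_postcode and the field names are my own for this example, the letters ‘AB’ are just a placeholder, and the pattern only checks the general ‘1234 AB’ shape, not whether a code actually exists.

```python
import re

# Four digits, an optional space, two letters: the '1234 AB' shape.
POSTCODE = re.compile(r"^(\d{4})\s?([A-Z]{2})$")


def parse_postcode(text):
    """Split a Dutch postal code into region, neighbourhood and street parts."""
    match = POSTCODE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a '1234 AB' style postal code: {text!r}")
    digits, letters = match.groups()
    return {
        "region": digits[:2],       # first two digits
        "neighbourhood": digits,    # all four digits
        "street": letters,          # the two letters
    }
```

For example, parse_postcode("3825 ab") gives region ‘38’, neighbourhood ‘3825’ and street ‘AB’, so grouping records by the first four characters is enough to aggregate to neighbourhood level.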

Postal codes are both commercially available as well as open data. Commercially available is a full set. Available as open data are only those postal codes that are connected to addresses tied to physical buildings. This as the base register of all buildings and addresses is open data in the Netherlands, and that register includes postal codes. It means that e.g. postal codes tied to P.O. Boxes are not available as open data. In practice getting at postal codes as open data is still hard, as you need to extract them from the base register, and finding that base register for download is actually hard (or at least used to be, I haven’t checked back recently).

On Energy Utilities

All energy utilities used to be publicly owned, but have since been privatised. Upon privatisation all utilities were separated into energy providers and energy transporters, called network maintainers. The network maintainers are private entities, but are publicly owned. They maintain both electricity mains as well as gas mains. There are 7 such network maintainers of varying sizes in the Netherlands.

(Source: Energieleveranciers.nl)

The three biggest are Liander, Enexis and Stedin.
These network maintainers, although publicly owned, are not subject to Freedom of Information requests, nor subject to the law on Re-use of Government Information. Yet they do publish open data, and are open to data requests. Liander was the first one, and Enexis and Stedin both followed. The motivation for this is that they have a key role in the government goal of achieving full energy transition by 2050 (meaning no usage of gas for heating/cooking and fully CO2 neutral), and that they are key stakeholders in this area of high public interest.

Household Energy Usage Data

Open data is published by Liander, Enexis and Stedin, though not all publish the same type of data. All publish household level energy usage data aggregated to the level of 6 position postal codes (1234 AB), in addition to asset data (including sub soil cables etc) by Enexis and Stedin. The service areas of all 7 network maintainers are also open data. The network maintainers are also all open to additional data requests, e.g. for research purposes or for municipalities or housing associations looking for data to plan for energy saving projects. Liander indicated to me in a review for the European Commission (about potential changes to the EU public data re-use regulations), that they currently deny about 2/3 of data requests received, mostly because they are uncertain about which rules and contracts apply (they hold a large pool of data contributed by various stakeholders in the field, as well as all remotely read digital metering data). They are investigating how to improve on that response rate.

Some postal code areas are small and contain only a few addresses. In such cases this may lead to personally identifiable data, which is not allowed. Liander, Stedin and I assume Enexis as well, solve this by aggregating the average energy usage of the small area with an adjacent area until the number of addresses is at least 10.
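I don’t know the exact procedure the network maintainers use, but the idea can be sketched in a few lines of Python. Assumptions in this sketch: areas are given in an order where neighbours in the list are also adjacent on the map, the combined average is weighted by the number of addresses, and a too-small group left at the end is folded into the previous one. The function name merge_small_areas and the sample figures in the usage note are made up.

```python
MIN_ADDRESSES = 10  # threshold mentioned above


def merge_small_areas(areas, minimum=MIN_ADDRESSES):
    """Merge adjacent areas until each group has at least `minimum` addresses.

    areas: list of (label, n_addresses, average_usage) tuples, ordered so
    that list neighbours are adjacent areas (an assumption of this sketch).
    Returns a list of (labels, n_addresses, weighted_average_usage).
    """
    merged = []
    labels, count, total = [], 0, 0.0
    for label, n, avg in areas:
        labels.append(label)
        count += n
        total += n * avg  # weight each average by its number of addresses
        if count >= minimum:
            merged.append((labels, count, total / count))
            labels, count, total = [], 0, 0.0
    if labels:
        # A too-small group is left over at the end of the list.
        if merged:
            prev_labels, prev_count, prev_avg = merged.pop()
            new_count = prev_count + count
            new_avg = (prev_avg * prev_count + total) / new_count
            merged.append((prev_labels + labels, new_count, new_avg))
        else:
            merged.append((labels, count, total / count))
    return merged
```

With made-up input such as an area of 4 addresses averaging 3000 followed by one of 6 addresses averaging 4000, the two are published as a single group of 10 addresses with an average of 3600, so no individual household can be singled out.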

Our address falls in the service area of Stedin. The most recent data is that of January 1st 2018, containing the energy use for all of 2017. Searching for our postal code (which covers the entire street) in their most recent CSV file yields on lines 151,624 and 151,625:


The first line shows electricity usage (ELK), and says there are 33 households in the street, and the average yearly usage is 4599kWh. (We are below that at around 3700kWh / year, which is higher than we were used to in our previous home). The next line provides the data for gas usage (heating and cooking) “GAS”, which is 1280 m3 on average for the 33 connections. (We are slightly below that at 1200 m3).
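Looking up a postal code in such a file is easy to script. The Python sketch below is not based on Stedin’s actual file layout: the column names (postcode, product, connections, average_usage), the comma delimiter, the letters ‘AB’ and the extra 3824ZZ row are all assumptions for illustration. Only the ELK and GAS figures are taken from the two lines described above.

```python
import csv
import io

# Made-up sample in a hypothetical layout; the real CSV will differ.
SAMPLE = """postcode,product,connections,average_usage
3824ZZ,ELK,21,3100
3825AB,ELK,33,4599
3825AB,GAS,33,1280
"""


def usage_for(postcode, csv_text):
    """Return {product: (connections, average_usage)} for one postal code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["product"]: (int(row["connections"]), float(row["average_usage"]))
        for row in reader
        if row["postcode"] == postcode
    }
```

Calling usage_for("3825AB", SAMPLE) returns one entry for ELK and one for GAS, each with the number of connections and the average usage, ready to compare with your own meter readings.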

SmugMug Buys Flickr, End of the Yahoo Era

I’ve been using Flickr to store photos since March 2005. It’s at the same time an easy way to embed photos in my blog without using up storage space in the hosting account, and an online remote back-up. Over the years I’ve uploaded some 24,000 photos, though I’ve been using Flickr less in the last 2 years.

My account is from just before the moment Yahoo bought Flickr from its founders, which was also in March 2005, and it forced me to create a Yahoo account for it in 2007. Yahoo never seemed to have much vision for Flickr, but as an early user (Flickr was founded in 2004) the original functionality I signed up and paid for was all I really needed.

Yahoo was bought by Verizon last year, and since then it was likely they’d sell some parts of it. SmugMug acquired Flickr last week, and that at least means that photography is now the main focus again. That hopefully means further evolution of Flickr, or it might mean a switch to SmugMug in the future.

Tellingly one needs to accept the new terms of service by 25th May 2018, which is the day the EU data protection regulation GDPR becomes enforceable.

It also means that I will be able to delete my Yahoo account, which I only had because Flickr users were forced to.
Yahoo is an internet dinosaur, launched in 1994. Its best days already lie way back. Deleting my Yahoo account as such is also an end of an era, an end that felt long overdue for years already.

Backdoors and Futile Stamping

Russia is trying to block Telegram, an end-to-end encrypted messaging app. The reason for blocking is that Telegram refused to provide keys to the authorities with which messages can be decrypted. Not for a specific case, but for listening in on general traffic.

Asking for keys (even if technologically possible) to have a general backdoor is a very bad idea. It will always be misused by others. And yes, you do have something to hide. Your internet banking is encrypted, your VPN connection from home to your work computer is too. You use passwords on websites, mail accounts and your wifi. If you don’t have anything to hide, please leave your Facebook login details along with your banking details in the comments. I promise I won’t use them. The point isn’t whether I or government keep our promises (and I or government might not), it’s that others definitely won’t.

As a result of Telegram not providing the keys, Russia is now trying to block people from using it. This has led to millions of IP addresses being blocked, more than one IP address for each of Telegram’s roughly 14 million users in Russia. (Telegram reports about 200 million users globally per month.) That is because the service partly runs on servers in Amazon and Google data centers, and those are getting blocked. This impacts other services as well, which use the same data centers to flexibly scale their computing needs. The blocking attempts aren’t working though.

It shows how fully distributed systems are hard to stamp out: they will merely pop up somewhere else. The internet routes around damage, which is what it was designed to do.

Let’s see if actions will now be taken by Russian authorities against persons and assets of Telegram, as that really is the only (potential, not guaranteed) way to stamp out something: dismantling it. In the case of Telegram, a private company, there are indeed people and assets one could target. And Telegram is pledging to deploy those assets in resisting. Yet dismantling Telegram, even if successful and disregarding other costs and consequences for a government, defeats the original purpose of wanting to listen in to message traffic. Traffic will easily move into other encrypted tools, like Signal, while new even more distributed applications will also emerge in response.

Summary:

  • General backdoors, bad idea, regardless of whether you can trust the one you give back door access to.
  • Blocking is hard to do with distributed systems.
  • If you don’t accept attempts to do either from data driven authoritarian governments, you need to accept that the same objections to general back door access apply to other situations where you think the stated aim has more merit.
  • Do use an encrypted messaging app, like Signal, as much as possible.

Data Worlds, to Understand the Politics of Data

Jonathan Gray has published an article on Data Worlds, as a way to better understand and experiment with the consequences of the datafication of our lives. The article appeared in Krisis, an open access journal for contemporary philosophy, in its latest edition dealing with Data Activism.

Jonathan Gray writes

The notion of data worlds is intended to make space for thinking about data as more than simply a representational resource, and the politics of data as more than a matter of liberation and protection. It is intended to encourage exploration of the performative capacities of data infrastructures: what they do and could do differently, and how they are done and could be done differently. This includes consideration of, as Geoffrey Bowker puts it, “the ways in which our social, cultural and political values are braided into the wires, coded into the applications and built into the databases which are so much a part of our daily lives”

He describes 3 ‘data worlds’, and positions them as an instrument intended for practical usage.

The three aspects of data worlds which I examine below are not intended to be comprehensive, but illustrative of what is involved in data infrastructures, what they do, and how they are put to work. As I shall return to in the conclusion, this outline is intended to open up space for not only thinking about data differently, but also doing things with data differently. The test of these three aspects is therefore not only their analytical purchase, but also their practical utility.

Those 3 worlds mentioned are

  1. Data Worlds as Horizons of Intelligibility, where data plays a role in changing what is sayable, knowable, intelligible and experienceable, where data allows us to explore new perspectives, arrive at new insights or even new overall understanding. Hans Rosling’s work with Gapminder falls in this space, as do data visualisations that combine time and geography. To me this feels like approaching what John Thackara calls Macroscopes, where one finds a way to understand complete systems and one’s own place and role in them, and not just one’s own position. (A posting on Macroscopes will be coming.)
  2. Data Worlds as Collective Accomplishments, where consequences (political, social, economic) result from not just one or a limited number of actors, but from a wide variety of them. Open data ecosystems and the shifts in how civil society, citizens and governments interact, but also big data efforts by the tech industry are examples Gray cites. “Looking at data worlds as collective accomplishments includes recognising the role of actors whose contributions may otherwise be under-recognised.”
  3. Data Worlds as Transnational Coordination, in terms of networks, international institutions and norm setting, which aim to “shape the world through coordination of data“. In this context one can think of things like IATI, a civic initiative bringing standardisation and transparency to international aid globally, but also the GDPR through which the EU sets a new de-facto global standard on data protection.

This seems at first reading like a useful thinking tool in exploring the consequences and potential of various values and ethics related design choices.

(Disclosure: Jonathan Gray and I were both active in the early European open data community, and are co-authors of the first edition/iteration of the Open Data Handbook in 2010)

Macron’s 1.5 Billion for Values, Data and AI

Data, especially lots of it, is the feedstock of machine learning and algorithms. And there’s a race on for who will lead in these fields. This gives it a geopolitical dimension, and makes data a key strategic resource of nations. In between the vast data lakes in corporate silos in the US and the national data spaces geared towards data driven authoritarianism like in China, what is the European answer, what is the proposition Europe can make to the world? Ethics based AI. “Enlightenment Inside”.

French President Macron last month announced spending 1.5 billion on AI in the coming years. Wired published an interview with Macron. Below is an extended quote of what I think are key statements.

AI will raise a lot of issues in ethics, in politics, it will question our democracy and our collective preferences……It could totally dismantle our national cohesion and the way we live together. This leads me to the conclusion that this huge technological revolution is in fact a political revolution…..Europe has not exactly the same collective preferences as US or China. If we want to defend our way to deal with privacy, our collective preference for individual freedom versus technological progress, integrity of human beings and human DNA, if you want to manage your own choice of society, your choice of civilization, you have to be able to be an acting part of this AI revolution. That’s the condition of having a say in designing and defining the rules of AI. That is one of the main reasons why I want to be part of this revolution and even to be one of its leaders. I want to frame the discussion at a global scale….The key driver should not only be technological progress, but human progress. This is a huge issue. I do believe that Europe is a place where we are able to assert collective preferences and articulate them with universal values.

Macron’s actions are largely based on the report by French MP and Fields Medal winning mathematician Cédric Villani, For a Meaningful Artificial Intelligence (PDF)