Elizabeth Renieris and Dazza Greenwood give different words to my previously expressed concerns about the narrative frame of personal ownership of data, and of selling it as a tool to counteract data krakens like Facebook. The key difference lies in which regulatory framework each approach ties into, and when each comes into play: property law versus human rights law.

I feel the human rights angle will also serve us better in coming to terms with the geopolitical character of data (one that the EU is baking into its geopolitical proposition concerning data). In the final paragraph they point to the ‘basic social compact’ that needs explicit support. That I connect to my notion that much personal data is more like communal data: not immediately created or left by me as an individual, but the traces I leave while acting in public. At Techfestival Aza Raskin pointed to fiduciary roles for those holding data on such publicly left personal data traces, and Martin von Haller mentioned how those traces can also serve communal purposes and create communal value, placing them in yet another legal setting (that of weighing privacy against public interest).

Read Do we really want to “sell” ourselves? The risks of a property law paradigm for personal data ownership. (Medium)

….viewing this data as property that is capable of being bought, sold, and owned by others is in large part how we ended up with a broken internet funded by advertising — or the “ad tech model” of the Internet. A property law-based, ownership model of our data risks extending this broken ad tech model of the Internet to all other facets of our digital identity and digital lives expressed through data. While new technology solutions are emerging to address the use of our data online, the threat is not solved with technology alone. Rather, it is time for our attitudes and legal frameworks to catch up. The basic social compact should be explicitly supported and reflected by our business models, legal frameworks and technology architectures, not silently eroded and replaced by them.

Data, especially lots of it, is the feedstock of machine learning and algorithms, and there’s a race on for who will lead in these fields. This gives data a geopolitical dimension and makes it a key strategic resource for nations. Between the vast data lakes in corporate silos in the US and the national data spaces geared towards data-driven authoritarianism as in China, what is the European answer, what is the proposition Europe can make to the world? Ethics-based AI. “Enlightenment Inside”.

Last month French President Macron announced spending €1.5 billion on AI in the coming years. Wired published an interview with him. Below is an extended quote of what I think are the key statements.

AI will raise a lot of issues in ethics, in politics, it will question our democracy and our collective preferences… It could totally dismantle our national cohesion and the way we live together. This leads me to the conclusion that this huge technological revolution is in fact a political revolution… Europe has not exactly the same collective preferences as US or China. If we want to defend our way to deal with privacy, our collective preference for individual freedom versus technological progress, integrity of human beings and human DNA, if you want to manage your own choice of society, your choice of civilization, you have to be able to be an acting part of this AI revolution. That’s the condition of having a say in designing and defining the rules of AI. That is one of the main reasons why I want to be part of this revolution and even to be one of its leaders. I want to frame the discussion at a global scale… The key driver should not only be technological progress, but human progress. This is a huge issue. I do believe that Europe is a place where we are able to assert collective preferences and articulate them with universal values.

Macron’s actions are largely based on the report by French MP and Fields Medal-winning mathematician Cédric Villani, For a Meaningful Artificial Intelligence (PDF).

My current thinking about what to bring to my open data and data governance work, as well as to technology development, especially in the context of networked agency, can be summarised under the moniker ‘ethics by design’. In a practical sense this means setting non-functional requirements at the start of a design or development process, or when tweaking or altering existing systems and processes. Non-functional requirements that reflect the values you want to safeguard or ensure, or potential negative consequences you want to mitigate. Privacy, power asymmetries, individual autonomy, equality, and democratic control are examples of this.

Today I attended the ‘Big Data Festival’ in The Hague, organised by the Dutch Ministry of Infrastructure and Water Management. Here several government organisations presented themselves and the work they do using data as an intensive resource. Stuff that speaks to the technologist in me. In parallel there were various presentations and workshops, and there I was most interested in what was said about ethical issues around data.

Author and interviewer Bas Heijne set the scene at the start by pointing to the contrast between the technology optimism around digitisation of years back and the more dystopian discussion of today (triggered by things like the Cambridge Analytica scandal and cyberwars), and sought the balance in the middle. I think that contrast is largely due to the different assumptions underneath the utopian and dystopian views. The techno-optimist perspective, at least in the web scene I frequented in the late ’90s and early ’00s, assumed the tools would be in the hands of individuals, who would independently weave the world wide web, smart at the edges and dumb at the center. The dystopian views, including those of early critics like Jaron Lanier, assumed a centralisation into walled gardens, where individuals are mere passive users or objects, no longer subjects with autonomy, and were proven at least partly right. These assumptions imply wildly different development paths concerning power distribution, equality and agency.

In the afternoon a session with professor Jeroen van den Hoven of Delft University focused on making the ethical challenges more tangible, as well as pointing to the beginnings of practical ways to address them. It was the second time I heard him present in a month: a few weeks ago I attended an Ethics and Internet of Things workshop at the University of Twente, organised by the UNESCO World Commission on the Ethics of Science and Technology (COMEST). There he gave a very worthwhile presentation as well.


Van den Hoven “if we don’t design for our values…”

What I call ethics by design, a term I first heard from prof. Valerie Frissen, Van den Hoven calls value sensitive design. That term sounds more pragmatic, but I feel it conveys the point less strongly. This time he also incorporated the geopolitical aspects of data governance, which echoed what Rob van Kranenburg (IoT Council, Next Generation Internet) presented at that workshop last month (and which I really should write down separately). It was good to hear it reinforced for today’s audience of mainly civil servants, as currently there is a certain naivety in how (mainly local) governments collaborate with commercial partners around data collection, e.g. via sensors in public space.

A (malfunctioning) billboard at Utrecht Central Station a few days ago, with an ill-considered camera in a public space (to measure engagement with adverts). Civic resistance has taped over the camera.

Value sensitive design, said Van den Hoven, should seek to combine the power of technology with ethical values into services and products, instead of treating it as a dilemma with an either/or choice, which is the usual framing: social networking OR privacy, security OR privacy, surveillance capitalism OR personal autonomy, smart cities OR human messiness and serendipity. Value sensitive design is about ensuring the individual remains a subject in the philosophical sense, and not merely the object on which data-based services feed. By addressing both values and technological benefits as the same design challenge (security AND privacy, etc.), one creates a path for responsible innovation.

The audience saw responsibilities for both individual citizens and governments in building that path, and no one thought retreating from technology to fictitious simpler times would work, although some doubted whether there was still room to stem the tide.

Students from a ‘big data’ minor at the local university of applied sciences presented their projects a few weeks ago. As I had done a session with them on open data as a guest lecturer, I was invited to the final presentations. Taken together, several things stood out for me from those presentations. Things I later repeated, as pitfalls to avoid, to a different group of students at the Leeuwarden university of applied sciences at the beginning of their week of working on local open data projects. I thought I’d share them here too.

The projects students created
First of all let me quickly go through the presented projects. They were varied in types of data used, and types of issues to address:

  • A platform advising Lithuanian businesses on targeting other EU markets, using migration patterns and socio-economic and market data
  • A route planner comparing car and train trips
  • A map combining buildings and address data with income per neighborhood from the statistics office to base investment decisions on
  • A project data mining Riot Games online game servers to help live-tweak game environments
  • A project combining retail data from Schiphol Airport with various other data streams (weather, delays, road traffic, social media traffic) to find patterns and interventions to increase sales
  • A project using the IMDb movie database and its ratings to predict whether a given team and genre have a chance of success

Patterns across the projects
Some of these projects were much better presented than others, and some were more savvy in their data use. Several things stood out:

1) If you make an ‘easy’ decision on your data source it will hurt you further down your development path.

2) If you want to do ‘big data’, be really prepared to struggle with it to understand its potential and limitations.

To illustrate both those points:
The Dutch national building and address database is large and complicated, so one team had opted to use the ‘easier’ processed dataset released by a geodata company. Later they realised that the ‘easier’ dataset was updated only twice per year (the actual source being updated monthly), and that they needed a different coordinate system (present in the source, not in the processed data) to combine it with the data from the statistical office.
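A minimal sketch of that coordinate mismatch, using made-up records: the building data is in RD New metres (EPSG:28992) while the income data uses WGS84 degrees, so a naive coordinate join silently matches nothing until one side is reprojected (the coordinate values and field names below are illustrative assumptions, not the real datasets).

```python
# Hypothetical records: building data in RD New (EPSG:28992), metres;
# neighbourhood income data in WGS84 (EPSG:4326), degrees.
buildings = [{"id": "b1", "x": 155000.0, "y": 463000.0}]
income    = [{"x": 5.387, "y": 52.155, "avg_income": 34000}]

def naive_join(left, right, tol=0.001):
    """Match records whose coordinates (almost) coincide."""
    return [(a, b) for a in left for b in right
            if abs(a["x"] - b["x"]) < tol and abs(a["y"] - b["y"]) < tol]

matches = naive_join(buildings, income)
print(len(matches))  # 0 — the units differ, so nothing lines up
# Reprojecting one side first (e.g. with pyproj's Transformer, from
# EPSG:28992 to EPSG:4326) is what makes a spatial join meaningful.
```

The join itself is trivial; the real work, as the team discovered, is knowing which reference system each source uses before combining them.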

Similarly, the route planner team shied away from using the open realtime database on motorway traffic density and speed, opting for a derivative data source on traffic jams, and then complained that it came in a format they couldn’t really re-use and did not cover all the roads they wanted.
That same project used Google Maps, which is a closed data source, whereas a more detailed and fully open map is available. Google Maps comes with neat pre-configured options and services, but in this case those were a hindrance, because they allow nothing outside of them.

3) You must articulate and test your own assumptions

4) Correlation is not causation (duh!)

The output you get from working with your data is coloured by the assumptions you build into your queries. Yes, average neighbourhood income can likely be a predictor for certain investment decisions, but is there any indication that is the case for your type of investment, in this country? Is entering the Swedish market different for a Lithuanian company than for, say, a Greek one? What does that say about the usefulness of your data source?

Data will tell you what happened, but not why. If airport sales of alcohol spike whenever a flight to Russia arrives or leaves (an actual data pattern), can that really be attributed to the 200–300 people on that plane, or are other factors at work that may not be part of your data (for instance intercontinental flights that have roughly the same schedule but are not in the data set)?

Are you playing around enough with the timeline of your data to detect e.g. seasonal patterns (like we see in big city crime)? Are you zooming out and zooming in enough to notice that what seems a trend maybe isn’t?

5) Test your predictions, use your big data on yourself

The ‘big’ part of big data is that you are not dealing with a snapshot or a small subset (N = a few) but with a complete timeline of the full data set (N = all). This means you can, and need to, test your model / algorithm / great idea on your own big data. If you think you can predict the potential of a movie given genre and team, then test it with a movie from 2014 whose results you know (as they’re in your own dataset), using only the database from before 2014, and see if your algorithm works. Did Lithuanian companies that already entered the Swedish market fail or flourish in line with your data set? Did known past interventions into the retail experience have the impact your data patterns suggest they should?
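The back-test the movie team should have run can be sketched in a few lines, with made-up ratings and a deliberately trivial "model" (a per-genre average over the pre-cutoff films) standing in for whatever algorithm you actually use:

```python
# Made-up movie records; the 2014 titles play the role of "known outcomes"
# that the model must not see during training.
movies = [
    {"year": 2010, "genre": "drama",  "rating": 7.1},
    {"year": 2012, "genre": "comedy", "rating": 6.0},
    {"year": 2013, "genre": "drama",  "rating": 7.4},
    {"year": 2014, "genre": "drama",  "rating": 7.0},  # held out
    {"year": 2014, "genre": "comedy", "rating": 5.8},  # held out
]

cutoff = 2014
train = [m for m in movies if m["year"] < cutoff]
test  = [m for m in movies if m["year"] >= cutoff]

def predict(genre):
    """Trivial stand-in model: average past rating per genre."""
    past = [m["rating"] for m in train if m["genre"] == genre]
    return sum(past) / len(past)

errors = [abs(predict(m["genre"]) - m["rating"]) for m in test]
print(round(max(errors), 2))  # → 0.25
```

The point is the temporal split, not the model: because the full timeline is in your own dataset, you can always hold out the most recent slice and check your predictions against outcomes you already know.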

6) Your data may be big, but does it contain what you need?

One thing I notice with government data is that most data is about what government knows (number of x, maps, locations of things, environmental measurements etc), and much less about what government does (decisions made, permits given, interventions made in any policy area). Often those are not available at all in data form but hidden somewhere in wordy meeting minutes or project plans. Financial data on spending and procurement is what comes closest to this.

Does your big data contain the things that tell you what the various actors around the problem you are trying to solve did to cause the patterns you spot in the data? The actual transactions of liquor stores, connected to the boarding passes of Russian flights? The marketing decisions, and the reasons behind them, of the Schiphol liquor stores? The actions of Lithuanian companies that tried different EU markets and failed or succeeded?

Issue-driven, not data-driven, and willing to do the hard bits
It was fun to work with these students, and a range of other things come into play: technical savviness, statistical skills, a real understanding of what problem you are trying to solve. It’s tempting to be data-driven rather than issue-driven, even though the latter in the end brings more value. With the former, the data you have is always the right data; with the latter, you must acknowledge the limitations of your data and of your own understanding.

Like I mentioned, I used these lessons in a session for a different group of students in a different city, Leeuwarden. There a group worked for a week on data-related projects to support the city’s role as European Capital of Culture in 2018. The two winning teams both stood out because they had focussed very much on specific groups of people (international students in Leeuwarden, and elderly visitors to the city), and really tried to design solutions with the intended user at the center. That user-centered thinking turned out to be the hardest part, especially if you already have a list of available datasets in front of you. Most of the teachers’ time was spent on getting the students to match datasets to use cases, and not the other way around.