Category Archives: data governance

US Press Admits Incompetence

Today is the day that enforcement of the GDPR, the new European data protection regulation, starts. A novel part of the GDPR is that the rights of the individual described by the data follow the data. So if a US company collects my data, they are subject to the GDPR.

Compliance with the GDPR is pretty common sense, and not all that far from the data protection regulations that went before. You need to know which data you collect, have a proper reason why you collect it, have determined how long you keep data, and have protections in place to mitigate the risks of data exposure. On top of that you need to be able to demonstrate those points, and people described by your data have rights (to see what you know about them, to correct things or have data deleted, to export their data).

Compliance can be complicated if you don’t have your house fully in order, and need to take a lot of corrective steps to figure out what data you have, why you have it, whether it should be deleted and whether your protection measures are adequate.

That is why when the law entered into force on May 4th 2016, 2 years ago, a transition period was created in which no enforcement would take place. Those 2 years gave companies ample time to reach compliance, if they weren’t already.

The GDPR sets a de facto global norm and standard, as EU citizens’ data always falls under the GDPR, regardless of where the data is located. US companies therefore need to comply as well when they hold data about European people.

Today, at the start of GDPR enforcement, it turns out many US press outlets have not put the transition period to good use, although they have reported on the GDPR. They now block European IP addresses, while they ‘look at options’ to be available again to EU audiences.

From the east coast

to the west coast

In both cases the problem is likely how to deal with the 15 or so trackers those sites run that collect visitor data.

The LA Times, for instance, has previously reported on the GDPR, so they knew it existed.

A few days ago they asked their readers “Is your company ready?”, and last month they asked if the GDPR will help US citizens with their own privacy.

The LA Times own answers to that at the moment are “No” and “Not if you’re reading our newspaper”.

US May Restart Charging For Satellite Data in 2019

The US government is looking at whether to again start charging for satellite imagery and data from Landsat satellites, according to an article in Nature.

Officials at the Department of the Interior, which oversees the USGS, have asked a federal advisory committee to explore how putting a price on Landsat data might affect scientists and other users; the panel’s analysis is due later this year. And the USDA is contemplating a plan to institute fees for its data as early as 2019.

To “explore how putting a price on Landsat data might affect” the users of the data will result in predictable answers, I feel.

  • Public digital government-held data, such as Landsat imagery, is both non-rivalrous and non-excludable.
  • The initial production costs of such data may be very high, and surely are in the case of satellite data, as it involves space launches. Yet these costs are incurred in the execution of a public, mandated task, and as such are sunk costs. These costs are not incurred so others can re-use the data, but made anyway for an internal task (such as national security in this case).
  • The copying and distribution costs of additional copies of such digital data are marginal, tending to zero.
  • Government-held data usually, and certainly in the case of satellite data, constitutes a (near) monopoly, with no easily available alternatives. As a consequence price elasticity is above 1: when the price of such data is reduced, the demand for it will rise non-linearly. The inverse is also true: setting a price for government data that is currently free will not mean all current users will pay; it will mean a disproportionate part of current usage simply evaporates, and usage will be much lower both in the number of users and in the volume of usage per user.
  • Data sales from one public entity to another publicly funded one, such as in this case academic institutions, are always a net loss to the public sector, due to administration, transaction and enforcement costs. It moves money from one pocket to another of the same outfit, but that transfer itself costs money.
  • The (socio-economic) value of re-use of such data is always higher than the possible revenue from selling that data. That value also accrues to the public sector in the form of additional tax revenue, which over time will always exceed the lost revenue from data sales. Free provision, or at most provision at marginal cost (the true incremental cost of providing the data to one single additional user), is economically the only logical path.
  • Additionally, the value of data re-use is not limited to the first order of re-use (in this case e.g. the academic research it enables), but has “downstream” higher-order and network effects, e.g. the value those academic research results create in society, for instance in agriculture, public health and climate impact mitigation. “Upstream” value is also derived from re-use, e.g. in the form of data quality improvement.
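The elasticity point above can be made concrete with a toy model (the numbers and the demand function are my own illustration, not from any study): a constant-elasticity demand curve with elasticity above 1, where each price cut raises usage more than proportionally.

```python
# Toy constant-elasticity demand: Q = k * P**(-e).
# With elasticity e > 1, halving the price more than doubles usage,
# so dropping fees disproportionately expands use of the data.

def demand(price, k=1000.0, e=1.5):
    """Usage volume at a given price, under constant elasticity e."""
    return k * price ** -e

for p in (4.0, 2.0, 1.0):
    q = demand(p)
    print(f"price {p:>4}: usage {q:8.0f}, revenue {p * q:8.0f}")
# As the price falls 4-fold (4 -> 1), usage rises 8-fold:
# exactly the kind of non-linear jump described above.
```

The inverse reading is the relevant one for Landsat: introducing a price where usage was free pushes a disproportionate share of that usage off the curve entirely.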

This is precisely why the data was made free in 2008 in the first place:

Since the USGS made the data freely available, the rate at which users download it has jumped 100-fold. The images have enabled groundbreaking studies of changes in forests, surface water, and cities, among other topics. Searching Google Scholar for “Landsat” turns up nearly 100,000 papers published since 2008.

That 100-fold jump in usage? That’s the price elasticity above 1 that I mentioned. It is a regularly occurring pattern wherever fees for data are dropped, whether it concerns statistical, meteorological, hydrological, cadastral, business register or indeed satellite data.

The economic benefit of the free Landsat data was estimated by the USGS in 2013 at $2 billion per year, while the programme costs about $80 million per year. That’s an ROI factor for the US government of 25. Even if the total combined tax burden (payroll, sales/VAT, income, profit, dividend etc.) on that economic benefit were as low as 4%, it would still mean no loss to the US government.
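As a quick sanity check on those figures, using only the numbers quoted above:

```python
# Back-of-the-envelope check: $2bn yearly benefit vs $80m yearly cost,
# and the tax share of the benefit needed to cover the programme cost.
benefit = 2_000_000_000   # USGS 2013 estimate of annual economic benefit
cost = 80_000_000         # annual programme cost

roi = benefit / cost             # return factor for the US government
break_even_tax = cost / benefit  # tax share of the benefit that covers the cost

print(roi)             # 25.0
print(break_even_tax)  # 0.04, i.e. 4%
```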

It’s not surprising then that when a committee was previously asked, in 2012, to look into reinstating fees for Landsat data, it concluded that

“Landsat benefits far outweigh the cost”. Charging money for the satellite data would waste money, stifle science and innovation, and hamper the government’s ability to monitor national security, the panel added. “It is in the U.S. national interest to fund and distribute Landsat data to the public without cost now and in the future,”

European satellite data open by design

In contrast, the European Space Agency’s Copernicus programme, a multi-year effort to launch a range of Sentinel satellites for earth observation, is designed to provide free and open data. In fact my company, together with EARSC, over the past 2 years and the coming 3 years will document over 25 cases establishing the socio-economic impact of the usage of this data, to show both primary and network effects, for instance for ice breakers in Finnish waters, Swedish forestry management, Danish precision farming, and Dutch gas mains preventative maintenance and infrastructure subsidence.

(Nature article found via Tuula Packalen)

Twitter Not GDPR Compliant (nor Flickr, nor ….)

Many tech companies are rushing to arrange compliance with the GDPR, Europe’s new data protection regulation. What has landed in my inbox thus far is not encouraging. Like Facebook, other platforms clearly struggle, or hope to get away, with partially or completely ignoring the concepts of informed consent, unforced consent and proof of consent. One would suspect the latter, as Facebook’s removal of 1.5 billion users from EU jurisdiction is a clear step to reduce potential exposure.

Where consent by the data subject is the basis for data collection: informed consent means consent needs to be explicitly given for each specific use of person-related data, based on an explanation, clear to laypeople, of the reason for collecting the data and how precisely it will be used.
Unforced means consent cannot be tied to the core services of the controlling/processing company when that data isn’t necessary to perform the service. In other words, “if you don’t like it, delete your account” is forced consent. Otherwise, the right to revoke one or several of the consents given becomes impossible.
Additionally, a company needs to be able to show that consent has been given, where consent is claimed as the basis for data collection.

Instead I got this email from Twitter earlier today:

“We encourage you to read both documents in full, and to contact us as described in our Privacy Policy if you have questions.”

and then

followed by

You can also choose to deactivate your Twitter account.

The first two bits mean consent is not informed, and that it’s not even explicit consent but merely assumed consent. The last bit means it is forced. On top of that, Twitter will not be able to show consent was given (as it is merely assumed from using their service). That’s not how this is meant to work. Non-compliant, in other words. (IANAL though)

GDPR as De Facto Norm: Sonos Speakers

Just received an email from Sonos (the streaming speaker system) about the changes they are making to their privacy statement. As with FB in my previous posting, this is triggered by GDPR enforcement starting at the end of May.

The mail reads in part

We’ve made these changes to comply with the high demands made by the GDPR, a law adopted in the European Union. Because we think that all owners of Sonos equipment deserve these protections, we are implementing these changes globally.

This is precisely the hoped-for effect, I think. Setting high standards in a key market will lift those standards globally. It is usually more efficient to work internally according to one standard than to maintain two or more in parallel. Good to see it happening, as it is a starting point for the positioning of Europe as a distinct player in global data politics, with ethics by design as the distinctive proposition. GDPR isn’t written as a source of red tape and compliance costs, but to level the playing field and enable companies to compete by building on data protection compliance (by demanding ‘data protection by design’ and following ‘state of the art’, which are both rising thresholds). Non-compliance in turn is becoming the more costly option (if GDPR really gets enforced, that is).

Facebook GDPR Changes Unimpressive

It seems, from a preview for journalists, that the GDPR changes Facebook will be making to its privacy controls, and especially to the data controls a user has, are rather unimpressive. I had hoped that with the new option to select ranges of your data for download, you would also be able to delete specific ranges of data. This would be a welcome change, as the current options are deleting every single data item by hand, or deleting everything by deleting your account. Under the GDPR I had expected more control over data on FB.

It also seems they keep the design imbalanced, favouring ‘let us do anything’ as the simplest route for users to click through, presenting other options very low-key, and still not making the account deletion option directly accessible in your settings.

They may or may not be deemed to have done enough towards implementing GDPR by the data protection authorities in the EU after May 25th, but that’s of little use to anyone now.

So my intention to delete my FB history still means the full deletion of my account. Which will be effective end of this week, when the 14 day grace period ends.

Data Worlds, to Understand the Politics of Data

Jonathan Gray has published an article on Data Worlds, as a way to better understand and experiment with the consequences of the datafication of our lives. The article appeared in Krisis, an open access journal for contemporary philosophy, in its latest edition dealing with Data Activism.

Jonathan Gray writes

The notion of data worlds is intended to make space for thinking about data as more than simply a representational resource, and the politics of data as more than a matter of liberation and protection. It is intended to encourage exploration of the performative capacities of data infrastructures: what they do and could do differently, and how they are done and could be done differently. This includes consideration of, as Geoffrey Bowker puts it, “the ways in which our social, cultural and political values are braided into the wires, coded into the applications and built into the databases which are so much a part of our daily lives”

He describes 3 ‘data worlds’, and positions them as an instrument intended for practical usage.

The three aspects of data worlds which I examine below are not intended to be comprehensive, but illustrative of what is involved in data infrastructures, what they do, and how they are put to work. As I shall return to in the conclusion, this outline is intended to open up space for not only thinking about data differently, but also doing things with data differently. The test of these three aspects is therefore not only their analytical purchase, but also their practical utility.

Those 3 worlds mentioned are

  1. Data Worlds as Horizons of Intelligibility, where data plays a role in changing what is sayable, knowable, intelligible and experienceable, where data allows us to explore new perspectives, arrive at new insights or even a new overall understanding. Hans Rosling’s work with Gapminder falls in this space, as do data visualisations that combine time and geography. To me this feels like approaching what John Thackara calls Macroscopes, where one finds a way to understand complete systems and one’s own place and role in them. (a posting on Macroscopes will be coming)
  2. Data Worlds as Collective Accomplishments, where consequences (political, social, economic) result from not just one or a limited number of actors, but from a wide variety of them. Open data ecosystems and the shifts in how civil society, citizens and governments interact, but also big data efforts by the tech industry are examples Gray cites. “Looking at data worlds as collective accomplishments includes recognising the role of actors whose contributions may otherwise be under-recognised.”
  3. Data Worlds as Transnational Coordination, in terms of networks, international institutions and norm setting, which aim to “shape the world through coordination of data”. In this context one can think of things like IATI, a civic initiative bringing standardisation and transparency to international aid globally, but also the GDPR, through which the EU sets a new de-facto global standard on data protection.

This seems at first reading like a useful thinking tool in exploring the consequences and potential of various values and ethics related design choices.

(Disclosure: Jonathan Gray and I were both active in the early European open data community, and are co-authors of the first edition/iteration of the Open Data Handbook in 2010)

Macron’s 1.5 Billion for Values, Data and AI

Data, especially lots of it, is the feedstock of machine learning and algorithms. And there’s a race on for who will lead in these fields. This gives it a geopolitical dimension, and makes data a key strategic resource of nations. In between the vast data lakes in corporate silos in the US and the national data spaces geared towards data-driven authoritarianism as in China, what is the European answer, what is the proposition Europe can make to the world? Ethics-based AI. “Enlightenment Inside”.

French President Macron announced last month that France will spend €1.5 billion on AI in the coming years. Wired published an interview with Macron. Below is an extended quote of what I think are key statements.

AI will raise a lot of issues in ethics, in politics, it will question our democracy and our collective preferences… It could totally dismantle our national cohesion and the way we live together. This leads me to the conclusion that this huge technological revolution is in fact a political revolution… Europe has not exactly the same collective preferences as US or China. If we want to defend our way to deal with privacy, our collective preference for individual freedom versus technological progress, integrity of human beings and human DNA, if you want to manage your own choice of society, your choice of civilization, you have to be able to be an acting part of this AI revolution. That’s the condition of having a say in designing and defining the rules of AI. That is one of the main reasons why I want to be part of this revolution and even to be one of its leaders. I want to frame the discussion at a global scale… The key driver should not only be technological progress, but human progress. This is a huge issue. I do believe that Europe is a place where we are able to assert collective preferences and articulate them with universal values.

Macron’s actions are largely based on the report by French MP and Fields Medal-winning mathematician Cédric Villani, For a Meaningful Artificial Intelligence (PDF).

SODW Notes 5: Local is where you live, but not the data pros

This week, as part of the Serbian open data week, I participated in a panel discussion, talking about international developments and experiences. A first round of comments was about general open data developments; the second round focused on how all of that plays out at the level of local government. This is one part of a multi-posting overview of my speaking notes.

Local is where you are, but not the data professionals

Local government is closest to our everyday lives. The street we live on, the way we commute to work, the schools our children attend, the shopping we do and where we park our vehicles for it, the trash that gets taken away, the quality of life in our immediate surroundings: most if not all of it is shaped by what local government does. Using open data here potentially means the biggest impact for citizens.

This effect is even stronger where many tasks are delegated to local and regional levels of government and where central government is seen as less leading on open data. This is the case in Germany, for instance. In past years the states and especially the municipalities have been the trailblazers for open data in Germany, also because important tasks like taking in refugees are very much local, communal matters. This has resulted in open data apps that help refugees navigate German bureaucracy, learn the local language, and find local volunteers to connect with. Similar initiatives were visible in Serbia, e.g. the Techfugees hackathons. In the Netherlands key tasks in social welfare, youth care and health care have been delegated to the local level in recent years.

There is however a crucial difference between local government and many national public sector bodies. At national level many institutions are data professionals, focused on one specific domain or task. These are for instance the national statistics body, the cadastral offices, the meteorological institute, the highway authorities, or the business register. Municipalities on the other hand are usually not data professionals. Municipalities have a wide variety of tasks, precisely because they are so close to our everyday lives. This is mirrored in the variety of types of data they hold. However, local governments in general have a less well developed overall understanding of their information systems, let alone of which data they hold.

This is also apparent from the work I did to help evaluate the EU PSI Directive: where the maturity of the overall information management is lower, it is much harder to embed or do open data well and in a sustainable manner. The lack of mature data governance holds back open data progress and impact.

Two Good Reads on GDPR

The transition period to the new European privacy regulation, the GDPR, will end in May, after which compliance is required. To me the GDPR is extremely interesting. First, because it introduces a few novel concepts. Second, because good data governance means openness, personal data protection and information security are all approached in the same way, which makes the GDPR important for my open data work. That open data work has been steadily shifting towards creating meaningful digital-first data governance.

One of the exciting novel concepts in the GDPR is that the legal obligations follow the data. The GDPR applies to any organisation holding data about EU citizens, regardless of where the organisation itself resides. Another is that EU citizens must be able to clearly understand how data about them is collected and used. Terms of service where the catch hides on page 312 of a document full of legalese are no longer acceptable. This means that your data usage must be out in the open, as every individual has the right to verify how their own data is being collected, stored and used, as well as to export that data and withdraw consent. Compliance is recast from a disadvantage into a precondition and a source of competitive advantage. To me it seems the GDPR brings the law much closer to our digital times. It paves the way for ‘ethics by design’ concerning data, and for using that as a distinguishing factor. It also sets a de-facto global standard (although not everyone seems to realize that yet).

The GDPR creates or reinforces a range of rights in law. Some of my clients have mentioned how they perceive this as a large heap of new work, but to me that’s not really true. It is true if you approach the GDPR as yet another administrative exercise to prove you are compliant, yet that is the old way of approaching privacy: do whatever you want internally, and take precautions at the edges with the outside world. To reliably implement the GDPR and to be able to provide audit trails and pro-active proof of compliance (note that the absence of this ability is interpreted as non-compliance), the most efficient way forward is embedding compliance in the data systems themselves. The ‘by design’ approach is mandatory for new systems: knowing where in your data sets personal data resides, having consent as part of the metadata, etc. This puts personal data protection firmly at the level of data governance and of data system and structure design. Openness, personal data protection and information security can no longer be gates put around the data, but need to be part of the data: an ‘everything by design’ approach.
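A minimal sketch of what ‘consent as part of the metadata’ could look like, assuming nothing beyond the post itself (all field and class names are my own illustration, not from any standard): each subject record carries per-purpose consent entries, so proof of consent and revocation become queries over the data rather than a separate paper trail.

```python
# Illustrative only: embedding consent in the data itself, so audit
# trails and revocation checks are part of normal data access.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class Consent:
    purpose: str                           # the specific use consented to
    given_at: datetime                     # when consent was obtained (audit trail)
    revoked_at: Optional[datetime] = None  # set when consent is withdrawn

@dataclass
class PersonRecord:
    subject_id: str
    consents: List[Consent] = field(default_factory=list)

    def may_use_for(self, purpose: str) -> bool:
        """Data is usable only under explicit, unrevoked consent for this purpose."""
        return any(c.purpose == purpose and c.revoked_at is None
                   for c in self.consents)

record = PersonRecord("subject-1", [Consent("newsletter", datetime(2018, 5, 25))])
print(record.may_use_for("newsletter"))  # True: explicit consent on record
print(record.may_use_for("profiling"))   # False: no consent means no use
```

Withdrawing consent is then just setting `revoked_at`, after which `may_use_for` immediately refuses that purpose; the record itself shows when consent was given and revoked.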

Two good articles to read are:

  • The report of a Berlin panel discussion, addressing the more general meaning and impact of the GDPR in 8 insights, by Sebastian Greger. (HT Alper Çugun)
  • A handy overview of the rights created under the GDPR and their meaning for e.g. website and other tech design, by Cennydd Bowles.

Launching the Malaysia Open Data User Group

I spent the last week in Kuala Lumpur to support the Malaysian Administrative Modernisation and Management Planning Unit (MAMPU) with their open data implementation efforts (such as the Malaysian open data portal). Specifically this trip was about the launch of the Malaysia Open Data User Group (MODUG), as well as discussions with MAMPU on how we can help support their 2018 and 2019 open data plans. I was there together with my World Bank colleague Carolina Vaira, and with Baden Appleyard, a long-time, long-distance friend of my company The Green Land. As he is from Australia, working together in Malaysia means meeting sort-of halfway.

The MODUG stems from the action plan presented last May, after our Open Data Readiness Assessment last year, which I helped bring about when I first visited in spring 2015 as part of the Malaysian big data advisory board. In the action plan we suggested creating an informal and trusted place for government organisations to discuss their practical issues and concerns in creating more open data, learn from each other, and collaborate on specific actions as well as on formulating government good practice. Similarly it called for a comparable space for potential users of government open data: individuals, the coding community, NGOs and civil society, academia and the business community. Next to these two places where government and non-government participants can each discuss their questions and issues amongst themselves, regular interaction between the two was proposed, so that data custodians and users can collaborate on creating social and economic value with open data in Malaysia. The MODUG brings these three elements under one umbrella.

Last Tuesday MAMPU held an event to launch the MODUG, largely moderated by Carolina and me. MAMPU falls within the remit of Joseph Entulu Belaun, Minister for General Affairs in the Prime Minister’s office. The Minister officially opened the event and inaugurated the MODUG (by cutting a ribbon hanging from a drone hovering in front of him).

Malaysian Open Data User Group (MODUG) 2017
Minister Joseph Entulu Belaun cutting a ribbon from a drone, and Dr Yusminar of MAMPU presenting the current status of Malaysian open data efforts. (both images (c) MAMPU)

Dr Yusminar, the team lead for open data at MAMPU and our direct counterpart in our work with them, provided a frank overview of efforts so far and things that still need to be tackled. This helped set the scene for the rest of the day by providing a shared understanding of where things currently stand.

Then we got to work with the participants, in two rounds of a plenary panel followed by roundtable discussions. In the first round, after data holders and users discussed the current general situation in a panel, government and non-government groups discussed separately which data they see demand for, the challenges they encounter in publishing or using that data, and their suggestions for overcoming those challenges. The second round started with a panel presenting international experiences and good practice examples, during which I acquired a new title, that of ‘open data psychologist’, for stressing the importance of the social aspects, behaviour and attitude involved in making open data work. The panel was followed by roundtable conversations that mixed data custodians and users, centered on finding a collective agenda to move open data forward. After each round the results from each table were briefly presented, and the output attached to the walls. Participants clearly appreciated having the time and space to thoroughly discuss the open data aspects they find important, and to be heard by their colleagues and peers. They indicated wanting to do this more often, which is great to hear, as creating the room for such conversations is exactly what the MODUG is meant for!

Malaysia Open Data User Group
Roundtable discussions on a shared open data agenda for MODUG

The day(s) after the event we discussed the output and how moving forward into 2018 and 2019 we can further support MAMPU and the Malaysian open data efforts. This meant diving much deeper into the detailed actions that need to be taken. I’m very much looking forward to staying involved.

Malaysia Open Data User Group
Working with the MAMPU team on next steps

Kuala Lumpur
After work catching up with Baden and enjoying the sights