Mandatory transparency to counteract data hunger

Some disturbing key figures, reported by the Guardian, from a Congressional hearing in the US last week on the FBI’s use of facial recognition: “Approximately half of adult Americans’ photographs are stored in facial recognition databases that can be accessed by the FBI, without their knowledge or consent, in the hunt for suspected criminals. About 80% of photos in the FBI’s network are non-criminal entries, including pictures from driver’s licenses and passports. The algorithms used to identify matches are inaccurate about 15% of the time, and are more likely to misidentify black people than white people.” It makes you wonder how many false positives have ended up in jail because of this.

At GEGF2014
Me, if you look closely, reflected in an anonymous ‘portrait’ (part of an exhibit on Stalinist repression and disappearances in Kazakhstan, 2014)

I am in favor of mandatory radical transparency of government agencies. Not just in terms of releasing data to the public, but also, and more importantly, specifying exactly what they collect, for what purpose, and how much data each collection holds. Such openness, I think, is key to reducing the ‘data hunger’ of agencies (the habit of collecting stuff just because it’s possible, and ‘well, you never know’), and forces them to think more clearly about information design and the purpose of each collection. If it is clear up-front that either the data itself, or the fact that you collect such data and the form in which you hold it, will be public at a predictable point in time, this will likely lead to self-restraint / self-censorship by government agencies. The example above is a case in point: the FBI did not publish a privacy impact assessment, as legally required, and tried to argue it would not need to heed certain provisions of the US Privacy Act.

If you don’t have such up-front mandatory radical transparency, you get scope creep and disturbing collections like the one above. It is also self-defeating, as this type of all-encompassing data collection does not increase the number of needles found, but merely enlarges the haystack.
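To make the needles-and-haystack point concrete, here is a back-of-the-envelope sketch in Python. All numbers in it are hypothetical assumptions chosen for illustration, not the FBI’s figures (the reported 15% inaccuracy does not map directly onto a per-comparison false match rate): a fixed set of 10 actual suspects in the database, a 90% chance of finding each of them, and a 0.1% chance that any other face is wrongly flagged. The function names are likewise made up for this example.

```python
# Back-of-the-envelope sketch; every parameter value below is a hypothetical assumption.

def match_outcomes(db_size, suspects_in_db, hit_rate, false_match_rate):
    """Return (expected true matches, expected false matches) for one search."""
    innocents = db_size - suspects_in_db
    true_hits = suspects_in_db * hit_rate        # needles actually found
    false_hits = innocents * false_match_rate    # innocent people wrongly flagged
    return true_hits, false_hits


def share_correct(db_size, suspects_in_db=10, hit_rate=0.9, false_match_rate=0.001):
    """Fraction of all expected matches that point at an actual suspect."""
    true_hits, false_hits = match_outcomes(db_size, suspects_in_db, hit_rate, false_match_rate)
    return true_hits / (true_hits + false_hits)


# Same 10 suspects, ever larger haystack.
for size in (10_000, 1_000_000, 100_000_000):
    true_hits, false_hits = match_outcomes(size, 10, 0.9, 0.001)
    print(f"{size:>11,} records: {true_hits:.1f} true, {false_hits:,.1f} false, "
          f"{share_correct(size):.3%} of matches correct")
```

Under these assumptions, growing the database from ten thousand to a hundred million records leaves the expected number of true matches unchanged, while the expected false matches grow with the database, so the share of correct matches drops from roughly half to a tiny fraction of a percent.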

Using tech to flip facial recognition in video stories from Iran, at SXSWi
image by Sheila Scarborough, CC-BY

Malaysian Open Data Readiness Assessment

For the past 12 days I was in Malaysia on a mission for the World Bank and Malaysia’s Administrative Modernisation planning unit (MAMPU). Malaysia is pushing forward on both Big Data and Open Data initiatives, and I was there to do an Open Data Readiness Assessment (ODRA) to help identify the logical and most promising steps to take to unlock the full potential of open data. The ODRA was the result of conversations I had with MAMPU when I visited Malaysia last year as a member of the Malaysian Big Data Advisory Panel.

A marathon of meetings
Over the course of my visit we met with representatives of over 70 organisations, ministries, departments and agencies (two-thirds of them government), some of them several times, usually in sessions of one hour or 90 minutes. All of these focused on the federal level (Malaysia is a kingdom with a federal structure). In all these meetings we tried to understand the way the Malaysian government works, and what role data plays in that. From the output, using the ODRA methodology, we assess the logical and possible steps for Malaysia to take towards more open data.

One of the many meetings we had

Formal launch with the Minister
The first two days were filled with meetings with civil society, the private sector and academia. On the third day we met with the Minister for Administrative Modernisation responsible for MAMPU, which in turn is the agency running the open data efforts. Together we officially launched the ODRA effort in the presence of the press and some 300 representatives of various civil society, business and government organisations. In his opening speech the Minister pointed to the value of open data in light of Malaysia’s development goals. After a short exercise by my WB colleague Carolina from Washington, gauging the opinions in the room on the value of open data and helping move to a more informal exchange of ideas, I gave a presentation showing how open data creates value, what ensures open data success, and how the ODRA will help find the right ‘hooks’ to do that. The Q&A that followed showed the strong interest in the room, as well as the commitment of the Minister and MAMPU, as both he and MAMPU’s DG got involved in the discussion.

Seminar with the Minister and an audience of 300 people

Result driven and diverse
Malaysia strikes me as a country with lots of diversity, and as very results-driven. That diversity was further brought home to me while reading the beautiful novel The Garden of Evening Mists by Malaysian author Tan Twan Eng, but it is also visible in every meeting we had, and on every little walk I took through the city. The public sector is driven by KPIs, and in general everything is very much progress and future oriented. That has yielded impressive results, such as removing poverty in just one generation, and Malaysia is now on the path to becoming a high-income country by 2020. At the same time, all those KPIs can generate a lack of focus (if everything is a priority, nothing really is) and can create blind spots for softer aspects, such as the quality of interaction between government and the public, because those are harder to quantify.

Kuala Lumpur
Last year I visited for only 2 days or so, and had no time to see more of my surroundings. This time I was here for a week and a half, although most of those days were very busy. Some evenings and during the weekend, however, I got to explore the city a bit with my WB colleagues Rob (based in KL) and Carolina, and we enjoyed the great food. I spent a few hours visiting the Menara Kuala Lumpur, a telecommunications tower with an observation deck that provides a great view over the city from 300m up. With 5 million people it is a sprawling city covering a large area (we commuted every day from the hotel to the MAMPU offices, 35 km away, all within the city), and the view from the tower showed me how extensive that area really is.

Some views from Menara KL over the city.

Already last year it stood out to me that food is important in Malaysia: it is offered at every opportunity, even during every meeting. The variety of cuisines on offer is great too, from all over Asia, as well as western and Latin American. MAMPU arranged great Malaysian food for breakfast and lunch during the intensive days of interviews, allowing me to indulge in all the great tastes and enjoy the spiciness. After hours Rob took us to several places: Malaysian, Mexican, Spanish-Japanese, Korean, and Peruvian. I also sampled some of the fine Chinese restaurants. Sunday evening we enjoyed a great open-air dinner on the 24th floor (or 23A, as the number 4 is avoided in Malaysia because in Chinese it sounds like ‘death’), overlooking KLCC park at sunset, and seeing the lights come on in the iconic Petronas Towers. I arrived home with a little more baggage than I left with, so part of the aftermath of my visit is not just writing the report, but also losing that additional weight 😉

From the open 24th floor of one of the Troika towers, enjoying Peruvian food, watching the sun set and the lights come on in the Petronas Towers. In the background Menara KL, from which I had a great view over the city earlier that day.

Next steps
In the coming weeks we’ll go through all the material we’ve collected in our meetings and during our desk research. This will result in an action-oriented report to help MAMPU drive open data forward, and use it as a tool to attain the development goals Malaysia has set for itself. Most of the necessary building blocks are in place, but they are all in their own silos and generally not connected. Most of the suggested actions will likely be about creating connections between those building blocks, and working on the quality of relationships between stakeholders and on awareness of how open data can be a tool both for the public and for the public sector. I am grateful to the great MAMPU and WB team for our collaboration these past days and for the hospitality they have shown me.

The MAMPU and WB team

How a Small Municipality Shows the Way with Open Data

In 2014/2015 my colleague Frank and I worked with the Province of North-Holland and 9 municipalities in that province to position open data as a policy instrument: around specific local issues we would publish data, and reach out to potential re-users. Part of this process was to make open data a normal part of everyday work on public tasks. Hollands Kroon, a rural municipality in the very north of the Province, was one of the participants that succeeded in bringing open data into line management.

Now they have launched a new municipal website, following the so-called ‘top tasks’ model. In this model the most prominent information shown is the information citizens most need or want. I have interacted with many municipalities that, because they were moving to a ‘top tasks’ website, refused to publish data or the answers to the FOIA requests they received. They said “we’re in the process of limiting the information in our sites to the most sought after, so we’re not going to publish any data etc., that would be confusing.”

Not so in Hollands Kroon. This is how their new site looks, with open data a very prominent menu option.

HK Website

With this step, Hollands Kroon shows how they have embraced open data. Already after the program with the Province, called North-Holland Smarter, they had formed a data team, working to raise internal awareness of open data and data-driven work, and to raise interest in re-use. Now they’ve gone a step further by making open data a significant part of their external communications.

To me this is all the more remarkable, as when we started in 2014, Hollands Kroon, as a small rural municipality, doubted whether open data could be a useful tool for them, and assumed it would only make sense in urban environments, such as Amsterdam, the biggest city in the Province of North-Holland. They then quickly realized there is potential for their own local context and policy issues as well, especially if you work together with neighbouring municipalities in the region, in collaboration with the Province.

FOSS4G Keynote: Open Data for Social Impact

Last week I had the pleasure of attending and speaking at the annual FOSS4G conference. This gathering of the community around free and open source software in the geo-sector took place in Bonn, in what used to be the German parliament. I’ve already posted the outline, slides and video of my keynote on my company’s website, but am now also crossposting it here.

Speaking in the former plenary room of the German Parliament. Photo by Bart van den Eijnden

In my talk I outlined that it is often hard to see the real impact of open data, and explored the reasons why. I ended with a call upon the FOSS4G community to be an active force in driving ethics by design in re-using data.

Impact is often hard to see, because measurement takes effort
Firstly, because it takes a lot of effort to map out all the network effects, for instance when doing micro-economic studies like we did for ESA, or when you need to look for many small and varied impacts, both social and economic. This is especially true if you take a ‘publish and it will happen’ approach. Spotting impact becomes much easier if you already know what type of impact you actually want to achieve, and then publish data sets you think may enable other stakeholders to create such impact. Around real issues, in real contexts, it is much easier to spot the real impact of publishing and re-using open data. It does require that the published data is serious, as serious as the issues. It also requires openness: that is what brings new stakeholders into play and creates new perspectives on agency, from which impact results. Openness therefore needs to be vigorously defended. And the FOSS4G community is well suited to do that, as openness is part of their value set.

Impact is often hard to see, because of fragmentation in availability
Secondly, because impact often results from combinations of data sets, and the current reality is that data provision is mostly much too fragmented to allow interesting combinations. Some of the specific data sets, or the right timeframe or geographic scope, might be missing, making interesting re-uses impossible.
Emerging national data infrastructures, such as the ones the Danes and the Dutch have been creating, are a good fix for this. They combine several core government data sets into a system and open it up as much as possible. Think of cadastral records, maps, persons, companies, addresses and buildings.
Geo data is at the heart of all this (maps, addresses, buildings, plots, objects), which turns it into the linking pin for many re-uses in which otherwise diverse data sets are combined.

Geo is the linking pin, and its role is shifting: ethics by design needed
Because geo-data is the linking pin, its role is shifting. First of all, it puts geo-data at the very heart of every privacy discussion around open data. Combinations of data sets can quickly become privacy issues, with geo-data being the combinator. Privacy and other ethical questions arise even more now that geo-data is no longer about relatively static maps, but about sensors turning many more objects, as well as human beings, into objects on the map in real time.
At the same time geo-data is becoming less visible in these combinations. ‘The map’ is not necessarily a significant part of the result of combining data sets, just a catalyst on the way to get there. Will geo-data be a neutral ingredient, or will it be an ingredient with a strong attitude? An attitude that aims to actively promulgate ethical choices, not just concerning privacy, but also concerning which combinations are statistically responsible, and which steps are and are not legal on the way to a result that is in itself legal? As with defending openness itself, the FOSS4G community is in a good position to push the ethical questions forward in the geo community, as well as to find ways of incorporating them directly in the tools they build and use.
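A minimal Python sketch of how an address can act as the ‘combinator’ described above. Both data sets, the street name and the person are invented for illustration, and `join_on_address` is a hypothetical helper, not part of any library. The energy data set names nobody, yet joining it with a public register on the address field attaches a consumption figure to a named person, and no map appears anywhere in the result.

```python
# Hypothetical data set 1: published without names, keyed only by address.
energy_use = [
    {"address": "Dorpsstraat 1", "kwh_per_year": 2900},
    {"address": "Dorpsstraat 3", "kwh_per_year": 14800},
    {"address": "Dorpsstraat 5", "kwh_per_year": 3100},
]

# Hypothetical data set 2: a public register listing who is registered at an address.
business_register = [
    {"address": "Dorpsstraat 3", "name": "J. Jansen", "type": "sole proprietorship"},
]


def join_on_address(left, right):
    """Inner join two lists of dicts on their shared 'address' field."""
    index = {}
    for row in right:
        index.setdefault(row["address"], []).append(row)
    # Each left row is merged with every right row at the same address;
    # left rows without a match are dropped (inner join).
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row["address"], [])]


linked = join_on_address(energy_use, business_register)
for row in linked:
    print(row)
```

In practice the join key would more likely be a normalised address or building identifier from a national register than a free-text string, but the effect is the same: the geo key is what makes the combination, and with it the privacy question, possible.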

The video of the keynote has been published by the FOSS4G conference organisers.
Slides are available from Slideshare.

Sunday Serendipity Reading Links

Every day I save a bunch of links from my explorations of the interwebs. Stuff that passes my radar may become fodder for my writing at some point, but often gets piled up and forgotten. I thought it might be good to share some of the unsought links I encounter, and some notes on why I bookmarked them. Blogging of course used to be linklogging, sharing links with your blog neighbourhood, so let’s say it’s returning to a respected tradition. Here is a fistful of links from this week.

    Distributed web

  • IPFS, a distributed way of delivering webpages and files. Pointed out to me in the context of my postings on distributedness and agency. Napsterizing/torrenting everything. Also seems to want to preserve everything on the web better.
  • Steem is a blockchain-based social media platform. It aims to ‘pay’ you for contributing, and does the bookkeeping in a blockchain ledger. Not sure that will work, nor that permanent records of every social media utterance are desirable. As with IPFS mentioned above, ‘not forgetting’ may not be a feature but a very concerning social bug. My friend Boris Mann is trying it out; looking forward to reading more of his reflections. I may not understand it, but then I never understood the purpose of Medium either, which superficially seems to be the same thing but without the bookkeeping.
  • Anil Dash reflects on the lost infrastructure of social media. This resonates strongly with me in terms of what made blogging so exciting 10-15 years ago, as well as with my recent writings about agency. Part of the picture is weaving a tapestry of functionality across different services and tools that together make a potent mix. It needs plumbing like RSS, trackback and discoverability of the lines of conversation distributed across the individual blogs of the participants. My friend Lilia did her PhD on those distributed conversations. And as Hoder wrote upon seeing the web again after six years in an Iranian prison: much of our web now, such as Facebook, is just TV, not coffee house interaction.

  • Free private cities. Sign up to live in one, so you have an ‘equal’ position based on contracted service provision. Because tinkering with democracy, and the fact that others have different needs, is bothersome, or such. Apparently the social contract isn’t good enough. This has strong overtones of Snow Crash’s burbclaves, and of the micro-democracy states (100.000 people each, and with every election there is global freedom of movement to pick the government of your choice, whether corporate, value or ethnicity based) in the very entertaining near-future SF book Infomocracy by Malka Ann Older. These private city contracts don’t seem to account for the cost of leaving if you cancel your contract: as it is still territory bound, finding a new service provider means physically moving, with all the social and monetary cost of doing that. It also seems to me that the Principality of Monaco, held up as a good practice example, incorporated US towns, or the City of London for that matter, provide ample demonstration of why this may not be the way forward to a more inclusive global society.

  • The Ribbon Farm, a blog by Venkatesh Rao, newly added to my feed-reader. His recent newsletter edition on premature synchronization as a cause of problems chimes with a lot of my experience. Converging too early (because there are just 10 minutes left in the meeting), or forcing convergence in a group, usually doesn’t help much. The leading example in the link, a military one, reminds me of an anecdote I once heard about “the world championship of armies”, where the US military units were failing because they waited for or tried to confirm orders continuously, and the Dutch fared better because, upon receiving orders, they did what seemed worth doing based on context and observation, not seeking further orders and disregarding the literal meaning of orders in the process. Desyncing as a practice seems valuable advice, similar to making stuff distributed by design, or probe-based evolution. Seek out new perspectives and let yourself be challenged as part of your routines.

Data Sovereignty as Prerequisite for Open Data Agency

As we are living in a networked world, government bodies increasingly execute their tasks while collaborating in networks with various other stakeholders. This also happens when it comes to collecting, providing or working with data as part of public tasks. One of the potentially detrimental side effects is that it quickly becomes unclear who can decide to open such data up, or whether a government entity that wants to publish data as part of a policy intervention still feels able to do so. This ability to decide over your own data is what I call data sovereignty. I think that, without proper attention, the data sovereignty of public institutions comes under pressure in collaborative situations, which threatens the freedom of public entities to decide and act on their own open data efforts. This is especially problematic where the lack of data sovereignty hinders public entities in deploying open data as a policy instrument.

I have just completed an inventory of the data sets that a Dutch province holds and the visible erosion of data sovereignty was the main unexpected outcome for me.
This erosion takes different shapes. Here are a few examples of it, encountered in the Province I mentioned:

  • Data collection on business locations and the number of people they employ (to track employment per municipality per sector) is being pooled by all provinces (as a national level data set is more useful). The pooling takes place in a separate legal entity. It is unclear if this entity still falls under FOIA and re-use regulations. This entity also exploits the data by selling it. Logical at the organisational level perhaps, but illogical in comparison with the provincial public task (and maybe not even legal under the Re-Use law). Opening up the data needs to be done through that new entity, meaning you need to convince not just yourself, but all other provinces as well as the entity that has a commercial interest in not being convinced. The slowest will thus set the pace.
  • Traffic flow data, collected by the Province, is stored directly in a national data warehouse (NDW). Again, pooling data makes it more useful, but the Province cannot store cleaned data there (anomalies filtered out, pattern changes explained etc.), so it needs to redo that cleaning and filtering whenever it wants to work with or access its own data. Although the publicly owned NDW now publishes open data, until recently it saw itself as a commercial outfit, averse to the notion of open data.
  • Data collection on bicycle traffic, done by the Province, is stored in the online database of a French service provider active in the entire EU. Ownership of the data is unclear. The Province only accesses the data through the French website. If a FOIA request came, it would be unclear if providing the data runs counter to any rights the service provider is claiming.
  • Data on the prevalence of bird species is collected in collaboration with nature preservation groups and large numbers of volunteers. The Province pays for the data collection, but the nature preservation groups claim their volunteers (by virtue of their voluntary efforts) are the rightful owners of the data. As no internal legal advice has been sought, the discussion remains unresolved and stalls.

None of these situations are unsolvable; all of them can get a definitive answer. The issue, however, is that nobody is clearly in a position, or has the explicit role, to make sure such a definitive answer gets formulated. Because of that, uncertainties remain, which easily leads to inaction. If and when the Province wants to act to open data up, it therefore easily runs into all kinds of questions that slow action down, or ensure that no action gets taken.

It is entirely logical that public entities collaborate in networks with other public entities and domain-specific stakeholders for the collection, dissemination and use of data. It is also certain, given our networked society and the drive for efficiency, that the number of situations where such collaboration takes place will only rise. However, for the drive towards more openness it is detrimental when ownership of public data becomes unclear, gets transferred to an entity that potentially falls outside the scope of FOIA, or falls under the rights of a private entity, just because nobody sought to clarify such matters at the outset.

Public entities should learn to strongly guard their data sovereignty if they want to maintain their own agency in opening up data as a policy instrument. Moving to open by design as a default for the public sector requires stopping the erosion of data sovereignty.

Serbian Information Commissioner Now Publishing Open Data

Today a tweet from the Serbian office of the Commissioner for Information of Public Importance and Personal Data Protection thanked my colleagues and me for promoting open data. As a result, the Commissioner’s Office has launched an open data site today, on the data subdomain of their regular website. This is very good news, and a welcome consequence of the open data readiness assessment I did with the World Bank and the UNDP last year. In June I spoke with the Commissioner about their work, and his deputy already took an active role last December at the conference where we presented the results of the assessment.

In a press release (Serbian only), the Commissioner’s Office states that, as further encouragement to the Serbian public administration, the Commissioner is opening up data concerning their own work. Thirteen data sets have been published, one of which I think is very important: the list of public institutions that fall under the freedom of information and data protection frameworks (over 11.000!). Other published data concerns the complaints about information requests that the office received and their status, as well as complaints and requests concerning data protection and privacy.

With the help of civil society organisation Edukacioni Centar (whom I had the pleasure of meeting as well), the data comes with some visualizations to improve understanding of what data is now available. One allows navigating through the network of over eleven thousand institutions that fall within the scope of the Commissioner’s Office, another shows the status and subject of the various complaints received.

(Screenshot of over 11.000 public institutions)

I find steps like these important, where institutions such as the Information Commissioner, or here in the Netherlands the Supreme Audit Institution, lead by example. By doing so they underline that transparency also matters to the functioning of their own institutions.

Open Data Readiness Assessment Kyrgyzstan Published

The UNDP has published the open data readiness assessment for the Kyrgyz Republic. From November 2014 to June 2015 I visited Kyrgyzstan three times for a week on behalf of the World Bank. In collaboration with the Kyrgyz Government and the UNDP, as well as local companies, civil society organisations and the coding community, we looked for the right starting points for open data in Kyrgyzstan, and which steps to take to get going.

The UNDP has now published the resulting report.

Open Communities / Refugeehack Wuppertal

Last November I attended the yearly Open Communities North-Rhine Westphalia barcamp (OKNRW), which was combined with a hackday called #refugeehack. The latter focused on using open data to help refugees find their way in Germany.

I presented my experiences working with local governments to help them use open data as a policy instrument. We did a year-long project with 9 municipalities and 1 province in 2014-2015. The driving thought behind it was that releasing data can be a deliberate intervention in a policy field, as having data in hand changes a stakeholder’s agency.

Now a video (in German) showing how OKNRW 2015 and the Refugeehack played out has been released.

Open Data Readiness in Serbia

Last June I spent time in Serbia doing an open data readiness assessment for the World Bank. Early this month I returned to present the findings, and to mentor a number of teams at the first Serbian open data hackathon. The report I wrote is now also available online through the UNDP website.

The printed ODRA report

The UNDP organized a conference to present the outcome of the readiness assessment and discuss next steps with stakeholders. At the conference I presented my findings to the Minister for Public Administration and Local Self Government (MPALSG), and a printed version was made available to all present.

(l) the minister (center, me left of her) on open data (photo Ministry PALSG), (r) discussing presented app (photo

At the conference the 11 teams that created open data applications at the hackathon the weekend before, called, were also presented. The hackathon took place in the recently opened StartIT Centar, a coworking space (which got funded through Kickstarter). I had the pleasure of mentoring the teams (together with Georges and Brett from Open Data Kosovo), channelling my experience of the past 8 years with open data communities around Europe and open data app-building. The quality of the results was, I think, impressive, and it was the first hackathon where I saw people trying to incorporate deep-learning tech. I aim to post separately on the different applications built.

Mentoring during the hackathon, with Milos and Nemanja. (photo

The hackathon could be about open data because five public sector institutions (Ministry for Interior, Ministry of Education, Agency for Environmental Protection, Agency for Medicines and Medical Devices, and the Public Procurement Office) have been working constructively to publish data since our first visit in June. In the coming months I hope to return to Belgrade to provide further implementation support.
