The Mozilla Foundation has launched a new service that looks promising, which is why I am bookmarking it here. Firefox Send allows you to send files of up to 1GB (or 2.5GB if logged in) to someone else. This is similar to services like the Dutch WeTransfer, except that it works with end-to-end encryption.

Files are encrypted in your browser before being sent to Mozilla’s server, where they stay until downloaded. The decryption key is contained in the download URL. That download URL is not sent to the receiver by Mozilla; you do that yourself. Files can additionally be locked with a password, which the sender likewise needs to convey to the receiver through other means. Files are kept for 5 minutes, 1 or 24 hours, or 7 days, and for 1 up to 100 downloads, depending on your choice. This makes it suitable for quick shares during conference calls, for instance. Apart from the encrypted file, Mozilla only knows the IP addresses of the uploader and the downloader(s). This is unlike services like WeTransfer, where the service also has e-mail addresses for both uploader and intended downloader, and you depend on them sending the receivers a confirmation with the download link first.


Firefox Send doesn’t send the download link to the recipient, you do

This is an improvement in terms of data protection, even if not fully watertight (nothing ever really is, especially not if you are a target singled out by a state actor). It does satisfy the need of some of my government clients who currently are not allowed to use services like WeTransfer.

Granularity: legos, crayons, and more (photo by Emily, license: CC-BY-NC)

A client, after their previous goal of increasing the volume of open data provided, is now looking to improve data quality. One element in this is increasing the level of detail of the already published data. They asked for input on how one can approach and define granularity. I formulated some thoughts for them as input, which I am now posting here as well.

Data granularity in general is the level of detail a data set provides. This granularity can be thought of in two dimensions:
a) whether a combination of data elements in the set is presented in one field or split out into multiple fields: atomisation
b) the relative level of detail the data in a set represents: resolution

On Atomisation
Improving this type of granularity can be done by looking at the structure of a data set itself. Are there fields within a data set that can be reliably separated into two or more fields? Common examples are separating first and last names, zipcodes and cities, streets and house numbers, organisations and departments, or keyword collections (tags, themes) into single keywords. This allows for more sophisticated queries on the data, as well as more ways it can potentially be related to or combined with other data sets.
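As a minimal sketch of what atomisation looks like in practice (the field format and function name here are made up for illustration, not taken from any particular data set), splitting a combined Dutch-style address field could be done like this:

```python
# Hypothetical example: atomising a combined "address" field into
# street, house number, zipcode and city. The assumed input format
# is "Streetname 12, 1234 AB City"; real data sets will vary.
import re

ADDRESS_PATTERN = re.compile(r"^(.*?)\s+(\d+\S*),\s*(\d{4}\s?[A-Z]{2})\s+(.+)$")

def atomise_address(address: str) -> dict:
    """Split one combined address field into atomic fields."""
    match = ADDRESS_PATTERN.match(address)
    if not match:
        # Leave values we cannot reliably split intact, rather than guess.
        return {"address": address}
    street, number, zipcode, city = match.groups()
    return {"street": street, "house_number": number,
            "zipcode": zipcode, "city": city}

record = atomise_address("Stationsweg 12, 7511 JD Enschede")
```

The key design point is the fallback: atomisation should only split fields that can be separated reliably, and pass everything else through unchanged.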

For currently published data sets improving this type of granularity can be done by looking at the existing data structure directly, or by asking the provider of the data set if they have combined any fields into a single field when they created the dataset for publication.

This type of granularity increase changes the structure of the data but not the data itself. It improves the usability of the data, without improving the use value of the data. The data in terms of information content stays the same, but does become easier to work with.

On Resolution
Resolution can have multiple components such as: frequency of renewal, time frames represented, geographic resolution, or splitting categories into sub-categories or multilevel taxonomies. An example is how one can publish average daily temperature in a region. Let’s assume it is currently published monthly with one single value per day. Resolution of such a single value can be increased in multiple ways: publishing the average daily temperature daily, not monthly. Split up the average daily temperature for the region, into average daily temperature per sensor in that region (geographic resolution). Split up the average single sensor reading into hourly actual readings, or even more frequent. The highest resolution would be publishing real-time individual sensor readings continuously.
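To make the temperature example concrete (the readings and sensor ids below are invented for this sketch), each resolution level is simply an aggregation of the level below it:

```python
# Hypothetical sketch: the same temperature data published at
# different resolutions. Values and sensor ids are made up.
from statistics import mean

# Highest resolution: individual hourly readings per sensor.
hourly_readings = {
    "sensor-a": [3.1, 2.9, 3.4, 4.0],
    "sensor-b": [2.5, 2.7, 3.0, 3.2],
}

# One step down: average daily temperature per sensor
# (geographic resolution preserved, temporal resolution reduced).
daily_per_sensor = {sid: round(mean(vals), 2)
                    for sid, vals in hourly_readings.items()}

# Lowest resolution: a single average daily value for the whole region.
regional_daily = round(mean(daily_per_sensor.values()), 2)
```

Publishing at a higher resolution means publishing from further up this pipeline; once only the regional average is released, the per-sensor detail cannot be reconstructed by re-users.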

Improving resolution can only be done in collaboration with the holder of the actual source of the data. What level of improvement can be attained is determined by:

  1. The level of granularity and frequency at which the data is currently collected by the data holder
  2. The level of granularity or aggregation at which the data is used by the data holder for their public tasks
  3. The level of granularity or aggregation at which the data meets professional standards.

Item 1 provides an absolute limit to what can be done: what isn’t collected cannot be published. Usually, however, data is not used internally in the exact form in which it was collected either. In terms of access to information, the practical limit to what can be published is usually the way the data is available internally for the data holder’s public tasks; internal systems and IT choices are usually shaped accordingly. Generally data holders can reliably provide data at the level of Item 2, because that is what they work with themselves.

However, there are reasons why data sometimes cannot be publicly provided the same way it is available to the data holder internally. These can be reasons of privacy or common professional standards. For instance energy companies have data on energy usage per household, but in the Netherlands such data is aggregated to groups of at least 10 households before publication because of privacy concerns. National statistics agencies comply with international standards concerning how data is published for external use. Census data for instance will never be published in the way it was collected, but only at various levels of aggregation.
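The household-energy example can be sketched as a simple publication rule (the threshold of 10 comes from the Dutch practice mentioned above; the data structure and group names are invented):

```python
# Hypothetical sketch: only publish energy consumption aggregated over
# groups of at least 10 households; smaller groups are suppressed
# entirely rather than published at lower precision.
MIN_GROUP_SIZE = 10

def publishable(groups: dict) -> dict:
    """Map group id -> household readings to publishable group totals."""
    return {gid: sum(readings)
            for gid, readings in groups.items()
            if len(readings) >= MIN_GROUP_SIZE}

groups = {
    "block-1": [210, 180, 195, 220, 205, 190, 185, 200, 215, 198],  # 10 households
    "block-2": [300, 280, 290],                                     # too small, suppressed
}
published = publishable(groups)
```

Note that suppression, not rounding, is the mechanism here: a group of three households is simply absent from the published data.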

Discussions on the desired level of resolution need to be in collaboration with potential re-users of the data, not just the data holders. At what point does data become useful for different or novel types of usage? When is it meeting needs adequately?

Together with data holders and potential data re-users the balance needs to be struck between re-use value and considerations of e.g. privacy and professional standards.

This type of granularity increase changes the content of the data. It improves the usage value of the data as it allows new types of queries on the data, and enables more nuanced contextualisation in combination with other datasets.

This week NBC published an article exploring the source of training data sets for facial recognition. It makes the claim that we ourselves are providing, without consent, the data that may well be used to put us under surveillance.

In January IBM made a database available for research into facial recognition algorithms. The database contains some 1 million face descriptions that can be used as a training set. Called “Diversity in Faces”, its stated aim is to reduce bias in current facial recognition capabilities. Such bias is rampant, often because the data sets used in training are too small and too homogeneous (compared to the global population). That stated goal seems ethically sound, but the means used to get there raise a few questions for me. Specifically, whether the means live up to the same ethical standards that IBM says it seeks to attain with the result of its work. This and the next post explore the origins of the DiF data, my presence in it, and the questions it raises for me.

What did IBM collect in “Diversity in Faces”?
Let’s look at what the data is first. Flickr is a photo sharing site, launched in 2004, that started supporting publishing photos with a Creative Commons license from early on. In 2014 a team led by Bart Thomee at Yahoo, which then owned Flickr, created a database of 100 million photos and videos with any type of Creative Commons license published in previous years on Flickr. This database is available for research purposes and known as the ‘YFCC-100M’ dataset. It does not contain the actual photos or videos per se, but the static metadata for those photos and videos (urls to the image, user id’s, geo locations, descriptions, tags etc.) and the Creative Commons license it was released under. See the video below published at the time:

YFCC100M: The New Data in Multimedia Research from CACM on Vimeo.

IBM used this YFCC-100M data set as a basis, and selected 1 million of the photos in it to build a large collection of human faces. It does not contain the actual photos, but the metadata of each photo, plus a large range of some 200 additional attributes describing the faces in those photos, including measurements and skin tones. Where YFCC-100M was meant to train more or less any image recognition algorithm, IBM’s derivative subset focuses on faces. IBM describes the dataset in their Terms of Service as:

a list of links (URLs) of Flickr images that are publicly available under certain Creative Commons Licenses (CCLs) and that are listed on the YFCC100M dataset (List of URLs together with coding schemes aimed to provide objective measures of human faces, such as cranio-facial features, as well as subjective annotations, such as human-labeled annotation predictions of age and gender(“Coding Schemes Annotations”). The Coding Schemes Annotations are attached to each URL entry.

My photos are in IBM’s DiF
NBC, in their above mentioned reporting on IBM’s DiF database, provide a little tool to determine whether photos you published on Flickr are in the database. I have been an intensive user of Flickr since early 2005, and have published over 25,000 photos there. A large number of those carry a Creative Commons BY-NC-SA license, meaning that as long as you attribute me, don’t use an image commercially, and share your result under the same license, you are allowed to use my photos. As the YFCC-100M covers the years 2004-2014 and I published images in most of those years, it was likely my photos were in it, and by extension likely that my photos are in IBM’s DiF. Using NBC’s tool, based on my user name, it turns out 68 of my photos are in IBM’s DiF data set.

One set of photos that apparently is in IBM’s DiF covers the BlogTalk Reloaded conference in Vienna in 2006, where I took various photos of participants and speakers. The NBC tool I mentioned provides one photo from that set as an example:

Thomas Burg

My face is likely in IBM’s DiF
Although IBM doesn’t allow a public check of who is in their database, it is very likely that my face is in it. There is a half-way functional way to explore the YFCC-100M database, and since DiF is derived from YFCC-100M, it is reasonable to assume that faces that can be found in YFCC-100M are also to be found in IBM’s DiF. The German university of Kaiserslautern at the time created a browser for the YFCC-100M database. Judging by some tests it is far from complete in the results it shows (for instance, a search for my Flickr user name returns results that don’t contain the example image above, and the total number of results is lower than the number of my photos in IBM’s DiF). Using that same browser to search for my name, and for Flickr user names likely to have taken pictures of me during the mentioned BlogTalk conference and other conferences, shows that there is indeed a number of pictures of my face in YFCC-100M. Although the limited search in IBM’s DiF possible with NBC’s tool doesn’t return any telling results for those Flickr user names, it is therefore very likely my face is in IBM’s DiF. I do find a number of pictures of friends and peers in IBM’s DiF that way, taken at the same time as pictures of myself.


Photos of me in YFCC-100M

But IBM won’t tell you
IBM is disingenuous when it comes to being transparent about what is in their DiF data. Their TOS allows anyone whose Flickr images have been incorporated to request to be excluded from now on, but only if you can provide the exact URLs of the images you want excluded. That is only possible if you can verify what is in their data, but there is no public way to do so: only university-affiliated researchers can request access to the data by stating their research interest, and requests can be denied. Their TOS says:

3.2.4. Upon request from IBM or from any person who has rights to or is the subject of certain images, Licensee shall delete and cease use of images specified in such request.

Time to explore the questions this raises
Now that the context of this data set is clear, in a next posting we can take a closer look at the practical, legal and ethical questions this raises.

In January I wrote, pleasantly surprised, about the Province of Overijssel publishing the icon set from their house style under a Creative Commons license. I sent the Province a complimentary e-mail about it, and asked which Creative Commons license exactly was intended, as that was not clear on the website. For instance, it was unclear whether attribution was required, whether commercial re-use was allowed, and whether derivative work had to be licensed under the same conditions. I received a reply announcing they would make an adjustment.

To my surprise the adjustment was not a clarification but a complete reversal. The Creative Commons license has disappeared and the site now only allows use of the icons for and by the Province and their suppliers.

I sent a disappointed e-mail, asking how the new choice came about. Such a mail quickly becomes a long one, because in matters like this it soon comes down to details, and every looser formulation immediately raises new questions. It was therefore pleasant that one of the communication team members called me this afternoon to provide some context.

Adding CC to the icons was an experiment by a staff member, based on experiences with earlier icons that were available under CC. The intention was to give CC some more use. That makes it easier, for instance, for government bodies to re-use things from each other, which benefits everyone. But precisely with creative expressions (unlike, say, data, where national policy on CC use applies) there are more copyright aspects to take into account. Commercial re-use of someone else’s creative expressions is then, practically and emotionally, a different step. We are talking about the Province’s house style, so do you want those same icons to be able to pop up ‘everywhere’? It is not the intention that someone else’s expressions become associated with yours.

Progressive insight based on those considerations is the reason they walked back the original good intention. That is fine, even if the result is that, unfortunately, no CC license is attached to the icon set after all. An experiment is exactly that: an experiment, and that means you can also conclude it did not work out.

There are of course more open, less open, and more closed forms of CC licenses. That is the whole point of CC: that you selectively grant permission in advance for certain forms of re-use, without everyone having to ask the rights holder. From all rights reserved to some rights reserved.

It remains laudable that the communication team had, and has, the intention to work with CC. And it is very pleasant that they got in touch; that makes conversation easier. Hopefully it will lead to a CC license being used at the next opportunity.

More generally, it would help if the Ministry of the Interior (BZK), as the holder of the open government and open data portfolio, and the boards of decentralised governments such as a province, provided stronger direction here. Then experiments would not be needed, and no fear or concern would arise on the work floor about possible unintended consequences, which is what leads to cautious retractions like this one. That caution is a normal, predictable human reaction, but you can make it unnecessary in your organisation. BZK already sets the policy line that CC0 and CC-BY must be used for data publications. Open standards have been mandatory for 11 years already (though few government bodies adhere to that in practice). A uniform practical interpretation of copyright law, also for creative expressions by government bodies, and the logical license choices connected to it, formulated by BZK and endorsed by the boards of decentralised governments, would help here. There is enough experience by now to enable BZK to set norms in this.

After California, now the Washington State senate has adopted a data protection and privacy act that takes the EU General Data Protection Regulation (GDPR) as an example to emulate.

This is definitely a hoped-for effect of the GDPR when it was launched. European environmental and food safety standards have had a similar global norm-setting impact. This is because for businesses it is generally more expensive to comply with multiple standards than to comply only with the strictest one. We saw it earlier in companies taking GDPR demands and applying them to themselves across the board. That the GDPR might have this impact is an intentional part of how the EC is developing a third proposition in data geopolitics, between the surveillance capitalism of the US data lakes and the data-driven authoritarianism of China.

To me the GDPR is a quality assurance instrument, with its demands increasing over time. So it is encouraging to see other government entities outside the EU taking a cue from the GDPR. California and Washington State now have adopted similar laws. Five other States in the USA have introduced similar laws for debate in the past 2 months: Hawaii, Massachusetts, New Mexico, Rhode Island, and Maryland.

This article is a good description of the Freedom of Information (#foia #opengov #opendata) situation in the Balkans. Due to my work in the region, I recognise a lot of what is described here. That work, such as in Serbia, has brought me into contact with various institutions willing to take evasive action to prevent the release of information.

In essence this is not all that different from what (decentralised) government entities in other European countries do as well. Many of them still see increased transparency and access as a distraction absorbing work and time they’d rather spend elsewhere. Yet there is a qualitative difference in the level of obstruction: the difference between acknowledging there is a duty to be transparent but being hesitant about it, and not believing there is such a duty in governance at all.

Secrecy, sometimes in combination with corruption, has a long and deep history. In Central Asia for instance I encountered an example that the number of agricultural machines wasn’t released, as a 1950’s Soviet law still on the books marked it as a state secret (because tractors could be mobilised in case of war). More disturbingly such state secrecy laws are abused to tackle political opponents in Central Asia as well. When a government official releases information based on a transparency regulation, or as part of policy implementation, political opponents might denounce them for giving away state secrets and take them to court risking jail time even.

There is a strong effort to increase transparency visible in the Balkan region as well. Both inside government, as well as in civil society. Excellent examples exist. But it’s an ongoing struggle between those seeing power as its own purpose and those seeking high quality governance. We’ll see steps forward, backwards, rear guard skirmishes and a mixed bag of results for a long time. Especially there where there are high levels of distrust amongst the wider population, not just towards government but towards each other.

One such excellent example is the work of the Serbian information commissioner Sabic. Clearly seeing his role as an ombudsman for the general population, he and his office led by example during the open data work I contributed to in the past years. By publishing statistics on information requests, complaints and answer times, and by publishing a full list of all Serbian institutions that fall under the remit of the Commission for Information of Public Importance and Personal Data Protection. This last thing is key, as some institutions will simply stall requests by stating transparency rules do not apply to them. Mr. Sabic’s term ended at the end of last year. A replacement for his position hasn’t been announced yet, which is both a testament to Mr Sabic’s independent role as information commissioner, and to the risk of less transparency inclined forces trying to get a much less independent successor.

Bookmarked Right to Know: A Beginner’s Guide to State Secrecy / Balkan Insight by Dusica Pavlovic (Balkan Insight)
Governments in the Balkans are chipping away at transparency laws to make it harder for journalists and activists to hold power to account.

SimCity 2000, adapted from image by m01229 (CC-BY)

Came across an interesting article, and by extension the techzine it was published in: Logic.
The article was about the problematic biases and assumptions in the model of urban development used in the popular game SimCity (one of those time sinks where my 10,000 hours brought me nothing 😉). And how that may have unintentionally (the SimCity creator just wanted a fun game) influenced how people look at the evolution of cityscapes in real life, in ways the original 1960s work the game is based on never did. The article is a fine example of cyber history / archeology.

The magazine it was published in, Logic (twitter), started in the spring of 2017 and is now reaching issue 7. Each issue has a specific theme, around which contributions are centered. Intelligence, Tech against Trump, Sex, Justice, Scale, Failure, Play, and soon China, have been the topics until now.

The zine is run by Moira Weigel, Christa Hartsock, Ben Tarnoff, and Jim Fingal.

I’ve ordered the back issues, and subscribed (though technically it is cheaper to keep ordering back-issues). They pay their contributors, which is good.


Cover for the upcoming edition on tech in China. Design (like all design for Logic) by Xiaowei R. Wang.

It obviously makes no sense to block the mail system if you disagree with some of the letters sent. The deceptive method of blocking used here, targeting the back-end servers so that mail traffic simply gets ignored, while Russian Protonmail users still seemingly can access the service, is another sign that they’d rather not let you know blocking goes on at all. This is an action against end-to-end encryption.

The obvious answer is to use more end-to-end encryption, and so increase the cost of surveillance and repression. Use my protonmail address as listed on the right, or use PGP using my public key on the right to contact me. Other means of reaching me with end-to-end encryption are the messaging apps Signal and Threema, as well as Keybase (listed on the right as well).

Bookmarked Russia blocks encrypted email provider ProtonMail (TechCrunch)
Russia has told internet providers to enforce a block against encrypted email provider ProtonMail, the company’s chief has confirmed. The block was ordered by the state Federal Security Service, formerly the KGB, according to a Russian-language blog, which obtained and published the order aft…

Aral Balkan talks about how to design tools and find ways around the big social media platforms. He calls for the design and implementation of Small Tech. I fully agree. Technology to provide us with agency needs to be not just small, but smaller than us, i.e. within the scope of control of the group of people deploying a technology or method.

My original fascination with social media, back in the ’00s when it was blogs and wikis mostly, was precisely because it was smaller than us, it pushed publication and sharing in the hands of all of us, allowing distributed conversations. The concentration of our interaction in the big tech platforms made social media ‘bigger than us’ again. We don’t decide what FB shows us, breaking out of your own bubble (vital in healthy networks) becomes harder because sharing is based on pre-existing ‘friendships’ and discoverability has been removed. The erosion has been slow, but very visible. Networked Agency, to me, is only possible with small tech, and small methods. It’s why I find most ‘digital transformation’ efforts disappointing, and feel we need to focus much more on human digital networks, on distributed digital transformation. Based on federated small tech, networks of small tech instances. Where our tools are useful on their own, and more useful in concert with others.

Aral’s posting (and blog in general) is worth a read, and as he is a coder and designer, he acts on those notions too.

This weekend an online virtual IndieWebCamp took place. One of the topics discussed and worked upon has my strong interest: making it possible to authorise selective access to posts.

Imagine me writing here about my intended travel. This I would want to share with my network, but not necessarily announce publicly until after the fact. Similarly, let’s say I want some of those reading here to get an update about our little one, then I’d want to be able to indicate who can have access to that posting.

In a platform like FB, and previously on Google Plus with its circles, you can select audiences for specific postings; especially the circles in Google allowed fine-grained control. On my blog it is much less obvious how to do that. Yet there are IndieWeb components that would allow this. For instance, IndieAuth already allows you to log in to this website and other platforms using your own URL (much like Facebook’s login can be used on multiple sites, although you really don’t want to do that as it lets FB track you across other sites you use). However, for reading individual postings that have restricted access it would require an action by a human (accepting the authorisation request), which makes it impractical. Enter AutoAuth, based on IndieAuth, which allows your site to log in to mine without human intervention.

Martijn van de Ven and Sven Knebel worked on this, as sketched out in the graph below.

Selective access to content inside a posting
Now, once this is working, I’d like to take it one step further still. The above still assumes I have postings for all and postings for some, and that implies writing entire postings with a specific audience in mind. More often I find I am deliberately vague on some details in my public postings, even though I know some of my network reading here can be trusted with the specifics. Like names, places, photos etc. In those instances writing another posting with that detailed info for restricted access does not make much sense. I’d want to be able to restrict access to specific sentences, paragraphs or details in an otherwise public posting.

This is akin to the way government document management systems are slowly being adapted, where specific parts in a document are protected by data protection laws, while the document itself is public by law. Currently balancing those two obligations means human intervention before sharing, but slowly systems are being adapted to knowing where in documents restricted access material is located. Ideally I want a way of marking up text in a posting like this so that it is only sent out by the webserver when an authorisation like the one sketched above is available.

So that a posting like this is entirely possible:

“Today we went to the zoo with the <general access>little one</general access><friends only>our little one’s name</friends only>”

<general access>general IMAGE of zoo visit</general access>
<friends only>IMAGE with little one’s face</friends only>
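A server-side filter along those lines might look like the sketch below. The tag names mirror the made-up markup in the example above; none of this is an existing IndieWeb specification, just an illustration of how the webserver could strip restricted spans before sending a posting out.

```python
# Hypothetical sketch: remove "friends only" spans from a posting
# unless the authenticated reader has the "friends" access level.
import re

FRIENDS_SPAN = re.compile(r"<friends only>(.*?)</friends only>", re.DOTALL)

def render(post: str, access_levels: set) -> str:
    """Return the posting with restricted spans removed for this reader."""
    if "friends" in access_levels:
        # Authorised reader: keep the content, drop only the markers.
        return post.replace("<friends only>", "").replace("</friends only>", "")
    # Unauthorised reader: drop the whole restricted span.
    return FRIENDS_SPAN.sub("", post)

post = ("Today we went to the zoo with the "
        "<friends only>name of the </friends only>little one.")
public_view = render(post, set())
friends_view = render(post, {"friends"})
```

The access levels themselves would come out of an authorisation flow like the AutoAuth one sketched above; this only shows the final filtering step.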

There were several points made in the conversation after my presentation yesterday at Open Belgium 2019. This is a brief overview to capture them here.

1) One remark was about the balance between privacy and openness, and asking about (negative) privacy impacts.

The framework assumes government as the party being interested in measurement (given that that was the assignment for which it was created). Government held open data is by default not personal data as re-use rules are based on access regimes which in turn all exclude personal data (with a few separately regulated exceptions). What I took away from the remark is that, as we know new privacy and other ethical issues may arise from working with data combinations, it might be of interest if we can formulate indicators that try to track negative outcomes or spot unintended consequences, in the same way as we are trying to track positive signals.

2) One question was about if I had included all economic modelling work in academia etc.

I didn’t. This isn’t academic research either. It seeks to apply lessons already learned. What was included were existing documented cases, studies and research papers looking at various aspects of open data impact. Some of those are academic publications, some aren’t. What I took from those studies is two things: what exactly did they look at (and what did they find), and how did they assess a specific impact? The ‘what’ was used as potential indicator, the ‘how’ as the method. It is of interest to keep tracking new research as it gets published, to augment the framework.

3) Is this academic research?

No, its primary aim is to be a practical instrument for data holders as well as national open data policy makers. It is not meant to establish scientific truth or to completely quantify impact once and for all; it is meant to establish whether there are signs the right steps are being taken, and whether that results in visible impact. The aim, and this connects to the previous question as well, is to avoid extensive modelling techniques and favour indicators we know work, with straightforward methods. This is to ensure that government data holders are capable of doing these measurements themselves, and of using the framework actively as an instrument.

4) Does it include citizen science (open data) efforts?

This is an interesting one (asked by Lukas of Luftdaten.info). The framework currently does, in a way, include the existence and emergence of citizen science projects, as these would come up in any stakeholder mapping attempts and in any emerging ecosystem tracking, and as examples of using government open data (as context and background for citizen science measurements). But the framework doesn’t look at the impact of such efforts, neither in terms of socio-economic impact nor in terms of government being a potential user of citizen science data. Again, the framework is meant to make visible the impact of government opening up data. I think it’s not very difficult to adapt the framework to track a citizen science project’s impact. Adding citizen science projects in a more direct way, as indicators for the framework itself, is harder I think, as it needs more clarification of how it ties into the impact of open government data.

5) Is this based only on papers, or also on approaching groups, and people ‘feeling’ the impact?

This was connected to the citizen science bit. Yes, the framework is based on existing documented material only. And although a range of those studies base themselves on interviewing or surveying various stakeholders, that is not a default or deliberate part of how the framework was created. I do however recognise the value of, for instance, participatory narrative inquiry, which makes the real experiences of people visible, and the patterns across those experiences. Including that sort of measurement would be useful especially for the social and societal impacts of open data. But currently none of the studies that were re-used in the framework took that approach. It does make me think about how one could set up something like that to monitor impact, e.g. of local government open data initiatives.

At Open Belgium 2019 today Daniel Leufer gave an interesting session on bringing philosophy and technology closer together. He presented the Open Philosophy Network, an attempt to bring philosophical questions into tech discussions while avoiding a) the overly abstract work going on in academia, and b) not having all stakeholders at the table in an equal setting. He aims at local gatherings and events, such as a book reading group on Shoshana Zuboff’s The Age of Surveillance Capitalism, or tech-ethics round table discussions where there isn’t a panel of experts being interviewed, but where philosophers, technologists and people who use the technology are all part of the discussion.

This resonated with me at various levels. One level is that I recognise a strong interest in naive explorations of ethical questions around technology. For instance at our Smart Stuff That Matters unconference last summer, in various conversations ethical discussions emerged naturally from the actual context of the session and the event.
Another is that, unlike some of the academic efforts I know, many expect and need the step towards practical applicability sooner. In the end it all has to inform actions and choices in the here and now, even when nobody expects definitive answers. It is also why I dislike how many ethical discussions pretending to be action oriented are primarily connected to future or emergent technologies, not to current technology choices. That makes them just a fig leaf for inaction, removing agency. I’m more of a pragmatist, interested in what achieves actual improvements in the here and now, and what increases agency.
Thirdly, I felt that there are many more connections to make in terms of open session formats, such as Open Space, knowledge cafés, blogwalks and barcamps, and indeed the living room experience of our birthday unconferences. I’ve organised many of those, and I feel the need to revisit those experiences and think about how to deploy them for something like this. This also applies to formulating a slightly more structured approach to assist groups in organisations with naive ethical explorations.

The point of ethics is not to provide definitive answers, but to prevent us using terrible answers

I hope to interact a bit more with Daniel Leufer in the near future.

Today I gave a brief presentation of the framework for measuring open data impact I created for UNDP Serbia last year, at the Open Belgium 2019 Conference.

The framework is meant to be relatable and usable for individual organisations by themselves, and based on how existing cases, papers and research in the past have tried to establish such impact.

Here are the slides.

This is the full transcript of my presentation:

Last Friday, when Pieter Colpaert tweeted the talks he intended to visit (Hi Pieter!), he said two things. First he said after the coffee it starts to get difficult, and that’s true. Measuring impact is a difficult topic. And he asked about measuring impact: How can you possibly do that? He’s right to be cautious.

Because our everyday perception of impact, and of how to detect it, is often too simplistic. “Where’s the next Google?”, the EC asked years ago, but it’s the wrong question. We will only know in 20 years, when it is the new tech giant. Today it is likely a small start-up of four people with laptops and one idea, somewhere in Lithuania or Bulgaria, and framed this way we are by definition not able to recognize it. Asking for the killer app for open data is a similarly wrong question.

When it comes to impact, we seem to want one straightforward big thing. Hundreds of billions of euro impact in the EU as a whole, made up of a handful of wildly successful things. But what does that actually mean for you, a local government? And while you’re looking for that big impact you are missing all the smaller craters in this same picture, and also the bigger ones if they don’t translate easily into money.

Over the years however, there have been a range of studies, cases and research papers documenting specific impacts and effects. My colleagues and I started collecting those a long time ago, and I used them to help contextualise potential impacts, first for the Flemish government, and last year for the Serbian government: to show what observed impact in, for instance, a Spanish sector would mean in the corresponding Belgian context, or how a global prediction correlates to the Serbian economy and government strategies.

The UNDP in Serbia asked me to extend that with a proposal for indicators to measure impact, as they move forward with new open data action plans following up on the national readiness assessment I did for them earlier. I took the existing studies and looked at what they had tried to measure, what the common patterns are, and what exactly they had looked at. I turned that into a framework for impact measurement.

In the following minutes I will address three things. First, what makes measuring impact so hard. Second, what the common patterns across existing research are. Third, how, avoiding the pitfalls and using the commonalities, we can build a framework that then in itself is an indicator. Let’s first talk about the things that make measuring impact hard.

Judging by the available studies and cases, there are several issues that make any easy answer to the question of open data impact impossible. There are a range of reasons measurement is hard; I’ll highlight a few.
One is that context is key. If you don’t know what you’re looking at, or why, no measurement makes much sense. And you can only know that in specific contexts. But specifying contexts takes effort. It asks the question: where do you WANT impact?

Another issue is showing the impact of many small increments. Take the most used open data app in the Netherlands, the rain radar, which every Dutch person checks every morning. How often has it changed a decision from taking the car to taking the bike? What does it mean in terms of congestion reduction, or emission reduction? Can you meaningfully quantify that at all?

Also important is who is asking for measurement. In one of my first jobs, my employer didn’t have email for all yet, so I asked for it. In response the MD asked me to put together the business case for email. This is a classic response when you don’t want to change anything. Often asking for measurement is meant to block change. Because they know you cannot predict the future. Motives shape measurements. The contextualisation of impact elsewhere to Flanders and Serbia in part took place because of this. Use existing answers against such a tactic.

Maturity and completeness of both the provision side, government, as well as the demand side, re-users, determine in equal measures what is possible at all, in terms of open data impact. If there is no mature provision side, in the end nothing will happen. If provision is perfect but demand side isn’t mature, it still doesn’t matter. Impact demands similar levels of maturity on both sides. It demands acknowledging interdependencies. And where that maturity is lacking, tracking impact means looking at different sets of indicators.

Measurements often motivate people to game the system, especially single measurements. When the number of datasets was still a metric for national portals, the French portal launched with over 350k datasets. But really it was just a few dozen, which they had split up by departments and municipalities. So a balance is needed, with multiple indicators that point in different directions.

Open data, especially open core government registers, can be seen as infrastructure. But we actually don’t know how infrastructure creates impact. We know that building roads usually has a certain impact (investment correlates to a certain % rise in GDP), but we don’t know how it does so. Seeing open data as infrastructure is a logical approach (the consensus seems that the potential impact is about 2% of GDP), but it doesn’t help us much to measure impact or see how it creates that.

Network effects exist, but they are very costly to track: first order, second order, third order, higher order effects. We’re doing case studies for ESA on how satellite data gets used. We can establish network effects, for instance how ice breakers in the Gulf of Bothnia use satellite data in ways that ultimately reduce supermarket prices, but doing 24 such cases is a multi-year effort.

Eppur si muove! “And yet it moves”, Galileo said. The same is true for open data. Most measurements are proxies: they show something moving, without necessarily showing the thing that is doing the moving. Open data often is a silent actor, or a long range one. Yet still it moves.

Yet still it moves. And if we look at the patterns across established studies, that is indeed what we see: there are commonalities in what movement we see. In the list on the slide the last point, that open data is a policy instrument, is key. We know publishing data enables other stakeholders to act. When you do that on purpose, you turn open data into a policy instrument, the cheapest one you have next to regulation and financing.

We all know the story of the drunk who lost his keys. He was searching under the light of a street lamp. Someone who helped him asked if he had lost the keys there. No, the drunk said, but at least there is light here. The same is true for open data. If you know what you published it for, you will at least be able to recognise relevant impact, if not all the impact it creates. Using it as a policy instrument is like switching on the lights.

Dealing with lack of maturity means having different indicators for every step of the way: not just seeing whether impact occurs, but also whether the right things are being done to make impact possible. Lead and lag indicators.

The framework then is built from what has been used to establish impact in the past, and what we see in our projects as useful approaches. The point here is that we are not overly simplifying measurement, but adapting it to whatever the context of a data provider or user is. Also, there’s never just one measurement, so a balanced approach is possible: you can’t game the system. It covers various levels of maturity, from your first open dataset all the way to network effects. And indicators that by themselves are too simple can still be used.
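As an illustration only: the phase names and indicators below are invented examples of mine for this post, not the actual indicator set delivered to UNDP Serbia. A multi-phase framework with lead and lag indicators could be sketched like this:

```python
# Illustrative sketch only: phase names and indicators are invented
# examples, not the actual framework delivered to UNDP Serbia.
framework = {
    "provision": {
        "lead": ["core registers identified", "publication process in place"],
        "lag": ["number of datasets published", "metadata completeness"],
    },
    "re-use": {
        "lead": ["stakeholder mapping done", "outreach events held"],
        "lag": ["API calls per dataset", "documented re-use cases"],
    },
    "impact": {
        "lead": ["policy goals linked to published data"],
        "lag": ["observed socio-economic effects"],
    },
}

def indicators(framework, kind):
    """List all lead or lag indicators across phases, so no single
    metric stands alone and can be gamed in isolation."""
    return [i for phase in framework.values() for i in phase[kind]]
```

The point of the structure is the balance: every phase pairs indicators of doing the right things (lead) with indicators of visible results (lag).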

Additionally, the framework itself is a large scale sensor. If one indicator moves, you should see movement in other indicators over time as well. If you throw a stone in the pond, you should see ripples propagate. This means that if you start with data provision indicators only, you should see measurements in other phases pick up. This allows you both to use a set of indicators across all phases, and to move to more progressive ones when you outgrow the initial ones. Finally, some recommendations.

Some final thoughts. If you publish by default as an integral part of your processes, measuring impact, or building a business case, is not needed as such. But measurement is very helpful in the transition to that end game. Core data and core policy elements, and their stakeholders, are key. Measurement needs to be designed up front. Using open data as a policy instrument lets you at least define the impact you are looking for. The framework is the measurement: only micro-economic studies really establish specific economic impact, but they only work in mature situations and cost a lot of effort, so you need to know when you are ready for them. But measurement can start wherever you are, with indicators that reflect the overall open data maturity level you are at, while looking both back and forwards. And because measurement can be done, as a data holder you should be doing it.


Social geolocation services over the years have been very useful for me. The value is in triggering serendipitous meetings: being in a city outside my normal patterns at the same time someone in (or peripheral to) my network is in the city too, outside their normal patterns. It happened infrequently, about once a year, but frequently enough to be useful and keep checking in. I was a heavy user of Plazes and Dopplr, both long since disappeared. As with other social platforms I and my data quickly became the marketable product, instead of the customer. So ultimately I stopped using Foursquare/Swarm much, only occasionally for international travel, and completely in 2016. Yet I still long for that serendipitous effect, so I am looking to make my location and/or travel plans available, for selected readers, through this site.

There are basically three ways in which I could do that.
1) The POSSE way. I post my location or travel plan on this blog, and it gets shared to platforms like Foursquare, and through RSS. I would need to be able to show these postings only to my followers/ readers, and have a password protected RSS feed and subscription workflow.
2) The PESOS way. I use an existing platform to create my check-ins, like Foursquare, and share that back to my blog. Where it is only accessible for followers/readers, and has a password protected rss feed.
3) The ‘just my’ way. I use only my blog to create check-ins and share them selectively with followers and readers, and have a password protected rss feed for it.
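All three options hinge on a password-protected RSS feed, which boils down to checking HTTP Basic Auth credentials before serving the feed XML. A minimal sketch of that check, assuming placeholder credentials (any real deployment would also need HTTPS):

```python
import base64
import hmac

def feed_is_authorized(auth_header, user, password):
    """Validate an HTTP 'Authorization: Basic ...' header for a
    password-protected feed. Credentials are placeholders; serve
    over HTTPS in any real deployment."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[6:]).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    # hmac.compare_digest avoids leaking information via timing
    return hmac.compare_digest(decoded, f"{user}:{password}")
```

A feed reader that supports authenticated feeds sends this header automatically when given a URL of the form https://user:password@example.com/feed, which is what would make the subscription workflow workable for followers.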

Option 3 is the one that provides the most control over my data, but likely limits the way in which I can allow others to follow me, and needs a flexible on-the-go way to add check-ins through mobile.
Option 2 is the one that comes with easy mobile apps, allows followers to use their own platform apps to do so, as well as through my site.
Option 1 is the one that is in between those two. It has the problems of option 3, but still allows others to use their own platforms like in option 2.

I decided to try and do both Option 2, and Option 3. If I can find a way to make Option 3 work well, getting to Option 1 is an extension of it.
Option 2 at first glance was the easiest to create, because Aaron Parecki already created ‘Own Your Swarm‘ (OYS), a bridge between my existing Foursquare/Swarm account and Micropub, an open protocol for which my site has an endpoint. It means I can let OYS talk both to my Swarm account and to my site, so that it posts something to this blog every time I check in on Swarm on my mobile. OYS not only posts the check-ins, but also keeps an eye on my Swarm check-ins, so that when there are comments or likes, they too get reflected on my blog.
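The Micropub side of that bridge is, at its core, an authenticated HTTP POST to the site’s endpoint. A simplified sketch of what building such a check-in request could look like: the venue, coordinates, token and property values here are placeholders of mine, and OYS’s actual payload is richer than this.

```python
from urllib.parse import urlencode

def checkin_request(venue, lat, lon, note=""):
    """Build a form-encoded Micropub body for a check-in post.
    Token and property values are placeholders, not OYS's actual
    (richer) payload."""
    body = urlencode({
        "h": "entry",                   # a microformats2 h-entry
        "checkin": f"geo:{lat},{lon}",  # location as a geo: URI
        "content": note or f"Checked in at {venue}",
    })
    headers = {
        "Authorization": "Bearer ACCESS_TOKEN",  # placeholder token
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return body, headers
```

POSTing that body to the site’s Micropub endpoint is all a client needs to do; the endpoint takes care of turning it into a blog post.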

My blog uses the Post Kinds plugin, which has a posting type for check-ins, so they get their own presentation on the blog. OYS allows me to automatically tag what it posts, which gets matched to the existing categories and tags in my blog.

I from now on use a separate category for location related postings, called plazes. Plazes was the original geolocation app I started using in 2004, when co-founder Felix Petersen showed it to me on the very first BlogWalk I co-organised in the Netherlands. Plazes also was the first app to quickly show me the value of creating serendipitous meetings. So as an expression of geo-serendipic (serendipity-epic?) nostalgia, I named the postings category after it.

Dave Winer writes “we all feel disempowered“:

… people who feel disempowered figure there’s nothing they can do, no one would listen to me anyway, so I’ll just go on doing what I do. I know I feel that way.

He’s talking in the context of the US political landscape, but it applies in general too. Part of the solution he suggests is to

Invest in local news. And btw, I have a lot more to invest than money.

Two things stand out for me.

One is that we’ve become accustomed to viewing everything through the lens of individualism. Yes, we’ve gained much from individualism, but by now we’ve also landed in a false dichotomy: the presumption that you need to solve something as an individual, or that if you individually can’t, then all is lost. It puts all responsibility for any change on the individual, while it is clear no one can change the world on their own. It pitches individuals against society as a whole, but ignores the intermediate level: groups with agency.

The second false dichotomy is the choice between either the (hyper)local or the global. You remove litter from your street, or you set out to save the ozone layer. Here again a bridge is possible between those two extremes, the (hyper)local and the global: where you do something useful locally that also has some impact on a global issue, or where you translate a global issue to how it manifests locally, solving a local need along the way. You can worry about global fossil fuel use and generate green energy with a cooperative in your area. You can run your own parts of a global infrastructure, while basically only looking to create a local service. It is not either local or global. It can be local action, leveraging the opportunities global connection brings, or mitigating the fall-out of global issues. It can be global, as scaling of local efforts.

Local / global, individual / society aren’t opposites, they’re layers. Complexity resides in that layeredness. To help deal with complexity, the intermediate levels between the individual and the masses, bridging the local and the global (note: the national level is not that bridge), are what count. The false dichotomies, and the narratives they are used in, obscure that, and create disempowerment that way.

Disempowerment is a kind of despair. The answer to despair isn’t hope but action. Networked agency looks at groups in context solving their own issues, in full awareness of the global networks that surround us. Group action in its own context, overlapping into other contexts, layered into global context, like Russian dolls.