This week NBC published an article exploring the source of training data sets for facial recognition. It makes the claim that we ourselves are providing, without consent, the data that may well be used to put us under surveillance.

In January IBM made a database available for research into facial recognition algorithms. The database contains some 1 million face descriptions that can be used as a training set. Called “Diversity in Faces”, its stated aim is to reduce bias in current facial recognition capabilities. Such bias is rampant, often because the data sets used in training are too small and too homogeneous compared to the global population. That stated goal seems ethically sound, but the means used to get there raise a few questions for me. Specifically, whether those means live up to the same ethical standards IBM says it seeks to attain with the result of its work. This and the next post explore the origins of the DiF data, my presence in it, and the questions it raises for me.

What did IBM collect in “Diversity in Faces”?
Let’s look at what the data is first. Flickr is a photo sharing site, launched in 2004, that started supporting publishing photos under a Creative Commons license early on. In 2014 a team led by Bart Thomee at Yahoo, which then owned Flickr, created a database of 100 million photos and videos published on Flickr in previous years under any type of Creative Commons license. This database, available for research purposes, is known as the ‘YFCC-100M’ dataset. It does not contain the actual photos or videos, but the static metadata for them (URLs to the images, user IDs, geolocations, descriptions, tags etc.) and the Creative Commons license each was released under. See the video below, published at the time:

YFCC100M: The New Data in Multimedia Research from CACM on Vimeo.

IBM used this YFCC-100M data set as a basis, and selected 1 million of the photos in it to build a large collection of human faces. It does not contain the actual photos, but the metadata of those photos, plus some 200 additional attributes describing the faces in them, including measurements and skin tones. Where YFCC-100M was meant to train more or less any image recognition algorithm, IBM’s derivative subset focuses on faces. IBM describes the dataset in their Terms of Service as:

a list of links (URLs) of Flickr images that are publicly available under certain Creative Commons Licenses (CCLs) and that are listed on the YFCC100M dataset (List of URLs together with coding schemes aimed to provide objective measures of human faces, such as cranio-facial features, as well as subjective annotations, such as human-labeled annotation predictions of age and gender(“Coding Schemes Annotations”). The Coding Schemes Annotations are attached to each URL entry.
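To get a feel for the shape of this data, here is a minimal sketch, in Python, of how one might filter a YFCC-100M-style metadata dump for a single Flickr user’s photos, the way NBC’s tool does for DiF. The field layout, file name and user name are illustrative assumptions, not the dataset’s actual schema.

```python
import csv

# Hypothetical, simplified field layout for a YFCC-100M-style metadata dump:
# one photo per row, tab-separated, with user and license information.
FIELDS = ["photo_id", "user_id", "user_nickname", "photo_url", "tags", "license_url"]

def photos_by_user(path, nickname):
    """Yield metadata rows for one Flickr user from a tab-separated dump."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, fieldnames=FIELDS, delimiter="\t"):
            if row["user_nickname"] == nickname:
                yield row

# A derived dataset like DiF would be a selection of such rows, with extra
# columns (facial measurements, annotations) attached to each photo URL.
for photo in photos_by_user("yfcc100m_dataset.tsv", "some_flickr_user"):
    print(photo["photo_url"], photo["license_url"])
```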

My photos are in IBM’s DiF
NBC, in their above-mentioned reporting on IBM’s DiF database, provide a little tool to determine if photos you published on Flickr are in the database. I have been an intensive user of Flickr since early 2005, and have published over 25,000 photos there. A large number of those carry a Creative Commons BY-NC-SA license, meaning that as long as you attribute me, don’t use an image commercially, and share your result under the same license, you’re allowed to use my photos. As YFCC-100M covers the years 2004-2014 and I published images for most of those years, it was likely that my photos were in it, and by extension likely that they are in IBM’s DiF. Using NBC’s tool, based on my user name, it turns out 68 of my photos are in IBM’s DiF data set.

One set of photos that apparently is in IBM’s DiF covers the BlogTalk Reloaded conference in Vienna in 2006, where I took various photos of participants and speakers. The NBC tool I mentioned provides one photo from that set as an example:

Thomas Burg

My face is likely in IBM’s DiF
Although IBM doesn’t allow the public to check who is in their database, it is very likely that my face is in it. There is a half-way functional way to explore the YFCC-100M database, and DiF is derived from YFCC-100M, so it is reasonable to assume that faces findable in YFCC-100M are also to be found in IBM’s DiF. The German University of Kaiserslautern created a browser for the YFCC-100M database at the time. Judging by some tests it is far from complete in the results it shows (for instance, searching for my Flickr user name returns results that don’t include the example image above, and the total number of results is lower than the number of my photos in IBM’s DiF). Using that same browser to search for my name, and for Flickr user names likely to have taken pictures of me during the mentioned BlogTalk conference and other conferences, shows that there are indeed a number of pictures of my face in YFCC-100M. Although the limited search in IBM’s DiF possible with NBC’s tool doesn’t return any telling results for those Flickr user names, it is therefore very likely my face is in IBM’s DiF. I do find a number of pictures of friends and peers in IBM’s DiF that way, taken at the same time as pictures of myself.


Photos of me in YFCC-100M

But IBM won’t tell you
IBM is disingenuous when it comes to being transparent about what is in their DiF data. Their TOS allows anyone whose Flickr images have been incorporated to request exclusion from now on, but only if you can provide the exact URLs of the images you want excluded. That is only possible if you can verify what is in their data, yet there is no public way to do so: only university-affiliated researchers can request access to the data by stating their research interest, and requests can be denied. Their TOS says:

3.2.4. Upon request from IBM or from any person who has rights to or is the subject of certain images, Licensee shall delete and cease use of images specified in such request.

Time to explore the questions this raises
Now that the context of this data set is clear, in the next post we can take a closer look at the practical, legal and ethical questions this raises.

Kars Alfrink pointed me to a report on AI ethics by the Nuffield Foundation, and lifts a specific quote from it, adding:

Good to see people pointing this out: “principles alone are not enough. Instead of representing the outcome of meaningful ethical debate, to a significant degree they are just postponing it”

This postponing of things is something I encounter all the time. In general I feel that many organisations that claim to be looking at the ethics of algorithms, algorithmic fairness etc. currently don’t actually have anything to do with AI, ML or complicated algorithms. To me it seems they do it to place the issue of ethics well into the future, at that as yet unforeseen point when they will actually have to deal with AI and ML. That way they avoid having to look at the ethics and de-biasing of their current work: how they collect and process data now, and the governance processes they currently have.

This is not unique to AI and ML though. I’ve seen it happen with open data strategies too, where the entire open data strategy of, for instance, a local authority was based on working with universities and research entities to figure out how data might play a role decades from now. No energy was spent on how open data might be an instrument in dealing with actual current policy issues. Looking at future issues as a fig leaf for not dealing with current ones.

This is qualitatively different from what we see in, for example, the climate debates, or with smoking, where there is a strong current to deny the very existence of issues. In this case it is more about being seen to solve future issues, so no one notices you’re not addressing the current ones.

To me there seems to be something fundamentally wrong with plans I come across where companies would pay people for access to their personal data. This is not a well-articulated thing; it just feels like the entire framing of the issue is off, so the next paragraphs are a first attempt to jot down a few notions.

To me it looks very much like a projection by companies onto people of what companies themselves would do: treat data as an asset you own outright and then charge for access, so that those companies can keep doing what they were already doing with data about you. It doesn’t strike me as taking the person behind that data, or their interests, as the starting point. The starting point of any line of reasoning needs to be the person the data is about, not the entity intending to use the data.

Those plans make data release, or consent for using it, fully transactional. There are several things intuitively wrong with this.

One thing it does is put everything in the context of single transactions between an individual, like you or me, and the company wanting to use data about you. That seems an active attempt to distract from the notion that there’s power in numbers. Reducing it to me dealing with a company, and you dealing with them separately, makes it less likely that groups of people will act in concert. It also distracts from the huge power difference between me selling some data attributes to some corp on one side, and that corp amassing those attributes over wide swaths of the population on the other.

Another thing is that it implies the value is in the data you likely think of as yours: your date of birth, residence, some conscious preferences, the type of car you drive, health care issues, finances etc. But a lot of value is in data you don’t actually have about yourself but create all the time: your behaviour over time, clicks on a site, reading speed and pauses in an e-book, minutes watched in a movie, engagement with online videos, the cell towers your phone pinged, your car computer’s logs about your driving style, likes etc. It’s not that the data you think of as your own is without value, but it feels like the magician wants you to focus on the flower in his left hand so you don’t notice what he does with his right.
On top of that it also means that whatever they offer to pay you will be too little: your data is never worth much in itself, only in aggregate. Offering to pay on an individual transaction basis is an escape for companies, not an emancipation of citizens.

One more element is the suggestion that once such a transaction has taken place everything is OK: all rights have been transferred (even if limited to a specific context and use case) and all obligations have been met. That strikes me as extremely reductionist. When it comes to copyright, authors can transfer some rights, but usually not the moral rights to their work. I feel something similar is at play here: moral rights attached to data that describes a person, rights which can’t be transferred when the data is transacted. Is it OK to manipulate you into a specific bubble and influence how you vote, if they paid you first for the type of data they needed to do that to you? I think the EU GDPR takes that approach too, taking moral rights into account. It’s not about ownership of data per se, but about the rights I have if your data describes me, regardless of whether it was collected with consent.

The whole ownership notion is difficult to me in itself. As stated above, a lot of data about me is not necessarily data I am aware of creating or ‘having’, and likely see no need to collect about myself. Unless paying me is meant as an incentive to start collecting stuff about me for the sole purpose of selling it to a company, which then doesn’t need my consent nor has to make the effort to collect it about me itself.

There are also instances where me being the only one able to decide to share or withhold some data means risks or negative impacts for others. It’s why cadastral records and company beneficial ownership records are public: so you can verify that the house or company I’m trying to sell you is mine to sell, and who else has a stake or claim on the same asset, and to what amount. Similar cases might be made for new and closely guarded data, such as DNA profiles. Is it your sole individual right to keep those data closed, or does society have a reasonable claim to them, for instance in the search for a cure for cancer?

All that to say that seeing data as a mere commodity is a very limited take, and that ownership of data isn’t a clear-cut thing. Because of its content, as well as its provenance. And because it is digital data, it has non-rivalrous and non-excludable characteristics, making it akin to a public good. There is definitely a communal and network side to holding, sharing and processing data, currently conveniently ignored in discussions about data ownership.

In short, talking about paying for personal data and about data lockers under my control seems a framing that presents data issues as straightforward but doesn’t solve any of data’s ethical aspects; it just pretends they’re taken care of, so that things may continue as usual. And that’s even before looking into the potential unintended consequences of payments.

This is a very interesting article to read. The small French adtech company Vectaury has been ordered to stop using, and to delete, the personal data of tens of millions of Europeans, as it cannot show proper consent as required under the GDPR. Of interest here is that Vectaury tried to show consent using an industry-wide template by the IAB. A French judge has ruled this is not enough. This is an early sign that, as Doc Searls says, the GDPR is able to put a stake through the heart of ad-tech, though at the speed of legal proceedings. Provided enforcement goes forward.

A month after the verdict, Vectaury’s website still proudly claims that they’re GDPR compliant because they use the concept of a ‘consent management provider’. Yet that is exactly what has now been ruled as not enough to show actual consent.

This Twitter thread by NYT’s Robin Berjon about the case is also interesting.

Some things I thought worth reading in the past days

  • A good read on how machine learning (ML) currently merely obfuscates human bias by moving it into the training data and coding, to arrive at peace of mind through pretend objectivity. By claiming that it’s ‘the algorithm deciding’, ML becomes a kind of digital alchemy. It introduced some fun terms to me, like fauxtomation, and Potemkin AI: Plausible Disavowal – Why pretend that machines can be creative?
  • These new Google patents show how problematic the current smart home efforts are, including their precursors, the Alexa and Echo microphones in your house. They are stripping you of agency, not providing it. These particular ones also nudge you to treat your children much the way surveillance capitalism treats you: as a suspect to be watched, relationships denuded of the subtle human capability to trust. Agency only comes from being in full control of your tools. Adding someone else’s tools (here not just Google’s but your health insurer’s, your landlord’s etc.) to your home doesn’t make it smart, but a self-censorship-promoting escape room. A fractal of the panopticon. We need to start designing more technology that is based on distributed use, not on a centralised controller: Google’s New Patents Aim to Make Your Home a Data Mine
  • An excellent article by the NYT about Facebook’s slide to the dark side: when the student dorm room excuse “we didn’t realise, we messed up, but we’ll fix it for the future” defence fails, and you weaponise your own data-driven machine against your critics. Thus proving those critics right. Weaponising your own platform isn’t surprising, but it is very sobering and telling. Will it be a tipping point in how the public views FB? Delay, Deny and Deflect: How Facebook’s Leaders Fought Through Crisis
  • Some takeaways from the article just mentioned that we should keep top of mind when interacting with or talking about Facebook: FB knew very early on about being used to influence the US 2016 election and chose not to act. FB feared backlash from specific user groups and opted to unevenly enforce their terms of service/community guidelines. Cambridge Analytica is not an isolated abuse, but a concrete example of the wider issue. FB weaponised their own platform to oppose criticism: How Facebook Wrestled With Scandal: 6 Key Takeaways From The Times’s Investigation
  • There really is no plausible deniability for FB’s execs on their “in-house fake news shop”: Facebook’s Top Brass Say They Knew Nothing About Definers. Don’t Believe Them. So when you need to admit it, you fall back on the ‘we messed up, we’ll do better going forward’ tactic.
  • As Aral Balkan says, that’s the real issue at hand because “Cambridge Analytica and Facebook have the same business model. If Cambridge Analytica can sway elections and referenda with a relatively small subset of Facebook’s data, imagine what Facebook can and does do with the full set.“: We were warned about Cambridge Analytica. Why didn’t we listen?
  • [update] Apparently all the commotion is causing Zuckerberg to think FB is ‘at war‘, with everyone it seems, which is problematic for a company whose mission is to open up and connect the world, and which is based on a perception of trust. A bunker mentality probably doesn’t bode well for FB’s corporate culture, and hence its future, either: Facebook At War.

This is a naive exercise to explore what ethics by design would look like for networked agency. There’s plenty of discussion about ethics by design in various places. Mostly in machine learning, where algorithmic bias is already a very real issue, and where other discussions, such as those around automated driving, are misguided for lack of imagination and scope. It’s also an ongoing concern in adtech, especially since we know business practices don’t limit themselves to selling you stuff but also deceive you to sell political ideas. Data governance is an area where I encounter ethics by design as a topic on a regular basis, in decisions on what data to collect or not, and in questions of balancing or combining the need for transparency with the need for data protection. But I want to leave all that aside, also because many organisations in those areas have already failed their customers and users, which would make this posting a complaint and not constructive.

My current interest is in exploring what ethics means, and can be done by design, in the context of networked agency, and by extension a new civil society emerging in distributed digital transformation. A naive approach helps me find a first batch of questions and angles.

The notions that are the building blocks of networked agency are a starting point. Ethical questions follow directly from those building blocks.

First there are the building blocks related to the agency element in networked agency: technology and methods/processes, low thresholds of adoption, striking power, resilience and agility.
a) For the technologies and methods/processes involved, the relevant issues relate to who controls those tools, how these tools can be deployed by their users, and whether a user group can alter the tools, adapt them to new needs and tinker with them.
b) Low thresholds of adoption need an exploration of what those thresholds are and how they play out for different groups. These are thresholds of a technological and financial nature, but also barriers concerning knowledge, practicality, usability, and understandability.
c) Striking power, the actual acting part of agency, raises questions about whether a tool provides actual agency or is in fact a pacifier. Not every action or activity constitutes agency. It’s why words like slacktivism and clicktivism have emerged.
d) Resilience in networked agency is about reducing the vulnerability to propagating failures from outside the group, and the manner in which mitigation is possible. Reduction of critical dependencies outside the group’s scope of control is something to consider here. That also works in reverse. Are you creating dependencies for others? In a similar vein, are you externalising costs onto others? Are you causing unintended consequences elsewhere, and can you be aware of them arising, or pre-empt them?
e) Agility in networked agency is about spotting and leveraging opportunities relative to your own needs in your wider network. Are you able to do that from a constructive perspective, or only a competitive/scarcity one? Do your opportunities come at the cost of other groups? When you leverage opportunities, are you externalising costs or claiming exclusivity? In a networked environment externalised costs will return as feedback to your system; networks almost by definition are endless repeats of the prisoner’s dilemma. Another side of this is the ways in which you can provide leverage to others while creating your own, or when to be the lever in a situation.

Second there are notions that follow from the networked part of networked agency. The unit of agency in networked agency is a group of people that share some relationship (team, family, org, location, interest, history, etc.), and that together act upon a need shared across that group. This introduces three levels at which to evaluate ethical questions: at the level of the individual in a group, at the level of the group itself, and between groups in a network. Group dynamics are thus firmly put into focus: power, control, ownership, voice, inclusion, decision making, conflict resolution, dependencies within a group, reciprocity, mutuality, verifiability, boundaries, trust, contributions, engagement, and reputations.
This in part translates back to the agency part, in terms of technology and the skills to work with it. Skills won’t be evenly distributed in groups seeking agency, and so potentially introduce power asymmetries, where unique capabilities create de facto gatekeepers or single points of failure. These might be counteracted with some mutual dependencies. More important, probably, is operational transparency in a group, so that the group can see such issues arise and calling them out is a normal thing to do, not something that carries a threshold in itself. Operational transparency might build on an obligation to explain, which is also a logical element in ensuring (networked) agility.

I will try to put the output of this first exercise into an overview. I am not sure what will be most useful here: a tree-like map, a network, or a matrix. A next step is fleshing out the ethical issues in play, and then projecting them on, for instance, specific technologies, methods and group settings, to see what specific actions or design principles emerge.

Aaron Swartz would have turned 32 on November 8th. He died five years and ten months ago, and since then the annual Aaron Swartz weekend, like this weekend, has taken place with all kinds of hackathons and events in his memory. At the time of his suicide Swartz was being prosecuted for downloading material in bulk from JSTOR, a scientific papers archive (even though he had legitimate access to it).

In 2014 the Smart New World exhibition took place in Kunsthalle Düsseldorf, which Elmine and I visited. Part of it was the installation “18.591 Articles Sold By JSTOR for $19 = $353.229”, with those 18,591 articles printed out, showing precisely what is behind the paywall, and what Swartz was downloading. Articles, like those shown, from the 19th century, long since in the public domain, sold for $19 each. After Swartz’s death JSTOR started making a small percentage of their public domain content freely accessible, limited to a handful of papers per month.

The Düsseldorf exhibit was impressive, as it showed the sheer volume of the material, but also the triviality of most of it. It’s a long tail of documents in extremely low demand, treated the same as recent papers in high demand.

Smart New World

Scientific journal publishers are increasingly a burden on the scientific world, rent-seeking gatekeepers. Their original value-added role, that of multiplication and distribution to increase access, has been completely eroded, if not actually fully reversed.

This is a start at more fully describing and exploring a distributed version of digitisation, digitalisation and specifically digital transformation, and at stating why I think bringing distributed / networked thinking into them matters.

Digitising stuff, digitalising routines, the regular way

Over the past decades many more of the things around us became digitised, and in recent years much of what we do, our daily routines and work processes, has become digitalised. Many of those digitalised processes are merely digitised replicas of their paper predecessors. Asking for a government permit, for instance, or online banking. There’s nothing there that wasn’t there in the paper version. Sometimes even small steps in those processes still force you to use paper. At the start of this year I had to apply for a declaration that my company had never been involved in procurement fraud. All the forms I needed for it (30 pages in total!) were digitised and I filled them out online, but when it came to sending them in, I had to print the PDF resulting from those 30 pages and send it through snail mail. I have no doubt that the receiving government office’s first step was to scan it all before processing. Online banking similarly is just a digitised paper process. Why don’t all online bank accounts provide nifty visualisation, filtering and financial planning tools (like alerts for dates due, saving towards a goal, maintaining a buffer etc.), now that everything is digital? The reason we laugh at Little Britain’s ‘computer says no’ sketches is that we recognise all too well the frustration of organisations blindly trusting their digitalised processes, never acknowledging or addressing their crappy implementation, or the extra work and route-arounds their indifference inflicts.
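To illustrate how small a step such tools would be once banking data is truly digital, here is a toy sketch of the kind of due-date and buffer alerts meant above. It is my own invention, not any bank’s API; the amounts, names and dates are made up.

```python
from datetime import date

def alerts(balance, buffer_target, upcoming, today=None):
    """Yield the kind of warnings a truly digital account could generate."""
    today = today or date.today()
    for name, amount, due in upcoming:
        days_left = (due - today).days
        if days_left <= 7:
            yield f"'{name}' ({amount:.2f}) is due in {days_left} days"
        balance -= amount  # project the balance after each known obligation
    if balance < buffer_target:
        yield f"projected balance {balance:.2f} drops below your buffer of {buffer_target:.2f}"

# Toy data: current balance, the buffer I want to keep, and known obligations.
upcoming = [("rent", 900.00, date(2018, 12, 1)), ("insurance", 120.00, date(2018, 12, 5))]
for warning in alerts(1250.00, 1000.00, upcoming):
    print(warning)
```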

Digital transformation, digital societies

Digital transformation is the accumulated societal impact of all those digital artefacts and digitalised processes, even if they’re incomplete or half-baked. Digital transformation is why I have access to all those books in the long tail that never reached the shelves of any of the book shops I visited in decades past, yet now come to my e-reader instantly, resulting in me reading more, and across a wider spectrum, than ever before. Digital transformation is also the impact on elections caused by almost individually targeted, data-driven Facebook advertising, built on minutely profiling undecided voters.

Digital transformation is often referred to these days, in my work frequently in the context of development and the sustainable development goals.
Yet it often feels to me that, for most intents and purposes, this digital transformation is done to us and about us, but not of us. It’s a bit like the smart city visions corporations like Siemens and Samsung push(ed), which were basically devoid of life and humanity: quality of life reduced to and equated with security only, in sterilised cities, ignoring that people are the key actors, as critiqued by Adam Greenfield in 2013.

Human digital networks: distributed digital transformation

The Internet is a marvellous thing. At least it is when we use it actively, to assist us in our routines and in our efforts to change, learn and reach out. As social animals, our human interaction has always been networked: we fluently switch between contexts and degrees of trust and disclosure, and route around undesired connections. In that sense human interaction and the internet’s original design principle closely match up; they’re both distributed. In contrast, most digitalisation and digital transformation happens from the perspective of organisations and silos. Centralised things, where some decide for the many.

To escape that ‘done to us, about us, not of us’, I think we need to approach digitisation, digitalisation and digital transformation from a distributed perspective, matching up our inherently networked humanity with our newly (for the past 30 years) networked global digital infrastructure. We need to think in terms of distributed digital transformation (making our own digital societal impact), building on distributed digitisation (making our things digital) and distributed digitalisation (making our routines digital).

Signs of distributed digitisation and digitalisation

Distributed digitisation can already be seen in things like the quantified self movement, where individuals create data about themselves, for themselves. Or in the sensors I have in the garden. Those garden measurements are part of something you could call distributed digitalisation, where a network of similar sensors creates a map of our city that informs climate adaptation efforts by the local government. My evolving information strategies, with a few automated parts, and the interplay of different protocols and self-proposed standards that make up the IndieWeb, are also examples of distributed digitalisation. My Networked Agency framework, where small groups of relationships fix something of value with low-threshold digital technology and network/digital based methods and processes, is distributed digitisation and distributed digitalisation combined into a design aid for group action.
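As a sketch of that garden-sensor pattern: each node keeps its own readings and merely publishes them to a shared city map. The endpoint and payload below are invented for illustration; an actual sensor network would agree on its own format.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Invented endpoint: a city-run map aggregating citizen sensor readings.
CITY_MAP_URL = "https://example.org/city-climate/readings"

def publish_reading(sensor_id, lat, lon, temperature_c):
    """Send one garden sensor reading to the shared city map."""
    payload = {
        "sensor": sensor_id,
        "lat": lat,
        "lon": lon,
        "temperature_c": temperature_c,
        "measured_at": datetime.now(timezone.utc).isoformat(),
    }
    request = urllib.request.Request(
        CITY_MAP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status

publish_reading("garden-01", 52.09, 5.11, 21.4)
```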

Distributed digital transformation needs a macroscope for the new civil society

Distributed digital transformation, distributed societal impact, seems a bit more elusive though.
Civil society is increasingly distributed too; that much is clear to me. New coops, p2p groups and networks of individual actors emerge all over the world. However, they are largely invisible in, for instance, the classic interaction between government and the incumbent civil society, and usually cut off from the scaffolding and support structures that ‘classic’ activities can build on to get started, because they’re not organised ‘the right way’, not clearly representative of a larger whole. Bootstrapping is their only path. As a result these initiatives are only perceived as single elements, and the scale they actually (can) achieve as a network remains invisible. Often even in the eyes of those single elements themselves.

Our societies, including the nodes that make up the network of this new type of civil society, lack the perception needed to recognise the ‘invisible hand of networks’. A few years ago I discussed with a few people, directors of entities in that new civil society fabric, why it is that we can’t seem to make our newly arranged collective voices heard, our collective efforts and results seen, and our collective power of agency recognised and sought out for collaboration. We’re too used, it seems, to aggregating all those things, collapsing them into the single voice of a mouthpiece that has the weight of numbers behind it, in order to be heard. We need to learn to see the cumulative impact of a multitude of efforts, while simultaneously keeping all those efforts visible on their own. There exist so many initiatives, I think, that are great examples of how distributed digitalisation leads to transformation, but they are largely invisible outside their own context, and not widely enough networked and connected to reach their full potential. They are valuable on their own, but would be even more valuable to themselves and others when federated; the federation part is mostly missing.
We need to find a better way to see the big picture, while also seeing all the pixels it consists of. A macroscope, a distributed digital transformation macroscope.

We’re in a time where whatever is presented to us as discourse on Facebook, Twitter or any of the other platforms out there may or may not come from humans, bots, or someone or a group with a specific agenda, irrespective of what you say or how you respond. We’ve seen it at the political level, with outside influences on elections; we see it in things like Gamergate, and in critiques of the last Star Wars movie. It creates damage on a societal level, and it damages people individually. To quote Angela Watercutter, the author of the mentioned Star Wars article,

…it gets harder and harder to have an honest discussion […] when some of the speakers are just there to throw kerosene on a flame war. And when that happens, when it’s impossible to know which sentiments are real and what motivates the people sharing them, discourse crumbles. Every discussion […] could turn into a […] fight — if we let it.

Discourse disintegrates, I think, specifically when there’s no meaningful social context in which it takes place, nor social connections between speakers in that discourse. The effect stems not just from not really being able to know whom you’re conversing with, but, I think more importantly, from anyone on a general platform being able to bring themselves into the conversation, or worse, force themselves into it. Which is why you should never wade into newspaper comments, even though we all read them at times, because watching discourse crumble from the sidelines has a certain addictive quality. That this can happen is because participants themselves don’t control the setting of any conversation they are part of, and none of those conversations are limited to a specific (social) context.

Unlike in your living room, over drinks in a pub, or at a party with friends of friends of friends. There you know someone, or if you don’t, you know them in that setting; you know their behaviour at that event thus far. All have skin in the game, as misbehaviour has immediate social consequences. Social connectedness is a necessary context for discourse, stemming either from personal connections or from the setting of the place or event it takes place in. Online discourse often lacks both, discourse crumbles, entropy ensues. Without consequence for those causing the crumbling. Which makes it fascinating when missing social context is retroactively restored, outing the misbehaving parties, such as in the book I once bought by Tinkebell, in which she matches death threats she received against the senders’ very normal Facebook profiles.

Two elements are therefore needed, I find: one, determining who can be part of which discourse, and two, control over the context of that discourse. They are points 2 and 6 in my manifesto on networked agency.

  • Our platforms need to mimic human networks much more closely: our networks are never ‘all in one mix’ but a tapestry of overlapping and distinct groups and contexts. Yet centralised platforms put us all in the same space.
  • Our platforms also need to be ‘smaller’ than the group using it, meaning a group can deploy, alter, maintain, administrate a platform for their specific context. Of course you can still be a troll in such a setting, but you can no longer be one without a cost, as your peers can all act themselves and collectively.
  • This is unlike e.g. FB, where by design the cost of defending against trollish behaviour takes more effort than being a troll, and never carries a cost for the troll. There must, in short, be a finite social distance between speakers for discourse to be possible. Platforms that dilute that, or allow for infinite social distance, are where discourse can crumble.

This points to federation (a platform within the control of a specific group, interconnected with other groups doing the same) and decentralisation (individuals running a platform for one, and interconnecting those). Doug Belshaw recently wrote, in a post titled ‘Time to ignore and withdraw?‘, about how he first saw individuals running their own Mastodon instance as quirky and weird, until he read a blog post by Laura Kalbag on why you should run Mastodon yourself if possible:

Everything I post is under my control on my server. I can guarantee that my Mastodon instance won’t start profiling me, or posting ads, or inviting Nazis to tea, because I am the boss of my instance. I have access to all my content for all time, and only my web host or Internet Service Provider can block my access (as with any self-hosted site.) And all blocking and filtering rules are under my control—you can block and filter what you want as an individual on another person’s instance, but you have no say in who/what they block and filter for the whole instance.

Similarly I recently wrote,

The logical end point of the distributed web and federated services is running your own individual instance. Much as in the way I run my own blog, I want my own Mastodon instance.

I also do see a place for federation, where a group of people from a single context run an instance of a platform: a group of neighbours, a sports team, a project team, some other association, but always settings where damaging behaviour carries a cost because social distance is finite and context-defined, even if temporary or emergent.

Last week the 2nd annual Techfestival took place in Copenhagen. As part of it there was a 48-hour think tank of 150 people (the ‘Copenhagen 150‘), looking to build the Copenhagen Catalogue as a follow-up to last year’s Copenhagen Letter, of which I am a signatory. Thomas, the initiator of the Techfestival, had invited me to join the CPH150, but I had to decline because of previous commitments I could not reschedule. I’d have loved to contribute, however, as the event’s concerns, and even more the think tank’s, are right at the heart of my own. My concept of networked agency, and the way I think about how we should shape technology to empower people in different ways, runs parallel to how Thomas described the purpose of the CPH150 48-hour think tank at its start last week.

For me the unit of agency is the individual plus a group of meaningful relationships in a specific context: a networked agency. The power to act towards meaningful results and change lies in that group, not in the individual. The technology and methods that such a group deploys need to be chosen deliberately, and those tools need to be fully within the scope of the group itself: to control, alter, extend, tinker with, maintain, share etc. Such tools therefore need very low adoption thresholds. Tools also need to be useful on their own, but great when federated with other instances of those tools, so that knowledge and information, learning and experimentation can flow freely, yet can still take place locally in the (temporary) absence of such wider (global) connections. Our current internet silos such as Facebook and Twitter clearly do not match this description, but most other technologies aren’t shaped along those lines either.

As Heinz remarked earlier, musing about our unconference, effective practices cannot be separated from the relationships in which you live. I added that the tools (both technology and methods) likewise cannot be meaningfully separated from the practices. Just as in those relationships you cannot fully separate the hyperlocal, the local, the regional and the global, due to the many interdependencies and the complexity involved: what you do has wider impact, and what others do, as well as global issues, express themselves in your local context too.

So the CPH150 think tank’s effort to create a list of principles that takes a human and her relationships as the starting point for thinking about how to design tools and how to create structures, institutions and networks fits right in with that.

Our friend Lee Bryant has a good description of how he perceived the CPH150 think tank, and what he shared there. Read the whole thing.

Meanwhile the results are up: 150 principles called the Copenhagen Catalogue, beautifully presented. You can become a signatory to those principles you deem most valuable to stick to.

At State of the Net 2018 in Trieste, Hossein Derakshan (h0d3r on Twitter) talked about journalism and its future. Some of his statements have stuck with me in the past weeks, so yesterday I took the time to watch the video of his presentation again.

In his talk he discussed the end of news. He says that discussions about the erosion of business models in the news business, the quality of news, trust in sources and ethics are all sideshows to a deeper shift, one that is both cultural and social. News is a two-century-old format, representative of the globalisation of communications that came with the birth of the telegraph. All of a sudden events from around the globe were within your perspective, and being informed made you “a man of the world”. News also served as a source of drama in our lives: “Did you hear,…”. These days those aspects of globalisation, time and drama have shifted.
Local, even hyperlocal, has become more important again at the cost of global perspectives, which Hossein sees taking place in things like buying local, but also in using Facebook to keep up with the lives of those around you. Similarly, identity politics reduces the interest in other events to those pertaining to your own group. Drama has shifted away from news to performances and other media (Trump’s tweets, memes, our self-representation on social media platforms). News and time got disentangled. Notifications and updates come at any time from any source, and deeper-digging content is no longer tied to the news cycle. Journalism like the Panama Papers takes a long time to produce, but can also be published at any time without that having an impact on its value or reception.

News and journalism have become decoupled. News has become a much less compelling format and, in the words of Derakshan, is dying if not dead already. With the demise of text and reason, the rise of imagery and emotions, and the mess that journalism is in, what formats can journalism take to be all it can be?

Derakshan points to James Carey, who said democracy and journalism are the same thing, as both are defined as public conversation. Hossein sees two formats in which journalism can continue. One is literature: long-form non-fiction, which can survive away from newspapers and magazines, both online and in the form of e.g. books. The other is cinema: there’s a rise in documentaries as a way to bring more complex stories to audiences, which also allows for conveying drama. It’s the notion of journalism as literature that stuck with me most at State of the Net.

For a number of years I’ve said that I don’t want to pay for news, but do want to pay for (investigative) journalism, and often people would respond that news and journalism are the same thing. Maybe I now finally have the vocabulary to better explain the difference I perceive.

I agree that the notion of public conversation is of prime importance. Not the screaming at each other on forums, Twitter or Facebook, but the way distributed conversations can create learning, development and action, as a democratic act. Distributed conversations, like the salons of old, as a source of momentum, of emergent collective action (2013). Similarly, I position networked agency as a path away from the despair of being powerless in the face of change, and therefore as an alternative to falling for populist oversimplification. Networked agency in that sense is very much a democratising thing.

To celebrate the launch of the GDPR last week Friday, Jaap-Henk Hoepman released his ‘little blue book’ (pdf) on Privacy Design Strategies (with a CC-BY-NC license). Hoepman is an associate professor with the Digital Security group of the ICS department at Radboud University.

I heard him speak a few months ago at a Tech Solidarity meet-up, and enjoyed his insights and pragmatic approaches (PDF slides here).

Data protection by design (together with a ‘state of the art’ requirement) forms the forward-looking part of the GDPR, where the minimum requirements are always evolving. The GDPR is designed to have a rising floor that way.
The little blue book has an easy-to-understand outline, which divides doing privacy by design into 8 strategies, each accompanied by a number of tactics, all of which can be used in parallel.

Those 8 strategies (shown in the image above) are divided into 2 groups: data oriented strategies and process oriented strategies.

Data oriented strategies:
Minimise (tactics: Select, Exclude, Strip, Destroy)
Separate (tactics: Isolate, Distribute)
Abstract (tactics: Summarise, Group, Perturb)
Hide (tactics: Restrict, Obfuscate, Dissociate, Mix)

Process oriented strategies:
Inform (tactics: Supply, Explain, Notify)
Control (tactics: Consent, Choose, Update, Retract)
Enforce (tactics: Create, Maintain, Uphold)
Demonstrate (tactics: Record, Audit, Report)

All come with examples, and the final chapters provide suggestions on how to apply them in an organisation.
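To illustrate how directly some of these tactics translate into code, here is a minimal sketch of the Minimise strategy (its Select and Strip tactics) as a field whitelist. The field names and stated purpose are my own example, not taken from the book.

```python
# Minimise via a whitelist: Select only the fields the stated purpose
# requires, and Strip everything else before storing or passing it on.
ALLOWED_FIELDS = frozenset({"user_id", "city"})  # example purpose: regional usage stats

def minimise(record, allowed=ALLOWED_FIELDS):
    """Return a copy of the record with all non-essential fields stripped."""
    return {key: value for key, value in record.items() if key in allowed}

raw = {"user_id": 42, "city": "Utrecht", "birthdate": "1970-01-01", "ip": "192.0.2.1"}
print(minimise(raw))  # -> {'user_id': 42, 'city': 'Utrecht'}
```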

The Washington Post now has a premium ‘EU’ option, suggesting you pay more for them to comply with the GDPR.

Reading what the offer entails of course shows something different.
The basic offer is the price you pay to read their site, but you must give consent for them to track you and serve you targeted ads.
The premium offer is the price you pay for a completely ad-free, and thus tracking-free, version of the WP. Akin to what various other outlets and e.g. many mobile apps do too.

This of course has little to do with GDPR compliance. For the free and basic subscriptions they still need to be compliant with the GDPR, but you enter into a contract that includes your consent as part of that compliance. They still need to explain to you what they collect and what they do with it, for instance. And they do, e.g. listing all the partners they exchange visitor data with.

The premium version gives you an ad-free WP, so the issue of GDPR compliance barely comes up (except of course for things like commenting, which is easy to handle). This is an admission of two things:

1. They don’t see any justification for how their ads work other than getting consent from a reader, and they see no hassle-free way to offer readers informed consent options or granular controls that doesn’t impact the way ad-tech works, without running afoul of the rule that consent cannot be tied to core services (like visiting their website).
2. They value tracking you at $30 per year.

Of course their free service is still forced consent, and thus runs afoul of the GDPR, as you cannot see their website at all without it.

Yet, just to peruse an occasional article, e.g. when following a link, that forced consent is nothing your browser can’t handle with a blocker or two, and a VPN if you want. After all, your browser is your castle.

Today I was at a session on the GDPR at the Ministry for Interior Affairs in The Hague, organised by the centre of expertise on open government.
It made me realise how I actually approach the GDPR, and how I see all the overblown reactions to it, like sending all of us a heap of mail to re-request consent where none is needed, or even taking your website or personal blog offline. I find I approach the GDPR like I approach a quality assurance (QA) system.

One key change with the GDPR is that organisations can now be audited on their preventive data protection measures, which of course already mimics QA. (Beyond that, the GDPR is mostly an incremental change to the previous law, except that the people described by your data now have articulated rights that apply globally, and that it has a new set of teeth in the form of substantial penalties.)

AVG (GDPR) mindmap
My colleague Paul facilitated the session and showed this mindmap of GDPR aspects. I think it misses the more future-oriented parts.

The session today had three brief presentations.

In one, a student showed some results from his thesis research on the implementation of the GDPR, for which he had spoken with a lot of data protection officers, or DPOs. These are mandatory roles for all public sector bodies, and also for some specific types of data processing companies. One of the surprising outcomes is that some of these DPOs saw themselves, and were seen, as ‘outposts’ of the data protection authority, in other words as enforcers or even potentially as moles. This is not conducive to a DPO fulfilling the part of their role that involves raising awareness of, and sensitivity to, data protection issues. It strongly reminded me of when, 20 years ago, I was involved in creating a QA system from scratch for my then employer. Some of my colleagues saw the role of the quality assurance manager as policing their work. It took effort to show that we were not building a straitjacket around them to keep them within strict boundaries, but providing a solid skeleton to grow on, and to move faster. Audits then are not hunts for breaches of compliance, but a way to make emergent changes in the way people work visible, and to incorporate the professionally justified ones into that skeleton.

In another presentation, a civil servant of the Ministry talked about creating a register of all person-related data being processed. What stood out most for me was the (rightly) pragmatic approach they took to describing current practices and data collections inside the organisation. This is a key element of QA as well: you work from descriptions of what happens, not of what ’should’ or ‘ideally’ happens. QA is a practice rooted in pragmatism, where once a practice is described and agreed, it will be audited.
Of course, in the case of the Ministry it helps that they only have tasks mandated by law, so the grounds for processing are clear by default, and if they are not, the data should not be collected. This reduces the range of potential grey areas. Similarly for security measures: they already need to adhere to national security guidelines (the national baseline information security), which likewise helps them avoid new measures, proves compliance, and provides an auditable security requirement to go with it. This no doubt helped them take that pragmatic approach, which is at the core of QA as well: taking your cues from what is really happening in the organisation, from what the professionals are really doing.

A third presentation dealt with open standards for both processes and technologies, by the national Forum for Standardisation. Since 2008 a growing list of standards, currently some 40 or so, is mandatory for Dutch public sector bodies. In this list you find a range of elements that are ready-made to help with GDPR compliance: support for the rights of those described by the data (such as the right to export and portability), preventive technological security measures, and ‘by design’ data protection measures. Some of these are ISO norms themselves or, like the mentioned national baseline information security, a compliant derivative of such ISO norms.

These elements, the ‘police’ versus ‘counsel’ perspective on the role of a DPO, the pragmatism that needs to underpin actions, and the building blocks readily found elsewhere in your own practice already based on QA principles, made me realise and better articulate how I’ve been viewing the GDPR all along: as a quality assurance system for data protection.

With a quality assurance system you can still, famously, produce concrete swimming vests, but at least it will be done consistently. Likewise with the GDPR you will still be able to do all kinds of things with data. Big data and developing machine learning systems are hard but hopefully worthwhile to do. With the GDPR it will just be hard in a slightly different way, helped by establishing some baselines and testing core assumptions, while making your purposes and ways of working available for scrutiny. Introducing QA does not change the way an organisation works, unless it really doesn’t have its house in order. Likewise the GDPR won’t change your organisation much if you have your house in order either.

From the QA perspective on the GDPR it is perfectly clear why it has a moving baseline (through its ‘by design’ and ‘state of the art’ requirements). From the QA perspective it is also perfectly clear how it connects to the way Europe is positioning itself geopolitically in the race concerning AI: the policing perspective, after all, only leads to a Luddite stance on AI, which is not what the EU is doing, far from it. From that it is clear how the legislator intends the thrust of the GDPR. As QA, really.

At least I think it is…

Personal blogs don’t need to comply with the new European personal data protection regulations (already in force, but enforceable from next week, May 25th), says Article 2.2.c. However, my blog does have a link with my professional activities, as I blog here about professional interests. One of those interests is data protection (the more you’re active in transparency and open data, the more you also start caring about data protection).

In the past few weeks Frank Meeuwsen has been writing about how to get his blog GDPR compliant (GDPR and the IndieWeb 1, 2 and 3, all in Dutch), and Peter Rukavina has been following suit. Like yours, my e-mail inbox is overflowing with GDPR-related messages and requests from all the various web services and mailing lists I use. I had been thinking about adding a GDPR statement to this blog, but clearly needed a final nudge.

That nudge came this morning as I updated the Jetpack plugin of my WordPress blog. WordPress is the software I use to create this website, and Jetpack is a module for it, made by the same company that makes WordPress itself, Automattic. After the update, I got a pop-up stating that in my settings a new option now exists called “Privacy Policy”, which comes with a guide and suggested texts to be GDPR compliant. I was pleasantly surprised by this step by Automattic.

So I used that to write a data protection policy for this site. It is rather trivial in the sense that this website doesn’t do much, yet it is also surprisingly complicated, as there are many potential rabbit holes to go down. It concerns not just comments or webmentions, but also the server logs my web host keeps, statistics tools (some of which I don’t use but cannot switch off either), third-party plugins for WordPress, embedded material from data-hungry platforms like YouTube etc. I have a relatively bare-bones blog (over the years I made it ever more minimalistic, most recently stripping out things like sharing buttons), and still, asking myself questions that normally only legal departments would ask, there are many aspects to consider. That is of course the whole point: that we ask these types of questions more often, not just of ourselves, but of every service provider we engage with.

The resulting Data Protection Policy is now available from the menu above.