As of today it is final: the new EU copyright directive has been adopted (ht Julia Reda). I am pleased to see my government voted against, as it has in earlier stages, and as my MEPs did. Sadly it hasn’t been enough to cut Articles 11 and 13, despite the mountain of evidence and protests against both. It is interesting and odd to see both Spain and Germany vote in favour, given the failure of their respective laws on which Article 11 is based, and the German government coalition parties’ stated position against content filters (i.e. Article 13).

Over the next two years it will be important to track the legislative efforts in Member States implementing this Directive. Countries that voted against or abstained will try to find the most meaningless implementation of Articles 11 and 13, and I suspect will emphasise the useful bits in other parts of the Directive, while being subjected to intense lobbying both for and against. The resulting differences in interpretation across Member States will be of interest. I am also looking forward to following the court challenges that will undoubtedly result.

In the meantime, you as an internet citizen have two more years to build and extend your path away from the silos where Articles 11 and 13 will be an obstacle to you. Run your own stuff, decentralise and federate. Walk away from the big platforms. But most of all, interact with creators and makers directly, both when re-using or building on their creations and when supporting them. Articles 11 and 13 will not bring any creator any new revenue; dominant entertainment industry mediators are the ones set to profit from rent seeking. Vote with your feet and wallet.

The Mozilla Foundation has launched a new service that looks promising, which is why I am bookmarking it here. Firefox Send allows you to send files of up to 1GB (or 2.5GB if logged in) to someone else. This is what services like the Dutch WeTransfer do, except Send does it with end-to-end encryption.

Files are encrypted in your browser before being sent to Mozilla’s server, where they stay until downloaded. The decryption key is contained in the download URL. Mozilla does not send that download URL to the receiver; you do that yourself. Files can additionally be locked with a password, which the sender likewise conveys to the receiver through other means. Files are kept for 5 minutes, 1 hour, 24 hours, or 7 days, depending on your choice, and for 1 up to 100 downloads. This makes it suitable for quick shares during conference calls, for instance. Apart from the encrypted file, Mozilla only knows the IP address of the uploader and the downloader(s). This is unlike services such as WeTransfer, where the service also has e-mail addresses for both uploader and intended downloader, and you depend on them sending the receiver a confirmation with the download link first.

Firefox Send doesn’t send the download link to the recipient, you do

This is an improvement in terms of data protection, even if not fully watertight (nothing ever really is, especially not if you are singled out as a target by a state actor). It does satisfy the needs of some of my government clients who currently are not allowed to use services like WeTransfer.
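The trick that keeps the service itself blind is worth spelling out: the decryption key rides along in the URL *fragment* (the part after `#`), which browsers never transmit to the server. Below is a minimal Python sketch of that pattern; the host name and function names are illustrative, not Mozilla’s actual API.

```python
# Sketch of the key-in-URL-fragment pattern used by end-to-end encrypted
# file sharing services such as Firefox Send. The file is encrypted
# client-side; the key sits in the URL fragment, which is never sent to
# the server. The hostname below is a placeholder, not Mozilla's.
import secrets
from urllib.parse import urlsplit

def make_share_url(file_id):
    """Generate a random key and embed it in the download URL fragment."""
    key = secrets.token_urlsafe(16)
    return key, f"https://send.example.com/download/{file_id}/#{key}"

def key_from_url(url):
    """The recipient's browser recovers the key locally from the fragment."""
    return urlsplit(url).fragment

key, url = make_share_url("abc123")
parts = urlsplit(url)
# The server only ever sees scheme, host, and path -- never the fragment.
assert parts.fragment == key   # readable client-side
assert key not in parts.path   # no key material in what the server receives
print(key_from_url(url) == key)  # → True
```

This is also why the extra password option matters: anyone who obtains the full URL (say, from chat logs) has the key, so a second secret conveyed out of band adds a layer.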

This week NBC published an article exploring the source of training data sets for facial recognition. It makes the claim that we ourselves are providing, without consent, the data that may well be used to put us under surveillance.

In January IBM made a database available for research into facial recognition algorithms. The database contains some 1 million face descriptions that can be used as a training set. Called “Diversity in Faces” (DiF), its stated aim is to reduce bias in current facial recognition capabilities. Such bias is rampant, often because the data sets used in training are too small and too homogeneous compared to the global population. That stated goal seems ethically sound, but the means used to get there raise a few questions for me. Specifically, whether the means live up to the same ethical standards that IBM says it seeks to attain with the result of its work. This and the next post explore the origins of the DiF data, my presence in it, and the questions that raises for me.

What did IBM collect in “Diversity in Faces”?
Let’s look at what the data is first. Flickr is a photo sharing site, launched in 2004, that supported publishing photos under a Creative Commons license from early on. In 2014 a team led by Bart Thomee at Yahoo, which then owned Flickr, created a database of 100 million photos and videos with any type of Creative Commons license published in previous years on Flickr. This database is available for research purposes and known as the ‘YFCC-100M’ dataset. It does not contain the actual photos or videos, but the static metadata for them (URLs to the images, user IDs, geolocations, descriptions, tags etc.) and the Creative Commons license each was released under. See the video below published at the time:

YFCC100M: The New Data in Multimedia Research from CACM on Vimeo.

IBM used this YFCC-100M data set as a basis and selected 1 million of the photos in it to build a large collection of human faces. It does not contain the actual photos, but the metadata of each photo plus some 200 additional attributes describing the faces in them, including measurements and skin tones. Where YFCC-100M was meant to train more or less any image recognition algorithm, IBM’s derivative subset focuses on faces. IBM describes the dataset in their Terms of Service as:

a list of links (URLs) of Flickr images that are publicly available under certain Creative Commons Licenses (CCLs) and that are listed on the YFCC100M dataset (List of URLs together with coding schemes aimed to provide objective measures of human faces, such as cranio-facial features, as well as subjective annotations, such as human-labeled annotation predictions of age and gender(“Coding Schemes Annotations”). The Coding Schemes Annotations are attached to each URL entry.
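To make the structure concrete: both YFCC-100M and DiF are essentially lists of records pointing at photos, not the photos themselves. A toy Python sketch of how a derived subset can be pulled from such metadata is below; the field names and values are illustrative, not the actual YFCC-100M column layout.

```python
# Toy sketch: a YFCC-100M-style dataset holds no images, only metadata
# records pointing at them. A derivative like DiF is a filtered subset of
# such records. Field names and values here are illustrative only.
records = [
    {"photo_url": "https://flickr.example/p/1", "user_id": "user_a",
     "license": "CC BY-NC-SA 2.0"},
    {"photo_url": "https://flickr.example/p/2", "user_id": "user_a",
     "license": "All Rights Reserved"},
    {"photo_url": "https://flickr.example/p/3", "user_id": "user_b",
     "license": "CC BY 2.0"},
]

def cc_photos_by_user(rows, user_id):
    """Select one user's records that carry any Creative Commons license."""
    return [r for r in rows
            if r["user_id"] == user_id and r["license"].startswith("CC")]

print(len(cc_photos_by_user(records, "user_a")))  # → 1
```

Note how a non-CC photo by the same user is excluded: the whole pipeline hinges on the license string in the metadata, which is exactly why CC-licensed publication made these photos harvestable in bulk.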

My photos are in IBM’s DiF
NBC, in their above-mentioned reporting on IBM’s DiF database, provide a little tool to determine if photos you published on Flickr are in the database. I have been an intensive user of Flickr since early 2005, and have published over 25,000 photos there. A large number of those carry a Creative Commons BY-NC-SA license, meaning that as long as you attribute me, don’t use an image commercially, and share your result under the same license, you’re allowed to use my photos. As YFCC-100M covers the years 2004-2014 and I published images in most of those years, it was likely my photos were in it, and by extension in IBM’s DiF. Using NBC’s tool, based on my user name, it turns out 68 of my photos are in IBM’s DiF data set.

One set of photos that apparently is in IBM’s DiF covers the BlogTalk Reloaded conference in Vienna in 2006, where I took various photos of participants and speakers. The NBC tool I mentioned provides one photo from that set as an example:

Thomas Burg

My face is likely in IBM’s DiF
Although IBM doesn’t allow a public check of who is in their database, it is very likely that my face is in it. There is a half-way functional way to explore the YFCC-100M database, from which DiF is derived, so it is reasonable to assume that faces found in YFCC-100M can also be found in IBM’s DiF. The German university of Kaiserslautern at the time created a browser for the YFCC-100M database. Judging by some tests it is far from complete in the results it shows (for instance, searching for my Flickr user name returns results that don’t contain the example image above, and the total number of results is lower than the number of my photos in IBM’s DiF). Using that same browser to search for my name, and for Flickr user names likely to have taken pictures of me during the mentioned BlogTalk conference and other conferences, shows that there are indeed a number of pictures of my face in YFCC-100M. Although the limited search in IBM’s DiF possible with NBC’s tool doesn’t return any telling results for those Flickr user names, it is therefore very likely my face is in IBM’s DiF. I do find a number of pictures of friends and peers in IBM’s DiF that way, taken at the same time as pictures of myself.

Photos of me in YFCC-100M

But IBM won’t tell you
IBM is disingenuous when it comes to being transparent about what is in their DiF data. Their TOS allows anyone whose Flickr images have been incorporated to request to be excluded from now on, but only if you can provide the exact URLs of the images you want excluded. That is only possible if you can verify what is in their data, yet there is no public way to do so: only university-affiliated researchers can request access to the data by stating their research interest, and requests can be denied. Their TOS says:

3.2.4. Upon request from IBM or from any person who has rights to or is the subject of certain images, Licensee shall delete and cease use of images specified in such request.

Time to explore the questions this raises
Now that the context of this data set is clear, in a next posting we can take a closer look at the practical, legal and ethical questions this raises.

SimCity 2000, adapted from an image by m01229 (CC-BY)

Came across an interesting article, and by extension the tech zine it was published in: Logic.
The article is about the problematic biases and assumptions in the model of urban development used in the popular game SimCity (one of those time sinks where my 10,000 hours brought me nothing 😉), and how that may have unintentionally (the SimCity creator just wanted a fun game) influenced how people look at the evolution of cityscapes in real life, in ways the 1960s work the game is based on never did. The article is a fine example of cyber history / archeology.

The magazine it was published in, Logic (twitter), started in the spring of 2017 and is now reaching issue 7. Each issue has a specific theme around which contributions are centered. Intelligence, Tech against Trump, Sex, Justice, Scale, Failure, Play, and soon China have been the themes so far.

The zine is run by Moira Weigel, Christa Hartsock, Ben Tarnoff, and Jim Fingal.

I’ve ordered the back issues, and subscribed (though technically it is cheaper to keep ordering back-issues). They pay their contributors, which is good.

Cover for the upcoming edition on tech in China. Design (like all design for Logic) by Xiaowei R. Wang.

Aral Balkan talks about how to design tools and find ways around the big social media platforms. He calls for the design and implementation of Small Tech. I fully agree. Technology to provide us with agency needs to be not just small, but smaller than us, i.e. within the scope of control of the group of people deploying a technology or method.

My original fascination with social media, back in the ’00s when it was blogs and wikis mostly, was precisely because it was smaller than us, it pushed publication and sharing in the hands of all of us, allowing distributed conversations. The concentration of our interaction in the big tech platforms made social media ‘bigger than us’ again. We don’t decide what FB shows us, breaking out of your own bubble (vital in healthy networks) becomes harder because sharing is based on pre-existing ‘friendships’ and discoverability has been removed. The erosion has been slow, but very visible. Networked Agency, to me, is only possible with small tech, and small methods. It’s why I find most ‘digital transformation’ efforts disappointing, and feel we need to focus much more on human digital networks, on distributed digital transformation. Based on federated small tech, networks of small tech instances. Where our tools are useful on their own, and more useful in concert with others.

Aral’s posting (and blog in general) is worth a read, and as he is a coder and designer, he acts on those notions too.

Kilroy black edited

Social geolocation services over the years have been very useful for me. The value is in triggering serendipitous meetings: being in a city outside my normal patterns at the same time someone in (or peripheral to) my network is in the city too, outside their normal patterns. It happened infrequently, about once a year, but frequently enough to be useful and keep checking in. I was a heavy user of Plazes and Dopplr, both long since disappeared. As with other social platforms I and my data quickly became the marketable product, instead of the customer. So ultimately I stopped using Foursquare/Swarm much, only occasionally for international travel, and completely in 2016. Yet I still long for that serendipitous effect, so I am looking to make my location and/or travel plans available, for selected readers, through this site.

There are basically three ways in which I could do that.
1) The POSSE way. I post my location or travel plan on this blog, and it gets shared to platforms like Foursquare and through RSS. I would need to be able to show these postings only to my followers/readers, and have a password-protected RSS feed and subscription workflow.
2) The PESOS way. I use an existing platform like Foursquare to create my check-ins, and share them back to my blog, where they are only accessible to followers/readers via a password-protected RSS feed.
3) The ‘just my’ way. I use only my blog to create check-ins, share them selectively with followers and readers, and have a password-protected RSS feed for it.

Option 3 provides the most control over my data, but likely limits the ways in which I can allow others to follow me, and needs a flexible on-the-go way to add check-ins from mobile.
Option 2 comes with easy mobile apps, and allows followers to use their own platform apps as well as my site.
Option 1 sits in between: it has the problems of option 3, but still allows others to use their own platforms as in option 2.

I decided to try to do both Option 2 and Option 3. If I can find a way to make Option 3 work well, Option 1 is an extension of it.
Option 2 at first glance was the easiest to create, because Aaron Parecki already created ‘Own Your Swarm‘ (OYS), a bridge between my existing Foursquare/Swarm account and Micropub, an open protocol for which my site has an endpoint. It means I can let OYS talk to both my Swarm account and my site, so that it posts something to this blog every time I check in on Swarm on my mobile. OYS not only posts the check-ins but also keeps an eye on them, so that when there are comments or likes, they too get reflected on my blog.
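For the curious, a check-in arriving at a Micropub endpoint is just a small JSON document: an h-entry whose `checkin` property holds an h-card for the venue. The sketch below builds such a payload in Python; the venue, coordinates, and the commented-out HTTP details are placeholder examples (a real bridge like OYS authenticates with an access token), so treat it as an illustration of the shape, not OYS’s exact output.

```python
# Sketch of a Micropub check-in body (microformats2 JSON): an h-entry
# with a nested h-card describing the venue. Venue name, coordinates,
# and endpoint details below are illustrative placeholders.
import json

def checkin_payload(venue_name, lat, lng, note=""):
    """Build a Micropub JSON body for a check-in post."""
    return {
        "type": ["h-entry"],
        "properties": {
            "checkin": [{
                "type": ["h-card"],
                "properties": {
                    "name": [venue_name],
                    "latitude": [lat],
                    "longitude": [lng],
                },
            }],
            "content": [note] if note else [],
        },
    }

body = checkin_payload("Utrecht Centraal", 52.0894, 5.1100, "Off to BlogWalk")
# A bridge would POST this to the site's Micropub endpoint, roughly:
#   POST /micropub
#   Authorization: Bearer <token>
#   Content-Type: application/json
print(body["type"])  # → ['h-entry']
```

Because the format is an open standard, the blog side doesn’t care whether the payload comes from OYS, a mobile app, or a self-written script, which is what makes Option 3 a feasible extension later.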

My blog uses the Post Kinds plugin, which has a posting type for check-ins, so they get their own presentation on the blog. OYS allows me to automatically tag what it posts, and those tags get matched to the existing categories and tags in my blog.

From now on I use a separate category for location-related postings, called Plazes. Plazes was the original geolocation app I started using in 2004, when co-founder Felix Petersen showed it to me at the very first BlogWalk I co-organised in the Netherlands. Plazes was also the first app to quickly show me the value of creating serendipitous meetings. So as an expression of geo-serendipic (serendipity-epic?) nostalgia, I named the postings category after it.

The number and frequency of 51% attacks on blockchains is increasing, with Ethereum Classic last month being the first of the top 20 cryptocoins to be hit. Other types of attacks mostly try to exploit general weaknesses in how exchanges operate, but this one targets something fundamental to how blockchain is supposed to work. Combined with how blockchain projects don’t seem to deliver and are basically vaporware, we’ve definitely gone from the peak of inflated expectations to the trough of disillusionment. Whether there will be a plateau of productivity remains an open question.

A team of people, including Jeremy Keith, whose writings are part of my daily RSS infodiet, have been doing some awesome web archeology. Over the course of 5 days at CERN, they recreated the browser experience as it was 30 years ago with the (fully text-based) WorldWideWeb application for the NeXT computer.

Hypertext’s root, the CERN page in 1989

These are the types of pages I visited before inline images were possible.
The cool bit is that it allows you to see your own site as it would have looked 30 years ago. (Go to Document, then ‘Open from full document reference’ and fill in your URL.) My site looks pretty good, which is not surprising as it is very text-centered anyway.

Hypertexting this blog like it’s 1989

Maybe somewhat less obvious, but of key importance to me in the context of my own information strategies and workflows, as well as in the dynamics of the current IndieWeb efforts: this is not just a way to view a site, you can also edit the page directly in the same window. (See the sentence in all capitals in the image below.)

Read and write, the original premise of the WWW

Hypertext wasn’t meant as viewing-only, but as an interactive way of linking together documents you were actively working on. Current wikis come closest. But I also use Tinderbox, for instance, a hypertext mind-mapping, outlining and writing tool for Mac that incorporates this principle of linked documents and other elements that can be changed as you go along. This seamless flow between reading and writing is something I feel we very much need for effective information strategies. It is present in the Mother of all Demos, it is present in Aaron Parecki’s current thinking about his Social Reader, and it is a key element in this 30-year-old browser.

Will you help us organise? We are going to organise an IndieWebCamp in Utrecht, an event to promote the use of the Open Web, and to work together on practical improvements to your own site. We are still looking for a suitable date and location in Utrecht, so your help is very welcome.

On the Open Web you decide for yourself what you publish, what it looks like, and who you converse with. On the Open Web you decide for yourself who and what you follow and read. The Open Web was always there, but over time we have all become more or less locked into the silos of Facebook, Twitter, and all the others. Their algorithms and timelines now determine what you read. It can be done differently. Build your own site, where others can’t get in the way trying to generate advertising revenue. Keep up your own news sources, without someone else’s algorithm locking you into a bubble. That is the IndieWeb: your content, your relationships, with you at the wheel.

Frank Meeuwsen and I have long been part of the internet and that Open Web, but have also spent a lot of time in web silos like Facebook. By now we are both active ‘returnees’ to the Open Web. Last November we were together at IndieWebCamp Nürnberg, where some twenty people discussed these issues and actively worked on their own websites. Some programmed advanced things, but most, like myself, did small things (such as removing a link to the author of postings on this site). Small things are often hard enough. On the train ride back to the Netherlands we quickly agreed: there should be an IndieWebCamp in the Netherlands too. In Utrecht, then, this spring.

To quote Frank:

Do the ideas of the open web and the IndieWeb appeal to you? Do you want to work on a site of your own that stands more free from the influence of social silos and data tracking? Do you want a news supply that is no longer primarily fed by algorithms and polarising loudmouths? Then we welcome you to two days of IndieWebCamp Utrecht.

Let us know if you want to be there.
Let us know if you can help find a location.
Let us know how we can help you take your steps onto the Open Web.

You are invited!

Donald Clark writes about the use of voice tech for learning. I find I struggle enormously with voice. While I recognise several aspects put forward in that posting as likely useful in learning settings (auto transcription, text-to-speech, oral traditions), others remain barriers to adoption for me.

First, taking in information as voice. Podcasts are mentioned as a useful tool, but they don’t work for me at all. I get distracted after about 30 seconds. The voices drone on, and there’s often tons of fluff as the speaker tries to get to the point (often a lack of preparation, I suppose). I don’t have the moments in my day that I know others use to listen to podcasts: walking the dog, sitting in traffic, going for a run. Reading a transcript is very much faster, also because you get to skip the bits that don’t interest you, or reread sections that do. You can’t do that when listening, because you don’t know when an uninteresting segment will end, or when it might segue into something of interest. And then you’ve listened to the end and can’t get those lost minutes back. (Videos have the same issue, or rather I have the same issue with videos.)

Then there is using voice to ask for or control things. There are obvious privacy issues with voice assistants, having active microphones around for one. Even if they are supposed to fully activate only upon the wake-up word, they get triggered by false positives. And they don’t distinguish between me and other people they maybe shouldn’t respond to. A while ago I asked around in my network how people use their Google and Amazon microphones, and the consensus was that most settle on a small range of specific uses. For those, cloud processing of what the microphones record in your living room shouldn’t be needed; they could be handled locally, with only novel questions or instructions processed in the cloud. (Of course that’s not the business model of these listening devices.)

A very different factor in using voice to control things, or for instance to dictate, is self-consciousness. Switching on a microphone in a meeting usually has a silencing effect. For dictation, I won’t dictate text to software at a client’s office, say, or in public (like on a train). Nor will I talk to my headset while walking down the street. I might do it at home, but only if I know I’m not distracting others around me. In the cases where I did use dictation software (which nowadays works remarkably well), I found it clashes with my thinking and formulating. Ultimately it’s easier for me to shape sentences on paper or screen, where I see them take shape in front of me. When dictating, it easily descends into meaninglessness, and it’s impossible to structure. Stream-of-thought dictation is the only bit that works somewhat, but it needs a lot of cleaning up afterwards. Judging by all the podcasts I’ve sampled over the years, this happens to more people when confronted with a microphone (see the paragraph above). It might be different for something more prepared, like a lecture or presentation, but those types of speech have usually been prepared in writing, so there is likely a written source for them already. In any case, dictation never saved me any time. It is of course very different if you don’t have the use of your hands: then dictation is your door to the world.

It makes me wonder: how are voice services helping you? How do they save you time or effort? In which cases are they more novelty than effective?

Alan Levine recently posted a description of how to add an overview to your blog of postings from previous years on the same date as today. He turned it into a small WordPress plugin, allowing you to add such an overview wherever in your site you want, using a shortcode. It was something I had on my list of potential small hacks, so it was a nice coincidence that my feed reader presented me with Alan’s posting. It has become ‘small hack’ 4.

I added his WP plugin, but it didn’t work as in the examples he provided: the overview was missing the years. It turned out that a conditional which should use each posting’s year was only given the current year, so the condition was never fulfilled. A simple change in how the year of older postings is fetched fixed it, and that fix has now been added to the plugin.
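The underlying logic, and the bug described above, are easy to sketch outside WordPress. The point is that the year shown must come from each post’s own date, not from the current date; the titles and dates below are just sample data.

```python
# Language-neutral sketch (in Python) of the 'on this day' logic: select
# posts from earlier years whose month and day match today, and report
# each post's OWN year -- using the current year instead is the bug that
# made the overview show no years at all. Sample titles/dates only.
from datetime import date

posts = [
    ("Microlearning Conference and BlogWalk 8", date(2005, 4, 24)),
    ("Big Data for Malaysia and ASEAN",         date(2015, 4, 24)),
    ("Some unrelated post",                     date(2018, 3, 1)),
]

def on_this_day(posts, today):
    """Posts from earlier years published on today's month and day."""
    return [(title, d.year)  # d.year: the post's own year, not today's
            for title, d in posts
            if (d.month, d.day) == (today.month, today.day)
            and d.year < today.year]

hits = on_this_day(posts, date(2019, 4, 24))
print([year for _, year in hits])  # → [2005, 2015]
```

Grouping the hits by year for display is then trivial, which is essentially what the plugin’s shortcode output does.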

In the right-hand sidebar you now find a widget listing postings from earlier years, and you can see the same on the page ‘On This Blog Today In‘. I am probably my own most frequent reader of the archives, and having older postings presented to me like this adds some serendipity.

From today’s historic postings, the one about the real-time web is still relevant to me in how I would like a social feed reader to function. And the one about a storm that kept me away from home I still remember (ah, when Jaiku was still a thing!).

Adding these old postings is as simple as adding the shortcode ‘postedtoday’:

There are 6 posts found on this site published on April 24

  • April 24, 2018
    • Suggested Reading: GDPR, Fintech, China and more Some links I think worth reading today. ICANN struggles with the GDPR for the WHOIS database, and has now run out of time:EFF: GDPR forces ICANN to improve WHOIS andEFF: Privacy as afterthought at ICANN Facebook removes 1.5 billion users from EU jurisdiction while maintaining they’re totally committed to applying the ‘spirit’ of the GDPR […]
    • GDPR as De Facto Norm: Sonos Speakers Just received an email from Sonos (the speaker system for streaming) about the changes they are making to their privacy statement. Like with FB in my previous posting this is triggered by the GDPR starting to be enforced from the end of May. The mail reads in part We’ve made these changes to comply with […]
    • Facebook GDPR Changes Unimpressive It seems, from a preview for journalists, that the GDPR changes that Facebook will be making to its privacy controls, and especially the data controls a user has, are rather unimpressive. I had hoped that with the new option to select ranges of your data for download, you would also be able to delete specific […]
  • April 24, 2015
    • Big Data for Malaysia and ASEAN This week I was invited to Malaysia as one of 8 members of the advisory panel on big data to the Malaysian government. The meeting was part of the Big Data Week taking place in Kuala Lumpur where I gave two presentations and was part of a panel discussion. Malaysia intends to become a big […]
  • April 24, 2005
    • Microlearning Conference and BlogWalk 8 Sebastian Fiedler is busy trying to organize a BlogWalk meeting in Innsbruck on June 25th 2005. This on the day after the Microlearning Conference in the same city, which takes place on June 23rd and 24th. It certainly looks like a great conference also from a KM and social software viewpoint. Are you going to […]
    • Blognomics, a much needed event Finally the Netherlands has seen it’s first symposium on the use of weblogs. Drawing a mixed crowd of journalists, politicians, business people and of course bloggers, Blognomics was a succes to my eyes. Have a look at Technorati for impressions, in words, video and pictures. I will upload my pictures to Flickr. I have written […]

Noise cancelling in cars isn’t a no-brainer, I think. When I first got my noise-cancelling headphones and had put them to good use on trains and airplanes, I tried to use them while driving my car as well. I took them off again rather quickly, once I noticed that I actually use the car’s noises as feedback, e.g. for shifting gears, determining road conditions, and other things. With noise cancelling active I felt part of my sensorium was cut off. It would take actively rewiring entrained behaviour, replacing those observations by ear with ones by other senses. For passengers it’s likely a different matter.

Replied to Noise cancelling for cars is a no-brainer
We’re all familiar with noise cancelling headphones. I’ve got some that I use for transatlantic trips, and they’re great for minimising any repeating background noise... It doesn’t surprise me, therefore, to find that BOSE, best known for its headphones, are offering car manufacturers something similar

A while ago Peter wrote about energy security and how having a less reliable grid may actually be useful to energy security.

This is the difference between tightly coupled and loosely coupled systems. Loosely coupled systems can show more robustness, because failing parts will not break the whole. They also allow for more resilience that way: you can locally fix things that fell apart.

It may clash, however, with our current expectation of having electricity 24/7. Because of that expectation we don’t spend much time on being clever in our timing and usage of energy. A long time ago I provided training to a group of some 20 Iraqi water provision managers, as part of the rebuilding efforts after the US invasion of Iraq. They had all kinds of issues obviously, often arising in parallel. What I remember, connected to Peter’s post, is how they described the way Iraqi citizens had adapted to the intermittent availability of electricity and water. They made things work, at some level, by incorporating that intermittent availability into their routines: when there was no electricity they used water for cooling, and vice versa, for instance. A few years ago at a Border Sessions conference in The Hague, a speaker talked about resilience and intermittent energy sources too. He mentioned that historically Dutch millers had dispensation from attending church on Sundays if it was windy enough to mill.

These past few days a discussion has been taking place in Dutch newspapers about how some local solar energy plans can’t be implemented because the grid maintainers can’t deal with the input. This isn’t necessarily true, but rather part of the framing that comes with the current always-on macro-grid. Tellingly, any mention of micro-grids or local storage is absent from that framing.

In a different discussion with Peter Rukavina and with Peter Bihr, it was mentioned that resilience is, and needs to be, rising on the list of design principles. It’s also the reason why resilience is one of three elements of agency in my networked agency thinking.

Line 'Em Up
Power lines in Canada, photo Ian Muttoo, license CC BY SA