This week NBC published an article exploring the source of training data sets for facial recognition. It makes the claim that we ourselves are providing, without consent, the data that may well be used to put us under surveillance.

In January IBM made a database available for research into facial recognition algorithms. The database, called “Diversity in Faces” (DiF), contains some 1 million face descriptions that can be used as a training set. Its stated aim is to reduce bias in current facial recognition systems. Such bias is rampant, often because the data sets used in training are too small and too homogeneous compared to the global population. That stated goal seems ethically sound, but the means used to get there raise a few questions for me, specifically whether those means live up to the same ethical standards that IBM says it seeks to attain with the result of its work. This and the next post explore the origins of the DiF data, my presence in it, and the questions it raises for me.

What did IBM collect in “Diversity in Faces”?
Let’s first look at what the data is. Flickr is a photo sharing site, launched in 2004, that supported publishing photos under a Creative Commons license from early on. In 2014 a team led by Bart Thomee at Yahoo, which then owned Flickr, created a database of 100 million photos and videos published on Flickr in previous years under any type of Creative Commons license. This database is available for research purposes and is known as the ‘YFCC-100M’ dataset. It does not contain the actual photos or videos, but the static metadata for those photos and videos (URLs of the images, user IDs, geolocations, descriptions, tags, etc.) and the Creative Commons license each was released under. See the video below, published at the time:

YFCC100M: The New Data in Multimedia Research from CACM on Vimeo.

IBM used this YFCC-100M data set as a basis, and selected 1 million of the photos in it to build a large collection of human faces. It does not contain the actual photos, but the metadata of each photo plus some 200 additional attributes describing the faces in those photos, including measurements and skin tones. Where YFCC-100M was meant to train more or less any image recognition algorithm, IBM’s derivative subset focuses on faces. IBM describes the dataset in their Terms of Service as:

a list of links (URLs) of Flickr images that are publicly available under certain Creative Commons Licenses (CCLs) and that are listed on the YFCC100M dataset (List of URLs together with coding schemes aimed to provide objective measures of human faces, such as cranio-facial features, as well as subjective annotations, such as human-labeled annotation predictions of age and gender(“Coding Schemes Annotations”). The Coding Schemes Annotations are attached to each URL entry.

My photos are in IBM’s DiF
NBC, in their above-mentioned reporting on IBM’s DiF database, provide a little tool to determine whether photos you published on Flickr are in the database. I have been an intensive user of Flickr since early 2005, and have published over 25,000 photos there. A large number of those carry a Creative Commons license, BY-NC-SA, meaning that you’re allowed to use my photos as long as you attribute me, don’t use an image commercially, and share your result under the same license. As YFCC-100M covers the years 2004-2014 and I published images in most of those years, it was likely that my photos are in it, and by extension likely that my photos are in IBM’s DiF. Using NBC’s tool, based on my user name, it turns out 68 of my photos are in IBM’s DiF data set.

One set of photos that apparently is in IBM’s DiF covers the BlogTalk Reloaded conference in Vienna in 2006. There I took various photos of participants and speakers. The NBC tool I mentioned provides one photo from that set as an example:

My face is likely in IBM’s DiF
Although IBM doesn’t allow a public check of who is in their database, it is very likely that my face is in it. There is a half-way functional way to explore the YFCC-100M database, and DiF is derived from YFCC-100M. It is reasonable to assume that faces that can be found in YFCC-100M are also to be found in IBM’s DiF. The University of Kaiserslautern in Germany created a browser for the YFCC-100M database at the time. Judging by some tests, it is far from complete in the results it shows (for instance, if I search for my Flickr user name it shows results that don’t contain the example image above, and the total number of results is lower than the number of my photos in IBM’s DiF). Using that same browser to search for my name, and for the Flickr user names of people who are likely to have taken pictures of me during the mentioned BlogTalk conference and other conferences, shows that there are indeed a number of pictures of my face in YFCC-100M. Although the limited search of IBM’s DiF that NBC’s tool allows doesn’t return any telling results for those Flickr user names, it is therefore very likely that my face is in IBM’s DiF. I do find a number of pictures of friends and peers in IBM’s DiF that way, taken at the same time as pictures of myself.


Photos of me in YFCC-100M

But IBM won’t tell you
IBM is disingenuous when it comes to being transparent about what is in their DiF data. Their TOS allows anyone whose Flickr images have been incorporated to request to be excluded from now on, but only if you can provide the exact URLs of the images you want excluded. That is only possible if you can verify what is in their data, but there is no public way to do so: only university-affiliated researchers can request access to the data by stating their research interest, and requests can be denied. Their TOS says:

3.2.4. Upon request from IBM or from any person who has rights to or is the subject of certain images, Licensee shall delete and cease use of images specified in such request.
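
To make concrete what “providing the exact URLs” would involve if you did have access to the metadata, here is a minimal Python sketch. It assumes a hypothetical tab-separated file with a header row and columns named user_id and photo_url; those names, the sample records and the function urls_for_user are made up for illustration, and the real YFCC-100M and DiF files may well be laid out differently.

```python
import csv
import io


def urls_for_user(metadata_file, user_id):
    """Return the photo URLs that belong to one Flickr user.

    Assumes a hypothetical tab-separated file with a header row that
    contains the columns 'user_id' and 'photo_url'. The actual
    YFCC-100M and DiF files may use a different layout.
    """
    reader = csv.DictReader(metadata_file, delimiter="\t")
    return [row["photo_url"] for row in reader if row["user_id"] == user_id]


# Made-up sample records, for illustration only.
sample = io.StringIO(
    "user_id\tphoto_url\tlicense\n"
    "user_a\thttps://example.org/photos/1.jpg\tBY-NC-SA\n"
    "user_b\thttps://example.org/photos/2.jpg\tBY\n"
    "user_a\thttps://example.org/photos/3.jpg\tBY-NC-SA\n"
)

# Print the URLs one would have to list in an exclusion request.
for url in urls_for_user(sample, "user_a"):
    print(url)
```

The filtering itself is trivial. The obstacle is that without access to the data there is nothing to run it on.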

Time to explore the questions this raises
Now that the context of this data set is clear, in the next post we can take a closer look at the practical, legal and ethical questions it raises.

I’ve been using Flickr to store photos since March 2005. It is both an easy way to embed photos in my blog without using up storage space in the hosting account, and an online remote back-up. Over the years I’ve uploaded some 24,000 photos, though I’ve been using Flickr less in the last 2 years.

My account dates from just before Yahoo bought Flickr from its founders, which was also in March 2005; Yahoo forced me to create a Yahoo account for it in 2007. Yahoo never seemed to have much vision for Flickr, but for an early user like me (Flickr was founded in 2004) the original functionality I signed up and paid for was all I really needed.

Yahoo was bought by Verizon last year, and since then it has seemed likely they’d sell off some parts of it. SmugMug acquired Flickr last week, which at least means that photography is the main focus again. That hopefully means further evolution of Flickr, or it might mean a switch to SmugMug in the future.

Tellingly, one needs to accept the new terms of service by 25th May 2018, which is the day the EU data protection regulation GDPR becomes enforceable.

It also means that I will be able to delete my Yahoo account, which I only had because Flickr users were forced to have one.
Yahoo is an internet dinosaur, launched in 1994. Its best days lie way back. Deleting my Yahoo account is as such also the end of an era, an end that has felt overdue for years.

In the past week a storm raged through Flickr, and in the past weeks and months we’ve already seen a couple more.
I’d think that Flickr would not have many feet left to shoot themselves in. Apparently, however, Yahoo’s lawyers (who I guess are the initiators of these cock-ups) are good at finding more feet for Flickr to keep shooting.

First let me mention a couple of ‘minor’ issues that we saw recently.
The smallest one was making it mandatory to have a Yahoo ID to use Flickr. This upset the community because they don’t see themselves as Yahoo customers but as Flickr customers. Confusing your customers by mixing your different brands is not a good idea.

Being Cut Off if You Stand Out
Last month a photo by Rebekka Guðleifsdóttir and its comments were removed without warning, presumably because some people in the comments uttered threats against a UK company that had been violating Guðleifsdóttir’s copyright. Apparently this also came to the attention of Flickr staff because of the high number of page views and comments the photo attracted. In the end they admitted their mistake and apologized.
Recently Flickr changed the way content is categorized and filtered.

From now on Flickr users should actively moderate their own content, which in itself is not too much to ask. But the thing is, they ask me to mark photos that might be insulting or imprudent to a ‘global’ audience as moderate or even restricted. This can be interpreted as a call to moderate everything according to the lowest common denominator. My pictures that show, for example, women talking in public to men who are not their relatives will certainly feel offensive to some people. But of course that is unenforceable, as Flickr staff well know.
I received a cheery message that my account was considered ‘safe’, as if that should make my day. What was irritating, though, was that I suddenly saw greyed-out pictures when visiting friends’ photo streams.

Switching off the ‘Safety Filter’, which Flickr presents as a great new feature and which defaults to Safe, showed that the filtered-out material consisted of screenshots and graphics, the kind of thing they previously filtered out of public search because Flickr is a photo site. (That default means you are not left to decide to see less information, but have to decide to see more: a plain weird standpoint in the age of information abundance and overflow.)

Other users, however, saw their entire account flagged ‘Restricted’, without notice, and with very slow responses as to why it happened and how to change it. In the linked case, the trigger again seems to be a) complaints, apparently accepted without checking their validity, and b) a high number of views and comments (as if that alone indicates something dodgy; it seems like projection on the part of Flickr staff to me: only naughty stuff attracts eyeballs). That seems to be a repeating pattern.

Again Flickr admitted their mistake and apologized, but again it took decisive action on the part of the customer.
So we have as a pattern:
If you attract attention, you’ll be flagged as suspect.
If we change something, we won’t tell you first, but wait until you complain.
We are slow to respond.

Dumping PayPal and Other Payment Woes
Yahoo is promoting their own payment system (Yahoo Wallet), which supports credit cards only (at least outside the US). A lot of European users do not own a credit card, because you can do almost anything with your debit card across the entire continent, and yearly fees for credit cards are often high. That is why PayPal is popular, as you can connect it to your bank account.
But they’ve cut PayPal as a payment option, again without warning. That leaves scores of users who have no credit card with no way to continue their Pro account, and with no time to arrange a different solution, because there was no warning the service would be cut.

Also, those who use the Portuguese language version of Flickr suddenly find themselves left with a Brazilian e-banking option as the only way to pay. Which of course is entirely logical if you live in Portugal, isn’t it?
Confusing languages with countries is a major no-no, guys. Usability 101.

Offering Localized Versions with Easter Egg
The really big issue this week is the start of localized versions. While the official blog was extolling the parties around the launch, and how the Flickr team was jetting around the world, the users in Germany, Hong Kong, Singapore and Korea found a little easter egg in those localized versions: they cannot decide their own Safety Filter settings. The filter is always locked to Safe if you have a Yahoo ID based in those countries.

Of course this means that those Swiss and Austrian users who created a German Yahoo ID because they wanted to enjoy a German-language site now also see their filters in Flickr locked in Safe mode. Confusing languages with nations again. This means ‘flowers and landscapes only’ for the German-speaking users, even for your own photos. Meanwhile Yahoo’s stockholders rejected a principled stand on censorship.
Again, this change was made without warning. Again the response has been extremely slow when users started to complain. German users demand to know what the legal basis for this decision is, but only get vague indications (e.g. age verification is mentioned) that don’t make much sense at all (except that the measures seem to have been taken proactively out of fear, real or imagined). An action accusing Flickr of widespread censorship ensued.

Until now Flickr staff have only let their customers know how painful it is to them, and how sorry they feel, but no tangible information about the reasons is forthcoming.


(censorship has been a hot tag in the past week on Flickr)

It all boils down to this, from my viewpoint:

  • Flickr is currently treating their customers as objects, whereas the customers see themselves and Flickr staff as a community.
  • Flickr is taking measures without informing their customers, or giving them a chance to prepare for those changes.
  • Flickr is stonewalling requests for information.

Meanwhile customers are considering their options, putting uploading on hold, and moving away to other services (such as the Danish 23 and Zooomr).
Flickr, in short, is flushing their brand down the drain. Or rather Yahoo is, as Flickr staff seem to feel predominantly sorry for themselves at this point.

(a good overview, if you read German, of what is going on in the German blogosphere can be found at Sprechblase, by Cem Basman from Hamburg)