Author Archives: Ton Zijlstra

Students’ Six Big Data Lessons

Students from a ‘big data’ minor at the local university of applied sciences presented their projects a few weeks ago. As I had done a session with them on open data as a guest lecturer, I was invited to the final presentations. Taken together, several things in those presentations stood out for me. I later repeated them to a different group of students at the Leeuwarden university of applied sciences, at the beginning of their week of working on local open data projects, as pitfalls for them to avoid. I thought I’d share them here too.

The projects students created
First of all let me quickly go through the presented projects. They were varied in types of data used, and types of issues to address:

  • A platform consulting Lithuanian businesses to target other EU markets, using migration patterns and socio-economic and market data
  • A route planner comparing car and train trips
  • A map combining buildings and address data with income per neighborhood from the statistics office to base investment decisions on
  • A project data mining Riot Games online game servers to help live-tweak game environments
  • A project combining retail data from Schiphol Airport with various other data streams (weather, delays, road traffic, social media traffic) to find patterns and interventions to increase sales
  • A project using the IMDb movie database and ratings to predict whether a given team and genre have a chance of success

Patterns across the projects
Some of these projects were much better presented than others, and some were more savvy in their data use. Several things stood out:

1) If you make an ‘easy’ decision on your data source it will hurt you further down your development path.

2) If you want to do ‘big data’, be really prepared to struggle with it to understand the potential and limitations.

To illustrate both those points:
The Dutch national building and address database is large and complicated, so a team had opted to use the ‘easier’ processed data set released by a geodata company. Later they realized that the ‘easier’ dataset was updated only twice per year (the actual source being updated monthly), and that they needed a different coordinates system (present in the source, not in the processed data) to combine it with the data from the statistical office.

Similarly, the route planner team shied away from using the open realtime database on motorway traffic density and speed, opting for a derivative data source on traffic jams and then complaining that it came in a format they couldn’t really re-use and did not cover all the roads they wanted to cover.
That same project used Google Maps, which is a closed data source, whereas a more detailed and fully open map is available. Google Maps comes with neat pre-configured options and services, but in this case they were a hindrance, because they do not allow anything outside of what Google Maps offers.

3) You must articulate and test your own assumptions

4) Correlation is not causation (duh!)

The output you get from working with your data is colored by the assumptions you build into your queries. Yes, average neighbourhood income can likely be a predictor for certain investment decisions, but is there any indication that is the case for your type of investment, in this country? Is entering the Swedish market different for a Lithuanian company than for, let’s say, a Greek one? What does it say about the usefulness of your data source?

Data will tell you what happened, but not why. If airport sales of alcohol spike whenever a flight to Russia arrives or leaves (actual data pattern), can that really be attributed to the 200-300 people on that plane, or are other factors at work that may not be part of your data (intercontinental flights, for instance, that have roughly the same flight schedule but are not in the data set)?

Are you playing around enough with the timeline of your data to detect e.g. seasonal patterns (like we see in big city crime), zooming out and zooming in enough to notice that what seems a trend maybe isn’t?
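
To make the zooming in and out concrete, here is a minimal sketch. The observations are invented for illustration and do not come from any of the student projects. The same series is summed per year (zoomed out) and per calendar month (zoomed in across years), so a seasonal pattern is not mistaken for a trend.

```python
from collections import defaultdict
from datetime import date

def totals_by(observations, key):
    """Sum observation values per bucket, where key(day) names the bucket."""
    totals = defaultdict(int)
    for day, value in observations:
        totals[key(day)] += value
    return dict(totals)

# Hypothetical (date, count) observations, invented for this example.
observations = [
    (date(2013, 1, 15), 10), (date(2013, 7, 15), 30),
    (date(2014, 1, 15), 12), (date(2014, 7, 15), 32),
]

per_year = totals_by(observations, lambda d: d.year)    # zoomed out
per_month = totals_by(observations, lambda d: d.month)  # seasonal view
print(per_year)
print(per_month)
```

In this toy data, going from January to July looks like steep growth, while the yearly totals are nearly flat: most of the movement is seasonal.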

5) Test your predictions, use your big data on yourself

The ‘big’ part of big data is that you are not dealing with a snapshot or a small subset (N = a few) but with a complete timeline of the full data set (N = all). This means you can and need to test your model / algorithm / great idea on your own big data. If you think you can predict the potential of a movie, given genre and team, then take a movie from 2014 for which you know the results (as they’re in your own dataset), run your algorithm on the database from before 2014, and see if it works. Did Lithuanian companies that have already entered the Swedish market fail or flourish in line with your data set? Did known past interventions into the retail experience have the impact your data patterns suggest they should?
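
A minimal sketch of such a test on your own data follows. The record fields `year`, `genre`, `team` and `success` are assumptions, not taken from the students’ project. The ‘model’ is a deliberately naive success rate per genre and team. It is estimated only from records before the cut-off year, then checked against the year you held back.

```python
from collections import defaultdict

def fit_success_rates(records):
    """Estimate the success rate per (genre, team) from historical records."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, total]
    for r in records:
        key = (r["genre"], r["team"])
        if r["success"]:
            counts[key][0] += 1
        counts[key][1] += 1
    return {key: s / n for key, (s, n) in counts.items()}

def backtest(records, cutoff_year, threshold=0.5):
    """Fit on records before cutoff_year, score predictions for cutoff_year."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] == cutoff_year]
    rates = fit_success_rates(train)
    hits = scored = 0
    for r in test:
        key = (r["genre"], r["team"])
        if key not in rates:
            continue  # no history for this combination, so no prediction
        predicted = rates[key] >= threshold
        if predicted == r["success"]:
            hits += 1
        scored += 1
    return hits, scored

# Hypothetical records, invented for this example.
movies = [
    {"year": 2011, "genre": "comedy", "team": "A", "success": True},
    {"year": 2012, "genre": "comedy", "team": "A", "success": True},
    {"year": 2013, "genre": "comedy", "team": "A", "success": False},
    {"year": 2012, "genre": "drama", "team": "B", "success": False},
    {"year": 2014, "genre": "comedy", "team": "A", "success": True},
    {"year": 2014, "genre": "drama", "team": "B", "success": True},
    {"year": 2014, "genre": "horror", "team": "C", "success": True},
]

hits, scored = backtest(movies, 2014)
print(f"{hits} of {scored} predictions for 2014 were right")
```

The important design choice is that nothing from the held-back year leaks into the fitting step. Otherwise the test only confirms what the model has already seen.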

6) Your data may be big, but does it contain what you need?

One thing I notice with government data is that most data is about what government knows (number of x, maps, locations of things, environmental measurements etc), and much less about what government does (decisions made, permits given, interventions made in any policy area). Often those are not available at all in data form but hidden somewhere in wordy meeting minutes or project plans. Financial data on spending and procurement is what comes closest to this.

Does your big data contain the things that tell what various actors around the problem you try to solve did to cause the patterns you spot in the data? The actual transactions of liquor stores connected to the boarding passes of Russian flights? The marketing decisions and their reasons for the Schiphol liquor stores? The actions of Lithuanian companies that tried different EU markets and failed or succeeded?

Issue-driven, not data-driven, and willing to do the hard bits
It was fun to work with these students, and there is a range of other things that come into play: technical savviness, statistical skills, a real understanding of what problem you are trying to solve. It’s tempting to be data-driven rather than issue-driven, even though being issue-driven brings more value in the end. With the former the data you have is always the right data, but with the latter you must acknowledge the limitations of your data and your own understanding.

As I mentioned, I used these lessons in a session for a different group of students in a different city, Leeuwarden. There a group worked for a week on data-related projects to support the city’s role as cultural capital of Europe in 2018. The two winning teams there both stood out because they had focussed very much on specific groups of people (international students in Leeuwarden, and elderly visitors to the city), and really tried to design solutions starting with the intended user at the center. That user-centered thinking really turned out to be the hardest part, especially if you already have a list of available data sets in front of you. Most of the teacher’s time was spent on getting the students to match the datasets to use cases, and not the other way around.

My Radar, Finding New Sources of Interest

People often ask me how I stay informed, and always seem to know even about smaller initiatives around the topics I work on. Part of that is what I call ‘Radar’. With Radar I automatically collect all the Twitter messages that mention keywords I am interested in, and detect the web addresses they mention. Those web addresses are evaluated on their type (is it a blog, a video, a general site, a presentation, a photo?) and counted as to how often they are mentioned.
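
The collecting and counting step could look roughly like the sketch below. This is not the actual Radar code, which was written in PHP. The host-to-type table, the function names and the sample tweets are all assumptions made for illustration, and shortened links are assumed to have been expanded already.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")

# Hypothetical hints for guessing what kind of page a URL points to.
TYPE_HINTS = {
    "youtube.com": "video",
    "vimeo.com": "video",
    "slideshare.net": "presentation",
    "flickr.com": "photo",
}

def extract_urls(text):
    """Find web addresses in a message, dropping trailing punctuation."""
    return [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(text)]

def classify(url):
    """Guess the type of a URL from its host name."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return TYPE_HINTS.get(host, "general site")

def count_mentions(tweets, keyword):
    """Count how often each URL appears in tweets mentioning the keyword."""
    counts = Counter()
    for tweet in tweets:
        if keyword.lower() in tweet.lower():
            counts.update(extract_urls(tweet))
    return counts

# Invented sample messages.
tweets = [
    "New open data portal launched: http://example.org/portal",
    "Great talk on Open Data http://example.org/portal, "
    "slides at http://www.slideshare.net/someone/deck",
    "Lunch was nice http://example.com/lunch",
]

mentions = count_mentions(tweets, "open data")
for url, n in mentions.most_common():
    print(n, classify(url), url)
```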


Running totals for Radar: found 350k people, mentioning over 1 million URLs


Radar then presents me with overviews of all URLs mentioned on Twitter in the past day, or week, on the keywords I follow. This way I find not just the ‘big’ websites, but also the smaller events, initiatives and discussions that are mentioned by smaller communities. Next to URLs, Radar also tracks who is mentioning certain topics, which basically gives me a list of suggestions of who to maybe follow on Twitter, or whose profile I may want to look at to see if they also blog about the topics I am interested in.


Most mentioned URLs in 4566 tweets on Open Data in past 24 hours


The 47 people tweeting about FabLabs today, new people highlighted
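
A people overview with new names highlighted could be sketched like this. Again, this is an illustration under assumptions, not the actual Radar code: tweets are assumed to be (author, text) pairs, and previously seen authors are kept in a simple set.

```python
def people_mentioning(tweets, keyword):
    """Authors of tweets containing the keyword, in order of first appearance."""
    seen = []
    for author, text in tweets:
        if keyword.lower() in text.lower() and author not in seen:
            seen.append(author)
    return seen

def highlight_new(todays_people, known_people):
    """Pair each author with a flag that is True if they were not seen before."""
    return [(person, person not in known_people) for person in todays_people]

# Invented sample data.
known = {"@alice"}
today = [
    ("@alice", "Visited a FabLab today"),
    ("@bob", "Our fablab got a new printer"),
    ("@carol", "Nothing to see here"),
    ("@bob", "More FabLab news"),
]

people = people_mentioning(today, "fablab")
for person, is_new in highlight_new(people, known):
    print(person, "(new)" if is_new else "")
```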


What comes out of my Radar then may get added to my feedreader, or to my bookmark collection, or to my notes collection in Evernote. Radar is the serendipity antenna that scoops up a wide variety of things. To me, whatever is being mentioned on Twitter is like the froth on the waves: it is not all that meaningful by itself, but shows me where there is movement and energy of interaction. That points me to the places and people that make up the wave below the froth. Which is where the significant info is.

Radar at first was a bunch of PHP scripts I wrote myself, which ran on my laptop and which I started manually in sequence. My coding skills aren’t all that great though, so ultimately I asked Flemming Funch to clean things up for me. That meant he coded the scripts from scratch, with only my original outline of what I wanted remaining. Now it runs permanently on my VPS with a basic web front-end for me to explore the output (see screenshots).

Using Lime Survey Locally for Self Reflection

I have installed the open source survey tool Lime Survey on my laptop’s local web server, to use it for self reflection.

Through the years I have often journaled and measured various parts of my life, and I track certain aspects of my daily routines for habit forming. Not in the sense of Quantified Self (which is more about measuring things with sensors, like number of steps taken), but along the lines of things I do (did I blog 2 times this week, did I initiate new business contacts). Journaling I’ve done on and off, mostly when I didn’t feel ok, but it takes quite a bit of time to write, and more importantly, the journaling can’t be used to e.g. detect patterns and correlations. So I was looking for a way to combine my normal tracking and measuring with the things I experienced. This can be done by combining capturing personal experiences with asking questions about those experiences and other tracking questions. Using the questions as context and metadata for the experiences, you can then look at patterns across experiences. What you end up with however is not a journal (which you can use a locally hosted blog or physical journal for), nor a list of measurements (which you can use a spreadsheet for), but more a survey with a need to do some statistical analysis on the output. So I needed a survey tool, and given the personal nature of the data, I don’t want to use a service or server where the data is outside my own control.

This is the set-up I now use:

  • MAMP (a package of Apache, MySQL and PHP for Mac), which I was already running for various other things, such as a locally hosted blog, test environments and PHP scripts I regularly use
  • Lime Survey installed on MAMP. Lime Survey is an open source survey tool, which allows you to define surveys with a wide variety of question types (and you can play around with building your own as well).

The survey I created, and will be test-driving in the coming weeks, has three distinct question blocks. A block asking questions about what happened during the day and stood out, and why it stood out for me. A block with more mindfulness-oriented questions on how I felt in the here and now. A block about my current outlook. All in all a mix of qualitative and quantitative elements. Self-administered participatory narrative inquiry of sorts, so I’ve dubbed it self-pni. Let’s see if it provides some insights in the coming three months.

Looking at the man in the mirror with Lime Survey

The 2014 Tadaa! List

Another year is coming to a close, so keeping up with my tradition of the last few years (since 2010, see last year’s edition) I am writing down the things in the past 12 months that gave me a sense of accomplishment or joy. It is often easy to focus on things not achieved, or left unfinished, as those are the things demanding attention. Often I find that in my daily routines I focus on what’s next, and I tend to forget a lot of what I actually did do. Obviously any year also has its hard moments, disappointments and failures. So to remind myself that this year was a full year where things happened that I loved doing or enjoyed (sticking to mostly business related, some personal), here’s the ‘Tadaa!-list’ of 2014:

  • With Marc, Paul and Frank, I formally incorporated The Green Land and had our first (temporary) employee
  • Got to work with the supreme audit authority on the first Dutch national ‘Trend report Open Data’, and now working on the next edition
  • Did an open data workshop with the Dutch and British supreme audit authorities with an audience of all European audit authorities, as well as a study day with the Belgian and Dutch audit authorities. Impressed with their dedication and professional attitude. (It does of course help clarity, if your mission statement is in the constitution)
  • Worked for the Flemish Chancellery on open data scenarios for their consolidated database of laws and regulations
  • Explored internet security and privacy in more detail, geeking out on running my own cloud in a Swiss datacenter
  • Spent a week and a half in Berlin with Elmine exploring and learning, visiting conferences like Things Con and Re:Publica, while also spending time just hanging out with fun people locally

    Out of my comfort zone behind a sewing machine
  • Got to (finally!) visit Gabriela and Ray in Limerick where Elmine and I both presented at 3D Camp at the University of Limerick
    With Gabriela, Ray and Elmine on an Irish beach
  • Presenting with Ernst and Elmine at Sia’s retirement farewell party, and feeling the lasting impact, emotion, and energy of our work together in Rotterdam 2007-2009, reconnecting to several team members. It is a rare treat to get to see the ripples of a (personal) change process years on like that. I was honored by your invitation, Sia.
    Cocreated Sia’s Lifehack Calendar with party participants
  • Stepped out of a large tendering process that could have provided for 3 years because it felt all wrong, realizing I can’t stomach the opportunists who aren’t really interested in delivering value, just feeding at the trough
  • Quit working on a company I was helping establish, even though it has loads of potential (realistically more than my other activities even), because I needed to free up thinking time and shed energy sinks
  • Worked in France, Belgium, Switzerland, Germany, Denmark, Netherlands, Kazakhstan, and Kyrgyzstan, enjoying the differences in stories, experiences, perspectives and outlooks that it provides
    Bishkek, Kyrgyzstan, against mountains
  • Spent a thoroughly enjoyable and relaxing summer week in a gem of an apartment in Copenhagen with Elmine, just enjoying each other, the sun and the city
  • Organized the Make Stuff That Matters Unconference & BBQ, at our home, bringing friends, clients, peers, family, and strangers together for two exciting days of inspiration, with the outstanding help of the Frysklab team and their mobile FabLab
  • Got to be there with and for friends in good and bad times, which is the definition of being alive and human
  • Taking more time with Elmine to explore exhibits and festivals, such as Gogbot, Dutch Design Week, Rijksmuseum Amsterdam, 3D Print Canal House, Reina Sofia Museum, Smart New World in Düsseldorf, Ai Weiwei in Berlin etc.
    Worked on seeing, noticing more
  • Better balanced long term goals and dreams with actions across quarters of the year, yielding improved results.
  • Worked for the World Bank as a senior consultant / external expert on open data readiness
    Presenting in Kazakhstan at Global e-Gov Forum
  • Got to celebrate the 3 year existence of the local Enschede FabLab which I helped start, still going strong and having yielded a wide variety of amazing projects
  • Knowing we’ve touched people, and made it possible for others to inspire people, with MSTM, based on the beautiful feedback we got, and seeing the ripples propagate in Denmark, Netherlands and Canada
  • Ending the year with a final dinner at a great Swiss restaurant that is closing, in the excellent company of dear friends

My absolute highlight in 2014 was our third birthday unconference in June, Make Stuff That Matters. Not only because of the energy and joy we got from getting to host such an amazing bunch of people at our home, but also because of the things Elmine and I did in the run-up to prepare (in Berlin and Limerick e.g.), the help we got doing that (thanks @trox!), the connections we’ve seen grow from it amongst those we invited, and how it is still creating impact months later where participants have taken their own additional steps around making. It was wonderful to create the place and circumstances in which that could happen. We can’t thank all who attended enough for the gift of their participation.



A week in Kyrgyzstan on Open Data

I spent a bit more than a week in Kyrgyzstan, at the invitation of the Kyrgyz prime minister and on behalf of the World Bank, to start an open data readiness assessment and present and facilitate at the Kyrgyz Open Data Days.
Kyrgyzstan is a lower middle income country, with a parliamentary democracy. The people I met are frank, straightforward, and action oriented. Anything longer than 6 months seems to be perceived as long term. This meant that with the right introduction it was possible to arrange meetings with high level officials at short notice, like arranging a meeting with a deputy minister during lunch for later that afternoon. I did not really get to see anything of the city or the country, except what I could see from the car that brought me from one office building to another, and from hotel to conference center.

Press coverage of Prime Minister opening the Kyrgyzstan Open Data Days, and my name tag in Cyrillic

Towards the end of my stay the Open Data Days took place, for which many other open data people from Moldova, Georgia, Russia, USA, UK, Germany and France came on behalf of the World Bank. It was good fun to meet them, and together we pulled off a good program to kick start open data (also see World Economic Forum blog) in Kyrgyzstan. The Kyrgyz government adopted an e-governance strategy only last week, and open data is part of that new strategy. Our visit was therefore very timely. The first morning was spent explaining open data and sharing experiences with the Kyrgyz prime minister and full cabinet attending, followed by good discussions in the afternoon when we zoomed in on a slightly more practical level. There was quite a bit of press interest, and I had the opportunity to get misquoted in the Kyrgyz press. The second day we did workshops with civil society organizations, and the business community, followed by a developer meet-up in the evening. Two more meetings on the last day completed my program, before the 10 hour flight back home.

Mountain view on a clear morning from my hotel room, and a group photo at the end of the Open Data Days

Bishkek is only a short distance from the mountains (the country’s highest peak is over 7000m), which on clear mornings form a great backdrop for the city. It was a snowy day when I left, so no views of the mountains as the plane took off. Instead I made triple selfies with Victoria from Moldova, and Vitaly from St. Petersburg in departures, as we were on the same flight back. Odd, spending a week 6000km away from home, and having more or less no idea where you’ve been. I may return however in January and late spring, both to complete the assessment and to provide early implementation support.

Edible Growth

3D Printed Food II: Edible Growth

While at Dutch Design Week in Eindhoven I came across ‘Edible Growth’: 3D printed edible pastries.

The interesting part is that spores of mushrooms and seeds of small plants are printed within a little ‘basket’ of pastry, on the basis of your design. The spores and seeds sprout and grow over a period of five days, and then your little starter is finished. If you let it grow longer it will get richer in taste (more mature mushroom, bigger green plants), allowing for your personal preference. The pastry serves as food source and packaging for the seeds and spores.
What you end up with is an edible item that comes without waste products (no packaging, no left-over material etc.)

The project was conceived by Chloe Rutzerveld. She did it as her graduation project for a bachelor in industrial design at TUe, and in cooperation with TNO, a Dutch research firm. In the past she has worked on other food related projects. Reducing agricultural foodprint, waste streams, and food miles are part of the values she incorporates in her designs.

Three stages of growth after printing.

Falscher Hase

3D Printed Food I: Bugs Bunny

During the Dutch Design Week I came across Carolin Schulze’s “Bugs Bunny” (Falscher Hase in German), a project 3D printing foodstuffs using mealworms as material. She sought to work on both the general western aversion to eating insects, and the reduction of resource use to provide proteins. With a home-built 3D printer she printed bunny (and grasshopper) shaped snacks made of mealworms.

On 14 October she won both the public design award and the design award for most interesting experiment of the Burg Giebichenstein art academy in Halle, Germany, where she is in her 2nd semester of an MA in industrial design.

Starting from ‘raising’ your own crop of mealworms, which you then shred into a paste for your 3D printer, you can create various shapes that no longer generate the aversive reactions insect shapes normally create in western Europe.

Start with mealworm raising, end up with bunny shaped 3D printed snack

The 3D printer (working on compressed air) and some printed mealworm based edible objects

A short video showing various steps in the project was posted by Carolin Schulze

Falscher Hase / Bugs´Bunny von Carolin Schulze from Carolin Schulze on Vimeo.


40 Kids on Minecraft for 3D Printing

Today 40 kids are gathering in the coworking place Zpot in Utrecht, to build in Minecraft and then print their creations with 3D printers.

The event is organized by my colleague Frank (also initiator of the coworking space itself). He and his son Floris participated in our Make Stuff That Matters unconference & bbq where Peter and Oliver Rukavina demo’d how to 3D print from Minecraft. Floris used it right then to print a castle he built. Earlier Frank already had hosted a Minecraft party for kids in the neighbourhood. His son’s continued enthusiasm for printcrafting, in combination with the earlier event has turned into “Meet2Minecraft” today.

Minecraft lan party!

Seven 3D printers (including our own trusty Ultimaker Classic, and 6 Felix printers) are lined up to print the creations of 40 children today. Pizza, soda, Minecraft and 3D printers == Perfect Saturday!

Prepping 7 3D-printers for printing Minecraft designs


More printcrafting kids

[update]
Comparing some of the printed results


See more pictures

Audit Authorities and Open Data

Last Friday I participated in a study day of the Dutch and Belgian audit authorities (the Algemene Rekenkamer and the Rekenhof). The topic of discussion was how open data can play a role in audit work.

Noël van Herreweghe, the open data program manager of the Flemish government, first sketched the situation of open data in Flanders. Afterwards I talked about the current status of open data in the Netherlands, and the lessons learned about doing open data well from the past years. (see my slides embedded below)

A few elements that I think are relevant in the context of the work of audit authorities are:

  • current open data is mostly about what government knows, not about what government does. The latter is what matters to auditors however. More transactional data is needed, maybe from the back-end of e-government services.
  • open data can be a pre-hypothesis tool, showing patterns that generate questions, or that give direction to audits and help focus them on areas where it matters most.
  • open data can be used to assess impact of policies, also/specifically/even when the data is not directly describing a certain policy area, but serves as a proxy from further down the chain of causality.
  • And then there is the many-eyes aspect of open data of course: if there is a ‘scandal’ hiding in the data, it may be found more easily through increased eyeballs (although there might be more false positives/noise as well).

We split up in groups and rotated through three short workshops exploring these notions. One session connected (or attempted to connect) specific audit questions to open data sources which could contain pointers, and to the stakeholders involved. One session showed how free open source online tools can help clean up and explore data and show first patterns. One session offered a quick routine to brainstorm indicators that can be proxies for a certain question. In this case we looked at proxy indicators for the quality of school buildings. The Dutch court of audit is currently doing a pilot involving the collection of opinions as well as pictures as part of an audit, concerning the quality of school buildings.

Open Data Roundtable in Kazakhstan

After arriving in Kazakhstan at 4AM, and a bit of rest, my first item on the schedule was key-noting at a roundtable of CIO and CTO level representatives of about a dozen CIS countries. The session was hosted at NITEC in the House of Ministries. The aim was to convey how open government data can be of value, and to provide a few starting points that the participants can see possibilities to act on.

Dashboard of e-government metrics, in the hall of the House of Ministries

My World Bank colleagues Oleg Petrov and Mikhail Bunchuk presented the World Bank work, and the ways and instruments with which it can support open data efforts of the nations present.

Tair Sabyrgaliyev and Cornelia Amihalachioae presented the open data program of Kazakhstan and the impressive e-government and open data work of Moldova (which I had the opportunity to work on and experience first hand in 2012).

My own contribution was basically a compressed Open Data course, addressing the what, why and how. My slides are embedded below in both English and Russian. (During the session I used Russian slides.)

Cornelia Amihalachioae’s slides, which are well worth a read, are also shown below.