In a conversation in Prague last weekend, I formulated some thoughts on data quality, which I am blogging here so I can find them again later.
In the context of opening up government data, data quality is often mentioned as a barrier. Data quality, or rather the absence of it, is put forward as a reason not to publish the data, or as a reason why re-use is not happening. (To the former, Andrew Stott always replies that keeping the data inside government for the past decades has not improved it, so why would not publishing it now change anything?)
To me, data quality is not an intrinsic aspect of the data but an external one. Data quality only becomes visible, only gets noticed, in the context of use. The job the data is being used for determines whether the data is of the right quality for it.
Also, data quality is not the same as data granularity.
Only by making data available for re-use, and by attempting to re-use it in various settings, do notions of and questions about quality get formulated, discussed, and eventually dealt with (as when OpenStreetMap corrected the location of 18,000 out of 360,000 bus stops in the UK). This may or may not then feed back into the public task for which the data was originally collected, and hence into the original data collection process.