In a conversation in Prague last weekend, I formulated some thoughts on data quality, which I am blogging here so I can find them again later.
In the context of opening up government data, data quality often gets mentioned as a barrier. Data quality, or rather the absence thereof, is put forward as a reason not to publish the data, or as a reason why re-use is not happening. (To the former, Andrew Stott always replies that keeping the data inside government for the past decades has not improved it, so why think not publishing now would change anything?)
To me data quality is not an intrinsic aspect of the data. It is an external aspect. Data quality only becomes visible, gets noticed, in the context of usage. The job for which the data is being used determines whether the data is of the right quality to do so.
Also data quality is not the same as data granularity.
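As a small illustration of quality being relative to the job, here is a minimal Python sketch. Everything in it is hypothetical: the `BusStop` record, the `error_m` field (distance in metres between the recorded and the true position), the sample values and the two tolerances are assumptions made up for the example, not real figures.

```python
from dataclasses import dataclass


@dataclass
class BusStop:
    name: str
    error_m: float  # hypothetical: positional error in metres


def fit_for_use(stops, max_error_m):
    """Share of records whose positional error is within the tolerance of a given job."""
    ok = sum(1 for s in stops if s.error_m <= max_error_m)
    return ok / len(stops)


# Made-up sample data, for illustration only.
stops = [
    BusStop("A", 3.0),
    BusStop("B", 40.0),
    BusStop("C", 8.0),
    BusStop("D", 150.0),
]

# Two hypothetical re-uses of the same data, each with its own requirement.
regional_planning = fit_for_use(stops, max_error_m=200.0)
pedestrian_navigation = fit_for_use(stops, max_error_m=10.0)

print(regional_planning, pedestrian_navigation)
```

The data does not change between the two calls, only the tolerance does. Under these assumed numbers every record is good enough for the first job, while only half of them are good enough for the second.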
Only through making data available for re-use, and attempting to re-use that data in various settings, do notions of quality and questions about quality get formulated and discussed, and eventually dealt with (such as when OpenStreetMap corrected the location of 18,000 out of 360,000 bus stops in the UK). This then may or may not reflect back on the public task for which the data was originally collected, and hence on the original data collection process.
Interesting article. Ensuring quality of the data to be shared is very important for a number of reasons.
However, I have found quality to be a relative concept. What might be high (or good) quality for me may not be of good quality for some other person. It all depends on the requirements. In my opinion, it is important to make the data conform to the requirements of the person or application using it.
Understanding that requirement is the first and in my opinion the most critical step of quality conformance.
Hi Ankit,
I agree. Where you say “What might be high (or good) quality for me may not be of good quality for some other person. It all depends on the requirements.” I say “The job for which the data is being used determines whether the data is of the right quality to do so.”
To me this means that it is up to the re-user to bring the data into shape and up to the quality level she needs, and up to the government data holder just to ensure it is good enough for the public task it was collected for. I conclude from this that the government should not be worried about data quality at all: it is already at the right quality relative to the public task. Just publish as is.