One of my favorite data quality bloggers Jim Harris wrote a blog post this weekend called “Data, data everywhere, but where is data quality?”
I believe in that data quality will be found in the cloud (not the current ash cloud, but to put it plainer: on the internet). Many of the data quality issues I encounter in my daily work with clients and partners is caused by that adequate information isn’t available at data entry – or isn’t exploited. But information needed will in most cases already exist somewhere in the cloud. The challenge ahead is how to integrate available information in the cloud into business processes.
Use of external reference data to ensure data quality is not new. Especially in Scandinavia where I live, this has been in use for long because of the tradition with public sector recording data about addresses, citizens, companies and so on far more intensely than done in the rest of the world. The Achilles Heel though has always been how to smoothly integrate external data into data entry functionality and other data capture processes and not to forget, how to ensure ongoing maintenance in order to avoid else inevitable erosion of data quality.
The drivers for increased exploitation of external data are mainly:
- Accessibility, which is where the fast growing (semantic) information store in the cloud helps – not at least backed up by the world wide tendency of governments releasing public sector data
- Interoperability where increased supply of Service Orientated Architecture (SOA) components will pave the way
- Cost; the more subscribers to a certain source, the lower the price – plus many sources will simply be free
As said, smoothly integration into business processes is key – or sometimes even better, orchestrating business processes in a new way so that available and affordable information (from the cloud) is pulled into these business processes using only a minimum of costly on premise human resources.