One of my favorite data quality bloggers, Jim Harris, wrote a blog post this weekend called “Data, data everywhere, but where is data quality?”
I believe that data quality will be found in the cloud (not the current ash cloud, but to put it plainly: on the internet). Many of the data quality issues I encounter in my daily work with clients and partners are caused by adequate information not being available at data entry – or not being exploited. But in most cases the needed information will already exist somewhere in the cloud. The challenge ahead is how to integrate available information in the cloud into business processes.
Use of external reference data to ensure data quality is not new. Especially in Scandinavia where I live, this has long been in use because of the tradition of the public sector recording data about addresses, citizens, companies and so on far more intensively than in the rest of the world. The Achilles’ heel, though, has always been how to smoothly integrate external data into data entry functionality and other data capture processes – and, not to forget, how to ensure ongoing maintenance in order to avoid the otherwise inevitable erosion of data quality.
The drivers for increased exploitation of external data are mainly:
- Accessibility, which is where the fast-growing (semantic) information store in the cloud helps – not least backed up by the worldwide tendency of governments to release public sector data
- Interoperability, where an increased supply of Service Oriented Architecture (SOA) components will pave the way
- Cost; the more subscribers to a certain source, the lower the price – plus many sources will simply be free
As said, smooth integration into business processes is key – or sometimes, even better, orchestrating business processes in a new way so that available and affordable information (from the cloud) is pulled into these processes using only a minimum of costly on-premise human resources.
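To make the idea concrete, here is a minimal sketch of pulling external reference data into a data entry flow. The “cloud service” is simulated with an in-memory lookup table, and all names (the reference table, the record fields) are illustrative assumptions – in practice this would be a web service call to a public-sector or commercial reference source:

```python
# Hypothetical sketch: enriching a record at data entry using external
# reference data. The cloud service is simulated by a local dictionary;
# a real implementation would call a web service here.

CLOUD_POSTAL_REFERENCE = {
    "1050": {"city": "Copenhagen", "country": "DK"},
    "0150": {"city": "Oslo", "country": "NO"},
}

def enrich_at_entry(record):
    """Complete and standardize a record using external reference data."""
    ref = CLOUD_POSTAL_REFERENCE.get(record.get("postal_code"))
    if ref is None:
        # No reference match: flag for manual review instead of guessing.
        return {**record, "status": "needs_review"}
    # Reference data wins over whatever was typed at entry.
    return {**record, **ref, "status": "verified"}

entered = {"name": "Alfa A/S", "postal_code": "1050", "city": "Kobenhavn"}
print(enrich_at_entry(entered))  # city corrected to "Copenhagen"
```

The point of the sketch is the placement, not the lookup itself: the correction happens while the record is being captured, so no costly human cleanup is needed downstream.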
I believe the movement toward data quality in the cloud will also increase the need to move ETL to the cloud as well. Bottom line: get your head in the clouds!
Great topic, Henrik! Looking forward to more cloud-based conversations!
Great post, Henrik.
I definitely agree that the future of data quality is cloudy.
On a side note, perhaps we need a new name for it that could make our future seem brighter. Perhaps we should call it the Solar Web instead of the Semantic Web? Then we could say that the outlook for data quality is always sunny when you use the Solar Web.
P.S. Thanks for the link and the kind words 🙂
Thanks William and Jim.
I agree about ETL, and oh yes, it’s always sunny above the clouds – I remember that from way back when it was still possible to travel by airplane here in Northern Europe.
I’m with you all the way – “to infinity and beyond”!
Seriously though, as data quality professionals we need to be aware of, and to guide our clients on, the “move to the cloud” (e.g. cloud-based CRM from Salesforce.com) and the availability of quality external reference data in the cloud.
As William points out, moving data into a cloud-based CRM will pose the same ETL challenges as a traditional “land-based” migration.
Great post, looking forward to lots more debate,
Another great post, Henrik…
Just as Ken and William state, it does not matter where the data sits; it is how it is leveraged, consumed, created and managed…
As exciting as it is to see cloud computing grow, I am just as excited to see DQ & MDM grow to be part of the cloud…
Thanks Ken and Garnie for commenting.
I think we will see an evolution with ETL, MDM and DQ when enterprises continue to embrace the cloud. Things will get more real-time and real-world.
We have been evangelizing and building upstream (Cloud) data quality solutions for years. It’s nice to see IT and marketing professionals begin to take it seriously.
Thanks Mark. I like your website at Ikhana – both the information and the fish.
Also thanks for the mention in your blog section.
Cloud computing will certainly lead to greater web-service-driven data quality assurance. However, whether that will lead to better data quality at OLTP data entry will only be decided by the efficiency and reliability of web-service-facilitated data quality. Are we there yet? Until then, data cleansing and standardization by ETL in the DW and BI spaces will be the lifeline for quality information.
Thanks Pugazendhi. I agree, the reality today is very much about cleansing during ETL when loading data from operational applications into our DWs, and in other batch processes such as migrations.
We are certainly not there yet, as I see it. We only just started with the beginning.
I’m inclined to agree with Pugazendhi’s take on the current state of our industry. Traditional ETL tools still provide the best capabilities for data quality initiatives overall.
But traditional ETL tools usually do not provide the methods to deal with data that is used in real time. Many of our customers use very sophisticated data quality tools. From an IT perspective, these tools are employed in a manner that historically “works.” From the standpoint of the business owner of the data, they do not “work” well.
For example, one of our customers utilizes a Trillium instance to apply rule-based data transformations to records coming in from various websites. These records are stored in a temporary database, usually for a period of days, before they are moved into the marketing database that the business uses for campaign management. By then the data is “old” as far as the business owner is concerned.
Many of the data transformations can be handled upstream, at the data source. This is what our solutions are designed to accomplish. Can we do everything a traditional ETL DQ solution can do? Obviously not, nor would we claim it. But what we can do, in milliseconds not days, is transform and enhance the data; make it more usable in real time; help the receiving application’s matching process work much more efficiently; and, most importantly, give the business owner a process that they feel meets their needs.
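The upstream approach described above can be sketched in a few lines. This is purely illustrative (not Trillium’s or any vendor’s API, and the abbreviation table is an assumed example): the standardization rule runs at the point of capture, before the record is ever stored, instead of in a batch job days later:

```python
# Illustrative sketch: applying a simple standardization rule at the
# point of capture instead of in a later batch ETL job. The rule set
# and field names are assumptions for the example.

ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "rd": "Road"}

def standardize_street(raw: str) -> str:
    """Expand common street-type abbreviations and normalize casing."""
    words = raw.replace(".", "").split()
    expanded = [ABBREVIATIONS.get(w.lower(), w) for w in words]
    return " ".join(w.capitalize() for w in expanded)

def capture(record: dict) -> dict:
    """Transform a web-form record before it is stored, not days later."""
    return {**record, "street": standardize_street(record["street"])}

print(capture({"street": "123 main st."}))  # {'street': '123 Main Street'}
```

The transformation itself is trivial; the design choice that matters is where it runs – at capture time, so the receiving application and the business owner see clean, current data immediately.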
As Henrik has noted: we have to start somewhere.