Looking back at my journey in data quality, you could say that I started out working with the good way of implementing data quality tools, then turned to some better ways and, until now at least, am working with the best way of implementing data quality technology.
That is not to say that the good old kind of tools are obsolete. They are just relieved of some of the repetitive hard work of cleaning up dirty data.
The good (old) kind of tools are data cleansing and data matching tools. These tools are good at finding errors in postal addresses, duplicate party records and other nasty stuff in master data. The bad thing about finding the flaws long after the bad master data has entered the databases is that it is often very hard to make the corrections once transactions have been related to these master data, and that, if you do not fix the root cause, you will have to repeat the exercise periodically. However, there are still reasons to use these tools, as reported in the post Top 5 Reasons for Downstream Cleansing.
The better way is real-time validation and correction at data entry where possible. Here a single data element or a range of data elements is checked when entered. For example, the address may be checked against reference data, the phone number may be checked for an adequate format for the country in question, or product master data may be checked for the right format and against a value list. The hard thing is to do it at all entry points. A possible approach is discussed in the post Service Oriented MDM.
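A minimal sketch of what such entry-point validation could look like. The country phone patterns and the colour value list below are illustrative assumptions for the example, not real reference data or any particular tool's API:

```python
import re

# Hypothetical per-country phone formats; real validation would use a
# maintained reference source, not a hard-coded table like this.
COUNTRY_PHONE_PATTERNS = {
    "DK": re.compile(r"^\+45\d{8}$"),     # Denmark: +45 followed by 8 digits
    "GB": re.compile(r"^\+44\d{9,10}$"),  # UK: +44 followed by 9-10 digits
}

# Example value list for a product attribute.
VALID_PRODUCT_COLOURS = {"red", "green", "blue"}

def validate_phone(number: str, country: str) -> bool:
    """Check a phone number against the expected format for the country."""
    pattern = COUNTRY_PHONE_PATTERNS.get(country)
    digits = re.sub(r"[ \-()]", "", number)  # strip common separators
    return bool(pattern and pattern.match(digits))

def validate_colour(value: str) -> bool:
    """Check a product attribute against an agreed value list."""
    return value.strip().lower() in VALID_PRODUCT_COLOURS
```

Run at the moment of entry, checks like these reject a malformed phone number or an off-list attribute value before it ever reaches the database; the hard part, as noted above, is wiring the same checks into every entry point.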
The best tools emphasize assisting data capture, thus preventing data quality issues while also making the data capture process more effective by connecting rather than collecting. Two such tools I have worked with are:
· iDQ™, a tool for mashing up internal party master data and third-party big reference data sources, as explained further in the post instant Single Customer View.
· Product Data Lake, a cloud service for sharing product data in the business ecosystems of manufacturers, distributors, retailers and end users of product information. This service is described in detail here.
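The "connecting rather than collecting" idea above can be sketched as follows. The registry lookup here is a stand-in for a real big reference data source, and all names and identifiers are hypothetical:

```python
# Hypothetical stand-in for an external business registry; a real service
# would be queried over an API rather than held in a dict.
REFERENCE_REGISTRY = {
    "12345678": {"name": "Example Trading A/S", "city": "Copenhagen"},
}

def capture_party(business_id: str, manual_fallback: dict) -> dict:
    """Prefer an authoritative registry record over manual typing.

    If the identifier is found, the party record is filled from the
    reference source (connecting); only otherwise do we fall back to
    whatever the user typed in (collecting).
    """
    record = REFERENCE_REGISTRY.get(business_id)
    if record:
        return {"business_id": business_id, **record, "source": "registry"}
    return {**manual_fallback, "business_id": business_id, "source": "manual"}
```

The point of the design is that the registry-sourced record is both faster to capture and correct from the start, which is exactly where data entry assistance beats downstream cleansing.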
The other day Joy Medved aka @ParaDataGeek made this tweet:
Indeed, upstream prevention of bad data entering our databases is surely better than downstream data cleaning. Likewise, real-time enrichment is better than enriching long after the data has been put to work.
That said, there are situations where data cleaning has to be done. These reasons were examined in the post Top 5 Reasons for Downstream Cleansing. But I can't think of many situations where a downstream cleaning and/or enrichment operation will be of much worth if it isn't followed up by an approach to getting it first time right in the future.
If we go a level deeper into data quality challenges, different data quality dimensions have different importance to the various data domains, as explored in the post Multi-Domain MDM and Data Quality Dimensions.
With customer master data we most often have issues with uniqueness and location precision. While I have spent many happy years with data cleansing, data enrichment and data matching tools, I have during the last couple of years been focusing on a tool for getting that first time right.
Product master data are often marred by issues with completeness and conformity. The situation here is that tools and platforms for mastering product data focus on what goes on inside a given organization and not so much on what goes on between trading partners. Standardization seems to be the only hope, but that path is too long to wait for and may in some ways contradict the end purpose, as discussed in the post Image Coming Soon.
So, in order to have a first-time-right solution for product master data sharing, I have embarked on a journey with a service called the Product Data Lake. If you want to join, you are most welcome.
PS: The Product Data Lake also has the capability of catching up with the sins of the past.
The previous post on this blog was called Informatica without Data Quality? This post digs into the messaging around the recent takeover of Informatica and the future for the data quality components in the Informatica toolbox.
In the comments, Julien Peltier and Richard Branch discuss the cloud emphasis in the messaging from the new Informatica owners and especially the future of Master Data Management (MDM) in the cloud.
My best experience with MDM in the cloud is with a service called iDQ™ – a service that, by the way, shares its TLA (Three Letter Acronym) with Informatica Data Quality. The former stands for instant Data Quality. This is a service that revolves around turning your MDM inside-out, as most recently touched upon on this blog in the post The Pros and Cons of MDM 3.0.
iDQ™ specifically deals with customer (or rather party) master data: how to get this kind of master data right the first time and how to avoid duplicates, as explored in the post The Good, Better and Best Way of Avoiding Duplicates.
A recent post on this blog was called Three Stages of MDM Maturity. This post ponders the need to extend your Master Data Management (MDM) solution to external business partners and take more advantage of third party data providers. We may call this MDM 3.0.
In a comment on LinkedIn Bernard PERRINEAU says:
Starting with the point most often raised against extending your MDM solution to the outside: Vipul Aroh of Verdantis rightfully mentions, in a comment to the post, a widespread hesitancy around doing so. I think/hope this hesitancy is the same as the hesitancy we saw when Salesforce.com first emerged. Many people didn't foresee a great future for Salesforce.com, because putting your customer base into the cloud was seen as a huge risk. But eventually the operational advantages have in most cases trumped the perceived risks.
Ironically, the existence of CRM systems, in the cloud or not, is a hindrance for MDM solutions becoming the system of entry, or supporting data entry, for the customer master data domain. I remember talking to an MDM vendor CEO about putting such features for customer data entry into an MDM solution; his reply was something like: "Clients don't want that, they want to consolidate downstream". I think it is a pity that "clients want" to automate the mess and that MDM and other vendors want to help them with that.
That said, there are IT system landscape circumstances to be overcome in order to put your MDM solution at the forefront.
But when doing that, and even when starting to do that, the advantages are plentiful. A story about the start of such a journey for customer master data is shared in the post instant Data Quality at Work. This approach is examined further in the post instant Single Customer View. To summarize: you gain both by getting data quality right the first time and by saving time (and time is money) in the data collection stage.
When it comes to product master data, I think everyone working in that field acknowledges the insanity in how the same data are retyped, or messed around in spreadsheets, between manufacturers, distributors, retailers and end users. Some approaches to overcoming this are explored in the post Sharing Product Master Data. Each of these approaches has its pros and cons.
The rise of big data also points in the direction of having your MDM solution exposed to the outside, as touched upon in the post Adding 180 Degrees to MDM.
The concept of MDM-aware applications has been around for some time. What the Master Data Management establishment, including yours truly, is hoping for is that applications like CRM, ERP and other systems will start to utilize the master entities in MDM solutions, instead of keeping their own more or less useful data models within data silos around master data entities such as parties, products, locations and assets, and will also exploit other good structures and services in the MDM realm.
But what about MDM solutions themselves? Are MDM solutions so smug that they don't take in good capabilities from other MDM solutions?
One reason to do so is if an MDM vendor has several MDM solutions to offer. An example I experienced recently was when attending the Informatica MDM day for EMEA in London the other day. Informatica has recently acquired the Product MDM specialist firm Heiler and therefore has two MDM solutions to offer to the market. It was too early for the newest version 10 of the general Informatica MDM solution to embrace the Heiler solution, so what I learned from one of the good now-Informatica folks was that the Heiler solution is becoming MDM-aware – at least aware of the Informatica MDM version 10 solution, I guess.
On another front I'm working with the iDQ™ MDM Edition. Here we do have a default data model for party master entities, but we are not so smug that we can't be aware of other MDM solutions and their capabilities in a given IT landscape. Even in the party domain.
Last week I had some fun writing a blog post called The True Leader in Product MDM. That post was about how product Master Data Management is, in most places, still executed by having heaps of MS Excel spreadsheets flowing around within the enterprise and between business partners, as I have seen it.
When it comes to customer Master Data Management, MS Excel may not be so dominant. Instead we have MS CRM and competing offerings such as Salesforce.com and a lot of other similar Customer Relationship Management solutions.
CRM systems are said to deliver a Single Customer View. Usually they don't. One of the reasons is explained in the post Leads, Accounts, Contacts and Data Quality. The way CRM systems are built, used and integrated is a sure path to creating duplicates.
Some remedies out there include periodic duplicate checks within CRM databases or creating a federated Customer Master Data Hub with entities coming from CRM systems and other databases with customer master data. This is good, but not good enough, as told in the post The Good, Better and Best Way of Avoiding Duplicates.
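A periodic duplicate check of the kind mentioned above can be sketched roughly as follows. This is a deliberately crude illustration using simple string similarity; real matching tools use far richer rules (phonetics, address and identifier comparison, survivorship logic):

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Crude normalization: lowercase, drop dots, collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())

def likely_duplicates(records: list[str], threshold: float = 0.85) -> list[tuple]:
    """Return pairs of records whose normalized names are suspiciously similar."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = SequenceMatcher(
                None, normalize(records[i]), normalize(records[j])
            ).ratio()
            if score >= threshold:
                pairs.append((records[i], records[j], round(score, 2)))
    return pairs
```

Run as a batch job over a CRM database, this finds the duplicates after they have been created, which is exactly why it is good but not good enough: the sustainable fix is checking for a likely match at the moment of entry.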
During the last couple of years I have been working with the instant Data Quality service. This MDM service sits within or beside CRM systems and/or Master Data Hubs in order to achieve the only sustainable way of having a Single Customer View, which is an instant Single Customer View.
There is a famous poster, the New Yorker cover showing the view of the world from 9th Avenue. It perfectly illustrates the centricity we often have about the town, region or country we live in.
The same phenomenon is often seen in data management as told in the post Foreign Affaires.
If we, for example, work with postal addresses, we tend to think that postal addresses in our own country have a well-known structure while foreign addresses are a total mess.
In Denmark, where I was born and raised and have worked most of my life, we have two ways of expressing an address:
- The envelope way, where there is a certain range of possibilities, especially in how to spell a street name and how to write the exact unit within a high-rise building, though there is a structure more or less known to natives.
- The code way, as every street has a code too and there is a defined structure for units (known as the KVHX code). This code is used by the public sector as well as in private sectors such as financial services and utility companies, and it helps tremendously with data quality.
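To illustrate why the code way helps, here is a simplified sketch of such a coded address. The field names, widths and example values are my illustrative assumptions, not the official KVHX specification; the point is only that a fixed-structure code gives every address a stable, comparable key:

```python
from dataclasses import dataclass

@dataclass
class CodedAddress:
    """Illustrative sketch of a Danish code-way address (not the official spec)."""
    municipality: str  # K: municipality code, e.g. "0101" for Copenhagen
    street: str        # V: street code within the municipality
    house_no: str      # H: house number, possibly with a letter, e.g. "12B"
    unit: str          # X: floor/door designation, e.g. "2 tv"

    def key(self) -> str:
        """A single comparable key; two records with the same key are the
        same address, regardless of how the street name was spelled."""
        return f"{self.municipality}-{self.street}-{self.house_no}-{self.unit}"
```

Matching on such a key sidesteps all the envelope-way spelling variation, which is why sectors that use the codes see far fewer address quality problems.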
But around 3.5 percent of Danes, including yours truly, have a foreign address. And until now, the way of registering and storing those addresses in the public sector and elsewhere has been totally random.
This is going to change. The public authorities have, with a little help from yours truly, made the first standard and governance principles for foreign addresses, as seen in this document (in Danish).
At iDQ A/S we have simultaneously developed Master Data Management (MDM) services that help utility companies, financial services and other industries get foreign addresses right as well.