The 2016 Magic Quadrant for Data Quality Tools by Gartner is out. One way to get a free read is to download the report from Informatica, which is the vendor positioned furthest to the top right in the tool vendor positioning.
Apart from the vendor positioning, the report, as always, contains valuable opinions and observations about the market and how these tools are used to achieve business objectives.
Interenterprise data sharing is the last scenario mentioned, after BI and analytics (analytical scenarios), MDM (operational scenarios), information governance programs, ongoing operations and data migrations.
Another observation is that 90% of the reference customers surveyed for this Magic Quadrant consider party data a priority while the percentage of respondents prioritizing the product data domain was 47%.
My take on this difference is that it relates to interenterprise data sharing. Parties are by definition external to you, and if your count of business partners (and B2C customers) exceeds some thousands (that’s the 90%), you need some kind of tool to cope with data quality for the master data involved. If your product data are internal to you, you can manage data quality without profiling, parsing, matching and other core capabilities of a data quality tool. If your product data are part of a cross-company supply chain, and your count of products exceeds some thousands (that’s the 47%), you probably have issues with product data quality.
In my eyes, the capabilities of a data quality tool will also have to be balanced differently for product data as examined in the post Multi-Domain MDM and Data Quality Dimensions.
Every organization needs Master Data Management (MDM). But does every organization need an MDM tool?
In many ways the MDM tools we see on the market resemble common database tools. But there are some things the MDM tools do better than a common database management tool. The post called The Database versus the Hub outlines three such features:
- Controlling hierarchical completeness
- Achieving a Single Business Partner View
- Exploiting Real World Awareness
Controlling hierarchical completeness and achieving a single business partner view are closely related to the two things data quality tools do better than common database systems, as explained in the post Data Quality Tools Revealed. These two features are:
- Data profiling and
- Data matching
Specialized data profiling tools are very good at providing out-of-the-box functionality for statistical summaries and frequency distributions of the unique values and formats found within the fields of your data sources, in order to measure data quality and find critical areas that may harm your business. These capabilities are often better and easier to use than what you find inside an MDM tool. However, in order to measure the improvement in a business context and fix the problems not just as a one-off, you need a solid MDM environment.
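To make the idea of frequency distributions for values and formats a bit more concrete, here is a minimal sketch in Python. It is not how any particular profiling tool works; the function names `format_pattern` and `profile_field` and the sample postal codes are my own assumptions for illustration.

```python
from collections import Counter


def format_pattern(value: str) -> str:
    """Reduce a value to a format mask: letters become 'A', digits become '9'."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )


def profile_field(values):
    """Return basic profiling statistics for one field (hypothetical helper)."""
    filled = [v for v in values if v is not None and v.strip() != ""]
    return {
        "count": len(values),
        "filled": len(filled),
        "completeness": len(filled) / len(values) if values else 0.0,
        "distinct": len(set(filled)),
        "value_frequencies": Counter(filled),
        "format_frequencies": Counter(format_pattern(v) for v in filled),
    }


# Hypothetical sample: postal codes from a customer table.
postal_codes = ["2100", "2100", "DK-2100", "8000", "", None, "80 00"]
profile = profile_field(postal_codes)

for pattern, hits in profile["format_frequencies"].most_common():
    print(pattern, hits)
```

The rare format masks are usually where the interesting data quality issues hide, which is why the format distribution is listed most frequent first.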
When it comes to data matching we also still see specialized solutions that are more effective and easier to use than what is typically delivered inside MDM solutions. Besides that, we also see business scenarios where it is better to do the data matching outside the MDM platform as examined in the post The Place for Data Matching in and around MDM.
Looking at the single MDM domains we also see alternatives. Customer Relationship Management (CRM) systems are a popular choice for managing customer master data. But as explained in the post CRM systems and Customer MDM: CRM systems are said to deliver a Single Customer View, but usually they don’t. The way CRM systems are built, used and integrated is a sure path to creating duplicates. Some remedies for that are touched on in the post The Good, Better and Best Way of Avoiding Duplicates.
With product master data we also have Product Information Management (PIM) solutions. From what I have seen, PIM solutions have one key capability that is essentially different from what a common database solution offers, and from what many MDM solutions built with party master data in mind offer. That is a flexible and super-user-oriented way of building hierarchies and assigning attributes to entities, in this case particularly products. If you offer customer self-service, like in eCommerce, with products that have varying attributes, you need PIM functionality. If you want to do this smartly, you need a collaboration environment for supplier self-service as well, as pondered in the post Chinese Whispers and Data Quality.
All in all the necessary components and combinations for a suitable MDM toolbox are plentiful and can be obtained by one-stop-shopping or by putting some best-of-breed solutions together.
In a blog post called JUDGEMENT DAY FOR DATA QUALITY published yesterday Forrester analyst Michele Goetz writes about the future of data quality tools.
“Data quality tools need to expand and support data management beyond the data warehouse, ETL, and point of capture cleansing.”
“The real test will be how data quality tools can do what they do best regardless of the data management landscape.”
As described in the post Data Quality Tools Revealed there are two things data quality tools do better than other tools:
- Data profiling and
- Data matching
Some of the new challenges I have worked with when designing tomorrow’s data quality tools are:
- Point of capture profiling
- Searching using data matching techniques
- Embracing social networks
Point of capture profiling:
The sweet thing about profiling your data while you are entering it is that analysis and cleansing become part of the on-boarding business process. The emphasis moves from correction to assistance, as explained in the post Avoiding Contact Data Entry Flaws. Exploiting big external reference data sources at the point of capture is a core element in getting it right before judgment day.
Searching using data matching techniques:
Error-tolerant searching is often the forgotten capability when core features of Master Data Management solutions and data quality tools are outlined. Applying error-tolerant search to big reference data sources is, as examined in the post The Big Search Opportunity, a necessity for getting it right before judgment day.
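As a rough illustration of what error-tolerant searching means, here is a small sketch using `difflib` from the Python standard library. Real data matching engines use far more refined techniques; the function name `error_tolerant_search`, the cutoff value and the reference list are assumptions made for this example only.

```python
import difflib


def error_tolerant_search(query, reference_names, limit=3, cutoff=0.6):
    """Return reference names that resemble the query, best match first."""
    # Compare on lowercased text so casing differences do not count as errors.
    lookup = {}
    for name in reference_names:
        lookup.setdefault(name.lower(), name)
    hits = difflib.get_close_matches(
        query.lower(), list(lookup), n=limit, cutoff=cutoff
    )
    return [lookup[hit] for hit in hits]


# Hypothetical reference data source.
reference = ["Informatica", "Forrester Research", "Gartner", "DataCleaner"]

# A query with two transposed letters still finds the intended entry.
results = error_tolerant_search("Gartnre", reference)
print(results)
```

The `cutoff` parameter decides how tolerant the search is. A lower value finds more candidates at the price of more false positives, which is the same balance you have to strike in data matching.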
Embracing social networks:
The growth of social networks in recent years has been almost unbelievable. Traditionally data matching has been about comparing names and addresses. As told in the post Addressing Digital Identity, it will be a must to be able to link the new systems of engagement with the old systems of record in order to get it right before judgment day.
How have you prepared for judgment day?
When working with data quality improvement it is crucial to be able to monitor how your various ways of getting better data quality are actually working. Are things improving? What measures are improving and how fast? Are there things going in the wrong direction?
Recently I had a demonstration by Kasper Sørensen, the founder of the open source data quality tool called DataCleaner. The new version 3.0 of the tool has comprehensive support of monitoring how data quality key performance indicators develop over time.
What you do is take classic data quality assessment features, such as data profiling measurements of completeness and duplicate counting. The results from periodic execution of these features are then attached to a timeline. You can then visually assess what is improving, at what speed, and whether anything is not developing so well.
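The principle of attaching periodic measurements to a timeline can be sketched in a few lines of Python. This is not how DataCleaner implements it. The helper names, the field names and the sample records are all hypothetical, and the duplicate count uses a naive exact comparison of normalized keys rather than real data matching.

```python
from collections import Counter
from datetime import date


def completeness(records, field):
    """Share of records where the field is filled in."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if (r.get(field) or "").strip())
    return filled / len(records)


def duplicate_count(records, key_fields):
    """Number of surplus records sharing the same normalized key."""
    keys = Counter(
        tuple((r.get(f) or "").strip().lower() for f in key_fields)
        for r in records
    )
    return sum(n - 1 for n in keys.values() if n > 1)


def take_snapshot(timeline, run_date, records):
    """Append one measurement of the KPIs to the timeline."""
    timeline.append({
        "date": run_date,
        "email_completeness": completeness(records, "email"),
        "duplicates": duplicate_count(records, ["name", "city"]),
    })


def trend(timeline, kpi):
    """Change in a KPI between the first and the latest snapshot."""
    return timeline[-1][kpi] - timeline[0][kpi]


# Hypothetical data for two periodic runs.
january = [
    {"name": "Ann Lee", "city": "Oslo", "email": ""},
    {"name": "ann lee", "city": "Oslo", "email": "ann@example.com"},
    {"name": "Bo Berg", "city": "Lund", "email": None},
    {"name": "Cy Dahl", "city": "Aarhus", "email": ""},
]
february = [
    {"name": "Ann Lee", "city": "Oslo", "email": "ann@example.com"},
    {"name": "Bo Berg", "city": "Lund", "email": "bo@example.com"},
    {"name": "Cy Dahl", "city": "Aarhus", "email": ""},
    {"name": "Di Holm", "city": "Bergen", "email": "di@example.com"},
]

timeline = []
take_snapshot(timeline, date(2012, 1, 1), january)
take_snapshot(timeline, date(2012, 2, 1), february)

print(trend(timeline, "email_completeness"), trend(timeline, "duplicates"))
```

A positive trend in completeness and a negative trend in the duplicate count both mean things are improving, so each KPI needs its own sense of direction when you plot it.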
Continuously monitoring how data quality key performance indicators are developing is especially interesting in relation to getting data quality right the first time and following up with ongoing data maintenance through enrichment from external sources.
In a traditional downstream data cleansing project you will typically measure completeness and uniqueness two times: once before and once after the execution.
With upstream data quality prevention and automatic ongoing data maintenance you have to make sure everything is running well all the time. Having a timeline of data quality key performance indicators is a great feature for doing just that.
Right now the yearly pinnacle of cycling, Le Tour de France, is going on, and today is probably the hardest stage in the race, with three extraordinary climbs. In cycling races the climbs are categorized on a scale from 4 (the easiest) to 1 (the hardest) depending on the length and steepness. And then there are climbs beyond category, being longer and steeper than usual, like the three climbs today. The description in French for such extreme climbs is “hors catégorie”.
Within master data management categorization is an important activity.
We categorize our customer master data, for example, depending on what kind of party we are dealing with, as in the list called Party Master Data Types that I usually use within customer data integration (CDI). Another way of categorizing is by geography, as the data quality challenges may vary depending on where the party in question resides.
In product information management (PIM) categorization of products is one of the most basic activities. Here too, the categorization is important for establishing the data quality requirements, as they may be very different between various categories, as told in the post Hierarchical Completeness.
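A simple way to picture category-dependent data quality requirements is a lookup of required attributes per category. The sketch below is only an illustration. The categories, the attribute names and the function `missing_attributes` are hypothetical and not taken from any PIM solution.

```python
# Hypothetical categories and required attributes, for illustration only.
REQUIRED_ATTRIBUTES = {
    "power tools": {"voltage", "weight", "length"},
    "clothing": {"size", "color", "material"},
}


def missing_attributes(product):
    """Return the required attributes a product lacks, given its category."""
    required = REQUIRED_ATTRIBUTES.get(product["category"], set())
    present = {
        name
        for name, value in product.get("attributes", {}).items()
        if value not in (None, "")
    }
    return sorted(required - present)


drill = {
    "category": "power tools",
    "attributes": {"voltage": "18 V", "weight": ""},
}
shirt = {
    "category": "clothing",
    "attributes": {"size": "M", "color": "blue", "material": "cotton"},
}

print(missing_attributes(drill))
print(missing_attributes(shirt))
```

The same completeness check gives different answers for different categories, which is the point: a missing voltage matters for a drill and is meaningless for a shirt.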
But there are always some master data records that are beyond categorization when it comes to fulfilling otherwise accepted requirements for data quality, as I experienced in the post Big Trouble with Big Names.
This ninth Data Quality World Tour blog post is about Greece, a favorite travel destination of mine and the place of origin of so many terms and thoughts in today’s civilization.
Super senior citizens
Today Greece has a problem with keeping records of citizens. A recent data profiling activity has exposed that over 9,000 Greeks receiving pensions are over 100 years old. It is assumed that relatives have failed to report the death of these people and are therefore taking care of the continuing stream of euros. News link here.
I found these good pieces of advice for you, when going to Greece today:
Timeliness: When coming to dinner, arriving 30 minutes late is considered punctual.
Accuracy: Under no circumstances should you publicly question someone’s statements.
Uniqueness: Meetings are often interrupted. Several people may speak at the same time.
(We all have some Greek in us I guess).
As part of my work I deal with data from different countries. In the figure below I have put some examples of different presentations of the same data from some of the countries I meet the most, namely Denmark (DK), Germany (DE), France (FR), United States (US) and United Kingdom (GB):
I have some more information on the issues regarding the different attributes: