Quality of Data Behind the Data Quality Magic Quadrant

Last week the Gartner Magic Quadrant for Data Quality Tools was published. You can get a free look through some of the vendors’ sites. For example, SAP has a link here.

I’m not going into who the leaders, visionaries, challengers or niche players are. I’m a bit puzzled about who is in there at all.

We may look at two UK-based vendors:

  • Datactics has a good position among the niche players
  • Experian QAS is not in the quadrant, but is mentioned among the vendors not meeting the inclusion criteria

If you look up Datactics on LinkedIn there are 14 employees there. If you look up Experian QAS UK on LinkedIn there are 369 employees there (and QAS has subsidiaries around the world too). This balance of strength resembles what I know from business directories.

Now, the inclusion criteria set up by Gartner may make a lot of sense, but I find it strange that they so obviously fail to reflect market reality.

Please find more information about how another analyst includes players (compared to Gartner) in the post The Data Quality Tool Vendor Difference.

Hot and Magic Medal Counting

In the ongoing Olympic Games, one often-displayed list is the medal count per nation.

The list reminds me of the occasional analyst reports ranking Data Quality tools and Master Data Management (MDM) solutions. The latest one is hot off the press, as told in the post called Product Information Management is HOT for Business by Ventana Research, where the PIM vendors are ranked with Stibo Systems being the most HOT.

The counting of medals in the Olympic Games in London this afternoon looks like this:

As expected, the top race is between the big teams from the United States and China, just as the mega vendors of tools also always receive good rankings from analysts, though with a few exceptions, as reported in the post The Data Quality Tool Vendor Difference, where the Gartner MAGIC Quadrant is compared with the ranking from Information Difference.

As often seen, the home team, Great Britain and Northern Ireland, is also doing very well. With tools we also see that Most Times the Home Team Wins, regardless of analyst rankings, when a local client selects a tool.

Other big teams such as Russia, Japan and Australia are currently struggling to get more gold medals to climb the list when it is ranked by gold (instead of total number of medals). Perhaps we will see a closer race with more teams in the last week, just as is expected with MDM tools, as reported in the post Photo Finish in MDM Vendor Race.

The smaller nations often do better in a small range of disciplines, like Ethiopia in running and Denmark in rowing and sailing, resembling the situation described in the post Who is not Using Data Quality MAGIC, as there are plenty of Data Quality tools out there that are very suitable for certain tasks and local circumstances.

Photo Finish in MDM Vendor Race

With the London Olympics going on we will probably see a lot of winners after a photo finish.

I noticed another photo finish in a recent analyst report called The MDM Landscape Q2 2012 by the Information Difference.

The MDM (Master Data Management) vendors are scored by technology and market strength. If we look at the technology axis – the vertical one, there is a close race.

Orchestra shared the victory on Twitter:

Kalido was also mentioned on Twitter:

The linked press release from Kalido has a subtitle telling that Kalido was in front of the megavendors.

As mentioned in the report, the vendors are actually not competing in the exact same discipline. Some vendors’ MDM offerings are part of a larger suite, some vendors focus on a single domain (like product) or industry, and some vendors are generalists embracing multi-domain MDM.

This situation is also why another analyst firm, Gartner, has two magic quadrants for MDM vendors: one for customer MDM and one for product MDM.

However, the trend is that more and more vendors are going towards multi-domain MDM. I know that for sure, as I have been involved in one of the product MDM specialists’ journeys into multi-domain MDM.

So we could expect an even closer match in the Multi-Domain MDM race in the years to come.

The Data Quality Tool Vendor Difference

How do analysts look at the data quality tool vendor market? As with everything data quality there are differences and apparently no single source of truth.

Gartner has its magic quadrant. They sell it for money, but usually you are able to get a free copy from the leading vendors.

The Information Difference has its DQ Landscape in the cloud for free.

It is interesting to compare which vendors are included in the latest main pictures, as I have tried below:

The number of x’s is a rough measure of the ability to execute / market strength.

Three smaller vendors are considered by Gartner, but not by The Information Difference, and vice versa. Two midsize vendors are included by The Information Difference, but not by Gartner. Experian QAS is included as a big one by The Information Difference, but did not (yet) meet the inclusion criteria used by Gartner.

Search and if you are lucky you will find

This morning I was following the tweet stream from the ongoing Gartner Master Data Management (MDM) conference here in London, when another tweet caught my eye:

This reminded me that (error tolerant) search is The Overlooked MDM Feature.

Good search functionality is essential for making the most out of your well managed master data.

Search functionality may be implemented in these main scenarios:

Inside Search

You should be able to quickly find what is inside your master data hub.

The business benefits of having fast error tolerant search as a capability inside your master data management solution are plenty, including:

  • Better data quality by upstream prevention against duplicate entries as explained in this post.
  • More efficiency by bringing down the time users spend on searching for information about entities in the master data hub.
  • Higher employee satisfaction by eliminating a lot of the frustration that otherwise comes from not finding what you know must be inside the hub already.

MDM inside search capabilities apply to multiple domains: party, product and location master data.
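
As a rough illustration of how error tolerant search can prevent duplicate entries at data entry time, here is a minimal sketch using Python’s standard difflib module. The list of hub names, the function name tolerant_search and the cutoff value are all hypothetical choices made for this example; a real MDM hub would use its own index and matching logic.

```python
from difflib import get_close_matches

# Hypothetical party names already stored in the master data hub.
HUB_NAMES = ["Experian QAS", "Datactics", "Stibo Systems", "Dun & Bradstreet"]


def tolerant_search(query, names=HUB_NAMES, cutoff=0.75, limit=3):
    """Return hub entries that resemble the query despite small typos.

    The cutoff (0.0 to 1.0) is an assumed threshold, not a recommended value.
    """
    # Compare case-insensitively, but return names in their stored spelling.
    lowered = {name.lower(): name for name in names}
    hits = get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


if __name__ == "__main__":
    # A user mistypes a name that is already in the hub.
    print(tolerant_search("Datactis"))
    print(tolerant_search("Stibo Sytems"))
```

If the search returns a candidate, the user can pick the existing record instead of creating a new one, which is the upstream prevention mentioned in the first bullet.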

Search the outside

You should be able to quickly find what you need to bring inside your master data hub.

Data entry may improve a lot by having fast error tolerant search that explores the cloud for relevant data related to the entry being done. Doing that has two main purposes:

  • Data entry becomes more effective with less cumbersome investigation and fewer keystrokes.
  • Data quality is safeguarded by better real world alignment.

Preferably the inside and the outside search should be the same mash-up.

Searching the outside applies especially to location and party master data.

Search from the outside

Website search applies especially to product master data and in some cases also to related location master data as described in the post Product Placement.

Your website users should be able to quickly find what you publish from your master data hub, be that descriptions of physical products, services or research documents as in the case of Gartner, which is an analyst firm.

As said in the tweet at the top of this post, (good) search makes the life of your prospective and current customers much easier. Do I need to emphasize the importance of good customer experience?

What to do in 2012

The time between Christmas and New Year is a good time to think about whether you are going to do the right things next year. In doing so, you will have to look back at the current year and see how you can develop from there.

In my professional life as a data quality and master data management practitioner my 2011 to do list included these three main activities:

  • Working with Multi-Domain Master Data Quality
  • Exploiting rich external reference data sources in the cloud
  • Doing downstream data cleansing

In a press release from May 2011 Gartner (the analyst firm) Highlights Three Trends That Will Shape the Master Data Management Market. These are:

  • Growing Demand for Multidomain MDM Software
  • Rising Adoption of MDM in the Cloud
  • Increasing Links Between MDM and Social Networks

It looks like I was working in the right space for the first two things but stayed in the past with the third activity, downstream data cleansing.

The third thing to embrace in the future, social MDM we may call it, has been an area of interest for me the last couple of years, and actually some downstream data cleansing projects have touched on making master data useful for including social media networks in the loop.

I’m not sure if 2012 will be a breakthrough for social MDM, but I think there will be some exciting opportunities out there for paving the road for social MDM.

Who Is Not Using Data Quality Magic?

The other day the latest Gartner Magic Quadrant for Data Quality Tools was released.

If you are interested in knowing what it says, it’s normally possible to download a copy from the leading vendors’ website.

Among the information in the paper you will find some estimated numbers of customers who have purchased the tools from the vendors included in the quadrant.

If you sum up these numbers, then it is estimated that 16,540 organizations worldwide are customers of an included vendor.

So, if I matched that compiled customer list with the Dun & Bradstreet WorldBase holding at least 100 million active business entities worldwide, I would have a group of at least 99,983,460 companies who are not using magical data quality tools.

And that figure ignores the fact that some customers have more than one vendor, so the real number of non-users is probably even higher.
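
The arithmetic behind that number can be written out as a small sketch. The two input figures are the ones quoted in this post; the assumption that every customer is a distinct WorldBase entity is mine, made only for illustration.

```python
# Figures quoted in the post; both are estimates.
WORLDBASE_ENTITIES = 100_000_000  # "at least 100 million" active business entities
QUADRANT_CUSTOMERS = 16_540       # summed customer estimates for included vendors

# Assumption: every customer is a distinct entity found in WorldBase.
non_users = WORLDBASE_ENTITIES - QUADRANT_CUSTOMERS
customer_share = QUADRANT_CUSTOMERS / WORLDBASE_ENTITIES

print(f"{non_users:,} entities without a quadrant tool")
print(f"{customer_share:.4%} of entities are quadrant customers")
```

Any overlap between the vendors’ customer lists lowers the count of distinct customers, which only makes the group of non-users larger.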

Anyway, what do all the others do then?

Well, of course the overwhelming number of companies will be too small to have any chance of investing in a data quality tool from a vendor that made it to the quadrant.

The quadrant also lists a range of other vendors of data quality tools, typically operating locally around the world. These vendors also have customers, and probably more customers by count, though not of the size of the companies that choose a vendor in the quadrant.

A lot of data quality technology is also used by service providers who either use a tool from a data quality tool vendor or have made a homegrown solution. So a lot of companies benefit from such services when processing large numbers of data records to be standardized, deduplicated and enriched.

Then we must not forget that technology doesn’t solve all your data quality issues as stated by the founder of DataQualityPro Dylan Jones in a recent post on a data quality forum operated by the (according to Gartner) leading data quality tool vendor. The post is called Finding the Passion for Data Quality.

My take is that it’s totally true that data quality tools don’t solve most of your data quality issues, but the issues they do address, typically data profiling and data matching, are hard to solve without a tool. So there is still a huge market out there currently covered by the true leader in the data quality market: Laissez-Faire.

Big Master Data

Right now I am overseeing the processing of yet another master data file with millions of records. In this case it is product master data that also has customer master data kinds of attributes, as we are working with a big pile of author names and related book titles.

The Big Buzz

Having such high numbers of master data records isn’t new at all and compared to the size of data collections we usually are talking about when using the trendy buzzword BigData, it’s nothing.

Data collections that qualify as big will usually be files with transactions.

However master data collections are increasing in volume and most transactions have keys referencing descriptions of the master entities involved in the transactions.

The growth of master data collections is also seen in collections of external reference data.

For example, the Dun & Bradstreet WorldBase holding business entities from around the world has lately grown quickly from 100 million entities to nearly 200 million entities. Most of the growth has been due to better coverage outside North America and Western Europe, with the BRIC countries coming in fast. A smaller world resulting in bigger data.

Also, one of the BRIC countries, India, is on the way with a huge project for uniquely identifying and holding information about every citizen – that’s over a billion. The project is called Aadhaar.

When we extend such external registries to social networking services as well by doing Social MDM, we are dealing with a very fast growing number of profiles on Facebook, LinkedIn and other services.

Extreme Master Data

Gartner, the analyst firm, has a concept called “extreme data” that rightly points out that this “big data” thing is not only about volume; it is also about velocity and variety.

This is certainly true also for master data management (MDM) challenges.

Master data are exchanged between organizations more and more often in higher and higher volumes. Data quality focus and maturity will probably not be the same across the exchanging parties. The velocity and volume make it hard to rely on people-centric solutions in these situations.

Add to that increasing variety in master data. The variety may be international variety as the world gets smaller and we have collections of master data embracing many languages and cultures. We also add more and more attributes each day as for example governments are releasing more data along with the open data trend and we generally include more and more attributes in order to make better and more informed decisions.

Variety is also an aspect of Multi-Domain MDM, a subject that according to Gartner (the analyst firm once again) is one of the Three Trends That Will Shape the Master Data Management Market.

More Social Master Data Management

Yesterday my American cyberspace friend Jim Harris was kind enough to send an invitation for Google+ – the new social network service you must hook into. Thanks Jim, now I had to fill in yet another profile, upload the same picture as always and start networking from scratch once again 🙂

Like many people, I have several profiles in different social network services such as Twitter, Facebook and LinkedIn. As I’m also doing business with German-speaking countries, I also use XING as an alternative to LinkedIn, as told in the post LinkedIn and the other Thing.

In a comment to that post my Austria based French connection Olivier Mathurin noted: “Disconnected duplicated siloed professional profiles, mmm…”

A post on this blog called Social Master Data Management, made one year ago, discussed how social CRM will add new sources from social networks to the external reference data sources we already know from old time CRM.

With all the different faces everyone is wearing in the social media realm this isn’t going to be easy, and one may consider whether social master data management is a wrong path given the individual nature and built-in privacy of social networking services.

Well, Gartner (the analyst firm) says that increasing links between MDM and social networks is one of the Three Trends That Will Shape the Master Data Management Market.

So, acknowledging that Gartner predictions are self-fulfilling, you had better get moving into LinkedIn, Xing, Viadeo, Twitter, Facebook, (forget MySpace), Google+ and whatever comes next.

What is Identity Resolution?

We are continuously struggling to define what it is we are doing, asking questions like: What is data quality? What is Master Data? Lately I’ve been involved in discussions around: What is Identity Resolution? A current discussion on this topic is rolling in the Data Matching LinkedIn group.

This discussion has roots in one of my blog posts called Entity Revolution vs Entity Evolution. Jeffrey Huth of IBM Initiate followed up with the post Entity Resolution & MDM: Interchangeable? In January Phillip Howard of Bloor made a post called There’s identity resolution and then there’s identity resolution (followed up by a correction post the other day called My bad).

It is a “same same but different” discussion. Traditional data matching (or record linkage) as seen in a data quality tool and master data management solution is the bright view: Being about finding duplicates and making a “single business partner view” (or “single party view” or “single customer view”). Identity resolution is the dark view: Preventing fraud and catching criminals, terrorists and other villains.

The Gartner Hype Cycle describes the dark view as “Entity Resolution and Analysis”. This discipline is approaching the expectation peak and will, according to Gartner, be absorbed by other disciplines, as no one can tell the difference, I guess.

Certainly there are poles. In an article from 2006 called Identity Resolution and Data Integration David Loshin said: There is a big difference between trying to determine if the same person is being mailed two catalogs instead of one and determining if the individual boarding the plane is on the terrorist list.

But there is also a grey zone.

From a business perspective, for example, the prevention of misuse of a restricted campaign offer is a bit of both sides. Here you want to prevent an existing customer from using an offer only meant for new customers. How does that apply to members of the same household or the same company family tree? Or you want to prevent someone from using an introduction offer twice by typing her name and address a bit differently.
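
To make the last case concrete, here is a minimal sketch of how a name and address typed a bit differently could still be flagged, using Python’s standard difflib module. The sample names, the normalization rules and the 0.85 threshold are hypothetical choices for this example and not how any particular data quality tool works.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def probably_same_person(a, b, threshold=0.85):
    """Flag two name-and-address strings as a likely match (assumed threshold)."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    # Hypothetical sign-ups for an introduction offer.
    first = "Ann Smith, 12 Main Street"
    second = "Anne Smith 12 Main Str."
    print(round(similarity(first, second), 2), probably_same_person(first, second))
```

A plain string ratio like this only covers the typing variations. The household and company family tree questions need relationship data on top of the matching.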

From a technical perspective I have an example from working with a newspaper in a big fraud scam described in the post Big Time ROI in Identity Resolution. Here I had no trouble using a traditional deduplication tool in discovering non-obvious relationships. Also, the relationships discovered in traditional data matching end up quite nicely in hierarchy management as part of master data management, as described in the post Fuzzy Hierarchy Management.

And then there is the use of the words identity (resolution) versus entity (resolution).

My feeling is that we could use identity resolution for describing all kinds of matching and linking with party master data, and entity resolution could be used for describing all kinds of matching and linking with all master data entity types as seen in multi-domain master data management. But that’s just my words.
