The Matrix

The data governance discipline, the Master Data Management (MDM) discipline and the data quality discipline are closely related and happen to be my fields of work, as told in the post Data Governance, Data Quality and MDM.

Every IT-enabled discipline has an element of understanding people, orchestrating business processes and using technology. The mix may vary between disciplines. This is also true for the three above-mentioned disciplines.

But how important are people, process and technology within these three disciplines? Are the disciplines very different in that perspective? I think so.

When assigning a value from 1 (less important) to 5 (very important) for Data Governance (DG), Master Data Management (MDM) and Data Quality (DQ) I came to this result:

[The Matrix: people, process and technology scores for DG, MDM and DQ]

A few words about the reasoning for the highs and lows:

Data governance is, in my experience, a lot about understanding people and less about using technology, as told in the post Data Governance Tools: The New Snake Oil?

I often see arguments that data quality is all about people too. But:

  • I think you are really talking about data governance when putting the people argument forward in the quest for achieving adequate data quality.
  • I see little room for having the personal opinions of different people dictate what adequate data quality is. This should really be as objective as possible.

Now I am ready for your relentless criticism.


The Multi-Domain Data Quality Tool Magic Quadrant 2014 is out

Gartner, the analyst firm, has a different view of the data quality tool market than of the Master Data Management (MDM) market. The MDM market has two quadrants (customer MDM and product MDM), as reported in the post The Second Part of the Multi-Domain MDM Magic Quadrant is out. There is only one quadrant for data quality tools.

Well, actually it is difficult to see a quadrant for product data quality tools. Most data quality tools revolve around the customer (or rather party) domain, with data matching and postal address verification as main features.

For the party domain it makes sense to have these capabilities deployed outside the MDM solution in some cases, as examined in the post The Place for Data Matching in and around MDM. And of course data quality tools are used in heaps of organizations that don't have an MDM solution.

For the product domain it is hard to see a separate data quality tool if you have a Product Information Management (PIM) / Product MDM solution. Well, maybe if you are an Informatica fan. Here you may end up with a same-branded PIM (Heiler), Product MDM (Siperian) and data quality tool (SSA Name3) environment as a consequence of the matters discussed in the post PIM, Product MDM and Multi-Domain MDM.

What should a data quality tool do in the product domain then? Address verification would be exotic (and ultimately belongs to the location domain). Data matching is a use case, but not usually something that eliminates the main pain points with product data.

Some issues that have been touched on this blog are:

Anyway, the first vendor tweets about the data quality tools quadrant 2014 are turning up, and I guess some of the vendors will share the report for free soon.

Magic Quadrant for Data Quality Tools 2014

Update 3rd December: I received 3 emails from Trillium Software today with a link to the report here.


Cleansing International Addresses

A problem in data cleansing I have come across several times is when you have some name and address registrations where it is uncertain to which country the different addresses belong.

Many address-cleansing tools and services require a country code as the first parameter in order to utilize external reference data for address cleansing and verification. Most business cases for address cleansing are indeed about a large number of business-to-consumer (B2C) addresses within a particular country. But sometimes you have a batch of typical business-to-business (B2B) addresses with no clear country registration.

The problem is that many location names apply to many different places. That is true even within a given country – which was the main driver for having postal codes around. If a non-interactive tool or service has to look for a location all over the world, it gets really difficult.

For example, I'm in Richmond today. That could actually be a lot of places all over the world, as seen on Wikipedia.

I am actually in the Richmond in the London, England, UK area. If I were in the capital of the US state of Virginia, I could have written that I'm in "Richmond, VA". If an international address-cleansing tool looked at that address, I guess it would first look for a country code, quickly find VA as a two-character country code at the end of the string and firmly conclude that I'm at something called Richmond in the Vatican City State.
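To illustrate the pitfall, here is a minimal sketch in Python – purely hypothetical, not taken from any real address-cleansing tool – of a naive country-guessing step that scans the end of the address string for an ISO 3166-1 alpha-2 code:

```python
# Hypothetical sketch of a naive country-guessing step in address cleansing.
# It treats the last token of the address as a possible ISO 3166-1 alpha-2 code.

ISO_ALPHA2 = {
    "GB": "United Kingdom",
    "US": "United States",
    "VA": "Holy See (Vatican City State)",
    # ... the remaining ~245 codes would go here in a real lookup table
}

def naive_guess_country(address: str) -> str | None:
    """Return a country name if the last token looks like an ISO alpha-2 code."""
    last_token = address.replace(",", " ").split()[-1].upper()
    return ISO_ALPHA2.get(last_token)

print(naive_guess_country("1 High Street, Richmond, GB"))
# -> 'United Kingdom'
print(naive_guess_country("Richmond, VA"))
# -> 'Holy See (Vatican City State)' - almost certainly not what the writer meant,
#    who was thinking of Richmond, Virginia, US.
```

A smarter process would look at more evidence (postal code patterns, state and province abbreviations, reference data for place names) before committing to a country, but the sketch shows why a blind country-code scan goes wrong.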

Have you tried using or constructing an international address cleansing process? Where did you end up?


Completeness is still bad, while uniqueness is improving

In a recent report called The State of Marketing Data prepared by Netprospex, over 60 million B2B records were analyzed in order to assess the quality of the data, measured as fitness for use related to marketing purposes.

An interesting finding was that out of a maximum score of 5.0, duplication, the dark side of uniqueness, was given an average score of 4.2 while completeness was given an average score of 2.7.
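As a rough illustration of how such dimension scores can be computed (my own simplified take, not the methodology behind the report), completeness can be measured as the share of populated fields, and duplication as the share of records sharing a match key with another record:

```python
# Simplified sketch (my own illustration, not the methodology behind the report)
# of scoring two data quality dimensions on a tiny record set.

from collections import Counter

records = [
    {"name": "Acme Corp", "email": "info@acme.example", "phone": None},
    {"name": "Acme Corp", "email": "info@acme.example", "phone": "555-0100"},
    {"name": "Beta Ltd",  "email": None,                "phone": None},
]
fields = ["name", "email", "phone"]

# Completeness: share of populated cells across the fields that should be filled.
filled = sum(1 for r in records for f in fields if r[f])
completeness = filled / (len(records) * len(fields))

# Duplication: share of records whose (very simplistic) match key occurs more than once.
keys = [r["name"].lower() for r in records]
counts = Counter(keys)
duplication = sum(1 for k in keys if counts[k] > 1) / len(records)

print(f"Completeness: {completeness:.0%}, duplication: {duplication:.0%}")
# -> Completeness: 67%, duplication: 67%
```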

[Report cover: The State of Marketing Data]

This corresponds well with my experience. In the data quality realm we have worked very hard over the years with deduplication tools using data matching approaches, and the results are showing up. We are certainly not there yet, but it seems that completeness, and in my experience also accuracy, are the data quality dimensions currently suffering more.

In my eyes the remedy for improving completeness and accuracy goes hand in hand with even better uniqueness. It is about getting the basic data right the first time, as described in the post instant Single Customer View, and being able to keep up completeness and accuracy, as told in the post External Events, MDM and Data Stewardship.


Data governance tools: The new snake oil?

Traditionally data governance has been about the people and process side of data management. However, we now see tools marketed as data governance tools, either as pure-play tools for data governance or as part of wider data management suites, as told in the post Who needs a data governance tool?

The post refers to a report by Sunil Soares. In this report data governance tools are seen as tools related to six areas within enterprise data management: data discovery, data quality, business glossary, metadata, information policy management and reference data management.

While IBM has tools for everything, according to the report it does not seem like a single tool cures it all – yet.

But will we go there? If we need tools at all, do we need an all-cure snake oil tool for data governance? Or will we be better off with different lubricants for data discovery, data quality, business glossary, metadata, information policy management and reference data management?


Unique Data = Big Money

In a recent tweet Ted Friedman of Gartner (the analyst firm) said:

[Tweet by Ted Friedman on reference data]

I think he is right.

Duplicates have always been pain number one in most places when it comes to the cost of poor data quality.

Though I have been in the data matching business for many years and have fought duplicates with deduplication tools in numerous battles, the war doesn't seem to be won by using deduplication tools alone, as told in the post Somehow Deduplication Won't Stick.

Eventually deduplication always comes down to entity resolution, where you have to decide which results are true positives, which results are useless false positives, and wonder how many false negatives you didn't catch – which means how much money you didn't get in return on your deduplication investment.
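To make that concrete, here is a small sketch with made-up numbers (illustration only) of how the outcome of a deduplication run translates into precision, recall and money:

```python
# Sketch with illustrative, made-up numbers: turning deduplication results
# into the figures that decide whether the investment paid off.

candidate_pairs = 10_000            # pairs flagged as possible duplicates by the tool
true_positives = 7_500              # confirmed duplicates after entity resolution
false_positives = 2_500             # flagged pairs that turned out to be distinct parties
estimated_false_negatives = 1_200   # duplicates never flagged (estimated via sampling)

precision = true_positives / candidate_pairs
recall = true_positives / (true_positives + estimated_false_negatives)

saving_per_resolved_duplicate = 5.0  # e.g. avoided mailing and handling cost, illustrative
value_realized = true_positives * saving_per_resolved_duplicate
value_missed = estimated_false_negatives * saving_per_resolved_duplicate

print(f"Precision {precision:.0%}, recall {recall:.0%}")
print(f"Value realized: {value_realized:,.0f}, value left on the table: {value_missed:,.0f}")
```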

Bringing in new, even obscure, reference sources is in my eyes a very good idea, as examined in the post The Good, Better and Best Way of Avoiding Duplicates.


Tsundoku

There is a Japanese word called tsundoku. There is no equivalent English word, but in six words it means "buying books and not reading them".

I guess tsundoku could have an eTsundoku variant describing buying software tools and not using them, and that could also include data quality tools, as told in the post The Worst Best Sale.

My own example isn't the only one, I'm sure. What may be the reasons for buying data quality tools but not using them? A few suggestions:

  • Organizational changes after ordering (as in my example)
  • Focus has changed before receiving the delivery
  • The tool was never meant to be used, as the purchase was merely a way of showing interest in data quality
  • The data quality tool came free (or hidden) as a part of a larger software suite
  • Data quality tools don't solve anything anyway (not my favorite though, as told in the post The Role of Technology in Data Quality Management)

More suggestions?


The Role of Technology in Data Quality Management

A recent article called Data's Credibility Problem by Thomas Redman in Harvard Business Review has rightfully got a lot of mentions in the data quality community on social media, including Twitter.

I agree with many things in the article, except I have to question the credibility of this saying:

"The solution is not better technology: It's better communication between the creators of data and the data users."

There is a lot of truth in this saying. But it is in my eyes not valid.

If the human race had relied solely on communication, we would still be discussing whether a wheel should have the shape of a square or a circle. There is a balance between fruitful communication and throwing technology at problems, and you may emphasize one side or the other depending on whether you sell data quality consultancy or data quality tools.

I would say:

“The solution is better communication between the creators of data, the data users and the innovators of data quality technology”.

Now, how do I best spread this message….

[The message as a paper tweet]


Getting eMail Addresses Right the First Time

Checking if an eMail address will bounce is essential for executing and measuring campaigns, newsletter operations and other activities based on sending eMails, as explained on the site Don't Bounce by BriteVerify.

A good principle within data quality prevention and Master Data Management (MDM) is the first-time-right approach. There is a 1-10-100 rule saying:

"One dollar spent on prevention will save 10 dollars on correction and 100 dollars on failure costs".

(Replace dollars with your favorite currency: Euros, pounds, rubles, rupees, whatever.)

This also applies to capturing the eMail address of a (prospect) customer and other business partners. Many business processes today require communication through eMails in order to save costs and speed up processes. If you register an invalid eMail address, or allow self-registration of an invalid eMail address, you have got yourself some costly scrap and rework, or maybe lost an opportunity.
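A first line of defence at data entry can be as simple as a syntax check plus a DNS lookup of the domain's mail exchanger. The sketch below is my own illustration – not the BriteVerify service or the instant Data Quality implementation – and it assumes the third-party dnspython package:

```python
# Minimal first-time-right check for an eMail address at data entry:
# 1) syntactic sanity check, 2) does the domain publish an MX record?
# Illustration only - assumes the third-party 'dnspython' package
# (pip install dnspython); a real bounce check would go much further.

import re

import dns.exception
import dns.resolver

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def email_looks_deliverable(address: str) -> bool:
    """Cheap pre-entry check: plausible syntax and a domain that accepts mail."""
    if not EMAIL_RE.match(address):
        return False
    domain = address.rsplit("@", 1)[1]
    try:
        dns.resolver.resolve(domain, "MX")  # raises if the domain has no MX record
        return True
    except dns.exception.DNSException:
        return False

print(email_looks_deliverable("someone@example"))      # -> False (bad syntax)
print(email_looks_deliverable("someone@gmail.com"))     # -> True if DNS is reachable
```

A real bounce check goes further (mailbox verification, disposable-address detection), but even this cheap check stops a lot of fat-fingered registrations at the source.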

As a natural consequence, the instant Data Quality MDM Edition, besides ensuring right names and correct postal addresses, also checks for valid eMail addresses.


What Should a Data Quality Tool Do?

Earlier this month we had this year's magic quadrant for data quality tools from Gartner (the analyst firm). The magic quadrant always stirs up posts about data quality tools, and this is true again this year. For example, yours truly had a post here, and Lorraine Lawson had a say on ITBusinessEdge in the post Eight Questions to Ask Before Investing in Data Quality Tools.

Some of the questions asked by Lorraine relate to a grounding principle in the magic quadrant, namely that a data quality tool should be able to do everything data quality, and even, as stated in Lorraine's question 2: Can it be embedded into business process workflows or other technology-enabled programs or initiatives, such as MDM and analytics?

Thinking that question through to the end inevitably makes you think about where data quality tools end and where applications for different business processes, with data quality built in, take over.

That question is close to me as I’m right now working with a tool for maintaining party master data with two main advantages:

  • Making the business process as smooth as possible
  • Ensuring data quality at (and before) data entry and all through the data lifetime (a rough sketch of the idea follows below)
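Here is that rough sketch – my own illustration, not the actual tool – where a new party record is only accepted after the individual pieces have been checked and a duplicate search has come up empty. The helper checks are deliberately simplistic placeholders for real address verification, eMail verification and data matching services:

```python
# Rough sketch (my own illustration, not the actual tool) of ensuring party
# master data quality before a new record is accepted.

from dataclasses import dataclass

@dataclass
class PartyCandidate:
    name: str
    postal_address: str
    email: str

def looks_like_valid_address(address: str) -> bool:
    # Placeholder: a real check would verify against external reference data.
    return len(address.strip()) > 10

def looks_like_valid_email(email: str) -> bool:
    # Placeholder: a real check would include the MX lookup sketched earlier.
    return "@" in email and "." in email.rsplit("@", 1)[-1]

def is_probably_same_party(a: PartyCandidate, b: PartyCandidate) -> bool:
    # Placeholder: a real check would use proper data matching.
    return a.name.strip().lower() == b.name.strip().lower()

def register_party(candidate: PartyCandidate, existing: list[PartyCandidate]) -> str:
    """Return 'accepted', 'duplicate' or 'rejected' for a new party registration."""
    if not candidate.name.strip():
        return "rejected"
    if not looks_like_valid_address(candidate.postal_address):
        return "rejected"
    if not looks_like_valid_email(candidate.email):
        return "rejected"
    if any(is_probably_same_party(candidate, p) for p in existing):
        return "duplicate"
    existing.append(candidate)
    return "accepted"

parties: list[PartyCandidate] = []
print(register_party(PartyCandidate("Acme Corp", "1 High Street, Richmond TW9 1AA, GB",
                                    "info@acme.example"), parties))   # -> accepted
print(register_party(PartyCandidate("ACME CORP", "1 High St, Richmond TW9 1AA, GB",
                                    "sales@acme.example"), parties))  # -> duplicate
```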

So, it’s not a true data quality tool. It doesn’t do everything data quality. It’s not a true MDM platform. It doesn’t do everything master data. But I would say that it does do what it does better than the full monty behemoths.
