March 2011 – Page 2 – Liliendahl on Data Quality

Foreign Affairs

11th March 2011 (updated 11th August 2011) | Henrik Gabs Liliendahl | 2 Comments

There is a famous poster called The New Yorker. This poster perfectly illustrates the centricity we often have about the town, region or country we live in.

The same phenomenon is often seen in data management.

I mentioned United States centricity as a minor criticism in my recent book review about the excellent book “Master Data Management and Data Governance”.

An example from the book is this statement:

“It is important to differentiate between U.S. domestic addresses and international addresses. This distinction is important for U.S.-centric MDM solutions because U.S. domestic addresses are normally better defined and therefore can be processed in a more automatic fashion, while international addresses require more manual intervention.”

The same fact could be expressed by saying:

“It is important to differentiate between Danish domestic addresses and international addresses. This distinction is important for Danish-centric MDM solutions because Danish domestic addresses are normally better defined and therefore can be processed in a more automatic fashion, while international addresses require more manual intervention.”

Only, the better formatted address in the first case is the messy address in the last case, and the better formatted address in the last case is the messy address in the first case.

If your MDM scope is country-centric it is sensible to concentrate on automation related to that country.

If your MDM scope is international there are two options:

The easy way: The one-size-fits-all option. This requires only a moderate investment, but it also yields only moderate results in terms of automation and data quality.
The hard way: You have to implement specialized automation and investigate the best external reference data for each country. I made a Danish-centric post on that last year here.
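The difference between the two options can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the two patterns, the `PARSERS` table and the function name `parse_address_line` are made up for this example, and real address formats vary far more than this.

```python
import re

# Hypothetical, heavily simplified country-specific patterns.
PARSERS = {
    # US style: house number first, e.g. "1 Main Street"
    "US": re.compile(r"^(?P<number>\d+\w*)\s+(?P<street>.+)$"),
    # Danish style: street name first, e.g. "Hovedgaden 1"
    "DK": re.compile(r"^(?P<street>.+?)\s+(?P<number>\d+\w*)$"),
}


def parse_address_line(line, country):
    """Return (fields, needs_manual_review) for one street address line."""
    line = line.strip()
    pattern = PARSERS.get(country)
    if pattern is not None:
        match = pattern.match(line)
        if match:
            # Specialized automation succeeded for this country.
            return match.groupdict(), False
    # One-size-fits-all fallback: keep the raw line and flag it for review.
    return {"line": line}, True
```

With the US parser, "1 Main Street" is parsed automatically while "Hovedgaden 1" falls back to manual review. With the Danish parser it is the other way around.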

Book Review: Berson and Dubov on MDM

10th March 2011 (updated 11th March 2011) | Henrik Gabs Liliendahl | 3 Comments

A few days ago Julian Schwarzenbach over at the Data and Process Advantage Blog published a review of the book “Master Data Management and Data Governance” by Alex Berson and Larry Dubov. Link to Julian’s review here.

And hey, that’s the book I have been reading too during the last months. So why not write my own review as well?

I agree very much with Julian’s positive review of the book. It is a very comprehensive book, and a thick and heavy one, as I have learned from bringing it with me when travelling, which is where I usually read offline stuff. But master data management and the related data governance is a big and heavy discipline with a lot of details that have to be dealt with.

I have probably annoyed fellow travellers on trains and airplanes while reading the book with exclamations such as: “Yes, precisely”, “That’s what I have always said”, “Good point” and so on. Because I agree very much with many of the issues described and the solutions discussed in the book.

For the mandatory bit of criticism that must be included in every book review I will bring on my pet bashing about United States and English language centricity. Well, it’s actually not that bad, as the book in many places does indicate that other angles and pains exist than those prominent in the United States and with the English language.

Oh, and I bear with the fact that my surname in the references is spelled “Sorensen” instead of “Sørensen” and that a related date is formatted like “11/22/2009”, which to me would be the 11th day in the 22nd month of the year 2009.
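The date issue is easy to reproduce. Here is a small sketch using Python's standard `datetime.strptime`; the helper name `parse_date` is just for this illustration.

```python
from datetime import datetime


def parse_date(text, fmt):
    """Parse text with the given format; return None if it is not a valid date."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


# Month-first reading (common in the United States): 22 November 2009.
us_reading = parse_date("11/22/2009", "%m/%d/%Y")

# Day-first reading (common in Denmark and much of Europe): there is no 22nd month.
day_first_reading = parse_date("11/22/2009", "%d/%m/%Y")

# A date like 03/04/2011 is worse: both readings are valid but differ.
march_4 = parse_date("03/04/2011", "%m/%d/%Y")
april_3 = parse_date("03/04/2011", "%d/%m/%Y")
```

The first case at least fails loudly under the day-first reading. The second case silently produces two different dates, which is the real data quality risk.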

No Privacy Customer Onboarding

9th March 2011 (updated 7th September 2011) | Henrik Gabs Liliendahl | 12 Comments

This post is a follow-up on today’s #DataKnightsJam happening on Twitter. Today’s subject was data quality and data privacy.

Diversity in data quality is a subject discussed a lot of times on this blog.

So I want to share a real-life example of a good upstream, get-it-right-first-time data sharing approach that might compromise privacy thresholds in other places.

The image to the right is the data entry form from a Swedish webshop used for customer self-registration. The main flow is that:

You type your national ID (personnummer in Swedish)
You press the following button
The system fetches your name and address data from the public citizen hub
The webshop gets an accurate, complete single customer view
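The flow above could look roughly like this in code. This is a sketch only: `FAKE_HUB` is an in-memory stand-in with invented data, not the real Swedish citizen hub or its interface, and the names `Citizen` and `onboard_customer` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citizen:
    national_id: str
    name: str
    address: str


# Stand-in for the public citizen hub, keyed by national ID (invented data).
FAKE_HUB = {
    "000000-0000": Citizen("000000-0000", "Test Person", "Exempelgatan 1, Anytown"),
}


def onboard_customer(raw_id: str, hub=FAKE_HUB) -> Optional[dict]:
    """Look up name and address by national ID instead of asking the user to type them."""
    national_id = raw_id.strip()
    citizen = hub.get(national_id)
    if citizen is None:
        return None  # unknown ID: fall back to manual data entry
    # The national ID doubles as the key for the single customer view.
    return {
        "customer_key": citizen.national_id,
        "name": citizen.name,
        "address": citizen.address,
    }
```

The customer only types the ID, so name and address arrive in the reference format from the start.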

The webshop www.jula.se sells tools for home improvement.

What is Identity Resolution?

8th March 2011 (updated 29th May 2012) | Henrik Gabs Liliendahl | 6 Comments

We are continuously struggling with defining what it is we are doing, asking questions like: What is data quality? What is Master Data? Lately I’ve been involved in discussions around: What is Identity Resolution? A current discussion on this topic is rolling in the Data Matching LinkedIn group.

This discussion has roots in one of my blog posts called Entity Revolution vs Entity Evolution. Jeffrey Huth of IBM Initiate followed up with the post Entity Resolution & MDM: Interchangeable? In January Phillip Howard of Bloor made a post called There’s identity resolution and then there’s identity resolution (followed up by a correction post the other day called My bad).

It is a “same same but different” discussion. Traditional data matching (or record linkage) as seen in a data quality tool and master data management solution is the bright view: Being about finding duplicates and making a “single business partner view” (or “single party view” or “single customer view”). Identity resolution is the dark view: Preventing fraud and catching criminals, terrorists and other villains.

The Gartner Hype Cycle describes the dark view as “Entity Resolution and Analysis”. This discipline is approaching the expectation peak and will, according to Gartner, be absorbed by other disciplines, as no one can tell the difference, I guess.

Certainly there are poles. In an article from 2006 called Identity Resolution and Data Integration David Loshin said: There is a big difference between trying to determine if the same person is being mailed two catalogs instead of one and determining if the individual boarding the plane is on the terrorist list.

But there is also a grey zone.

From a business perspective, for example, preventing misuse of a restricted campaign offer is a bit of both sides. Here you want to prevent an existing customer from using an offer meant only for new customers. How does that apply to members of the same household or the same company family tree? Or you want to prevent someone from using an introduction offer twice by typing her name and address a bit differently.
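The last case can be sketched with a plain string similarity check from Python's standard library. This is a minimal illustration, not how any particular data matching tool works: the normalisation rules, the sample names and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, drop punctuation and collapse whitespace before comparing."""
    cleaned = text.lower().replace(".", "").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity between 0.0 and 1.0 based on matching character blocks."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def probably_same_person(a, b, threshold=0.85):
    # The threshold is an arbitrary illustration value, not a recommendation.
    return similarity(a, b) >= threshold


first_signup = "Mary Smith, 1 Main Street, Anytown"
second_signup = "Marie Smith, 1 Main Str., Anytown"
someone_else = "John Doe, 5 Oak Avenue, Othertown"
```

Here `probably_same_person(first_signup, second_signup)` flags the second use of the offer, while `someone_else` passes. Whether two members of the same household should count as the same customer is a business decision that a similarity score alone cannot settle.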

From a technical perspective I have an example from working with a newspaper in a big fraud scam described in the post Big Time ROI in Identity Resolution. Here I had no trouble using a traditional deduplication tool to discover non-obvious relationships. Also, the relationships discovered in traditional data matching end up quite nicely in hierarchy management as part of master data management, as described in the post Fuzzy Hierarchy Management.

And then there is the use of the words identity (resolution) versus entity (resolution).

My feeling is that we could use identity resolution for describing all kinds of matching and linking with party master data, and entity resolution for describing all kinds of matching and linking with all master data entity types as seen in multi-domain master data management. But that’s just my words.

Multi-Commerce Data Quality

5th March 2011 (updated 7th March 2011) | Henrik Gabs Liliendahl | 2 Comments

A month ago I wrote about Multi-Channel Data Quality. Multi-Commerce and the related data quality is pretty much another term covering the same challenges: even though we talk a lot today about eCommerce, meaning doing business online, we still have a lot of business going on offline. So we have challenges with online data quality, offline data quality and, not least, a single view of online/offline data quality.

According to the Gartner Hype Cycle there is such a thing as Multicommerce Master Data Management. This discipline has just passed the expectation peak but will, according to Gartner, be absorbed by Multidomain Master Data Management on the descent before climbing up again towards enlightenment and productivity.

As data quality and master data management are best friends I find it very likely that Multi-Commerce Data Quality will be all about Multi-Domain Master Data Management, including:

Having a single business partner view (that includes single customer view) encompassing all online and offline activities
Having a unified way of maintaining and exposing product data online and offline
Having the means for doing content management (that includes unstructured data) embracing online presentation as well as offline distribution.

I also see Multi-Domain Master Data Management as not only doing master data management for several data domains at the same time (with the same software brand), but also exploring the intersections between the different domains.

If you for example look at a customer/product matrix you may add a third dimension being a channel where we examine the relations between a customer type, a product type/attribute and a given channel, thus having a 3D picture of doing business in a multi-commerce environment.
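As a rough sketch, such a 3D picture can be held as counts keyed by customer type, product type and channel. The sample transactions and the helper name `channel_slice` are invented for illustration.

```python
from collections import Counter

# Invented sample transactions: (customer type, product type, channel).
transactions = [
    ("consumer", "hand tools", "online"),
    ("consumer", "hand tools", "store"),
    ("consumer", "hand tools", "online"),
    ("business", "power tools", "store"),
    ("business", "power tools", "online"),
]

# The "cube": one count per (customer type, product type, channel) cell.
cube = Counter(transactions)


def channel_slice(cube, channel):
    """Return the classic customer/product matrix for one channel."""
    return {
        (customer, product): count
        for (customer, product, ch), count in cube.items()
        if ch == channel
    }


online_matrix = channel_slice(cube, "online")
store_matrix = channel_slice(cube, "store")
```

Each slice is an ordinary customer/product matrix. Comparing the slices shows where a relation between customer type and product type depends on the channel.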

If you are interested in Multi-Domain Master Data Management including how Multi-Commerce Master Data Management and related data quality are developing right now, then please join the LinkedIn group for Multi-Domain MDM by clicking on the puzzle.

Fuzzy Hierarchy Management

2nd March 2011 | Henrik Gabs Liliendahl | 6 Comments

When evaluating results from automated data matching, your goal is typically to find false positives and false negatives: entities that are matched but shouldn’t be (false positives) and entities that are not matched but should have been (false negatives).
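Given a manually verified set of pairs, the two error types fall out of simple set differences. A minimal sketch, assuming each match is a pair of row identifiers; the function name `evaluate_matches` and the sample IDs are made up.

```python
def evaluate_matches(predicted_pairs, true_pairs):
    """Compare automated matches with verified matches, ignoring pair order."""
    predicted = {frozenset(pair) for pair in predicted_pairs}
    truth = {frozenset(pair) for pair in true_pairs}
    return {
        "true_positives": predicted & truth,
        "false_positives": predicted - truth,   # matched, but shouldn't be
        "false_negatives": truth - predicted,   # not matched, but should have been
    }


# Invented row IDs: the tool matched (1, 2) and (3, 4); reviewers say (2, 1) and (5, 6).
result = evaluate_matches([(1, 2), (3, 4)], [(2, 1), (5, 6)])
```

Here (3, 4) is a false positive and (5, 6) is a false negative, while (1, 2) counts as correct regardless of the order in which the rows are listed.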

However, the fuzziness often used in the data matching process also applies to the evaluation of the results, as many dubious results aren’t a question of whether the matched database rows reflect the same real world entity, but more a question of whether the matched (or not matched) database rows reflect different members of a real world hierarchy.

Example 1:

John Smith on 1 Main Street in Anytown

Mary & John Smith on 1 Main Str in Anytown

Example 2:

Anytown Municipality, Technical Dept

Municipality of Anytown

Example 3:

Acme Corporation, Anytown

Acme Corporation, Anywhere

All three examples above may be considered a false positive if matched and a false negative if not matched.

You may say that it depends on the purpose of use, which is true.

But if we are talking master data management, we will probably have to encompass multiple requirements where we simultaneously need the match and don’t want the match, which is why we need to be able to resolve and store the results from fuzzy data matching into hierarchies.
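One way to picture this is to store each match result as a typed link rather than a yes/no answer, and let every purpose decide which link types count as a match. The sketch below is an assumption-laden illustration: the relation labels, the two purposes and the names `Link` and `is_match` are invented here, not taken from any particular MDM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    row_a: str
    row_b: str
    relation: str  # hypothetical label for the kind of hierarchy relation


# The three examples from this post, stored as hierarchy links.
LINKS = [
    Link("John Smith, 1 Main Street, Anytown",
         "Mary & John Smith, 1 Main Str, Anytown", "same_household"),
    Link("Anytown Municipality, Technical Dept",
         "Municipality of Anytown", "department_of"),
    Link("Acme Corporation, Anytown",
         "Acme Corporation, Anywhere", "same_company_family"),
]

# Each purpose of use chooses which relations it treats as a match (illustrative).
PURPOSE_RELATIONS = {
    "one catalogue per household": {"same_entity", "same_household"},
    "credit limit per legal entity": {"same_entity"},
    "group-level sales reporting": {"same_entity", "department_of", "same_company_family"},
}


def is_match(link, purpose):
    """The same stored link can be a match for one purpose and not for another."""
    return link.relation in PURPOSE_RELATIONS[purpose]
```

The matching result is stored once, and the question of false positive versus false negative is answered per purpose when the data is used.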

