DataQualityPro – Page 2 – Liliendahl on Data Quality

Matching Down Under

17th December 2010 | Henrik Gabs Liliendahl | 4 Comments

As a data matching geek I always love reading about how others have made the great but fearful journey into the data matching world.

This week Wayne Colless of the Australian Attorney-General’s Department kindly made a document about data matching public on the DataQualityPro site. The full title is “Improving the Integrity of Identity Data – Data Matching Better Practice Guidelines, 2009”. Link here.

As Wayne explains in a discussion in the LinkedIn Data Matching group: Australia has no national unique identifier for individuals (such as the US SSN or the number recorded on national ID cards used in many other countries) that can be used, so the matching has to involve only non-unique values such as name, address and dates of birth.

The document gives very thorough step-by-step guidance on matching individuals’ names, addresses and dates of birth. As the document says, you may either build all the logic yourself or buy commercial software that does the same. Either way, you have to understand what the software does in order to tune the processes and set thresholds that are meaningful to you.
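To make the point about thresholds concrete, here is a minimal sketch of a weighted match score over name, address and date of birth. It is not taken from the guidelines: the field weights, the threshold value and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all my own assumptions for illustration, and real matching would need proper standardisation of names and addresses first.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold; real values must be tuned on your own data.
WEIGHTS = {"name": 0.5, "address": 0.3, "dob": 0.2}
MATCH_THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1] after trivial normalisation."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of the per-field similarities (weights add up to 1)."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def is_match(rec_a: dict, rec_b: dict, threshold: float = MATCH_THRESHOLD) -> bool:
    """Declare a match when the weighted score reaches the threshold."""
    return match_score(rec_a, rec_b) >= threshold


# Made-up records, only to show the call.
a = {"name": "Margaret Smith", "address": "1 Main Street, Anytown", "dob": "1950-01-01"}
b = {"name": "Zhang Wei", "address": "99 Ocean Drive, Farville", "dob": "1987-11-23"}
print(round(match_score(a, b), 2), is_match(a, b))
```

Raising the threshold trades missed matches for fewer false positives, which is exactly the tuning decision you cannot make without understanding what the score is built from.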

As Australia is a nation mainly born through immigration, the challenges of adapting the ruling Anglo-Saxon naming conventions to the reality of name formats coming from all over the world are very apparent. I like that the diversity issues are given good thought in the document.

I also like that the document addresses a subject not mentioned as often as it should be, namely the challenges with embracing historical values in settling a match as seen in this figure taken from the document:

Whether or not you think you already know the dos and don’ts of data matching (and I guess you never fully do), I really find the document worth reading.

My Secret

18th September 2010 | Henrik Gabs Liliendahl

Yesterday I followed a webinar on DataQualityPro with ECCMA ISO 8000 project leader Peter Benson.

Peter had a lot of good sayings, and fortunately Jim Harris documented a sample of good quotes here through his live tweeting.

My favorite:

“Quality data does NOT guarantee quality information, but quality information is impossible without quality data.”

I have personally conducted an experiment that supports that hypothesis. It goes like this:

First, I found a data file on my computer. There was lots of data in there, all numbers and letters. And sure, what is interesting is the information I can derive from it for different purposes.

Then I deleted the data file and tried to see how much information was left behind.

Guess what? Not a bit.

I first published that experiment as a comment to one of Jim’s blog posts: Data Quality and the Cupertino Effect.

As documented in the comments on this blog post the subject of data (quality) versus information (quality) is ever recurring and almost always guarantees a fierce discussion among data/information management professionals.

So, I’ll just tell you this secret: My work in achieving quality information is done by fixing data quality.

And guess what? I have disabled comments on this blog post.

Bad word?: Data Owner

1st March 2010 | 21st June 2010 | Henrik Gabs Liliendahl | 25 Comments

When reading a recent excellent blog post called “How to Assign a Data Owner” by Rayk Fenske I once again came to think about how I dislike the word owner in “Data Owner” and “Data Ownership”.

I am not alone. Recently Milan Kucera expressed the same feelings on DataQualityPro. I also remember that Paul Woodward from British Airways said at MDM Summit Europe 2009: Data is owned by the entire company – not any individuals.

My thoughts are:

Owner is a good word where we strive for fit for a single purpose of use in one silo
Owner may be a word of choice where we strive for fit for single purposes of use in several silos
Owner is a bad word where we strive for fit for multiple purposes of use in several silos

Of course, I don’t expect all the issues raised by Rayk to disappear even if we find a better term than “Data Owner”.

Nevertheless, I will welcome better suggestions for what is really meant by “Data Ownership”.

Master Data Survivorship

28th October 2009 | 2nd July 2010 | Henrik Gabs Liliendahl | 1 Comment

A Master Data initiative is often described as making a “golden view” of all Master Data records held by an organization in various databases used by different applications serving a range of business units.

In doing that (either in the initial consolidation or the ongoing insertion and update) you will time and again encounter situations where two versions of the same element must be merged into one version of the truth.

In some MDM hub styles the decision is taken at consolidation time; in other styles the decision is postponed until the data (links) is consumed in a given context.

In the following I will talk about Party Master Data, which is the most common entity in Master Data initiatives.

This spring Jim Harris wrote a brilliant series of articles on DataQualityPro on the subject of identifying duplicate customers, ending with part 5, which deals with survivorship. There Jim describes all the basic considerations on how some data elements survive a merge/purge while others are forgotten, and gives good examples with US consumers/citizens.

Taking it from there Master Data projects may have the following additional challenges and opportunities:

Global Data adds diversity to the rule set for consolidating data on record level as well as field level. You will have to compromise between simple global rules and complex optimized rules (with supporting knowledge data) for each country/culture.
Multiple types of Party Master Data must be handled when Business Partners include business entities having departments and employees, and not least when they are present together with consumers/citizens.
External Reference Data is becoming more and more common as part of MDM solutions, adding valid, accurate and complete information about Business Partners. Here you have to set rules (on field level) for whether the external data overrides internal data, fills in the blanks or only supplements internal data.
Hierarchy building is closely related to survivorship. Rules may be set for whether two entities go into two hierarchies with surviving parts from both or merge into one with survivorship. Even an original entity may be split into two hierarchies with surviving parts.
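The three field-level options for external reference data (override, fill in the blanks, supplement) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the rule names, the `survive_field` and `survive_record` helpers and the sample values are hypothetical and do not come from any particular MDM product.

```python
from typing import Optional

# Hypothetical rule names; one rule is chosen per field.
OVERRIDE, FILL_BLANKS, SUPPLEMENT = "override", "fill_blanks", "supplement"


def survive_field(internal: Optional[str], external: Optional[str], rule: str) -> dict:
    """Return the surviving value plus any alternate value kept alongside it."""
    if rule == OVERRIDE:
        # External reference data wins whenever it has a value.
        return {"value": external if external else internal, "alternates": []}
    if rule == FILL_BLANKS:
        # Internal data wins; external is used only when internal is empty.
        return {"value": internal if internal else external, "alternates": []}
    if rule == SUPPLEMENT:
        # Internal data is kept as is; a differing external value is stored next to it.
        alternates = [external] if external and external != internal else []
        return {"value": internal, "alternates": alternates}
    raise ValueError(f"unknown rule: {rule}")


def survive_record(internal: dict, external: dict, rules: dict) -> dict:
    """Apply the configured rule to every field listed in `rules`."""
    return {
        field: survive_field(internal.get(field), external.get(field), rule)
        for field, rule in rules.items()
    }


# Made-up example values and rule assignment.
internal = {"name": "Local Charity", "address": "1 Main Str", "city": ""}
external = {"name": "Anytown Angels", "address": "1 Main Street", "city": "Anytown"}
rules = {"name": OVERRIDE, "address": SUPPLEMENT, "city": FILL_BLANKS}

golden = survive_record(internal, external, rules)
print(golden)
```

Keeping the rule per field rather than per record is what lets you trust an external source for, say, the legal name while still preferring your own data elsewhere.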

What is essential in survivorship is not losing any valuable information while not creating information redundancy.

An example of complex survivorship processing may be this:

A membership database holds the following record (Name, Address, City):

Margaret & John Smith, 1 Main Street, Anytown

An eShop system has the following accounts (Name, Address, Place):

Mrs Margaret Smith, 1 Main Str, Anytown
Peggy Smith, 1 Main Street, Anytown
Local Charity c/o Margaret Smith, 1 Main Str, Anytown

A complex process of consolidation including survivorship may take place. As part of this example, the company Local Charity is matched with an external source stating that it has a new name, Anytown Angels. The result may be this “golden view”:

ADDRESS in Anytown on Main Street no 1 having
• HOUSEHOLD having
– CONSUMER Mrs. Margaret Smith aka Peggy
– CONSUMER Mr. John Smith
• BUSINESS Anytown Angels having
– EMPLOYEE Mrs. Margaret Smith aka Peggy

Observe that everything survives in a globally applicable structure, in a fitting hierarchy that reflects local rules, handles multiple types of party entities and uses external reference data.
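As a rough illustration of the data structure behind such a “golden view”, the hierarchy above could be held as a small tree where the same party appears in more than one role. The `Party` and `Node` classes and the `render` helper are hypothetical names of my own, and the matching that decides who belongs where is assumed to have happened already.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    """One resolved party; name variants that did not win survive as aliases."""
    name: str
    aliases: List[str] = field(default_factory=list)

    def label(self) -> str:
        return self.name + (" aka " + ", ".join(self.aliases) if self.aliases else "")


@dataclass
class Node:
    """A node in the hierarchy, e.g. ADDRESS, HOUSEHOLD, CONSUMER, BUSINESS, EMPLOYEE."""
    kind: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def render(node: Node, depth: int = 0) -> List[str]:
    """Flatten the tree into indented text lines."""
    lines = ["  " * depth + f"{node.kind} {node.label}".rstrip()]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# The parties are assumed to be the output of matching and survivorship.
margaret = Party("Mrs. Margaret Smith", aliases=["Peggy"])
john = Party("Mr. John Smith")
charity = Party("Anytown Angels")  # new name taken from the external source

address = Node("ADDRESS", "1 Main Street, Anytown")
household = address.add(Node("HOUSEHOLD"))
household.add(Node("CONSUMER", margaret.label()))
household.add(Node("CONSUMER", john.label()))
business = address.add(Node("BUSINESS", charity.label()))
business.add(Node("EMPLOYEE", margaret.label()))

print("\n".join(render(address)))
```

Because the consumer and the employee are built from the same `Party` object, the “Peggy” alias survives once and shows up in both roles without being stored twice.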

But OK, we didn’t have funny names, dirt, misplaced data…

Guerrilla Data Quality

23rd October 2009 | Henrik Gabs Liliendahl | 7 Comments

Oh yes, in my crazy berserkergang of presenting stupid buzzword suggestions it’s time for “Guerrilla Data Quality”. And this time there are no previous hits on Google to point at as the original source.

But I noticed that “Guerrilla Data Governance” is in use, and as Data Governance and Data Quality are closely related disciplines, I think there could be something called “Guerrilla Data Quality”.

Also recently an article called “How to set data quality goals any business can achieve” was published by Dylan Jones on DataQualityPro. Here the need for setting short-term realistic goals is emphasised, in contrast to making a full-size, enterprise-wide, all-domain massive initiative. This article focuses on the people and process side of what may be “Guerrilla Data Quality”.

Recently I wrote a blog post called “Driving Data Quality in 2 Lanes” focussing on the tool selection for what may be “Guerrilla Data Quality” and the enterprise wide follow up.

Actually I guess most Data Quality activity going on is in fact “Guerrilla Data Quality”. The problem then is that most literature and teaching on Data Quality is aimed at the massive enterprise wide implementations.

Any thoughts?

Follow Friday Master Data Hub

31st July 2009 | 24th July 2010 | Henrik Gabs Liliendahl | 2 Comments

Social Networking needs Master Data Management.

A recurring event every Friday on Twitter is #FollowFriday, abbreviated #FF, where people on Twitter tweet about whom to follow.

I do it too, and like everyone else I sometimes forget someone, and then (s)he gets angry and doesn’t #FF me, and that’s bad. Bad Data Management. Bad #mdm.

So now I have started building a Master Data Hub fit for the purpose of doing consistent #FF. I do see other purposes for this as well, and since I recognize the advantages of combining data sources, I did a #datamatching with LinkedIn connections to improve #dataquality through Identity Resolution.

This is as far as I am now (very convenient that WordPress lets me edit my blog posts):

@ReferenceData where http://www.linkedin.com/pub/carla-mangado/11/467/239 is Staff Writer

@KenOConnorData is http://www.linkedin.com/in/kenoconnor00

@ocdqblog is a blog where http://www.linkedin.com/in/jimharris is blogger-in-chief

@dataqualitypro is a community founded by http://www.linkedin.com/in/dylanjones

Dylan was a @Datanomic partner where @SteveTuck is http://www.linkedin.com/in/stevetuck

@InitiateSystems has a CTO = @wmmarty who is http://www.linkedin.com/pub/marty-moseley/0/57/43b

@VishAgashe is http://www.linkedin.com/in/vishagashe

@KeithMesser is http://www.linkedin.com/in/keithmesser running @GlobalMktgPros

@fionamacd is at @TrilliumSW as seen here http://www.linkedin.com/in/fionamacd

So is @stevesarsfield being http://www.linkedin.com/pub/steve-sarsfield/2/675/47a

Trillium is owned by Harte-Hanks where @MarkGoloboy also was http://www.linkedin.com/in/markgoloboy

@biknowledgebase is operated by http://www.linkedin.com/in/barryharmsen

@Dataexperts has a managing director who is http://www.linkedin.com/pub/gary-holland/1/101/135

@IDResolution (Infoglide) has several Data Matching members in http://www.linkedin.com/groups?gid=2107798 including http://www.linkedin.com/in/dougwood

@rdrijsen is http://www.linkedin.com/in/rdrijsen with possible duplicate http://www.linkedin.com/pub/resa-drijsen/1/389/58

@grahamrhind is http://www.linkedin.com/in/grahamrhind

@omathurin is http://www.linkedin.com/in/oliviermathurin

@zzubbuzz is probably http://www.linkedin.com/pub/charles-proctor/14/591/31

@CharlesBurleigh is http://www.linkedin.com/in/charlesburleigh

@wesharp is http://www.linkedin.com/in/williamesharp doing @dqchronicle

@decisionstats has an editor being http://www.linkedin.com/in/ajayohri

@jeric40 is my colleague at Omikron as shown here http://www.linkedin.com/in/janerikingvaldsen
