Single version of the truth – Page 2 – Liliendahl on Data Quality

Leads, Accounts, Contacts and Data Quality

29th July 2014 | Henrik Gabs Liliendahl | 2 Comments

Many CRM applications have the concepts of leads, accounts and contacts for registering customers or other parties with roles in sales and customer service.

Most CRM systems have a data model suited for business-to-business (B2B) operations. In a B2B environment:

A lead is someone who might become your customer some day
An account is a legal entity that is, or seems likely to become, your customer
A contact is a person who works at, or in other ways represents, an account

In business-to-consumer (B2C) environments there are different ways of making that model work.

The general perception is that data about a lead can be so-so, while it is of course important to have optimal data quality for accounts and contacts.

However, this approach works against the essential data quality rule of getting things right the first time.

Converting a lead into an account and/or a contact is a basic CRM process and the data quality pitfalls in that process are many. To name a few:

Is the lead a new account or did we already have that account in the database?
Is the contact new, or do we already know that person, maybe at another account?
How do we align the known data about the lead with external reference data during the conversion process?
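The first of these pitfalls can be illustrated with a small sketch. It is not how any particular CRM system works: the Account fields, the find_existing_account helper and the 0.85 threshold are all assumptions made for the example, and the fuzzy comparison simply uses Python's standard difflib module.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Account:
    account_id: int
    name: str
    city: str


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def find_existing_account(lead_company, lead_city, accounts, threshold=0.85):
    """Return the best-matching account in the same city, or None.

    The threshold is a hypothetical value chosen for this sketch.
    """
    best, best_score = None, 0.0
    for account in accounts:
        if normalize(account.city) != normalize(lead_city):
            continue
        score = SequenceMatcher(
            None, normalize(lead_company), normalize(account.name)
        ).ratio()
        if score > best_score:
            best, best_score = account, score
    return best if best_score >= threshold else None


# Made-up accounts already in the database
accounts = [
    Account(1, "Acme Trading Ltd.", "London"),
    Account(2, "Nordic Widgets A/S", "Copenhagen"),
]

# A lead typed in slightly differently should hit the existing account
match = find_existing_account("ACME Trading Ltd", "london", accounts)
# A lead for an unknown company should not
new = find_existing_account("Brand New Company", "London", accounts)
```

Running such a check before the conversion, rather than cleaning up afterwards, is in line with getting things right the first time.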

In other words, the promise of having a 360-degree customer view is jeopardized by the concept of most CRM systems.

Sharing Big Location Reference Data

19th February 2014 | Henrik Gabs Liliendahl | 2 Comments

In the post Location Data Quality for MDM the different ways of handling location master data within many companies were examined.

A typical “as is” picture could be this:

Location data are handled for different purposes using different kinds of systems. Customer data may be data quality checked by using address validation tools and services. This also serves as a prerequisite for better utilization of these data in a Geographical Information System (GIS) and for using internal customer master data in marketing research, for example by utilizing demographic classifications for current and prospective customers.

Often additional external location data are used for enrichment and for supplementing internal master data downstream in these specialized systems. It may very well be that the external location reference data used at different points do not agree in terms of precision, timeliness, conformity and other data quality dimensions.

A desired “to be” picture could be this:

In this set-up everything that can be shared across different purposes is kept as common (big) reference data and/or is accessible within a data-as-a-service environment maintained by third party data providers.

A Little Bit of Truth vs A Big Load of Trust

28th November 2013 | Henrik Gabs Liliendahl | 2 Comments

The soul of Master Data Management (MDM) is often explained as the search for a single version of the truth. It has always puzzled me that that search in many cases has been about finding the truth as the best data within different data silos inside a given organization.

Big data, including how MDM and big data can be a good match, has been a well-covered subject lately. As discussed in the post Adding 180 Degrees to MDM, this has shed light on how external data may help in having better master data by looking at data from the outside in.

Gartner, the analyst firm, has phrased that movement as a shift from truth to trust, for example in the post by Andrew White called From MDM to Big Data – From truth to trust.

Don’t get me (and master data) wrong. The truth isn’t out there in a single silver bullet shot. You have to mash up your internal master data with some of the most trustworthy external big reference data. These include commercial directory offerings, open data possibilities, public sector data (made available for private entities) and social networks.

Indeed there are potholes in that path. Timeliness of directories, completeness of open data, consistency and availability and price tags on public sector data and validity of social network data are common challenges.

Building an instant Data Quality Service for Quotes

12th October 2013 | Henrik Gabs Liliendahl | Leave a comment

In yesterday’s post called Introducing the Famous Person Quote Checker the issue with all the quotes floating around in social media about things apparently said by famous persons was touched upon.

The bumblebee can’t fly faster than the speed of light – Albert Einstein

If you were to build a service that could avoid postings with disputable quotes, what considerations would you have then? Well, I guess pretty much the same considerations as with any other data quality prevention service.

Here are three things to consider:

Getting the reference data right

Finding the right sources for say reference data for world-wide postal addresses was discussed in the post A Universal Challenge.

The same way, so to speak, it will be hard to find a single source of truth about what famous persons actually said. It will be a daunting task to make a registry of confirmed quotes.

Embracing diversity

Staying with postal addresses this blog has a post called Where the Streets have one Name but Two Spellings.

The same way, so to speak again, quotes are translated, transliterated and have gone through transcription from the original language and writing system. So every quote may have many true versions.

Where to put the check?

As examined in the post The Good, Better and Best Way of Avoiding Duplicates there are three options:

1) A good and simple option could be to periodically scan through postings in social media and, when a disputable quote is found, send an eMail to the culprit who did the posting. However, it’s probably too late, as even if you for example delete your tweet, the 250 retweets will still be out there. But it’s a reasonable way of starting to mark up all the disputable quotes out there.

2) A better option could be a real-time check. You type in a quote on a social media site and the service prompts you: “Hey Dude, that person didn’t say that”. The weak point is that you already did all the typing, and now you have to find a new quote. But it will work when people try to share disputable quotes.

3) The best option would be that you start typing “If you can’t explain it simply… “ and the service prompts a likely quote as: “Everything should be as simple as it can be, but not simpler – Albert Einstein”.
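As a rough sketch of options 2 and 3, the check and the type-ahead could both run against the same registry of confirmed quotes. Everything below is an assumption made for illustration: the CONFIRMED_QUOTES registry, the function names and the 0.9 threshold are hypothetical, and the single registry entry is just the example quoted above, not a verified attribution. The type-ahead here only compares strings, so it would not by itself recognize a quote with a similar meaning but different wording.

```python
from difflib import SequenceMatcher

# Illustrative registry only: the entry is the example used in the post,
# not a verified attribution.
CONFIRMED_QUOTES = {
    "Albert Einstein": [
        "Everything should be as simple as it can be, but not simpler",
    ],
}


def normalize(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """String similarity between 0.0 and 1.0 on the normalized texts."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_confirmed(quote, person, threshold=0.9):
    """Option 2: check a fully typed quote against the registry."""
    known = CONFIRMED_QUOTES.get(person, [])
    return any(similarity(quote, k) >= threshold for k in known)


def suggest(partial, person):
    """Option 3: while typing, propose the closest confirmed quote, if any."""
    known = CONFIRMED_QUOTES.get(person, [])
    if not known:
        return None
    return max(known, key=lambda k: similarity(partial, k))


disputable = is_confirmed(
    "The bumblebee can't fly faster than the speed of light", "Albert Einstein"
)
confirmed = is_confirmed(
    "everything should be as simple as it can be, but not simpler.",
    "Albert Einstein",
)
proposal = suggest("If you can't explain it simply", "Albert Einstein")
```

The hard part remains the registry itself, as discussed under getting the reference data right, and allowing several true versions of the same quote, as discussed under embracing diversity.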

On Maps, Data Quality and MDM

20th August 2013 | Henrik Gabs Liliendahl | Leave a comment

Maps are great but sometimes you’ll have some trouble with data quality issues on maps as told in the post Troubled Bridge over Water.

When it comes to political borders on maps, things may get really nasty, as happened lately for Huawei, whose congratulation to Pakistan on its independence day showed a map with borders not in line with the Pakistani version of the truth. The story is told here.

There are plenty of disputes about borders in the world stretching from the serious situations in the Himalaya region to for example the close to comical case between Canada and Denmark/Greenland over Hans Island.

In these situations you can’t settle on a single version of the truth.

However, even if we don’t have disputes on what is right or wrong we may have very different views on how to look at various entities as examined in the post The Greenland Problem in MDM.

Hierarchical Data Matching

13th August 2013 | Henrik Gabs Liliendahl | 9 Comments

A year ago I wrote a blog post about data matching published on the Informatica Perspective blog. The post was called Five Future Data Matching Trends.

One of the trends mentioned is hierarchical data matching.

The reason we need what may be called hierarchical data matching is that more and more organizations are looking into master data management, and then they realize that the classic name and address matching rules do not necessarily fit when party master data are going to be used for multiple purposes. What constitutes a duplicate in one context, like sending a direct mail, doesn’t necessarily make a duplicate in another business function and vice versa. Duplicates come in hierarchies.

One example is a household. You probably don’t want to send two sets of the same material to a household, but you might want to engage in a 1-to-1 dialogue with the individual members. Another example is that you might do some very different kinds of business with the same legal entity. Financial risk management is the same, but different sales or purchase processes may require very different views.
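The household example can be sketched by grouping the same records on two different keys. The records, field names and key functions below are made up for illustration, and exact comparison of names and addresses is a simplification of what a real matching tool would do.

```python
from collections import defaultdict

# Made-up party records for illustration
records = [
    {"id": 1, "given": "Anna", "family": "Jensen", "address": "1 Main Street"},
    {"id": 2, "given": "Peter", "family": "Jensen", "address": "1 Main Street"},
    {"id": 3, "given": "Anna", "family": "Jensen", "address": "1 Main Street"},
    {"id": 4, "given": "Maria", "family": "Olsen", "address": "7 Park Road"},
]


def household_key(record):
    """Upper level: same family name at the same address."""
    return (record["family"].lower(), record["address"].lower())


def individual_key(record):
    """Lower level: a given person within a household."""
    return household_key(record) + (record["given"].lower(),)


def group_by(records, key):
    """Collect record ids per key value."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record["id"])
    return dict(groups)


households = group_by(records, household_key)    # one direct mail per group
individuals = group_by(records, individual_key)  # one 1-to-1 dialogue per group
```

Records 1, 2 and 3 are duplicates for the direct mail purpose, while only records 1 and 3 are duplicates for the 1-to-1 dialogue.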

I usually divide a data matching process into three main steps:

Candidate selection
Match scoring
Match destination

(More information on the page: The Art of Data Matching)

Hierarchical data matching is mostly about the last step where we apply survivorship rules and execute business rules on whether to purge, merge, split or link records.
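The three steps could be sketched as below. This is a minimal illustration, not the method of any specific tool: the postcode blocking key, the name comparison with Python's difflib and the thresholds deciding between merge and link are all assumptions, and the records are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Made-up records for illustration
records = [
    {"id": 1, "name": "John Smith", "postcode": "SW1A 1AA"},
    {"id": 2, "name": "John Smith", "postcode": "SW1A 1AA"},
    {"id": 3, "name": "Jon Smith", "postcode": "SW1A 1AA"},
    {"id": 4, "name": "Mary Jones", "postcode": "EH1 1YZ"},
]


def candidate_pairs(records):
    """Step 1, candidate selection: only compare records that share
    a blocking key (here the postcode)."""
    blocks = {}
    for record in records:
        blocks.setdefault(record["postcode"], []).append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


def match_score(a, b):
    """Step 2, match scoring: name similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def match_destination(score):
    """Step 3, match destination: hypothetical business rules."""
    if score >= 0.99:
        return "merge"
    if score >= 0.80:
        return "link"
    return "keep separate"


decisions = {
    (a["id"], b["id"]): match_destination(match_score(a, b))
    for a, b in candidate_pairs(records)
}
```

In this sketch the exact duplicates are merged, while the near matches are only linked, which keeps both records and leaves room for building a hierarchy on top of them.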

In my experience there are a lot of data matching tools out there capable of handling candidate selection, match scoring, purging records and to some degree merging records. But solutions are sparse when it comes to more sophisticated things like splitting an original entity into two or more entities, for example by Splitting Names, or linking records in hierarchies in order to build a Hierarchical Single Source of Truth.

180 Degree Prospective Customer View isn’t Unusual

5th August 2013 | Henrik Gabs Liliendahl | 3 Comments

My eMail inbox is collecting received mails from several eMail accounts and therefore it’s not unusual to have duplicate messages in there.

This morning I had two eMails coming in to two different eMail accounts probably part of the same campaign but with different messages:

Apparently I have landed in two different segments with two different eMail accounts: One technology oriented and one sales and marketing oriented.

Record linking of sparse subscription profiles isn’t easy, and even Informatica, a big player in Master Data Management and Data Quality solutions, has ground to cover in this game.

Hear ye, hear ye, hear ye

25th July 2013 | Henrik Gabs Liliendahl | Leave a comment

A certain birth in London the other day was widely visualized by the announcement by a royal crier in front of St. Mary’s Hospital.

However, as reported by International Business Times here, the crier in fact just crashed the party, as he wasn’t invited by any Royal party. But the cries and included facts were true right enough.

So, this time everything was OK. But in general it’s amazing how we confuse great visualization and trustworthiness.

Where the Streets have one Name but Two Spellings

23rd July 2013 | Henrik Gabs Liliendahl | 2 Comments

Last week’s post called Where The Streets have Two Names caught a lot of comments both on this blog and in LinkedIn groups, as here on Data Quality Professionals and on The Data Quality Association, with a lot of examples from around the world on how this challenge actually exists more or less everywhere.

Recently I had the pleasure of experiencing a variant of the challenge when driving around in a rented car in the Saint Petersburg area in Russia. Here the streets usually only have one name but that may be presented in two different alphabets being the local Cyrillic or the Latin alphabet I’m used to which also was included in the reference data on the Sat Nav. So while it was nice for me to type destinations in Latin letters it was nice to have directions in Cyrillic in order to follow the progress on road signs.

So here standardization (or standardisation) to one preferred language, alphabet or script system isn’t the best solution. Best of breed solutions for handling addresses must be able to handle several right spellings for the same address.

Street sign in Cyrillic with Latin subtitle (Nevsky Prospekt, St Petersburg)
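Handling several right spellings for the same address could look something like the sketch below, where every spelling points to the same street identifier instead of being standardized away. The identifier, the script tags and the function names are hypothetical and only serve the example.

```python
# Hypothetical reference data: one street, two equally right spellings
STREETS = {
    "spb-nevsky": {
        "ru-Cyrl": "Невский проспект",
        "ru-Latn": "Nevsky Prospekt",
    },
}


def build_index(streets):
    """Map every known spelling (case-insensitively) to its street id."""
    index = {}
    for street_id, spellings in streets.items():
        for spelling in spellings.values():
            index[spelling.casefold()] = street_id
    return index


INDEX = build_index(STREETS)


def resolve(text):
    """Find the street id for a typed street name in any known script."""
    return INDEX.get(text.strip().casefold())


def display(street_id, script):
    """Return the spelling of a street in the requested script."""
    return STREETS[street_id][script]


# Type the destination in Latin letters, follow the road signs in Cyrillic
typed = resolve("nevsky prospekt")
on_sign = display(typed, "ru-Cyrl")
```

The point of the design is that no spelling is treated as the wrong one; the preferred presentation is chosen per use.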

When Bad Data Quality isn’t Bad Data

24th April 2013 | Henrik Gabs Liliendahl | 7 Comments

There has been a quiz running on this blog with the question: What is the name of the current Pope of the Catholic Church? Find the current standing of answers in the figure to the right.

It’s good to see a lot of different answers and indeed, a problem with the quiz is that all answers may be correct. While Francis is the name as pope in English chosen by Jorge Mario Bergoglio, the pope has other names in other languages, such as Frans in Danish and Norwegian, François in French, Franziskus in German and Francesco in Italian.

The quiz is actually bad as it has not included other good answers such as Franciscus, the Latin name, Francisco, the Spanish name, and Franciszek, the Polish name. The question in the quiz is too simple. What is meant by “the name” should be clarified: Is it the birth name, the chosen name as Pope in a given language or what?

Such problems are in fact very common in what we often see as bad data quality, and they reflect two frequent issues which aren’t about the raw data:

Data models are too simple. In this case the model should be able to reflect different types of names: birth name and what (sorry, believers) resembles a screen name. And names in various languages.
Metadata is too weak. In this case it could be stated more precisely what name we are collecting, if it is only one of the name types we need, for example the chosen name in English. More about metadata on Wikipedia.
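A less simple data model for this case could be sketched as below, using the names mentioned above. The class and field names are assumptions made for the example; the point is only that each name carries its type and, where relevant, its language, so the question can say exactly which name is asked for.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class NameType(Enum):
    BIRTH = "birth name"
    CHOSEN = "chosen name"


@dataclass(frozen=True)
class Name:
    value: str
    name_type: NameType
    language: Optional[str] = None  # language code, left empty when not relevant


@dataclass
class Person:
    names: List[Name] = field(default_factory=list)

    def name(self, name_type, language=None):
        """Return the name of the given type and language, or None."""
        for n in self.names:
            if n.name_type is name_type and n.language == language:
                return n.value
        return None


pope = Person([
    Name("Jorge Mario Bergoglio", NameType.BIRTH),
    Name("Francis", NameType.CHOSEN, "en"),
    Name("Frans", NameType.CHOSEN, "da"),
    Name("François", NameType.CHOSEN, "fr"),
    Name("Franziskus", NameType.CHOSEN, "de"),
    Name("Francesco", NameType.CHOSEN, "it"),
])

# The quiz question, asked with sufficient metadata
answer = pope.name(NameType.CHOSEN, "en")
```

With such a model several answers are all valid data, and the quiz question becomes a matter of which name type and language is requested.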

What other issues have you encountered that are seen as bad data quality, but which aren’t bad raw data?
