Beyond Home Improvement

During my many years in customer master data quality improvement I have worked with a lot of clients whose data spans several countries. In almost every case the data has been prioritized in two pots:

  • Master Data referring to domestic customers
  • Master Data referring to foreign customers

Even though an enterprise may define itself as an international organization, the term domestic is in many cases still assigned to the country where the headquarters is situated and where the organization was born.

Signs of this include:

  • Data formats are designed to fit domestic customers
  • Internal reference data are richer for domestic locations
  • External reference data services are limited to domestic customers

The high prioritization of domestic data is of course natural for historical reasons: domestic customers are almost certainly the largest group, and the domestic rules are familiar to most participants in a data quality program.

If we accept that improving data quality will be reflected in an improved bottom line, there is still a margin to be gained by not stopping once you have optimal procedures for domestic data.

An easy way of dealing with this is to apply general formats, services and rules that work for data from all over the world, and in some cases this approach may be the best when weighing costs against benefits.

But I have no doubt that the best data quality for customer master data is achieved by exploiting the specific opportunities that exist for each country and culture.

Examples are:

  • The completeness and depth of the address (location) data available in each country varies greatly – as do the rules of the postal services operating there
  • Public sector registration practices for companies and citizens also differ, which is why the quality of external reference data varies, as do the rules of access to that data
  • Using local character sets, script systems, naming conventions and address formats besides (or instead of) those that apply at headquarters improves data quality through real-world alignment – a small sketch follows this list
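
As a minimal sketch of what exploiting such country-specific opportunities can look like (the rule table and function are my own illustration, not a reference to any product), even something as basic as street line formatting can be dispatched on country code:

    # Illustrative per-country convention: in e.g. Denmark and Germany the
    # house number comes after the street name; in e.g. the US it comes first.
    NUMBER_AFTER_STREET = {"DK", "DE", "NL", "SE"}

    def format_street_line(street: str, number: str, country: str) -> str:
        """Format a street address line according to the country's convention."""
        if country.upper() in NUMBER_AFTER_STREET:
            return f"{street} {number}"
        return f"{number} {street}"

    print(format_street_line("Lerås", "13", "DK"))       # Lerås 13
    print(format_street_line("Main Street", "1", "US"))  # 1 Main Street

The same dispatch pattern extends to character sets, name parsing and reference data lookups per country.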

My guess is that in the near future we will see services in the cloud helping us make the global village come true for master data quality too.

Matchback and Master Data Management

The term matchback is used by marketers for the process of determining which marketing activity triggered a given purchase. Now that multichannel marketing and sales are embraced by more and more companies, doing matchback is becoming more and more complicated.

The core functionality in matchback is good old data matching, like: does the name and address in a catalogue mailing match (with a certain similarity) the name and address of a new buyer? But you also have to ask questions such as: Is this buyer in fact a new buyer, or did he buy before – in this channel or in another? Was this buyer also included in a concurrent email campaign? If a private buyer: is the new buyer in the same household as an old buyer? If a business: does the new buyer belong to the same company family tree as an old buyer? Was the contact actually a contact at an existing business customer?
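
As a minimal sketch of that core similarity question (standard-library Python; the 0.85 threshold and equal field weighting are illustrative assumptions, not any matchback product's defaults):

    import difflib

    def normalize(text: str) -> str:
        # Crude normalization: lowercase and collapse whitespace.
        return " ".join(text.lower().split())

    def name_address_similarity(a: dict, b: dict) -> float:
        """Average the similarity of the name fields and the address fields."""
        name_sim = difflib.SequenceMatcher(
            None, normalize(a["name"]), normalize(b["name"])).ratio()
        addr_sim = difflib.SequenceMatcher(
            None, normalize(a["address"]), normalize(b["address"])).ratio()
        return (name_sim + addr_sim) / 2

    mailing = {"name": "John Smith", "address": "Main Street 1, Springfield"}
    buyer = {"name": "Jon Smith", "address": "Main St. 1, Springfield"}

    if name_address_similarity(mailing, buyer) >= 0.85:  # illustrative cut-off
        print("Candidate match: catalogue recipient vs new buyer")

Real matchback then layers the household, company tree and channel questions above on top of this pairwise score.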

Answering these questions will be a total mess if you don't have a solid party master data management program in place. You need to:

  • Store (or at least reference) all party entities from all channels in one single, so-called golden copy (a merge sketch follows this list)
  • Identify the same real world entities
  • Build the hierarchies necessary for current and possible future uses of data
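
As a rough sketch of building that golden copy (the survivorship rule shown – most recently updated non-empty value wins – is just one common choice, not a standard):

    from datetime import date

    def build_golden_record(rows: list[dict]) -> dict:
        """Merge duplicate party rows: newest non-empty value wins per field."""
        golden: dict = {}
        for row in sorted(rows, key=lambda r: r["updated"]):  # oldest first
            for field, value in row.items():
                if value:  # later non-empty values overwrite earlier ones
                    golden[field] = value
        return golden

    duplicates = [
        {"name": "John Smith", "phone": "", "updated": date(2009, 1, 1)},
        {"name": "Jon Smith", "phone": "555-0100", "updated": date(2010, 3, 1)},
    ]
    print(build_golden_record(duplicates))
    # {'name': 'Jon Smith', 'phone': '555-0100', 'updated': datetime.date(2010, 3, 1)}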

Doing matchback is only one of many activities setting the requirements for a party master data management program within an enterprise. And by the way: when that is up and running, the next thing you need is to manage your product master data the same way in order to do further analysis – and you probably also need better structure and data quality in your location master data.

I keep my notes about Master Data Management here.

Enterprise Data Mashup and Data Matching

A mashup is a web page or application that uses or combines data or functionality from two or more external sources to create a new service. Mashups can be considered to have an active role in the evolution of social software and Web 2.0. Enterprise Mashups are secure, visually rich web applications that expose actionable information from diverse internal and external information sources. So says Wikipedia.

I think that Enterprise Mashups will need data matching – and data matching will improve from data mashups.

The joys and challenges of Enterprise Mashups were recently touched upon in the post “MDM Mashups: All the Taste with None of the Calories” by Amar Ramakrishnan of Initiate. Data needs to be cleansed and matched before being exposed in an Enterprise Mashup. An Enterprise Mashup is then a fast way to deliver Master Data Management results to the organization.

Party Data Matching has typically been done in these two, often separate, contexts:

  • Matching internal data, as in deduplication and consolidation
  • Matching internal data against an external source, as in address correction and business directory matching

Increased utilization of multiple functions and multiple sources – like a mashup – will help make better matches. Some examples I have tried include (see the sketch after this list):

  • If you know whether an address is unique or not, this information can be used to settle the confidence of an individual or household duplicate.
  • If you know whether an address is a single residence or a multiple residence (like a nursing home or a campus), this information can be used to settle the confidence of an individual or household duplicate.
  • If you know the frequency of a name (in a given country), this information can be used to settle the confidence of a private, household or contact duplicate.
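
As a minimal sketch (the frequency table and adjustment factors are illustrative assumptions, not figures from any reference source):

    # Assumed name frequencies for illustration (occurrences in some country).
    NAME_FREQUENCY = {"jensen": 250000, "liliendahl": 30}

    def adjusted_confidence(raw_score: float, surname: str,
                            single_residence: bool) -> float:
        """Adjust a raw duplicate score with external reference knowledge."""
        score = raw_score
        # A rare surname makes a duplicate at the same address more likely.
        if NAME_FREQUENCY.get(surname.lower(), 0) < 100:
            score += 0.10
        # A multiple residence (nursing home, campus) weakens the evidence
        # that two records at the same address concern the same household.
        if not single_residence:
            score -= 0.15
        return max(0.0, min(1.0, score))

    print(adjusted_confidence(0.80, "Liliendahl", single_residence=True))  # ~0.90
    print(adjusted_confidence(0.80, "Jensen", single_residence=False))     # ~0.65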

As many data quality flaws (not surprisingly) are introduced at data entry, mashups may also help during data entry, for example:

  • An address may be suggested from an external source (as sketched after this list).
  • A business entity may be picked from an external business directory.
  • Various rules exist in different countries for using consumer/citizen directories – why not use the best available where you do business.
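
As a rough sketch of the first bullet (the directory list is a stand-in for a real external address source reached via a mashup):

    # Stand-in for an external address reference source.
    ADDRESS_DIRECTORY = [
        "Main Street 1, Springfield",
        "Main Street 2, Springfield",
        "Maple Avenue 7, Shelbyville",
    ]

    def suggest_addresses(partial: str, limit: int = 5) -> list[str]:
        """Suggest full addresses matching what the user has typed so far."""
        prefix = partial.lower().strip()
        return [a for a in ADDRESS_DIRECTORY
                if a.lower().startswith(prefix)][:limit]

    print(suggest_addresses("main"))  # both Main Street entries

Catching the flaw at entry time this way is far cheaper than matching and correcting it downstream.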

The rise of social media also adds new possibilities for mashup content during data entry, data maintenance and other uses of MDM / Enterprise Mashups. Like it or not, your data on Facebook, Twitter and not least LinkedIn are going to be matched and mashed up.

Breaking through an open door

This is perhaps a road I have been down before, for example lately in the post The Myth about a Myth.

But it is a pet peeve of mine.

Why are some people always reminding us that this and that must be seen in a business context?

Of course everything we do in our professional life within data quality, master data management, business intelligence and so on must be seen in a business context. Again, I have never seen anyone take the opposite stance.

I am aware that playing the “business context” card is a friendly reminder when, say, some people become too excited about a tool. But remember: every tool was originally made by people to solve a business challenge, and if the tool continues to exist it has probably done so several times.

It may be that tools are overexposed in our business issue discussions because some people are doing their job:

  • Vendors are naturally pushing their tools – it’s a business issue
  • Analysts talk about tools and vendors – it’s a business issue
  • Conference organizers invite vendors to buy sponsorships and tool exhibitions – it’s a business issue

But I don’t think you are breaking through anything when reminding anyone about the business context. Everyone knows that already. Take it to the next level.

Reduplication: The next big thing?

Today I got a very exciting Master Data Management assignment. Usually I do deduplication, where two or more rows in a database are merged into one golden record because the original rows represent the same real-world entity.

But in this case we are going to split one row into several rows with random keys (a so-called MNUID = Messy Non-Unique IDentifier). Names and addresses also have to be misspelled in different ways so they are not easily recognized as being the same.

My client, the Danish Tax Authorities, has for years tried to develop methods for taxation above 100% and has finally arrived at this simple but very efficient method. Until now you, as one person or one company, have paid up to 60% tax, but now each duplicate row will pay 60%. Thereby, in phase one you may in fact pay 120%, and in later phases this will be extended to larger duplicate groups paying much higher percentages.

Already some foreign tax authorities have shown deep interest in this model (called Intelligent Reduplication for Supertaxation). First of all our Scandinavian neighbors are very interested, but eventually it may spread to the rest of the world.

Which came first, the chicken or the egg?

The most common symbol for Easter, which is just around the corner in countries with Christian cultural roots, is the decorated egg. What a good occasion to have a little “which came first” discussion.

So, where do you start if you want better information quality: Data Governance or Data Quality improvement?

To exemplify with something that is known in nearly everyone’s business, let’s look at party master data, where we face the ever recurring question: what is a customer? Do you have to know the precise answer to that question (which looks like a Data Governance exercise) before correcting your party master data (which is often a Data Quality automation implementation)?

I think this question is closely related to the two ways of having high quality data:

  • Either they are fit for their intended uses
  • Or they correctly represent the real-world construct to which they refer

In my eyes the first way, making data fit for their intended uses, is probably best if you aim for information quality in one or two silos, while the second way, alignment with the real world, is the better and less cumbersome way if you aim for enterprise-wide information quality where data are fit for multiple current and future purposes.

So, starting with Data Governance and then, a long way down the line, applying some Data Quality automation like Data Profiling and Data Matching seems to be the way forward if you go for intended use.

On the other hand, if you go for real world alignment, it may be best to start with some Data Profiling and Data Matching in order to learn the state of your data and make the first corrections towards having your party master data aligned with the real world. From there you go forward on an iterative Data Governance and Data Quality automation (never ending) journey, which includes discovering what a customer role really is.
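
As a minimal sketch of such a first Data Profiling pass (standard-library Python; the file name is made up for illustration):

    import csv
    from collections import Counter

    def profile(path: str) -> None:
        """Print fill rate, distinct count and top values per CSV column."""
        with open(path, newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        if not rows:
            return
        for column in rows[0]:
            filled = [r[column].strip() for r in rows if r[column].strip()]
            print(f"{column}: {len(filled)}/{len(rows)} filled, "
                  f"{len(set(filled))} distinct")
            # The most frequent values often expose defaults like 'N/A'.
            print("  top:", Counter(filled).most_common(3))

    # profile("party_master.csv")  # hypothetical file name

Even this crude view usually reveals enough misalignment with the real world to justify the first corrections.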

What is a best-in-class match engine?

Lately, in connection with TIBCO acquiring data matching vendor Netrics, the term best-in-class match engine has been attached to the Netrics product.

First: I have no doubt that the Netrics product is a capable match engine – I know that from discussions in the LinkedIn Data Matching group and here on this blog.

Next: I don’t think anyone knows which product is the best match engine, because I don’t think all match engines have been benchmarked with a representative set of data.

On top of that, there are of course matching capabilities for different entity types to consider. Here party master data (like customer data) are covered by most products, whereas capabilities for other entity types (whether considered the same or not) are far less exposed.

As match engine products are acquired and integrated into suites, the core matching capabilities become mixed up with a lot of other capabilities, making it hard to compare the match engine alone.

Some independent match engines work stand alone and some may be embedded into other applications.

These may then be the classes to be best in:

  • Match engines in suites
  • Embedded match engines (for say SAP, MS CRM and so on)
  • Stand alone match engines

Many match engines I have seen are tuned to deal with data from the country (culture) where they were born and had their first triumphs. As the US market is still by far the largest for match engines, nominating the best match engine resembles a team becoming World Champions in American Football. International/multi-cultural capabilities will become more and more important in data matching. But indeed, we may define a class for each country (culture).

In the old days I heard that one match engine was best for marketing data and another was best for credit risk management. I think those days are over too. With Master Data Management you have to embrace all data purposes.

Some match engines are more successful in one industry than in others. The biggest differentiator in match effectiveness is between B2C and B2B data. B2C is the easier, B2B is more complex, and embracing both is in my eyes a must for being considered best-in-class – unless we define separate classes for B2C, B2B and both.

As some matching techniques are deterministic and some are probabilistic, evaluation of the latter will depend on the data already processed in a given instance, since the matching gets better and better as the self-learning element warms up.
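
As a minimal sketch of that distinction (field names and weights are illustrative assumptions, not any vendor’s algorithm):

    def deterministic_match(a: dict, b: dict) -> bool:
        # Fires only on exact, rule-defined agreement.
        return a["name"] == b["name"] and a["postcode"] == b["postcode"]

    # A probabilistic engine would estimate these agreement weights from the
    # data it has already processed - the "warming up" described above.
    WEIGHTS = {"name": 0.4, "postcode": 0.2, "birthdate": 0.4}

    def probabilistic_score(a: dict, b: dict) -> float:
        # Sums evidence instead of demanding exact agreement on everything.
        return sum(w for field, w in WEIGHTS.items() if a[field] == b[field])

Benchmarking the two fairly is hard precisely because the probabilistic weights are a moving target.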

So yes, this is an endless, religious-like discussion I have reopened here.

Double Falshood

Always remember to include Shakespeare in a blog, right?

Now, it is actually disputable whether Shakespeare has anything to do with the title of this blog post. Double Falshood is the (first part of the) title of a play claimed to be based on a lost play by Shakespeare (and someone else). The only fact that seems to be true in this story is that the plot of the play(s) is based on an episode in Don Quixote by Cervantes. “The Ingenious Hidalgo Don Quixote of La Mancha”, which is the full name of the novel, is probably best known for the attack on the windmills by Don Quijote (the Spanish version of the name).

All this confusion about sorting out who, what, when and where, and the feeling of tilting at windmills, seems familiar from the daily work of trying to fix master data quality.

And indeed “double falsehood” may be a good term for the classic challenge in the data quality discipline of deduplication, which is to avoid false positives and false negatives at the same time.
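
As a minimal sketch (toy names and arbitrary thresholds), the double falsehood appears as soon as you pick a similarity cut-off:

    import difflib

    PAIRS = [
        ("Jon Smith", "John Smith", True),   # same person, slightly misspelled
        ("Ann Lee", "Anne Leed", False),     # different people, similar names
    ]

    for threshold in (0.75, 0.95):
        print(f"threshold {threshold}:")
        for a, b, same in PAIRS:
            predicted = difflib.SequenceMatcher(None, a, b).ratio() >= threshold
            if predicted and not same:
                print(f"  false positive: {a} / {b}")
            elif not predicted and same:
                print(f"  false negative: {a} / {b}")

    # The low threshold lets the similar-but-different pair through (a false
    # positive); the high threshold misses the genuine duplicate (a false
    # negative). Tuning trades one falsehood for the other but rarely
    # eliminates both.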

Now, back to work.

Dealing with annoying customers

No, this is not a blog post about how to handle customers who unjustly complain about everything.

This is a blog post about how to maintain high quality data in customer databases.

When doing that, some types of party entities are more difficult to handle than others. In general B2B (business) entities are more complex than B2C (consumer/citizen) entities. Some of the B2B types I have spent more time with than others are the following (a small data model sketch follows the lists):

Restaurants are some of the more demanding guests in our databases:

  • They change owner more often than most other business entities, which makes them a new legal entity each time – important for some business contexts like credit risk.
  • On the other hand, it’s the same address despite the new owner, which makes it the same entity in the eyes of other business contexts like logistics.
  • In many cases you may have one name (trade style) for the restaurant and another official name for the business – a variant of this is when the restaurant is franchised.

Public sector bodies can’t be sliced the same way as private entities:

  • Often it is hard to state whether a business partner belongs to a narrowly defined or a broadly defined unit within a governmental or local authority.
  • Public sector bodies tend to have long names that may be used with different inclusions of words, sequences of words and abbreviations of words.

Global enterprises may be seen as one or as thousands of customers:

  • The need for hierarchy management is obvious when it comes to handling data about business partners that belong to a global enterprise – risk management, 1-1 marketing, sales force automation and so on will use the same data in many different ways.
  • Company family trees are useful but treacherous. A mother and a daughter company may be very closely connected with lots of shared services, or it may be strictly a matter of ownership with no operational ties at all.
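
As a minimal sketch of one way to keep such entities straight (the class and field names are my own illustration): model the legal entity, the location and the hierarchy separately, so a restaurant changing owner gets a new legal entity while keeping its location:

    from dataclasses import dataclass

    @dataclass
    class Location:
        address: str                         # stable even when ownership changes

    @dataclass
    class LegalEntity:
        official_name: str                   # registered company name
        trade_style: str | None = None       # e.g. the restaurant's public name
        parent: "LegalEntity | None" = None  # company family tree link
        operational_ties: bool = False       # ownership-only vs shared services

    @dataclass
    class PartyRole:
        legal_entity: LegalEntity
        location: Location

    # New owner, same restaurant: credit risk sees a new party, while
    # logistics keeps delivering to the same location.
    spot = Location("Main Street 1, Springfield")
    old = PartyRole(LegalEntity("Old Owner Ltd", trade_style="Luigi's"), spot)
    new = PartyRole(LegalEntity("New Owner Ltd", trade_style="Luigi's"), spot)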

These are some of the facts of life that make it fun and non-trivial to conduct data matching and other activities in order to achieve and maintain high quality customer master data.

Data Quality in the Cloud

In my previous post I advocated that Data Quality tools in the near future will exploit the huge data resources in the cloud in order to achieve data of high quality that correctly reflect the real-world constructs to which they refer.

I am well aware that this is based on the assumption that data in the cloud are accurate, timely and so on, which is of course not always the case – for now. This will only come when a given data source has a number of subscribers who require a certain level of data quality and perhaps contribute to correcting flaws.

I tried that out right before writing this post, when I installed Google Earth on a new laptop – a journey where I shifted between being very impressed and a bit disappointed.

First, the site from which to install – going either by my position or by my OS language – guessed that I am not English speaking. Unfortunately it switched to Dutch – not Danish. Well, most Dutch words resemble either German or English, or at least urban slang, so I got through. Inside the application most text had now changed to Danish – with only a few Dutch and English labels.

Knowing that the application hadn’t learned anything about me yet, I started by typing just my street address, which is only 8 characters but globally unique: “Lerås 13” (remember: the house number comes after the street name in my part of the world). The application promptly answered with my full address as the first candidate, and clicking on it took me from high above the earth right down to where I live. Impressive.

Well, the pointer was actually 40 meters NNE of the nearest corner of our premises – and in front of our garage I could recognize the grey car I had two years ago. Disappointing.