MDM – Page 56 – Liliendahl on Data Quality

Matchback and Master Data Management

10th April 201020th March 2011Henrik Gabs LiliendahlLeave a comment

The term matchback is used by marketers for the process of determining which marketing activity that triggered a given purchase. In these times where multichannel marketing and sale is embraced by more and more companies, doing matchback is becoming more and more complicated.

The core functionality in matchback is the good old data matching, like: Does the name and address in a catalogue sending match (with a certain similarity) the name and address of a new buyer? But you also have to ask questions as: Is this buyer in fact a new buyer or did he buy before – in this channel or in another channel? Was this buyer also included in a concurrent email campaign? If private: Is the new buyer in the same household as an old buyer? If business: Does the new buyer belong to the same company family tree as the old buyer? Was the contact actually a contact at an old business customer?

Answering these questions will be a totally mess if you don’t have a solid party master data management program in place. You need to:

Store (or at least reference) all party entities from all channels in one single so called golden copy
Identify the same real world entities
Build the hierarchies necessary for current and possible future uses of data

Doing matchback is only one of many activities setting the requirements for party master data management program within an enterprise. And by the way: When that is up and running next thing you need is to manage your product master data the same way in order to make further analysis’s – and probably you also need to have a better structure and data quality with your location master data.

I keep my notes about Master Data Management here.

Enterprise Data Mashup and Data Matching

6th April 20107th July 2010Henrik Gabs Liliendahl3 Comments

A mashup is a web page or application that uses or combines data or functionality from two or many more external sources to create a new service. Mashups can be considered to have an active role in the evolution of social software and Web 2.0. Enterprise Mashups are secure, visually rich web applications that expose actionable information from diverse internal and external information sources. So says Wikipedia.

I think that Enterprise Mashups will need data matching – and data matching will improve from data mashups.

The joys and challenges of Enterprise Mashups was recently touched in the post “MDM Mashups: All the Taste with None of the Calories” by Amar Ramakrishnan of Initiate. Data needs to be cleansed and matched before being exposed in an Enterprise Mashup. An Enterprise Mashup is then a fast way to deliver Master Data Management results to the organization.

Party Data Matching has typically been done in these two often separated contexts:

Matching internal data like deduplicating and consolidating
Matching internal data against an external source like address correction and business directory matching

Increased utilization of multiple functions and multiple sources – like a mashup – will help making better matching. Some examples I have tried includes:

If you know whether an address is unique or not this information is used to settle a confidence of an individual or household duplicate.
If you know if an address is a single residence or a multiple residence (like a nursing home or campus) this information is used to settle a confidence of an individual or household duplicate.
If you know the frequency of a name (in a given country) this information is used to settle a confidence of a private, household or contact duplicate.

As many data quality flaws (not surprisingly) are introduced at data entry, mashups may help during data entry, like:

An address may be suggested from an external source.
A business entity may be picked from an external business directory.
Various rules exist in different countries for using consumer/citizen directories – why not use the best available where you do business.

Also the rise of social media adds new possibilities for mashup content during data entry, data maintenance and for other uses of MDM / Enterprise Mashups. Like it or not, your data on Facebook, Twitter and not at least LinkedIn are going to be matched and mashed up.

Which came first, the chicken or the egg?

29th March 20107th April 2012Henrik Gabs Liliendahl3 Comments

The most common symbol for Easter, which is just around the corner in countries with Christian cultural roots, is the decorated egg. What a good occasion to have a little “which came first” discussion.

So, where do you start if you want better information quality: Data Governance or Data Quality improvement?

In order to look at it exemplified with something that is known to nearly everyone’s business, let’s look at party master data where we face the ever recurring questing: What is a customer? Do you have to know the precise answer to that question (which looks like a Data Governance exercise) before correcting your party master data (which often is a Data Quality automation implementation).

I think this question is closely related to the two ways of having high quality data:

Either they are fit for their intended uses
Or they correctly represent the real-world construct to which they refer

In my eyes the first way, make data fit for their intended uses, is probably the best way if you aim for information quality in one or two silos, but the second way, alignment with the real world, is the best and less cumbersome way, if you aim for enterprise wide information quality where data are fit for current and future multiple purposes.

So, starting with Data Governance and then long way down the line applying some Data Quality automation like Data Profiling and Data Matching seems to be the way forward in if you go for intended use.

On the other hand, if you go for real world alignment it may be best that you start with some Data Profiling and Data Matching in order to realize what the state of your data is and make the first corrections towards having your party master data aligned with the real world. From there you go forward with an interactive Data Governance and Data Quality automation (never ending) journey which includes discovering what a customer role really is.

Dealing with annoying customers

21st March 201023rd June 2010Henrik Gabs Liliendahl2 Comments

No, this is not a blog post about how to handle customers that unjustly complaints about everything.

This is a blog post about how to maintain high quality data in customer databases.

When doing that, there are some types of party entities that are more difficult to handle than others. In general B2B (business) entities are more complex than B2C (consumer/citizen) entities. Some of the B2B types I have spent more time with than others are the following:

Restaurants are some of the more demanding guests in our databases:

They do change owner more often than most other business entities making them a new legal entity each time which is important for some business contexts like credit risk.
On the other hand it’s the same address despite a new owner, which makes it being the same entity in the eyes of other business contexts like logistics.
In many cases you may have a name (trade style) of the restaurant and another official name of the business – a variant of this is when the restaurant is franchised.

Public sector bodies can’t be sliced the same way as private entities:

Often it is hard to state if a business partner belongs to a narrow defined or a broader defined unit within a governmental or local authority.
Public sector bodies tend to have long names that may be used with different inclusion of words, sequence of words and abbreviations of words.

Global enterprises may be seen as one or as thousands of customers:

The need for hierarchy management is obvious when it comes to handle data about business partners that belongs to a global enterprise – risk management, 1-1 marketing, sales force automation and so on will use the same data in many different ways.
Company family trees are useful but treacherous. A mother and a daughter may be very close connected with lots of shared services or it may be a strictly matter of ownership with no operational ties at all.

These are some of the facts of life that make it fun and not trivial when you are conducting data matching and other activities in order to achieve and maintain high quality of customer master data.

What is Data Quality anyway?

17th March 201022nd July 2011Henrik Gabs Liliendahl21 Comments

The above question might seem a bit belated after I have blogged about it for 9 months now. But from time to time I ask myself some questions like:

Is Data Quality an independent discipline? If it is, will it continue to be that?

Data Quality is (or should) actually be a part of a lot of other disciplines.

Data Governance as a discipline is probably the best place to include general data quality skills and methodology – or to say all the people and process sides of data quality practice. Data Governance is an emerging discipline with an evolving definition, says Wikipedia. I think there is a pretty good chance that data quality management as a discipline will increasingly be regarded as a core component of data governance.

Master Data Management is a lot about Data Quality, but MDM could be dead already. Just like SOA. In short: I think MDM and SOA will survive getting new life from the semantic web and all the data resources in the cloud. For that MDM and SOA needs Data Quality components. Data Quality 3.0 it is.

You may then replace MDM with CRM, SCM, ERP and so on and here by extend the use of Data Quality components from not only dealing with master data but also transaction data.

Next questions: Is Data Quality tools an independent technology? If it is, will it continue to be that?

It’s clear that Data Quality technology is moving from being stand alone batch processing environments, over embedded modules to, oh yes, SOA components.

If we look at what data quality tools today actually do, they in fact mostly support you with automation of data profiling and data matching, which is probably only some of the data quality challenges you have.

In the recent years there has been a lot of consolidation in the market around Data Integration, Master Data Management and Data Quality which certainly is telling that the market need Data Quality technology as components in a bigger scheme along with other capabilities.

But also some new pure Data Quality players are established – and I think I often see some old folks from the acquired entities at these new challengers. So independent Data Quality technology is not dead and don’t seem to want to be that.

Having the right element to the left

27th February 201029th May 2012Henrik Gabs Liliendahl9 Comments

Name, address and place are core attributes in almost any database. You may atomize these attributes into smaller slices, but in doing that: Mind the sequence.

When working with data matching and party master data management some of the frequent exposed issues are:

Person name

Often a person name is split into first name and last name, but even when assigning these labels you are on slippery ground. Examples:

In some cultures like in east Asia the family name is written first and the given name is written last.
Some notations indicate that the given name isn’t the first element:
- “DUPONT Michel” is a custom French way of telling, that the family name is the first element
- “Smith, John” is an universal way of telling, that the family name is the first element

Besides that we have issues with middle names and other three part naming and having salutation, education and job titles mixed up in name fields.

Street address

Most of the world is divided into two “street address” cultures:

In the Americas you write the house number in front of street name if you are north of Rio Grande being US and CA, but you write the house number after the street name if you are south of Rio Grande being MeXico, BRazil, ARgentina and almost any other country.
In Europe you write the house number in front of street name if you are on the British Isles or in France, but you write the house number after the street name if you are in almost any other country.
The rest of the world is also divided in writing street addresses.

Besides that we have other ways of writing addresses like the block style in Japan.

Place

Most countries have a postal code system – even Ireland will have that soon.

Despite the fact that a city name in most cases can be obtained by looking up the postal code we often do store the city name anyway – for those cases that we can’t.

And if the postal code and the city name is in one string: Oh yes, in some cultures you write the city name in front of the postal code and in other cultures you do it the opposite way. And oh no: It doesn’t necessary follow the sequence of the house number and street name.

In a blog post written a while ago we also had a look into postal address hierarchy, granularity, precision and history.

Under new Master Data Management

24th February 20109th July 2010Henrik Gabs Liliendahl4 Comments

”Under new management” is a common sign in the window of a restaurant. The purpose of the sign is to tell: Yes, we know: Really bad food was served in a really bad way here. But from now on we have a new management dedicated to serve really good food in a really good way.

By the way: Restaurants are one of the more challenging business entities to handle in Party Master Data Management:

They do change owner more often than most other business entities making them a new legal entity each time which is important for some business contexts like credit risk.
On the other hand it’s the same address despite a new owner, which makes it being the same entity in the eyes of other business contexts like logistics.
In many cases you may have a name (trade style) of the restaurant and another official name of the business – a variant of this is when the restaurant is franchised.

Master Data Management is not trivial – serving restaurants or not.

Improving Master Data Management starts with the sign in the window: Yes, we know: Really bad information was served here in a really bad way. But from now on we have a new master data management dedicated to serve really good information in a really good way.

Then you may have a look at the menu. Do we have the right mix of menu items for the guests we like to serve? How are we going to govern a steady flow of fresh raw data that’s going to be prepared and selected from the menu and end up at the tables?

What about the waiters attitude? Serving is much more fun if you are proud about the dishes coming from the kitchen. It’s pleasant to bring compliments from guests back to the kitchen – not at least given along with great tips.

The information chef have to be very much concerned about the raw data quality and the tools available for what may be similar to rinsing, slicing, mixing and boiling food.

Bon appetit.

Deploying Data Matching

18th February 20102nd July 2010Henrik Gabs Liliendahl2 Comments

As discussed in my last post a core part of many Data Quality tools is Data Matching. Data Matching is about linking entities in or between databases, where these entities are not already linked with unique keys.

Data Matching may be deployed in some different ways, where I have been involved in the following ones:

External Service Provider

Here your organization sends extracted data sets to an external service provider where the data are compared and also in many cases related to other reference sources all through matching technology. The provider sends back a “golden copy” ready for uploading in your databases.

Some service provider’s uses a Data Matching tool from the market and others has developed own solutions. Many solutions grown at the providers are country specific equipped with a lot of tips and tricks learned from handling data from that country over the years.

The big advantage here is that you gain from the experience – and the reference data collection – at these providers.

Internal Processing

You may implement a data quality tool from the market and use it for comparing your own data often from disparate internal sources in order to grow the “golden copy” at home.

Many MDM (Master Data Management) products have some matching capabilities build in.

Also many leading Business Intelligence tool providers supplement the offering with a (integrated) Data Quality tool with matching capabilities as an answer to the fact, that Business Intelligence on top of duplicated data doesn’t make sense.

Embedded Technology

Many data quality tool vendors provide plug-ins to popular ERP, CRM and SCM solutions so that data are matched with existing records at the point of entry. For the most popular such solutions as SAP and MS CRM there is multiple such plug-in’s from different Data Quality technology providers. Then again many implementation houses have a favorite combination – so in that way you select the matching tool by selecting an implementation house.

SOA Components

The embedded technology is of course not optimal where you operate with several databases and the commercial bundling may also not be the actual best solution for you.

Here Service Oriented Architecture thinking helps, so that matching services are available as SOA components at any point in your IT landscape based on centralized rule setting.

Cloud Computing

Cloud computing services offered from external service providers takes the best from these two worlds into one offering.

Here the SOA component resides at the external service provider – in best case combining an advanced matching tool, rich external reference data and the tips and tricks for your particular country and industry in question.

The Myth about a Myth

8th February 201023rd June 2010Henrik Gabs Liliendahl38 Comments

A sentiment repeated again and again related to Data (Information) Quality improvement goes like this:

“It’s a myth that Data Quality improvement is all about technology”.

In fact you see the same related to a lot of other disciplines as:

“It’s a myth that Master Data Management is all about technology”.
“It’s a myth that Business Intelligence is all about technology”.
“It’s a myth that Customer Relationship Management is all about technology”.

I have a problem with that: I have never heard anyone say that DQ/MDM/BI/CRM… is all about technology and I have never seen anyone writing so.

When I make the above remark the reaction is almost always this:

“Of course not, but I have seen a lot of projects carried out as if they were all about technology – and of course they failed”.

Unquestionable true.

But the next question is then about root cause. Why did those projects seem to be all about technology? I think it was:

Poor project management or
Bad balance between business and IT involvement or
Immature technology alienating business users.

In my eyes there is no myth about that Data Quality (and a lot of other things) is all about technology. It’s a myth it’s a myth.

What’s in an eMail Address?

28th November 200928th November 2010Henrik Gabs Liliendahl11 Comments

When you are deduping, consolidating or doing identity resolution with party master data the elements that may be used includes names, postal addresses and places, phone numbers, national ID’s and eMail addresses.

Types of eMail addresses

In this post I will look closer into eMail addresses based on a general list of types of party master data.

You may divide eMail addresses into these types:

CONSUMER/CITIZEN:

This is a private eMail address belonging to an individual person.

Typical formats are myname@hotmail.com and nickname@gmail.com and name123@anymail.com

You may change your eMail address as a private person as time goes or have several such addresses at a time depending on your favourite providers of eMail services and other reasons to split your personality.

HOUSEHOLD:

A household/family may choose to have a shared eMail Address for private use.

Typical format will be xyz-family@anymail.com where the word family of course could be in a lot of different languages like famiglia-italiano@email.it

A special usage is the GROUP where two (or more) names are included like mary-and-john@anymail.com

EMPLOYEE:

This is the eMail address you are assigned as an employee (including owner) at a company.

Common formats are abc@company.com and name.name@company.com

When you change employer you also change eMail address and you may have several employers or other assignments at the same time. Also different formats like initials and full name may point to the same inbox.

DEPARTMENT:

Here the eMail address is not pointed at a particular person but some sort of a team within a company.

Formats are like sales@company.com and salg@firma.dk and vertrieb@firma.de choosing the sales team in some different languages.

Some eMail are referring to a specific FUNCTION like webmaster@company.com

BUSINESS:

This is an eMail address for the entire company.

Most common formats are info@company.com and company@company.com

INVALID:

Often a field designed for an eMail address is populated with invalid values going from obvious wrong values like XXX to harder detectable syntax errors and not existing domains.

Real world duplication

Many online services are based on registration via an eMail address assuming that one eMail represents one real world entity which of course is not the case.

Even on a service like LinkedIn where you may attach several eMail addresses to one profile you do encounter persons with obvious duplicate profiles.

Multi-channel marketing and sales

An increasing number of organisations are doing both offline and online operations today and when building enterprise wide master data hubs the eMail address becomes an more and more important element in matching party master data.

In such matching activities the eMail address can not stand alone but must be combined with the other elements as names, postal addresses, phone numbers and national ID’s upon availability.

Success in automating such processes is based on advanced algorithms in flexible and configurable solutions.

Comment or eMail me

If you also have been battling here I will be glad to have your comments here or by mail. My mail is hlsgr@mail.tele.dk and hls@omikron.net and hls@locus.dk and hls@dmpartner.dk and nordic@omikron.net

	Henrik Gabs Lilienda… on Balancing the Business Partner…
	Jeppe Thing Sørensen on Balancing the Business Partner…
	peolsolutions on MDM, Cloud, SaaS, PaaS, IaaS a…
	Henrik Gabs Lilienda… on Is the Holiday Season called C…
	Michael D. on Is the Holiday Season called C…
	Jay Ram on The Disruptive MDM List is…
	Henrik Gabs Lilienda… on The Intersection of Data Obser…
	Shanker on The Intersection of Data Obser…
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on Data Matching Efficiency
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on From Platforms to Ecosyst…
	Michael Fieg on From Platforms to Ecosyst…
	From Platforms to Ec… on What is Collaborative Product…
	From Platforms to Ec… on MDM and Knowledge Graph