Data Architecture – Page 24 – Liliendahl on Data Quality

Mixed Identities

5th July 2010 · Henrik Gabs Liliendahl

A frequent challenge when building a customer master data hub is dealing with incoming records from operational systems where the data in one record belongs to several real world entities.

One situation may be that a name contains two (or more) real world names. This situation was discussed in the post Splitting names.

Another situation may be that:

The name belongs to real world entity X
The address belongs to real world entity Y
The national identification number belongs to real world entity Z

Fortunately, most cases involve only two different real world entities, such as X and Y or Y and Z.

An example I have encountered often is when a company delivers a service through another organization. Then you may have:

The name of the third-party organization in the name column(s)
The address of the (private) end user in the address columns

Or, as I remember seeing once:

The name of the (private) end user in the name column(s)
The address of the (private) end user in the address columns
The company national identification number of the third-party organization in the national ID column

Of course, the root cause solution to this will be a better (and perhaps more complex) way of gathering master data in the operational systems. But most companies have old and not easily changeable systems running core business activities. Swapping to new systems in a rush isn’t something done lightly either. Also, data gathering may take place outside your company, making data governance much more political.

A solution downstream at the data matching gates of the master data hub may be to facilitate complex hierarchy building.

Often, however, the single customer view in the master data hub will be challenged from the start, because the data is, in some perceptions, already fit for its intended purpose of use.
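The hierarchy building at the data matching gates could look something like the following minimal Python sketch. It assumes that someone (an analyst or a matching rule) has already decided which real world entity each column describes; the names `split_mixed_record`, `Party` and `Relation`, the role labels `end_user` and `third_party`, and the record values are all hypothetical. The second example above then becomes two parties linked by a relation instead of one mixed identity.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    party_id: str
    role: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Relation:
    from_party: str
    to_party: str
    kind: str


def split_mixed_record(record_id, record, ownership, anchor_role="end_user"):
    """Split one operational record into one Party per real world entity.

    `ownership` (an assumed input) maps a column name to the role of the entity
    that the column's value actually describes. Unmapped columns go to
    `anchor_role`. Every other role is linked to the anchor party.
    """
    parties = {}
    for column, value in record.items():
        role = ownership.get(column, anchor_role)
        party = parties.setdefault(role, Party(f"{record_id}:{role}", role))
        party.attributes[column] = value
    anchor = parties.get(anchor_role)
    relations = [
        Relation(party.party_id, anchor.party_id, "delivers service to")
        for role, party in parties.items()
        if anchor is not None and role != anchor_role
    ]
    return list(parties.values()), relations


# Name and address describe the end user; the national ID describes the third party.
record = {"name": "Jane Doe", "address": "1 Sample Street", "national_id": "12345678"}
ownership = {"name": "end_user", "address": "end_user", "national_id": "third_party"}
parties, relations = split_mixed_record("R1", record, ownership)
```

Keeping the relation explicit means the end user and the third-party organization can each be matched against their own real world counterparts later.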

Citizen ID within seconds

1st June 2010 (updated 20th March 2012) · Henrik Gabs Liliendahl

Here is a picture of my grandson Jonas taken minutes after he was born. He has a ribbon around his wrist showing his citizen ID, which has just been assigned. There is even a barcode of it on the ribbon.

Now, I have mixed feelings about that. It is indeed very impersonal. But as a data quality professional I do realize that this is a way of solving a problem at the root. Duplicate master data in healthcare is a serious problem, as Dylan Jones reported last year, when he had a son, in an article on DataQualityPro.

A unique citizen ID (national identification number) assigned within seconds of birth has a lot of advantages. As said, it is a foundation for data quality in healthcare from the very start of a life. Later, when you get your first job, you hand the citizen ID to your employer and tax is collected automatically. When the rest of the money is in the bank, you are uniquely identified there. When you turn 18, you are seamlessly put on the electoral roll. Later, your marriage is merely a relation in a government database between your citizen ID and the citizen ID of your beloved.

Oh joy, Master Data Management at the very best.

Relational Data Quality

20th May 2010 (updated 19th June 2010) · Henrik Gabs Liliendahl

Most of the work related to data quality improvement I do is done with data in relational databases and is aimed at creating new relations between data. Examples (from party master data) are:

Make a relation between a postal address in a customer table and a real world address (represented in an official address dictionary).
Make a relation between a business entity in a vendor table and a real world business (represented in a business directory most often derived from an official business register).
Make a relation between a consumer in one prospect table and a consumer in another prospect table because they are considered to represent the same real world person.
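The third kind of relation can be illustrated with an exact match on a standardized key. This Python sketch is a deliberate simplification: the table layout, the `match_key` and `relate_prospects` names and the sample rows are assumptions, and real data matching would use similarity scoring rather than equal keys. Sorting the name tokens lets "Hansen, Jens" and "Jens Hansen" meet.

```python
import re
import unicodedata


def match_key(name, postal_code):
    """Crude match key: accents stripped, lower case, name tokens sorted, plus postal code."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = sorted(re.findall(r"\w+", text.lower()))
    return " ".join(tokens) + "|" + postal_code.replace(" ", "")


def relate_prospects(table_a, table_b):
    """Return (id_a, id_b) pairs of rows in two prospect tables sharing a match key."""
    index = {}
    for row in table_b:
        key = match_key(row["name"], row["postal_code"])
        index.setdefault(key, []).append(row["id"])
    return [
        (row["id"], id_b)
        for row in table_a
        for id_b in index.get(match_key(row["name"], row["postal_code"]), [])
    ]


prospects_a = [{"id": "A1", "name": "Hansen, Jens", "postal_code": "2100"}]
prospects_b = [
    {"id": "B7", "name": "Jens Hansen", "postal_code": "2100"},
    {"id": "B8", "name": "Jens Hansen", "postal_code": "8000"},
]
print(relate_prospects(prospects_a, prospects_b))  # [('A1', 'B7')]
```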

When striving for multi-purpose data quality it is often necessary to reflect further relations from the real world like:

Make a relation in a database reflecting that two (or more) persons belong to the same household (on the same real world address)
Make a relation in the database reflecting that two (or more) companies have the same (ultimate) mother.
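A minimal sketch of these two further relations, assuming each person row already carries an `address_id` from a match against an address dictionary and that mother-company links are available as a child-to-mother mapping; both structures and all identifiers are hypothetical. Grouping on address alone is naive: a multiple residence such as a nursing home would need extra rules before it is treated as one household.

```python
from collections import defaultdict


def households(persons):
    """Group person ids by a standardized address id; keep groups of two or more."""
    groups = defaultdict(list)
    for person in persons:
        groups[person["address_id"]].append(person["id"])
    return {address: ids for address, ids in groups.items() if len(ids) > 1}


def ultimate_mother(company_id, parent_of):
    """Follow mother-company links until a company without a mother is reached."""
    seen = {company_id}
    current = company_id
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            raise ValueError(f"ownership cycle detected at {current}")
        seen.add(current)
    return current


persons = [
    {"id": "P1", "address_id": "ADDR-9"},
    {"id": "P2", "address_id": "ADDR-9"},
    {"id": "P3", "address_id": "ADDR-5"},
]
parent_of = {"C3": "C2", "C2": "C1", "C5": "C4"}

print(households(persons))  # {'ADDR-9': ['P1', 'P2']}
# Two companies share an ultimate mother if the walks end at the same company.
print(ultimate_mother("C3", parent_of) == ultimate_mother("C2", parent_of))  # True
```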

Getting these relations right is fundamental for any further data quality improvement endeavors and for all the exciting business intelligence stuff. Meanwhile, you may continue to have more or less fruitful discussions on, say, the classic question: What is a customer?

But in my eyes, in relation to data quality, it doesn’t matter whether that discussion ends with a given row in your database being a customer, an old customer, a prospect or something else. Building the relations may even help you realize what that someone really is. It could be that a sporadic lead is recognized as belonging to the same household as a good customer. It could be that a vendor is recognized as a daughter company of a hot prospect. It could be that someone is recognized as fake. And you may even have some business intelligence that, based on the relations, reports a given row in a customer role in one context and in another role in another context.

Aadhar (or Aadhaar)

2nd May 2010 (updated 21st June 2010) · Henrik Gabs Liliendahl

The solution to the single most frequent data quality problem, party master data duplicates, is actually very simple: every person (and every legal entity) gets a unique identifier which is used everywhere by everyone.

Now India jumps on the bandwagon and starts assigning a unique ID to the 1.2 billion people living in India. As I understand it, the project has just been named Aadhar (or Aadhaar). Google Translate tells me this word (आधार) means base or root – please correct me if anyone knows better.

In Denmark we have had such identifiers (one for citizens and one for companies) for many years. They are not used by everyone everywhere – so you are still able to make money as a data quality professional specializing in data matching.

The main reason the unique citizen identifier is not used everywhere is, of course, privacy considerations. As for the unique company identifier, the reason is that data quality is often defined as fitness for the immediate purpose of use.

Meterencedata

26th April 2010 · Henrik Gabs Liliendahl

Today I would like to invent a new word.

The word “Meterencedata” is a combination of the two terms:

Metadata and
Reference Data

Metadata is data about data. Roughly speaking, in relation to databases and spreadsheets, metadata describes what is in the columns.

Reference Data are high-level value lists that categorize the data. Roughly speaking, in relation to databases and spreadsheets, reference data explains what is in the rows.

Data Management activities – like Data Quality improvement, Master Data Management and Data Migration – will be (and, as I have seen, are) like working in the dark if you don’t know the Metadata and the Reference Data.

Data models may look different. Some information may be understood through metadata in one model but through reference data in another.

Example:

In one data model there are three columns in a customer table, with corresponding descriptive metadata, for:
- Fixed line telephone number
- Cell phone number
- Fax number
In another data model there is a phone type reference table explaining the values in a separate phone table, a child of the customer table, with the columns:
- Phone type
- Phone number

In the latter case the original phone types may have been the classic fixed line, cell and fax, but the entries may have been extended over time as the real world changes. This model also reflects the reality of several numbers of the same type attached to a single party.
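Moving the phone information from the first model to the second shows the metadata of one model turning into the reference data of the other. A minimal Python sketch; the column names, type codes and phone numbers are made up for illustration.

```python
# Reference table of the second model: phone type codes and descriptions (hypothetical codes).
PHONE_TYPES = {
    "FIXED": "Fixed line telephone number",
    "CELL": "Cell phone number",
    "FAX": "Fax number",
}

# Metadata of the first model: which customer column holds which phone type.
WIDE_COLUMNS = {"fixed_phone": "FIXED", "cell_phone": "CELL", "fax": "FAX"}


def to_phone_rows(customer):
    """Turn the phone columns of one customer row into child rows of a phone table."""
    return [
        {
            "customer_id": customer["customer_id"],
            "phone_type": code,
            "phone_number": customer[column],
        }
        for column, code in WIDE_COLUMNS.items()
        if customer.get(column)  # skip empty columns
    ]


customer = {
    "customer_id": 42,
    "fixed_phone": "+45 11 22 33 44",
    "cell_phone": None,
    "fax": "+45 55 66 77 88",
}
for row in to_phone_rows(customer):
    print(row["phone_type"], PHONE_TYPES[row["phone_type"]], row["phone_number"])
```

In the second model a new phone type is a new row in the reference table rather than a new column, and a party with two cell phones simply gets two child rows.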

Conclusion: One man’s Metadata is another man’s Reference Data, as you don’t meet and mete out the data in equal ways.

Data Quality from the Cloud

19th April 2010 (updated 19th July 2010) · Henrik Gabs Liliendahl

One of my favorite data quality bloggers, Jim Harris, wrote a blog post this weekend called “Data, data everywhere, but where is data quality?”

I believe that data quality will be found in the cloud (not the current ash cloud, but to put it plainly: on the internet). Many of the data quality issues I encounter in my daily work with clients and partners are caused by adequate information not being available at data entry – or not being exploited. But the information needed will in most cases already exist somewhere in the cloud. The challenge ahead is how to integrate available information in the cloud into business processes.

Use of external reference data to ensure data quality is not new. Especially in Scandinavia, where I live, it has been in use for a long time because of the public sector tradition of recording data about addresses, citizens, companies and so on far more intensively than in the rest of the world. The Achilles heel, though, has always been how to integrate external data smoothly into data entry functionality and other data capture processes and, not to forget, how to ensure ongoing maintenance in order to avoid the otherwise inevitable erosion of data quality.

The drivers for increased exploitation of external data are mainly:

Accessibility, which is where the fast-growing (semantic) information store in the cloud helps – not least backed up by the worldwide tendency of governments to release public sector data
Interoperability, where an increased supply of Service Oriented Architecture (SOA) components will pave the way
Cost; the more subscribers to a certain source, the lower the price – plus many sources will simply be free

As said, smooth integration into business processes is key – or sometimes, even better, orchestrating business processes in a new way so that available and affordable information (from the cloud) is pulled into these business processes using only a minimum of costly on-premise human resources.

Beyond Home Improvement

14th April 2010 (updated 6th July 2010) · Henrik Gabs Liliendahl

During my many years in customer master data quality improvement I have worked with a lot of clients having data from several countries. In almost every case the data has been split into two pots with different priorities:

Master Data referring to domestic customers
Master Data referring to foreign customers

Even though the enterprise defines itself as an international organization, the term domestic is in many cases still assigned to the country where the headquarters is situated and where the organization was born.

Signs of this include:

Data formats are designed to fit domestic customers
Internal reference data are richer for domestic locations
External reference data services are limited to domestic customers

The high priority given to domestic data is of course natural: for historical reasons, because domestic customers are almost certainly the largest group, and because the rules are familiar to most participants in a data quality program.

If we accept that improving data quality will be reflected in an improved bottom line, there is still margin to gain by not stopping once you have optimal procedures for domestic data.

One easy way of dealing with this is to apply general formats, services and rules that may work for data from all over the world, and in some cases this approach may be the best considering costs and benefits.

But I have no doubt that achieving the best data quality with customer master data is done by exploiting the specific opportunities that exist for each country / culture.

Examples are:

The completeness and depth of address (location) data available in each country is very different – so are the rules of the postal services operating there
Public sector company and citizen registration practice also differs, which is why the quality of external reference data differs, and so do the rules of access to the data
Using local character sets, script systems, naming conventions and addressing formats besides (or instead of) those that apply in the headquarters country helps data quality through real world alignment
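The trade-off between general and country-specific rules can be sketched with postal codes. The per-country patterns below are simplified assumptions (real rules would come from postal reference data), and the names `POSTAL_CODE_RULES`, `GENERAL_RULE` and `valid_postal_code` are hypothetical. Note how permissive the general fallback has to be to work everywhere.

```python
import re

# Simplified, hypothetical per-country rules; real rules would come from postal reference data.
POSTAL_CODE_RULES = {
    "DK": re.compile(r"\d{4}"),            # e.g. 2100
    "SE": re.compile(r"\d{3} ?\d{2}"),     # e.g. 114 55
    "NL": re.compile(r"\d{4} ?[A-Z]{2}"),  # e.g. 1012 AB
    "US": re.compile(r"\d{5}(-\d{4})?"),   # ZIP or ZIP+4
}
# General fallback for countries without a specific rule: accepts almost anything.
GENERAL_RULE = re.compile(r"[A-Z0-9][A-Z0-9 -]{1,9}")


def valid_postal_code(country_code, postal_code):
    """Validate with the country-specific rule if one exists, else with the general one."""
    rule = POSTAL_CODE_RULES.get(country_code, GENERAL_RULE)
    return rule.fullmatch(postal_code.strip().upper()) is not None


print(valid_postal_code("DK", "21000"))  # False: Danish postal codes have four digits
print(valid_postal_code("NL", "1012 ab"))  # True
```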

My guess is that in the near future we will see services in the cloud helping us make the global village come true for master data quality as well.

Matchback and Master Data Management

10th April 2010 (updated 20th March 2011) · Henrik Gabs Liliendahl

The term matchback is used by marketers for the process of determining which marketing activity triggered a given purchase. In these times, when multichannel marketing and sales are embraced by more and more companies, doing matchback is becoming more and more complicated.

The core functionality in matchback is good old data matching, like: Does the name and address in a catalogue mailing match (with a certain similarity) the name and address of a new buyer? But you also have to ask questions such as: Is this buyer in fact a new buyer, or did he buy before – in this channel or in another channel? Was this buyer also included in a concurrent email campaign? If private: Is the new buyer in the same household as an old buyer? If business: Does the new buyer belong to the same company family tree as the old buyer? Was the contact actually a contact at an old business customer?
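The first matching question can be sketched with Python's standard `difflib.SequenceMatcher`, used here only as a stand-in for a proper name and address matching engine. The `matchback` function, the 0.85 threshold and the sample mailings are assumptions for illustration; the household, company family tree and cross-channel questions need more than string similarity.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between 0 and 1 of two name/address strings, ignoring case and spacing."""
    def normalize(s):
        return " ".join(s.lower().split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def matchback(buyer, mailings, threshold=0.85):
    """Return (campaign, score) for the most similar mailing at or above threshold, else None."""
    best = None
    for mailing in mailings:
        score = similarity(buyer, mailing["recipient"])
        if score >= threshold and (best is None or score > best[1]):
            best = (mailing["campaign"], score)
    return best


mailings = [
    {"campaign": "Spring catalogue", "recipient": "Jens Hansen, Strandvejen 10, 2100 Copenhagen"},
    {"campaign": "Email offer", "recipient": "Anne Jensen, Vestergade 5, 8000 Aarhus"},
]
# A slightly different spelling of the street still points to the catalogue mailing.
print(matchback("Jens Hansen, Strandvej 10, 2100 Copenhagen", mailings))
```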

Answering these questions will be a total mess if you don’t have a solid party master data management program in place. You need to:

Store (or at least reference) all party entities from all channels in one single so-called golden copy
Identify the same real world entities
Build the hierarchies necessary for current and possible future uses of data

Doing matchback is only one of many activities setting the requirements for a party master data management program within an enterprise. And by the way: when that is up and running, the next thing you need is to manage your product master data the same way in order to make further analyses – and you probably also need better structure and data quality in your location master data.

I keep my notes about Master Data Management here.

Enterprise Data Mashup and Data Matching

6th April 2010 (updated 7th July 2010) · Henrik Gabs Liliendahl

A mashup is a web page or application that uses or combines data or functionality from two or many more external sources to create a new service. Mashups can be considered to have an active role in the evolution of social software and Web 2.0. Enterprise Mashups are secure, visually rich web applications that expose actionable information from diverse internal and external information sources. So says Wikipedia.

I think that Enterprise Mashups will need data matching – and data matching will improve from data mashups.

The joys and challenges of Enterprise Mashups were recently touched upon in the post “MDM Mashups: All the Taste with None of the Calories” by Amar Ramakrishnan of Initiate. Data needs to be cleansed and matched before being exposed in an Enterprise Mashup. An Enterprise Mashup is then a fast way to deliver Master Data Management results to the organization.

Party Data Matching has typically been done in these two often separated contexts:

Matching internal data like deduplicating and consolidating
Matching internal data against an external source like address correction and business directory matching

Increased utilization of multiple functions and multiple sources – like a mashup – will help make better matching. Some examples I have tried include:

If you know whether an address is unique or not, this information is used to set the confidence of an individual or household duplicate.
If you know whether an address is a single residence or a multiple residence (like a nursing home or campus), this information is used to set the confidence of an individual or household duplicate.
If you know the frequency of a name (in a given country), this information is used to set the confidence of a private, household or contact duplicate.
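How such extra knowledge could move a match confidence is sketched below. Everything here is an assumption made for illustration: the `adjust_confidence` name, the reading of "unique address" as "resolved to exactly one entry in the address reference data", and all weights and cut-offs, which are not calibrated values.

```python
def adjust_confidence(base, address_is_unique, multi_residence, name_frequency):
    """Adjust a duplicate-match confidence score (0 to 100) using contextual facts.

    address_is_unique: assumed to mean the address resolved to exactly one
        entry in the address reference data.
    multi_residence: the address is, for example, a nursing home or campus.
    name_frequency: share of the country's population carrying the name (0 to 1).
    All weights and cut-offs below are illustrative, not calibrated values.
    """
    score = base
    if address_is_unique:
        score += 10
    if multi_residence:
        score -= 20  # many unrelated people share the address
    if name_frequency > 0.01:
        score -= 15  # common name: agreement is weak evidence
    elif name_frequency < 0.0001:
        score += 10  # rare name: agreement is strong evidence
    return max(0, min(100, score))  # keep the score within 0 to 100


# Same common name at a nursing home address: confidence drops.
print(adjust_confidence(80, address_is_unique=True, multi_residence=True, name_frequency=0.02))  # 55
```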

As many data quality flaws (not surprisingly) are introduced at data entry, mashups may help during data entry, like:

An address may be suggested from an external source.
A business entity may be picked from an external business directory.
Various rules exist in different countries for using consumer/citizen directories – why not use the best available where you do business?
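Suggesting an address at data entry boils down to a prefix lookup as the user types. In practice this would be a call to an external address service; the in-memory `ADDRESS_DIRECTORY` and the `suggest_addresses` function below are hypothetical stand-ins, and the addresses are invented.

```python
from bisect import bisect_left

# Hypothetical stand-in for an external address source; in practice a service call.
ADDRESS_DIRECTORY = sorted(
    [
        "harbour road 1, 1000 sampletown",
        "harbour road 3, 1000 sampletown",
        "hill street 7, 2000 othertown",
    ]
)


def suggest_addresses(typed, directory=ADDRESS_DIRECTORY, limit=5):
    """Return up to `limit` directory entries that start with what the user has typed."""
    prefix = typed.strip().lower()
    results = []
    # In a sorted list, all entries sharing a prefix sit next to each other.
    for entry in directory[bisect_left(directory, prefix):]:
        if not entry.startswith(prefix) or len(results) == limit:
            break
        results.append(entry)
    return results


print(suggest_addresses("Harbour"))
```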

Also, the rise of social media adds new possibilities for mashup content during data entry, data maintenance and other uses of MDM / Enterprise Mashups. Like it or not, your data on Facebook, Twitter and not least LinkedIn are going to be matched and mashed up.

Dealing with annoying customers

21st March 2010 (updated 23rd June 2010) · Henrik Gabs Liliendahl

No, this is not a blog post about how to handle customers who unjustly complain about everything.

This is a blog post about how to maintain high quality data in customer databases.

When doing that, some types of party entities are more difficult to handle than others. In general, B2B (business) entities are more complex than B2C (consumer/citizen) entities. Some of the B2B types I have spent more time with than others are the following:

Restaurants are some of the more demanding guests in our databases:

They change owners more often than most other business entities, becoming a new legal entity each time, which is important for some business contexts like credit risk.
On the other hand, it’s the same address despite a new owner, which makes it the same entity in the eyes of other business contexts like logistics.
In many cases you may have a name (trade style) of the restaurant and another official name of the business – a variant of this is when the restaurant is franchised.
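The restaurant case shows that "same entity" depends on the business context. A minimal sketch, assuming each record carries both the trade style and the legal entity's registration number; the `Restaurant` fields, the context names and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restaurant:
    trade_style: str      # the name on the door
    legal_name: str       # official name of the owning business
    registration_no: str  # identifier of the legal entity
    address: str


def same_entity(a, b, context):
    """Decide whether two records are the same entity in a given business context."""
    if context == "credit_risk":
        return a.registration_no == b.registration_no  # new owner means new legal entity
    if context == "logistics":
        return a.address == b.address  # deliveries still go to the same door
    raise ValueError(f"unknown context: {context}")


before = Restaurant("Cafe Nord", "Old Owner Ltd", "11111111", "Harbour Road 1, 1000 Sampletown")
after = Restaurant("Cafe Nord", "New Owner Ltd", "22222222", "Harbour Road 1, 1000 Sampletown")
print(same_entity(before, after, "logistics"), same_entity(before, after, "credit_risk"))  # True False
```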

Public sector bodies can’t be sliced the same way as private entities:

Often it is hard to state whether a business partner belongs to a narrowly defined or a more broadly defined unit within a governmental or local authority.
Public sector bodies tend to have long names that may be used with different inclusion of words, sequence of words and abbreviations of words.
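Long public sector names with varying word order and abbreviations can be compared as token sets rather than strings. A minimal sketch, assuming tiny hypothetical abbreviation and stop word lists (real ones would be built per country and language); the score is the Jaccard similarity of the token sets. The partial overlap in the second call also hints at the narrow versus broad unit question.

```python
import re

# Hypothetical, tiny lists; real ones would be built per country and language.
ABBREVIATIONS = {"dept": "department", "min": "ministry", "admin": "administration"}
STOP_WORDS = {"of", "the", "for", "and"}


def name_tokens(name):
    """Lower-case words with abbreviations expanded and stop words dropped."""
    words = (ABBREVIATIONS.get(w, w) for w in re.findall(r"\w+", name.lower()))
    return {w for w in words if w not in STOP_WORDS}


def token_overlap(a, b):
    """Jaccard similarity of the two token sets, ignoring word order."""
    ta, tb = name_tokens(a), name_tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(token_overlap("Ministry of Taxation, Dept. of Customs", "Customs Department, Ministry of Taxation"))  # 1.0
print(token_overlap("Ministry of Taxation", "Customs Department, Ministry of Taxation"))  # 0.5
```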

Global enterprises may be seen as one or as thousands of customers:

The need for hierarchy management is obvious when it comes to handling data about business partners that belong to a global enterprise – risk management, 1-1 marketing, sales force automation and so on will use the same data in many different ways.
Company family trees are useful but treacherous. A mother and a daughter may be very closely connected, with lots of shared services, or it may be strictly a matter of ownership with no operational ties at all.

These are some of the facts of life that make it fun and not trivial when you are conducting data matching and other activities in order to achieve and maintain high quality of customer master data.
