MDM – Page 57 – Liliendahl on Data Quality

Who is working where doing what?

8th November 2009, updated 24th July 2010, by Henrik Gabs Liliendahl

A classic core data model for B2B Master Data in CRM databases and Master Data hubs has two layers:

Accounts being the BUSINESS entities who are your customers, prospects and all kinds of other business partners
Contacts being the EMPLOYEEs working there, acting in roles such as decision makers, influencers, gatekeepers, users and so on, and having some kind of job title
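To make the two layers concrete, here is a minimal Python sketch of the Account/Contact model. The class names follow the text, but the fields `relationship`, `job_title` and `role`, as well as the sample company and person, are hypothetical choices for illustration, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Contact:
    """Contact layer: an EMPLOYEE working at the account."""
    name: str
    job_title: Optional[str] = None
    role: Optional[str] = None  # hypothetical, e.g. "decision maker", "influencer", "gatekeeper", "user"

@dataclass
class Account:
    """Account layer: a BUSINESS entity such as a customer, prospect or other business partner."""
    name: str
    relationship: str  # hypothetical, e.g. "customer", "prospect", "supplier"
    contacts: List[Contact] = field(default_factory=list)

# Hypothetical sample data
account = Account("Example Trading Ltd", "prospect")
account.contacts.append(Contact("Jane Doe", job_title="CFO", role="decision maker"))
print(account)
```

Keeping contacts as a list under the account mirrors the one-to-many relationship between the BUSINESS and its EMPLOYEEs; external reference data can then be attached at either layer.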

Establishing and maintaining optimal data quality in B2B records is often done by integrating with external reference data.

Sources for the account layer have been available for many years as business directories. The D&B Worldbase is one example, but there are plenty around with varying scopes. The directories offered by service providers often also cover the contact layer. But timeliness has always been a problem, and depth (completeness) has been limited, not least for large business entities. So in most cases I have witnessed, only the account layer has been integrated with external reference data, while the use of external contact layer data has been limited to new market campaigns (with varying results).

With the rise of social network sites, information about employees has become more or less available to anyone. The last time I checked LinkedIn (mid-October), the ratio of profiles to population was:

Denmark had 435,628 profiles, population 5,519,441 giving a ratio of 7.89 %.
Netherlands had 1,278,927 profiles, population 16,500,156 giving a ratio of 7.75 %.
USA had 23,089,079 profiles, population 307,698,000 giving a ratio of 7.50 %.
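The quoted percentages follow directly from the counts; this small Python check recomputes them as profiles divided by population, times 100, using only the figures given above.

```python
# LinkedIn profile counts and populations as quoted in the list above
figures = {
    "Denmark": (435_628, 5_519_441),
    "Netherlands": (1_278_927, 16_500_156),
    "USA": (23_089_079, 307_698_000),
}

# Ratio of profiles to population, in percent
ratios = {country: profiles / population * 100
          for country, (profiles, population) in figures.items()}

for country, ratio in ratios.items():
    print(f"{country}: {ratio:.2f} %")
```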

Other countries I checked had lower ratios but fast-increasing numbers. All in all, a formidable source of reference data for the contact layer.

Of course there are data quality issues with social networking sites. Data are maintained by the persons themselves, which most often means good timeliness and validity, but sometimes also exaggeration and deceit. And yes, there are duplicate profiles.

Social CRM is already hot stuff. Social MDM, in the sense of exploiting social network reference data, will follow.

Slowly Changing Hierarchies

4th November 2009, updated 23rd June 2010, by Henrik Gabs Liliendahl

The term “slowly changing dimensions” is known from building data warehouses and attempting to make sense of data with business intelligence using reference data.

The world is changing all the time, and this is just as present in Master Data Management, notably in the essential hierarchy building that takes place when structuring these data.

Company family trees are a common hierarchy structure in Master Data. One source of information about company family trees is the D&B Worldbase – a database operated by Dun & Bradstreet holding over 150 million business entities from all over the world.
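One way to keep a company family tree from silently drifting is to borrow the Type 2 idea from slowly changing dimensions: instead of overwriting a parent link, close the current version and open a new one, so the tree can be queried as of any date. The sketch below assumes a simple list of child-to-parent links; the class names and the entities and dates in the usage are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class ParentLink:
    """One version of a child-to-parent link in a company family tree."""
    child: str
    parent: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the link is still current

class FamilyTree:
    def __init__(self) -> None:
        self.links: List[ParentLink] = []

    def set_parent(self, child: str, parent: str, as_of: date) -> None:
        # Close the currently open link (if any) instead of overwriting it.
        for link in self.links:
            if link.child == child and link.valid_to is None:
                link.valid_to = as_of
        self.links.append(ParentLink(child, parent, as_of))

    def parent_of(self, child: str, on: date) -> Optional[str]:
        # A link is valid on dates in [valid_from, valid_to).
        for link in self.links:
            if link.child == child and link.valid_from <= on and (
                link.valid_to is None or on < link.valid_to
            ):
                return link.parent
        return None

# Hypothetical usage
tree = FamilyTree()
tree.set_parent("Subsidiary A", "Group X", date(2005, 1, 1))
tree.set_parent("Subsidiary A", "Group Y", date(2008, 6, 1))
print(tree.parent_of("Subsidiary A", date(2007, 1, 1)))  # Group X
print(tree.parent_of("Subsidiary A", date(2009, 1, 1)))  # Group Y
```

Keeping the history means an ownership change is maintained rather than lost, which is the kind of ongoing maintenance the golden view needs.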

I used to have Dun & Bradstreet as a customer. I don’t have that anymore, but I’m still working on the very same project. Since I started this assignment, US-based Dun & Bradstreet has handed over its operation in a range of European countries to the Swedish publishing group Bonnier, which later handed it over to the Swedish company Bisnode. I started the project when I worked for the Swedish consultancy group Sigma, continued in my Danish sole proprietorship, and now serve Bisnode through the German data quality tool vendor Omikron. Slowly changing relationships indeed.

As with many other data quality activities, establishing the “golden view”, “the single version of the truth”, is only the beginning. If that golden view is not kept under ongoing maintenance, the shiny gold will fade, slowly but steadily.

360° Business Partner View

1st November 2009, updated 6th July 2010, by Henrik Gabs Liliendahl

Having a 360° customer view is a well-established term in CRM and Master Data Management. It is typically defined as “providing everyone in the organization with a consistent view of the customer.”

However, some organizations don’t use the term customer, but other words such as:

Citizen is the common term in public sector organizations when dealing with private persons
Patient is used in healthcare and the customer/citizen balance is different between countries around the world
Member is used in membership organizations, such as fundraising organizations and those organizing employers and employees

In my eyes, the concept of a 360° customer view is easily swapped with a 360° citizen / patient / member view.

Related to the position in the pipeline, we also have words such as:

Prospect being an entity with whom we have a 1-1 dialogue about becoming a customer
Lead being an entity we want to engage in such a dialogue

I think embracing prospects and leads is a must for a 360° customer view. Having the same real-world object acting as a customer and as a prospect/lead at the same time doesn’t make sense.

Hierarchy is of course important here, as the customer and the prospect or lead may belong to the same hierarchy but at different levels, or be related only when seen at a higher level. This is true for:

Households in B2C operations
Company family trees in B2B operations
Multiple employee engagements in B2B operations
Small business owners in B2B and B2C coexisting environments

Organizations also have suppliers. In a B2B organization, the intersection of business partners that are customers / prospects / leads and also suppliers may be surprisingly large. Typically the intersection is not that large at branch level, but it grows if we look at the ultimate global parent level.
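The effect of the hierarchy level can be sketched by rolling every business partner up to its ultimate parent before intersecting the customer and supplier sets. This Python sketch assumes the family tree is available as a simple child-to-parent mapping; all entity names are hypothetical.

```python
from typing import Dict, Set

def ultimate_parent(entity: str, parent_of: Dict[str, str]) -> str:
    """Walk up the family tree until an entity without a parent is reached."""
    seen = set()
    while entity in parent_of and entity not in seen:  # 'seen' guards against cyclic data
        seen.add(entity)
        entity = parent_of[entity]
    return entity

def intersection_at_top(customers: Set[str], suppliers: Set[str],
                        parent_of: Dict[str, str]) -> Set[str]:
    top_customers = {ultimate_parent(c, parent_of) for c in customers}
    top_suppliers = {ultimate_parent(s, parent_of) for s in suppliers}
    return top_customers & top_suppliers

# Hypothetical family tree and partner lists
parent_of = {
    "Branch North": "Regional Holding",
    "Regional Holding": "Global Group",
    "Branch South": "Global Group",
}
customers = {"Branch North", "Other Ltd"}
suppliers = {"Branch South"}

print(customers & suppliers)                                 # set()
print(intersection_at_top(customers, suppliers, parent_of))  # {'Global Group'}
```

At branch level the two sets do not overlap, while at the ultimate parent level the same partner shows up on both sides.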

From my point of view, a 360° customer view in B2B should be built on consolidated customer and supplier hierarchies. Even in B2C, a private customer may be a business owner or key employee at a supplier.

Employees are another master data entity that may intersect with customers and suppliers. An employee who is (or is the spouse of) a business owner at a small business supplier is a classic cause of trouble. I have seen situations where a 360° customer view could include employee entities.

Other Business Partner entities exist depending on industry and specific business operations, and a 360° customer view would benefit from covering these other real-world party entities as well.

I think Data Matching and/or upstream prevention by error-tolerant search have a busy near future.

Master Data Survivorship

28th October 2009, updated 2nd July 2010, by Henrik Gabs Liliendahl

A Master Data initiative is often described as making a “golden view” of all Master Data records held by an organization in various databases used by different applications serving a range of business units.

In doing that (either in the initial consolidation or the ongoing insertion and update) you will time and again encounter situations where two versions of the same element must be merged into one version of the truth.

In some MDM hub styles, the decision is taken at consolidation time; in other styles, the decision is postponed until the data (links) are consumed in a given context.

In the following, I will talk about Party Master Data, the most common entity in Master Data initiatives.

This spring, Jim Harris wrote a brilliant series of articles on DataQualityPro about identifying duplicate customers, ending with part 5 on survivorship. There, Jim describes all the basic considerations on how some data elements survive a merge/purge while others are forgotten, and gives good examples with US consumers/citizens.

Taking it from there, Master Data projects may face the following additional challenges and opportunities:

Global Data adds diversity to the rule set for consolidating data at record level as well as field level. You will have to compromise between simple global rules and complex optimized rules (with supporting knowledge data) for each country/culture.
Multiple types of Party Master Data must be handled when Business Partners include business entities having departments and employees, and not least when these are present together with consumers/citizens.
External Reference Data is becoming more and more common as part of MDM solutions, adding valid, accurate and complete information about Business Partners. Here you have to set rules (at field level) for whether they override internal data, fill in the blanks or only supplement internal data.
Hierarchy building is closely related to survivorship. Rules may be set for whether two entities go into two hierarchies with surviving parts from both, or merge into one with survivorship. Even a single original entity may be split into two hierarchies with surviving parts.
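The three field-level options for external reference data (override, fill in the blanks, supplement) can be sketched as a small rule table. This Python sketch is illustrative only: the `Rule` names, the `_previous` and `_external` key suffixes, and the sample field values are hypothetical. The override branch also keeps the old internal value, in line with the aim of not losing valuable information.

```python
from enum import Enum
from typing import Dict

class Rule(Enum):
    OVERRIDE = "override"        # external value replaces the internal one
    FILL_BLANKS = "fill_blanks"  # external value used only where internal is empty
    SUPPLEMENT = "supplement"    # external value stored next to the internal one

def survive(internal: Dict[str, str], external: Dict[str, str],
            rules: Dict[str, Rule]) -> Dict[str, str]:
    result = dict(internal)
    for fld, rule in rules.items():
        ext = external.get(fld)
        if not ext:
            continue  # nothing to survive from the external source
        current = result.get(fld)
        if rule is Rule.OVERRIDE:
            if current and current != ext:
                result[f"{fld}_previous"] = current  # keep the overridden internal value
            result[fld] = ext
        elif rule is Rule.FILL_BLANKS:
            if not current:
                result[fld] = ext
        elif rule is Rule.SUPPLEMENT:
            result[f"{fld}_external"] = ext
    return result

# Hypothetical records
internal = {"name": "Local Charity", "phone": "", "segment": "Donor"}
external = {"name": "Anytown Angels", "phone": "12345678", "segment": "Charity"}
rules = {"name": Rule.OVERRIDE, "phone": Rule.FILL_BLANKS, "segment": Rule.SUPPLEMENT}
print(survive(internal, external, rules))
```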

What is essential in survivorship is not losing any valuable information while not creating information redundancy.

An example of complex survivorship processing may be this:

A membership database holds the following record (Name, Address, City):

Margaret & John Smith, 1 Main Street, Anytown

An eShop system has the following accounts (Name, Address, Place):

Mrs Margaret Smith, 1 Main Str, Anytown
Peggy Smith, 1 Main Street, Anytown
Local Charity c/o Margaret Smith, 1 Main Str, Anytown
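Before any survivorship can happen, the records must be linked. A minimal Python sketch of one ingredient is a normalized match key that removes titles, expands street abbreviations and maps nicknames, so that “Mrs Margaret Smith, 1 Main Str” and “Peggy Smith, 1 Main Street” produce the same key. The tiny reference lists are hypothetical; real ones are country/culture specific. The key is only for linking; the original values (such as “aka Peggy”) are kept for the golden view. Records like “Margaret & John Smith” or “Local Charity c/o Margaret Smith” would first need name splitting.

```python
import re
from typing import List, Tuple

# Hypothetical, tiny reference lists; real ones are country/culture specific.
STREET_ABBREVIATIONS = {"str": "street"}
NICKNAMES = {"peggy": "margaret"}
TITLES = {"mr", "mrs", "ms"}

def _tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())

def normalize_person(name: str) -> str:
    return " ".join(NICKNAMES.get(t, t) for t in _tokens(name) if t not in TITLES)

def normalize_address(address: str) -> str:
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in _tokens(address))

def match_key(name: str, address: str, city: str) -> Tuple[str, str, str]:
    return normalize_person(name), normalize_address(address), city.strip().lower()

print(match_key("Mrs Margaret Smith", "1 Main Str", "Anytown"))
print(match_key("Peggy Smith", "1 Main Street", "Anytown"))
```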

A complex process of consolidation, including survivorship, may take place. As part of this example, the company Local Charity is matched with an external source telling that it has a new name: Anytown Angels. The result may be this “golden view”:

ADDRESS in Anytown on Main Street no 1 having
• HOUSEHOLD having
– CONSUMER Mrs. Margaret Smith aka Peggy
– CONSUMER Mr. John Smith
• BUSINESS Anytown Angels having
– EMPLOYEE Mrs. Margaret Smith aka Peggy

Observe that everything survives: in a globally applicable structure, in a fitting hierarchy reflecting local rules, handling multiple types of party entities and using external reference data.

But OK, we didn’t have funny names, dirt or misplaced data...

Splitting names

21st October 2009, updated 5th July 2010, by Henrik Gabs Liliendahl

When working through a list of names for deduplication, consolidation or identity resolution, you will meet name fields populated like these:

Margaret & John Smith
Margaret Smith. John Smith
Maria Dolores St. John Smith
Johnson & Johnson Limited
Johnson & Johnson Limited, John Smith
Johnson Furniture Inc., Sales Dept
Johnson, Johnson and Smith Sales Training

Some of the entities having these names must be split into two entities before we can do the proper processing.

When you as a human look at a name field, you mostly (given that you share the same culture) know what it is about.

Making a computer program that does the same is an exciting but fearful journey.

The techniques I have been working with include the following:

String manipulation
Lookup in lists of words such as given names, family names, titles, “business words” and special characters. These are country/culture specific.
Matching with address directories, used for checking if the address is a private residence or a business address.
Matching with business directories, used for checking if it is in fact a business name, and which part of the name string is not included in the matched business name.
Matching with consumer/citizen directories, used for checking which names are known on an address.
Probabilistic learning, storing and looking up previous human decisions.

As with other computer-supported data quality processes, I have found it useful to have the computer divide the names into 3 pots:

A: The ones the computer may split automatically with an accepted failure rate of false positives
B: The dubious ones, selected for human inspection
C: The clean ones where the computer has found no reason to split (with an accepted failure rate of false negatives)
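The three pots can be sketched as two thresholds on a split score. In this Python sketch the score (a probability that a name should be split, coming from the techniques listed above) and the default thresholds 0.9 and 0.2 are hypothetical tuning knobs: raising `auto_split_at` lowers the false positive rate in pot A at the price of more work in pot B, and lowering `keep_below` does the same for false negatives in pot C.

```python
def assign_pot(split_probability: float, auto_split_at: float = 0.9,
               keep_below: float = 0.2) -> str:
    """Sort a name into pot A (split automatically), B (human inspection) or C (leave as is)."""
    if not 0.0 <= keep_below <= auto_split_at <= 1.0:
        raise ValueError("expected 0 <= keep_below <= auto_split_at <= 1")
    if split_probability >= auto_split_at:
        return "A"  # accepted risk: false positives (wrong splits)
    if split_probability <= keep_below:
        return "C"  # accepted risk: false negatives (missed splits)
    return "B"      # dubious: route to a human

for p in (0.97, 0.55, 0.05):
    print(p, assign_pot(p))
```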

For the listed names a suggestion for the golden single version of the truth could be:

“Margaret & John Smith” will be split into CONSUMER “Margaret Smith” and CONSUMER “John Smith”
“Margaret Smith. John Smith” will be split into CONSUMER “Margaret Smith” and CONSUMER “John Smith”
“Maria Dolores St. John Smith” stays as CONSUMER “Maria Dolores St. John Smith”
“Johnson & Johnson Limited” stays as BUSINESS “Johnson & Johnson Limited”
“Johnson & Johnson Limited, John Smith” will be split into BUSINESS “Johnson & Johnson Limited” having EMPLOYEE “John Smith”
“Johnson Furniture Inc., Sales Dept” will be split into BUSINESS “Johnson Furniture Inc.” having DEPARTMENT “Sales Dept”
“Johnson, Johnson and Smith Sales Training” stays as BUSINESS “Johnson, Johnson and Smith Sales Training”
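The string manipulation and word list lookup techniques can be sketched in Python for exactly the names above. This is a toy rule set, not a general name parser: the word lists are hypothetical and far too small, and the abbreviation list is what keeps “St.” in “Maria Dolores St. John Smith” from being read as a separator. A lone given name borrows the family name of the last person, as in “Margaret & John Smith”.

```python
from typing import List, Tuple

# Hypothetical, tiny word lists; real lists are country/culture specific.
GIVEN_NAMES = {"margaret", "john", "maria"}
BUSINESS_WORDS = {"limited", "ltd", "inc", "inc.", "furniture", "training"}
DEPARTMENT_WORDS = {"dept", "department"}
ABBREVIATIONS = {"st.", "mr.", "mrs."}  # periods that do not end a name

def words(text: str) -> List[str]:
    return text.lower().replace(",", " ").split()

def is_business(text: str) -> bool:
    return any(w in BUSINESS_WORDS for w in words(text))

def is_department(text: str) -> bool:
    return any(w in DEPARTMENT_WORDS for w in words(text))

def is_person(text: str) -> bool:
    ws = words(text)
    return bool(ws) and ws[0] in GIVEN_NAMES and not is_business(text)

def split_persons(name: str) -> List[str]:
    """Split on '&' or on a period ending a word that is not a known abbreviation."""
    parts, current = [], []
    for token in name.split():
        if token == "&":
            parts.append(current)
            current = []
        elif token.endswith(".") and token.lower() not in ABBREVIATIONS:
            current.append(token[:-1])
            parts.append(current)
            current = []
        else:
            current.append(token)
    parts.append(current)
    return [" ".join(p) for p in parts if p]

def split_name(name: str) -> List[Tuple[str, str]]:
    head, sep, tail = name.rpartition(",")
    head, tail = head.strip(), tail.strip()
    # "<business>, <department or person>"
    if sep and is_business(head) and not is_business(tail):
        if is_department(tail):
            return [("BUSINESS", head), ("DEPARTMENT", tail)]
        if is_person(tail):
            return [("BUSINESS", head), ("EMPLOYEE", tail)]
    if is_business(name):
        return [("BUSINESS", name)]
    persons = split_persons(name)
    if len(persons) > 1 and all(is_person(p) for p in persons):
        family_name = persons[-1].split()[-1]
        return [("CONSUMER", p if " " in p else f"{p} {family_name}") for p in persons]
    if is_person(name):
        return [("CONSUMER", name)]
    return [("UNKNOWN", name)]  # a candidate for pot B

for n in ["Margaret & John Smith", "Johnson Furniture Inc., Sales Dept"]:
    print(n, "->", split_name(n))
```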

For further explanation of the Master Data Types BUSINESS, CONSUMER, DEPARTMENT, EMPLOYEE you may have a look here.

Business Rules and Duplicates

10th October 2009, updated 10th October 2010, by Henrik Gabs Liliendahl

When finding or avoiding duplicates, or doing similar kinds of consolidation with party master data, you will encounter lots of situations where it is disputable what to do.

The “politically correct” answer is: It depends on your business rules.

Yeah, right. Easier said than done.

Often you face the following:

Business rules don’t exist. Decisions are based on common sense.
Business rules differ between data providers.

Let’s take an example.

We have these business rules (Owner, Brief):

Finance, No sales and deliveries to dissolved business entities

Logistics, Access to premises must be stated in Address2 if different from Address1

Sales, Every event must be registered with an active contact

Customer Service, In case of duplicate contacts the contact with the first event date wins

In a CRM system we have these 2 accounts (AccountID, CompanyName, Address1, Address2, City):

1, Restaurant San Remo, 2 Main Street, entrance thru no 4, Anytown

2, Ristorante San Remo, 2 Main Street, , Anytown

Also we have some contacts (AccountID, ContactID, JobTitle, ContactName, Status, StartYear, EventCount):

1, 1, Manager, Luigi Calda, Inactive, 2001, 2

1, 2, Chef de la Cusine, John Hothead, Active, 2002, 87

2, 1, Chef de la Cuisine, John Hothead, Duplicate, 2008, 2

2, 2, Owner, Gordon Testy, Active, 2008, 7

Luckily, a business directory is now available. Here we have (NationalID, Name, Address, City, Owner, Status):

3, Ristorante San Remo, 2 Main Street, Anytown, Luigi Calda, Dissolved

4, Ristorante San Remo, 2 Main Street, Anytown, Gordon Testy, Active
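To see why no golden view falls out, the Customer Service, Finance and Sales rules can be evaluated mechanically against the data above. This Python sketch makes two assumptions not stated in the text: an account is linked to the directory entry whose Owner appears among its contacts, and StartYear stands in for the first event date. The Logistics rule is left out. The function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Contact:
    account_id: int
    contact_id: int
    job_title: str
    name: str
    status: str
    start_year: int
    event_count: int

# Contacts and directory entries exactly as listed above
CONTACTS = [
    Contact(1, 1, "Manager", "Luigi Calda", "Inactive", 2001, 2),
    Contact(1, 2, "Chef de la Cusine", "John Hothead", "Active", 2002, 87),
    Contact(2, 1, "Chef de la Cuisine", "John Hothead", "Duplicate", 2008, 2),
    Contact(2, 2, "Owner", "Gordon Testy", "Active", 2008, 7),
]
DIRECTORY = {3: ("Luigi Calda", "Dissolved"), 4: ("Gordon Testy", "Active")}  # NationalID -> (Owner, Status)

def link_accounts() -> Dict[int, int]:
    """Assumption: an account matches the directory entry whose owner is one of its contacts."""
    return {c.account_id: nid for nid, (owner, _) in DIRECTORY.items()
            for c in CONTACTS if c.name == owner}

def find_conflicts() -> List[str]:
    dissolved = {acc for acc, nid in link_accounts().items() if DIRECTORY[nid][1] == "Dissolved"}
    conflicts = []
    # Customer Service: among duplicate contacts, the first event wins (StartYear as stand-in).
    by_name: Dict[str, List[Contact]] = {}
    for c in CONTACTS:
        by_name.setdefault(c.name, []).append(c)
    for name, dupes in by_name.items():
        winner = min(dupes, key=lambda d: d.start_year)
        if len(dupes) > 1 and winner.account_id in dissolved:
            conflicts.append(f"{name} survives on account {winner.account_id}, "
                             "linked to a dissolved entity (Finance)")
    # Sales: every event must be registered with an active contact.
    for c in CONTACTS:
        if c.event_count > 0 and c.status != "Active":
            conflicts.append(f"{c.name} on account {c.account_id} has {c.event_count} "
                             f"events but status {c.status} (Sales)")
    return conflicts

for line in find_conflicts():
    print(line)
```

Under these assumptions, the surviving John Hothead contact lands on the account linked to the dissolved entity, and two contacts carry events while not being active, so the rules pull in different directions.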

So, I don’t think we can produce a golden view of this business relationship based on the available data (structure) and business rules.

Building and aligning business rules and data structures to solve this example, and a lot of other examples with different challenges, may seem difficult and is often omitted in the name of simplicity. But:

Master data, not least business partners, is a valuable asset in the enterprise, so why treat it with simplicity while we do complex handling of a lot of other (transaction) data?
Common sense may help you a lot. Many of these questions are not specific to your business but are shared among most other enterprises in your industry and many others in the whole real world.
I guess the near future will bring an increasing number of available services, with software and external data support, that help a lot in selecting common business rules and applying these in the master data processing landscape.

Process of consolidating Master Data

27th September 2009, updated 6th July 2010, by Henrik Gabs Liliendahl


In my previous blog post “Multi-Purpose Data Quality”, we examined a business challenge where party master data serve multiple purposes.

The comments suggested some form of consolidation should be done with the data.

How do we do that?

I have made a PowerPoint show “Example process of consolidating master data” with a suggested way of doing that.

The process uses the party master data types explained here.

The next questions in solving our business challenge will include:

Is it necessary to have master data in optimal shape in real time, or is it OK to make periodic consolidation?
How do we design processes for maintaining the master data when:
- New members and customers are inserted?
- We update existing members and customers?
- External reference data changes?
What changes must be made to the existing applications handling the member database and the eShop?

Also the question of what style of Master Data Hub is suitable is indeed very common in these kinds of implementations.

Multi-Purpose Data Quality

24th September 2009, updated 24th September 2011, by Henrik Gabs Liliendahl

Say you are an organisation within charity fundraising. For many years you have had a membership database, and recently you also introduced an eShop with related accessories.

The membership database holds the following record (Name, Address, City, YearlyContribution):

Margaret & John Smith, 1 Main Street, Anytown, 100 Euro

The eShop system has the following accounts (Name, Address, Place, PurchaseInAll):

Mrs Margaret Smith, 1 Main Str, Anytown, 12 Euro
Peggy Smith, 1 Main Street, Anytown, 218 Euro
Local Charity c/o Margaret Smith, 1 Main Str, Anytown, 334 Euro

Now the new management wants to double contributions from members and triple eShop turnover. Based on the recommendations from “The One Truth Consulting Company” you plan to do the following:

Establish a platform for 1-1 dialogue with your individual members and customers
Analyze member and customer behaviour and profiles in order to:
- Support the 1-1 dialogue with existing members and customers
- Find new members and customers who are like your best members and customers

As the new management intends to stay for many years ahead, the solution must not be a one-shot exercise, but must be implemented as business process reengineering with a continuous focus on best-fit data governance, master data management and data (information) quality.

So, what are you going to do with your data so they are fit for action with the old purposes and the new purposes?

Recently I wrote some posts related to these challenges:

Any other comments on the issues in how to do it are welcome.

Upstream prevention by error tolerant search

10th September 2009, updated 5th July 2010, by Henrik Gabs Liliendahl

Fuzzy matching techniques were originally developed for batch processing in order to find duplicates and consolidate database rows that have no unique identifiers linking them to the real world.

These processes have traditionally been implemented for downstream data cleansing.

As we know, upstream prevention is much more effective than tidying up downstream, so real-time data entry checking is becoming more common.

But we can go even further upstream by introducing error-tolerant search capabilities.

A common workflow when in-house personnel enter new customers, suppliers, purchased products and other master data is to first search the database for a match; if the entity is not found, a new entity is created. When the search fails to find an actual match, we have a classic and frequent cause of either introducing duplicates or challenging the real-time checking.

An error-tolerant search is able to find matches despite spelling differences, alternatively arranged words, various concatenations and many other challenges we face when searching for names, addresses and descriptions.
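A minimal error-tolerant search can be sketched with Python’s standard `difflib.SequenceMatcher`: sorting tokens absorbs alternatively arranged words, comparing the tokens joined without spaces absorbs concatenations, and the similarity ratio absorbs spelling differences. This is one simple technique among many, not how any particular CRM, ERP or MDM product does it; the function names and the 0.8 threshold are assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import List

def _tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())

def similarity(query: str, candidate: str) -> float:
    q, c = _tokens(query), _tokens(candidate)
    # Sorted tokens tolerate rearranged words; the ratio tolerates spelling differences.
    reordered = SequenceMatcher(None, " ".join(sorted(q)), " ".join(sorted(c))).ratio()
    # Joined tokens tolerate concatenations such as "Mainstreet" vs "Main Street".
    joined = SequenceMatcher(None, "".join(q), "".join(c)).ratio()
    return max(reordered, joined)

def tolerant_search(query: str, records: List[str], threshold: float = 0.8) -> List[str]:
    """Return records scoring at least `threshold`, best match first."""
    scored = sorted(((similarity(query, r), r) for r in records), reverse=True)
    return [r for score, r in scored if score >= threshold]

records = ["Ristorante San Remo", "Johnson Furniture Inc.", "1 Main Street"]
print(tolerant_search("San Remo Ristorante", records))
print(tolerant_search("1 Mainstreet", records))
```

In a data entry screen, the hits would be shown to the user before a new entity is created, which is where the upstream prevention happens.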

Such features may be implemented as embedded functionality in CRM and ERP systems or, to use my favourite term, as SOA components. So besides the classic data quality elements for monitoring and checking, we can add error-tolerant search to the component catalogue needed for a good MDM solution.
