Identity Resolution – Liliendahl on Data Quality

MDM and Knowledge Graph

28th November 2021 | Henrik Gabs Liliendahl | 2 Comments

As examined in a previous post with the title Data Fabric and Master Data Management, the use of the knowledge graph approach is on the rise.

The use of a knowledge graph overlaps with Master Data Management (MDM).

If we go back 10 years, MDM and Data Quality Management had a small niche discipline called (among other things) entity resolution, as explored in the post Non-Obvious Entity Relationship Awareness. Its aim was the same as what can be delivered today on a much larger scale using knowledge graph technology.

During the past decade there have been examples of using graph technology for MDM, as mentioned for example in the post Takeaways from MDM Summit Europe 2016. However, most attempts to combine MDM and graph have been about visualizing the relationships in MDM using a graph presentation.

When utilizing knowledge graph approaches you will be able to detect many more relationships than those that are currently managed in MDM. This fact is the foundation for a successful co-existence between MDM and knowledge graph with these synergies:

MDM hubs can enrich knowledge graph with proven descriptions of the entities that are the nodes (vertices) in the knowledge graph.
Additional detected relationships (edges) and entities (nodes) from the knowledge graph that are of enterprise-wide operational and/or general analytic interest can be proven and managed in MDM.

In this way you can create new business benefits from both MDM and knowledge graph.
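The two synergies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `mdm_hub` dictionary, the sample identifiers and both function names are hypothetical and do not refer to any specific MDM or graph product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A knowledge graph node (vertex); 'proven' marks MDM-backed descriptions."""
    entity_id: str
    attributes: dict = field(default_factory=dict)
    proven: bool = False


# Hypothetical golden records held by the MDM hub.
mdm_hub = {"P1": {"name": "Acme Ltd", "country": "GB"}}
# Hypothetical relationships already managed in MDM (none yet).
mdm_relationships = set()

# Hypothetical knowledge graph: it knows one more entity and one more edge.
graph_nodes = {"P1": Node("P1"), "P2": Node("P2", {"name": "Acme Holding"})}
graph_edges = [("P2", "owns", "P1")]


def enrich_graph(nodes, hub):
    """Synergy 1: copy proven MDM descriptions onto the matching graph nodes."""
    for entity_id, golden in hub.items():
        if entity_id in nodes:
            nodes[entity_id].attributes.update(golden)
            nodes[entity_id].proven = True


def candidates_for_mdm(nodes, edges, hub, relationships):
    """Synergy 2: list detected entities and relationships MDM does not manage yet."""
    new_nodes = [entity_id for entity_id in nodes if entity_id not in hub]
    new_edges = [edge for edge in edges if edge not in relationships]
    return new_nodes, new_edges


enrich_graph(graph_nodes, mdm_hub)
new_nodes, new_edges = candidates_for_mdm(
    graph_nodes, graph_edges, mdm_hub, mdm_relationships
)
```

Whether a candidate is actually of enterprise-wide interest remains a stewardship decision; the sketch only produces the list to review.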

Are These Familiar Hierarchies in Your MDM / DQM / PIM Solution?

7th November 2019 | Henrik Gabs Liliendahl | 2 Comments

The term family is used in different contexts within Master Data Management (MDM), Data Quality Management (DQM) and Product Information Management (PIM) when working with hierarchy management and entity resolution.

Here are three frequent examples:

Consumer / citizen family

When handling party master data about consumers / citizens we can deal with the basic definition of a family, being a group consisting of two parents and their children living together as a unit.

This is used when the business scenario does not only target each individual person but also a household with a shared economy. When identifying a household, a common parameter is that the persons live at the same postal address (at the same time), while observing constellations such as:

Nuclear families consisting of a female and a male adult (and their children)
Rainbow families where the gender is not an issue
Extended families consisting of more than two generations
Persons who happen to live at the same postal address

There are multicultural aspects of these constellations, including the different family name constructions around the world and the varying frequency and acceptance of rainbow families, as well as the varying frequency of extended families.
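The "same postal address at the same time" parameter can be sketched as follows. The sample residents, the tuple layout and the function names are made up for illustration, and the addresses are assumed to be standardized already. The result is only a list of household candidates: the sketch cannot tell a family from persons who merely happen to share an address.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (person, standardized address, moved in, moved out or None).
residents = [
    ("Anna Jensen", "1 MAIN ST", date(2015, 1, 1), None),
    ("Bo Jensen", "1 MAIN ST", date(2016, 6, 1), None),
    ("Carl Smith", "1 MAIN ST", date(2010, 1, 1), date(2014, 12, 31)),
    ("Dina Berg", "2 OAK RD", date(2018, 3, 1), None),
]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two residency periods share at least one day (None = still there)."""
    a_end = a_end or date.max
    b_end = b_end or date.max
    return a_start <= b_end and b_start <= a_end


def household_candidates(records):
    """Group persons who lived at the same address during overlapping periods."""
    by_address = defaultdict(list)
    for rec in records:
        by_address[rec[1]].append(rec)

    households = []
    for address, recs in by_address.items():
        groups = []
        # Greedy grouping: join the first group whose members all overlap in time.
        for rec in sorted(recs, key=lambda r: r[2]):
            for group in groups:
                if all(overlaps(rec[2], rec[3], m[2], m[3]) for m in group):
                    group.append(rec)
                    break
            else:
                groups.append([rec])
        households.extend((address, [m[0] for m in g]) for g in groups)
    return households


candidates = household_candidates(residents)
```

Here the former resident at 1 MAIN ST ends up in a separate group from the two current residents, because the residency periods do not overlap.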

Company family tree

When handling party master data about companies / organizations, a valuable piece of information is how the companies / organizations are related, most commonly pictured as a company family tree with mothers and sisters. In theory the tree can have any number of levels. The basic levels are:

A global ultimate mother being the company that ultimately owns (fully or partly) a range of companies in several countries.
A national ultimate mother being the company that owns (fully or partly) a range of companies in a given country.
A legal entity being the basic registered company within a country having some form of a business entity identifier.
A branch owned by a legal entity and operating from a given postal / visiting address.

You can build your own company tree describing your customers, suppliers and other business partners. Alternatively, or as a supplement, you can rely on third party business directories. It is worth noting here that a national source will only go to the national ultimate mother level, while a global source can include the global ultimate mother and thus form larger families.

Having a company family view in your master data repository is a valuable information asset within credit risk, supply risk, discount opportunities, cross-selling and more.
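A minimal sketch of walking such a tree is shown below. The company identifiers, the tuple layout and the function names are hypothetical. The national ultimate mother is simplified here to the highest company in the same country reached before the ownership chain crosses a border, and partial or shared ownership is ignored.

```python
# Hypothetical company tree: id -> (name, country, parent id or None).
companies = {
    "G1": ("Global Holding", "US", None),
    "N1": ("Nordic Holding", "DK", "G1"),
    "L1": ("Danish Sales", "DK", "N1"),
    "L2": ("UK Sales", "GB", "G1"),
}


def global_ultimate(company_id, tree):
    """Follow parent links to the top of the family tree."""
    seen = set()
    current = company_id
    while tree[current][2] is not None:
        if current in seen:
            raise ValueError("ownership cycle detected")
        seen.add(current)
        current = tree[current][2]
    return current


def national_ultimate(company_id, tree):
    """Follow parent links for as long as the parent is in the same country."""
    country = tree[company_id][1]
    current = company_id
    seen = {current}
    while True:
        parent = tree[current][2]
        if parent is None or tree[parent][1] != country:
            return current
        if parent in seen:
            raise ValueError("ownership cycle detected")
        seen.add(parent)
        current = parent
```

With this data, "L1" has "N1" as national ultimate mother and "G1" as global ultimate mother, which mirrors the difference between what a national and a global directory can tell you.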

Product family

The term “product family” is often used to define a level in a homegrown product classification / product grouping scheme. It is used to define a level that can have levels above and levels below, with other terms such as “product line”, “product category”, “product class”, “product group”, “product type” and more.

Sometimes it is also used as a term to define a product with a family of variants below, where variants are the same product produced and kept in stock in different colours, sizes and more.

Read more about Stock Keeping Units (SKUs), product variants, product identification and product classification in the post Five Product Information Management Core Aspects.

Using External Data in Data Matching

26th May 2014 | 8th July 2014 | Henrik Gabs Liliendahl | 4 Comments

One of the things that data quality tools do is data matching. Data matching is mostly related to the party master data domain. It is about comparing two or more data records that do not have exactly the same data but describe the same real world entity.

A common approach is to compare data records in internal master data repositories within your organization. However, there are great advantages in bringing in external reference data sources to support the data matching.

Some of the ways of doing that which I have worked with include these kinds of big reference data:

Business directories:

The business-to-business (B2B) world does not have privacy issues to the degree we see in the business-to-consumer (B2C) world. Therefore there are many business directories out there with a quite complete picture of which business entities exist in a given country, and even in regions and the whole world.

A common approach is to first match your internal B2B records against a business directory and obtain a unique key for each business entity. The next step, matching business entities using that unique key, is a no-brainer.

The problem, though, is that an automatic match between internal B2B records and a business directory most often does not yield a 100 % hit rate. Not even close, as examined in the post 3 out of 10.
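The two-step approach can be sketched as follows. Everything here is an assumption made for illustration: the directory content, the `DIR-…` keys, the short list of legal forms and the exact-match lookup on a normalized name. Real directory matching is usually fuzzier than this.

```python
import re
from collections import defaultdict

# Hypothetical business directory: (normalized name, country) -> unique key.
directory = {
    ("ACME", "GB"): "DIR-001",
    ("NORDIC FOODS", "DK"): "DIR-002",
}

# A deliberately short, assumed list of legal forms to ignore when comparing names.
LEGAL_FORMS = {"LTD", "LIMITED", "INC", "AS", "APS", "GMBH"}


def normalize(name):
    """Upper-case, strip punctuation and drop legal form tokens."""
    tokens = re.sub(r"[^A-Z0-9 ]", "", name.upper()).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def match_to_directory(records):
    """Step 1: obtain a directory key per record. Step 2: group records by key."""
    by_key = defaultdict(list)
    unmatched = []
    for rec_id, name, country in records:
        key = directory.get((normalize(name), country))
        if key is None:
            unmatched.append(rec_id)
        else:
            by_key[key].append(rec_id)
    return dict(by_key), unmatched


# Hypothetical internal B2B records: (record id, name, country).
internal = [
    (1, "Acme Ltd.", "GB"),
    (2, "ACME Limited", "GB"),
    (3, "Nordic Foods A/S", "DK"),
    (4, "Unknown Trading", "GB"),
]

groups, unmatched = match_to_directory(internal)
hit_rate = (len(internal) - len(unmatched)) / len(internal)
```

Records sharing a key are duplicates of each other by construction, while the unmatched records are the part of the hit rate problem that still needs other matching techniques or manual review.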

Address directories:

Address directories are mostly used in order to standardize postal address data, so that two addresses in internal master data that can be standardized to an address written in exactly the same way can be better matched.

A deeper use of address directories is to exploit related property data. The probability of two records with “John Smith” at the same address being a true positive match is much higher if the address is a single-family house as opposed to a high-rise building, nursing home or university campus.

Relocation services:

A common cause of false negatives in data matching is that you have compared two records where one of the postal addresses is an old one.

Bringing in National Change of Address (NCOA) services for the countries in question will help a lot.

The optimal way of doing that (and utilizing business and address directories) is to make it a continuous element of Master Data Management (MDM) as explored in the post The Relocation Event.

Unique Data = Big Money

14th February 2014 | 15th February 2014 | Henrik Gabs Liliendahl

In a recent tweet Ted Friedman of Gartner (the analyst firm) said:

I think he is right.

Duplicates have always been pain number one in most places when it comes to the cost of poor data quality.

Though I have been in the data matching business for many years and have fought duplicates with deduplication tools in numerous battles, the war doesn’t seem to be won by using deduplication tools alone, as told in the post Somehow Deduplication Won’t Stick.

Eventually deduplication always comes down to entity resolution, when you have to decide which results are true positives and which are useless false positives, and wonder how many false negatives you didn’t catch, which means how much money you didn’t get in return for your deduplication investment.

Bringing in new, even obscure, reference sources is in my eyes a very good idea, as examined in the post The Good, Better and Best Way of Avoiding Duplicates.
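The three outcomes can be counted once you have a reviewed set of true duplicate pairs to compare against. The sketch below assumes such a set exists, which in practice usually means a manually checked sample; the pair data and the `evaluate` function are hypothetical.

```python
def evaluate(found_pairs, true_pairs):
    """Count true positives, false positives and false negatives for record pairs."""
    # frozenset makes (1, 2) and (2, 1) the same pair.
    found = {frozenset(p) for p in found_pairs}
    truth = {frozenset(p) for p in true_pairs}
    tp = len(found & truth)   # reported and really duplicates
    fp = len(found - truth)   # reported but not duplicates
    fn = len(truth - found)   # real duplicates the tool did not catch
    precision = tp / (tp + fp) if found else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return tp, fp, fn, precision, recall


# Hypothetical tool output and hypothetical reviewed truth.
found_pairs = [(1, 2), (3, 4), (5, 6)]
true_pairs = [(2, 1), (3, 4), (7, 8), (9, 10)]

tp, fp, fn, precision, recall = evaluate(found_pairs, true_pairs)
```

Recall is the figure that relates to the uncaught false negatives, and thereby to the return you did not get from the investment.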

Data Quality vs Identity Checking

14th November 2013 | Henrik Gabs Liliendahl | 1 Comment

Yesterday we had a call from British Gas (or probably a call centre hired by British Gas) explaining the great savings possible if switching from the current provider – which by the way is: British Gas. This is a classic data quality issue in direct marketing operations: accurately separating your current customers from entities belonging to a new market.

As I have learned that your premier identity proof in the United Kingdom is your utility bill, this incident may be seen as somewhat disturbing – or by further thinking, maybe a business opportunity 🙂

At iDQ we develop a solution that may be positioned in the space between data quality prevention and identity check by addressing the identity resolution aspect during data capture.

The nearly two-year-old post The New Year in Identity Resolution explains some different kinds of identity resolution, being:

Hard core identity check
Light weight real world alignment
Digital identity resolution

Since then I have seen a slow but steady convergence of these activities.

Our Double Trouble

31st October 2013 | 1st November 2013 | Henrik Gabs Liliendahl

Using the royal we is usually only for majestic people, but as a person with a being in two countries at the same time, I do sometimes feel that I am we.

So, this morning we once again found our way to London Heathrow Airport for one of our many trips between London and Copenhagen as we have lived in the United Kingdom the last couple of years but still have many business and private ties with The Kingdom of Denmark where we (is that was or were?) born, raised and worked and from where we still hold a passport.

Most public sector and private sector business processes and master data management implementations simply don’t cope with the fast-evolving globalization. Reflecting on this, flying over Doggerland, we recall situations where:

We as a prospect or customer in a global brand are stored as a duplicate record for each country as told in the post Hello Leading MDM Vendor.
You as an employee in a multi-national firm have a duplicate record for each country you have worked in.

People moving between countries are still treated as an exception not covered by adequate business rules and data capture procedures. Most things are sorted out eventually, but it always takes a whole lot more trouble than if you are just born, raised and stay in the same country.

When we landed in Copenhagen this morning we (is that was or were?) able to use the new local smart travel card in order to travel on with public transit. But it wasn’t easy getting the card, we remember. With a foreign address you can’t apply online. So we had to queue up at the Central Station, fill in a form and explain that we don’t have an official document with our address in the UK – and we avoided explaining the shocking fact that in the UK your electricity bill is your premier proof of almost anything related to your identity.

What about you? Do you have a being in several countries? Any war stories experienced related to your going back and forth?

Entity Resolution and Big Data

6th October 2013 | Henrik Gabs Liliendahl | 1 Comment

The Wikipedia article on Identity Resolution has this catch on the difference between good old data matching and Entity Resolution:

”Here are four factors that distinguish entity resolution from data matching, according to John Talburt, director of the UALR Laboratory for Advanced Research in Entity Resolution and Information Quality:

Works with both structured and unstructured records, and it entails the process of extracting references when the sources are unstructured or semi-structured
Uses elaborate business rules and concept models to deal with missing, conflicting, and corrupted information
Utilizes non-matching, asserted linking (associate) information in addition to direct matching
Uncovers non-obvious relationships and association networks (i.e. who’s associated with whom)”

I have a gut feeling that Data Matching and Entity (or Identity) Resolution will melt together in the future as expressed in the post Deduplication vs Identity Resolution.

If you look at the above mentioned factors that distinguish data matching from identity resolution, some of the often mentioned features in the new big data technology shine through:

Working with unstructured and semi-structured data is probably the most mentioned difference between working with small data versus working with big data.
Working with associations is a feature of graph databases or other similar technologies as mentioned in the post Will Graph Databases become Common in MDM?

So, in the quest of expanding matching small data to evolve into Entity (or Identity) Resolution we will be helped by general developments in working with big data.
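The third factor in the list, using asserted links in addition to direct matching, can be sketched with a small union-find structure. The record identifiers, the two link lists and the `resolve` function are hypothetical; the point is only that clusters are built from both kinds of evidence.

```python
def resolve(record_ids, links):
    """Cluster records connected by any chain of links (union-find)."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Walk to the cluster root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for r in record_ids:
        clusters.setdefault(find(r), []).append(r)
    return sorted(clusters.values())


record_ids = ["A", "B", "C", "D"]
matched = [("A", "B")]    # found by direct data matching
asserted = [("B", "C")]   # asserted link, e.g. a known relocation or name change

matching_only = resolve(record_ids, matched)
with_assertions = resolve(record_ids, matched + asserted)
```

With matching alone, record "C" stays on its own; adding the asserted link pulls it into the same cluster as "A" and "B" even though "A" and "C" never matched directly.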

Matching for Multiple Purposes

12th September 2013 | Henrik Gabs Liliendahl | 2 Comments

In a recent post on the InfoTrellis blog we have the good old question in data matching about Deterministic Matching versus Probabilistic Matching.

The post has a good walk through on the topic and reaches this conclusion:

“So, which is better, Deterministic Matching or Probabilistic Matching? The question should actually be: ‘Which is better for you, for your specific needs?’ Your specific needs may even call for a combination of the two methodologies instead of going purely with one.”
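The difference between the two methodologies can be illustrated with a small sketch. It is not taken from the InfoTrellis post: the field names, weights and threshold are assumptions, and the weighted string similarity from Python's `difflib` is a simple stand-in for a real probabilistic model, which would derive its weights from the data.

```python
from difflib import SequenceMatcher


def deterministic_match(a, b, keys=("name", "postcode")):
    """Deterministic: every chosen field must agree exactly after basic cleaning."""
    return all(a[k].strip().upper() == b[k].strip().upper() for k in keys)


# Assumed field weights for the scoring variant; they sum to 1.
WEIGHTS = {"name": 0.6, "postcode": 0.4}


def similarity(x, y):
    """String similarity between 0 and 1."""
    return SequenceMatcher(None, x.strip().upper(), y.strip().upper()).ratio()


def scored_match(a, b, threshold=0.85):
    """Scoring stand-in for probabilistic matching: weighted similarity vs. threshold."""
    score = sum(w * similarity(a[f], b[f]) for f, w in WEIGHTS.items())
    return score >= threshold, score


rec_a = {"name": "John Smith", "postcode": "SW1A 1AA"}
rec_b = {"name": "Jon Smith", "postcode": "SW1A 1AA"}

exact = deterministic_match(rec_a, rec_b)
is_match, score = scored_match(rec_a, rec_b)
```

The deterministic rule rejects the pair because of the one-letter difference in the name, while the scored rule accepts it. A combination could apply the deterministic rule first and send only the remainder to scoring.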

On a side note, the author of the post is MARIANITORRALBA. I had to use my combined probabilistic and deterministic in-word parsing supported and social media connected data matching capability to match this concatenated name with the LinkedIn profile of an InfoTrellis employee called Marian Itorralba.

This little exercise brings me to an observation about data matching: matching party master data, not least when you do this for several purposes, is ultimately identity resolution, as discussed in the post The New Year in Identity Resolution.

For that we need what could be called hierarchical data matching.

The reason we need hierarchical data matching is that more and more organizations are looking into master data management, and then they realize that the classic name and address matching rules do not necessarily fit when party master data are going to be used for multiple purposes. What constitutes a duplicate in one context, like sending a direct mail, doesn’t necessarily make a duplicate in another business function, and vice versa. Duplicates come in hierarchies.

One example is a household. You probably don’t want to send two sets of the same material to a household, but you might want to engage in a 1-to-1 dialogue with the individual members. Another example is that you might do some very different kinds of business with the same legal entity. Financial risk management is the same, but different sales or purchase processes may require very different views.
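The household example can be sketched by resolving each record to identifiers at several levels and letting the purpose decide which level defines a duplicate. The records, the level names and the purpose mapping below are hypothetical.

```python
from collections import defaultdict

# Hypothetical records already resolved to a person and a household.
records = [
    {"id": 1, "person": "P1", "household": "H1"},
    {"id": 2, "person": "P2", "household": "H1"},
    {"id": 3, "person": "P1", "household": "H1"},
    {"id": 4, "person": "P3", "household": "H2"},
]

# Assumed mapping from business purpose to the hierarchy level that counts.
LEVEL_BY_PURPOSE = {
    "direct_mail": "household",
    "one_to_one_dialogue": "person",
}


def duplicate_groups(recs, purpose):
    """Return the groups of record ids that are duplicates for the given purpose."""
    level = LEVEL_BY_PURPOSE[purpose]
    groups = defaultdict(list)
    for rec in recs:
        groups[rec[level]].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


mail_duplicates = duplicate_groups(records, "direct_mail")
dialogue_duplicates = duplicate_groups(records, "one_to_one_dialogue")
```

Records 1, 2 and 3 are one duplicate group for a mailing, but only records 1 and 3 are duplicates when the aim is a 1-to-1 dialogue.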

This matter is discussed in the post called Hierarchical Data Matching and, not least, in the comments on that post.

Know Your Fan

1st September 2013 | Henrik Gabs Liliendahl

A variant of the saying “Know Your Customer” for a football club will be “Know Your Fan” and indeed fans are customers when they buy tickets. If they can.

FC Copenhagen cruised into stormy waters when they apparently cancelled all purchases for the upcoming Champions League (the paramount European soccer club tournament) clashes against Real Madrid, Juventus and Galatasaray if the purchasers didn’t have a Danish sounding name. The reason was to prevent mixing fans of the different clubs, but surely this poorly thought-out screening method wasn’t received well among the FC Copenhagen fans not called Jensen, Nielsen or Sørensen.

The story is told in English here on Times of India.

Actually, methods of verifying identities are available and cheap in Denmark, so I’m surprised to see FC Copenhagen caught offside in this situation.
