A year ago I wrote a blog post about data matching published on the Informatica Perspective blog. The post was called Five Future Data Matching Trends.
One of the trends mentioned is hierarchical data matching.
The reason we need what may be called hierarchical data matching is that more and more organizations are looking into master data management, and then they realize that the classic name and address matching rules do not necessarily fit when party master data are going to be used for multiple purposes. What constitutes a duplicate in one context, like sending a direct mail, isn’t necessarily a duplicate in another business function, and vice versa. Duplicates come in hierarchies.
One example is a household. You probably don’t want to send two sets of the same material to a household, but you might want to engage in a 1-to-1 dialogue with the individual members. Another example is that you might do very different kinds of business with the same legal entity. For financial risk management it is one and the same entity, but different sales or purchase processes may require very different views.
I usually divide a data matching process into three main steps:
- Candidate selection
- Match scoring
- Match destination
(More information on the page: The Art of Data Matching)
Hierarchical data matching is mostly about the last step where we apply survivorship rules and execute business rules on whether to purge, merge, split or link records.
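To make the three steps a bit more concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not how any particular matching tool works: the `Party` record, the use of the address as a blocking key, `difflib.SequenceMatcher` as the name score and the 0.9 threshold are all hypothetical choices. The point is the last step: instead of purging, matched records are linked at two levels, household and individual.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations

@dataclass
class Party:
    id: int
    name: str
    address: str

def candidate_pairs(parties):
    """Step 1, candidate selection: only compare records that share a blocking key.
    Here the (hypothetical) blocking key is the lower-cased, trimmed address."""
    blocks = {}
    for party in parties:
        blocks.setdefault(party.address.lower().strip(), []).append(party)
    for block in blocks.values():
        yield from combinations(block, 2)

def name_score(a, b):
    """Step 2, match scoring: a plain string similarity between names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()

def match_destination(parties, threshold=0.9):
    """Step 3, match destination: link rather than purge.
    Same address -> same household; same address and similar name -> same individual."""
    households, individuals = [], []
    for a, b in candidate_pairs(parties):
        households.append((a.id, b.id))
        if name_score(a, b) >= threshold:
            individuals.append((a.id, b.id))
    return households, individuals
```

With John Smith, Jon Smith and Mary Smith registered at the same address, all three are linked as one household, while only John and Jon are linked as the same individual.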
In my experience there are a lot of data matching tools out there capable of handling candidate selection, match scoring, purging records and, to some degree, merging records. But solutions are sparse when it comes to more sophisticated things like splitting an original entity into two or more entities, for example by Splitting Names, or linking records in hierarchies in order to build a Hierarchical Single Source of Truth.
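Splitting is easy to describe and harder to get right. As a minimal Python sketch, assuming a single hypothetical rule (an "and" or "&" between given names in front of a shared surname means that the record covers two people), it could look like this:

```python
import re

# Hypothetical rule: a conjunction surrounded by whitespace separates two people.
_CONJ = re.compile(r"\s+(?:and|&)\s+", re.IGNORECASE)

def split_name(full_name: str) -> list[str]:
    """Split 'John and Mary Smith' into ['John Smith', 'Mary Smith'].
    Names without a conjunction are returned unchanged as a single entity."""
    parts = _CONJ.split(full_name.strip())
    if len(parts) != 2:
        return [full_name.strip()]
    first, second = parts
    second_tokens = second.split()
    if len(first.split()) == 1 and len(second_tokens) >= 2:
        # Shared surname: the last token of the second part belongs to both people.
        surname = second_tokens[-1]
        return [f"{first} {surname}", second]
    # Both parts already look like full names, e.g. 'John Smith & Mary Jones'.
    return [first, second]
```

A company name like "Smith and Sons" would be split wrongly by this rule, which is why real solutions need more context (party type, reference data) than one regular expression.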
A story featured a lot in the media in the last few days is the incident in which one of the richest women on the planet, Oprah Winfrey, was told that she couldn’t afford the handbag she wanted to look at in a Zürich shop. Was it racism, or a misunderstanding because Oprah isn’t good at speaking German?
Either way, it was certainly an example of bad things happening when you don’t know your customer. The story also highlights the issues we have with foreign customers, as Oprah may not be quite as famous in Zürich as in New York.
We face these challenges in customer master data management everywhere, as described in the post Know Your Foreign Customer.
And oh: Maybe it’s time to start a sister blog called Liliendahl on Fashion. This is my second post on luxury handbags. The first post was called Data Quality Luxury.
When calling people for a long-distance conversation, there are three main ways today:
- The landline phone, which has been around since the 19th century and penetrated most homes and businesses in the last century
- The mobile phone, which came around in the 70s and spread rapidly in the 90s
- Skype, a voice over internet service that grew in the 00s
Using these services involves an identifier that may be stored in customer tables and other party master data repositories, with some implications for data management and identity resolution:
The Landline Phone Number
The landline phone number is a very common attribute in databases and is often used as the main identifier of a customer in ERP and CRM solutions.
Using a landline phone number for identity resolution has some challenges, including:
- As with most attributes, landline numbers may change. Depending on the country in question, they may change on relocation, and most phone numbering systems get an upgrade over the years.
- In business-to-business (B2B) a company typically has more than one phone number.
- In business-to-consumer (B2C) the landline phone number typically belongs to a household rather than to a single individual. That may or may not be good, depending on the purpose of use.
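Before a landline number can be used in identity resolution at all, numbers written in different ways have to be compared on equal terms. Here is a minimal Python sketch, assuming Danish numbers (country code 45) as the default and ignoring most country-specific numbering rules; in practice a numbering plan library such as Google’s libphonenumber would be a safer choice. The grouping function reflects the B2C point above: records sharing a landline number are linked as a candidate household rather than purged as duplicates.

```python
import re
from collections import defaultdict

def normalize_phone(raw: str, default_country_code: str = "45") -> str:
    """Reduce a phone number to digits including the country code.
    Simplified: '+' or '00' means a country code is present; otherwise the number
    is treated as national and the (assumed) default country code is added."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return digits
    if digits.startswith("00"):
        return digits[2:]
    if digits.startswith("0"):
        # Drop a national trunk prefix (as in the UK); not valid for every country.
        digits = digits[1:]
    return default_country_code + digits

def households_by_landline(records):
    """Group (record_id, phone) pairs that share a normalized landline number.
    In B2C such a group is a candidate household, not duplicates of one person."""
    groups = defaultdict(list)
    for record_id, phone in records:
        groups[normalize_phone(phone)].append(record_id)
    return {number: ids for number, ids in groups.items() if len(ids) > 1}
```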
The Mobile Phone Number
Mobile phone numbers also pile up in databases. In relation to identity resolution, there are issues with mobile phone numbers, namely:
- They change a lot.
- It’s not always clear to whom a number actually belongs:
  - A company-paid phone may be used for both business and pleasure, and may be transferred to another individual
  - In a household, one person may be registered for a range of mobile phones used by the individual members of the household, including children
The Skype ID
I seldom see databases with Skype IDs. In my experience, Skype IDs aren’t used much in internal master data. They reside in Skype itself and in social network profiles, for example on LinkedIn.
A final rant
Today I hardly ever use a landline phone; I use my mobile once in a while, and I use Skype a lot. Not because it’s convenient, but because the telecom companies have decided to charge for international mobile calls in ways so greedy that they make Somali sea pirates look like honest businessmen.
Most ongoing data matching activities are related to matching customer, or rather party, master data.
In today’s business world we see data matching related to party master data in these three different channel types:
- Offline is the good old channel type, where the mother of all business cases for data matching is avoiding the unnecessary cost of sending the same material by post twice (or more) to the same recipient.
- Online has been around for some time. While the cost of sending the same digital message to the same recipient may not be a big problem, there are still other factors to consider, like:
  - Duplicate digital messages to the same recipient look like spam (even if the recipient provided different email addresses him/herself).
  - You can’t measure a true response rate.
- Social is the new channel type for data matching. Most business cases for data matching related to social network profiles are probably based on multi-channel issues.
The concept of having a single customer view, or rather a single party view, involves matching identities across offline, online and social channels, and the typical elements used for data matching are not entirely the same for those channels, as seen in the figure to the right.
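One way to picture the single party view across channels is to give each identity the match keys that are typical for its channel and then link identities that share a key. The sketch below is a hypothetical Python illustration: the key names and values are made up, and linking on any shared exact key is far simpler than real identity resolution, which would score and weigh the evidence. It uses a union-find structure so that links are transitive: an offline customer and a social profile can end up in the same party via an online order that shares a key with each.

```python
from dataclasses import dataclass, field

@dataclass
class Identity:
    id: str
    channel: str  # "offline", "online" or "social"; informational, not used for linking
    keys: set = field(default_factory=set)  # channel-typical match keys as (kind, value)

def single_party_view(identities):
    """Link identities sharing any match key; return the parties as lists of ids."""
    parent = {ident.id: ident.id for ident in identities}

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner = {}  # first identity seen with each key
    for ident in identities:
        for key in ident.keys:
            if key in owner:
                parent[find(ident.id)] = find(owner[key])
            else:
                owner[key] = ident.id

    parties = {}
    for ident in identities:
        parties.setdefault(find(ident.id), []).append(ident.id)
    return sorted(parties.values())
```

For example, an offline record keyed on name and address, a web order with the same name and address plus an email, and a social profile with that email all end up as one party.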
In my experience, most data matching procedures are quite simple, with only a few data elements and no history taken into consideration. However, we do see more sophisticated data matching environments, often referred to as identity resolution, where historical data, more data elements and even unstructured data are taken into consideration.
When doing multi-channel data matching, you can’t avoid moving from the popular simple data matching environments towards environments more like identity resolution.
Some advice for getting it right without too much complication:
- Emphasize data capture by getting it right the first time. It helps a lot.
- Get your data models right. Here reflecting the real world helps a lot.
- Don’t reinvent the wheel. There are services for this out there. They help a lot.
Read more about such a service in the post instant Single Customer View.
A few days ago Jeff Jonas of IBM made a new blog post called Master Data Management (MDM) vs. Sensemaking.
In it, Jeff Jonas ponders the differences between the data matching algorithms we use in traditional MDM, predominantly name and address matching, and the kind of identity resolution we need when we, for example, try to listen to and make sense of the signals in social media data streams.
Jeff Jonas says: “Different missions, different tools. Some organizations will use one or the other; most organizations will want both.”
I tend to disagree slightly with Jeff Jonas. As told in the post The New Year in Identity Resolution I think we will need a connection between the old systems of record and the new systems of engagement.
Indeed the algorithms will be used differently and indeed we need different thresholds of confidence for different tasks. But I think we will have to make the integration story a bit more complicated in order to make sensible decisions across the two missions.
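The point about different thresholds of confidence can be shown in a few lines of Python. This is a hypothetical sketch: the mission names and threshold values are made up for illustration, and the match score is assumed to come from one shared scoring step. What it illustrates is keeping one score and making the decision per mission.

```python
# Hypothetical confidence thresholds per mission; the numbers are illustrative only.
THRESHOLDS = {
    "direct_mail_dedup": 0.80,  # lower bar: an occasional false match is cheap
    "credit_approval": 0.98,    # high bar: a false match is expensive
    "social_listening": 0.60,   # exploratory: weak signals are still useful
}

def decide(score: float, mission: str) -> bool:
    """Same match score, different decision depending on the business task."""
    return score >= THRESHOLDS[mission]
```

A score of 0.85 would then count as a match for the mailing but not for the credit decision.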
As reported in the post Fighting Identity Fraud with Identity Fraud, and as experienced with the post 255 Reasons for Data Quality Diversity, I have seen several sloppy attempts at link building by SEO agencies working for data quality tool vendors.
The other day it happened again, this time on LinkedIn.
There was a comment in the Master Data Management Interest group:
The comment is now deleted by the author and I do understand why.
I guess an SEO guy was working for Simon at DataLadder and for Nathan from somewhere else at the same time, and had been given access to both of their LinkedIn accounts. However, he/she posted a comment meant to come from Simon while logged in as Nathan (who doesn’t work with MDM and data quality).
So, data quality tool and service vendors: You can’t fight identity fraud with identity fraud and you can’t advocate for a single view of customer with a messy view of you as a vendor. Be authentic.
Recently I stumbled upon a report called Future Identities in the UK. The purpose of the report is to provide the UK government with insight into how the identities of citizens will develop over the next 10 years. But the insight certainly also applies to how private companies will have to react to this development, and not just in the UK.
The report talks about three different kinds of identities:
- Biometric identities
- Biographical identities
- Social identities
Applied to data quality and master data management I think these future kinds of identities will have these consequences:
Biometric identities relate to hard core identity resolution, as in fighting terrorism, criminal investigation and physical access control, but are sometimes even used in simple commercial checks, as told in the post Real World Identity. My guess is that we will see biometrics used more as a means to achieve better data quality, but not considerably more, due to return on investment, as also examined in the post Citizen ID and Biometrics.
Biographical identities and the related attributes resemble what we often call demographic attributes, used in handling data for direct marketing and other data management purposes. Direct marketing may, as reported in the post Psychographic Data Quality, be in transition towards going deeper into big data in order to become psychographic marketing.
Social identities are the new black. As discussed on this blog, most recently in the post Defining Social MDM, my guess is that social master data management is going to be big and will have to be partly interwoven with traditional biographical attributes and even, like it or not, biometric attributes. The art of doing that in a proper way is going to be very exciting.
You may divide identity resolution into these categories:
- Hard core identity check
- Light weight real world alignment
- Digital identity resolution
Hard Core Identity Check
Some business processes require a solid identity check. This is usually the case for credit approval and employment enrolment, for example. Identity checks are also part of criminal investigation and fighting terrorism.
Services for identity checks vary from country to country because of different regulations and different availability of reference data.
An identity check usually involves the entity who is being checked.
Light Weight Real World Alignment
In data quality improvement and Master Data Management (MDM) you often include some form of identity resolution in order to have your data aligned with the real world. For example when evaluating the result of a data matching activity with names and addresses, you will perform a lightweight identity resolution which leads to marking the matched results as true or false positives.
This kind of identity resolution usually doesn’t involve the entity being examined.
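The outcome of such a lightweight review can be summarized simply. A minimal Python sketch, assuming that a reviewer has marked each proposed match as a true or false positive (the input format is hypothetical), computes the precision of the matching run, meaning the share of proposed matches that turned out to be real. Note that false negatives, the duplicates the matching missed, are not visible from reviewing the proposed matches alone.

```python
from collections import Counter

def review_summary(reviewed_pairs):
    """reviewed_pairs: (pair_id, verdict) tuples, where verdict is 'true' or 'false'
    as marked by a reviewer checking names and addresses against the real world.
    Returns (true positives, false positives, precision)."""
    counts = Counter(verdict for _, verdict in reviewed_pairs)
    reviewed = counts["true"] + counts["false"]
    precision = counts["true"] / reviewed if reviewed else 0.0
    return counts["true"], counts["false"], precision
```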
Digital Identity Resolution
Our existence has increasingly moved to the online world. As discussed in the post Addressing Digital Identity, this means that we will also need ways to include digital identity in traditional identity resolution.
There are of course discussions out there about how far digital identity resolution should be possible. For example real name policy enforcement in social networks is indeed a hot topic.
With regard to digital identity resolution, the jury is still out. In my eyes, the economic consequences of the rising social sphere will inevitably affect the demand for knowing who is out there. Also, the opportunities in establishing identity via digital footprints will be exploited.
My guess is that the distinction between hard core identity checks and real world alignment in data quality improvement and MDM will disappear as reference data becomes more available and its price goes down.
That’s why I’m currently working with a solution (www.instantdq.com) that combines identity check features and a data universe into master data management, with the possibility of adding digital identity into the mix.
I have previously had issues with SEO agencies posting comments on this blog in their quest to help data quality tool vendors get a better search rank for data quality related terms. Example here.
This happened again today on a recent post called Addressing Digital Identity.
I find it quite funny that the SEO guy is talking about fighting identity fraud while posting a comment under a name that I bet is not his/her real name:
A physical address has traditionally been a core element of identity resolution. Stating a name and an address is the most widespread way of telling which person or which company we have (or aim to have) a business or other form of relationship with.
However, during the last 25 years a lot of things have moved from the physical world to the online world. Not least, a lot of things start in the online world but in many cases end up in the physical world. Today selling, the smart way, starts in social media. Final delivery may be digital, or it may mean sending a package or a consultant to a physical address. Something like dating most often starts in the online world today, but surely the aim is a physical encounter.
This new way of life has a tremendous effect on data quality and master data management. Within contact data quality, the most frequent domain for data quality issues, we have traditionally dealt with verifying and deduplicating names and addresses.
As the best way of preventing data quality issues is to look at the root cause, we must address the fact that onboarding of contact data often starts with a digital identity, where a physical address isn’t present in the first place but often will be added at a later stage.
As described in the post Social MDM and Systems of Engagement a new trend in master data management is to establish a link between the new systems of engagement and the old systems of record.
In the same way, data quality prevention and improvement will have to cover establishing a link between a new discipline, digital identity resolution, and the good old address verification stuff.
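A data model that reflects this could treat the digital identity as the starting point and the physical address as something that arrives, and gets verified, later. Here is a minimal Python sketch under that assumption; the classes are hypothetical, and `verify` stands in for whatever address verification service is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class DigitalIdentity:
    kind: str   # e.g. "email", "linkedin", "skype"
    value: str

@dataclass
class Address:
    street: str
    postal_code: str
    city: str
    verified: bool = False

@dataclass
class Party:
    name: str
    digital_identities: list[DigitalIdentity] = field(default_factory=list)
    address: Optional[Address] = None  # often unknown at onboarding

def add_address(party: Party, address: Address, verify: Callable[[Address], bool]) -> Party:
    """Attach a physical address later in the lifecycle and run it through
    address verification (verify is a hypothetical stand-in for a real service)."""
    address.verified = verify(address)
    party.address = address
    return party
```

The party starts out with only an email or a social profile, and the link to the good old address verification happens when the address is added.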