When matching party data (names and addresses), the task is very often not only about hitting similar records, but also about performing some form of transformation of the data before, during and after the hitting.
Transformation related to matching includes:
Deduplication: Finding 2 (or more) similar records that reflect the same real world entity in a single table is one thing. You also have to deal with survivorship, that is, settling which record must continue and which record(s) must be:
- marked/linked as duplicate
- merged – with related transactions
The actual execution of a deduplication result may of course take place outside the scope of the matching process.
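A minimal sketch of the survivorship step, for illustration only. The record layout, the `completeness` helper and the rule itself (most complete record wins, ties go to the most recently updated one) are assumptions made for this example, not a recommended standard. Real survivorship rules are usually richer and may work field by field.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartyRecord:
    # Hypothetical record layout, invented for this sketch.
    record_id: int
    name: str
    address: str
    last_updated: str  # ISO date such as "2010-03-15", so strings sort by date
    duplicate_of: Optional[int] = None  # set when linked to a surviving record


def completeness(record: PartyRecord) -> int:
    """Count how many of the name and address fields are filled in."""
    return sum(1 for value in (record.name, record.address) if value.strip())


def apply_survivorship(cluster: List[PartyRecord]) -> PartyRecord:
    """Pick the survivor in a cluster of matched records and link the rest to it.

    Assumed rule: the most complete record survives; ties are broken by the
    most recent update date.
    """
    survivor = max(cluster, key=lambda r: (completeness(r), r.last_updated))
    for record in cluster:
        if record is not survivor:
            record.duplicate_of = survivor.record_id
    return survivor


cluster = [
    PartyRecord(1, "John Smith", "", "2011-01-01"),
    PartyRecord(2, "John Smith", "1 Main Street", "2009-06-30"),
    PartyRecord(3, "Jon Smith", "1 Main St", "2010-03-15"),
]
survivor = apply_survivorship(cluster)
```

The sketch only marks and links duplicates. Merging related transactions onto the survivor is left out, since that typically happens in the system that owns those transactions.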
Consolidation: Bringing records from several tables (or a single table with multiple purposes) together often includes managing hierarchies and roles:
- A close similarity between 2 (or more) business entities may imply that they are the same entity, or that they are parent, subsidiary or sister companies.
- A close similarity may also mean that you are looking at a small business owner in both his/her business role and consumer role.
Depending on the desired hierarchy building there are many versions of truth in a consolidation match as discussed in the post Fuzzy Hierarchy Management.
Splitting: From time to time you may detect records holding information about more than one entity:
- 2 consumers in the name field: “Mary & John Smith”
- a business name and a contact name: “Acme Ltd, John Smith”
- a business name and a department name: “Acme Ltd, Sales Dept”
Ultimately you have to split such records in order to get a proper match – and execute the deduplication/consolidation.
More on splitting names here.
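The three cases listed above can be sketched roughly as follows. This is a simplified illustration: the `BUSINESS_MARKERS` list and both function names are made up for the example, and the patterns only cover the exact shapes shown in the bullets. Real name fields need far more rules and exceptions.

```python
import re
from typing import List

# Hypothetical, deliberately short list of legal-form markers.
BUSINESS_MARKERS = {"ltd", "inc", "llc", "plc", "gmbh"}

# Two given names joined by "&" or "and", followed by a shared surname.
COUPLE_PATTERN = re.compile(r"\s*(\w+)(?:\s*&\s*|\s+and\s+)(\w+)\s+(\w+)\s*")


def looks_like_business(text: str) -> bool:
    """Return True if any word in the text is a known legal-form marker."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BUSINESS_MARKERS for word in words)


def split_name_field(value: str) -> List[str]:
    """Split a name field that may hold more than one entity."""
    # Business name followed by a contact or department name.
    if "," in value:
        head, tail = (part.strip() for part in value.split(",", 1))
        if looks_like_business(head) and tail:
            return [head, tail]
    # Two consumers sharing one surname.
    match = COUPLE_PATTERN.fullmatch(value)
    if match:
        first, second, surname = match.groups()
        return [f"{first} {surname}", f"{second} {surname}"]
    # Nothing recognised: keep the field as a single entity.
    return [value.strip()]
```

Under these assumptions, `split_name_field("Mary & John Smith")` gives two person names, while a value like "Smith, John" is left alone because the part before the comma does not look like a business.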
Atomising: Parsing (and standardisation) is often done before a match. The purpose is to get a more precise match by comparing smaller elements separately:
- Separate given name(s), surname, salutation, title in person names
- Separate house number, street name, direction etc. in postal addresses
- Separate country codes, region codes in phone numbers
As with everything in matching, there are pros and cons to this approach. Combining a match on the original elements with a match on the parsed/standardised elements is a plus.
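As a rough sketch of that combination for person names: the name is parsed into salutation, given name(s) and surname, and the match score blends a comparison of the original strings with a comparison of the parsed elements. The `SALUTATIONS` list, the equal 50/50 weighting and the use of `difflib.SequenceMatcher` from the Python standard library as the similarity measure are all assumptions chosen for brevity. A real matching tool would use its own parsing rules and comparison functions.

```python
from difflib import SequenceMatcher
from typing import Dict

# Hypothetical, deliberately short list of salutations.
SALUTATIONS = {"mr", "mrs", "ms", "miss", "dr"}


def parse_person_name(name: str) -> Dict[str, str]:
    """Naively split a person name into salutation, given name(s) and surname."""
    tokens = name.replace(".", " ").split()
    salutation = ""
    if tokens and tokens[0].lower() in SALUTATIONS:
        salutation = tokens.pop(0)
    # Assumption: the last remaining token is the surname.
    surname = tokens.pop() if tokens else ""
    return {"salutation": salutation, "given": " ".join(tokens), "surname": surname}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(name_a: str, name_b: str) -> float:
    """Blend a match on the original strings with a match on parsed elements."""
    parsed_a = parse_person_name(name_a)
    parsed_b = parse_person_name(name_b)
    parsed = (
        similarity(parsed_a["given"], parsed_b["given"])
        + similarity(parsed_a["surname"], parsed_b["surname"])
    ) / 2
    original = similarity(name_a, name_b)
    return 0.5 * original + 0.5 * parsed
```

With this weighting, "Mr John Smith" and "John Smith" score higher than a plain string comparison would give them, because the salutation no longer counts against the parsed part of the match. The original string still contributes, which guards against cases where the naive parser gets the elements wrong.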
The results from matching and the described transformations may end up serving different (combinations of) implementations, such as:
Single campaign: Here the match is a one-time event or follow-up operation in preparation of data for an offline direct marketing campaign, an online 1-1 marketing operation or other forms of communication to the parties reflected in the data.
Migration: Data from old system(s) are prepared for load into a new system, maybe with a new data model, maybe with higher data quality goals.
When to do cleansing in migration is discussed here.
Identity Resolution: The party data may have been collected for one purpose but are now being prepared for use with further purposes, which creates a need for close alignment with the real world.
Master Data Hub: Data are prepared for updating in a hub supposed to be the single source of truth. The hub may be a dedicated MDM solution or a single application database appointed as the enterprise hub – e.g. a CRM system.
A key concept in transformation of data from operational sources into these destinations is Master Data Survivorship.