We often use the terms who, what and where to define master data as opposed to transaction data, for example by saying:
- Transaction data accurately identifies who, what, where and when and…
- Master data accurately describes who, what and where
Who is easily related to our business partners, what to the products we sell, buy and use, and where to the locations of the events.
In some industries, when is also easily related to master data entities, for example a timetable in public transportation that is valid for a given period. Likewise, a fiscal year in financial reporting belongs to the when side of things.
But when is also a factor in improving data quality, and in preventing data quality issues, related to our business partners, products, locations and their assigned categories, because the descriptions of these entities do change over time.
This phenomenon is known as “slowly changing dimensions” when building data warehouses and trying to make sense of data with business intelligence.
But the “when” dimension also matters in matching, deduplication and identity resolution. Having data with the finest actuality doesn’t necessarily lead to a good match, because you may be comparing it with data that does not have the same actuality. Here history tracking is a solution: storing former names, addresses, phone numbers, e-mail addresses, descriptions, roles and relations.
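To make history tracking a bit more concrete, here is a minimal Python sketch, assuming a party’s former and current names are stored with validity periods. The `NameVersion` type, the `best_match_score` function, the company names and the difflib-based similarity are all hypothetical stand-ins, not a particular MDM product or matching engine. A candidate is scored either against every name the party has had, or only against the name valid on a given date.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class NameVersion:
    """One historical name of a party, valid from valid_from up to (not including) valid_to."""
    name: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the name is still current


def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]; a simple stand-in for a real matcher."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match_score(candidate: str, history: list[NameVersion],
                     as_of: Optional[date] = None) -> float:
    """Score a candidate name against a party's name history.

    With as_of, only the version valid on that date is compared; without it,
    every former and current name is compared and the best score wins.
    """
    versions = history
    if as_of is not None:
        versions = [v for v in history
                    if v.valid_from <= as_of and (v.valid_to is None or as_of < v.valid_to)]
    return max((similarity(candidate, v.name) for v in versions), default=0.0)


# Hypothetical party that changed its name in 2015.
history = [
    NameVersion("Nordic Widgets ApS", date(2001, 1, 1), date(2015, 6, 1)),
    NameVersion("NW Group A/S", date(2015, 6, 1)),
]
print(best_match_score("Nordic Widgets ApS", history))                    # 1.0 via the former name
print(best_match_score("Nordic Widgets ApS", history, date(2020, 1, 1)))  # lower: current name only
```

An old source record still carrying the former name finds its match through the history, which it might miss if only the current name were compared.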
Such complexity is often not handled in the master data containers in use, and even less so in matching environments.
My guess is that the future will bring publicly accessible reference data in the cloud, describing our master data entities with rich complexity, including the when (time) dimension, together with capable matching environments.
Great point about how data changes can affect data matching, especially duplicate consolidation.
I discussed duplicate consolidation techniques in Part 5 of my series on Data Quality Pro:
There I described how logical linkage is far more common than physical removal as an implementation strategy for duplicate consolidation.
As you describe, this approach is very similar to the concept of a slowly changing dimension in data warehousing (specifically a Type 2 slowly changing dimension where full history is maintained).
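A minimal sketch of logical linkage kept as Type 2 history, assuming duplicates are linked to a survivor id through a link table with effective dates instead of being physically removed. `Link`, `LinkageTable` and the record ids are hypothetical names for illustration; in practice this would typically be a database table.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Link:
    record_id: str
    survivor_id: str
    effective_from: date
    effective_to: Optional[date] = None  # None means this is the current link


class LinkageTable:
    """Logical duplicate linkage kept as Type 2 history: rows are closed, never deleted."""

    def __init__(self) -> None:
        self.rows: list[Link] = []

    def link(self, record_id: str, survivor_id: str, on: date) -> None:
        # Close the currently open link for this record instead of overwriting it.
        for row in self.rows:
            if row.record_id == record_id and row.effective_to is None:
                if row.survivor_id == survivor_id:
                    return  # already linked to this survivor; nothing changes
                row.effective_to = on
        self.rows.append(Link(record_id, survivor_id, on))

    def survivor_of(self, record_id: str, as_of: date) -> Optional[str]:
        """Return the survivor the record was linked to on the given date, if any."""
        for row in self.rows:
            if (row.record_id == record_id and row.effective_from <= as_of
                    and (row.effective_to is None or as_of < row.effective_to)):
                return row.survivor_id
        return None


t = LinkageTable()
t.link("crm-17", "P1", date(2010, 1, 1))
t.link("crm-17", "P2", date(2012, 3, 1))  # a re-evaluated match moves the record
print(t.survivor_of("crm-17", date(2011, 1, 1)))  # P1
print(t.survivor_of("crm-17", date(2013, 1, 1)))  # P2
```

Because the old link is closed rather than deleted, earlier consolidation decisions remain traceable.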
After initial deduplication and consolidation, which links each duplicate group to a representative record (or “survivor”), a significant question in the application logic is whether subsequent data matching should compare only against the representative record or evaluate the entire duplicate group.
Complexity can be introduced either way. Data changes often require reevaluating previous match results in order to maintain, as you put it, data with the finest actuality.
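The two options in the survivor question can be sketched like this, assuming each duplicate group keeps its linked member names next to the survivor. The `DuplicateGroup` type, the `find_group` function, the 0.9 threshold and the difflib similarity are illustrative assumptions. Comparing only with the survivor is cheaper, while evaluating the whole group catches candidates that resemble another version of the record.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class DuplicateGroup:
    survivor: str                                      # representative ("surviving") name
    members: list[str] = field(default_factory=list)  # linked duplicates, kept rather than deleted


def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]; a simple stand-in for a real matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_group(candidate: str, groups: dict[str, DuplicateGroup],
               threshold: float = 0.9, whole_group: bool = False) -> Optional[str]:
    """Return the id of the best matching group, or None if no score reaches the threshold."""
    best_id, best_score = None, 0.0
    for group_id, group in groups.items():
        # Survivor only, or survivor plus every linked member.
        names = [group.survivor] + (group.members if whole_group else [])
        score = max(similarity(candidate, n) for n in names)
        if score > best_score:
            best_id, best_score = group_id, score
    return best_id if best_score >= threshold else None


groups = {"G1": DuplicateGroup("Jim Harris", ["James Harris", "J. Harris"])}
print(find_group("James Harris", groups))                    # None
print(find_group("James Harris", groups, whole_group=True))  # G1
```

Here the new record “James Harris” falls below the threshold against the survivor alone but matches exactly when the whole group is evaluated.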
Thanks a lot for elaborating, Jim.
Survivorship is one of my planned topics for this blog, but the problem is that you have already explained exactly all the important stuff in your entry.
So for now I only have the heading: Survival of the fittest.
Either time will erode your blog entry, or I’ll stumble over an interesting real-life example and write it.
Most MDM examples are based on customer data. I come from a different background: I work with quantitative analysts, for whom master data describes business entities.
What you’ve described is a problem that affects many of us trying to create master data entities from a collection of data warehouses, each coming from a different source.
“Universal” identifiers such as SEDOLs, CUSIPs and tickers change over time. All too frequently this time dimension is ignored, so we find completely unrelated companies mapped together.
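One way to keep the time dimension when mapping identifiers is to store each assignment with a start date and resolve the identifier as of a date. This is a minimal sketch; the `IdentifierHistory` class, the ticker “XYZ” and the entity ids are hypothetical, and real identifier histories would come from a reference data source.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import date
from typing import Optional


class IdentifierHistory:
    """Maps an external identifier (e.g. a ticker) to an internal entity id over time."""

    def __init__(self) -> None:
        self._assignments: dict[str, list[tuple[date, Optional[str]]]] = {}

    def assign(self, identifier: str, entity_id: Optional[str], start: date) -> None:
        # entity_id None means the identifier was retired from this date.
        history = self._assignments.setdefault(identifier, [])
        history.append((start, entity_id))
        history.sort(key=lambda pair: pair[0])

    def resolve(self, identifier: str, as_of: date) -> Optional[str]:
        """Return the entity the identifier pointed to on the given date, if any."""
        history = self._assignments.get(identifier, [])
        starts = [start for start, _ in history]
        i = bisect_right(starts, as_of)  # number of assignments starting on or before as_of
        return history[i - 1][1] if i else None


h = IdentifierHistory()
h.assign("XYZ", "company-A", date(1999, 1, 1))
h.assign("XYZ", None, date(2008, 5, 1))          # retired
h.assign("XYZ", "company-B", date(2011, 2, 1))   # reused for an unrelated company
print(h.resolve("XYZ", date(2005, 1, 1)))  # company-A
print(h.resolve("XYZ", date(2012, 1, 1)))  # company-B
```

Resolving by date keeps data about company-A from being merged with company-B just because both once carried the same identifier.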
Thanks for commenting, Tamyka. Creepy examples!
Another great post, Henrik.
As Jim Harris mentioned, survivorship is very important, and today we are seeing more and more requirements (business requests and needs) that call for flexibility in the surviving record. For example, one business unit will trust source systems A, B and C, while another business unit will trust systems A, C and D. While there is overlap, the ability to create the correct version of the record based on time, source or role is becoming more important to MDM / DQ projects… Gone are the days when one record of WHO will rule them ALL (thanks, J. R. R. Tolkien)…
So while WHEN is important, may I add another one: HOW (meaning how the record is compiled from trusted sources).
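HOW can be sketched as configuration plus a rule, assuming each business unit declares the sources it trusts and, per attribute, the most recently updated value from a trusted source survives. `TRUSTED_SOURCES`, `AttributeValue`, `build_golden_record` and the sample values are hypothetical; real survivorship rules often also weigh source priority and role.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AttributeValue:
    attribute: str
    value: str
    source: str
    updated: date


# Hypothetical trust configuration mirroring the A, B, C versus A, C, D example.
TRUSTED_SOURCES = {
    "sales": {"A", "B", "C"},
    "finance": {"A", "C", "D"},
}


def build_golden_record(values: list[AttributeValue], business_unit: str) -> dict[str, str]:
    """Compose a surviving record for one business unit: per attribute, keep the
    most recently updated value among that unit's trusted sources."""
    trusted = TRUSTED_SOURCES[business_unit]
    record: dict[str, str] = {}
    latest: dict[str, date] = {}
    for v in values:
        if v.source not in trusted:
            continue  # this unit does not trust the source
        if v.attribute not in latest or v.updated > latest[v.attribute]:
            latest[v.attribute] = v.updated
            record[v.attribute] = v.value
    return record


values = [
    AttributeValue("address", "1 Old Street", "A", date(2010, 1, 1)),
    AttributeValue("address", "2 New Road", "B", date(2012, 1, 1)),
    AttributeValue("address", "3 Ledger Lane", "D", date(2013, 1, 1)),
    AttributeValue("phone", "555-0100", "C", date(2011, 1, 1)),
]
print(build_golden_record(values, "sales"))    # {'address': '2 New Road', 'phone': '555-0100'}
print(build_golden_record(values, "finance"))  # {'address': '3 Ledger Lane', 'phone': '555-0100'}
```

The same underlying values yield different surviving records per business unit, combining source (which systems are trusted) and time (which value is newest).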
Thanks for sharing, keep up the great information.
Thanks, Garnie. Indeed, HOW is the key. It’s always easy to say WHEN (and WHO and WHAT and WHERE). We should be writing more about how to do it.