Following up on my post no. 100, I can’t resist making a post with 101 in the title. I’ll use 101 in the sense of an introduction to a subject. As “Data Quality 101” and “MDM 101” are already widely discussed, I think “Data Matching 101” is a good title.
Data matching deals with the dimension of data quality I like to call uniqueness. I use uniqueness because it is the positive term describing the state we want to bring our data to, as opposed to duplication, which is the state we want to change. The other dimensions of data quality, such as accuracy, consistency and timeliness, likewise describe desired states.
Besides data profiling, data matching is the activity within data quality that has been automated the most. No wonder, since duplicates (especially in master data) and master data that is not aligned with the real world cost organizations incredible amounts of money. Finding duplicates among millions (or even thousands) of records by manual means is impossible. The same is true for matching against directories that hold timely descriptions of the real world. You have to use a computerized approach, controlled by exactly the amount of manual verification that makes your return on investment positive.
Matching names and addresses (party master data) is the most common area of data matching. Matching product master data is probably going to be the next big thing in matching. I have also been involved in matching location data and timetables.
A computerized approach to data matching may combine several techniques, such as parsing and standardization, using synonyms, assigning match codes, advanced algorithms and probabilistic learning.
All that is best explained with examples. Therefore I am happy to do a webinar called “The Art of Data Matching” as part of a series of free webinars on eLearningCurve. The webinar will be a sightseeing tour of examples of challenges and solutions in the data matching world.
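To make a few of these techniques concrete, here is a minimal sketch in Python of standardization, synonym expansion, a crude match code and a simple string similarity. The synonym table, the match code recipe, the 0.9 threshold and all function names are illustrative assumptions, not how any particular matching tool works, and probabilistic learning is not covered.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table; a real one would be far larger.
SYNONYMS = {"st": "street", "rd": "road", "ave": "avenue",
            "bob": "robert", "bill": "william"}


def standardize(text):
    """Lowercase, replace punctuation with spaces, expand known synonyms."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(SYNONYMS.get(token, token) for token in tokens)


def match_code(name, postal_code):
    """Crude match code (an assumed recipe): consonant skeleton of the
    last name token, cut to four characters, plus the postal code."""
    last = standardize(name).split()[-1]
    skeleton = last[0] + "".join(c for c in last[1:] if c not in "aeiouy")
    return f"{skeleton[:4].upper()}-{postal_code}"


def similarity(a, b):
    """Similarity between 0 and 1 of the two standardized strings."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def is_candidate_duplicate(rec_a, rec_b, threshold=0.9):
    """Records are (name, address, postal_code) tuples. The match code
    narrows the candidates; the similarity score confirms them."""
    if match_code(rec_a[0], rec_a[2]) != match_code(rec_b[0], rec_b[2]):
        return False
    text_a = f"{rec_a[0]} {rec_a[1]}"
    text_b = f"{rec_b[0]} {rec_b[1]}"
    return similarity(text_a, text_b) >= threshold


record_1 = ("Bob Smyth", "12 Main St.", "10001")
record_2 = ("Robert Smith", "12 Main Street", "10001")
print(match_code(record_1[0], record_1[2]))
print(is_candidate_duplicate(record_1, record_2))
```

The match code only decides which records are worth comparing at all. Pairs that pass the similarity threshold would still go to the manual verification mentioned above wherever the cost of a wrong merge justifies it.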
Date and time: Well, these are matching examples of expressing the moment the webinar starts:
- Friday 06/04/10 12pm EDT
- Friday 04/06/10 18:00 Central European Summer Time
- Sydney, Sat Jun 5 2:00 AM
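These three expressions are a small matching problem in themselves. As a sketch, the snippet below parses each one with its own assumed format and a fixed UTC offset (EDT as UTC-4, Central European Summer Time as UTC+2, Sydney in June as UTC+10), then compares them in UTC. The year 2010 on the Sydney line and the format strings are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets assumed for the date in question (June 2010).
EDT = timezone(timedelta(hours=-4))
CEST = timezone(timedelta(hours=2))
AEST = timezone(timedelta(hours=10))

# (text, format, offset); the year is appended to the Sydney
# expression because the original leaves it out.
EXPRESSIONS = [
    ("06/04/10 12pm", "%m/%d/%y %I%p", EDT),          # US month/day order
    ("04/06/10 18:00", "%d/%m/%y %H:%M", CEST),       # European day/month order
    ("Jun 5 2:00 AM 2010", "%b %d %I:%M %p %Y", AEST),
]


def to_utc(text, fmt, tz):
    """Parse a local time expression and standardize it to UTC."""
    local = datetime.strptime(text, fmt).replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


instants = {to_utc(text, fmt, tz) for text, fmt, tz in EXPRESSIONS}
print(len(instants) == 1)  # True when all three denote the same moment
```

Note that "06/04/10" and "04/06/10" only match once you know which convention each source uses, which is the same kind of context name and address matching depends on.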
Link to the eLearningCurve free webinar here.
Henrik, I just signed up for your webinar and I look forward to your presentation on data matching. I have been working in the PIM data cleansing arena for the last 10 years and look forward to future discussions on matching product master data.
I can remember many heated discussions with a software architect (no data experience) who once explained to me that the match for PIM data should be on the part number alone, and that the rule was no duplicate part numbers. I provided my “real data examples”. His response was that if two different component manufacturers used the same part number, we should meet with them and explain that they should change their part number schema.
An even funnier response was to create a fake part number. I am not sure how you would order a part with a fake part number, or cross-reference it to the good part number, but I guess that wasn’t his issue.
Here’s to improved intelligence in the management of PIM master data!
Thanks Jackie. No doubt that improving data quality in product master data, including uniqueness, is an area where manufacturing companies in particular, with their many spare parts, will be able to save costs big time. I have a case story about that here.