Following up on my post no. 100, I can’t resist making a post with 101 in the title. I’ll use 101 in the sense of an introduction to a subject. As “Data Quality 101” and “MDM 101” are already widely discussed, I think “Data Matching 101” is a good title.
Data matching deals with the dimension of data quality I like to call uniqueness. I use uniqueness because it is the positive term describing the state we want to bring our data to, as opposed to duplication, which is the state we want to change. The other dimensions of data quality likewise describe desired states, such as accuracy, consistency and timeliness.
Besides data profiling, data matching is the activity within data quality that has been automated the most. No wonder, since duplicates (especially in master data) and master data that is not aligned with the real world cost organizations incredible amounts of money. Finding duplicates among millions (or even thousands) of records by manual means is impossible. The same is true for matching against directories that hold up-to-date descriptions of the real world. You have to use a computerized approach, controlled by exactly the amount of manual verification that makes your return on investment positive.
Matching names and addresses (party master data) is the most common area of data matching. Matching product master data is probably going to be the next big thing in matching. I have also been involved in matching location data and timetables.
A computerized approach to data matching may combine several techniques, such as parsing and standardization, using synonyms, assigning match codes, advanced algorithms and probabilistic learning.
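To make a few of these techniques concrete, here is a minimal Python sketch of standardization, synonym replacement, a crude match code and a fuzzy comparison. Everything in it is an assumption made for illustration: the tiny `SYNONYMS` table, the function names and the way the match code is built are hypothetical, and the similarity score comes from Python's standard `difflib` rather than from any particular data matching tool.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table; a real one would be far larger and domain specific.
SYNONYMS = {"bob": "robert", "bill": "william", "st": "street", "rd": "road"}


def standardize(text):
    """Parse text into lowercase alphanumeric tokens and replace known synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(token, token) for token in tokens]


def match_code(text):
    """Build a crude match code: the first four characters of each token, sorted."""
    return "".join(sorted(token[:4] for token in standardize(text)))


def similarity(a, b):
    """Return a 0..1 similarity score between the standardized forms of a and b."""
    left = " ".join(standardize(a))
    right = " ".join(standardize(b))
    return SequenceMatcher(None, left, right).ratio()


if __name__ == "__main__":
    # Same match code despite different word order and a nickname.
    print(match_code("Smith, Bob") == match_code("Robert Smith"))
    # A spelling variation still scores high without being an exact match.
    print(similarity("Bob Smith", "Robert Smyth"))
```

Records sharing a match code become candidate duplicates, and the similarity score can then decide which candidates are accepted automatically and which go to manual verification.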
All of this is best explained with examples. Therefore I am happy to do a webinar called “The Art of Data Matching” as part of a series of free webinars on eLearningCurve. The webinar will be a sightseeing tour of challenges and solutions in the data matching world.
Date and time: Well, these are matching examples of expressing the moment the webinar starts:
- Friday 06/04/10 12pm EDT
- Friday 04/06/10 18:00 Central European Summer Time
- Sydney, Sat Jun 5 2:00 AM
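The three expressions above can be checked against each other with a small Python sketch. It assumes fixed UTC offsets (EDT as UTC-4, Central European Summer Time as UTC+2, Sydney in June as UTC+10), reads the first date as month/day/year and the second as day/month/year, and rewrites the Sydney date in numeric form with the year 2010 filled in. The helper name `to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(date_text, date_format, hour, utc_offset_hours):
    """Parse a local date, attach the local hour and UTC offset, convert to UTC."""
    day = datetime.strptime(date_text, date_format)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = day.replace(hour=hour, tzinfo=local_zone)
    return local.astimezone(timezone.utc)


# Same digits, different meaning: month first versus day first.
new_york = to_utc("06/04/10", "%m/%d/%y", 12, -4)       # 12pm EDT
central_europe = to_utc("04/06/10", "%d/%m/%y", 18, 2)  # 18:00 CEST
sydney = to_utc("05/06/2010", "%d/%m/%Y", 2, 10)        # Sat Jun 5, 2:00 AM

if __name__ == "__main__":
    print(new_york, central_europe, sydney)
    print(new_york == central_europe == sydney)
```

Comparing the raw strings would never reveal the match; only after parsing and standardizing to one representation do the three values turn out to be the same moment.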
Link to the eLearningCurve free webinar here.








