Four Different Data Matching Stage Types

One of the activities I do in my leisure time is cycling. As a consequence, I guess, I also like watching cycling on TV (or on the computer), not least the paramount event of the cycling year: Le Tour de France.

In Le Tour de France you basically have four different types of stages:

  • Time trial
  • Stages on flat terrain
  • Stages through hilly landscape
  • Stages in the high mountains

Some riders are specialists in one of the stage types and some riders are more all-around types.

With automated data matching, which is what I spend most of my business time on, there are basically also four different types of processes:

  • Internal deduplication of rows inside one table
  • Removal of rows in one table that also appear in another table
  • Consolidation of rows from several tables
  • Reference matching of rows in one table against another (big) table

Internal deduplication

Examples of data matching objectives here are finding duplicates in names and addresses before sending a direct mail, or finding the same products in a material master.

The big question in this type of process is whether you are able to strike a balance between avoiding false positives (being too aggressive) and not leaving too many false negatives behind (losing the game). You also have to think about survivorship when merging into a golden record.
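To make the balance between false positives and false negatives a bit more concrete, here is a minimal Python sketch of internal deduplication. It is an illustration under assumptions, not a production matcher: rows are dicts with hypothetical `name`, `address` and `phone` fields, similarity is measured with the standard library's `difflib.SequenceMatcher` on a normalized name + address string, and the survivorship rule ("keep the longest non-empty value per field") is just one possible choice. Raising `threshold` gives fewer false positives but more false negatives, and lowering it does the opposite.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase, strip punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Score between 0 and 1 on the combined name + address string."""
    key_a = normalize(a["name"] + " " + a["address"])
    key_b = normalize(b["name"] + " " + b["address"])
    return SequenceMatcher(None, key_a, key_b).ratio()


def find_duplicate_pairs(rows, threshold=0.85):
    """Compare every pair of rows (O(n^2)) and keep pairs above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


def cluster(n, pairs):
    """Group row indexes into duplicate clusters using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def golden_record(rows):
    """Hypothetical survivorship rule: keep the longest non-empty value per field."""
    fields = sorted({field for row in rows for field in row})
    return {f: max((row.get(f, "") for row in rows), key=len) for f in fields}


rows = [
    {"name": "John Smith", "address": "12 High Street", "phone": ""},
    {"name": "Jon Smith", "address": "12 High St.", "phone": "555-0100"},
    {"name": "Mary Jones", "address": "5 Low Road", "phone": ""},
]
pairs = find_duplicate_pairs(rows)
for group in cluster(len(rows), pairs):
    print(golden_record([rows[i] for i in group]))
```

Here the two Smith rows end up in one cluster and are merged into a single golden record, while Mary Jones stays on her own. Comparing every pair only works for small tables; larger ones usually need some form of blocking first.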

In Le Tour de France the overall leader, who gets the yellow jersey, has to do well in the time trials.

Removal

Examples of data matching objectives here are eliminating nixies (people who don’t want offerings by mail) before sending a direct mail, or eliminating bad payers (people you don’t want to offer credit).
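A minimal Python sketch of such a removal (suppression) run is shown below. It assumes hypothetical `name` and `postcode` fields and an exact match on a normalized key; real suppression lists often need fuzzier matching, but the shape of the process is the same: index the suppression table once, then filter the mailing list against it.

```python
import re


def match_key(row):
    """Hypothetical match key: normalized name plus postcode without spaces."""
    name = re.sub(r"[^\w\s]", "", row["name"].lower())
    name = " ".join(name.split())
    postcode = row["postcode"].replace(" ", "").upper()
    return (name, postcode)


def suppress(mailing_list, suppression_list):
    """Split mailing_list into rows to keep and rows found in suppression_list."""
    blocked = {match_key(row) for row in suppression_list}  # set for fast lookups
    kept, removed = [], []
    for row in mailing_list:
        (removed if match_key(row) in blocked else kept).append(row)
    return kept, removed


mailing_list = [
    {"name": "Anna Berg", "postcode": "2100"},
    {"name": "Peter Holm", "postcode": "8000"},
]
nixies = [{"name": "anna  berg.", "postcode": " 2100"}]
kept, removed = suppress(mailing_list, nixies)
print([row["name"] for row in kept])  # ['Peter Holm']
```

Normalizing before building the key is what lets "anna  berg." in the suppression list hit "Anna Berg" in the mailing list; without it, this exact-key approach would leave that nixie in the mailing.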

This is probably the easiest process, one everyone can do, but at the end of the day some are better sprinters than others.

The best sprinter in Le Tour de France gets the green jersey.

Consolidation

When migrating databases and/or building a master data hub, you often have to merge rows from several different tables into a golden copy.

The difficulty here is making data fit for the immediate purpose of use while at the same time keeping it aligned with the real world, so that it can also handle the needs that arise tomorrow.

Often some of the young riders in Le Tour de France make an escape when climbing the hills, and the best of them gets the white jersey.

Reference match

Business directory matching has been a focus area of mine, including building a solution for matching against the D&B worldbase. The worldbase holds over 165 million rows representing business entities from all over the world.
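With a reference table of that size you cannot compare every input row against every directory row, so a common technique is blocking: only compare against directory entries that share a cheap key. The Python sketch below illustrates that idea against a tiny in-memory stand-in for a directory. It is not D&B's interface; the fields (`id`, `name`, `country`), the legal-form word list, the blocking key (country plus the first three letters of the normalized name) and the accept/review thresholds are all hypothetical choices for illustration.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical legal-form words ignored when comparing company names.
LEGAL_FORMS = {"inc", "ltd", "llc", "gmbh", "as", "aps", "ab"}


def normalize(name):
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_FORMS)


def blocking_key(name, country):
    """Only entries sharing country and name prefix are compared."""
    return (country.upper(), normalize(name)[:3])


def build_index(directory):
    index = defaultdict(list)
    for entry in directory:
        index[blocking_key(entry["name"], entry["country"])].append(entry)
    return index


def reference_match(record, index, accept=0.9, review=0.75):
    """Return (status, best_entry, score); the thresholds are illustrative."""
    target = normalize(record["name"])
    best, best_score = None, 0.0
    for entry in index.get(blocking_key(record["name"], record["country"]), []):
        score = SequenceMatcher(None, target, normalize(entry["name"])).ratio()
        if score > best_score:
            best, best_score = entry, score
    if best_score >= accept:
        return "match", best, best_score
    if best_score >= review:
        return "review", best, best_score
    return "no match", None, best_score


directory = [
    {"id": "D1", "name": "Nordic Bikes ApS", "country": "dk"},
    {"id": "D2", "name": "Nordic Boats A/S", "country": "DK"},
    {"id": "D3", "name": "Nordic Bikes GmbH", "country": "DE"},
]
index = build_index(directory)
status, entry, score = reference_match({"name": "Nordic Bikes", "country": "DK"}, index)
print(status, entry["id"] if entry else None, round(score, 2))
```

The blocking key is the main trade-off here: it keeps the number of comparisons small, but any true match whose normalized name starts differently (or is registered in another country) is never even considered, which is one reason results against big directories can vary so much.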

The results from automated matching with such directories may vary a lot, just as you see huge time differences in Le Tour de France when the riders face the big mountains. Here the best climber gets the polka dot jersey.
