How long is a Marathon?

Many large cities around the world have a yearly marathon event. Today it’s Copenhagen (and possibly other cities too).

The marathon distance today is 42,195 kilometers (if I use a comma as the decimal point), which corresponds to 26 miles and 385 yards, or 26.22 miles (if I use a dot as the decimal point).
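The conversion between these representations can be checked with a small Python sketch. The function names are made up for illustration, and the parser naively assumes that a comma is always a decimal mark and never a thousands separator, which is exactly the kind of assumption that goes wrong in real data.

```python
METERS_PER_MILE = 1609.344  # international mile
METERS_PER_YARD = 0.9144    # international yard


def parse_distance_km(text: str) -> float:
    """Parse a kilometer figure written with a decimal comma or a decimal point.

    Assumption: a comma is always a decimal mark, never a thousands separator.
    """
    return float(text.strip().replace(",", "."))


def km_to_miles_and_yards(km: float) -> tuple[int, int]:
    """Split a distance in kilometers into whole miles plus the remainder in yards."""
    meters = km * 1000
    miles = int(meters // METERS_PER_MILE)
    yards = round((meters - miles * METERS_PER_MILE) / METERS_PER_YARD)
    return miles, yards


marathon_km = parse_distance_km("42,195")
print(km_to_miles_and_yards(marathon_km))               # (26, 385)
print(round(marathon_km * 1000 / METERS_PER_MILE, 2))   # 26.22
```

The same number thus shows up as "42,195", "42.195", "26 miles 385 yards" and "26.22" depending on who writes it down.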

So even if we agree about the distance today, we might represent that distance in various ways. The distance has, however, varied through history, as seen in the table with the length of the Olympic marathons.

What about real world alignment?

Well, if the Greek runner called Pheidippides (sometimes spelled Phidippides or Philippides) took the long but flat Southern route from Marathon to Athens it would have been around 42 kilometers. If he took the shorter but steeper Northern route it would only have been around 35 kilometers.

What about me? Oh, I’ll go for 42,195 kilometers – on the bike.   


Fitness Data

About a month ago, in the post called Right the First Time, I wrote about how my personal data was on-boarded at the local fitness club.

Since then I have actually succeeded in visiting the gym twice a week and used the amazing technology necessary to get me in action.

As a complete data geek I of course use the full TV screen on the machine not to watch TV but to display the full dashboard with key performance indicators related to my workout. These include:

  • Time done / remaining
  • Pulse with red alert when I’m over the healthy threshold for my age
  • Distance I would have gone if I wasn’t in the same fixed position
  • Calories burned

As with many data presentations, we have here a mix of hard facts, like the time done, and assumed figures, like calories burned. The machine doesn’t actually measure the calories burned but calculates an assumed figure as a function of power level, speed, my weight and age.

It’s actually a question if I really want to know about the calories burned. My conclusion is yes. The time done is wasted anyway, the high pulse doesn’t last and the distance is virtual. So the calories burned fit the purpose of use. It keeps me going.   


Despite Best Intentions

Sometimes you have the best intentions of improving things, such as data quality and a lot of other things, but somewhere you failed to see the big picture, and now it is too late to correct.

From the sports world this apparently happened to the Singapore water polo team at the current Asian Games.

They have newly designed speedos honoring the nation’s flag.

But now some ministry tells them that the swimsuit is inappropriate. And you can’t change outfits during the games.

By the way: I also work at a company with this logo:

Fortunately we haven’t got company speedos.


Game, Set, Match

Tennis is one of the sports I practiced a lot when I was young and still like to play when possible.

As a consequence I guess I also like to follow world-class tennis, not least now that we finally have a Dane competing for the big titles. I’m thinking of Caroline Wozniacki, who is seeded number one in the ongoing US Open Grand Slam tournament.

So, as an excuse to write a blog post about it I have come up with these connections between Caroline and Data Matching.

The name:

Wozniacki isn’t exactly a Nordic name, as she is the daughter of native-born Polish parents. In fact, if Polish naming practice were followed, her surname would be Wozniacka, the female form of the name. But as practiced in Western countries, she has inherited a genderless family name. Good for matching.

The bet:

Betting on sports events is like scoring in data matching. You are not 100 % sure but rely on probability. Odds of 1.01 and 1.02 for Caroline winning her opening round matches at the US Open correspond to 98–99 % certainty, which is pretty sure. But the odds get higher as the tournament proceeds to the final rounds, where it can go either way.
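The arithmetic behind those percentages is simply the reciprocal of the odds. Here is a minimal sketch, assuming the figures are decimal odds and ignoring the bookmaker's margin; the function name is made up for illustration.

```python
def implied_probability(decimal_odds: float) -> float:
    """Return the probability implied by decimal odds, ignoring the bookmaker's margin."""
    if decimal_odds < 1.0:
        raise ValueError("decimal odds cannot be lower than 1")
    return 1.0 / decimal_odds


for odds in (1.01, 1.02, 2.0):
    print(f"odds {odds:.2f} -> {implied_probability(odds):.1%}")
# odds 1.01 -> 99.0%
# odds 1.02 -> 98.0%
# odds 2.00 -> 50.0%
```

A match score in data matching can be read the same way: a high score is a safe bet, while a score in the middle can go either way and may deserve a manual look.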


Four Different Data Matching Stage Types

One of the activities I do in my leisure time is cycling. As a consequence I guess I also like to watch cycling on TV (or on the computer), not least the highlight of the cycling year: Le Tour de France.

In Le Tour de France you basically have four different types of stages:

  • Time trial
  • Stages on flat terrain
  • Stages through hilly landscape
  • Stages in the high mountains

Some riders are specialists in one of the stage types and some riders are more all-around types.

With automated data matching, which is what I do the most in my business time, there are basically also four different types of processes:

  • Internal deduplication of rows inside one table
  • Removal of rows in one table which also appear in another table
  • Consolidation of rows from several tables
  • Reference matching with rows in one table against another (big) table

Internal deduplication

Examples of data matching objectives here are finding duplicates in names and addresses before sending a direct mail, or finding the same products in a material master.

The big question in this type of process is whether you are able to strike a balance between avoiding false positives (the result of being too aggressive) and not leaving too many false negatives behind (losing the game). You also have to think about survivorship when merging into a golden record.
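The balancing act can be illustrated with a small Python sketch. It is only a toy: the rows are invented, and the standard library's difflib.SequenceMatcher stands in for a real match score, which would normally compare parsed and standardized name and address elements.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0 and 1 (a stand-in for a real match score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(rows: list[str], threshold: float) -> list[tuple[str, str]]:
    """Return every pair of rows whose similarity reaches the threshold."""
    return [(a, b) for a, b in combinations(rows, 2) if similarity(a, b) >= threshold]


# Hypothetical name-and-address rows, invented for illustration.
customers = [
    "John Smith, 12 Main Street",
    "Jon Smith, 12 Main St.",
    "Joan Smith, 12 Main Street",
    "Mary Jones, 4 Park Lane",
]

# A strict threshold risks false negatives, a loose one risks false positives.
for threshold in (0.95, 0.85, 0.60):
    print(threshold, find_duplicates(customers, threshold))
```

Running it with different thresholds shows the dilemma: John and Joan may well be two different people at the same address even though their rows look almost identical, while John and Jon are probably the same person despite the abbreviated street name.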

In Le Tour de France the overall leader who gets the yellow jersey has to make a good time trial.

Removal

Examples of data matching objectives here are eliminating nixies (people who don’t want offerings by mail) before sending a direct mail, or eliminating bad payers (people you don’t want to offer credit).
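In its simplest form, removal is an anti-join on a match key. The sketch below assumes exact matching on a crudely normalized name and address; the rows and function names are invented for illustration, and a real suppression run would typically use fuzzy matching as well.

```python
def normalize(name: str, address: str) -> tuple[str, str]:
    """Build a crude match key: lower-cased, with extra whitespace collapsed."""
    return (" ".join(name.lower().split()), " ".join(address.lower().split()))


def remove_suppressed(mailing_list, suppression_list):
    """Keep only the mailing-list rows whose match key is absent from the suppression list."""
    suppressed = {normalize(name, address) for name, address in suppression_list}
    return [row for row in mailing_list if normalize(*row) not in suppressed]


# Hypothetical rows, invented for illustration.
mailing_list = [
    ("Anna Berg", "1 High Street"),
    ("Carl Dahl", "7 Mill Road"),
    ("Eva Falk", "3 Lake View"),
]
nixies = [("CARL  DAHL", "7 mill road")]

print(remove_suppressed(mailing_list, nixies))
# [('Anna Berg', '1 High Street'), ('Eva Falk', '3 Lake View')]
```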

Probably the easiest process, one that everyone can do – but at the end of the day some are better sprinters than others.

The best sprinter in Le Tour de France gets the green jersey.

Consolidation

When migrating databases and/or building a master data hub you often have to merge rows from several different tables into a golden copy.

Here you often see the difficulty of making data fit for the immediate purpose of use while at the same time keeping it aligned with the real world, so that it can also handle the needs that arise tomorrow.
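A golden copy needs a survivorship rule. The sketch below uses one possible rule, "the newest non-empty value wins per field", and assumes the rows have already been matched as the same entity. The field names and rows are invented for illustration.

```python
from datetime import date


def consolidate(rows: list[dict]) -> dict:
    """Merge rows for one entity: per field, the newest non-empty value survives."""
    golden = {}
    for row in sorted(rows, key=lambda r: r["updated"]):
        for field, value in row.items():
            if field not in ("source", "updated") and value:
                golden[field] = value  # newer rows overwrite older ones
    return golden


# Hypothetical rows for the same customer from two source tables.
rows = [
    {"source": "crm", "updated": date(2010, 3, 1),
     "name": "Anna Berg", "phone": "", "email": "anna@example.com"},
    {"source": "billing", "updated": date(2009, 6, 15),
     "name": "A. Berg", "phone": "555-0100", "email": ""},
]

print(consolidate(rows))
# {'name': 'Anna Berg', 'phone': '555-0100', 'email': 'anna@example.com'}
```

Other rules, such as trusting one source over another or preferring the most complete value, are just as plausible. Which one fits depends on the purpose of use.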

Often some of the young riders in Le Tour de France make an escape when climbing the hills and get the white jersey.

Reference match

Business directory matching has been a focus area of mine, including building a solution for matching against the D&B WorldBase. The WorldBase holds over 165 million rows representing business entities from all over the world.

The results from automated matching against such directories may vary a lot, just as you see huge time differences in Le Tour de France when the riders face the big mountains. Here the best climber gets the polka dot jersey.
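The core of a reference match can be sketched as picking the best-scoring directory entry and accepting it only above a threshold. This is a toy under stated assumptions: the directory ids and names are invented (not real D&B data), difflib.SequenceMatcher stands in for a proper match score, and the linear scan would never do against 165 million rows, where some kind of candidate selection comes first.

```python
from difflib import SequenceMatcher


def best_reference_match(name: str, reference: dict[str, str], threshold: float = 0.8):
    """Return (reference_id, score) for the best-scoring entry, or None below the threshold."""
    def score(candidate: str) -> float:
        return SequenceMatcher(None, name.lower(), candidate.lower()).ratio()

    if not reference:
        return None
    ref_id, ref_name = max(reference.items(), key=lambda item: score(item[1]))
    best = score(ref_name)
    return (ref_id, best) if best >= threshold else None


# Hypothetical directory; the ids and names are invented for illustration.
directory = {
    "R001": "Acme Trading Ltd",
    "R002": "Nordic Bikes A/S",
    "R003": "Alpine Tours GmbH",
}

print(best_reference_match("ACME Trading Limited", directory))
print(best_reference_match("Unknown Company", directory))
```

The first lookup finds a candidate despite the different legal form spelling, while the second comes back empty. How many rows end up in each group is where the big time differences show.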
