Location, Location, Location

Now, I am not going to write about the importance of location when selling real estate. Instead, I am going to give three examples of why knowing about the location matters when you are doing data matching, such as trying to find duplicates in names and addresses.

Location uniqueness

Let’s say we have these two records:

  • Stefani Germanotta, Main Street, Anytown
  • Stefani Germanotta, Main Street, Anytown

The data is character by character exactly the same. But:

  • If there is only one address on Main Street in Anytown, there is a very high probability that it is the same real world individual.
  • If there are only a few addresses on Main Street in Anytown, you will still have a fair probability that this is the same individual.
  • But if there are hundreds of addresses on Main Street in Anytown, the probability that this is the same individual will be below the threshold for many matching purposes.

Of course, if you are sending a direct marketing letter, it is pointless to send both letters, as:

  • Either both will be delivered to the same mailbox.
  • Or both will be returned by the postal service.

So this example highlights a major point in data quality. If you are matching for a single purpose of use, like direct marketing, you may apply simple processing. But if you are matching for multiple purposes of use, like building a master data hub, you cannot avoid some kind of complexity.
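The reasoning in this section can be sketched in a few lines of Python. This is only an illustration, not a recommended scoring model: the confidence values, the cut-off for "a few" addresses, the purpose names and the thresholds are all assumptions made up for the example.

```python
# All numbers below are hypothetical and chosen only for illustration.
PURPOSE_THRESHOLDS = {
    "direct_marketing": 0.0,  # identical name and address: one letter is enough
    "master_data_hub": 0.9,   # merging records needs strong evidence
}


def street_match_confidence(addresses_on_street: int) -> float:
    """Crude confidence that two identical name + street records
    (without house number) are the same real world individual."""
    if addresses_on_street < 1:
        raise ValueError("a street needs at least one address")
    if addresses_on_street == 1:
        return 0.95  # only one possible address: very high probability
    if addresses_on_street <= 10:  # assumed cut-off for "a few"
        return 0.7   # fair probability
    return 0.3       # many addresses: low probability


def is_duplicate(addresses_on_street: int, purpose: str) -> bool:
    """Decide whether to treat the two records as duplicates for a purpose."""
    confidence = street_match_confidence(addresses_on_street)
    return confidence >= PURPOSE_THRESHOLDS[purpose]
```

With these assumed numbers, a street with 300 addresses still counts as a duplicate for `direct_marketing` but not for `master_data_hub`, which is the point about single versus multiple purposes of use.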

Location enrichment

Let’s say we have these two records:

  • Alejandro Germanotta, 123 Main Street, Anytown
  • Alejandro Germanotta, 123 Main Street, Anytown

If you know that 123 Main Street in Anytown is a single family house, there is a high probability that this is the same real world individual.

But if you know that 123 Main Street in Anytown is a building used as a nursing home or a campus, or that this entrance has many apartments or other kinds of units, then it is not so certain that these records represent the same real world individual (at least not if the name is as common as John Smith).

So this example highlights the importance of using external reference data in data matching.
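As a minimal sketch of the idea, assume the external reference data is available as a simple lookup from address to property type. The lookup table, the property type names and the confidence values below are all hypothetical; real reference data would come from a postal, cadastral or commercial source.

```python
# Hypothetical external reference data: address -> property type.
PROPERTY_TYPES = {
    "123 Main Street, Anytown": "single_family_house",
    "45 Main Street, Anytown": "nursing_home",
}

# Assumed confidence that two identical name + address records are the
# same individual, given the kind of building at the address.
CONFIDENCE_BY_TYPE = {
    "single_family_house": 0.95,
    "apartment_building": 0.5,
    "nursing_home": 0.4,
    "campus": 0.3,
}

# Assumed fallback when the address is not in the reference data.
UNKNOWN_TYPE_CONFIDENCE = 0.6


def same_person_confidence(address: str, reference=PROPERTY_TYPES) -> float:
    """Look up the property type and return the assumed match confidence."""
    property_type = reference.get(address)  # None if the address is unknown
    return CONFIDENCE_BY_TYPE.get(property_type, UNKNOWN_TYPE_CONFIDENCE)
```

A fuller version could also take the frequency of the name into account, so that a common name at a multi-unit address scores lower than a rare one.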

Location geocoding

Let’s say we have these two records:

  • Gaga Real Estate, 1 Main Street, Anytown
  • L. Gaga Real Estate, Central Square, Anytown

If you match using the street address, the match is not that close.

But if you assign a geocode to each of the two addresses, then the two addresses may turn out to be very close (just around the corner), and your match will then be pretty confident.
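To illustrate, here is a small sketch that compares two geocodes using the haversine great-circle distance. It assumes the geocodes are already available as (latitude, longitude) pairs in decimal degrees; the function names and the 100 metre cut-off for "just around the corner" are my own assumptions, and the coordinates in the usage lines are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def addresses_are_close(geocode_a, geocode_b, max_distance_m: float = 100.0) -> bool:
    """True if two (lat, lon) geocodes lie within the assumed cut-off distance."""
    return haversine_m(*geocode_a, *geocode_b) <= max_distance_m


# Made-up geocodes for the two records in the example.
main_street_1 = (55.6761, 12.5683)
central_square = (55.6765, 12.5689)
close = addresses_are_close(main_street_1, central_square)
```

The distance check would typically be combined with the name comparison rather than replace it, since two different businesses can also sit around the corner from each other.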

Assigning geocodes usually serves other purposes than data matching. So this example highlights how enhancing your data may have several positive impacts.


6 thoughts on “Location, Location, Location”

  1. Monis Iqbal 24th June 2010 / 09:17

    Very good points Henrik and that too on the hot topic ‘geo-location’.

  2. Daryl Swinden 24th June 2010 / 10:09

    Another good post Henrik and nice lateral thinking!
    This highlights the importance of getting your data right and complete before data matching occurs. Obviously you need to ensure data is standardised and as clean as possible to match your address data to the geocode in order to gain the best results, which inherently improves data matching capabilities of the standardised data.

  3. William Sharp 24th June 2010 / 16:05

    Very good example of why geocoding is important! I’m glad I stopped by to read this!

  4. John O'Gorman 24th June 2010 / 17:20

    Hello again, Henrik

    Excellent topic (again!) and one I would like to add an observation for…

    We use faceted classification throughout our DQ work and one of the things we noticed is an extension of your observations. Once you establish a geo-code (or lat/long) for a given address, it becomes persistent in a collection. Now, if Julie’s Massage and Dog Wash moves into 1 Main Street, Anytown (or if Gaga Real Estate changes its name) you don’t have to worry about re-working that organization’s location – you already have it.

    Another way of putting this from a data relationship perspective: the building has a geo-code (and as you mentioned a bunch of other facets) not the organization.

    Cheers.

    John O’

  5. Dylan Jones 25th June 2010 / 06:47

    Great post.

    I’m increasingly recommending organisations use lat-long enrichment, we’re actually doing a webcast about this in a few weeks on Data Quality Pro with a data visualization specialist, there are so many data quality issues that can be detected that are impossible to find using traditional profiling.

  6. Henrik Liliendahl Sørensen 25th June 2010 / 08:49

    Thanks Monis, Daryl, William, John and Dylan for the comments.

    It seems like geocoding (assigning latitude and longitude to addresses) is a hot topic in the data quality realm.
