Where the Streets have Two Names

As told in the post The Art in Data Matching, a common challenge in matching names and addresses is that in some parts of the world streets have more than one name at the same time, because more than one language is in use.

We have the same challenge when building functionality for rapid addressing, that is, functionality that facilitates fast and quality-assured entry of addresses, supported by reference data that knows about postal codes, cities and street names.

The example below is taken from the instant Data Quality tool address form:

[Image: address form with Finnish and Swedish suggestions]

The Finnish capital Helsinki also has an official name in Swedish, Helsingfors, and the streets in Helsinki/Helsingfors have both Finnish and Swedish names. So when you start typing, the suggestions could be in both Finnish and Swedish.
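A minimal sketch of how suggestions in two languages might work, assuming each street is stored once with both of its names. The `Street` type, the `suggest` function and the three sample streets are illustrative assumptions, not how the instant Data Quality tool is actually built.

```python
from typing import NamedTuple


class Street(NamedTuple):
    finnish: str
    swedish: str


# Tiny illustrative sample; a real tool would use official reference data.
STREETS = [
    Street("Mannerheimintie", "Mannerheimvägen"),
    Street("Aleksanterinkatu", "Alexandersgatan"),
    Street("Hämeentie", "Tavastvägen"),
]


def suggest(prefix: str, streets=STREETS) -> list[tuple[str, Street]]:
    """Return (matched name, street) pairs where the name in either
    language starts with the typed prefix, ignoring case."""
    typed = prefix.casefold()
    hits = []
    for street in streets:
        for name in (street.finnish, street.swedish):
            if name.casefold().startswith(typed):
                hits.append((name, street))
    # Sort by the name shown to the user so both languages interleave.
    return sorted(hits, key=lambda hit: hit[0].casefold())


for name, street in suggest("al"):
    print(name, "->", street)
```

Because both names point to the same street record, picking either suggestion leads to the same address.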

What challenges have you encountered with street names in multiple languages?

A Blast from the Past

Many days I work in a so-called day office, which is an office booked for a single day at a location convenient for where I am and what I am going to do on that day.

My day office today comes with a Rolodex.

[Image: Rolodex]

But I have trouble connecting it with Bluetooth 🙂

Fortunately, the means of keeping a contact list have improved over the years, not least when it comes to connectivity:

  • The Personal Digital Assistant (PDA) usually could do Bluetooth or had other ways to connect to other devices and share data that way.
  • With the rise of Customer Relationship Management (CRM) systems your contact list was blended with the contact list of everyone else in your company.
  • Now with Social CRM (SCRM) your company’s contact list is (or will be) integrated with social networks.

Data quality challenges and opportunities have also changed along with the ways of keeping a contact list:

  • The Rolodex was totally dependent on you keeping the data up-to-date and it was your choice how it was indexed – by given name, surname or whatever.
  • The PDA data had to be kept up to date by you as well. When exchanging data with other devices, different ways of organizing data could be a pain.
  • With CRM systems, updates from third-party sources became relevant, and you aren’t alone in making the updates: others make them too, and differently. Duplicates and data not fit for your purpose are a pain.
  • Now with SCRM your contacts themselves may make most of the updates. Now you have to figure out which ones to rely upon and how to link them with your old records. In other words: Social Master Data Management.

Well, perhaps I had better forget about using the Rolodex and get on with today’s tweeting. Now, where is my pencil?

[Image: Papertweet]

Rapid Addressing, Structured or Unstructured Approach

Systems supporting faster and more accurate registration of addresses are becoming more and more common, and they are also becoming better and better.

I have noticed a structured and an unstructured approach to rapid addressing – and hybrids of course.

Structured Approach

The general concept is that you home in on the address like this:

  • First you choose a country from a country list (unless it’s always the same country).
  • Then you select a state or province if a state or province is a mandatory part of an address in that country, as it is in the United States, Canada, Australia and India.
  • Then you type a postal code if the country has a postal code system. It may be suggested as you write.
  • Then you type a street if the country has thoroughfare-based addressing. It may be suggested as you write. In some countries, like the United Kingdom, or in parts of a country, the postal code identifies a unique street.
  • Then you type a building number. It may be suggested if present in the reference data.
  • Then you type a unit or other section of the building where applicable. It may be suggested if present in the reference data.
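The steps above can be sketched as a chain of filters, where each entered field narrows the set of candidate addresses and the remaining candidates drive the suggestions for the next field. This is only a sketch under stated assumptions: the `Address` type, the `narrow` and `suggestions` helpers and all the sample rows (including the country codes "AA" and "BB") are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    country: str
    postal_code: str
    street: str
    building: str


# Made-up rows for illustration; not real reference data.
REFERENCE = [
    Address("AA", "1000", "Mill Lane", "1"),
    Address("AA", "1000", "Mill Lane", "3"),
    Address("AA", "1000", "Market Street", "12"),
    Address("AA", "1050", "Mill Lane", "7"),
    Address("BB", "1000", "Harbour Road", "2"),
]


def narrow(candidates, field, typed):
    """Keep the candidates whose field starts with what was typed."""
    typed = typed.casefold()
    return [a for a in candidates
            if getattr(a, field).casefold().startswith(typed)]


def suggestions(candidates, field):
    """Distinct values of a field among the remaining candidates."""
    return sorted({getattr(a, field) for a in candidates})


# Walk through the steps: country, postal code, street, building number.
step = narrow(REFERENCE, "country", "AA")
step = narrow(step, "postal_code", "1000")
print(suggestions(step, "street"))
step = narrow(step, "street", "Mi")
print(suggestions(step, "building"))
```

The order of the `narrow` calls mirrors the order of the form fields, which is what makes the approach structured.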

Unstructured Approach

You type the address as a single string, in the sequence that suits you, and the system figures out along the way what matches and makes suggestions.

This approach may better fit the way the address is known to you, but on the other hand it sometimes requires you to start over, and then some of the rapidness disappears.
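A minimal sketch of the unstructured idea, assuming a simple rule: every token typed so far must be the beginning of some word in the address, in any order. The rule, the `suggest` function and the sample rows are assumptions for illustration; real systems typically add ranking and error tolerance on top.

```python
# Made-up reference rows: (building, street, postal code, city).
REFERENCE = [
    ("1", "Mill Lane", "1000", "Oldtown"),
    ("7", "Mill Lane", "1050", "Newtown"),
    ("12", "Market Street", "1000", "Oldtown"),
]


def tokens(text):
    """Lower-case the text and split it on whitespace and commas."""
    return text.casefold().replace(",", " ").split()


def suggest(query):
    """Return the addresses where every typed token is a prefix of
    some word in the address, regardless of typing order."""
    typed = tokens(query)
    hits = []
    for row in REFERENCE:
        words = tokens(" ".join(row))
        if all(any(word.startswith(t) for word in words) for t in typed):
            hits.append(", ".join(row))
    return hits


print(suggest("mill 10"))
print(suggest("oldtown mar"))
```

Since the order of the tokens does not matter, the user can start with whatever part of the address comes to mind first.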

Hybrid Approach

A common hybrid solution is that you select the country before going unstructured. That cures the worst system glitches.

What’s Your Approach?

What are your experiences as a user? Maybe you are developing rapid addressing and have had your considerations. Where do you stand?

Double Trouble with Social MDM and Big Data

Yesterday was the first day at the MDM Summit Europe 2013 in London.

One of the workshops I attended was called Master Data Governance for Cloud/Social MDM/Big Data. The workshop was led by Malcolm Chisholm, one of my favorite thought leaders within data management.

According to Malcolm Chisholm, and I totally agree with that, the rise of social networks and big data will have a tremendous impact on future MDM (Master Data Management) architecture. We are not going to see these new opportunities and challenges replace the old way of doing MDM. Instead, integration of social data and other big data will add new elements to the existing component landscape around MDM solutions.

Like it or not, things are going to be more complicated than before.

We will have different technologies and methodologies handling the old systems of record and the new systems of engagement at the same time, for example relational databases (as we know them today) for master data and columnar databases for big data.

Profiling results from analysis of big data will be added to the current identity resolution centric master data elements handled in current master data solutions. Furthermore, there will be new interfaces for social collaboration around master data maintenance on top of the current interfaces.

So, the question is whether taking on the double trouble is worth it. Doing nothing, in this case sticking to small data, is always a popular option. But will the organizations choosing that path exist in the next decade, or will they be outsmarted by newcomers?

[Image: MDM Summit Europe 2013]

Big Data and Data Matching

Data matching has been an established discipline for many years. Most data quality tools have more or less sophisticated features for data matching, and many MDM (Master Data Management) platforms have data matching capabilities as well.

[Image: The LinkedIn Big Data Quality group]

In a way the data matching realm has become slightly dull in recent years. People don’t get excited anymore over a discussion about whether deterministic matching or probabilistic matching is the right way. Soundex is old, edit distance has been around for ages and matchcodes may have outlived themselves.
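For readers who haven’t met these old-timers, here is a minimal sketch of two of them. The `edit_distance` function is the classic Levenshtein distance. The `matchcode` function is a hypothetical, deliberately naive scheme (first letter plus following consonants, cut to four characters, plus postal code) made up for illustration; real matchcode schemes vary from tool to tool.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def matchcode(name: str, postal_code: str) -> str:
    """Hypothetical matchcode: first letter of the name, then the
    following consonants (Y counted as a vowel), cut to four
    characters, followed by the postal code."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return postal_code
    consonants = [c for c in letters[1:] if c not in "AEIOUY"]
    return (letters[0] + "".join(consonants))[:4] + postal_code


print(edit_distance("kitten", "sitting"))
print(matchcode("Smith", "2100"), matchcode("Smyth", "2100"))
```

Records with the same matchcode become candidate duplicates (a deterministic rule), while an edit distance is typically compared against a threshold.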

So, it’s good to see a new beast turning up. Data matching with big data.

It may be about deduplicating (deduping) volumes that are bigger than traditional data matching can handle. You know: Dedoop’ing.

But it is also very much about matching big data with small data, first and foremost master data. And having well matched master data. Kimmo Kontra wrote a good post about that recently. The post is called Big Grease, Big Data, and Big Apple – manholes and MDM.

The case presented by Kimmo holds many exciting implementations of data matching, for example proximity matching of locations.
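A minimal sketch of what proximity matching of locations could look like, assuming two records each carry a latitude and longitude and count as a match when they lie within a chosen distance. The haversine formula is standard; the function names and the 50 metre default are my assumptions, not something taken from Kimmo’s post.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_proximity_match(a, b, max_metres=50.0):
    """a and b are (latitude, longitude) pairs in decimal degrees."""
    return distance_m(*a, *b) <= max_metres


print(is_proximity_match((60.0, 25.0), (60.0001, 25.0)))
print(is_proximity_match((60.0, 25.0), (60.01, 25.0)))
```

The right threshold depends on the precision of the coordinates and on how densely the things being matched are located.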

Making sense with Social MDM

A few days ago Jeff Jonas of IBM made a new blog post called Master Data Management (MDM) vs. Sensemaking.

Herein Jeff Jonas ponders the differences between the data matching algorithms we use in traditional MDM, predominantly name and address matching, and the kind of identity resolution we need when we, for example, try to listen to and make sense of the signals in the social media data streams.

Jeff Jonas says: “Different missions, different tools.  Some organizations will use one or the other; most organizations will want both.”  

I tend to disagree slightly with Jeff Jonas. As told in the post The New Year in Identity Resolution I think we will need a connection between the old systems of record and the new systems of engagement.

Indeed the algorithms will be used differently and indeed we need different thresholds of confidence for different tasks. But I think we will have to make the integration story a bit more complicated in order to make sensible decisions across the two missions.
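To illustrate the point about different thresholds of confidence for different tasks, here is a minimal sketch. The task names and threshold values are purely hypothetical; they are not taken from Jeff Jonas’s post or from any product.

```python
# Hypothetical minimum match confidence required per task.
THRESHOLDS = {
    "merge_master_records": 0.95,  # changing a system of record
    "link_social_profile": 0.80,   # enriching with engagement data
    "flag_for_review": 0.60,       # only asking a human to look
}


def allowed_actions(confidence: float) -> list[str]:
    """Return the tasks for which this match confidence is sufficient."""
    return [task for task, minimum in THRESHOLDS.items()
            if confidence >= minimum]


print(allowed_actions(0.85))
```

The same match score can thus be good enough for one mission and not for the other.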

Data Management in the Cloud

We are seeing more and more data management services offered in the cloud.

As I have long experience with data matching services around the Dun & Bradstreet WorldBase, it was good to see a presentation yesterday in Stockholm featuring D&B Europe’s new cloud-based data manager service.

Managing World-Wide B2B Master Data

The D&B WorldBase is a business directory with 225 million business entities from all over the world.

D&B’s Data Manager is a self-service application in the cloud around the WorldBase taking care of:

  • Data matching with comprehensive functionality for manual inspection, approval and master data survivorship
  • Data enrichment embracing a wide range of data attributes
  • Data Maintenance subscription for keeping enriched data up to date

The data matching functionality is built on the good old D&B methodology with confidence codes and matchgrades.

Right for QlikTech

QlikTech is the Swedish firm (pretending to be American) behind the prominent business intelligence solution called QlikView.

At the Stockholm event QlikTech presented how and why they use the D&B Data Manager for ensuring the right data quality in their cloud based B2B CRM solution (SalesForce.com).

As QlikTech is operating all over the world, having a consistent world-wide business directory as the reference for party master data is extremely important, and the self-service concept is a perfect match for having the right insight into and control over achieving the needed level of data quality in CRM master data.

From there the QlikTech CRM team takes its own medicine using QlikView for self-service business intelligence.

instant Single Customer View

Achieving a Single Customer View (SCV) is a core driver for many data quality improvement and Master Data Management (MDM) implementations.

As most data quality practitioners will agree, the best way of securing data quality is getting it right the first time. The same is true about achieving a Single Customer View. Get it right the first time. Have an instant Single Customer View.

The cloud based solution I’m working with right now does this by:

  • Searching external big reference data sources with information about individuals, companies, locations and properties as well as social networks
  • Searching internal master data with information already known inside the enterprise
  • Inserting really new entities or updating current entities by picking as much data as possible from external sources
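The three steps above can be sketched roughly like this. This is a simplified illustration and not the actual solution: the `register` function is hypothetical, both sources are plain dictionaries keyed by a normalised name, and the lookup is exact, whereas a real solution would search in an error-tolerant way.

```python
def register(name, internal, external):
    """Insert a new entity or update a known one, picking as much
    data as possible from the external reference source."""
    key = name.casefold().strip()
    reference = external.get(key, {})    # search external reference data
    if key in internal:                  # search internal master data
        internal[key].update(reference)  # known entity: update it
        return "updated"
    internal[key] = {"name": name, **reference}  # really new: insert it
    return "inserted"


# Made-up sample data.
external = {"acme ltd": {"name": "Acme Ltd", "city": "Oldtown"}}
internal = {}
print(register("ACME Ltd ", internal, external))
print(register("acme ltd", internal, external))
print(internal)
```

The second call finds the entity created by the first one, so no duplicate is created at the point of entry.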

[Image: instant Single Customer View]

Some essential capabilities in doing this are:

  • Searching is error tolerant so you will find entities even if the spelling is different
  • The receiving data model is real world aligned. This includes:
    • Party information and location information have separate lives as explained in the post called A Place in Time
    • You may have multiple means of contact attached like many phones, email addresses and social identities
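A minimal sketch of these capabilities, assuming Python’s standard `difflib` module for the error-tolerant part. The `Party` and `Location` classes and the `find_parties` function are hypothetical illustrations of the idea, not the actual data model of the solution.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches


@dataclass
class Location:
    """A place with a life of its own, independent of who is there."""
    address: str


@dataclass
class Party:
    name: str
    locations: list = field(default_factory=list)  # shared Location objects
    contacts: list = field(default_factory=list)   # (kind, value) pairs


def find_parties(query, parties, cutoff=0.8):
    """Error-tolerant name search: return the parties whose name is
    similar enough to the query, even if the spelling differs."""
    by_name = {p.name.casefold(): p for p in parties}
    names = get_close_matches(query.casefold(), list(by_name),
                              n=5, cutoff=cutoff)
    return [by_name[n] for n in names]


# Made-up sample data: two parties sharing one location.
home = Location("1 Mill Lane")
john = Party("John Smith", [home],
             [("phone", "555-0100"), ("email", "john@example.com")])
jane = Party("Jane Doe", [home])

print(find_parties("Jon Smith", [john, jane]))
```

Keeping `Location` as a separate object means a party can move without the location record changing, and several parties can share one location.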

How do you achieve a Single Customer View?

Master Data Management in the Utility Sector

Making vertical MDM (Master Data Management) solutions, being MDM solutions prepared for a given industry, seems to be becoming a trend in the MDM realm.

Traditionally many MDM solutions actually are strong in a given industry or a few related industries.

This is also true for the MDM solution I’m working with right now, as this solution has gained traction in the utility sector.

So, what’s special (and not entirely special) about the utility sector?

Here are three of my observations:

Exploiting big external reference data

As examined in the post instant Data Quality at Work, the utility sector may gain much from using all the external reference data available in the party master data domain, including:

  • Consumer/citizen directories
  • Business directories
  • Address directories
  • Property directories

However, if data quality shouldn’t be a joke, this means using the best national data sources available, as many of the world-wide data sources in this domain are far from providing the precision, accuracy and timeliness needed in the utility sector.

Location precision

Managing locations is a big thing in the utility sector. The post called Where is the Spot explains how identifying locations isn’t as simple as we are used to thinking in daily life.

This is indeed also true in the utility sector where the issue also includes managing many different locations for the same customer fulfilling different purposes at the same time.

The products

The electricity supply part of the utility sector shares a lot of issues with the telco sector when it comes to fixed installations. The products and services are in fact the same in some cases, which as a consequence means that some organizations belong to both sectors.

This is also a danger with vertical MDM solutions, as there may be several best-of-breed options for a given organization, which eventually will result in choosing more than one platform and thereby introducing the silos which MDM in the first place was supposed to eliminate.

Counting on LinkedIn

Let’s say LinkedIn opened a bank. Would you put money into the LinkedIn bank?

I don’t think I would if they used the same technology for accounting as they use for counting members in the LinkedIn groups.

The other day I made a happy tweet saying that the Social MDM LinkedIn group had just reached 400 members. And now today LinkedIn told me we are only 385 members. First thought: Jesus, 15 members left in a few days. Boring subject. Missing the hype before it even got inflated.

But when I went to the statistics page we were now 400 again:

[Image: Count1]

Going back to the member list and refreshing it several times showed these results:

[Image: count2]

And:

[Image: Count3]

And:

[Image: Count4]

Well, I guess we are around 400 members. And oh, there’s room for more. Join here.
