Technology – Page 21 – Liliendahl on Data Quality

Where the Streets have Two Names

18th July 2013, Henrik Gabs Liliendahl, 12 Comments

As told in the post The Art in Data Matching, a common challenge in matching names and addresses is that in some parts of the world streets have more than one name at the same time, because more than one language is in use.

We have the same challenge when building functionality for rapid addressing, meaning functionality that supports fast and quality-assured entry of addresses, backed by reference data that knows about postal codes, cities and street names.

The below example is taken from the instant Data Quality tool address form:

The Finnish capital Helsinki also has an official name in Swedish, Helsingfors, and the streets in Helsinki/Helsingfors have both Finnish and Swedish names. So when you start typing, suggestions could be in both Finnish and Swedish.
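A minimal sketch of how such bilingual suggestions could work, assuming the reference data holds both names for each street. The `Street` type, the `suggest` function and the four sample streets are illustrative assumptions and not taken from the instant Data Quality tool:

```python
from typing import NamedTuple


class Street(NamedTuple):
    finnish: str
    swedish: str


# Small illustrative sample; a real form would query address reference data.
STREETS = [
    Street("Aleksanterinkatu", "Alexandersgatan"),
    Street("Bulevardi", "Bulevarden"),
    Street("Fredrikinkatu", "Fredriksgatan"),
    Street("Mannerheimintie", "Mannerheimvägen"),
]


def suggest(typed: str, streets=STREETS) -> list[str]:
    """Return every street name, in either language, that starts with the typed text."""
    needle = typed.casefold()
    return [
        name
        for street in streets
        for name in street  # iterates over the Finnish name, then the Swedish name
        if name.casefold().startswith(needle)
    ]


print(suggest("al"))         # both language variants of the same street
print(suggest("bulevarde"))  # only the Swedish name still matches
```

Keeping both names on the same record means a suggestion picked in one language can still be stored with, or matched against, the name in the other language.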

What challenges have you encountered with street names in multiple languages?

A Blast from the Past

1st July 2013, Henrik Gabs Liliendahl, 2 Comments

Many days I work in a so-called day office, which is an office booked for a single day at a location convenient for where I am and what I am going to do on that day.

My day office today comes with a Rolodex.

But I have trouble connecting it with Bluetooth 🙂

Fortunately, the means of keeping a contact list have improved over the years, not least when it comes to connectivity:

The Personal Digital Assistant (PDA) usually could do Bluetooth or had other ways to connect to other devices and share data that way.
With the rise of Customer Relationship Management (CRM) systems your contact list was blended with the contact list of everyone else in your company.
Now with Social CRM (SCRM) your company’s contact list is (or will be) integrated with social networks.

Data quality challenges and opportunities have also changed along with the way we keep a contact list:

The Rolodex was totally dependent on you keeping the data up-to-date and it was your choice how it was indexed – by given name, surname or whatever.
The PDA data had to be kept up to date by you as well. When exchanging data with other devices, different ways of organizing it could be a pain.
With CRM systems, updates from third party sources became relevant, and you aren’t alone in making the updates; your colleagues make them too, each in their own way. Duplicates and data not fit for your purpose are a pain.
Now with SCRM your contacts themselves may make most of the updates. You have to figure out which ones to rely upon and how to link them with your old records. In other words: Social Master Data Management.

Well, perhaps I had better forget about using the Rolodex and get on with today’s tweeting. Now, where is my pencil?

Rapid Addressing, Structured or Unstructured Approach

15th May 2013, Henrik Gabs Liliendahl, 4 Comments

Systems supporting faster and more accurate registration of addresses are becoming more and more common, and they are getting better and better too.

I have noticed a structured and an unstructured approach to rapid addressing – and hybrids of course.

Structured Approach

The general concept is that you zero in on the address like this:

First you choose a country from a country list (unless it’s always the same country).
Then you select a state or province if a state or province is a mandatory part of an address in that country, like it is in the United States, Canada, Australia and India.
Then you type a postal code if the country has a postal code system. It may be suggested as you write.
Then you type a street if the country has thoroughfare based addressing. It may be suggested as you write. In some countries, like the United Kingdom, or in parts of a country, the street is uniquely given by the postal code.
Then you type a building number. It may be suggested if present in reference data.
Then you type a unit or other section of the building where applicable. It may be suggested if present in reference data.
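The steps above amount to narrowing a candidate set one field at a time. Here is a rough sketch of that idea; the field names, the `narrow` and `options` helpers and the four sample rows are assumptions made for illustration, not the design of any particular product:

```python
# Small illustrative sample standing in for real address reference data.
REFERENCE = [
    {"country": "GB", "postal_code": "SW1A 2AA", "street": "Downing Street", "building": "10"},
    {"country": "GB", "postal_code": "SW1A 2AA", "street": "Downing Street", "building": "11"},
    {"country": "DK", "postal_code": "2100", "street": "Østerbrogade", "building": "1"},
    {"country": "DK", "postal_code": "2100", "street": "Strandboulevarden", "building": "5"},
]


def narrow(rows, field, typed):
    """Keep only the rows whose value in `field` starts with what was typed."""
    needle = typed.casefold()
    return [row for row in rows if row[field].casefold().startswith(needle)]


def options(rows, field):
    """Distinct values left for `field`, usable as suggestions for the next step."""
    return sorted({row[field] for row in rows})


rows = narrow(REFERENCE, "country", "GB")   # step 1: country
rows = narrow(rows, "postal_code", "SW1A")  # step 2: postal code as typed so far
print(options(rows, "street"))              # only one street is left to suggest
print(options(rows, "building"))            # building numbers known for that street
```

When the postal code leaves only one street, as in the sample rows, the street field can be filled in without any typing at all.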

Unstructured Approach

You type the address in a single string, in the sequence that suits you, and the system figures out along the way what matches and makes suggestions.

This approach may better fit the way the address is known to you, but on the other hand it sometimes requires you to start again, and then some of the rapidness disappears.
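One simple way to sketch the unstructured idea is to treat whatever has been typed as a bag of word prefixes and keep every reference address in which each prefix matches some word, regardless of order. The `tokens` and `suggest` helpers and the sample addresses are hypothetical; real systems use far richer parsing and ranking:

```python
import re

# Small illustrative sample standing in for real address reference data.
ADDRESSES = [
    "10 Downing Street, London SW1A 2AA",
    "11 Downing Street, London SW1A 2AA",
    "Østerbrogade 1, 2100 København Ø",
]


def tokens(text):
    """Split text into lower-cased words, ignoring punctuation."""
    return re.findall(r"\w+", text.casefold())


def suggest(typed, addresses=ADDRESSES):
    """Return addresses where every typed token is a prefix of some word in the address."""
    typed_tokens = tokens(typed)
    if not typed_tokens:
        return []
    hits = []
    for address in addresses:
        words = tokens(address)
        if all(any(word.startswith(t) for word in words) for t in typed_tokens):
            hits.append(address)
    return hits


print(suggest("london downing"))  # order of the typed parts does not matter
print(suggest("down 10"))         # adding the number narrows the suggestions
```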

Hybrid Approach

A common hybrid solution is that you select the country before going unstructured. That cures the worst system glitches.

What’s Your Approach?

What are your experiences as a user? Maybe you are developing rapid addressing and have had your considerations. Where do you stand?

Big Data and Data Matching

2nd April 2013, Henrik Gabs Liliendahl, 2 Comments

Data matching has been an established discipline for many years. Most data quality tools have more or less sophisticated features for data matching, and many MDM (Master Data Management) platforms have data matching capabilities as well.

BigDataQuality — The LinkedIn Big Data Quality group

In a way the data matching realm has become slightly dull in recent years. People don’t get excited anymore over a discussion about whether deterministic matching or probabilistic matching is the right way. Soundex is old, edit distance has been around for ages and matchcodes may have outlived themselves.
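For readers who haven’t met it, edit distance can be sketched in a few lines. This is the textbook Levenshtein variant (insert, delete and substitute each cost 1), shown only as an illustration and not as the way any particular data quality tool implements matching:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into an empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

A low distance relative to the length of the strings is then taken as a hint that two names may refer to the same thing.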

So, it’s good to see a new beast turning up. Data matching with big data.

It may be about deduplicating (deduping) volumes that are bigger than traditional data matching can handle. You know: Dedoop’ing.

But it is also very much about matching big data with small data, first and foremost master data. And having well matched master data. Kimmo Kontra wrote a good post about that recently. The post is called Big Grease, Big Data, and Big Apple – manholes and MDM.

The case presented by Kimmo holds many exciting implementations of data matching like for example proximity matching of locations.
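As a rough idea of what proximity matching of locations can look like, two coordinate pairs can be treated as the same spot when the great-circle distance between them is below a threshold. The function names and the 50 metre default are assumptions for illustration and are not taken from Kimmo’s post:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_proximity_match(point_a, point_b, max_km=0.05):
    """True when two (lat, lon) points lie within max_km of each other (hypothetical threshold)."""
    return haversine_km(*point_a, *point_b) <= max_km


print(is_proximity_match((60.1699, 24.9384), (60.1700, 24.9385)))  # a few metres apart
print(is_proximity_match((60.1699, 24.9384), (59.3293, 18.0686)))  # different cities
```

The right threshold depends on how precise the coordinates in each source are.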

Data Management in the Cloud

14th March 2013, Henrik Gabs Liliendahl, 2 Comments

We are seeing more and more data management services offered in the cloud.

As I have long-standing experience with data matching services around the Dun & Bradstreet WorldBase, it was good to see a presentation yesterday in Stockholm featuring D&B Europe’s new cloud based data manager service.

Managing World-Wide B2B Master Data

The D&B WorldBase is a business directory with 225 million business entities from all over the world.

D&B’s Data Manager is a self-service application in the cloud around the WorldBase taking care of:

Data matching with comprehensive functionality for manual inspection, approval and master data survivorship
Data enrichment embracing a wide range of data attributes
Data Maintenance subscription for keeping enriched data up to date

The data matching functionality is built on the good old D&B methodology with confidence codes and matchgrades.

Right for QlikTech

QlikTech is the Swedish firm (pretending to be American) behind the prominent business intelligence solution called QlikView.

At the Stockholm event QlikTech presented how and why they use the D&B Data Manager for ensuring the right data quality in their cloud based B2B CRM solution (SalesForce.com).

As QlikTech operates all over the world, having a consistent world-wide business directory as the reference for party master data is extremely important, and the self-service concept is a perfect match for getting the insight and control needed to achieve the required level of data quality in CRM master data.

From there the QlikTech CRM team takes its own medicine using QlikView for self-service business intelligence.

instant Single Customer View

13th March 2013, Henrik Gabs Liliendahl, 2 Comments

Achieving a Single Customer View (SCV) is a core driver for many data quality improvement and Master Data Management (MDM) implementations.

As most data quality practitioners will agree, the best way of securing data quality is getting it right the first time. The same is true about achieving a Single Customer View. Get it right the first time. Have an instant Single Customer View.

The cloud based solution I’m working with right now does this by:

Searching external big reference data sources with information about individuals, companies, locations and properties as well as social networks
Searching internal master data with information already known inside the enterprise
Inserting really new entities or updating current entities by picking as much data as possible from external sources

Some essential capabilities in doing this are:

Searching is error tolerant so you will find entities even if the spelling is different
The receiving data model is real world aligned. This includes:
- Party information and location information have separate lives as explained in the post called A Place in Time
- You may have multiple means of contact attached like many phones, email addresses and social identities
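A real world aligned model along these lines could be sketched as follows. All class and field names are hypothetical and only illustrate the two points above: a party is linked to locations over periods of time rather than owning a single address, and it can carry any number of contact points:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Location:
    address: str  # a location exists independently of who uses it


@dataclass
class ContactPoint:
    kind: str   # for example "phone", "email" or "social"
    value: str


@dataclass
class Residency:
    """Links a party to a location for a period of time."""
    location: Location
    valid_from: date
    valid_to: Optional[date] = None  # None means still current


@dataclass
class Party:
    name: str
    contact_points: list[ContactPoint] = field(default_factory=list)
    residencies: list[Residency] = field(default_factory=list)

    def location_on(self, day: date) -> Optional[Location]:
        """Return the location the party was linked to on the given day, if any."""
        for residency in self.residencies:
            ended = residency.valid_to is not None and day > residency.valid_to
            if residency.valid_from <= day and not ended:
                return residency.location
        return None
```

When a party moves, the old residency gets an end date and a new one is added; neither the party record nor the location record has to be overwritten.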

How do you achieve a Single Customer View?

Master Data Management in the Utility Sector

11th March 2013, Henrik Gabs Liliendahl, 4 Comments

Making vertical MDM (Master Data Management) solutions, meaning MDM solutions prepared for a given industry, seems to be becoming a trend in the MDM realm.

Traditionally many MDM solutions actually are strong in a given industry or a few related industries.

This is also true for the MDM solution I’m working with right now, as this solution has gained traction in the utility sector.

So, what’s special (and not entirely special) about the utility sector?

Here are three of my observations:

Exploiting big external reference data

As examined in the post instant Data Quality at Work, the utility sector may gain much from using all the external reference data available in the party master data domain, including:

Consumer/citizen directories
Business directories
Address directories
Property directories

However, if data quality isn’t to be a joke, this means using the best national data sources available, as many of the world-wide data sources in this domain are far from providing the precision, accuracy and timeliness needed in the utility sector.

Location precision

Managing locations is a big thing in the utility sector. The post called Where is the Spot explains how identifying locations isn’t as simple as we tend to think in daily life.

This is indeed also true in the utility sector, where the issue also includes managing many different locations for the same customer, fulfilling different purposes at the same time.

The products

The electricity supply part of the utility sector shares a lot of issues with the telco sector when it comes to fixed installations. In some cases the products and services are in fact the same, which means that some organizations belong to both sectors.

This is also a danger with vertical MDM solutions, as there may be several best-of-breed options for a given organization, which eventually will result in choosing more than one platform and thereby introducing the silos which MDM was supposed to eliminate in the first place.
