Data Management in the Cloud

We are seeing more and more data management services offered in the cloud.

As I have long experience with data matching services built around the Dun & Bradstreet WorldBase, it was good to see a presentation yesterday in Stockholm featuring D&B Europe’s new cloud based Data Manager service.

Managing World-Wide B2B Master Data

The D&B WorldBase is a business directory with 225 million business entities from all over the world.

D&B’s Data Manager is a self-service application in the cloud built around the WorldBase, taking care of:

  • Data matching with comprehensive functionality for manual inspection, approval and master data survivorship
  • Data enrichment embracing a wide range of data attributes
  • Data maintenance subscription for keeping enriched data up to date

The data matching functionality is built on the good old D&B methodology with confidence codes and matchgrades.
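D&B’s actual confidence codes and matchgrades are proprietary, but the underlying idea can be sketched: each component of a record gets a letter grade against the reference record, and the grades roll up into a numeric confidence code. The sketch below is purely illustrative; the component names, thresholds and scoring are my own assumptions, not D&B’s methodology:

```python
from difflib import SequenceMatcher

def grade(candidate: str, reference: str) -> str:
    """Grade one component: A = match, B = similar, F = mismatch, Z = not supplied."""
    if not candidate or not candidate.strip():
        return "Z"
    a, b = candidate.upper().strip(), reference.upper().strip()
    if a == b:
        return "A"
    if SequenceMatcher(None, a, b).ratio() >= 0.8:
        return "B"
    return "F"

def match_grade(record: dict, reference: dict,
                components=("name", "street", "city")) -> str:
    """Concatenate per-component grades into a match grade string, e.g. 'ABA'."""
    return "".join(grade(record.get(c, ""), reference.get(c, "")) for c in components)

def confidence_code(mg: str) -> int:
    """Collapse a match grade string into a 1-10 confidence code (simplified)."""
    score = sum({"A": 10, "B": 6, "Z": 3, "F": 0}[g] for g in mg) / len(mg)
    return max(1, round(score))
```

A matchgrade like "ABZ" then tells you at a glance which components matched exactly, roughly or not at all, while the confidence code gives a single sortable number for automated accept/reject/inspect decisions.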

Right for QlikTech

QlikTech is the Swedish firm (pretending to be American) behind the prominent business intelligence solution called QlikView.

At the Stockholm event QlikTech presented how and why they use the D&B Data Manager for ensuring the right data quality in their cloud based B2B CRM solution (Salesforce.com).

As QlikTech operates all over the world, having a consistent world-wide business directory as the reference for party master data is extremely important, and the self-service concept is a perfect match for gaining the right insight into and control over achieving the needed level of data quality in CRM master data.

From there the QlikTech CRM team takes its own medicine using QlikView for self-service business intelligence.


instant Single Customer View

Achieving a Single Customer View (SCV) is a core driver for many data quality improvement and Master Data Management (MDM) implementations.

As most data quality practitioners will agree, the best way of securing data quality is getting it right the first time. The same is true about achieving a Single Customer View. Get it right the first time. Have an instant Single Customer View.

The cloud based solution I’m working with right now does this by:

  • Searching external big reference data sources with information about individuals, companies, locations and properties as well as social networks
  • Searching internal master data with information already known inside the enterprise
  • Inserting genuinely new entities or updating existing entities by picking as much data as possible from external sources
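The three steps above can be sketched as a single lookup-before-insert flow. All function names here are hypothetical stand-ins for illustration, not the actual solution’s API:

```python
def instant_single_customer_view(query: dict, external_search, internal_search, upsert):
    """Sketch of the flow described above.

    1. Search external big reference data sources.
    2. Search internal master data already known inside the enterprise.
    3. Insert a genuinely new entity or update an existing one,
       picking as much data as possible from the external source.
    """
    external_hit = external_search(query)       # e.g. business directory, address reference data
    internal_hit = internal_search(query)       # entities already in the internal master data
    golden = {**query, **(external_hit or {})}  # external reference data wins where available
    if internal_hit:
        golden = {**internal_hit, **golden}     # update the existing entity rather than insert
    return upsert(golden)
```

Because the external lookup happens before the record is committed, the single view exists from the first keystroke instead of being reconstructed afterwards by deduplication.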


Some essential capabilities in doing this are:

  • Searching is error tolerant so you will find entities even if the spelling is different
  • The receiving data model is real world aligned. This includes:
    • Party information and location information have separate lives as explained in the post called A Place in Time
    • You may have multiple means of contact attached like many phones, email addresses and social identities
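Error tolerant searching is commonly built on an edit-distance measure such as Levenshtein distance, so that "Jon Smith" still finds "John Smith". A minimal sketch (a production solution would use indexing rather than a linear scan over all entities):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (inserts, deletes, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def error_tolerant_search(query: str, entities: list, max_distance: int = 2) -> list:
    """Return entities whose name is within max_distance edits of the query."""
    q = query.casefold()
    return [e for e in entities if levenshtein(q, e.casefold()) <= max_distance]
```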

How do you achieve a Single Customer View?


Data Quality Vendors Beware of SEO Agencies

As reported in the post Fighting Identity Fraud with Identity Fraud and experienced with the post 255 Reasons for Data Quality Diversity, I have seen several sloppy attempts at link building by SEO agencies working for data quality tool vendors.

The other day it happened again, this time on LinkedIn.

There was a comment in the Master Data Management Interest group:

DataLadder SEO

The comment has since been deleted by the author, and I do understand why.

I guess an SEO guy was working for Simon at DataLadder and for Nathan from somewhere else at the same time, and had been given access to their LinkedIn accounts. However, he or she posted a comment meant to be from Simon while logged in as Nathan (who does not work with MDM and data quality).

So, data quality tool and service vendors: You can’t fight identity fraud with identity fraud and you can’t advocate for a single view of customer with a messy view of you as a vendor. Be authentic.


How to Avoid True Positives in Data Matching

Now, this blog post title might sound silly. We generally consider true positives the cream of data matching: we have found a match between two data records reflecting the same real world entity, it has been confirmed that this is true, and based on that we can eliminate a harmful and costly duplicate in our records.

Still, this isn’t an optimal situation, because the duplicate shouldn’t have entered our data store in the first place. Avoiding duplicates up front is by far the best option.

So, how do you do that?

You may aim for low latency duplicate prevention by catching duplicates in (near) real-time, with duplicate checks after records have been captured but before they are committed to whatever data store holds the entities in question. But this is still about finding true positives while staying aware of false positives.
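Such a (near) real-time check can be sketched as a gate between capture and commit. The normalization below is deliberately crude and purely illustrative; a real gate would use a match engine with confidence scoring:

```python
import re

def normalize(record: dict) -> tuple:
    """Crude match key: lowercase, strip punctuation (illustrative only)."""
    def clean(s: str) -> str:
        return re.sub(r"[^a-z0-9]", "", s.casefold())
    return (clean(record.get("name", "")), clean(record.get("postal_code", "")))

def commit_if_not_duplicate(new_record: dict, data_store: list) -> bool:
    """Duplicate check after capture but before commit, as described above."""
    if any(normalize(existing) == normalize(new_record) for existing in data_store):
        return False  # probable duplicate: route to manual inspection instead of committing
    data_store.append(new_record)
    return True
```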

Killing Keystrokes

The best way is to aim for instant data quality. That is, instead of entering data for the (supposed) new records, you are able to pick the data from data stores already available presumably in the cloud through an error tolerant search that covers external data as well as data records already in the internal data store.

This is exactly the kind of solution I’m working with right now. And oh yes, it is indeed called instant Data Quality.


Beware of False Positives in Data Matching

In a recent blog post by Kristen Gregerson of Satori Software you may learn A Terrible Tale, where the identities of two different real world individuals were merged into one golden record, with the most horrible result you may imagine, associated with a recent special day celebrating the other kind of matching going around.

Join the Data Matching Group on LinkedIn

As reported by Jim Harris some years ago in the post The Very True Fear of False Positives, the bad things happening from false positives in data matching are indeed a hindrance for doing data matching.

If we do data matching we should be aware that false positives will happen, we should know the probability of them happening, and we should know how to avoid the resulting heartache.
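One common way to know that probability is to manually review a sample of matched pairs and estimate the false positive rate, possibly per match-score threshold. A sketch, assuming scores where higher means a more confident match:

```python
def estimate_false_positive_rate(reviewed_pairs) -> float:
    """reviewed_pairs: list of (match_score, is_true_match) from a manual review sample."""
    if not reviewed_pairs:
        return 0.0
    false_positives = sum(1 for _, is_true in reviewed_pairs if not is_true)
    return false_positives / len(reviewed_pairs)

def lowest_safe_threshold(reviewed_pairs, max_fp_rate=0.01):
    """Lowest match-score threshold keeping the observed false positive rate
    within max_fp_rate, or None if no threshold achieves it."""
    for threshold in sorted({score for score, _ in reviewed_pairs}):
        kept = [(s, t) for s, t in reviewed_pairs if s >= threshold]
        fp = sum(1 for _, t in kept if not t)
        if kept and fp / len(kept) <= max_fp_rate:
            return threshold
    return None
```

Pairs above the chosen threshold can then be merged automatically, while pairs below it go to manual inspection, which is exactly the trade-off between heartache and workload.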

Indeed, using a data matching tool is better than relying on simple database indexes, and indeed there are differences in how good various data matching tools are at doing the job, not least doing it under different circumstances, as told in the post What is a best-in-class match engine?

Curious about how data matching tools work (differently)? There is an eLearning course available co-authored by yours truly. The course is called Data Parsing, Matching and De-duplication.


Data Quality Does Matter!

The title of this blog post is the title of a seminar about data quality and data matching taking place in Copenhagen:

Data Quality Does Matter

The seminar is hosted by Affecto, a data management consultancy firm with a strong presence in the Nordic and Baltic countries, and Informatica, a leading data management tool vendor worldwide.

There will be three sessions at the seminar:

  • First you will learn about steps for working with a data quality platform to improve BI and master data management solutions.
  • Then you will see a walkthrough of the architecture and capabilities of the Informatica Data Quality platform.
  • And finally you shouldn’t miss the session with yours truly on data matching based on an Informatica Perspectives blog post called Five Future Data Matching Trends.

Hope to see you in Copenhagen, København, Köpenhamn, Kopenhagen, Copenhague, Copenaghen, Hafnia or whatever name you use for that place as told in the post about data matching and Diversity in City Names.


Tomorrow’s Data Quality Tool

In a blog post called Judgement Day for Data Quality, published yesterday, Forrester analyst Michele Goetz writes about the future of data quality tools.

Michele says:

“Data quality tools need to expand and support data management beyond the data warehouse, ETL, and point of capture cleansing.”

and continues:

“The real test will be how data quality tools can do what they do best regardless of the data management landscape.”

As described in the post Data Quality Tools Revealed there are two things data quality tools do better than other tools:

  • Data profiling and
  • Data matching

Some of the new challenges I have worked with in designing tomorrow’s data quality tools are:

  • Point of capture profiling
  • Searching using data matching techniques
  • Embracing social networks

Point of capture profiling:

The sweet thing about profiling your data while you are entering your data is that analysis and cleansing becomes part of the on-boarding business process. The emphasis moves from correction to assistance as explained in the post Avoiding Contact Data Entry Flaws. Exploiting big external reference data sources within point of capture is a core element in getting it right before judgment day.
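Point of capture profiling can be as simple as running lightweight rules on each field as it is typed, returning assistance rather than post-hoc corrections. The rules and messages below are illustrative examples of the idea, not a complete profiling engine:

```python
import re

def profile_at_capture(field: str, value: str) -> list:
    """Return assistance messages while data is being entered (illustrative rules)."""
    issues = []
    if field == "email" and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value):
        issues.append("Email does not look deliverable")
    if field == "phone" and not re.fullmatch(r"\+?[0-9 ()-]{7,}", value):
        issues.append("Phone number looks malformed")
    if field == "postal_code" and not value.strip():
        issues.append("Postal code missing - needed for address verification")
    return issues
```

Because the messages appear while the record is still on screen, the person who actually knows the right value is the one who fixes it, which is the whole point of moving from correction to assistance.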

Searching using data matching techniques:

Error tolerant searching is often the forgotten capability when core features of Master Data Management solutions and data quality tools are outlined. Applying error tolerant search to big reference data sources is, as examined in the post The Big Search Opportunity, a necessity for getting it right before judgment day.

Embracing social networks:

The growth of social networks during recent years has been almost unbelievable. Traditionally, data matching has been about comparing names and addresses. As told in the post Addressing Digital Identity, it will be a must to be able to link the new systems of engagement with the old systems of record in order to get it right before judgment day.

How have you prepared for judgment day?


Future Identities

Recently I stumbled upon a report called Future Identities in the UK. The purpose of the report is to give the UK government insight into how citizens’ identities will develop over the next 10 years. But the insight certainly also applies to how private companies will have to react to this development, and not just in the UK.

The report talks about three different kinds of identities:

Identities in the UK

Applied to data quality and master data management I think these future kinds of identities will have these consequences:

Biometric identities relate to hard core identity resolution as in fighting terrorism, crime investigation and physical access control, but are sometimes even used in simple commercial checks as told in the post Real World Identity. My guess is that we will see biometrics used more as a means to better data quality, but not considerably more due to return on investment, as also examined in the post Citizen ID and Biometrics.

Biographical identities and the related attributes resemble what we often also call demographic attributes, used in handling data for direct marketing and other data management purposes. Direct marketing may, as reported in the post Psychographic Data Quality, be in transition to go deeper into big data in order to become psychographic marketing.

Social identities are the new black. As discussed on this blog, latest in the post Defining Social MDM, my guess is that social master data management is going to be big and will have to be partly interwoven with traditional biographical attributes and even, like it or not, biometric attributes. The art of doing that in a proper way is going to be very exciting.


While we are waiting for the LEI

As told in the post Business Entity Identifiers, a new global numbering system for business entities has been on the way for some time. The wonder is called the LEI (Legal Entity Identifier).

The implementation work has been adopted by the Financial Stability Board. The latest developments are reported in a publication called Fifth progress note on the Global LEI Initiative.

Surely, while the implementation may be in good hands, the setup doesn’t give hope for a speedy process where every legal entity in the world will have a LEI within a short time.

And then the next question will be how long it takes before organizations have enriched existing databases with the LEI and implemented on-boarding processes where a LEI is captured with every new insertion of party master data describing a legal entity.

A good way to start preparing will be to implement features in on-boarding business processes where available external reference data is captured when new party entities are added to your databases. Having the best available information about names, addresses and business entity identifiers today, and a culture of capturing such information, will be a great starting point.

And oh, the instant Data Quality concept is precisely all about doing that.


Postal Code Musings

When working with master data management and data quality, including data matching, one of the most frequent pieces of information you work with is the postal code.

Wikipedia has a good article about postal codes.

Some of the data quality issues related to the postal code datum are:

Metadata

Over the world different words are used for a postal code:

  • ZIP code, the United States implementation of a postal code, is often used synonymously for a postal code in many databases and user interfaces. This is not seriously wrong, but not right either.
  • In India a postal code (in English) is called a PIN Code (Postal Index Number). This could definitely trick me.

Format

There are basically two different formats of postal codes around:

  • Numeric postal codes are the most common ones. The number of digits does however differ between countries. And there may be some additional considerations:
    • For example, the 9 digit United States ZIP code is split into the original 5 digits and the additional 4 digits implemented later.
    • Postal codes may begin with 0 which may create formatting errors when treated as numeric.
  • Some countries, for example the United Kingdom, the Netherlands, Canada and Argentina, have alphanumeric postal codes.
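Format checks for the above can be expressed as per-country patterns. The patterns below are simplified illustrations; real validation should rely on official postal reference data. Note also the leading-zero pitfall: numeric codes must be stored as strings so that "02115" doesn’t become 2115.

```python
import re

# Simplified, illustrative format patterns - not a substitute for reference data
POSTAL_CODE_PATTERNS = {
    "US": r"\d{5}(-\d{4})?",               # ZIP and ZIP+4
    "IN": r"\d{6}",                        # PIN Code (Postal Index Number)
    "NL": r"\d{4} ?[A-Z]{2}",              # 4 digits + 2 letters
    "CA": r"[A-Z]\d[A-Z] ?\d[A-Z]\d",      # alternating letter/digit
    "GB": r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}",  # outward + inward code
}

def valid_format(country: str, code: str) -> bool:
    """True if the code matches the known pattern for the country.

    Codes are kept as strings throughout, so leading zeros survive.
    """
    pattern = POSTAL_CODE_PATTERNS.get(country)
    return bool(pattern and re.fullmatch(pattern, code.strip().upper()))
```

A format check like this only catches impossible codes; whether a well-formed code actually exists is a coverage question that again needs reference data.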

Embedded Information

Numeric postal codes usually form some kind of hierarchy from which you can guess the geographical position within the country and make ranges representing smaller or larger geographical areas. But you never know.

This also goes for Dutch (you know, the ones in the Netherlands) postal codes as the first 4 characters are numeric.

The UK postal codes usually start with a mnemonic of the main city in the area, except in a lot of cases.

Precision

Some postal code systems have postal codes covering larger areas with many streets and some postal code systems are very granular where each street, or part of a street, has a distinct postal code.

The UK postal code system is very granular, which has paved the way for using rapid addressing as told in a recent article in the UK Database Marketing magazine.

Coverage

Utilizing rapid addressing requires that reference data for postal codes covers practically every spot in the country and that updates are available on a near real-time basis.

Some countries have postal code systems not covering every corner, and some countries don’t have a postal code system at all.

Uniqueness

The main reason for implementing postal code systems is that a town or city name in many cases isn’t unique within a country.

But that doesn’t mean that uniqueness works the other way around as well. In many countries a postal code may cover several town names. France is an example.

Consistency

While we basically have granular and not so granular postal code systems we of course also have hybrids.

In Denmark, for example, there is a granular system in the capital Copenhagen with a postal code for each street, named after the street, and a system in the rest of the country with a postal code for an area named after the suburb or town.

Fit for purpose

A postal code is a hierarchical element in a postal address. We basically have two forms of postal addresses:

  • A geographical address, where the postal address including the postal code points to a place you can visit and meet the people receiving the things sent there
  • A post-office box, which may have more or less geographical connection to where the people receiving the things sent there actually are

Penetration of post-office boxes differs around the world. In Namibia it is mandatory. In Sweden most companies have a post-office box address.

Trying to compare data with these different concepts is like comparing apples and oranges, which often goes bananas.
