Eating the MDM Elephant

The idiom of eating the elephant one bite at a time is often used when trying to envision a roadmap for Master Data Management (MDM).

It’s a bit of a contradiction to look at it that way, because the essence of MDM is an enterprise-wide single source of truth, eventually for all master data domains.

But it may be the only way.

To use a cliché, MDM is (as any discipline) about people, processes and technology.

An earlier post called Lean MDM described a data quality and entity resolution technology focused approach to start consuming the elephant, starting with building universal data models for party master data and rationalizing the data within a short time frame.

I have often encountered that many organizations actually don’t want an entity revolution but are more comfortable with entity evolution when it comes to entity resolution, as examined in the post Entity Revolution vs Entity Evolution.

The term “Evolutionary MDM” is used by the MDM vendor Semarchy as seen on this page here called What is Evolutionary MDM?

The idea is to have technology that supports an evolutionary way of implementing MDM. This is in my eyes very important, as people, processes and technology may be prioritized in that order, but they shouldn’t be handled in a serial manner that reveals the opportunities and restrictions related to technology only at a very late stage of implementing MDM.

Costs of a Single Citizen View

Recently Andrew Dean made a blog post called National Identity Numbers. The post generated some comments in the Data Matching group on LinkedIn.

Andrew’s post is based on the ongoing project in India called Aadhaar, where every citizen is assigned a unique identification number to be used for multiple purposes when interacting with the government and financial institutions.

As Andrew mentions, the United Kingdom cancelled such a project a few years ago. This cancellation was, in part, due to fear of excessive costs. The question Andrew’s post, and the comments in the LinkedIn group, pose is whether the benefits of getting a “single citizen view” will justify the (feared) costs.

Indeed, large governmental projects have a bad name these days all over the world, as far as I know.

Back in the late 60’s the United States was able to put a man on the moon.

It was at the same time that the Scandinavian countries implemented their “single citizen view”.

Besides digitalizing the national identification number, Sweden also, in 1967, managed to change from driving on the left side of the road to driving on the right side. I’m not sure Sweden could afford switching to the right side today, let alone the United Kingdom doing the same.

Big Reference Data Musings

The term “big data” is huge these days. As Steve Sarsfield suggests in a blog post yesterday called Big Data Hype is an Opportunity for Data Management Pros, well, let’s ride on the wave (or is it a tsunami?).

The definition of “big data” is, as with many buzzwords, not crystal clear, as examined in a post called It’s time for a new definition of big data on Mike2.0 by Robert Hillard. The post suggests that big may be about volume, but is actually more about big complexity.

As I have worked intensively with large amounts of rich reference data, I have a homemade term called “big reference data”.

Big Reference Data Sets

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data are data defined and (initially) maintained outside a single organization. Examples from the party master data realm are a country list, a list of states in a given country, or postal code tables for countries around the world.

The trend is that organizations seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above.

An example of a big reference data set is the Dun & Bradstreet WorldBase. This reference data set holds around 300 different attributes describing over 200 million business entities from all over the world.

This data set is at first glance well structured, with a single (flat) data model for all countries. However, when you work with it, you learn that the actual data is very different depending on the original sources for each country. For example, addresses from some countries are standardized, while this isn’t the case for other countries. Completeness and other data quality dimensions vary a lot too.
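As a sketch of how such variation can be measured, the snippet below profiles completeness (the share of non-empty values) per attribute, grouped by country, over a flat record layout. The field names and sample records are made up for illustration; a real WorldBase extract has far more attributes.

```python
from collections import defaultdict

def completeness_by_country(records, attributes):
    """Share of non-empty values per attribute, grouped by country code."""
    counts = defaultdict(lambda: defaultdict(int))  # country -> attribute -> filled
    totals = defaultdict(int)                       # country -> record count
    for rec in records:
        country = rec.get("country_code", "??")
        totals[country] += 1
        for attr in attributes:
            if rec.get(attr):  # treat None and "" as missing
                counts[country][attr] += 1
    return {
        country: {attr: counts[country][attr] / totals[country] for attr in attributes}
        for country in totals
    }

# Hypothetical sample records
records = [
    {"country_code": "US", "name": "Acme Inc", "postal_code": "90210"},
    {"country_code": "US", "name": "Foo LLC", "postal_code": ""},
    {"country_code": "DE", "name": "Bar GmbH", "postal_code": "10115"},
]
print(completeness_by_country(records, ["name", "postal_code"]))
# → {'US': {'name': 1.0, 'postal_code': 0.5}, 'DE': {'name': 1.0, 'postal_code': 1.0}}
```

A profile like this is one way to make the per-country differences visible before deciding how to use the data.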

Another example of a large reference data set is the United Kingdom electoral roll, which is mentioned in the post Inaccurately Accurate. As told in the post, there are fit for purpose data quality issues. The data set is pretty big, not least if you span several years, as there is a distinct roll for every year.

Big Reference Data Mashup

Complexity, and opportunity, also arises when you relate several big reference data sets.

Lately DataQualityPro had an interview called What is AddressBase® and how will it improve address data quality? Here Paul Malyon of Experian QAS explains about a new combined address reference source for the United Kingdom.

Now, let’s mash up the AddressBase, the WorldBase and the Electoral Rolls – and all the likes.

Image called Castle in the Sky found on photobotos.

Real World Identity

How far do you have to go when checking your customer’s identity?

This morning I read an article in the Danish Computerworld reporting that a ferry line is now dropping a solution that checked whether the passenger using an access card is in fact the paying customer, using a lightweight fingerprint stored on the card. The reason for dropping it was, by the way, the cost of upgrading the solution compared to the future business value, and not any renewed privacy concerns.

I have been involved in some balancing of real world alignment versus fitness for use and privacy in public transport as well, as described in the post Real World Alignment. There it was a question of using a national identification number when registering customers in public transportation.

As citizens of the world we are today used to sometimes having our iris scanned when flying as our passport holds our unique identification that way. Some of the considerations around using biometrics in general public registration were discussed in the post Citizen ID and Biometrics.

In my eyes, or should we say iris, there is no doubt that we will meet an increasing demand for confirming and registering our identity. Doing that in the fight against terrorism has been around for a long time. Regulatory compliance will add to that trend, as told in the post Know Your Foreign Customer, mentioning the consequences of the FATCA regulation and other regulations.

When talking about identity resolution in the data quality realm, we usually deal with strings of text such as names, addresses, phone numbers and national identification numbers. Things that reflect the real world, but aren’t the real world.
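As a minimal sketch of what such string-based resolution looks like, the snippet below normalizes two name strings and scores their similarity using Python’s standard library. The crude normalization rule is my own assumption for illustration; real matching engines use much richer parsing, standardization and probabilistic scoring.

```python
from difflib import SequenceMatcher

def normalize(s):
    """Crude normalization: lowercase, drop punctuation, collapse whitespace."""
    kept = "".join(c for c in s.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def name_similarity(a, b):
    """Similarity of two name strings after normalization, between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Two string representations of the same real-world entity
print(name_similarity("Smith & Co. Ltd", "SMITH  CO LTD"))  # → 1.0
```

The point of the post stands even here: however clever the string handling, we are still comparing representations, not the real-world entities themselves.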

We will however probably adopt more facial recognition, as examined in the post The New Face of Data Matching. We do have access to pictures in the cloud, as you may find your B2C customer’s picture on Facebook and your B2B customer contact’s picture on LinkedIn or other similar services. It’s still not the real world itself, but a bit closer than a text string. And of course the picture could be false or outdated and thus more suitable for traction on a dating site.

Fingerprints are maybe a bit old-fashioned, but as said, more and more biometric passports are issued, and the technology for iris and retinal scanning is used for access control, even on mobile devices.

In the story starting this post, the business value of reinvesting in a biometric solution wasn’t deemed positive. But looking from the prints on my fingers down to the lines on my hand, I foresee more identity resolution going beyond name and address strings into things closer to the real world, such as facial recognition and biometrics.

Data Quality at Terminal Velocity

Recently the investment bank Saxo Bank made a marketing gimmick with a video showing a BASE jumper trading foreign currency with the bank’s mobile app at terminal velocity (i.e. the maximum speed when free falling).

Today business decisions have to be taken faster and faster in the quest for staying ahead of competition.

When making business decisions you rely on data quality.

Traditionally, data quality improvement has been made by downstream cleansing, meaning that data has been corrected a long time after data capture. There may be some good reasons for that, as explained in the post Top 5 Reasons for Downstream Cleansing.

But most data quality practitioners will say that data quality prevention upstream, at data capture, is better.

I agree; it is better.  Also, it is faster. And it supports faster decision making.
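A minimal sketch of upstream prevention is validating a party record at the moment of capture, before it enters the system. The postal code patterns and field names below are simplified assumptions for three countries, not a real reference data service:

```python
import re

# Hypothetical, simplified postal code patterns -- real reference data is richer
POSTAL_PATTERNS = {
    "US": re.compile(r"^\d{5}(-\d{4})?$"),
    "DK": re.compile(r"^\d{4}$"),
    "GB": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$"),
}

def validate_at_capture(record):
    """Return a list of issues found before the record enters the system."""
    issues = []
    country = record.get("country_code")
    if country not in POSTAL_PATTERNS:
        issues.append(f"unknown country code: {country!r}")
    elif not POSTAL_PATTERNS[country].match(record.get("postal_code", "")):
        issues.append(f"postal code {record.get('postal_code')!r} invalid for {country}")
    if not record.get("name", "").strip():
        issues.append("name is missing")
    return issues

print(validate_at_capture({"name": "Acme", "country_code": "US", "postal_code": "9021"}))
# → ["postal code '9021' invalid for US"]
```

Rejecting or flagging a record at this point is what makes the quality improvement instant rather than a downstream batch job.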

The most prominent domain for data quality improvement has always been data quality related to customer and other party master data. Also in this quest we need instant data quality as explained in the post Reference Data at Work in the Cloud.

Know Your Foreign Customer

I’m not saying that Customer Master Data Management is easy. But if we compare the capabilities within most companies for handling domestic customer records with their capabilities for handling foreign customer records, the former are often stellar.

It’s not that the knowledge, services and tools don’t exist. If you, for example, are headquartered in the USA, you will typically use the best practices and services available there for domestic records. If you are headquartered in France, you will use the best practices and services available there for domestic records. Using the best practices and services for foreign (seen from where you are) records is rarer, and if done, it is often done outside enterprise-wide data management.

This situation can’t, and will not, continue. With globalization running at full speed and more and more enterprise-wide data management programs being launched, we will need best practices and services embracing worldwide customer records.

New regulatory compliance will also add to this trend. Taking effect next year, the US Foreign Account Tax Compliance Act (FATCA) will urge both US companies and foreign financial institutions to better know their foreign customers and other business partners.

In doing that, you have to know about addresses, business directories and consumer/citizen hubs for an often large range of countries as described in the post The Big ABC of Reference Data.

It may seem a daunting task for an enterprise to embrace big reference data for all the countries where it has customers and other business partners.

My guess, well, actually plan, is that there will be services, based in the cloud, helping with that, as indicated in the post Partnerships for the Cloud.

Well Met, Stranger

Finally wordpress.com, the hosted version of WordPress that I am using, has added geography to the stats.

The counter has been running for 14 days now, so I have tried to have a first look into the numbers.

First of all, I’m pleased that during these 14 days I have had visitors from 67 different countries around the globe:

Most visitors have been from the United States, followed by my current home country, the United Kingdom, and then my former home country, Denmark:

Note: This figure is made by copying the results into Excel.

If grouped by regions of the world, it looks like this:

The world has certainly become a small place. Of course your interactions are biased towards your neighborhood, but in blogging as well as in business our success will increasingly become dependent on meeting, understanding and interacting with (maybe not so) strange people of the world.

Your Point, My Comma

Spam mails can be great food for thought.

This morning I had this one in one of my many mailboxes:

So, the amount in question was:

It’s interesting to see how the spammer used points and commas in the large amount of money he wanted to trick me with. I don’t know if he was sloppy or had the problem of showing an amount to an unsegmented worldwide audience that is either:

  • Using point as decimal mark and comma as thousand separator
  • Using comma as decimal mark and point as thousand separator
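To illustrate why this matters for data capture, here is a small sketch that parses a formatted amount under either convention; the parse_amount helper is hypothetical:

```python
def parse_amount(text, decimal_mark="."):
    """Parse a formatted amount, given which sign is the decimal mark."""
    thousand_sep = "," if decimal_mark == "." else "."
    return float(text.replace(thousand_sep, "").replace(decimal_mark, "."))

# The same string means very different amounts depending on convention
print(parse_amount("1.500", decimal_mark="."))   # → 1.5
print(parse_amount("1.500", decimal_mark=","))   # → 1500.0
```

Guessing the wrong convention doesn’t fail loudly; it silently produces an amount that is off by orders of magnitude.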

The use of signs for the decimal mark and thousand separator is indeed divided across the globe, as seen on this map:

The blue countries are using point as decimal mark and comma as thousand separator and the green countries are doing the opposite.

Then there may be diversity within a country: in Canada there are always questions about Quebec, where they follow the French custom. India also has its own numeral grouping by hundreds (as in 1,00,000 for one lakh) besides the English heritage.

The pattern of approximately one half of the world using one standard and approximately the other half using the opposite standard is seen in other notations too, such as arranging person names, writing street addresses, place names and postal codes, as told in the post Having the Right Element to the Left.

Broken Links

When passing the results of data cleansing activities back to source systems, I have often encountered what one might call broken links, which have called for designing data flows that don’t go by the book, don’t match the first picture of the real world and eventually prompt last-minute alternative ways of doing things.

I have had the same experience when passing some real (and not real) world bridges lately.

The Trembling Lady: An Unsound Bridge

When walking around in London, a sign on the Albert Bridge caught my eye. The sign instructs troops to break step when marching over.

Researching the Albert Bridge on Wikipedia, I learned that the bridge has an unsound construction that makes it vibrate, not least when troops march across in rhythm. The bridge has therefore earned the nickname “The Trembling Lady”.

It’s an old sign. The bridge is an old bridge. But it’s still standing.

The same way, we often have to deal with old systems running on unstable databases with unsound data models. That’s life. Though it’s not the way we want to see it, we must break the rhythm of otherwise perfectly cleansed data, as discussed in the post Storing a Single Version of the Truth.

The Øresund Bridge: The Sound Link

The sound between the city of Malmö in Sweden and København (Copenhagen) in Denmark can be crossed by the Øresund Bridge. If you look at a satellite picture, you may conclude that the bridge isn’t finished. That’s because a part of the link is in fact an undersea tunnel, as told in the post Geocoding from 100 Feet Under.

Your first image about what can be done and what can’t be done isn’t always the way of the world. Dig into some more sources, find some more charts and you may find a way.

However, life isn’t always easy. Sometimes charts and maps can be deceiving.

Wodna: The Sound of Silence

As reported in the post Troubled Bridge over Water I planned a cycling trip last summer. The route would take us across the Polish river Świna by a bridge I found on Google Maps.

When, after a hard day’s ride in the saddle, we reached the river, the bridge wasn’t there. We had to take a ferry across the river instead.

Maybe I should have known. The bridge on the map was named Wodna. That is Polish for (something with) water.
