Small Business Owners

1st February 2012

A challenge I encounter over and over again within Data Matching and customer Master Data Management is what to do with small business owners.

Examples of small business owners are:

  • Farmers
  • Healthcare professionals with their own clinic
  • Owners of small family-run shops
  • Modest membership organisation administrators
  • Local hospitality providers such as Basil Fawlty of Fawlty Towers
  • Independent Data Quality consultants such as myself

When handling customer master data we often like to divide the customers into business-to-consumer (B2C) and business-to-business (B2B). We may have different source systems, different data models and different data owners and data stewards for each of the two divisions.

But small business owners usually belong to both divisions. In some transactions they act as private persons (B2C) and in other transactions they act as a business contact (B2B). If you want to know your customer, have a single customer view, engage in social media and all that jazz, you must have a unique view of the person, the business and the household.
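A minimal sketch in Python of what such a unique view could look like. The class names, field names and role labels are my own illustration and are not taken from any particular product:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Party:
    party_id: str
    name: str
    kind: str  # hypothetical values: "person", "business" or "household"


@dataclass
class SingleCustomerView:
    """Ties one person to the businesses and the household they belong to."""
    person: Party
    businesses: List[Party] = field(default_factory=list)
    household: Optional[Party] = None
    roles: Set[str] = field(default_factory=set)  # e.g. {"B2C", "B2B"}

    def is_small_business_owner(self) -> bool:
        # Assumption: a person seen in both divisions with at least one business.
        return {"B2C", "B2B"} <= self.roles and bool(self.businesses)


basil = Party("P1", "Basil Fawlty", "person")
hotel = Party("B1", "Fawlty Towers", "business")
home = Party("H1", "Fawlty household", "household")
view = SingleCustomerView(basil, [hotel], home, {"B2C", "B2B"})
```

The point of the sketch is only that the person, the business and the household are separate parties that stay linked, instead of living in two unrelated B2C and B2B silos.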

In several industries small business owners, with the business and the household, are a special target group with unique product requirements. This is true for industries such as banking, insurance, telco, real estate and law.

So there are plenty of business cases for multi-domain Master Data Management embracing customer master data and product master data.

The capability to handle a single customer view of small business owners is, in my experience, very poorly fulfilled in the Data Quality and Master Data Management solutions around. There is certainly room for improvement and entrepreneurship.


The Present Birthday

28th September 2011

Today (or maybe yesterday) Steve Jones of Capgemini wrote a blog post called Same name, same birth date – how likely is it? The post examines the likelihood that two records with the same name and birthday represent the same real world individual. The chance that a match is a false positive of course depends mainly on the frequency of the name.
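To get a feeling for why name frequency matters, here is a rough sketch in Python. It is the classic birthday problem and assumes, unrealistically, that dates of birth are spread evenly over a given number of possible dates. The function name and parameters are my own and not taken from Steve's post:

```python
def prob_shared_birth_date(n_same_name: int, possible_dates: int) -> float:
    """Chance that at least two of n different people share a date of birth."""
    p_all_distinct = 1.0
    for k in range(n_same_name):
        # The k-th person must avoid the k dates already taken.
        p_all_distinct *= max(0.0, 1 - k / possible_dates)
    return 1 - p_all_distinct


# Hypothetical figures: people born within an 80 year span.
dates_in_span = 80 * 365
rare_name = prob_shared_birth_date(5, dates_in_span)
common_name = prob_shared_birth_date(500, dates_in_span)
```

With only a handful of people carrying a name the chance of a coincidence is tiny, while for a common name it becomes very real.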

Another angle in this context I have observed over and over again is the chance of a false negative if the name and other data are the same, but the birthday is different. In this case you may miss matching two records that are actually reflecting the same real world individual.

One would think that a datum like a birthday should usually be pretty accurate. My practical experience is that in many cases it isn’t.

Some examples:

Running against the time

Every fourth year when we have Olympic Games there are always controversies about whether a tiny female athlete really is as old as stated.

I noticed the same phenomenon when I had the chance to match data about contestants from several years of subscription data at a large city marathon in order to identify “returning customers”.

I’m always looking for false positives in data matching and was really surprised when I found several examples of the same name and contact data but with the birth year raised by one for each appearance at the marathon.

That’s not my birthday, this is my birthday

Swedish driving license numbers include the birthday of the holder, as the driving license number is the same as the all-purpose national ID that starts with the birthday.

In a database with both a birthday field and a driving license number field there were heaps of records with a mismatch between those two fields.

This usually wasn’t discovered because the rule only applies to Swedish driving license numbers and the database also had registrations for a lot of other nationalities.

When investigating the root cause there was, as usual, no single explanation: the problem could be either that the birthday belonged to someone else or that the driving license belonged to someone else.

Using both fields cut down the number of false negatives here.
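A minimal sketch in Python of such a cross-check between the two fields. It assumes the ID is written as YYMMDD or YYYYMMDD followed by four more digits, with or without a separator, and that the record is already known to be Swedish. The function names are my own, and the check digit is not validated:

```python
import re
from datetime import date
from typing import Optional


def id_birth_yymmdd(swedish_id: str) -> Optional[str]:
    """Return the YYMMDD part of the ID, or None if the shape is unexpected."""
    digits = re.sub(r"\D", "", swedish_id)
    if len(digits) == 12:
        digits = digits[2:]  # drop the century
    if len(digits) != 10:
        return None
    return digits[:6]


def birthday_matches_id(birth_date: date, swedish_id: str) -> Optional[bool]:
    """True or False when the two fields can be compared, None otherwise."""
    yymmdd = id_birth_yymmdd(swedish_id)
    if yymmdd is None:
        return None
    return birth_date.strftime("%y%m%d") == yymmdd


# Made-up example values.
consistent = birthday_matches_id(date(1964, 8, 23), "640823-0000")
mismatch = birthday_matches_id(date(1964, 8, 24), "19640823-0000")
```

A mismatch does not tell you which of the two fields is wrong, only that the record deserves a closer look.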

Today’s date format is?

In the United States and a few other countries it’s customary to use the month-day-year format when typing a date. In most other places we have the correct sequence of either day-month-year or year-month-day.

Once I matched data concerning foreign seamen working on ships in the Danish merchant fleet. When tuning the match process I found great numbers of good matches when swapping day and month in the birthdays, as the same seaman was registered on different ships with different captains and at different ports around the world.

When adding the fact that many birthdays were typed as 1st January of the known year of birth or the 1st day of the known month of birth, a lot of false positives were avoided.
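A minimal sketch in Python of how these cases could be told apart when comparing two dates of birth. The labels and the order of the rules are my own assumptions, and a real match process would weigh them together with name and other data:

```python
from datetime import date


def compare_birth_dates(a: date, b: date) -> str:
    """Classify how two registered dates of birth relate to each other."""
    if a == b:
        return "exact"
    if a.year != b.year:
        return "different"
    # Month-day-year typed where day-month-year was expected, or vice versa.
    if a.day == b.month and a.month == b.day:
        return "day_month_swapped"
    # 1st January used as a placeholder when only the year is known.
    if any(d.month == 1 and d.day == 1 for d in (a, b)):
        return "year_only_placeholder"
    # 1st of the month used as a placeholder when only year and month are known.
    if a.month == b.month and 1 in (a.day, b.day):
        return "month_only_placeholder"
    return "different"


swapped = compare_birth_dates(date(1970, 3, 4), date(1970, 4, 3))
year_only = compare_birth_dates(date(1970, 1, 1), date(1970, 6, 15))
```

Each label can then be treated deliberately, for example by accepting a swapped date only when name and nationality agree.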

The question about occupation in the merchant fleet was actually a political hot potato at that time and until then the parliament had discussed the matter based on wrong statistics.

PS

I have used birthday synonymously with “date of birth” which of course is a (meta) data quality problem.


The 20 Million Rupees Question

11th August 2011

Here we go again. The same old question: “What is the definition of customer?” Most recently Informatica (a data quality, master data management and data integration firm) has hired David Loshin to find out, starting with the blog post The Most Dangerous Question to Ask Data Professionals.

In short, my take is that this question in practice has two major implications for data quality and master data management, but in theory it should only have one:

  • The first one is real world alignment. In theory real world alignment is independent of the definition of a customer as it is about the party behind the customer.
  • The second is party roles. It’s actually here we can have an endless discussion.

In practice we of course mix things up as discussed in the post Entity Revolution vs Entity Evolution.

And Now for Something Completely Different

Instead of saying that “What is the definition of customer?” is the million dollar question, it’s probably more like the 20 million rupees question, as most data management these days is taking place in India.

The amount of money involved is taken from the film Slumdog Millionaire where 20 million rupees is the top prize in the local “Who Wants to Be a Millionaire?” (Kaun Banega Crorepati), which by the way has the same jingle and graphics as all over the world.

And oh, how much is 20 million rupees? It’s near ½ million US dollars or 300.000 euro (with a dot as thousand separator). But a lot in buying power for a local customer. Exactly 2 crores (2,00,00,000 rupees).  
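The separators are a small data quality lesson in themselves. A minimal sketch in Python of the two notations used above, with function names of my own choosing:

```python
def format_indian(n: int) -> str:
    """Group the last three digits, then groups of two (lakh, crore)."""
    s = str(abs(n))
    if len(s) <= 3:
        grouped = s
    else:
        head, tail = s[:-3], s[-3:]
        parts = []
        while len(head) > 2:
            parts.insert(0, head[-2:])
            head = head[:-2]
        if head:
            parts.insert(0, head)
        grouped = ",".join(parts + [tail])
    return ("-" if n < 0 else "") + grouped


def format_dot_thousands(n: int) -> str:
    """Western grouping in threes, with a dot as thousand separator."""
    return f"{n:,}".replace(",", ".")


two_crores = format_indian(20_000_000)    # "2,00,00,000"
euros = format_dot_thousands(300_000)     # "300.000"
```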

Party on.


Psychographic Data Quality

5th July 2011

I have just read an article on Mashable by Jamie Beckland called The End of Demographics: How Marketers Are Going Deeper With Personal Data.

The article explains how new sources of available data make it possible for marketers to get a much closer look at potential customers and thereby go from delivering a broad message to a huge crowd to delivering a very targeted message to a small group of people with a high probability of getting a response. In short: Marketers are going from demographic marketing to psychographic marketing.

I believe this is true and ongoing (as I have also been involved in such activities).

The data quality issues we have always known in direct marketing are surely very similar to those in the psychographic marketing which is going on in the social media realm and in connection with eBusiness.

In my eyes, the concept of a single customer view is also a key to getting success in psychographic marketing.  

You are not delivering a targeted message if you are delivering two different messages to two user profiles belonging to the same real world individual.

Your message will be very frustrating if you treat someone as a prospective customer when that someone already is an existing customer, perhaps in another channel.

The effectiveness of psychographic marketing depends on a match between the psychographic variables, the behavioral variables and the demographic variables. As seen in the example in the Mashable article, a good old thing such as geocoding will be needed here.

An exciting thing about the rise of psychographic marketing is that it will add to the trend in data quality technology where it’s about much more than simple name and address cleansing and deduplication. Rich location data will, despite the virtual playground, become even more important. The relations between customers and products as described in the post Customer Product Matrix Management will be further refined in psychographic marketing.


Single Company View

27th April 2011

Getting a single customer view in business-to-business (B2B) operations isn’t straightforward. Besides all the fuss about agreeing on a common definition of a customer within each enterprise, usually revolving around fitting multiple purposes of use, we also have complexities in real world alignment.

One Number Utopia

Back in the 80’s I worked as a secretary for the committee that prepared a single registry for companies in Denmark. This registry has been live for many years now.

But in most other countries there are several different public registries for companies resulting in multiple numbering systems.

Within the European Union there is a common registry embracing VAT numbers from all member states. The standard format is the two-letter ISO country code followed by the VAT number in each country’s own format, some of which contain both digits and letters.
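A minimal sketch in Python of splitting such a number into its two parts. The function name is my own, the national part is deliberately not validated since every country has its own rules, and the example number is made up:

```python
import re
from typing import Optional, Tuple


def split_vat_number(vat: str) -> Optional[Tuple[str, str]]:
    """Return (country prefix, national number), or None if the shape is off."""
    compact = re.sub(r"[\s.\-]", "", vat).upper()
    m = re.fullmatch(r"([A-Z]{2})([0-9A-Z]+)", compact)
    if m is None:
        return None
    # Note: the prefix is not always the ISO code; Greece, for instance, uses EL.
    return m.group(1), m.group(2)


danish = split_vat_number("DK 12 34 56 78")
no_prefix = split_vat_number("12345678")
```

Storing the prefix and the national part separately makes it easier to match against sources where the number is registered without the country code.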

The DUNS-number used by Dun & Bradstreet is the closest we get to a world-wide unique company numbering system.  

2-Tier Reality

The common structure of a company is that you have a legal entity occupying one or several addresses.

The French company numbering system is a good example of how this is modeled. You have two numbers:

  • SIREN is a 9-digit number for each legal entity (on the headquarters address).
  • SIRET is a 14-digit (9 + 5) number for each business location.

This model is good for companies with several locations but strange for single location companies.
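A minimal sketch in Python of the two tiers, assuming only what is said above: the first 9 digits of a SIRET identify the legal entity and the last 5 the location. The function name is my own, the example number is made up and no check digit is verified:

```python
import re
from typing import Tuple


def split_siret(siret: str) -> Tuple[str, str]:
    """Split a 14-digit SIRET into (SIREN, location part)."""
    digits = re.sub(r"\D", "", siret)
    if len(digits) != 14:
        raise ValueError("a SIRET must contain exactly 14 digits")
    return digits[:9], digits[9:]


siren, location = split_siret("123 456 789 00012")
```

Two business locations belong to the same legal entity when their SIREN parts are equal, which is a simple way of building the first level of a hierarchy.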

Treacherous Family Trees (and Restaurants)

The need for hierarchy management is obvious when it comes to handling data about customers that belong to a global enterprise.

Company family trees are useful but treacherous. A mother and a daughter may be very closely connected with lots of shared services, or it may be strictly a matter of ownership with no operational ties at all.

Take McDonald’s as a not perfectly simple (nor simply perfect) example. A McDonald’s restaurant is operated by a franchisee, an affiliate, or the corporation itself. I’m lovin’ modeling it.


What is Identity Resolution?

8th March 2011

We are continuously struggling with defining what it is we are doing: What is data quality? What is Master Data? Lately I’ve been involved in discussions around: What is Identity Resolution? A current discussion on this topic is rolling in the Data Matching LinkedIn group.

This discussion has roots in one of my blog posts called Entity Revolution vs Entity Evolution. Jeffrey Huth of IBM Initiate followed up with the post Entity Resolution & MDM: Interchangeable? In January Phillip Howard of Bloor made a post called There’s identity resolution and then there’s identity resolution (followed up by a correction post the other day called My bad).

It is a “same same but different” discussion. Traditional data matching (or record linkage) as seen in a data quality tool and master data management solution is the bright view: Being about finding duplicates and making a “single business partner view” (or “single party view” or “single customer view”). Identity resolution is the dark view: Preventing fraud and catching criminals, terrorists and other villains.

The Gartner Hype Cycle describes the dark view as “Entity Resolution and Analysis”. This discipline is approaching the expectation peak and will, according to Gartner, be absorbed by other disciplines, as no one can tell the difference I guess.

Certainly there are poles. In an article from 2006 called Identity Resolution and Data Integration David Loshin said: There is a big difference between trying to determine if the same person is being mailed two catalogs instead of one and determining if the individual boarding the plane is on the terrorist list.

But there is also a grey zone.

From a business perspective, for example, the prevention of misuse of a restricted campaign offer is a bit of both sides. Here you want to prevent an existing customer from using an offer only meant for new customers. How does that apply to members of the same household or the same company family tree? Or you want to avoid someone using an introduction offer twice by typing her name and address a bit differently.

From a technical perspective I have an example from working with a newspaper in a big fraud scam described in the post Big Time ROI in Identity Resolution. Here I had no trouble using a traditional deduplication tool in discovering non-obvious relationships. Also the relationships discovered in traditional data matching end up quite nicely in hierarchy management as part of master data management as described in the post Fuzzy Hierarchy Management.

And then there is the use of the words identity (resolution) versus entity (resolution).

My feeling is that we could use identity resolution for describing all kinds of matching and linking with party master data, and entity resolution could be used for describing all kinds of matching and linking with all master data entity types as seen in multi-domain master data management. But that’s just my words.


Fuzzy Hierarchy Management

2nd March 2011

When evaluating results from automated data matching your goal is typically to find false positives and false negatives, that is, entities that are matched but shouldn’t be (false positives) and entities that are not matched but should have been (false negatives).
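A minimal sketch in Python of such an evaluation, assuming you have a manually verified set of true pairs to compare against. The function name and the record identifiers are made up for illustration:

```python
def evaluate_matches(predicted_pairs, true_pairs):
    """Compare automated matches with verified ones, ignoring pair order."""
    predicted = {frozenset(p) for p in predicted_pairs}
    truth = {frozenset(p) for p in true_pairs}
    return {
        "true_positives": predicted & truth,
        "false_positives": predicted - truth,  # matched, but shouldn't be
        "false_negatives": truth - predicted,  # not matched, but should have been
    }


result = evaluate_matches(
    predicted_pairs=[("r1", "r2"), ("r3", "r4")],
    true_pairs=[("r2", "r1"), ("r5", "r6")],
)
```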

However, the fuzziness often used in the data matching process also applies to the evaluation of the results, as many dubious results aren’t a question of whether the matched database rows reflect the same real world entity but more a question of whether the matched (or not matched) database rows reflect different members of a real world hierarchy.

Example 1:

John Smith on 1 Main Street in Anytown
Mary & John Smith on 1 Main Str in Anytown

Example 2:

Anytown Municipality, Technical Dept
Municipality of Anytown

Example 3:

Acme Corporation, Anytown
Acme Corporation, Anywhere

All three examples above may be considered a false positive if matched and a false negative if not matched.

You may say that it depends on the purpose of use, which is true.

But if we are talking master data management we will probably have to encompass multiple requirements where we simultaneously need the match and don’t want the match, which is why we need to be able to resolve and store the results from fuzzy data matching in hierarchies.
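A minimal sketch in Python of storing match results as typed links instead of a plain yes or no. The relation types, the relation I have assigned to each of the three examples and the two purposes of use are all my own assumptions for illustration:

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    SAME_ENTITY = "same real world entity"
    SAME_HOUSEHOLD = "members of the same household"
    UNIT_OF = "unit of a larger organisation"
    OTHER_LOCATION = "same company, other location"


@dataclass(frozen=True)
class MatchLink:
    left: str
    right: str
    relation: Relation


def counts_as_match(link: MatchLink, accepted_relations: set) -> bool:
    """A link is a match only for purposes that accept its relation type."""
    return link.relation in accepted_relations


links = [
    MatchLink("John Smith, 1 Main Street, Anytown",
              "Mary & John Smith, 1 Main Str, Anytown", Relation.SAME_HOUSEHOLD),
    MatchLink("Anytown Municipality, Technical Dept",
              "Municipality of Anytown", Relation.UNIT_OF),
    MatchLink("Acme Corporation, Anytown",
              "Acme Corporation, Anywhere", Relation.OTHER_LOCATION),
]

# Two hypothetical purposes of use with different needs.
ONE_MAIL_PER_HOUSEHOLD = {Relation.SAME_ENTITY, Relation.SAME_HOUSEHOLD}
ONE_RECORD_PER_PERSON = {Relation.SAME_ENTITY}
```

The same stored link then answers yes for one purpose of use and no for another, so neither requirement has to give way.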


Citizen ID and Biometrics

9th January 2011

As I have stated earlier on this blog: The solution to the single most frequent data quality problem being party master data duplicates is actually very simple: Every person (and every legal entity) gets a unique identifier which is used everywhere by everyone.

Some countries, like Denmark where I live, have a unique Citizen ID (National identification number). Some countries are on the way, like India with the Aadhaar project. But some of the countries with the largest economies in the world, like the United Kingdom, Germany and the United States, don’t seem to be getting one in the near future.

I think the United Kingdom was close lately, but as I understand it the project was cancelled. According to a tweet from a discussion on Twitter today, the main obstacles were privacy considerations and costs.

A considerable cost in the suggested project in United Kingdom, and also as I have seen in discussions for a US project, may be that an implementation today should also include biometric technology.

The question is however if that is necessary.  

If we look at the systems in force today, for example in Scandinavia, they were implemented more than 40 years ago, and the Swedish citizen ID was actually implemented without digitalization in 1947. There are discussions going on about biometrics, also as this is inevitable for issuing passports anyway. In the meantime, however, these systems continue to make data quality issue prevention and party master data management a lot easier than elsewhere in the world, without having biometrics as a component.

No doubt biometrics will solve some problems related to fraud and the like. But these are rare exceptions. So the cost/benefit analysis for enhancing an existing system with biometrics seems to be negative.

I guess the alleged need for biometrics may have something to do with privacy considerations in a strange way: Privacy considerations are often overruled by the requirements for fighting terrorism, and here you need biometrics in identity resolution.


Storing a Single Version of the Truth

30th November 2010

An ever recurring subject in the data quality and master data management (MDM) realms is whether we can establish a single version of the truth.

The most prominent example is whether an enterprise can implement and maintain a single version of the truth about business partners being customers, prospects, suppliers and so on.

In the quest for establishing that (fully reachable or not) single version of the truth we use identity resolution techniques such as data matching, and we are exploiting ever increasing sources of external reference data.

However, whatever is possible in aiming for that (fully reachable or not) single version of the truth, I am often limited by the practical possibilities for storing it.

In storing party master data (and other kinds of data) we may consider these three different ways:

Flat files

This “Keep It Simple, Stupid” way of storing data has been in steady retreat, but it is still common, and new inventions of big flat file structures of data are emerging.

Also many external sources of reference data are still flat file like, and the overwhelming choice for exchanging reference and master data is doing it by flat files.

Despite lots of workaround solutions for storing the complex links of the real world in flat files, we basically end up using very simplified representations of the real world (and the truth derived) in those flat files.

Relational databases

Most Customer Relationship Management (CRM) systems are based on a relational data model, but one that is mostly quite basic regarding master data structures. This makes it far from straightforward to reflect the most common hierarchical structures of the real world, such as company family trees, contacts working for several accounts and individuals forming a household.

Master Data Management hubs are of course built for storing exactly these hierarchical kinds of structures. Common challenges here are that there often is no point in doing that as long as the surrounding applications can’t follow and that you often may restrict your use to a simplified model anyway like an industry model.   

Neural networks

The relations between parties in the real world are in fact not truly hierarchical. That is why we look into the inspiration from the network of biological neurons.

Doing that has been an option I have heard about for many years, but I am still waiting to meet it as a concrete choice when delivering a single version of the truth.

Bookmark and Share


Entity Revolution vs Entity Evolution

18th November 2010

Entity resolution is the discipline of uniquely identifying your master data records, typically being those holding data about customers, products and locations. Entity resolution is closely related to the concept of a single version of the truth.

Questions to be asked during entity resolution are like these ones:

  • Is a given customer master data record representing a real world person or organization?
  • Is a person acting as a private customer and a small business owner going to be seen as the same?
  • Is a product coming from supplier A going to be identified as the same as the same product coming from supplier B?
  • Is the geocode for the center of a parcel the same place as the geocode of where the parcel is bordering a public road?

We may come a long way in automating entity resolution by using advanced data matching and exploiting rich sources of external reference data, and we may be able to handle the complex structures of the real world by using sophisticated hierarchy management and thereby make an entity revolution in our databases.

But I am often faced with the fact that most organizations don’t want an entity revolution. There are always plenty of good reasons why various frequent business processes don’t require full entity resolution and will only be complicated by having it (unless drastically reengineered). The tangible immediate negative business impact of an entity revolution trumps the softer positive improvement in business insight from such a revolution.

Therefore we are mostly making entity evolutions balancing the current business requirements with the distant ideal of a single version of the truth.          
