What’s so special about your party master data?

My last blog post was called Is Managing Master Data a Differentiating Capability? That post introduces a conference session presenting a case story about managing master data at Philips.

During my years working with data quality and master data management it has always struck me how differently organizations manage the party master data domain, while in fact the issues are almost the same everywhere.

First of all, party master data describe real world entities that are the same to everyone. Everyone is gathering data about the same individuals and the same companies, located at the same addresses and having the same digital identities. The real world also comes in hierarchies, such as households, company families and contacts belonging to companies, which are the same to everyone. We may call that the external hierarchy.

Based on that, everyone has some kind of demand for intended duplicates, as a given individual or company may have several accounts for specific purposes and roles. We may call that the internal hierarchy.

A party master data solution will optimally reflect the internal hierarchy, while most of the surrounding business processes are supported by CRM systems, ERP systems and special solutions for each industry.

Reflecting the external hierarchy will be the same for everyone, and there is no need for anyone to reinvent the wheel here. There are already plenty of data models, data services and data sources out there.

Right now I’m working on a service called instant Data Quality that is capable of embracing and mashing up external reference data sources for addresses, properties, companies and individuals from all over the world.

The iDQ™ service already fits in at several places as told in the post instant Data Quality and Business Value. I bet it fits your party master data too.


The Greenland Problem in MDM

In a recent comment here on this blog the relevance of Master Data Management (MDM) solutions was questioned because, in real business life, different business units see master data very differently even though the data describes the same real world entity. And it’s not the first time I have heard this argument.

The issue is similar to the Greenland problem in geography. When using the most common projection for visualizing a round earth on a flat map, the Mercator projection, Greenland has a true shape but looks about the same size as Africa, though Africa is over 10 times as large as Greenland.
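As a rough back-of-the-envelope sketch of why this happens: on a Mercator map, lengths at latitude φ are stretched by 1/cos(φ) in both directions, so areas are inflated by 1/cos²(φ). The latitudes used below (the equator for Africa, about 72° north for central Greenland) are my own illustrative picks, not surveyed values.

```python
import math


def mercator_area_scale(latitude_deg: float) -> float:
    """Factor by which the Mercator projection inflates areas at a latitude."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Illustrative latitudes (assumptions): the equator runs through Africa,
# and roughly 72 degrees north runs through the middle of Greenland.
for name, lat in [("Equator (Africa)", 0.0), ("Central Greenland", 72.0)]:
    print(f"{name}: areas drawn about {mercator_area_scale(lat):.1f}x too large")
```

At 72° north the factor comes out at a little over ten, which goes a long way towards explaining why Greenland seems to rival Africa on such a map.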

As examined in the post Sharing data is key to a single version of the truth this is similar to the problems in fulfilling multiple uses embracing all business units in an enterprise:

  • If a map shows a limited part of the world the difference doesn’t matter that much. This is similar to fitting the purpose of use in a single business unit.
  • If the map shows the whole world we may have all kinds of different projections offering different kinds of views on the world, each with some advantages and disadvantages, as when we do enterprise MDM.

Today we have new technology coming to the rescue. If you go into Google Earth the world indeed looks round and you may have any high altitude view of an apparently round world. If you go closer the map tends to be more and more flat.

My guess is that the solutions to the multiple uses conundrum within MDM will also be offered from the cloud, by innovative solutions that reflect the real world entities and relate those to a variety of business functions used in different business units, offering a range of views that support multiple purposes of use.


Future Identities

Recently I stumbled upon a report called Future Identities in the UK. The purpose of the report is to provide the government in the UK insight into how identities of citizens will develop over the next 10 years. But the insight certainly also applies to how private companies will have to react to this development and certainly also not just in the UK.

The report talks about three different kinds of identities:

[Figure: identities in the UK (biometric, biographical and social)]

Applied to data quality and master data management I think these future kinds of identities will have these consequences:

Biometric identities relate to hard core identity resolution as in fighting terrorism, crime investigation and physical access control, but are sometimes even used in simple commercial checks as told in the post Real World Identity. My guess is that we will see biometrics used more as a means to have better data quality, but not considerably more, because of return on investment, as also examined in the post Citizen ID and Biometrics.

Biographical identities and the related attributes resemble what we often also call demographic attributes, used in handling data for direct marketing and other purposes of data management. Direct marketing may, as reported in the post Psychographic Data Quality, be in transition to go deeper into big data in order to become psychographic marketing.

Social identities are the new black. As discussed on this blog, latest in the post Defining Social MDM, my guess is that social master data management is going to be big and has to be partly interwoven with using traditional biographical attributes and even, like it or not, biometric attributes. The art of doing that in a proper way is going to be very exciting.


Making Data Quality Gangnam Style

The 21st December 2012 wasn’t the end of the world. But it was the day a music video for the first time passed one billion views on YouTube. It has been said that a reason for this success for Gangnam Style was that the Korean pop singer PSY hasn’t pursued any copyrights related to the video. But that doesn’t mean that PSY doesn’t earn money from the video. On the contrary, related commercials are making money Gangnam Style.

A hindrance for better data quality by better real world alignment has traditionally been the lack of free and open reference data. Some of the issues have been availability and heavy price tags on government collected data.

In my current daily work I mostly use such data within the United Kingdom and Denmark. And here the authorities are taking different paths.

The prices on UK public reference data have traditionally been fairly high and there’s certainly room for innovation around open government data as reported on DataQualityPro in the post Introduction to the Open Data User Group UK.

In Denmark the 21st December 2012 was the day it was published that a unanimous parliament had agreed on the laws behind having Free and Open Public Sector Master Data. From the 1st January 2013 there are no price tags on reference data about addresses, properties, companies (and citizens) and there are plans for making those data even more available, consistent and timely.

Great news for data quality, Gangnam Style.



The New Year in Identity Resolution

You may divide identity resolution into these categories:

  • Hard core identity check
  • Light weight real world alignment
  • Digital identity resolution

Hard Core Identity Check

Some business processes require a solid identity check. This is usually the case, for example, for credit approval and employment enrolment. Identity checks are also part of criminal investigation and fighting terrorism.

Services for identity checks vary from country to country because of different regulations and different availability of reference data.

An identity check usually involves the entity who is being checked.

Light Weight Real World Alignment

In data quality improvement and Master Data Management (MDM) you often include some form of identity resolution in order to have your data aligned with the real world. For example when evaluating the result of a data matching activity with names and addresses, you will perform a lightweight identity resolution which leads to marking the matched results as true or false positives.

This kind of identity resolution usually doesn’t involve the entity being examined.
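A minimal sketch of such a lightweight check, assuming each record can be looked up in some external reference source that returns an identifier for the real world entity. The `REFERENCE_IDS` table, the record keys and the `classify_match` helper are all hypothetical names made up for illustration.

```python
from typing import Optional

# Hypothetical reference lookup: internal record key -> real world entity ID.
# In practice this would come from an external reference data source.
REFERENCE_IDS: dict[str, Optional[str]] = {
    "crm-1": "ENTITY-100",
    "crm-2": "ENTITY-100",  # same real world entity as crm-1
    "erp-7": "ENTITY-200",
    "erp-9": None,          # could not be resolved against reference data
}


def classify_match(record_a: str, record_b: str) -> str:
    """Mark a candidate pair from data matching as a true or false positive."""
    id_a = REFERENCE_IDS.get(record_a)
    id_b = REFERENCE_IDS.get(record_b)
    if id_a is None or id_b is None:
        return "unresolved"  # the reference data is silent; review manually
    return "true positive" if id_a == id_b else "false positive"


candidate_matches = [("crm-1", "crm-2"), ("crm-1", "erp-7"), ("crm-2", "erp-9")]
results = {pair: classify_match(*pair) for pair in candidate_matches}
for pair, verdict in results.items():
    print(pair, "->", verdict)
```

Note that the entity itself is never asked anything; the verdict rests entirely on what the reference data says.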

Digital Identity Resolution

Our existence has increasingly moved to the online world. As discussed in the post Addressing Digital Identity this means that we also will need means to include digital identity into traditional identity resolution.

There are of course discussions out there about how far digital identity resolution should be possible. For example real name policy enforcement in social networks is indeed a hot topic.

Future Trends

With regard to digital identity resolution the jury is still out. In my eyes we can’t avoid that the economic consequences of the rising social sphere will affect the demand for knowing who is out there. Also the opportunities in establishing identity via digital footprints will be exploited.

My guess is that the distinction between hard core identity check and real world alignment in data quality improvement and MDM will disappear as reference data will become more available and the price of reference data will go down.

That’s why I’m right now working with a solution (www.instantdq.com) that combines identity check features and data universe into master data management with the possibility of adding digital identity into the mix.


Hierarchical Single Source of Truth

Most data quality and master data management gurus, experts and practitioners agree that a “single source of truth” is a nice term, but not what data quality and master data management is really about, as expressed by Michele Goetz in the post Master Data Management Does Not Equal The Single Source Of Truth.

Even among those people, including me, who think that emphasis on real world alignment could help in getting better data and information quality, as opposed to focusing on fitness for multiple different purposes of use, there is acknowledgement that there is a “digital distance” between real world aligned data and the real world, as explained by Jim Harris in the post Plato’s Data. Also, different publicly available reference data sources that should reflect the real world for the same entity are often in disagreement.

When working with improvement of data quality in party master data, which is the most frequent and common master data domain with issues, you encounter the same issues over and over again, like:

  • Many organizations have a considerable overlap of real world entities that are customers and suppliers at the same time. Expanding to other party roles, this intersection is even bigger. This calls for a 360° Business Partner View.
  • Most organizations divide activities into business-to-business (B2B) and business-to-consumer (B2C). But the great majority of businesses are small companies where business and private is a mixed case, as told in the post So, how about SOHO homes.
  • When doing B2C, including membership administration in non-profits, you often have a mix of single individuals and households in your core customer database, as reported in the post Household Householding.
  • As examined in the post Happy Uniqueness, there are a lot of good fit for purpose of use reasons why customer and other party master data entities are deliberately duplicated within different applications.
  • Lately, social master data management (Social MDM) has emerged as the new leg in mastering data within multi-channel business. Embracing a wealth of digital identities will become yet another challenge in getting a single customer view and reaching for the impossible, and not always desirable, single source of truth.

A way of getting some kind of structure into this possible, and actually very common, mess is to strive for a hierarchical single source of truth where the concept of a golden record is implemented as a model with golden relations between real world aligned external reference data and internal fit for purpose of use master data.
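One way to picture such a model, as a rough sketch rather than a description of any particular product: a real world aligned reference entity at the top, with the deliberately duplicated, fit for purpose records hanging below it through a golden relation. The class and field names are my own assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceEntity:
    """A real world party as described by external reference data."""
    reference_id: str
    name: str
    parent: "ReferenceEntity | None" = None  # e.g. company family or household


@dataclass
class InternalRecord:
    """A fit for purpose master data record held in some application."""
    system: str     # e.g. "CRM" or "ERP"
    role: str       # e.g. "customer" or "supplier"
    local_key: str


@dataclass
class GoldenRelation:
    """Links one reference entity to all internal records describing it."""
    entity: ReferenceEntity
    records: list[InternalRecord] = field(default_factory=list)

    def roles(self) -> set[str]:
        """All party roles this real world entity plays internally."""
        return {record.role for record in self.records}


# Made-up sample data: one company in a company family, known in two systems.
group = ReferenceEntity("REF-1", "Example Group")
company = ReferenceEntity("REF-2", "Example Ltd", parent=group)
golden = GoldenRelation(company, [
    InternalRecord("CRM", "customer", "C-1001"),
    InternalRecord("ERP", "supplier", "S-0042"),
    InternalRecord("ERP", "customer", "C-7"),  # an intended duplicate
])
print(sorted(golden.roles()))
```

The intended duplicates stay where they are; what is single is the link from each of them to one real world aligned entity and its place in the external hierarchy.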

Right now I’m having an exciting time doing just that as described in the post Doing MDM in the Cloud.


Data that is not aligned with the real world usually provides bad information

The shortcomings of data being fit for some purpose of use, compared to data that is aligned with the real world, are a recurring topic on this blog, most recently in the post “Fitness for Use” is Dead.

Today I had a reminder of that when waiting for baggage at Copenhagen Airport.

There is an information screen telling when your baggage will start rolling in. What actually seems to happen is that a fixed time is assigned to every flight and then the screen starts counting down the minutes. Most baggage then starts rolling in (and this is shown on the screen) before zero minutes is reached. If it happens, as with my flight, that zero minutes is reached without delivery, the information screen shows that the baggage from this flight is delayed, but not by how long.

So, the information provided is when you could expect your baggage, probably according to some service level goal. OK, fit for that purpose. But in fact that doesn’t help you a lot as a passenger, and doesn’t help at all when that goal isn’t reached.

End of rant.


Where is the Spot?

One of the things we often struggle with in data quality improvement and master data management is postal addresses. Postal addresses have different formats around the world, names of streets are spelled in alternative ways and postal codes may be wrong, too short or suffer from other flaws.

An alternative way of identifying a place is a geocode and sometimes we may think: Hurray, geocodes are much better in uniquely identifying a place.

Well, unfortunately not necessarily so.

First of all geocodes may be expressed in different systems. The most used ones are:

  • Latitude and longitude: Even though the globe is not completely round, this system for most purposes is good for aligning positions with the real world.
  • UTM: When the world is reflected on a paper or on a computer screen it becomes flat. UTM reflects the world on a flat surface very well aligned with the metric system making distance calculations straight forward.
  • WGS 84: This is the reference system in use in GPS devices and also the one behind Google Maps.
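To illustrate the practical difference, here is a small sketch: with latitude and longitude you need a great-circle formula such as haversine to get a distance, while with UTM coordinates in the same zone plain Pythagoras on the metre values is a reasonable approximation. The spherical earth radius and the sample coordinates are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean earth radius in metres (sphere assumed)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def utm_distance_m(e1: float, n1: float, e2: float, n2: float) -> float:
    """Planar distance in metres between two UTM points in the same zone."""
    return math.hypot(e2 - e1, n2 - n1)


# One degree of latitude is roughly 111 km on a spherical earth.
print(round(haversine_m(55.0, 12.0, 56.0, 12.0)))

# Made-up easting/northing pairs 300 m east and 400 m north of each other.
print(utm_distance_m(700_000, 6_170_000, 700_300, 6_170_400))
```

The planar shortcut only holds within a single UTM zone and over modest distances, which is one more reason a geocode is not automatically comparable to another geocode.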

Next, where is the address exactly placed?

I have met at least three different approaches:

  • It could be where the building actually is and then, if the precision is high and/or the building is big, at different places around the building.
  • It could be where the ground meets a public road. This is actually most often the case, as route planning is a very common use case for geocodes. The spot is fit for the purpose of use, so to say.
  • It could, as reported in the post Some Times Big Brother is Confused, be any place on (and beside) the street, as many reference data sources interpolate numbers equally along the street or in other ways get it wrong by keeping it simple.
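The third approach is easy to sketch, which is probably why it is so common. The function below spreads house numbers evenly between the two ends of a street segment; the street, its number range and its end coordinates are made up for illustration.

```python
def interpolate_house_number(number, first_number, last_number, start, end):
    """Estimate a position by spreading house numbers evenly along a segment.

    start and end are (latitude, longitude) tuples for the segment's ends.
    """
    if last_number == first_number:
        return start
    fraction = (number - first_number) / (last_number - first_number)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp numbers outside the range
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return (lat, lon)


# Made-up street: numbers 1 to 101 along a straight segment.
street_start = (55.6800, 12.5700)
street_end = (55.6900, 12.5900)
print(interpolate_house_number(51, 1, 101, street_start, street_end))
```

Number 51 lands in the middle of the segment whether or not the real building is there. Uneven plot sizes, odd and even sides of the street and curved roads are all ignored.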


Going in the Wrong Direction

When travelling with the London Underground I have several times noticed that the onboard passenger information system is set wrong, typically as if we were going in the opposite direction to what was announced at the station and to where the train is actually heading.

People’s reactions

The reaction among the passengers to this data quality flaw varies. Most people who seem to be frequent commuters don’t seem to bother but keep calm and carry on. Tourists on the other hand get confused and immediately try to appoint the culprit among them who apparently got them on the wrong train.

As the information system keeps on announcing the next station as the one we just left, everyone who is not a new passenger keeps calm and carries on in the opposite direction of the data presented.

Big data quality issues

The problem with wrong journey settings in data collection within public transportation has actually been a challenge I have worked with a lot.

Besides confusing the passengers when presented on the onboard passenger information display and in the voice announcements, a wrong setting may also corrupt the data collection. That leads to data quality issues when the data is stored in a data warehouse, or by other techniques, in order to facilitate analysis of passenger travel patterns, of how well the services adhere to schedules, and other reporting based on the large volumes of transaction data collected every day.

Aligning with master data

The challenge is to correctly join the transaction data with the right master data entities. A vehicle stop, and in some cases the passenger boarding and alighting, must be associated with the right product being a given journey on a given service according to a given time schedule.

Many other exploitations of big data share the same basic data quality challenge. If we don’t get the transaction data joined correctly with the master data entities involved, any analysis and reporting may be going in the wrong direction.
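A rough sketch of such a join, assuming stop events carry a vehicle ID and a timestamp, and that journeys are master data with a vehicle and a scheduled time window. All field names and sample values are hypothetical. The point is that events matching no journey, or more than one, are surfaced instead of being silently assigned.

```python
from datetime import datetime

# Hypothetical master data: journeys on a service according to a schedule.
journeys = [
    {"journey_id": "J1", "vehicle": "V7", "direction": "northbound",
     "start": datetime(2012, 11, 5, 8, 0), "end": datetime(2012, 11, 5, 8, 45)},
    {"journey_id": "J2", "vehicle": "V7", "direction": "southbound",
     "start": datetime(2012, 11, 5, 8, 50), "end": datetime(2012, 11, 5, 9, 35)},
]

# Hypothetical transaction data: vehicle stops as collected on board.
stop_events = [
    {"vehicle": "V7", "stop": "A", "time": datetime(2012, 11, 5, 8, 10)},
    {"vehicle": "V7", "stop": "B", "time": datetime(2012, 11, 5, 9, 0)},
    {"vehicle": "V7", "stop": "C", "time": datetime(2012, 11, 5, 8, 47)},
]


def assign_journey(event, journeys):
    """Return the one journey an event belongs to, or None if not exactly one."""
    candidates = [
        journey["journey_id"]
        for journey in journeys
        if journey["vehicle"] == event["vehicle"]
        and journey["start"] <= event["time"] <= journey["end"]
    ]
    return candidates[0] if len(candidates) == 1 else None


assigned = {event["stop"]: assign_journey(event, journeys) for event in stop_events}
for stop, journey_id in assigned.items():
    print(stop, "->", journey_id or "UNMATCHED, needs review")
```

Stop C falls between the two scheduled journeys and is flagged rather than guessed, which keeps it from ending up on the wrong direction in later analysis.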


“Fitness for Use” is Dead

The definition of data quality as being “fitness for use” is challenged. “Real world alignment” or similar expressions are gaining traction.

Back in May Malcolm Chisholm made a tweet about the shortcomings of the “fitness for use” definition reported here on the blog in the post The Problem with Multiple Purposes of Use.

Last week the tweet was elaborated on in an Information Management article called Data Quality is Not Fitness for Use. Today Jim Harris has a follow-up post called Data and its Relationships with Quality.

When working with data quality in the domain with by far the most data quality issues, namely contact data (customer, supplier, employee and other party master data), I have many times experienced that making data fit for more than a single purpose of use is almost always about better real world alignment. Having data that actually represents what it purports to represent always helps with making data fit for use, even with more than one purpose of use.

In practice, in the contact data realm, that means for example:

  • Getting a standardized address at contact data entry makes it possible for you to easily link to sources with geo codes, property information and other location data for multiple purposes.
  • Obtaining a company registration number or other legal entity identifier (LEI) at data entry makes it possible to enrich with a wealth of available data held in public and commercial sources making data fit for many use cases.
  • Having a person’s name spelled according to available sources for the country in question helps a lot with typical data quality issues such as uniqueness and consistency.

Also, making data real world aligned from the start is a big help when maintaining data as the real world will change over time.

Data quality tools will, in my eyes, also have to adapt to this trend, as discussed with Gartner in the post Quality of Data behind the Data Quality Magic Quadrant.
