Hierarchical Single Source of Truth

Most data quality and master data management gurus, experts and practitioners agree that a “single source of truth” is a nice term, but not what data quality and master data management are really about, as expressed by Michele Goetz in the post Master Data Management Does Not Equal The Single Source Of Truth.

Even among those people, including me, who think that an emphasis on real world alignment can yield better data and information quality than focusing on fitness for multiple different purposes of use, there is acknowledgement that there is a “digital distance” between real world aligned data and the real world, as explained by Jim Harris in the post Plato’s Data. Also, different publicly available reference data sources that should reflect the real world for the same entity are often in disagreement.

When working with improving data quality in party master data, the master data domain where issues are most frequent and common, you encounter the same issues over and over again, like:

  • Many organizations have a considerable overlap of real world entities that are a customer and a supplier at the same time. Expanding to other party roles, this intersection is even bigger. This calls for a 360° Business Partner View.
  • Most organizations divide activities into business-to-business (B2B) and business-to-consumer (B2C). But the great majority of businesses are small companies where business and private matters are mixed, as told in the post So, how about SOHO homes.
  • When doing B2C, including membership administration in the non-profit sector, you often have a mix of single individuals and households in your core customer database, as reported in the post Household Householding.
  • As examined in the post Happy Uniqueness, there are a lot of good fit for purpose of use reasons why customer and other party master data entities are deliberately duplicated within different applications.
  • Lately, doing social master data management (Social MDM) has emerged as the new leg in mastering data within multi-channel business. Embracing a wealth of digital identities will become yet another challenge in getting a single customer view and reaching for the impossible and not always desirable single source of truth.

A way of getting some kind of structure into this possible, and actually very common, mess is to strive for a hierarchical single source of truth, where the concept of a golden record is implemented as a model with golden relations between real world aligned external reference data and internal fit for purpose of use master data.
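
To make the model concrete, here is a minimal sketch in Python. The class and attribute names are illustrative choices of mine, not taken from any particular MDM product: the golden record sits at the top of the hierarchy, holds a golden relation to a real world aligned record in an external reference source, and keeps links to the fit for purpose of use records living in each application.

    from dataclasses import dataclass, field

    @dataclass
    class ExternalReference:
        """A real world aligned entity from a public reference source."""
        source: str       # e.g. an address registry or business directory
        source_key: str   # the identifier within that source
        attributes: dict  # name, address etc. as stated by the source

    @dataclass
    class LocalRecord:
        """A fit for purpose of use record inside one application."""
        system: str       # e.g. "CRM", "ERP" or "Billing"
        local_key: str
        role: str         # e.g. "customer", "supplier" or "member"
        attributes: dict

    @dataclass
    class GoldenRecord:
        """One real world entity at the top of the hierarchy."""
        golden_id: str
        reference: ExternalReference                 # the golden relation
        local_records: list[LocalRecord] = field(default_factory=list)

        def roles(self) -> set[str]:
            """The 360° view: every role this entity plays across systems."""
            return {record.role for record in self.local_records}

The point of this shape is that the deliberate duplicates in each application survive untouched; the truth is in the relations, not in a merged survivor record.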

Right now I’m having an exciting time doing just that, as described in the post Doing MDM in the Cloud.


Free and Open Public Sector Master Data

Yesterday the Danish Ministry of Finance announced an agreement between local authorities and the central government to improve and link public registers of basic data and to make data available to the private sector.

Once the public authorities have tidied up, merged the data and put a stop to parallel registration, annual savings in public administration could amount to 35 million EUR in 2020.

Basic open data includes private addresses, companies’ business registration numbers, cadastral numbers of real properties and more. These master data are used for multiple purposes by public sector bodies.

Private companies and other organizations can look forward to large savings when they no longer have to buy their basic data from the public authorities.

In my eyes this is a very clever move by the authorities exactly because of the two main opportunities mentioned:

  • The public sector will see savings and related synergies from a centralized master data management approach.
  • The private sector will gain a competitive advantage from better and more affordable reference data accessibility and thereby achieve better master data quality.

Denmark has, along with the other Nordic countries, always had a more mature public sector master data approach than we see in most other countries around the world.

I remember working with the committee that prepared a single registry for companies in Denmark back in the 80’s, as mentioned in the post Single Company View.

Today I work with a solution called iDQ (instant Data Quality), which is about mashing up internal master data and a range of external reference data from social networks and not least public sector sources. In that realm there is certainly not something rotten in the state of Denmark. Rather, there is a good answer to the question of whether to be free and open or not to be.


Killing Keystrokes

Keystrokes are evil. Every keystroke represents a potential root cause of poor data quality by spelling things wrongly, putting the right thing in the wrong place, putting the wrong thing in the right place and so on. Besides that, every keystroke carries a cost of work, and all those keystrokes sum up to gigantic amounts of work costs.

In master data management (MDM) you will be able to get things right, and reduce working costs, by killing keystrokes wherever possible.

Killing keystrokes in Product Information Management (PIM)

I have seen my share of business processes where product master data are re-entered, or copied and pasted from different sources, extracted from one product master data container and, often via spreadsheets, captured into another product master data container.

This happens inside organizations, and it happens in the ecosystem of business partners in supply chains encompassing manufacturers, distributors and retailers.

As touched upon in the post Social PIM, there might be light at the end of the tunnel with the rise of tools, services and platforms that set up collaboration possibilities for sharing product master data and thus avoid those evil keystrokes.

Killing keystrokes in Party Master Data Management

With party master data there are good possibilities for exploiting external data from big reference data sources and thus avoiding the evil keystrokes. The post instant Data Quality at Work tells how a large utility company has gained better data quality, and reduced working costs, by using the iDQ™ service in that way within customer on-boarding and other business processes related to customer master data maintenance.
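
The general pattern behind such a service may be sketched as a search-first data entry flow: the user types a few characters, and the rest is picked from reference data rather than keyed in. The endpoint and field names below are hypothetical, not the actual iDQ™ API:

    import requests

    # Hypothetical reference data endpoint, standing in for a real service
    REFERENCE_API = "https://example.com/reference/search"

    def onboard_customer(search_text: str, session: requests.Session) -> dict:
        """Look up a new customer in external reference data instead of typing."""
        response = session.get(REFERENCE_API, params={"q": search_text}, timeout=10)
        response.raise_for_status()
        candidates = response.json()["results"]
        if not candidates:
            raise LookupError(f"No reference data found for {search_text!r}")
        best = candidates[0]  # in a real flow the user picks from the list
        return {
            "name": best["name"],        # no keystrokes spent on the name
            "address": best["address"],  # nor on the address
            "external_id": best["id"],   # keep the relation for later updates
        }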

The next big thing in this area will be the customer data integration (CDI) part of what I call Social MDM, where you may avoid the evil keystrokes by utilizing the keystrokes already made in social networks by the people the master data is about.


instant Data Quality at Work

DONG Energy is one of the leading energy groups in Northern Europe with approximately 6,400 employees and EUR 7.6 billion in revenue in 2011.

The other day I sat down with Ole Andres, project manager at DONG Energy, and talked about how they have utilized a new tool called iDQ™ (instant Data Quality) in order to keep up with data quality around customer master data.

iDQ™ is basically a very advanced search engine capable of being integrated into business processes in order to get data quality for contact data right the first time and at the same time reduce the time needed for looking up and entering contact data.

Fit for multiple business processes

Customer master data is used within many different business processes. DONG Energy has successfully implemented iDQ™ within several of them, namely:

  • Assigning new customers to, and ending old customers on, installation addresses
  • Handling returned mail
  • Debt collection

Managing customer master data in the utility sector has many challenges. There are different kinds of addresses to manage, such as installation addresses, billing addresses and correspondence addresses, as well as different approaches to private customers and business customers, including the grey zone between who is a private account and who is a business account.
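
A minimal sketch of that data model challenge, with illustrative names of my own and not how DONG Energy or iDQ™ actually model it, could look like this:

    from dataclasses import dataclass
    from enum import Enum

    class AddressRole(Enum):
        INSTALLATION = "installation"      # where the meter sits
        BILLING = "billing"                # where the invoice goes
        CORRESPONDENCE = "correspondence"  # where other mail goes

    @dataclass
    class UtilityAccount:
        account_id: str
        is_business: bool | None  # None marks the grey zone cases
        addresses: dict[AddressRole, str]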

New technology requires change management

Implementing new technology in a large organization doesn’t just happen by itself. Old routines tend to stick around for a while. DONG Energy has put a lot of energy, so to say, into training the staff in reengineering business processes around customer master data on-boarding and maintenance, including utilizing the capabilities of the iDQ™ tool.

Acceptance of new tools comes with building up trust in the benefits of doing things in a new way.

Benefits in upstream data quality 

A tool like iDQ™ helps a lot with safeguarding the quality of contact data where the data is born and when something happens in the customer data lifecycle. A side effect, which is at least as important, stresses Ole Andres, is that data collection goes much faster.

Right now DONG Energy is looking into further utilizing the rich variety of reference data sources that can be found in the iDQ™ framework.


Business Entity Identifiers

The least cumbersome way of uniquely identifying a business partner, be it a company, government body or other form of organization, is to use an externally provided number.

However, there are quite a lot of different numbers to choose from.

All-Purpose National Identification Numbers

In some countries, like in Scandinavia, the public sector assigns a unique number to every company to be used in every relation to the public sector, and this number is open to be used by the private sector as well for identification purposes.

As reported in the post Single Company View I worked with the early implementation of such a number in Denmark way back in time.

Single-Purpose National Identification Numbers

In most countries there are multiple systems of numbers for companies each with an original special purpose. Examples are registration numbers, VAT numbers and employer identification numbers.

My current UK company has both a registration number and a VAT number, and, very embarrassingly for a data quality and master data geek, these two numbers have different names and addresses attached.

Other Numbering Systems

The best known business entity numbering system around the world is probably the DUNS number used by Dun & Bradstreet. As examined in the post Select Company_ID from External_Source Where Possible, the use of DUNS numbers and similar business directory IDs is a very common way of uniquely identifying business partners.

In the manufacturing and retail world legal entities may, as part of the Global Data Synchronization Network, be identified with a Global Location Number (GLN).

There has been a lot of talk in the financial sector lately about implementing yet another numbering system for legal entities, with an identifier usually abbreviated as LEI. Wikipedia has the details about Legal Entity Identification for Financial Contracts.

These are only some of the most used numbering systems for business entities.

So, the trend doesn’t seem to be a single source of truth but multiple sources making up some kind of the truth.
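
A practical consequence is that a party master data model should be able to hold several identifiers per business entity, one per numbering system, rather than betting on a single scheme. A minimal sketch in Python, with illustrative names of my own:

    from dataclasses import dataclass, field

    @dataclass
    class BusinessEntity:
        """One real world organization known in several numbering systems."""
        name: str
        # Keys are scheme names, e.g. "DUNS", "GLN", "LEI", "VAT" or "CVR"
        identifiers: dict[str, str] = field(default_factory=dict)

    def same_entity(a: BusinessEntity, b: BusinessEntity) -> bool:
        """Match two records if they agree on any shared numbering system."""
        shared = a.identifiers.keys() & b.identifiers.keys()
        return any(a.identifiers[s] == b.identifiers[s] for s in shared)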


Where is the Spot?

One of the things we often struggle with in data quality improvement and master data management is postal addresses. Postal addresses have different formats around the world, names of streets are spelled in alternative ways and postal codes may be wrong, too short or suffer from other flaws.

An alternative way of identifying a place is a geocode, and sometimes we may think: Hurray, geocodes are much better at uniquely identifying a place.

Well, unfortunately not necessarily so.

First of all, geocodes may be expressed in different systems. The most used ones are:

  • Latitude and longitude: Even though the globe is not completely round, this system is for most purposes good for aligning positions with the real world (see the distance sketch after this list).
  • UTM: When the world is reflected on paper or on a computer screen it becomes flat. UTM projects the world onto a flat surface well aligned with the metric system, making distance calculations straightforward.
  • WGS84: This is the datum used in most GPS devices and also the one behind Google Maps; latitude and longitude coordinates are usually expressed relative to it.
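
As a side note to the first bullet, the usual way of calculating the distance between two latitude and longitude positions is the haversine formula, which treats the globe as a sphere. A minimal sketch in Python, with approximate coordinates in the example:

    from math import asin, cos, radians, sin, sqrt

    def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
        """Great circle distance in km, assuming a sphere of radius 6371 km."""
        dlat = radians(lat2 - lat1)
        dlon = radians(lon2 - lon1)
        a = (sin(dlat / 2) ** 2
             + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
        return 2 * 6371.0 * asin(sqrt(a))

    # Copenhagen to London as the crow flies, roughly 960 km
    print(round(haversine_km(55.68, 12.57, 51.51, -0.13)))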

Next, where is the address exactly placed?

I have met at least three different approaches:

  • It could be where the building actually is, and then, if the precision is high and/or the building is big, at different places around the building.
  • It could be where the ground meets a public road. This is actually most often the case, as route planning is a very common use case for geocodes. The spot is fit for the purpose of use, so to say.
  • It could, as reported in the post Some Times Big Brother is Confused, be any place on (and beside) the street, as many reference data sources interpolate house numbers evenly along the street or in other ways get it wrong by keeping it simple (sketched below).
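
To illustrate the last bullet, here is a minimal sketch, with illustrative names of my own, of the even interpolation many reference sources use:

    def interpolate_position(house_no: int, first_no: int, last_no: int,
                             start: tuple[float, float],
                             end: tuple[float, float]) -> tuple[float, float]:
        """Place a house number by spreading the numbers evenly along the street.

        Simple, but the returned spot can be far from the actual building,
        since real house numbers are rarely spaced evenly."""
        if last_no == first_no:
            return start
        fraction = (house_no - first_no) / (last_no - first_no)
        return (start[0] + fraction * (end[0] - start[0]),
                start[1] + fraction * (end[1] - start[1]))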


instant Data Quality and Business Value

During the last couple of years I have been working with a cloud service called instant Data Quality (iDQ™).

iDQ™ is basically a very advanced search engine capable of being integrated into business processes in order to get data quality for contact data right the first time and at the same time reduce the time needed for looking up and entering contact data.

With iDQ™ you are able to look up what is known about a given address, company or individual person in external sources (I call these big reference data) and what is already known inside your internal master data.

Orchestrating the contact data entry and maintenance processes this way does create better data quality, along with business value.

The testimonials from current iDQ™ clients tell that story.

DONG Energy, a leader in providing clean and reliable energy, says:

[Testimonial image: DONG Energy]

From the oil and gas industry, Kuwait Petroleum, a company with trust as a core value, adds:

[Testimonial image: Kuwait Petroleum (Q8)]

In the non-profit sector the DaneAge Association, an organization supporting and counselling older people to make informed decisions, also gets it:

[Testimonial image: DaneAge Association]

You may learn more about iDQ™ on the instant Data Quality site.


Doctor Livingstone, I Presume?

The title of this blog post is a famous quote from history (which, as most quotes are, is disputed) said by Henry Morton Stanley (who was actually born John Rowlands) when he found Doctor Livingstone (David Livingstone) deep in the African jungle in 1871 after a six month expedition with 200 men through unknown territory.

Today it’s much easier to find people. Mobile phone use, credit card transactions and tweet positions lead the way, unless of course you really, really don’t want to be found, as was the case with Osama bin Mohammed bin Awad bin Laden.

One of the biggest issues in data quality is real world alignment of the data registered about persons. As told in the post Out of Africa, there are some issues in the way we handle such data, as:

  • Cultural diversity: Names, addresses, national IDs and other basic attributes are formatted differently country by country and to some degree within countries. Most data models with a person entity are built on the format(s) of the country where the model was designed.
  • Intended purpose of use: Person master data are often stored in tables made for specific purposes like a customer table, a subscriber table, a contact table and so on. Therefore the data identifying the individual is directly linked with attributes describing a specific role of that individual.
  • “Impersonal” use: Person data is often stored in the same table as other party master data types such as business entities, projects, households et cetera.

Besides that, I have found that many organizations don’t use the sources available today for getting data quality right when it comes to contact data.

It’s not that I suggest actually hacking into mobile phone logs and the like. There are a lot of sources that don’t compromise privacy and let you exploit external reference data, as explained in the post Beyond Address Validation.


Return on Investment in Big Reference Data

Currently I’m working with a cloud based service where we are exploiting available data about addresses, business entities and consumers/citizens from all over the world.

The cost of such data varies a lot around the world.

In Denmark, where the product was born, the costs of such data are relatively low. The joys of the welfare state also apply to access to open public sector data, as reported in the post The Value of Free Address Data. Also, you are able to check the identity of an individual in the citizen hub. Doing it online on a green screen you will be charged (what resembles) 50 cents, but doing it with cloud service brokerage, like in iDQ™, it will only cost you 5 cents.

In the United Kingdom the prices for public sector data about addresses, business entities and citizens are still relatively high. The Royal Mail has a license tag on the PAF file even for government bodies. Ordnance Survey has made the rest of AddressBase free for the public sector, but there is a big price tag for the rest of society. The electoral roll has a price tag too, even though the data quality isn’t fit for other uses than the intended immediate purpose of use, as told in the post Inaccurately Accurate.

At the moment I’m looking into similar services for the United States and a lot of other countries. Generally speaking, you can get your hands on most data for a price, and the prices have come down since I checked the last time. Also, there is a tendency of lowering or abandoning the price for the most basic data, such as names, addresses and other identification data.

As poor data quality in contact data is a big cost for most enterprises around the world, the news of decreasing prices for big reference data is good news.

However, if you are doing business internationally, it is a daunting task to keep up with where to find the best and most cost effective big reference data sources for contact data, and not least how to use those sources in business processes.

On Wednesday the 25th of July I’m giving a presentation, in the cloud, on how iDQ™ comes to the rescue. More information on DataQualityPro.


The Big Tower of Babel

Three years ago, one of the first posts on this blog was called The Tower of Babel.

That post was the first of many about multi-cultural challenges in data quality improvement. These challenges include not only language variations but also different character sets reflecting different alphabets and script systems, naming traditions, address formats, measurement units, privacy norms and government registration practices, to name some of the ones I have experienced.

When organizations work internationally it may be tempting to build a new Tower of Babel by imposing the same language for metadata (probably English) and the same standards for names, addresses and other master data (probably those of the country where the headquarters is).

However, building such a high tower may end up the same way as the Tower of Babel known from the old religious tales.

Alternatively a mapping approach may be technically a bit more complex but much easier when it comes to change management.

The mapping approach is used in the Universal Postal Union’s (UPU) attempt to make a “standard” for worldwide addresses. The UPU S42 standard is mentioned in the post Down the Street. The S42 standard does not impose the same way of writing on envelopes all over the world, but facilitates mapping the existing ways into a common tagging mapped to a common structure.
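
A minimal sketch in Python of what such a mapping could look like; the field and tag names are illustrative only and not taken from the actual S42 element set:

    # Country specific field names mapped to common tags
    FIELD_TO_TAG = {
        "DK": {"vejnavn": "thoroughfare", "husnr": "premise_number",
               "postnr": "postcode", "bynavn": "locality"},
        "GB": {"street": "thoroughfare", "building_number": "premise_number",
               "postcode": "postcode", "post_town": "locality"},
    }

    def to_common_structure(country: str, address: dict) -> dict:
        """Map a local address into the common tagged structure.

        The local way of writing is kept; only the labels are mapped."""
        mapping = FIELD_TO_TAG[country]
        return {mapping[field]: value
                for field, value in address.items() if field in mapping}

    # The Danish and the British record end up with the same common tags
    print(to_common_structure("DK", {"vejnavn": "Bredgade", "husnr": "25",
                                     "postnr": "1260", "bynavn": "København K"}))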

Building such a mapping based “standard” for addresses, and other master data with international diversity, in your organization may be a very good way to balance the need for standardization against the risks in change management, including keeping master data trusted and actionable.

The principle of embracing and mapping international diversity is a core element in the service I’m currently working with. It’s not that the instant Data Quality service doesn’t stretch into the clouds. Certainly it is a cloud service pulling data quality from the cloud. It’s not that it isn’t big. Certainly it is based on big reference data.
