The Big Tower of Babel

Three years ago, one of the first posts on this blog was called The Tower of Babel.

This post was the first of many posts about multi-cultural challenges in data quality improvement. These challenges include not only language variations but also different character sets reflecting different alphabets and script systems, naming traditions, address formats, units of measure, privacy norms and government registration practices, to name some of the ones I have experienced.

When organizations work internationally it may be tempting to build a new Tower of Babel by imposing the same language for metadata (probably English) and the same standards for names, addresses and other master data (probably those of the country where the headquarters is located).

However, building such a high tower may end up the same way as the Tower of Babel known from the old religious tales.

Alternatively, a mapping approach may be technically a bit more complex, but it is much easier when it comes to change management.

The mapping approach is used in the Universal Postal Union’s (UPU) attempt to make a “standard” for worldwide addresses. The UPU S42 standard is mentioned in the post Down the Street. The S42 standard does not impose the same way of writing addresses on envelopes all over the world, but facilitates mapping the existing ways into a common set of tags and a common structure.

Building such a mapping-based “standard” for addresses, and for other master data with international diversity, in your organization may be a very good way of balancing the need for standardization against the risks in change management, including keeping master data trusted and actionable.
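To illustrate the idea, here is a minimal sketch of such a mapping, assuming a tiny set of made-up country templates; the element names and layouts are only illustrative and not the actual S42 vocabulary:

```python
# Minimal sketch of the mapping idea: country-specific address layouts are kept
# as-is, but each field is tagged with a common element name so downstream
# processes can work with one structure. Element names are illustrative only.

COUNTRY_TEMPLATES = {
    # Order of tagged elements as they appear in the domestic address format
    "DK": ["thoroughfare", "premise", "postcode", "locality"],
    "US": ["premise", "thoroughfare", "locality", "region", "postcode"],
}

def to_common_structure(country, fields):
    """Map a locally formatted address (fields in domestic order) to a dict
    keyed by the common element names."""
    template = COUNTRY_TEMPLATES[country]
    return dict(zip(template, fields))

# The same common structure, reached from two different local formats
print(to_common_structure("DK", ["Vimmelskaftet", "42", "1161", "København K"]))
print(to_common_structure("US", ["350", "Fifth Avenue", "New York", "NY", "10118"]))
```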

The principle of embracing and mapping international diversity is a core element in the service I’m currently working with. It’s not that the instant Data Quality service doesn’t stretch into the clouds. Certainly it is a cloud service pulling data quality from the cloud. It’s not that it isn’t big. Certainly it is based on big reference data.


Avoiding Contact Data Entry Flaws

Contact data is the data domain most often mentioned when talking about data quality. Names, addresses and other identification data are constantly spelled wrongly, or just differently, by the employees responsible for entering party master data.

Cleansing data a long time after it has been captured is a common way of dealing with this huge problem. However, preventing typos, mishearings and multi-cultural misunderstandings at data entry is a much better option wherever applicable.

I have worked with two different approaches to ensure the best data quality for contact data entered by employees. These approaches are:

  • Correction and
  • Assistance

Correction

With correction the data entry clerk, sales representative, customer service professional or whoever is entering the data will enter the name, address and other data into a form.

After submitting the form, or in some cases when leaving each field on the form, the application will check the content against business rules and available reference data and return a warning or error message and perhaps a correction to the entered data.

As duplicate data is a very common data quality issue in contact data, a frequent example of such a prompt is a warning that a similar contact record already exists in the system.
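As a rough illustration, here is a minimal sketch of the correction approach, assuming a tiny in-memory postal code table and contact list stand in for the real business rules and reference data:

```python
# Minimal sketch of correction-based data entry: validate a submitted contact
# form against reference data and warn about possible duplicates. The data
# below is made up for illustration.
import difflib

POSTAL_CODES = {"1161": "København K", "2100": "København Ø"}   # reference data
EXISTING_CONTACTS = ["Jens Hansen, Vimmelskaftet 42, 1161 København K"]

def check_contact(name, street, postal_code, city):
    messages = []
    # Business rule / reference data check: does the postal code match the city?
    expected_city = POSTAL_CODES.get(postal_code)
    if expected_city is None:
        messages.append(f"Error: unknown postal code {postal_code}")
    elif expected_city != city:
        messages.append(f"Correction: city for {postal_code} is {expected_city}")
    # Duplicate check: warn if a similar contact record already exists
    candidate = f"{name}, {street}, {postal_code} {city}"
    for existing in EXISTING_CONTACTS:
        if difflib.SequenceMatcher(None, candidate.lower(), existing.lower()).ratio() > 0.85:
            messages.append(f"Warning: a similar contact already exists: {existing}")
    return messages

print(check_contact("Jens Hansen", "Vimmelskaftet 42", "1161", "Kobenhavn K"))
```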

Assistance

With assistance we try to minimize the number of keystrokes needed and interactively help with searching the available reference data.

For example, when entering address data, assistance-based data entry will start at the highest geographical level:

  • If we are dealing with international data, the country will set the context and determine whether a state or province is needed.
  • Where postal codes (like ZIP codes) exist, they are the fast path to the city.
  • In some countries a postal code only covers one street (thoroughfare), so the street is settled by the postal code. In other situations there will usually be a limited number of streets that can be picked from a list or settled with the first few characters.

(I guess many people know this approach from navigation devices for cars.)
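As a rough illustration, here is a minimal sketch of such a drill-down, assuming a tiny in-memory reference data set where a real service would query national address reference data:

```python
# Minimal sketch of assistance-based address entry: the country sets the
# context, the postal code settles the city, and the first characters narrow
# down the street candidates. The reference data below is made up.
REFERENCE = {
    "DK": {
        "1161": {"city": "København K", "streets": ["Vimmelskaftet"]},
        "2100": {"city": "København Ø",
                 "streets": ["Østerbrogade", "Nordre Frihavnsgade", "Rosenvængets Allé"]},
    }
}

def assist(country, postal_code, street_prefix=""):
    """Derive the city from the postal code and narrow the street choices
    down from the first characters typed."""
    entry = REFERENCE[country][postal_code]
    candidates = [s for s in entry["streets"] if s.lower().startswith(street_prefix.lower())]
    return {"city": entry["city"], "street_candidates": candidates}

print(assist("DK", "1161"))                       # only one street: settled by the postal code
print(assist("DK", "2100", street_prefix="nor"))  # a short pick list after a few keystrokes
```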

When the valid address is known, you may catch companies from business directories located at that address. Depending on the country in question, you may also know the citizens living there from phone directories and other sources, and of course from the internal party master data, thus avoiding entering what is already known about names and other data.

When catching business entities, a search for a name in a business directory often makes it possible to pick a range of identification data and other valuable data, and not least a reference key for future data updates.
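Again as a sketch only, assuming a made-up directory record rather than any specific provider’s data:

```python
# Minimal sketch of picking a business entity from a directory once the
# address is known; the directory content and the fields returned are
# assumptions for illustration, not a specific provider's API.
BUSINESS_DIRECTORY = [
    {"directory_key": "DK-10150817", "name": "Eksempel ApS",
     "address": "Vimmelskaftet 42, 1161 København K", "vat_no": "DK10150817"},
]

def pick_business(address, name_prefix=""):
    """Return directory entries at a validated address matching the first
    characters of the name, so identification data and a reference key for
    future updates can be copied instead of typed."""
    return [b for b in BUSINESS_DIRECTORY
            if b["address"] == address and b["name"].lower().startswith(name_prefix.lower())]

print(pick_business("Vimmelskaftet 42, 1161 København K", "eks"))
```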

Lately I have worked intensively with an assistance-based cloud service for business processes embracing contact data entry. We have some great testimonials about the advantages of such an approach here: instant Data Quality Testimonials.


Social Commerce and Multi-Domain MDM

The term social commerce is said to be a subset of eCommerce where social media is used to ultimately drag prospects and returning customers to your website, where a purchase of products and services can be made.

In complex sales processes, typically for Business-to-Business (B2B) sales, the website may offer product information sheets, demo requests, contact forms and other pipeline steps.

This is the moment where your social media engaged (prospective) customer meets your master data as:

  • The (prospective) customer creates and maintains name, address and communication information by using registration functions
  • The (prospective) customer searches for and reads product information on web shops and information sites

One aspect of this transition is how master data is carried over, namely:

  • How is the social network profile used in the engagement captured as part of (prospective) customer master data, and should it be part of master data at all?
  • How is product information from the governed master data hub used as part of the social media engagement, and should the data governance of product data be extended to cover use in social media at all?

Any thoughts?


At Least Two Versions of the Truth

Precisely one year ago I wrote a post called Single Company View examining the challenges of getting a single business partner view in business-to-business (B2B) party master data.

Yesterday Robert Hawker of Vodafone gave a keynote at the MDM Summit Europe 2012 about supplier master data management.

One of the points was that sometimes you really want exactly the same real-world entity to be two golden records in your master data hub, as there may be totally different business activities with the same legal entity. The Vodafone example was:

  • Having an antenna placed on the top of a building owned by a certain company and thus paying a fee for that
  • Buying consultancy services from the same company
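As a sketch of what that may look like in the master data hub, assuming a made-up company registration number as the shared real-world identifier:

```python
# Minimal sketch of keeping two golden records for the same legal entity,
# differentiated by the business relationship, while still linking both to one
# real-world identifier (an assumed company registration number).
golden_records = [
    {"record_id": "GR-001", "legal_entity_id": "UK-01234567",
     "name": "Example Property Ltd", "relationship": "landlord (antenna site fee)"},
    {"record_id": "GR-002", "legal_entity_id": "UK-01234567",
     "name": "Example Property Ltd", "relationship": "supplier (consultancy services)"},
]

# The shared legal_entity_id still allows a consolidated view when needed
same_entity = [r for r in golden_records if r["legal_entity_id"] == "UK-01234567"]
print(len(same_entity), "golden records for one legal entity")
```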

I have met such examples many times when doing data matching as told in the post Entity Revolution vs Entity Evolution.

However, on one occasion many years ago, I worked in a company where not having a single business partner view nearly became a small disaster.

Our company delivered software for membership administration and was at the same time a member of an employer organisation that also happened to be a customer.

A new director got the brilliant idea that cancelling the membership of the employer organisation was an obvious cost reduction.

The cancellation was sent. The employer organisation confirmed the cancellation, adding that they were very sorry, but internal business rules at the same time forced them to stop being a customer.

The cancellation was cancelled of course, and damage control was initiated.


MDM Summit Europe 2012 Preview

I am looking forward to being at the Master Data Management Summit Europe 2012 next week in London. The conference runs in parallel with the Data Governance Conference Europe 2012.

Data Governance

As I live within short walking distance of the venue, I won’t have as much time for thinking as Jill Dyché had when she recently attended a conference within driving distance, as reported in her blog post After Gartner MDM, in which Jill considers MDM and takes the road less traveled. In London Jill will be delivering a keynote called: Data Governance, What Your CEO Needs to Know.

On the Data Governance tracks there will be a panel discussion called Data Governance in a Regulatory Environment with some good folks: Nicola Askham, Dylan Jones, Ken O’Connor and Gwen Thomas.

Nicola is currently writing an excellent blog post series on the Six Characteristics Of A Successful Data Governance Practitioner. Dylan is the founder of DataQualityPro. Ken was the star of the OCDQblog radio show today, discussing Solvency II and Data Quality.

Gwen, being the founder of The Data Governance Institute, is chairing the Data Governance Conference while Aaron Zornes, the founder of The MDM Institute, is chairing the MDM Summit.

Master Data, Social MDM and Reference Data Management

The MDM Institute lately published an “MDM Alert” with Master Data Management & Data Governance Strategic Planning Assumptions for 2012-13, carrying the subtitle: Pervasive & Pandemic MDM is in Your Future.

Some of the predictions are about reference data and Social MDM.

Social master data management has been a favorite subject of mine for the last couple of years, and I hope to catch up with fellow MDM practitioners and learn how far this has come outside my own circles.

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data is data defined and initially maintained outside a single enterprise. Examples from the customer master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend, as I see it, is that enterprises seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above. In the customer master data realm such big reference data may be core data about:

  • Addresses: every single valid address, typically within a given country.
  • Business entities: every single business entity occupying an address in a given country.
  • Consumers (or citizens): every single person living at an address in a given country.

There is often no single source of truth for such data.
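To illustrate the difference in depth, here is a small sketch with made-up record layouts, where the deep records also carry the multiple sources they were consolidated from:

```python
# Minimal sketch contrasting a modestly populated list with "big" reference
# data; the record layout and sources are assumptions for illustration.
country_list = ["DK", "DE", "GB", "US"]           # modest reference data

big_reference_data = [                            # deep reference data: one record per
    {"address": "Vimmelskaftet 42, 1161 København K",          # valid address ...
     "businesses": ["Eksempel ApS"],                            # ... with known occupants
     "sources": ["national address register", "business directory"]},
]

print(len(country_list), "countries vs", len(big_reference_data), "deep address record(s)")
```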

As I’m working with an international launch of a product called instant Data Quality (iDQ™), I look forward to exploring how MDM analysts and practitioners see this field developing.


Fit for repurposing

Reading a blog post by David Loshin called Data Governance and Quality: Data Reuse vs. Data Repurposing, I was inspired, perhaps a bit off topic, to pose the question of whether data are of high quality if they are:

  • Fit for the purpose of use
  • Fit for repurposing

The first definition has been around for many years and has been adopted by many data quality practitioners. I have, however, often encountered situations where the reuse of data for other purposes than the original one has raised data quality issues with otherwise cleared data. One of my first pieces on my own blog discussed that challenge in a post called Fit for what purpose?

Not least within master data management, where data are maintained for multiple uses, this problem is very common.

Data in a master data hub may either:

  • Be entered directly into the hub where multiple uses are handled
  • Be loaded from other sources where data capture was done

In the latter case the data governance necessary to ensure fitness for multiple uses must stretch to the ingestion in these sources.

Now, if repurposing is seen as a future, not yet discovered, purpose of use, what can you do to ensure that data captured today are fit for that future repurposing?

The only answer is probably real world alignment, as discussed here on a page called Data Quality 3.0. Make sure your data reflect the real world as closely as possible when captured, and make sure data can be maintained in order to keep that alignment. And make sure this is done and facilitated where data are entered.
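As a small sketch of the difference, with illustrative attributes only: capturing the real-world fact (here a date of birth) instead of a purpose-specific derivation (an age bracket) keeps the data fit for purposes not yet discovered:

```python
# Minimal sketch of purpose-specific capture versus real-world aligned
# capture; the attributes and names are made up for illustration.
from datetime import date

# Captured for one purpose only: hard to repurpose later
purpose_fit = {"customer": "Jens Hansen", "age_bracket": "40-49"}

# Real-world aligned: the underlying fact is kept, so any new purpose can
# derive what it needs on demand
real_world_aligned = {"customer": "Jens Hansen", "date_of_birth": date(1975, 6, 1)}

def age_bracket(dob, today=None):
    today = today or date.today()
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return f"{age // 10 * 10}-{age // 10 * 10 + 9}"

print(age_bracket(real_world_aligned["date_of_birth"]))   # derived on demand
```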


Turning a Blind Eye to Data Quality

The idiom turning a blind eye originates from the sea battle at Copenhagen, where Admiral Nelson ignored a signal with permission to withdraw by raising the telescope to his blind eye and saying “I really do not see the signal”.

Nelson went on and won the battle.

As a data quality practitioner you are often amazed by how enterprises turn a blind eye to data quality challenges and, despite horrible data quality conditions, keep on and win the battle by growing as successful businesses.

The evidence that poor data quality is costing enterprises huge sums has been out there for a long time. But business successes are made over and again despite bad data. There may be casualties, but the business goals are met anyway. So, poor data quality is just something that makes the fight harder, not impossible.

I guess we have to change the messaging about data quality improvement away from the doomsday prophecies, which make decision makers turn a blind eye to data quality challenges, and be more specific about maybe smaller but tangible wins where data quality improvement and business efficiency go hand in hand.


Tear Down This Wall!

Today is the 50th anniversary of the Berlin Wall. The wall is fortunately gone today, torn down as suggested by Ronald Reagan in 1987 with his famous words: Mr. Gorbachev, tear down this wall!

But today we have another bad wall, the one saying that an enterprise has two parts: Business and IT.

I disagree. So do many other people, as for example Michael Baylon in his blog post called Is IT part of the business?

Yes, IT is part of the business. Tear down this wall!


Marco Polo and Data Provenance

Besides being a data geek I am also interested in pre-modern history. So it’s always nice when I’m able to combine data management and history.

A recurring subject in historian circles is the suspicion that the explorer Marco Polo never actually went to China.

As said in the linked article from The Telegraph: “It is more likely that the Venetian merchant adventurer picked up second-hand stories of China, Japan and the Mongol Empire from Persian merchants whom he met on the shores of the Black Sea – thousands of miles short of the Orient”.

When dealing with data and ramping up data quality, a frequent challenge is that some data wasn’t captured by the data consumer – and not even by the organization using the data. Some of the data stored in company databases are second-hand data, and in some cases the overwhelming part of the data was captured outside the organization.

As with the book about Marco Polo’s (alleged) travels called “Description of the World”, this doesn’t mean that you can’t trust anything. But maybe some data are mixed up a bit, and maybe some obvious data are missing.
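As a sketch of what that could look like, with made-up field names: recording provenance alongside the data lets the data consumer see which values are second-hand:

```python
# Minimal sketch of recording provenance alongside second-hand data, so a data
# consumer can see whether a value was captured first-hand or picked up from
# an external source; field names and values are assumptions for illustration.
record = {
    "name": "Eksempel ApS",
    "address": "Vimmelskaftet 42, 1161 København K",
    "provenance": {
        "name": {"source": "own data entry", "captured_on": "2012-03-02"},
        "address": {"source": "external business directory", "captured_on": "2012-04-15"},
    },
}

def second_hand_fields(rec):
    """List the fields whose values were not captured by our own organization."""
    return [f for f, p in rec["provenance"].items() if p["source"] != "own data entry"]

print(second_hand_fields(record))   # ['address']
```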

I have touched on this subject earlier in the post Outside Your Jurisdiction and identified second-hand data as one of the Top 5 Reasons for Downstream Cleansing.


Proactive Data Governance at Work

“Data governance is 80 % about people and processes and 20 % (if not less) about technology” is a common statement in the data management realm.

This blog post is about the 20 % (or less) technology part of data governance.

The term proactive data governance is often used to describe whether a given technology platform is able to support data governance in a good way.

So, what is proactive data governance technology?

Obviously it must be the opposite of reactive data governance technology, which must be something about discovering completeness issues, as in data profiling, and fixing uniqueness issues, as in data matching.

Proactive data governance technology must be implemented in data entry and other data capture functionality. The purpose of the technology is to assist people responsible for data capture in getting the data quality right from the start.

If we look at master data management (MDM) platforms we have two possible ways of getting data into the master data hub:

  • Data entry directly in the master data hub
  • Data integration by data feeds from other systems such as CRM, SCM and ERP solutions and from external partners

In the first case the proactive data governance technology is a part of the MDM platform, often implemented as workflows with assistance, checks, controls and permission management. We see this most often related to product information management (PIM) and business-to-business (B2B) customer master data management. Here the insertion of a master data entity like a product, a supplier or a B2B customer involves many different employees, each with responsibility for a set of attributes.
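As a sketch of such a workflow, with illustrative roles and attributes only:

```python
# Minimal sketch of an onboarding workflow where different roles are
# responsible for different sets of attributes, and the record only becomes a
# golden record once every step is completed; roles and attributes are made up.
ONBOARDING_WORKFLOW = [
    {"step": "basic data",     "role": "procurement", "attributes": ["name", "address", "vat_no"]},
    {"step": "financial data", "role": "finance",     "attributes": ["bank_account", "payment_terms"]},
    {"step": "logistics data", "role": "logistics",   "attributes": ["delivery_terms"]},
]

def missing_steps(record):
    """Return the workflow steps that still have unfilled attributes."""
    return [s["step"] for s in ONBOARDING_WORKFLOW
            if any(a not in record or record[a] in (None, "") for a in s["attributes"])]

supplier = {"name": "Eksempel ApS", "address": "Vimmelskaftet 42, 1161 København K",
            "vat_no": "DK10150817", "bank_account": None}
print(missing_steps(supplier))   # ['financial data', 'logistics data']
```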

The second case is most often seen in customer data integration (CDI) involving business-to-consumer (B2C) records, but it certainly also applies to enriching product master data, supplier master data and B2B customer master data. Here the proactive data governance technology is implemented in the data import functionality, or even in the systems of entry, best done as Service Oriented Architecture (SOA) components that are hooked into the master data hub as well.
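As a sketch of such a reusable component, assuming a tiny postal code table; a real implementation would expose the check as a service called both from the CRM/ERP entry forms and from the master data hub’s import jobs:

```python
# Minimal sketch of the same check reused as a shared component so that data
# entered upstream and data imported into the hub pass the same validation.
# The reference data is made up for illustration.
POSTAL_CODES = {"1161": "København K", "2100": "København Ø"}

def validate_address(street, postal_code, city):
    """Shared validation used at data entry and again at hub import."""
    if postal_code not in POSTAL_CODES:
        return (False, f"Unknown postal code {postal_code}")
    if POSTAL_CODES[postal_code] != city:
        return (False, f"City should be {POSTAL_CODES[postal_code]}")
    return (True, "OK")

# Called from a CRM entry form ...
print(validate_address("Vimmelskaftet 42", "1161", "København K"))
# ... and from the master data hub's import of the same record
print(validate_address("Vimmelskaftet 42", "1161", "Copenhagen K"))
```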

It is a matter of taste whether we call such technology proactive data governance support or upstream data quality. From what I have seen so far, it does work.
