Party On

13th July 2011

The most frequent data domain addressed in data quality improvement and master data management is parties.

Some of the party-related issues that keep creating difficulties are:

  • Party roles
  • International diversity
  • Real world alignment

Party roles

Party data management is often labelled as customer data management or customer data integration (CDI).

Indeed, customers are the lifeblood of any enterprise – even if we refer to those who benefit from our services as citizens, patients, clients or whatever term is in use in different industries.

But the full information chain within any organization also includes many other party roles as explained in the post 360° Business Partner View. Some parties are suppliers, channel partners and employees. Some parties play more than one role at the same time.

The classic question “what is a customer?” is of course important to answer in your master data management and data quality journey. But in my eyes there are a lot of things to be solved in party data management that don’t need to wait for the answer to that question, which in any case won’t be as simple as cutting the Gordian Knot, as said in the post Where is the Business.

International diversity

As discussed in the post The Tower of Babel, more and more organizations are faced with multi-cultural issues in data quality improvement within party data management.

Whether and when an organization has to deal with international issues is of course dependent on whether and to what degree that organization is domestic or active internationally. In some countries with several official languages, like Switzerland and Belgium, the multi-cultural topic is mandatory all the same. Typically, in large countries companies grow big before looking abroad, while in smaller countries, like my home country Denmark, even many fairly small companies must address international issues with data quality.

However, as Karen Lopez recently pondered in the post Data Quality in The Wild, Some Where …, actually everyone, even in the United States, has some international data somewhere looking very strange if not addressed properly.

Real world alignment

I often say that real world alignment, sometimes as opposed to the common definition of data quality as being fit for purpose, is the short cut to getting data quality right related to party master data.

It is however not a straightforward short cut. There are multiple challenges connected with getting your business-to-business (B2B) records aligned with the real world, as discussed in the post Single Company View. When it comes to business-to-consumer (B2C) or government-to-citizen (G2C), I think the dear people who sometimes comment on this blog did a fine job of balancing mutating tables and intelligent design in the post Create Table Homo_Sapiens.



When a Cloudburst Hit

11th July 2011

Some days ago Copenhagen was hit by the most powerful cloudburst ever measured here.

More powerful cloudbursts may be common in warmer regions on Earth, but this one was very unusual at 55 degrees north.

Fortunately there was only material damage, but the material damage was very extensive. When you take a closer look you may divide the underground constructions into two categories.

The first category is facilities constructed with the immediate purpose of use in mind. Many of these facilities are still out of operation.

The second category is facilities constructed with the immediate purpose of use in mind but also designed to resist heavy pouring rain. These facilities kept working during the cloudburst. One example is the metro. If the metro had been constructed only for its immediate purpose of use, running trains below ground, it would have been flooded within minutes, with the risk of lost lives and a standstill for months.

We have the same situation in data management. Things may seem just fine if data are fit for the immediate purpose of use. But when a sudden change in conditions hits, you find out about your data quality.



A Sudden Change: South Sudan

9th July 2011

This tenth Data Quality World Tour blog post is about South Sudan, a new country born today, the 9th of July 2011.

Reference data

The term “reference data” is often used to describe small collections of data that are basically maintained outside an enterprise and are common to all organizations. A list of countries is a good example of reference data.

Sometimes the terms “reference data” and “master data” are used interchangeably. I started a discussion on that subject on the mdm community some time ago.

One problem with reference data such as a country list is whether you are able to keep the list updated. A country list doesn’t change every day, but sometimes it actually does, like today with South Sudan as a new country.

Suddenly changing dimensions

If you have master data entities linking to reference data like a country list, it is not that simple when the reference data changes. If you have a customer placed in what is South Sudan today, that entity should rightfully still link to Sudan regarding yesterday’s transactions, but you may also have changed the name of Sudan to North Sudan, which is the continuing part of the former Sudan.

We call that kind of challenge “slowly changing dimensions” but it actually looks like “suddenly changing dimensions” when we have to figure out who belongs to where at a certain time.
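One way to picture "who belongs to where at a certain time" is to put validity dates on the link between a place and its country instead of overwriting it. The sketch below is only an illustration under my own assumptions: the names Membership, MEMBERSHIPS and country_on are hypothetical, and the two cities are just sample rows.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Membership:
    """Hypothetical effective-dated link between a place and a country."""
    place: str
    country: str
    valid_from: date           # inclusive start
    valid_to: Optional[date]   # exclusive end; None means still valid


# Illustrative sample rows, not a complete reference list.
MEMBERSHIPS = [
    Membership("Juba", "Sudan", date(1956, 1, 1), date(2011, 7, 9)),
    Membership("Juba", "South Sudan", date(2011, 7, 9), None),
    Membership("Khartoum", "Sudan", date(1956, 1, 1), None),
]


def country_on(place: str, on: date) -> Optional[str]:
    """Return the country a place belonged to on a given date, if known."""
    for m in MEMBERSHIPS:
        started = m.valid_from <= on
        not_ended = m.valid_to is None or on < m.valid_to
        if m.place == place and started and not_ended:
            return m.country
    return None


# Yesterday's transaction and today's transaction resolve differently.
print(country_on("Juba", date(2011, 7, 8)))
print(country_on("Juba", date(2011, 7, 9)))
```

With this shape, yesterday's transactions keep pointing at Sudan while new ones point at South Sudan, and nothing has to be rewritten on the day the country list changes.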



No NOT NULL

18th May 2011

A basic way of ensuring data quality in a database is to define that a certain attribute must be filled. This is done by specifying that the value “null” isn’t allowed or, as said in SQL’ish, by setting the NOT NULL constraint.

A common data quality issue is that such constraints are almost always too rigid.

In my last post, called Notes about the North Pole, it was discussed that every place on earth has a latitude and a longitude, except that the North Pole – and the South Pole – don’t have a longitude. So if you have a table with geocodes you can’t set NOT NULL for the longitude if you (though very unlikely) should store the coordinates for the poles. Alternatively you could store 0 for longitude to make it complete – but then it would be very inaccurate. 360 degree inaccurate so to speak.

Another infrequent example from this blog is that every person in my country has a given (first) name and a family (last) name. But there are a few Royal Exceptions. So, no NOT NULL for the family name.

Related to people and places there are plenty of more frequent examples. If you only expect addresses from the United States, Australia or India, setting NOT NULL for the state attribute seems wise. But expect foolish values in here when you get addresses from most other parts of the world. So, no NOT NULL for the state.
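A middle way between a rigid NOT NULL and no rule at all is a conditional check: the state is required only for the countries where it makes sense. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout, the column names and the helper try_insert are my own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical address table: state is nullable, but a CHECK constraint
# demands it for the countries where a state is expected.
conn.execute(
    """
    CREATE TABLE address (
        id           INTEGER PRIMARY KEY,
        country_code TEXT NOT NULL,
        city         TEXT NOT NULL,
        state        TEXT,
        CHECK (country_code NOT IN ('US', 'AU', 'IN') OR state IS NOT NULL)
    )
    """
)


def try_insert(country_code, city, state):
    """Insert one address; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO address (country_code, city, state) VALUES (?, ?, ?)",
                (country_code, city, state),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("DK", "Copenhagen", None))  # no state needed
print(try_insert("US", "Atlanta", None))     # state missing where expected
print(try_insert("US", "Atlanta", "GA"))
```

The list of countries in the check is of course itself reference data, so in a real model it would probably live in a table rather than in the constraint.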

A common variant of the mandatory state value is when you register for data quality webinars, white papers and so on. Most often you must select from a value list containing the states of the United States of America – in some cases also mixed in with Canadian Provinces. The NULL option to be used by strangers may hide as “Not Applicable” way down the list among states beginning with N.

I usually select Alaska, which is among the first states in alphabetical order – which also brings me back close to the North Pole, making my data close to 360 degree inaccurate.



My View

15th May 2011

This post is inspired by the view from our roof terrace, where I’m sitting with the laptop right now.

One of the buildings I can see in the skyline is the spectacular new Hotel Bella Sky that will open tonight.

The new hotel is situated by the main fair in Copenhagen called Bella Center, the venue of the recent disastrous climate change summit where Wen, Obama and Singh couldn’t agree about anything.     

The Bella Sky isn’t the only new high-rise hotel in the nearby skyline. Actually there is currently an overcapacity of hotel rooms in Copenhagen. But as it is said, the new hotels were planned before the credit crunch and couldn’t be stopped.

Planning several years in advance has always been difficult. Within information technology it’s also a well-known fact that projects that are set to deliver some years ahead almost always fail to meet the actual business needs when that time is reached.

On the one hand we need some more agile hotel projects – and agile information technology projects – including agile master data management and data quality programs.

On the other hand, I like it when I see some nice hotel architecture and some good data architecture.



Georgian Geography and History

1st May 2011

This is the sixth post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Georgia

Georgia is the English name for a sovereign state in the South Caucasus where Europe meets Asia. Georgia was a part of the Soviet Union under the English name Georgian SSR from 1922 to 1991. Back in the 4th century BC a unified kingdom of Georgia was established as an early example of an advanced state organization under one king and an aristocratic hierarchy.

Georgia

Georgia is a state located in the southeastern United States. Back in the 18th century the area was known as the Province of Georgia within the British colonies. Before the arrival of the Europeans some of current Georgia was part of the Cofitachequi paramount chiefdom.

Ambiguous place names and slowly changing dimensions

As with Georgia, there are lots of examples of place names belonging to more than one place on Earth. Besides that, location reference data like the two Georgias have slowly changing dimensions, such as what area is covered, where in a hierarchy it belongs and what it is called at a certain time.
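The ambiguity can be made concrete with a tiny lookup where a name alone returns several candidates and extra context narrows them down. This is just a sketch under my own assumptions: Place, GAZETTEER and resolve are hypothetical names, and the two rows are the two Georgias from this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Place:
    """Hypothetical location reference record."""
    name: str
    kind: str               # e.g. "country" or "state"
    parent: Optional[str]   # where it belongs in a hierarchy, if anywhere


GAZETTEER = [
    Place("Georgia", "country", None),
    Place("Georgia", "state", "United States"),
]


def resolve(name: str, kind: Optional[str] = None,
            parent: Optional[str] = None) -> List[Place]:
    """Return all places matching the name and any context that was given."""
    return [
        p for p in GAZETTEER
        if p.name == name
        and (kind is None or p.kind == kind)
        and (parent is None or p.parent == parent)
    ]


print(len(resolve("Georgia")))                     # name alone is ambiguous
print(resolve("Georgia", parent="United States"))  # context picks the state
```

The point is that a place name is not a key; the name plus its position in a hierarchy (and, as noted above, a point in time) gets much closer to one.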



Does One Size Fit Anyone?

5th April 2011

Following up on a recent post about data silos I have been thinking (and remembering) a bit about the idea that one company can have all master data stored in a single master data hub.

Supply Chain Musings

If you for example look at a manufacturer the procurement of raw materials is of course an important business process.

Besides purchasing raw materials the manufacturer also buys machinery, spare parts for the machinery and maintenance services for the machinery.

Like everyone else the manufacturer also buys office supplies – including rare stuff such as data quality tools and master data management consultancy.

If you look at the vendor table in such a company, the number of “supporting suppliers” is much higher than the number of essential suppliers of raw materials. The business processes, data structures and data quality metrics for on-boarding and maintaining supplier data and product data are “same same but very different” for these groups of suppliers and the product data involved.

Supply Chain Centric Selling

I remember that at one client in manufacturing a side function in procurement was selling by-products from the production to a completely different audience than the customers for the finished products. They had a wonderful multi-domain data silo for that.

Hierarchical Customer Relations

A manufacturer may have a golden business rule saying that all sales of finished products go through channel partners. That will typically mean a modest number of customers in the basic definition being someone who pays you. Here you typically need a complex data structure and advanced workflows for business-to-business (B2B) customer relationship management.

Your channel partners will then have customers being either consumers (B2B2C) or business users within a wider range of companies. I have noticed an increasing interest in keeping some kind of track of the interaction with end users of your products, and I guess embracing social media will only add to that trend. The business processes, data structures and data quality metrics for doing that are “same same but very different” from your basic customer relationship management.

Conclusion

The above musings revolve around manufacturing companies, but I have met similar ranges of primary and secondary constructs related to master data management in all other industry verticals.

So, can all master data in a given company be handled in a single master data hub?

I think it’s possible, but it has to be an extremely flexible hub either having a lot of different built-in functionality or being open for integration with external services.



Boiling Data Silos

30th March 2011

Yesterday there were some blog posts dealing with data silos.

Graham Rhind posted: Data silos – learn to live with them.

Rob Karel posted: Stop trying to put a monetary value on data – it’s the wrong path. Though it was not the main subject, there was a remark saying: “Attempting to boil the ocean and trying to solve Customer, Product, or Financial data for all processes and decisions across the whole organization is too big an effort destined to fail before it starts”.

Mark Montgomery made a comment on Rob’s post saying: “I also have trouble with the boil the ocean metaphor, which is used too often these days to justify all kinds of protectionist policies in the enterprise. You can’t have it both ways in the enterprise– either you have data silos or you don’t, and I argue that increasingly the world cannot afford them, albeit in highly secure formats in most situations”.

I guess we have to go for the golden mean on this one also. We shouldn’t accept data silos but we must expect them. We could go for eliminating them, probably not in one big bang but slice by slice, as we climb up the levels in an information maturity model.

I would definitely expect to see fewer and smaller data silos at the top level of an information maturity model than on a bottom level of a data quality immaturity model.



Holistic Accuracy

29th March 2011

In community economics you have two terms called

  • Partitive accuracy and
  • Holistic accuracy

In short, partitive accuracy is the accuracy of a single measure being part of a model while holistic accuracy is the accuracy of the model structure and its use. More information here.

I find these terms very useful in data quality and master data management as well.

The distinction between partitive accuracy and holistic accuracy resembles the distinction between data quality and information quality.

One problem with the term information quality is that it implies a certain context of use, which makes it hard to prepare data for having high data quality for multiple uses other than assuring the accuracy of the single data elements – being similar to the term partitive accuracy.

One clue for assuring better information quality is looking at the model structure of data – being similar to the term holistic accuracy. Here I am thinking beyond traditional data modeling, which is anchored in the technical world, and into how end users of master data hubs are able to build structures of data (with partitive accuracy) that fit the daily business use.

Examples of such holistic information capabilities in master data management are building flexible product hierarchies and building hierarchies of party master data that at the same time reflect hierarchies in the real world, such as households and company family trees, and hierarchies of related accounts and addresses used within the enterprise.

While a single data element, such as an address component like a postal code, may be partitively accurate, holistic accuracy is seen in how data elements contribute as part of a data structure that fits multiple purposes of use.



Non-Obvious Entity Relationship Awareness

16th March 2011

In a recent post here on this blog it was discussed: What is Identity Resolution?

One angle was the interchangeable use of the terms “Identity Resolution” and “Entity Resolution”. You can see these terms as truly interchangeable, you can see “Identity Resolution” as more advanced than “Entity Resolution”, or (my suggestion) you can see “Identity Resolution” as merely related to party master data, while “Entity Resolution” can be about all master data domains, such as parties, locations and products.

Another term sometimes used in this realm is “Non-Obvious Relationship Awareness”. This term is also merely related to finding relationships between parties, for example individuals at a casino who seem to do better than the croupiers. Here’s a link to a (rather old) O’Reilly Radar post on Non-Obvious Relationship Awareness.

Going Multi-Domain

So “Non-Obvious Entity Relationship Awareness” could be about finding these hidden relationships in a multi-domain master data scope.

An example could be non-obvious relationships in a customer/product matrix.

The data supporting this discovery will actually not be found in the master data itself, but in transaction data, probably residing in an Enterprise Data Warehouse (EDW). But a multi-domain master data management platform will be needed to support the complex hierarchies and categorizations needed to make the discovery.

One technical aspect of discovering such non-obvious relationships is how chains of keys are stored in the multi-domain master data hub.

Customer Master Data

The transactions, or sums thereof, in the data warehouse will have keys referencing customer accounts. These accounts can be stored in staging areas in the master data hub with references to a golden record for each individual or company in the real world. Depending on the identity resolution available, the golden records will have golden relations to each other as they form hierarchies of households, company family trees, contacts within companies and their movements between companies and so on.

My guess as described in the post Who is working where doing what? is that this will increasingly include social media data.
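To illustrate the chain of keys on the customer side, here is a small sketch that follows a transaction from its account key to a golden record and on to a household. Everything in it is assumed for the example: the key values, the mapping names ACCOUNT_TO_GOLDEN and GOLDEN_TO_HOUSEHOLD, and the function totals_by_household are hypothetical, and a real hub would hold these links in tables rather than dictionaries.

```python
from collections import defaultdict

# Hypothetical staging: source-system account keys -> golden record keys.
ACCOUNT_TO_GOLDEN = {
    "CRM-001": "P1",
    "ERP-77": "P1",   # same real-world person, different source account
    "WEB-9": "P2",
    "CRM-002": "P3",
}

# Hypothetical golden relations: golden records -> households.
GOLDEN_TO_HOUSEHOLD = {"P1": "H1", "P2": "H1", "P3": "H2"}

# Hypothetical warehouse rows: (account key, amount).
TRANSACTIONS = [
    ("CRM-001", 100.0),
    ("ERP-77", 50.0),
    ("WEB-9", 25.0),
    ("CRM-002", 10.0),
]


def totals_by_household(transactions):
    """Sum amounts per household by walking account -> golden -> household."""
    totals = defaultdict(float)
    for account, amount in transactions:
        golden = ACCOUNT_TO_GOLDEN[account]
        household = GOLDEN_TO_HOUSEHOLD[golden]
        totals[household] += amount
    return dict(totals)


print(totals_by_household(TRANSACTIONS))
```

Three accounts that look unrelated in the warehouse end up in the same household once the keys are chained, which is the kind of relationship that is not obvious from the transaction data alone.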

Product Master Data

Some of the same transactions, or sums thereof, in the data warehouse will have keys referencing products. These products will exist in the master data hub as members of various hierarchies with different categorizations.

My guess is that future developments in this field will further embrace not just your own products but also competitor products and market data available in the cloud all attached to your hierarchies and categorizations.   


