Business and Pleasure

The data quality and master data management (MDM) realm has many wistful songs about unrequited love for “the business”.

This morning I noticed yet another tweet on Twitter expressing the pain:

Here Gartner analyst Ted Friedman foresees the doom of MDM if we don’t get at least the traction from “the business” that BI (Business Intelligence) is getting.

In my eyes everything we do in Information Technology is about “the business”. Even computer games and digital entertainment are core parts of their respective industries. I also believe that IT is part of “the business”.

“The rest of the business” does see that some disciplines belong in the IT realm. This goes for database management, programming languages and network protocols. These disciplines are not doomed at all because of that. “The rest of the business” couldn’t work today without them.

Certainly, during my years in the IT business I have seen some IT-based disciplines and related tools emerge and then be doomed. Does anyone remember CASE tools?

With CASE tools I remember great expectations about business involvement in application design. But according to Wikipedia the main problems with CASE tools are (were): inadequate standardization, unrealistic expectations, slow implementation and weak repository controls.

In other words: “The rest of the business” never really got in touch with CASE tools because they didn’t work as intended.

The business traction we see around BI (and the enabling tools) now is in my eyes very much due to the fact that the tools have matured, actually work, have become more user-friendly and seem to create useful results for “the rest of the business”.

Data quality tools and MDM tools must continue in that direction too, because one thing is certain: data quality tools and MDM tools do not solve any severe problems internally in the IT part of “the business”.

It’s my pleasure to be part of that.


Survival of the Fit Enough

When working with data quality and master data management at the same time you are constantly met with the challenge that data quality is most often defined as data being fit for the purpose of use, but master data management is about using the same data for multiple purposes at the same time.

Finding the right solution to such a challenge within an organization isn’t easy, because despite all good intentions it is difficult to find someone in the business with an overall answer to that kind of problem, as explained in the blog post by David Loshin called Communications Gap? Or is there a Gap between Chasms?

An often-used principle for overcoming these issues may (borrowing from Darwin) be described as “survival of the fittest”. You negotiate some survivorship rules between “competing” data providers and consumers, and then the data that is fittest as measured by these rules wins. All other data gets the KISS of death. Most such survivorship rules are indeed simple, often based on a single dimension such as timeliness, completeness or provenance.
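A single-dimension rule like this is easy to express in code. The following is a minimal Python sketch that uses timeliness as the single dimension. The `Candidate` record, the source names and the phone numbers are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    source: str      # hypothetical data provider name
    value: str
    updated: date    # when the provider last changed the value


def survive_by_timeliness(candidates: list[Candidate]) -> Candidate:
    """Single-dimension survivorship: the most recently updated value wins.

    Every other candidate is discarded (the "KISS of death").
    """
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=lambda c: c.updated)


# Made-up values from two hypothetical sources.
phones = [
    Candidate("CRM", "+45 1234 5678", date(2010, 3, 1)),
    Candidate("ERP", "+45 8765 4321", date(2011, 1, 15)),
]
winner = survive_by_timeliness(phones)
print(winner.source, winner.value)  # ERP +45 8765 4321
```

Swapping the `key` function for a completeness count or a provenance ranking gives the other single-dimension rules. All of them still give every losing value the KISS of death.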

Recently it has been suggested that the phrase “survival of the fittest” in evolution theory should be changed to “survival of the fit enough”, because it seems that specimens often haven’t competed but have instead found their way into empty alternate spaces.

It seems that master data management and related data quality are going that way too. Data that is fit enough will survive in the master data hub in alternate spaces, where the single source of truth exists in perfect symbiosis with multiple realities.


Timeliness of Times

One of my several current engagements is within public transit.

I have earlier written about Real World Alignment issues in public transit (in my culture) as well as the special Multi-Entity Master Data Quality challenges there are in this specific industry.

Usually we talk about party master data and product master data as the most common domains of master data and sometimes we add places (locations) as the third domain in a P trinity of “parties, products and places” or perhaps a W trinity of “who, what and where”.

The when dimension, the times when events take place, is most often seen as belonging to the transaction side of life in the databases.

However, in public transit you certainly also have timetables as an important master data domain. The service provided by a public transit authority or operator is described as belonging to a certain timeframe in which a given combination of services is valid. An example is the “Summer Schedule 2011”.
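Treating a timetable as master data mainly means giving it an explicit validity period. The following is a minimal sketch built on a hypothetical `Timetable` record with inclusive start and end dates. It also includes a check for overlapping periods, which is one kind of "timeliness of times" problem. The dates are invented for illustration and are not an actual operator's schedule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Timetable:
    name: str
    valid_from: date  # first day the timetable applies (inclusive)
    valid_to: date    # last day the timetable applies (inclusive)


def timetable_for(day: date, timetables: list[Timetable]) -> Optional[Timetable]:
    """Return the timetable whose validity period covers the given day, if any."""
    for tt in timetables:
        if tt.valid_from <= day <= tt.valid_to:
            return tt
    return None


def overlaps(timetables: list[Timetable]) -> list[tuple[str, str]]:
    """Pairs of timetables with overlapping validity periods (a data quality issue)."""
    pairs = []
    for i, a in enumerate(timetables):
        for b in timetables[i + 1:]:
            if a.valid_from <= b.valid_to and b.valid_from <= a.valid_to:
                pairs.append((a.name, b.name))
    return pairs


# Invented validity periods, for illustration only.
timetables = [
    Timetable("Winter Schedule 2010/2011", date(2010, 12, 12), date(2011, 6, 25)),
    Timetable("Summer Schedule 2011", date(2011, 6, 26), date(2011, 8, 13)),
]
print(timetable_for(date(2011, 7, 1), timetables).name)  # Summer Schedule 2011
print(overlaps(timetables))  # []
```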

Another industry with a time-dependent master data domain I have seen is education, where the given services (lessons) are usually described as belonging to a semester.

Have you met other master data types that belong more to the “when” domain than to the “who, what and where” domains? Did you have any problems with the timeliness of times?


Product Placement

This wasn’t actually meant as a blog post series about the place entity in multi-domain master data management. But I think I have been carried away by my work, so now it is.

Places are probably most commonly related to the party domain, as seen in the previous post called A Place in Time. But places certainly also have multiple relations to the product domain, thus forming a P trinity of parties, products and places in multi-domain master data management, as seen in the post Your Place or My Place?

As with most things in the product domain, product-place relations are usually very industry-specific.

Some of the product-place relations I have worked with come from these industries:

Insurance

The fees you have to pay for some insurance products are related to the place where you live. In order to set the right fees (and for a lot of other reasons), an insurance company needs to analyze data based on the product-place relations. By the way, this may go very wrong, as told in the post A Really Bad Address.

Hospitality

Your product is a place, where the selling attributes include both the properties belonging to the place itself and the properties of nearby places.
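Finding the nearby places is typically a distance query. The following is a minimal sketch using the haversine great-circle formula. It assumes each place carries latitude and longitude in decimal degrees. The `Place` class, the place names and the coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Place:
    name: str
    lat: float  # latitude, decimal degrees
    lon: float  # longitude, decimal degrees


def distance_km(a: Place, b: Place) -> float:
    """Great-circle distance between two places (haversine formula)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def nearby(product: Place, others: list[Place], radius_km: float) -> list[str]:
    """Names of other places within radius_km of the product place, nearest first."""
    within = [(distance_km(product, p), p.name) for p in others]
    return [name for dist, name in sorted(within) if dist <= radius_km]


# Hypothetical places with illustrative coordinates.
hotel = Place("hotel", 55.6761, 12.5683)
others = [Place("airport", 55.6180, 12.6508), Place("amusement park", 55.6737, 12.5681)]
print(nearby(hotel, others, radius_km=1.0))   # ['amusement park']
print(nearby(hotel, others, radius_km=20.0))  # ['amusement park', 'airport']
```

In a real hub the nearby places would themselves be entities in the place domain. The selling attributes of the product could then be derived from their properties.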

Real Estate

Do I have to say more than three words: Location, Location, Location.

Your product-place relations

Which product-place relations have you worked with?


A Place in Time

I remember that when I had my first chemistry lesson in high school, our teacher told us to forget all about the chemistry we had learned in primary school, because it was too simple a model and did not reflect how the real world of chemistry actually works.

Since I started working with data quality and master data management, I have had a pet peeve in data modeling, namely what is probably the most common data modeling example of all: the classic customer table. Here is an example from a SQL tutorial:

Compared to how the real world works this example has some diversity flaws, like:

  • state code as a key to a state table will only work with one country (the United States)
  • zipcode is a United States-only term, as opposed to the more generic “Postal Code”
  • fname (First name) and lname (Last name) don’t work in cultures where given name and surname have the opposite sequence
  • The lengths of the state, zipcode and most other fields are obviously too small almost everywhere

More seriously we have:

  • fname and lname (First name and Last name), and probably also phone, should belong to a separate party entity acting as a contact related to the company
  • company name should belong to a separate party entity acting in the role of customer
  • address1, address2, city, state, zipcode should belong to a separate place entity, probably as the current visiting address related to the company
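The split described in these bullets can be sketched as follows. This is a minimal Python sketch under stated assumptions. The column names come from the bullets. The `Party`, `Place` and `Relation` classes, the role and relation names, and the default country `"US"` are hypothetical; the flat record carries no country at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Party:
    kind: str                                   # "person" or "organization"
    name: str
    roles: list[str] = field(default_factory=list)
    phones: list[str] = field(default_factory=list)


@dataclass
class Place:
    address_lines: list[str]
    city: str
    region: str        # the tutorial's "state", but not limited to US states
    postal_code: str   # generic postal code rather than a US-only ZIP code
    country: str       # assumed ISO 3166-1 alpha-2 code


@dataclass
class Relation:
    source: Party
    kind: str          # hypothetical relation names, e.g. "contact for"
    target: Party | Place


def split_customer(row: dict, country: str = "US") -> tuple[list, list[Relation]]:
    """Split one flat customer row into a person, a company, a place and their relations."""
    # Joining fname + lname keeps the tutorial's given-name-first assumption.
    person = Party("person", f"{row['fname']} {row['lname']}".strip())
    if row.get("phone"):
        person.phones.append(row["phone"])
    company = Party("organization", row["company"], roles=["customer"])
    place = Place(
        address_lines=[line for line in (row.get("address1"), row.get("address2")) if line],
        city=row["city"],
        region=row["state"],
        postal_code=row["zipcode"],
        country=country,
    )
    relations = [
        Relation(person, "contact for", company),
        Relation(company, "current visiting address", place),
    ]
    return [person, company, place], relations


# Made-up example row in the tutorial's flat layout.
row = {"fname": "Jane", "lname": "Doe", "company": "Acme Inc.", "address1": "1 Main St",
       "address2": "", "city": "Springfield", "state": "IL", "zipcode": "62701",
       "phone": "555-0100"}
entities, relations = split_customer(row)
```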

Now, I know this is just a simple example from a tutorial, where you should not confuse readers by adding too much complexity. Agreed.

However, many home-grown solutions in business life, and even many commercial ready-made applications, use that kind of data model to describe one of the most important business entities: our customers.

It may be that such a model fits the purpose of use in some operations. Sometimes yes, sometimes no. But when reusing data from such a model at enterprise level and when adding business intelligence, you are in big trouble. That is why we need master data hubs, and why we need to transform data coming into the master data hub.

From such a customer record we don’t create just one golden record. We make or link several different related multi-domain entities, such as:

  • The contact as a person in our party domain – maybe we knew her before
  • The company in our party domain – maybe we knew its sister company as a supplier before
  • The address in our place (location) domain – maybe we knew that address as a place in time before
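The "make or link" step can be illustrated with a toy hub. It links an incoming name or address to an entity it already knows and otherwise makes a new one. The following is a minimal sketch in which the `ToyHub` class and the `match_key` normalization are hypothetical. Exact matching on a crude normalized key stands in for the fuzzy matching a real hub would use.

```python
import re
import unicodedata


def match_key(text: str) -> str:
    """Crude match key: strip accents, lowercase, collapse punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", plain.lower()).strip()


class ToyHub:
    """Hypothetical in-memory hub: one entity id per (domain, match key)."""

    def __init__(self) -> None:
        self._ids: dict[tuple[str, str], str] = {}
        self._counter = 0

    def make_or_link(self, domain: str, text: str) -> tuple[str, bool]:
        """Return (entity id, created?): link if already known, otherwise make."""
        key = (domain, match_key(text))
        if key in self._ids:
            return self._ids[key], False
        self._counter += 1
        self._ids[key] = f"{domain}-{self._counter}"
        return self._ids[key], True


hub = ToyHub()
print(hub.make_or_link("party", "Acme Inc."))  # ('party-1', True): made
print(hub.make_or_link("party", "ACME, inc"))  # ('party-1', False): linked
```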


Your Place or My Place?

We, and that includes myself, often talk about multi-domain master data management as a marriage between party master data management (also called Customer Data Integration, abbreviated as CDI) and Product Master Data Management (also called Product Information Management, abbreviated as PIM).

The third most common master data domain is locations (or places). I like the term place, because then we have a P trinity: Parties, Products and Places. However, there may be a fourth P involved, as I read a post today by Steven Jones of Capgemini arguing that multi-domain MDM is a Pointless question.

The Premise of the Pointlessness is that Party and Product are an IT Perspective. The rest of the business sees the world mainly from either a customer-centric perspective or a supply-centric perspective.

I agree that these perspectives exist too, and I actually wrote a blog post recently on sell side vs buy side master data quality.

I don’t agree that this is a (pointless) IT versus business question, obviously also because I have a hard time recognizing the great divide between IT and business. From my perspective IT is a part of the business, just like sales, marketing and purchasing are. And as a product vendor in the MDM realm you actually address the conjunction of business and technological needs, in contrast to being either a database management system vendor aimed mostly at the IT part of the business or a CRM or SCM vendor aimed mostly at the sales or purchasing part of the business.

Multi-domain MDM isn’t in my perspective a pointless place, but a meeting place between IT and all the other places in business and the core business entities being parties, products and places.


Lots of Product Names

In master data management the two most prominent domains are:

  • Parties and
  • Products

In the quest for finding representations of parties actually being the same real world party and finding representations of products actually being the same real world product we typically execute fuzzy data matching of:

  • Party names as person names and company names
  • Product descriptions

However I have often seen party names being an integral part of matching products.

Some examples:

Manufacturer Names:

A product is most often regarded as distinct based not only on the description but also on the manufacturer. So besides being sharp on matching product descriptions for light bulbs, you must also consider whether, for example, the following manufacturer company names are the same or not:

  • Koninklijke Philips Electronics N.V.
  • Phillips
  • Philips Electronic
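A common first step is to strip legal forms and other low-information words before comparing manufacturer names with a fuzzy similarity measure. The following is a minimal sketch using Python's standard `difflib.SequenceMatcher`. The `NOISE_WORDS` list and the helper names are hypothetical, and the word list is tuned to this example only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical noise words: legal forms and generic terms carrying little identity.
NOISE_WORDS = {"nv", "bv", "inc", "ltd", "gmbh", "ag", "koninklijke", "electronics", "electronic"}


def core_name(name: str) -> str:
    """Lowercase, drop dots/commas (so "N.V." becomes "nv") and remove noise words."""
    cleaned = re.sub(r"[.,]", "", name.lower())
    return " ".join(t for t in cleaned.split() if t not in NOISE_WORDS)


def similarity(a: str, b: str) -> float:
    """Similarity of the core names in [0, 1], using difflib's ratio()."""
    return SequenceMatcher(None, core_name(a), core_name(b)).ratio()


names = ["Koninklijke Philips Electronics N.V.", "Phillips", "Philips Electronic"]
for other in names[1:]:
    print(other, round(similarity(names[0], other), 2))
```

Against the full legal name, "Philips Electronic" scores 1.0 and "Phillips" about 0.93, so a threshold such as 0.9 (again an assumption) would treat all three as one manufacturer. A high score does not prove that "Phillips" is a misspelling rather than a different company, so such candidates usually still need review.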

Author Names:

A book is a product. The title of the book is the description. But the author’s person name also counts. So how do we collect the complete works of the author:

  • Hans Christian Andersen
  • Andersen, Hans Christian
  • H. C. Andersen

when all three representations are perfectly valid data?
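One way to collect these variants is to parse each name into a surname and given names, and then let an initial match a full given name. The following is a minimal sketch with hypothetical helper names. It assumes that names written without a comma put the given names first, which does not hold in every culture.

```python
def parse_name(raw: str) -> tuple[str, list[str]]:
    """Split a person name into (surname, given names or initials), lowercased.

    "Surname, Given Names" is handled explicitly; otherwise the last word is
    assumed to be the surname (given-name-first order).
    """
    if "," in raw:
        surname, given = (part.strip() for part in raw.split(",", 1))
    else:
        *given_words, surname = raw.split()
        given = " ".join(given_words)
    # Separate glued initials such as "H.C." and drop the dots.
    tokens = given.replace(".", ". ").split()
    return surname.lower(), [t.strip(".").lower() for t in tokens if t.strip(".")]


def given_compatible(a: str, b: str) -> bool:
    """Given names agree if equal, or if one is a single-letter initial of the other."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def same_author(x: str, y: str) -> bool:
    """Same surname and pairwise compatible given names or initials."""
    sx, gx = parse_name(x)
    sy, gy = parse_name(y)
    return sx == sy and len(gx) == len(gy) and all(map(given_compatible, gx, gy))


variants = ["Hans Christian Andersen", "Andersen, Hans Christian", "H. C. Andersen"]
print(all(same_author(a, b) for a in variants for b in variants))  # True
```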

Bear Names:

A certain kind of teddy bear has a product description like “Plush magenta teddy bear”. But each bear may have a pet name like “Lots-O’-Huggin’ Bear”, or just “Lotso” for short, as seen in the film “Toy Story 3”. And seriously: in real business I have worked on building a bear data model and the related data matching.

PS: For those who have seen Toy Story 3: Is that Lotso one or two real world entities?  


Citizen ID and Biometrics

As I have stated earlier on this blog: the solution to the single most frequent data quality problem, party master data duplicates, is actually very simple. Every person (and every legal entity) gets a unique identifier, which is used everywhere by everyone.
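With such an identifier in place, finding duplicates reduces to exact grouping instead of fuzzy matching. The following is a minimal sketch. It assumes each party record may carry a hypothetical `citizen_id` field, and the identifier values are made-up placeholders.

```python
from collections import defaultdict


def duplicates_by_id(records: list[dict], id_field: str = "citizen_id") -> dict[str, list[dict]]:
    """Group records on a shared unique identifier and keep only groups with duplicates.

    Records without an identifier are skipped; they would still need fuzzy matching.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        key = record.get(id_field)
        if key:
            groups[key].append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Placeholder identifiers, not real citizen IDs.
records = [
    {"citizen_id": "A123", "name": "Jane Doe"},
    {"citizen_id": "A123", "name": "J. Doe"},
    {"citizen_id": "B456", "name": "John Smith"},
    {"name": "Unknown Person"},
]
dups = duplicates_by_id(records)
print(list(dups))  # ['A123']
```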

Some countries, like Denmark where I live, have a unique Citizen ID (National identification number). Some countries are on the way, like India with the Aadhaar project. But some of the countries with the largest economies in the world, like the United Kingdom, Germany and the United States, don’t seem to be getting one in the near future.

I think the United Kingdom came close recently, but as I understand it the project was cancelled. As seen in a tweet from a discussion on Twitter today, the main obstacles were privacy considerations and costs:

A considerable cost in the proposed project in the United Kingdom, and also, as I have seen in discussions, in a US project, may be that an implementation today would also include biometric technology.

The question, however, is whether that is necessary.

If we look at the systems in force today, for example in Scandinavia, they were implemented more than 40 years ago, and the Swedish citizen ID was actually introduced without digitalization in 1947. There are discussions going on about biometrics too, as biometrics are inevitable for issuing passports anyway. In the meantime, however, the systems continue to make data quality prevention and party master data management a lot easier than elsewhere in the world, without having biometrics as a component.
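Part of that prevention is that an identifier can be validated at the point of entry, with no biometrics involved. For example, the last digit of the Swedish personal identity number (personnummer, written YYMMDD-NNNC) is a check digit computed with the Luhn algorithm. The following is a minimal sketch of such a check, with hypothetical function names. It catches any single mistyped digit. It says nothing about whether the date part is a real date or whether the number has actually been issued to anyone.

```python
import re


def luhn_check_digit(payload: str) -> int:
    """Luhn check digit: double every second digit from the right, starting with the last."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def valid_personnummer(pnr: str) -> bool:
    """Check the YYMMDD-NNNC format (separator '-', '+' or none) and the Luhn digit.

    The date part is not validated and no register lookup is made.
    """
    m = re.fullmatch(r"(\d{6})[-+]?(\d{4})", pnr.strip())
    if m is None:
        return False
    digits = m.group(1) + m.group(2)
    return luhn_check_digit(digits[:9]) == int(digits[9])


print(valid_personnummer("811218-9876"))  # True: check digit is consistent
print(valid_personnummer("811218-9875"))  # False: last digit mistyped
```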

No doubt biometrics will solve some problems related to fraud and the like. But these are rare exceptions. So the cost/benefit analysis for enhancing an existing system with biometrics seems to be negative.

I guess the alleged need for biometrics may, in a strange way, have something to do with privacy considerations: privacy considerations are often overruled by the requirements for fighting terrorism, and there you need biometrics in identity resolution.


A Prince and a Princess

Even though I’m not a royalist, I’m afraid this will be the second hypocritical blog post within a year with a royal introduction. The first one was about Royal Exceptions.

The big news on all channels today in Denmark (and Australia) is that (Australian-born) Crown Princess Mary has given birth to twins: a boy and a girl, that is, a prince and a princess, or as we say in blunt data quality language: a male and a female.

The gender of individuals has always been a prominent element in party master data management, not least in data matching.

Right now we are having a discussion in the LinkedIn Data Matching group concerning Data Quality of Gender / Sex Codes and the Impacts on Identity Data Matching.

So far we have covered issues as:

  • Trustworthiness for assigned gender codes
  • Scoring mechanisms in matching including gender codes
  • Diversity impact in assigning/verifying gender from names
  • Using gender codes for salutation
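As an illustration of the scoring topic, gender codes are often used as one weighted component in a person match score. The following is a minimal sketch with hypothetical codes (`"M"`, `"F"`, and `"U"` for unknown) and hypothetical weights. Codes derived from first names rather than taken from a trusted source get half the weight, to reflect their lower trustworthiness.

```python
from typing import Optional

# Hypothetical weights; a real matching engine would tune these on reviewed samples.
GENDER_AGREE = 5
GENDER_DISAGREE = -20


def gender_score(a: Optional[str], b: Optional[str], derived: bool = False) -> int:
    """Contribution of two gender codes to a person match score.

    Missing or unknown ("U") codes are neutral. If either code was derived
    from a name instead of a trusted source, agreement and disagreement count half.
    """
    if a in (None, "U") or b in (None, "U"):
        return 0
    agree, disagree = GENDER_AGREE, GENDER_DISAGREE
    if derived:
        agree, disagree = agree // 2, disagree // 2
    return agree if a == b else disagree


def person_match_score(name_score: int, gender_a: Optional[str], gender_b: Optional[str],
                       derived: bool = False) -> int:
    """Add the gender component to a name similarity score (assumed 0-100 scale)."""
    return name_score + gender_score(gender_a, gender_b, derived)


print(person_match_score(90, "M", "M"))  # 95
print(person_match_score(90, "M", "F"))  # 70
```

The asymmetric weights reflect a common design choice: a gender mismatch is treated as stronger evidence against a match than agreement is for it.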

Please join the discussion and if you are not already a member of the LinkedIn Data Matching group: Join the group here.


Right the First Time

Since I have just relocated (and we have just passed the New Year’s resolution point), I have become a member of the nearby fitness club.

Guess what: They got my name, address and birthday absolutely right the first time.

Now, this could have been because the young lady at the counter is a magnificent data entry person. But I think her main competency, rightfully, is being a splendid fitness instructor.

What she did was ask for my citizen ID card and take the data from there. A little less privacy, yes, but surely a lot better for data quality, or data fitness (credit Frank Harland) you might say.
