Liliendahl on Data Quality

Timeliness of Times

26th January 2011, Henrik Gabs Liliendahl

One of my several current engagements is within public transit.

I have earlier written about Real World Alignment issues in public transit (in my culture) as well as the special Multi-Entity Master Data Quality challenges that exist in this specific industry.

Usually we talk about party master data and product master data as the most common master data domains, and sometimes we add places (locations) as a third domain in a P trinity of “parties, products and places” or perhaps a W trinity of “who, what and where”.

The when dimension, the times when events take place, is most often seen as belonging to the transaction side of life in the databases.

However, in public transit timetables are certainly also an important master data domain. The service provided by a public transit authority or operator is described as belonging to a certain timeframe in which a given combination of services is valid. An example is the “Summer Schedule 2011”.

Another industry with a time-dependent master data domain I have seen is education, where the services offered (lessons) are usually described as belonging to a semester.
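To make the idea of time-bound master data a bit more concrete, here is a minimal sketch, assuming a schedule (or a semester) can be modeled as a named period with an inclusive validity range and a set of services. The class and function names (`ServicePeriod`, `active_period`) and all dates and line names are hypothetical, made up for illustration only. The sketch also shows two "timeliness of times" issues: a day that no period covers, and periods that overlap.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ServicePeriod:
    """Hypothetical model: a named timeframe in which a combination of services is valid."""
    name: str
    valid_from: date          # first day of validity (inclusive)
    valid_to: date            # last day of validity (inclusive)
    services: frozenset[str]  # e.g. the lines offered in this period

    def covers(self, day: date) -> bool:
        return self.valid_from <= day <= self.valid_to


def active_period(periods: list[ServicePeriod], day: date) -> Optional[ServicePeriod]:
    """Return the single period valid on `day`, or None if the day falls in a gap."""
    matches = [p for p in periods if p.covers(day)]
    if len(matches) > 1:
        # Overlapping validity is a data quality issue in its own right.
        raise ValueError(f"overlapping periods on {day}: {[p.name for p in matches]}")
    return matches[0] if matches else None


# Hypothetical periods; dates and line names are invented for the example.
winter = ServicePeriod("Winter Schedule 2010/2011", date(2010, 9, 1), date(2011, 5, 31),
                       frozenset({"Line 1", "Line 5", "Line 7"}))
summer = ServicePeriod("Summer Schedule 2011", date(2011, 6, 1), date(2011, 8, 31),
                       frozenset({"Line 1", "Line 5"}))
periods = [winter, summer]
```

Asking for `active_period(periods, date(2011, 9, 15))` returns `None` here, because no autumn schedule has been registered yet: the master data is simply not timely.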

I wonder if you have met other master data types that belong more to the “when” domain than to the “who, what and where” domains? Did you have any problems with the timeliness of times?

Raising the Bar

24th January 2011, Henrik Gabs Liliendahl

Setting new goals for earnings and savings by data quality improvement:

Product Placement

23rd January 2011 (updated 29th May 2012), Henrik Gabs Liliendahl

This wasn’t actually meant as a blog post series about the place entity in multi-domain master data management. But I think I have been carried away by my work, so now it is.

Places are probably most commonly related to the party domain, as seen in the previous post called A Place in Time. But places certainly also have multiple relations to the product domain, thereby forming a P trinity of parties, products and places in multi-domain master data management, as seen in the post Your Place or My Place?

As with most things in the product domain, product-place relations are usually very industry specific.

Some of the product-place relations I have worked with come from these industries:

Insurance

The fees you have to pay for some insurance products are related to the place where you live. In order to set the right fees (and for a lot of other reasons) an insurance company needs to analyze data based on the product-place relations. This can, by the way, go very wrong, as told in the post A Really Bad Address.

Hospitality

Your product is a place where the selling attributes include both the properties belonging to the place itself and the properties of the places nearby.

Real Estate

Do I have to say more than three words? Location, Location, Location.

Your product-place relations

Tell me: what product-place relations have you worked with?

A Place in Time

22nd January 2011 (updated 29th May 2012), Henrik Gabs Liliendahl

I remember when I had my first chemistry lesson in high school, our teacher told us to forget all about the chemistry we had learned in primary school, because it was too simple a model and did not reflect how the real world of chemistry actually works.

Since I started working with data quality and master data management I have had a pet peeve in data modeling, namely what is probably the most common data modeling example of all: the classic customer table. Here is an example from a SQL tutorial:

Compared to how the real world works, this example has some diversity flaws, such as:

state code as a key to a state table will only work with one country (the United States)
zipcode is a United States-only term, as opposed to the more generic “Postal Code”
fname (First name) and lname (Last name) don’t work in cultures where given name and surname come in the opposite sequence
The lengths of the state, zipcode and most other fields are obviously too small almost anywhere

More seriously we have:

fname and lname (First name and Last name) and probably also phone should belong to a separate party entity acting as a contact related to the company
company name should belong to a separate party entity acting in the role of customer
address1, address2, city, state, zipcode should belong to a separate place entity, probably as the current visiting place related to the company

Now I know this is just a simple example from a tutorial, where you should not confuse the reader by adding too much complexity. Agreed.

However, many home-grown solutions in business life and even many commercial ready-made applications use that kind of data model to describe one of the most important business entities: our customers.

It may be that such a model does fit the purpose of use in some operations. Sometimes yes, sometimes no. But when reusing data from such a model at the enterprise level, and when adding business intelligence, you are in big trouble. That is why we need master data hubs, and why we need to transform data coming into the master data hub.

From such a customer record we don’t create just one golden record. We create or link several different related multi-domain entities, such as:

The contact as a person in our party domain – maybe we knew her before
The company in our party domain – maybe we knew its sister company as a supplier before
The address in our place (location) domain – maybe we knew that address as a place in time before
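The transformation described above can be sketched roughly as follows. This is a minimal illustration, not a master data hub design: it assumes the flat row uses the classic tutorial column names (`fname`, `lname`, `phone`, `company`, `address1`, `address2`, `city`, `state`, `zipcode`), and the classes `Person`, `Organization`, `Place` and the function `split_customer_row` are hypothetical names chosen for the example. The linking step ("maybe we knew her before") would be a separate match against entities already in the hub and is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Party domain: the contact person."""
    given_name: str
    family_name: str
    phone: str = ""


@dataclass
class Organization:
    """Party domain: the company; roles such as 'customer' or 'supplier' are kept as data."""
    name: str
    roles: set[str] = field(default_factory=set)


@dataclass
class Place:
    """Place (location) domain, using generic names instead of US-only 'state'/'zipcode'."""
    address_lines: list[str]
    city: str
    region: str
    postal_code: str
    country: str


def split_customer_row(row: dict[str, str], country: str) -> dict[str, list]:
    """Turn one flat customer-table row into related party and place entities.

    The column names are an assumption based on the classic tutorial table.
    """
    contact = Person(row["fname"], row["lname"], row.get("phone", ""))
    company = Organization(row["company"], roles={"customer"})
    place = Place(
        address_lines=[line for line in (row.get("address1", ""), row.get("address2", "")) if line],
        city=row["city"],
        region=row.get("state", ""),
        postal_code=row["zipcode"],
        country=country,  # the flat table has no country column, so it must be supplied
    )
    return {
        "entities": [contact, company, place],
        "relations": [
            (contact, "contact_for", company),
            (company, "current_visiting_place", place),
        ],
    }
```

Keeping the customer role as data on the organization, rather than baking it into the table name, is what later allows the same company to be recognized as a supplier too.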

Your Place or My Place?

21st January 2011 (updated 29th May 2012), Henrik Gabs Liliendahl

We (and that includes myself) often talk about multi-domain master data management as a marriage between party master data management (also called Customer Data Integration, abbreviated as CDI) and Product Master Data Management (also called Product Information Management, abbreviated as PIM).

The third most common master data domain is locations (or places). I like the term place, because then we have a P trinity: Parties, Products and Places. However, there may be a fourth P involved, as I read a post today by Steven Jones of Capgemini stating that multi-domain MDM is a Pointless question.

The Premise of the Pointlessness is that Party and Product are an IT Perspective. The rest of the business sees the world mainly from either a customer-centric perspective or a supply-centric perspective.

I agree that these perspectives exist too, and I actually wrote a blog post recently on sell-side vs buy-side master data quality.

I don’t agree that this is a (pointless) IT versus business question, not least because I have a hard time recognizing the great divide between IT and business. From my perspective IT is a part of the business, just as sales, marketing and purchasing are. And as a product vendor in the MDM realm you actually address the conjunction of business and technological needs, somewhat unlike a database management vendor aimed mostly at the IT part of the business, or a CRM or SCM vendor aimed mostly at the sales or purchasing part of the business.

In my perspective multi-domain MDM isn’t a pointless place, but a meeting place between IT, all the other places in the business, and the core business entities: parties, products and places.

Lots of Product Names

18th January 2011, Henrik Gabs Liliendahl

In master data management the two most prominent domains are:

Parties and
Products

In the quest to find representations of parties that are actually the same real-world party, and representations of products that are actually the same real-world product, we typically execute fuzzy data matching of:

Party names as person names and company names
Product descriptions

However, I have often seen party names being an integral part of matching products.

Some examples:

Manufacturer Names:

A product is most often regarded as distinct based not only on the description but also on the manufacturer. So besides being sharp on matching product descriptions for light bulbs, you must also consider whether, for example, the following manufacturer company names are the same or not:

Koninklijke Philips Electronics N.V.
Phillips
Philips Electronic
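One common way to approach the manufacturer question is to normalize the names and then compare them with a string similarity score. The sketch below assumes a hand-made, hypothetical `NOISE_TOKENS` list of legal forms and generic words, and uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matching engine you actually use; the 0.85 threshold is an arbitrary choice. It only flags candidates: "Phillips" scores high against "Philips", yet it could just as well be a genuinely different manufacturer, which is exactly why such candidates need review or further evidence.

```python
import re
from difflib import SequenceMatcher

# Hypothetical noise list: legal forms and generic words that rarely tell manufacturers apart.
NOISE_TOKENS = {"nv", "inc", "ltd", "gmbh", "ag", "koninklijke", "electronics", "electronic"}


def normalize_manufacturer(name: str) -> str:
    """Lowercase, drop dots (so 'N.V.' becomes 'nv'), split on non-alphanumerics, drop noise.

    Note: this simple pattern keeps ASCII letters and digits only.
    """
    text = name.lower().replace(".", "")
    tokens = re.sub(r"[^a-z0-9]+", " ", text).split()
    return " ".join(t for t in tokens if t not in NOISE_TOKENS)


def manufacturer_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the normalized names."""
    return SequenceMatcher(None, normalize_manufacturer(a), normalize_manufacturer(b)).ratio()


def same_manufacturer_candidate(a: str, b: str, threshold: float = 0.85) -> bool:
    """True if the pair should be reviewed as a possible match (not a final decision)."""
    return manufacturer_similarity(a, b) >= threshold
```

With these rules all three names above end up as candidates for the same manufacturer: the first and third normalize to `philips`, and `phillips` differs by one letter.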

Author Names:

A book is a product. The title of the book is the description. But also the author’s person name counts. So how do we collect the entire works made by the author:

Hans Christian Andersen
Andersen, Hans Christian
H. C. Andersen

as all three representations are superb bad data?
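A simple technique for collecting such variants is to build a comparison key from the surname plus the initials of the given names. The sketch below is a hypothetical illustration (`author_key` and `group_works` are made-up names), and it carries the assumption criticized in A Place in Time: without a comma, the last token is taken to be the surname, which fails in cultures where the surname comes first. Such keys are also coarse, so two different authors sharing surname and initials would collide; treat the groups as match candidates.

```python
import re
from collections import defaultdict


def author_key(name: str) -> tuple[str, str]:
    """Crude comparison key: (surname, initials of given names), both lowercase.

    Assumes 'Surname, Given Names' when a comma is present, otherwise that the
    last token is the surname.
    """
    name = name.strip()
    if not name:
        raise ValueError("empty name")
    if "," in name:
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    given_tokens = [t for t in re.split(r"[\s.]+", given) if t]
    initials = "".join(t[0] for t in given_tokens).lower()
    return surname.lower(), initials


def group_works(works: list[tuple[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Group (author name, title) pairs by author key."""
    grouped: dict[tuple[str, str], list[str]] = defaultdict(list)
    for author, title in works:
        grouped[author_key(author)].append(title)
    return dict(grouped)


works = [
    ("Hans Christian Andersen", "The Little Mermaid"),
    ("Andersen, Hans Christian", "The Ugly Duckling"),
    ("H. C. Andersen", "The Snow Queen"),
]
```

All three spellings produce the key `("andersen", "hc")`, so `group_works(works)` returns a single group holding all three titles.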

Bear Names:

A certain kind of teddy bear has a product description like “Plush magenta teddy bear”. But each bear may have a pet name like “Lots-O’-Huggin’ Bear”, or just “Lotso” for short, as seen in the film “Toy Story 3”. And seriously: in real business I have worked on building a bear data model and the related data matching.

PS: For those who have seen Toy Story 3: Is that Lotso one or two real world entities?

Things Change

16th January 2011, Henrik Gabs Liliendahl

Yesterday I posted a small piece called So I’m not a Capricorn? about how astrology may (also) be completely wrong because something has changed.

On the serious side: don’t expect that just because you get it Right the First Time, everything will be fine from that day forward. Things change.

The best-known example in data quality prevention is probably this: it is of course important that you get the address right when you enter it for a customer. But as people (and companies) relocate, you must also have procedures in place for tracking those movements, by establishing an Ongoing Data Maintenance program that ensures the timeliness of your data.

The other thing, so to speak, is that having things right (the first time) is always seen in the context of what was right at that time. Maybe you always asked your customers for a physical postal address, but because your way of doing business has changed, you have become much more interested in having the eMail address. And, because of What’s in an eMail Address, you would actually like to have had all of them. So your completeness went from just fine to just awful while following the same procedure as last year.
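The point that completeness depends on what you currently require, and not only on the data, can be shown with a very small field-level completeness measure. This is a minimal sketch under assumed conventions: the field names `postal_address` and `email`, the sample records, and the two rule lists are all hypothetical, and the metric is simply the share of required values that are filled in.

```python
def completeness(records: list[dict], required: list[str]) -> float:
    """Share of required field values that are filled in across all records.

    An empty record set or an empty requirement list counts as complete (1.0).
    """
    if not records or not required:
        return 1.0
    filled = sum(1 for rec in records for f in required if (rec.get(f) or "").strip())
    return filled / (len(records) * len(required))


# Hypothetical customer records captured with last year's procedure,
# which asked for a physical postal address only.
customers = [
    {"name": "A", "postal_address": "1 Main St", "email": ""},
    {"name": "B", "postal_address": "2 High St", "email": "b@example.com"},
    {"name": "C", "postal_address": "3 Park Rd"},
]

last_year_rules = ["postal_address"]
this_year_rules = ["postal_address", "email"]
```

The same three records score 1.0 against last year's rules and about 0.67 against this year's: the data did not get worse, the definition of complete moved.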

Predicting accuracy is hard. Expect to deal with Unpredictable Inaccuracy.

So I’m not a Capricorn?

15th January 2011 (updated 16th January 2011), Henrik Gabs Liliendahl

Yesterday was my birthday. Being born on 14th January makes me a Capricorn according to astrology.

Only there is a slight problem. As told in an article on the Huffington Post, an astronomer has kindly remarked that the signs were assigned to the calendar thousands of years ago. In the meantime the earth’s axis has shifted, so we should have completely new signs (and personalities?) today.

I guess astrology qualifies as a data and information quality trainwreck by forgetting one of the most common pitfalls in data quality: Things change.

We Will Become More Open

12th January 2011 (updated 15th April 2012), Henrik Gabs Liliendahl

Yesterday I read a post called Taking Stock Of DQ Predictions For 2011 by Clarke Patterson of Informatica Corporation. Informatica is a well-established vendor within data integration, data quality and master data management. The post is based on a post called Six Data Management Predictions for 2011 by Steve Sarsfield of Talend. Talend is an open source vendor within data integration, data quality and master data management.

One of the six predictions for 2011 is: Data will become more open.

Steve’s (open source based) take on this is:

“In the old days good quality reference data was an asset kept in the corporate lockbox. If you had a good reference table for common misspellings of parts, cities, or names for example, the mind set was to keep it close and away from falling into the wrong hands. The data might have been sold for profit or simply not available. Today, there really is no “wrong hands”. Governments and corporations alike are seeing the societal benefits of sharing information. More reference data is there for the taking on the internet from sites like data.gov and geonames.org. That trend will continue in 2011. Perhaps we’ll even see some of the bigger players make announcements as to the availability of their data. Are you listening Google?”

Clarke’s (proprietary software based) take is as follows:

“As data becomes more open, data quality tools will need to be able to handle data from a greater number of sources used for a broader number of purposes. Gone are the days of single domain data manipulation. To excel in this new, open market, you’ll need a data quality tool that can profile, cleanse and monitor data regardless of domain, that is also locale-aware and has pre-built rules and reference data.”

I agree with both views, which by the way sit on each of The Two Sides To The IT Coin – Data Centric IT vs Process Centric IT, as explained by Robin Bloor in another recent post on the blog of data integration vendor Pervasive Software.

Steve’s and Clarke’s perspectives are also close to me, as my 2011 to-do list includes:

Involvement in a solution called iDQ (instant Data Quality). The solution is about how we can help system users doing data entry by adding easy-to-use technology that explores the cloud for relevant data related to the entry being made.
Helping to enhance a hot MDM hub solution with further data quality and multi-domain capabilities.

Citizen ID and Biometrics

9th January 2011 (updated 29th May 2012), Henrik Gabs Liliendahl

As I have stated earlier on this blog: the solution to the single most frequent data quality problem, party master data duplicates, is actually very simple. Every person (and every legal entity) gets a unique identifier which is used everywhere by everyone.
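Why a shared identifier makes things "very simple" can be seen in a small sketch: with an identifier in place, finding duplicates becomes exact grouping on one key instead of fuzzy matching on names and addresses. The function name `merge_by_citizen_id`, the field `citizen_id` and all sample records are hypothetical, and the normalization assumes identifiers written like the Danish CPR format DDMMYY-SSSS, with or without the hyphen. Records lacking an identifier still fall back to the hard fuzzy path.

```python
from collections import defaultdict


def merge_by_citizen_id(records: list[dict]) -> tuple[dict[str, list[dict]], list[dict]]:
    """Group party records on a normalized identifier.

    Returns (groups keyed by identifier, records without an identifier).
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    unresolved: list[dict] = []
    for rec in records:
        # Assumed format: 'DDMMYY-SSSS' or 'DDMMYYSSSS'; strip the hyphen and spaces.
        cid = (rec.get("citizen_id") or "").replace("-", "").strip()
        if cid:
            groups[cid].append(rec)
        else:
            unresolved.append(rec)  # these still need fuzzy matching on names/addresses
    return dict(groups), unresolved


# Hypothetical records: the first two are the same person entered twice.
records = [
    {"name": "Jane Doe", "citizen_id": "010170-1234"},
    {"name": "J. Doe", "citizen_id": "0101701234"},
    {"name": "John Smith"},
]
```

Here "Jane Doe" and "J. Doe" land in the same group without any name comparison at all, while "John Smith" is left for conventional matching.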

Some countries, like Denmark where I live, have a unique Citizen ID (National identification number). Some countries are on the way, like India with the Aadhaar project. But some of the countries with the largest economies in the world, like the United Kingdom, Germany and the United States, don’t seem to be getting one in the near future.

I think the United Kingdom came close recently, but as I understand it the project was cancelled. As seen in a tweet from a discussion on Twitter today, the main obstacles were privacy considerations and costs:

A considerable cost in the proposed project in the United Kingdom, and also, as I have seen in discussions, in a US project, may be that an implementation today would also include biometric technology.

The question, however, is whether that is necessary.

If we look at the systems in force today, for example in Scandinavia, they were implemented more than 40 years ago, and the Swedish citizen ID was actually implemented without digitalization in 1947. Biometrics are being discussed there too, as they are inevitable for issuing passports anyway. In the meantime, however, these systems continue to make data quality prevention and party master data management a lot easier than elsewhere in the world, without having biometrics as a component.

No doubt biometrics will solve some problems related to fraud and the like. But these are rare exceptions, so the cost/benefit analysis for enhancing an existing system with biometrics seems to be negative.

I guess the alleged need for biometrics may have something to do with privacy considerations in a strange way: privacy considerations are often overruled by the requirements for fighting terrorism, and there you need biometrics in identity resolution.
