Data Governance in the Self-Service Age

The term self-service is used increasingly within data management. Self-service may refer to people within your organization using self-service capabilities, as in self-service business intelligence. But, probably more disruptively, it may refer to customer self-service and supplier self-service, meaning that people outside your organization are increasingly dependent on the level of data quality you can offer within your services.

Customer self-service will not succeed without you offering decent data quality related to product information, as exemplified in the post Falsus in Uno, Falsus in Omnibus. There will be more happy customer self-service events with more complete product information. Knowing your customer better helps you help your customer self-serve. And in that sense, it may be Time To Turn Your Customer Master Data Management Social?

Supplier self-service will not fly if you do not know your suppliers and their differences, which is quite similar to the concept of knowing your customer, as explained in the post Single Business Partner View. When it comes to approaches to data management within supplier engagement, there are several options, such as those examined in the post Sharing Product Master Data.

Do you think data governance is hard enough when dealing with the dear people within your own organization? I have news for you. It is going to be even tougher when dealing with all the lovely people outside your organization whom you will ask to be part of your data collection and consumption workspace.


The Place for Data Matching in and around MDM

Data matching has increasingly become a component of Master Data Management (MDM) solutions. This has mostly been the case for MDM of customer data solutions, but it is also a component of MDM of product data solutions, not least as these solutions emerge into the multi-domain MDM space.

The deployment of data matching was discussed nearly 5 years ago in the post Deploying Data Matching.

While MDM solutions have since been picking up a larger share of the data matching being done, it is still a fairly small proportion of all data matching that is performed within MDM solutions. Even if you have an MDM solution with data matching capabilities, you might still consider where data matching should be done. Some considerations I have come across are:

Acquisition and silo consolidation circumstances

A common use case for data matching is as part of an acquisition or internal consolidation of data silos where two or more populations of party master data, product master data and other important entities are to be merged into a single version of truth (or trust) in terms of uniqueness, consistency and other data quality dimensions.

While the MDM hub must be the end goal for storing that truth (or trust), there may be good reasons for doing the data matching before the actual on-boarding of the master data.
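As an illustration, consolidation-time matching can be sketched as a pairwise comparison of party records from two silos before they are on-boarded to the hub. This is a minimal sketch using Python's standard library; the field names, weights, threshold and sample records are hypothetical, and a real MDM matching engine would add parsing, standardization and probabilistic scoring on top.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity between two field values."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_parties(source, target, threshold=0.85):
    """Compare each source record against all target records on name and city.
    Records scoring at or above the threshold are treated as duplicates to be
    merged; the rest are candidates for on-boarding as new master data."""
    matches, unmatched = [], []
    for s in source:
        best, best_score = None, 0.0
        for t in target:
            # Weight the name higher than the city (hypothetical weighting)
            score = 0.7 * similarity(s["name"], t["name"]) + 0.3 * similarity(s["city"], t["city"])
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            matches.append((s, best, round(best_score, 2)))
        else:
            unmatched.append(s)
    return matches, unmatched

silo_a = [{"name": "Acme Corp.", "city": "London"}, {"name": "Globex Ltd", "city": "Leeds"}]
silo_b = [{"name": "ACME Corp", "city": "London"}, {"name": "Initech", "city": "York"}]

matched, new_records = match_parties(silo_a, silo_b)
```

Doing this comparison as a batch job before loading the hub keeps the survivorship decisions out of the operational on-boarding flow.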

These considerations include:

The point of entry

For many good reasons, the MDM solution is not always the system of entry. Doing the data matching at the stage where data is put into the MDM hub may be too late. Exposing the data matching capabilities as a Service Oriented Architecture component may be a better way, as pondered in the post Service Oriented Data Quality.

Avoiding data matching

Even as a long-time data matching practitioner, I am afraid I have to bring up the subject of avoiding data matching, as further explained in the post The Good, The Better and The Best Way of Avoiding Duplicates.
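One way the avoidance idea is often realized is a search-before-create check at the point of entry: look for close matches among existing hub records before allowing a new one, so the duplicate is never created in the first place. Below is a minimal sketch using Python's standard library; the cutoff value and party names are hypothetical.

```python
from difflib import get_close_matches

def search_before_create(name: str, hub_names: list, cutoff: float = 0.8) -> list:
    """Return existing hub records that closely match the proposed name.
    An empty result means it looks safe to create a new record; a non-empty
    result should be shown to the user as possible duplicates instead."""
    return get_close_matches(name.lower(), [n.lower() for n in hub_names], n=3, cutoff=cutoff)

hub = ["Acme Corp", "Globex Ltd", "Initech"]
candidates = search_before_create("ACME Corp.", hub)  # near-duplicate found in the hub
safe = search_before_create("Umbrella Inc", hub)      # no candidates found
```

The design choice here is to spend a cheap lookup at entry time rather than an expensive match-and-merge exercise later.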


Customer MDM Magic Wordles

The Gartner Magic Quadrant for Master Data Management of Customer Data 2014 is out. One place to get it for free is the Informatica registration page offered in the Informatica communication here.

So, what is good and what is bad to look for in an MDM vendor if you are focusing on customer data right now?

Some words in the strengths assessment of vendors are:

Magic plus

Some words in the cautions assessment of vendors are:

Magic minus


Falsus in Uno, Falsus in Omnibus

The title of this blog post is a Latin legal phrase meaning “false in one thing, false in everything”. It refers to the principle that everything a witness says may be regarded as not credible if one thing said by the witness is proven to be untrue. This has been part of the plot in plenty of courtroom films and TV shows.

This principle has meaning related to data quality too. An example from direct marketing would be the recipient of a direct mail saying: “If you can’t get my name right, how can I trust you to get anything right during a purchase?”

Some data quality dimensions

An example from the multi-channel world, or should we say omni-channel today, would be a shopper saying: “If you say one thing about the product in the shop and another thing on the website, how can I trust any of your product information?” Falsehood in omni-channel so to speak.
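The omni-channel inconsistency described above can be illustrated by a small check that compares the same product's attributes across channel systems and flags disagreements. This is a hypothetical sketch: the channel names, attribute names and values are invented for illustration.

```python
def consistency_check(records_by_channel: dict, attributes: list) -> dict:
    """Return the attributes whose values disagree across channels,
    together with the conflicting per-channel values."""
    issues = {}
    for attr in attributes:
        values = {channel: rec.get(attr) for channel, rec in records_by_channel.items()}
        if len(set(values.values())) > 1:  # more than one distinct value = inconsistency
            issues[attr] = values
    return issues

# The same product as represented in the shop system and on the website
product_x1 = {
    "shop":    {"sku": "X1", "price": 49.95, "color": "black"},
    "website": {"sku": "X1", "price": 44.95, "color": "black"},
}
conflicts = consistency_check(product_x1, ["sku", "price", "color"])
```

A single conflicting attribute, such as the price here, is exactly the kind of falsehood that can make a shopper distrust all of your product information.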

Measuring the impact of such attitudes, and thereby the Return on Investment (ROI) of data quality improvement based on this principle, is very hard. We usually only have random anecdotal evidence that this happens.

But what we can say is: Don’t lie in court and don’t neglect your data quality. It will hurt your credibility and, in the end, your creditworthiness.


The Path to Multi-Domain MDM

Multi-Domain Master Data Management (MDM) is about dealing with master data in several different data domains, such as customer (or party), product, location, asset or calendar. The typical track today is to start in one domain. There are many, even contradictory, good reasons for that.

Depending on what industry vertical you are in, the main pain points that urge you to start doing MDM belong to one of the MDM domains. Customer MDM is the most common one, typically seen where you have a large number of customer records in your databases. Starting with product MDM is seen in organizations with many products in their databases. This is for example the case for large retailers and distributors.

It can be other domains as well. One example I recall from an MDM conference is that Royal Mail in the UK started with the calendar domain. Besides this domain having pain points for that organization, a reason for doing so was to start small before taking on the big chunks.

Even though you start with one domain, you must think about the end state. One thing to consider multi-domain-wise is data governance, as you will not come out well if you choose different approaches to data governance for each master data domain. Of course, the technology part is there too. Choosing a solution that will eventually take you all the way is appealing to many organizations looking for an MDM platform.

Another approach to multi-domain MDM can be through what I know at least one MDM tool vendor calls Evolutionary MDM™. But we can call it other things: agile or lean MDM, for example. Using that approach, you do not solve everything within one domain before going on to the next one.

It is about eliminating as many pain points as possible in the shortest feasible time-frame.


The Scary Data Lake

The concept of the data lake seems to have a revival these days. Perhaps it reemerged about a year ago as told in the post Do You Like the Lake?

The idea of having a data lake scares the hell out of data quality people, as seen in the title used by Gary Allemann in the post Data Lake vs Data Cesspool.

The data lake is mostly promoted as a data source for analytics, as opposed to something being part of daily operations. That is horrifying enough. Imagine Joe spending 80 % of his time last month fixing data quality issues when doing one batch of analytics. And this month Sue spends 80 % of her time fixing data quality issues in the same data lake in her analytics quest, and 50 % of Sue’s data quality issues are in fact the same as Joe’s challenges from last month.
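The Joe-and-Sue scenario can be put into numbers: if the issues each analyst fixes were tracked, the overlap would show how much effort is simply repeated because fixes in the lake are not shared. The issue labels below are invented for illustration.

```python
# Hypothetical issue logs: what Joe fixed last month and what Sue fixed
# this month, working against the same data lake
joe_issues = {"duplicate customers", "invalid dates", "mixed encodings", "missing country codes"}
sue_issues = {"duplicate customers", "invalid dates", "bad product codes", "truncated names"}

repeated = joe_issues & sue_issues                # work done twice
repeated_share = len(repeated) / len(sue_issues)  # fraction of Sue's effort that was redundant
```

With half of Sue's issues already solved once by Joe, the cost of not persisting cleansing results back into (or alongside) the lake becomes very concrete.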

As Halloween is just around the corner, it is time to ask: What is your data lake horror story?

Hadooween


Post No. 666

This is post number 666 on this blog. 666 is the number of the beast. Something diabolic.

The first post on this blog came out in June 2009 and was called Qualities in Data Architecture. That post was about how we should talk a bit less about bad data quality and instead focus a bit more on success stories around data quality. I haven’t been able to stick to that all the time. There are so many good data quality train wrecks out there, such as the one told in the post called Sticky Data Quality Flaws.

Some of my favorite subjects around data quality were lined up in Post No. 100. They are:

The biggest thing that has happened in the data quality realm during the five years this blog has been live is probably the rise of big data. Or rather the rise of the term big data. This proves to me that change usually starts with technology. Then, after some time, we start thinking about processes, and finally about people’s roles and responsibilities.


When the Rhino Hunt and the HiPPO Principle Make a Perfect Storm

A frequent update on my LinkedIn home page these days is about the HiPPO principle. The HiPPO principle describes a leadership style based on giving priority to the leader’s opinion as opposed to using data, as explained in the Forbes article here.

HiPPO

The hippo (hippopotamus) is one of the largest animals on this planet. So is the rhino (rhinoceros). The rhino is critically endangered because it is hunted by humans for a very small part of its body: the horn.

I guess anyone who has been in business for some years has met the hippo. Probably you have also experienced a rhino hunt: a project or programme of very big size aiming at a quite narrow business objective, one that may have been expressed as a simple slogan by a hippo.


The “Fit for Purpose” Trap

Gartner (the analyst firm), represented by Saul Judah, takes data quality back to basics in the recent post called Data Quality Improvement.

While I agree with the sentiment around measuring the facts as expressed in that post, I have cautions about relying on everything being good when data are fit for the purpose of business operations.

Some clues lie in the data quality dimensions mentioned in the post:

Accuracy (for now):

As said in the Gartner post, data are indeed temporal. The real world changes and so do business operations. By the time you have got your data fit for the purpose of use, business operations have changed. And by the time you have got your data re-fit for the new purpose of use, business operations have changed again.

Furthermore, most organizations can’t take all business operations into account at the same time. If you go down the fit-for-purpose track, you will typically address a single business objective and make data fit for that purpose. Not least when dealing with master data, there are many business objectives and derived purposes of use. In my experience, that leads to this conclusion:

“While we value that data are of high quality if they are fit for the intended use we value more that data correctly represent the real-world construct to which they refer in order to be fit for current and future multiple purposes”

Existence – an aspect of completeness:

The Gartner post mentions a data quality dimension called existence. I tend to see this as an aspect of the more broadly used term completeness.

For example, achieving fit-for-purpose completeness of product master data has been a huge challenge for many organizations within retail and distribution during the last years, as explained in the post Customer Friendly Product Master Data.
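Completeness as a dimension can be measured quite directly: the share of required attributes that are actually filled for a product record. A minimal sketch follows; the required attribute list and sample record are hypothetical examples.

```python
# Hypothetical list of attributes a customer-friendly product record must have
REQUIRED_ATTRIBUTES = ["name", "description", "weight_kg", "image_url"]

def completeness(record: dict, required=REQUIRED_ATTRIBUTES) -> float:
    """Share of required attributes that are present and non-empty."""
    filled = sum(1 for attr in required if record.get(attr) not in (None, ""))
    return filled / len(required)

drill = {"name": "Cordless Drill", "description": "", "weight_kg": 1.4}
score = completeness(drill)  # 2 of 4 required attributes are filled
```

The fit-for-purpose question then becomes which attributes go on the required list for each channel and use, which is exactly where the list keeps growing over time.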

