Single Customer Hierarchy View

One of the things I do over and over again as part of my work is data matching.

There is a clear tendency that the goal of data matching efforts increasingly is a master data consolidation taking place before the launch of a master data management (MDM) solution. Such a goal makes the data matching requirements considerably more complex than a one-shot deduplication before a direct marketing campaign.

Hierarchy Management

In the post Fuzzy Hierarchy Management I described how requirements for multiple purposes of use of customer master data make the terms false positive and false negative fuzzy.

As I like to think of a customer as a party role, there are essentially two kinds of hierarchies to be aware of:

  • The hierarchies the involved party belongs to in the real world. This is for example an individual person seen as belonging to a household or a company occupying a place in a company family tree.
  • The hierarchies of customer roles as seen in different business functions and by different departments. For example, two billing entities may belong to the same account in a CRM system, while in another case two CRM accounts share the same billing entity.

The first type of hierarchy shouldn’t be seen differently between enterprises. You should reach the very same result in data matching regardless of what your organization is doing. It may however be true that your business rules and the regulatory requirements applying to your industry and geography may narrow down the need for exploration.

In the latter case we must of course examine the purpose of use for the customer master data within the organization.
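
To make the two kinds of hierarchies a bit more concrete, here is a minimal sketch of how they could be modelled; the class and field names are my own illustration and not taken from any particular MDM platform:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Party:
    # The real-world entity: an individual person or a legal entity
    party_id: str
    name: str
    # Real-world hierarchy: the household or mother company this party belongs to
    real_world_parent: Optional["Party"] = None

@dataclass
class CustomerRole:
    # A role the party plays in a given business function or system
    role_id: str
    party: Party
    system: str  # e.g. "CRM" or "Billing"
    # Role hierarchy: e.g. a billing entity rolling up to a CRM account
    parent_role: Optional["CustomerRole"] = None

The real-world hierarchy lives on the party itself, while each system may arrange the roles of the very same party differently.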

Single Customer View

It is in my experience much easier to solve the second case when the first case is solved. This approach was evaluated in the post Lean MDM.

The same approach also applies to continuous data quality prevention as part of an MDM solution. Aligning with the real world and its hierarchies as part of the data capture makes solving the customer roles as seen in different business functions and by different departments much easier. The benefits of doing this are explained in the post instant Data Quality.

It is often said that a “single customer view” is an illusion. I guess it is. First of all, the term “single customer view” is a vision, but a vision worth striving for. Secondly, customers come in hierarchies. Managing and reflecting these hierarchies is a very important aspect of master data management. Therefore a “single customer view” often ends up as a “single customer hierarchy view”.

Proactive Data Governance at Work

That data governance is 80 % about people and processes and 20 % (if not less) about technology is a common statement in the data management realm.

This blog post is about the 20 % (or less) technology part of data governance.

The term proactive data governance is often used to describe whether a given technology platform is able to support data governance well.

So, what is proactive data governance technology?

Obviously it must be the opposite of reactive data governance technology, which is about discovering completeness issues, as in data profiling, and fixing uniqueness issues, as in data matching.

Proactive data governance technology must be implemented in data entry and other data capture functionality. The purpose of the technology is to assist people responsible for data capture in getting the data quality right from the start.

If we look at master data management (MDM) platforms we have two possible ways of getting data into the master data hub:

  • Data entry directly in the master data hub
  • Data integration by data feed from other systems such as CRM, SCM and ERP solutions and from external partners

In the first case the proactive data governance technology is a part of the MDM platform, often implemented as workflows with assistance, checks, controls and permission management. We see this most often related to product information management (PIM) and in business-to-business (B2B) customer master data management. Here the insertion of a master data entity like a product, a supplier or a B2B customer involves many different employees, each responsible for a set of attributes.
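
As a minimal sketch of what such a workflow check could boil down to (the attribute groups and role names below are hypothetical and not taken from any particular platform), an entity is only ready for activation once every responsible role has filled in and approved its own set of attributes:

# Hypothetical onboarding workflow: each role owns a group of attributes,
# and the entity may only be activated when every group is filled in and approved.
REQUIRED_APPROVALS = {
    "product_manager": ["name", "category"],
    "logistics": ["weight", "dimensions"],
    "finance": ["cost_price", "list_price"],
}

def ready_for_activation(entity: dict, approvals: set) -> bool:
    for role, attributes in REQUIRED_APPROVALS.items():
        if role not in approvals:
            return False
        if any(not entity.get(attribute) for attribute in attributes):
            return False
    return True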

The second case is most often seen in customer data integration (CDI) involving business-to-consumer (B2C) records, but certainly also applies to enriching product master data, supplier master data and B2B customer master data. Here the proactive data governance technology is implemented in the data import functionality or even in the systems of entry, best done as Service Oriented Architecture (SOA) components that are hooked into the master data hub as well.
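
A minimal sketch of what such a component could check on an inbound record before it reaches the hub; the reference data check and the duplicate check below are deliberately naive stand-ins for real services:

def check_address(address: str) -> bool:
    # Stand-in for a call to an external address reference data service
    return bool(address.strip())

def looks_like_duplicate(record: dict, hub_records: list) -> bool:
    # Naive duplicate check against records already in the hub
    key = (record.get("name", "").lower(), record.get("address", "").lower())
    return any(
        (r.get("name", "").lower(), r.get("address", "").lower()) == key
        for r in hub_records
    )

def validate_inbound_record(record: dict, hub_records: list) -> list:
    # Collect issues before the record is allowed into the master data hub
    issues = []
    if not record.get("name"):
        issues.append("Missing name")
    if not check_address(record.get("address", "")):
        issues.append("Address not confirmed against reference data")
    if looks_like_duplicate(record, hub_records):
        issues.append("Possible duplicate of an existing party")
    return issues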

It is a matter of taste whether we call such technology proactive data governance support or upstream data quality. From what I have seen so far, it does work.

When a Cloudburst Hit

Some days ago Copenhagen was hit by the most powerful cloudburst ever measured here.

More powerful cloudbursts may be common in warmer regions of the earth, but this one was very unusual at 55 degrees north.

Fortunately there was only material damage, but the material damage was very extensive. When you take a closer look, you may divide the underground constructions into two categories.

The first category is facilities constructed with only the immediate purpose of use in mind. Many of these facilities are still out of operation.

The second category is facilities constructed with the immediate purpose of use in mind but also designed to resist heavy pouring rain. These facilities kept working during the cloudburst. One example is the metro. Had the metro been constructed only for the immediate purpose of use, being running trains below ground, it would have been flooded within minutes, with the risk of lost lives and a standstill lasting months.

We have the same situation in data management. Things may seem just fine if data are fit for the immediate purpose of use. But when a sudden change in conditions hits, that is when you learn about your data quality.

Extreme Data Quality

This blog post is inspired by reading a blog post called Extreme Data by Mike Pilcher. Mike is COO at SAND, a leading provider of columnar database technology.

The post circles around a Gartner approach to extreme data. While the concept of “Big Data” is focused on the volume of data, the concept of “Extreme Data” also takes into account the velocity and the variety of data.

So how do we handle data quality with extreme data, being data of great variety moving at high velocity and coming in huge volumes? Will we be able to chase down all root causes of possible poor data quality in extreme data and prevent the issues upstream, or will we have to accept the reality of downstream cleansing of data at the time of consumption?

We might add the rise of extreme data as a sixth reason to the current Top 5 Reasons for Downstream Cleansing.

Things Change

Yesterday I posted a small piece called So I’m not a Capricorn? about how astrology may (also) be completely wrong because something has changed.

On the serious side: Don’t expect that because you get it Right the First Time, everything will be just fine from this day forward. Things change.

The best known example in data quality prevention is probably that it is of course important that when you enter the address belonging to a customer, you get it right. But as people (and companies) relocate, you must also have procedures in place tracking those movements by establishing an Ongoing Data Maintenance program in order to ensure the timeliness of your data.
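
As a small sketch of what one step in such a maintenance program could look like (the change-of-address feed below is a made-up structure, not a reference to any real service):

def apply_relocations(customers, moves):
    # moves maps a customer id to a new address taken from a change-of-address feed
    updated = 0
    for customer in customers:
        new_address = moves.get(customer["customer_id"])
        if new_address and new_address != customer.get("address"):
            customer["address"] = new_address
            updated += 1
    return updated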

The other thing, so to speak, is that having things right (the first time) is always seen in the context of what was right at that time. Maybe you always asked your customers for a physical postal address, but because your way of doing business has changed, you have actually become much more interested in having the eMail address. And, because What’s in an eMail Address, you would actually like to have had all of them. So your completeness went from being just fine to being just awful by following the same procedure as last year.

Predicting accuracy is hard. Expect to deal with Unpredictable Inaccuracy.       

Right the First Time

Since I have just relocated (and we have just passed the new year resolution point) I have become a member of the nearby fitness club.

Guess what: They got my name, address and birthday absolutely right the first time.

Now, this could have been because the young lady at the counter is a magnificent data entry person. But I think that her main competency rightfully is being a splendid fitness instructor.

What she did was ask for my citizen ID card and take the data from there. A little less privacy, yes, but surely a lot better for data quality – or data fitness (credit Frank Harland), you might say.

Going Upstream in the Circle

One of the big trends in data quality improvement is going from downstream cleansing to upstream prevention. So let’s talk about Amazon. No, not the online (book)store, but the river. Besides, I am a bit tired of how almost any mention of innovative IT is about that eShop.

A map showing the Amazon River drainage basin may reveal what may turn out to be a huge challenge in going upstream and solving the data quality issues at the source: There may be a lot of sources. Okay, the Amazon is the world’s largest river (because it carries more water to the sea than any other river), so this may be a picture of the data streams in a very large organization. But even more modest organizations have many sources of data, just as more modest rivers also have several sources.

By the way: The Amazon River also shares a source with the Orinoco River through the natural Casiquiare Canal, just as many organizations also share sources of data.

Some sources are not so easy to reach, like the most distant source of the Amazon: a glacial stream on a snowcapped 5,597 m (18,363 ft) peak called Nevado Mismi in the Peruvian Andes.

Now, as I promised that the trend on this blog should be about positivity and success in data quality improvement, I will not dwell on the amount of work involved in going upstream and preventing dirty data from every source.

I say: Go to the clouds. The clouds are the sources of the water in the river. Also, I think that cloud services will help a lot in improving data quality in an easier way, as explained in a recent post called Data Quality from the Cloud.

Finally, the clouds over the Amazon River sources are made from water evaporated from the Amazon and a lot of other waters as part of the water cycle. In the same way, data has a cycle of being derived as information and created in a new form as a result of the actions taken from using the information.

I think data quality work in the future will embrace the full data cycle: Downstream cleansing, upstream prevention and linking in the cloud.

What a Lovely Day

As promised earlier today, here is the first post in an endless series of positive posts about success in data quality improvement.

This beautiful morning I finished yet another of these nice recurring jobs I do from time to time: Deduplicating batches of files ready for direct marketing, making sure that only one, the whole one and nothing but one unique message reaches a given individual decision maker, be it in the online or offline mailbox.

Most jobs are pretty similar and I have a fantastic tool that automates most of the work. I only have the pleasure of learning about the nature of the data and configuring the standardisation and matching process accordingly in a user friendly interface. After the automated process I enjoy looking for any false positives and checking for false negatives. Sometimes I’m so lucky that I have the chance to repeat the process with a slightly different configuration so that we reach the best result possible.
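
For the curious, a very rough sketch of what such a standardisation and matching pass boils down to (my own simplified illustration, certainly not the tool itself):

import re
from collections import defaultdict

def standardise(record):
    # Build a simple match key from a normalised name and postal code
    name = re.sub(r"[^a-z0-9 ]", "", record.get("name", "").lower())
    postal = record.get("postal_code", "").replace(" ", "")
    return name + "|" + postal

def duplicate_candidates(records):
    # Group records sharing a match key; each group is a duplicate candidate for review
    groups = defaultdict(list)
    for record in records:
        groups[standardise(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]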

It’s a great feeling that this work reduces the cost of mailings for my clients, makes them look smarter and more professional and facilitates the correct measurement of response rates that is so essential in planning future, even better, direct marketing activities.

But that’s not all. I’m also delighted to be able to have a continuing chat about how we over time may introduce data quality prevention upstream at the point of data entry so we don’t have to do these recurring downstream cleansing activities any more. It’s always fascinating going through all the different applications that many organisations are running, some of them so old that I never dreamed they still existed. Most times we are able to build a solution that will work in the given landscape, and anyway, soon the credit crunch will be totally gone and here we go.

I’ll be back again with more success from the data quality improvement frontier very soon.

Data Quality is an Ingredient, not an Entrée

Fortunately it is more and more recognized that you don’t get success with Business Intelligence, Customer Relationship Management, Master Data Management, Service Oriented Architecture and many more disciplines without starting by improving your data quality.

But it would be a big mistake to see Data Quality improvement as an entrée before the main course of BI, CRM, MDM, SOA or whatever is on the menu. You have to have ongoing prevention against having your data polluted again over time.

Improving and maintaining data quality involves people, processes and technology. Now, I am not neglecting the people and process side, but as my expertise is in the technology part I would like to mention some of the technological ingredients that help with keeping data quality at a tasty level in your IT implementations.

Mashups

Many data quality flaws are (not surprisingly) introduced at data entry. Enterprise data mashups with external reference data may help during data entry, like:

  • An address may be suggested from an external source.
  • A business entity may be picked from an external business directory.
  • Various rules exist in different countries for using consumer/citizen directories – why not use the best available where you do business.
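
A minimal sketch of the first suggestion; the reference data and lookup below are stand-ins for a real external address service:

from typing import Optional

# Stand-in for an external postal reference data source
REFERENCE_ADDRESSES = {
    ("2100", "oesterbrogade"): "Østerbrogade, 2100 København Ø",
}

def suggest_address(postal_code: str, street: str) -> Optional[str]:
    # Suggest a standardised address from reference data while the user is typing
    key = (postal_code.strip(), street.strip().lower())
    return REFERENCE_ADDRESSES.get(key)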

External IDs

Getting the data entry right at the root is important, and it is agreed by most (if not all) data quality professionals that this is a superior approach compared to doing cleansing operations downstream.

The problem, though, is that most data erodes as time passes. What was right at the time of capture will at some point in time not be right anymore.

Therefore data entry should ideally not only capture a snapshot of correct information but also include raw data elements that make the data easily maintainable.
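
As a minimal sketch, the captured record could carry the raw external identifiers alongside the snapshot values so the snapshot can be refreshed later; the directory below is a hypothetical stand-in for an external business directory:

from dataclasses import dataclass, field

@dataclass
class BusinessPartner:
    name: str
    address: str
    # Raw external identifiers, e.g. a national business registry number
    external_ids: dict = field(default_factory=dict)

def refresh_from_directory(partner: BusinessPartner, directory: dict) -> None:
    # directory is keyed by registry number and holds the current name and address
    current = directory.get(partner.external_ids.get("registry_no"))
    if current:
        partner.name = current["name"]
        partner.address = current["address"]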

Error tolerant search

A common workflow when in-house personnel are entering new customers, suppliers, purchased products and other master data is that first you search the database for a match. If the entity is not found, you create a new entity. When the search fails to find an actual match, we have a classic and frequent cause of introduced duplicates.

An error tolerant search is able to find matches despite spelling differences, alternatively arranged words, various concatenations and many other challenges we face when searching for names, addresses and descriptions.
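
A very small sketch of the idea, using token sorting and a similarity ratio from the Python standard library (real error tolerant search engines are of course far more sophisticated):

from difflib import SequenceMatcher

def normalise(text: str) -> str:
    # Lowercase and sort the words so word order does not matter
    return " ".join(sorted(text.lower().split()))

def error_tolerant_search(query, candidates, threshold=0.8):
    # Return the candidates whose normalised form is similar enough to the query
    q = normalise(query)
    return [
        candidate for candidate in candidates
        if SequenceMatcher(None, q, normalise(candidate)).ratio() >= threshold
    ]

With this, a search for “Smith Ltd, John” will still find “John Smith Ltd.” even though the words are arranged differently and the punctuation differs.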

Seeing Is Believing

One of my regular activities as a practice manager at a data quality tool vendor is making what we call a “Test Report”.

Such a “Test Report” is a preferable presale activity regardless of whether we are up against a competitor or the option of doing nothing (or no more) to improve data quality. In the latter case I usually name our competitor “Laissez-Faire”.

Most of the test reports I do revolve around the most frequent data quality issue: duplicates in party master data – names and addresses.

Looking at what an advanced data matching tool can do with your customer master data and other business partner registries is often the decisive factor for choosing to implement the tool.

I like to do the test with a full extract of all current party master data.

A “Test Report” has two major outcomes:

  • Quantifying the estimated number of different types of duplicates, which is the basis for calculating expected Return on Investment for implementing such an advanced data matching tool.
  • Qualifying both some typical and some special examples in order to point at the tuning efforts needed both for an initial match and the recommended ongoing prevention.

When participating in follow-up meetings I have found that discussions around what a tool can do (and not do) are much more sensible when backed up by concrete numbers and concrete examples with your particular data.
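
As a minimal sketch of the quantifying part (the duplicate types and the cost figure would of course come from the actual data and the actual business case), the tallies behind such a report are essentially counts of match groups per type multiplied by an assumed cost per redundant record:

def estimate_savings(match_groups: dict, cost_per_duplicate: float) -> dict:
    # match_groups maps a duplicate type, e.g. "exact" or "fuzzy name and address",
    # to a list of groups of records considered to represent the same party
    summary = {}
    for duplicate_type, groups in match_groups.items():
        redundant = sum(len(group) - 1 for group in groups)
        summary[duplicate_type] = {
            "groups": len(groups),
            "redundant_records": redundant,
            "estimated_saving": redundant * cost_per_duplicate,
        }
    return summary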
