What Will You Complicate in the Year of the Rooster?

Today is the first day of a new year: the Year of the Rooster, according to the lunar calendar observed in East Asia. One characteristic ascribed to the Year of the Rooster is that in this year, people will tend to complicate things.

People usually like to keep things simple. The KISS principle – Keep It Simple, Stupid – has many fans. But not me. Not that I do not like to keep things simple. I do. But only as simple as it should be, as Einstein probably said. Sometimes KISS is the shortcut to getting it all wrong.

When working with data quality I have come across the three examples below of striking the right balance between keeping things simple and making them a bit complicated:

Deduplication

One of the most frequent data quality issues around is duplicates in party master data: customer, supplier, patient, citizen, member and many other roles of legal entities and natural persons, where the same real-world entity is described more than once with different values in our databases.

In solving this challenge, we can use methods such as match codes and edit distance to detect duplicates. However, these methods, often called deterministic, are far too simple to really automate the remedy. We can also use advanced probabilistic methods. These methods are better, but have the downside that the matching they do is hard to explain, repeat and reuse in other contexts.

My best experience is to use something in between these approaches: not too simple and not too complicated.
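As a minimal sketch of such an in-between approach – where the field names, the match code recipe and the threshold are illustrative assumptions, not a prescription – one could block candidate records on a simple match code and then score the candidates with an edit-distance ratio:

```python
from difflib import SequenceMatcher

def match_code(record):
    """Crude blocking key: first 3 letters of the name plus the postal code."""
    name = record["name"].lower().replace(" ", "")
    return name[:3] + record["postal_code"]

def similarity(a, b):
    """Edit-distance based similarity between two name strings (0..1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_duplicates(records, threshold=0.7):
    """Pair up records that share a match code and whose names are close enough."""
    blocks = {}
    for rec in records:
        blocks.setdefault(match_code(rec), []).append(rec)
    pairs = []
    for block in blocks.values():
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                if similarity(block[i]["name"], block[j]["name"]) >= threshold:
                    pairs.append((block[i]["id"], block[j]["id"]))
    return pairs

records = [
    {"id": 1, "name": "Acme Corporation", "postal_code": "90210"},
    {"id": 2, "name": "Acme Corp", "postal_code": "90210"},
    {"id": 3, "name": "Beta Industries", "postal_code": "10001"},
]
print(find_duplicates(records))  # → [(1, 2)]
```

The blocking keeps the comparison explainable and repeatable, while the similarity score adds just enough tolerance for spelling variation – the balance the paragraph above argues for.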

Address verification

You can make a good algorithm to perform verification of postal and visit addresses in a database for addresses coming from one country. However, if you try the same algorithm on addresses from another country, it often fails miserably.

Making an algorithm for addresses from all over the world will be very complicated. I have not yet seen one that works.

My best experience is to accept the complication of having almost as many algorithms as there are countries on this planet.
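One way to sketch that complication in code is a registry of country-specific rules dispatched on the country code. The postal code patterns below are an illustrative subset and nowhere near the full addressing rules of each country:

```python
import re

# Hypothetical per-country postal code patterns (illustrative subset only).
POSTAL_PATTERNS = {
    "US": r"\d{5}(-\d{4})?",            # e.g. 90210 or 90210-1234
    "GB": r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}",  # e.g. SW1A 1AA
    "DK": r"\d{4}",                     # e.g. 2100
    "NL": r"\d{4} ?[A-Z]{2}",           # e.g. 1012 AB
}

def verify_postal_code(country, code):
    """Dispatch to the country-specific rule; unknown countries are not verified."""
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        return None  # no algorithm for this country yet
    return re.fullmatch(pattern, code.strip().upper()) is not None

print(verify_postal_code("US", "90210"))     # True
print(verify_postal_code("GB", "SW1A 1AA"))  # True
print(verify_postal_code("DK", "90210"))     # False
```

The point is structural: each country gets its own rule, and the honest answer for a country you have not covered is "unverified", not a guess from someone else's rule.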

Product classification

Product classification controls a lot of the data quality dimensions related to product master data. The most prominent example is completeness of product information. Whether you have complete product information depends on the classification of the product. Some attributes will be mandatory for one product but make no sense at all for a product with a different classification.

If your product classification is too simple, your completeness measurement will not be realistic. An overly granular or otherwise complicated classification system is very hard to maintain and will probably seem like overkill for many purposes of product master data management.

My best experience is that you have to maintain several classification systems and a mapping between them, both inside your organization and between you and your trading partners.
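As a rough illustration of classification-driven completeness – the classes and their mandatory attributes below are hypothetical, stand-ins for whatever your classification system prescribes:

```python
# Hypothetical mandatory attributes per product class (illustrative only).
MANDATORY = {
    "apparel": {"size", "color", "material"},
    "electronics": {"voltage", "wattage", "energy_label"},
}

def completeness(product):
    """Share of the mandatory attributes (for this product's class) that are filled."""
    required = MANDATORY.get(product["class"], set())
    if not required:
        return 1.0  # nothing mandated, so nothing can be missing
    filled = {attr for attr in required if product.get(attr)}
    return len(filled) / len(required)

shirt = {"class": "apparel", "size": "L", "color": "blue"}
print(round(completeness(shirt), 2))  # 0.67 – 2 of 3 mandatory attributes filled
```

Note how the same record would score differently under a different classification: asking a shirt for a voltage makes the measurement meaningless, which is exactly why a too-simple classification gives unrealistic completeness figures.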

Happy New Lunar Year

IT is not the opposite of the business, but a part of it

During my professional work, and not least when following the data management talk on social media, I often stumble upon sayings such as:

  • IT should not drive a CRM / MDM / PIM / XXX project. The business should do that.
  • IT should not be responsible for data quality. The business should be that.

I disagree with that. Not because the business should not do and be those things, but because IT should be a part of the business.

I have personally always disliked the concept of dividing a company into IT and the business. It is a concept practically only used by the IT (and IT-focused consulting) side. In my eyes, IT is part of the business just as much as marketing, sales, accounting and all the other departmental units.

With the rise of digitalization, the distinction between IT and the business becomes absolutely ridiculous – not to say dangerous.

We need business-minded IT people and IT-savvy business people to drive digitalization and take responsibility for data quality.

Used abbreviations:

  • IT = Information Technology
  • CRM = Customer Relationship Management
  • MDM = Master Data Management
  • PIM = Product Information Management

It is not all about People or Processes or Technology

When following the articles, blog posts and other inspirational stuff in the data management realm, you frequently stumble upon sayings about a unique angle on what it is all about, like:

  • It is all about people, meaning that if you can change and control the attitude of the people involved in data management, everything will be just fine. The problem is that people have been around for thousands of years and we have not nailed that one yet – and probably will not do so in isolation within the data management realm. But sure, a lot of consultancy fees will still go down that drain.
  • It is all about processes. Yes, it is. The only problem is that processes are dependent on people and technology.
  • It is all about technology. Well, no one actually says so. However, relying on that sentiment – and that shit does happen – is a frequent reason why data management initiatives go wrong.

The trick is to find a balance between a praiseworthy people-focused approach, a heartfelt process way of going forward and a solid methodology for exploiting technology in the good cause of better data management, all aligned with achieving business benefits.

How hard can it be?


The Countryside Data Quality Journey Through 2015

I guess this is the time for blog posts about the big things that are going to happen in 2015. But you see, we could also take a route away from the motorways and highways and see how the traditional way of life is still unfolding in the data quality landscape.

While the innovators and early adopters are fighting with big data quality, the late majority are still trying to get their heads around how to manage small data. And that is a good thing, because you cannot utilize big data without solving small data quality problems, not least around master data, as told in the post How important is big data quality?

Solving data quality problems is not just about fixing data. It is very much also about fixing the structures around data, as explained in a post, featuring the pope, called When Bad Data Quality isn’t Bad Data.

A common roadblock on the way to solving data quality issues is that things that are everybody’s problem tend to be no one’s problem. Implementing a data governance programme is evolving as the answer to that conundrum. As with many things in life, data governance is about thinking big and starting small, as told in the post Business Glossary to Full-Blown Metadata Management or Vice Versa.

Data governance revolves a lot around people’s roles, and there are also some specific roles within data governance. Data owners have been known for a long time, data stewards have been around for some time, and now we also see Chief Data Officers emerge, as examined in the post The Good, the Bad, and the Ugly Data Governance Role.

As experienced recently, somewhere in the countryside, while discussing how to get going with a big and shiny data governance programme, there is indeed still a lot to do with trivial data quality issues, such as fields being too short to capture the real world, as reported in the post Everyday Year 2000 Problems.



The Matrix

The data governance discipline, the Master Data Management (MDM) discipline and the data quality discipline are closely related and happen to be my fields of work, as told in the post Data Governance, Data Quality and MDM.

Every IT enabled discipline has an element of understanding people, orchestrating business processes and using technology. The mix may vary between disciplines. This is also true for the three above-mentioned disciplines.

But how important are people, process and technology within these three disciplines? Are the disciplines very different in that perspective? I think so.

When assigning a value from 1 (less important) to 5 (very important) for Data Governance (DG), Master Data Management (MDM) and Data Quality (DQ) I came to this result:

(The matrix: a table scoring people, process and technology from 1 to 5 for each of DG, MDM and DQ)

A few words about the reasoning for the highs and lows:

Data governance is in my experience a lot about understanding people and less about using technology as told in the post Data Governance Tools: The New Snake Oil?

I often see arguments that data quality, too, is all about people. But:

  • I think you are really talking about data governance when putting the people argument forward in the quest for achieving adequate data quality.
  • I see little room for the personal opinions of different people dictating what adequate data quality is. This should really be as objective as possible.

Now I am ready for your relentless criticism.


Data Governance in the Self-Service Age

The term self-service is used increasingly within data management. Self-service may be about people within your organization using self-service capabilities, as in self-service business intelligence. But, probably more disruptively, it may be about customer self-service and supplier self-service, meaning that people outside your organization are increasingly dependent on the level of data quality you can offer within your services.

Customer self-service will not succeed without you offering decent data quality related to product information, as exemplified in the post Falsus in Uno, Falsus in Omnibus. There will be more happy customer self-service events with more complete product information. Knowing your customers better helps you help them serve themselves. And in that sense it may be Time To Turn Your Customer Master Data Management Social?

Supplier self-service will not fly if you do not know your suppliers and their differences, which is quite similar to the concept of knowing your customer, as explained in the post Single Business Partner View. When it comes to approaches to data management within supplier engagement, there are several options, such as those examined in the post Sharing Product Master Data.

Do you think data governance is hard enough when dealing with the dear people within your own organization? I have news for you. It is going to be even tougher when dealing with all the lovely people outside your organization whom you will ask to be part of your data collection and consumption workspace.


Post No. 666

This is post number 666 on this blog. 666 is the number of the beast. Something diabolic.

The first post on my blog came out in June 2009 and was called Qualities in Data Architecture. It was about how we should talk a bit less about bad data quality and instead focus a bit more on success stories around data quality. I haven’t been able to stick to that all the time. There are so many good data quality train wrecks out there, like the one told in the post called Sticky Data Quality Flaws.

Some of my favorite subjects around data quality were lined up in Post No. 100.

The biggest thing that has happened in the data quality realm during the five years this blog has been live is probably the rise of big data. Or rather the rise of the term big data. This proves to me that change usually starts with technology. Then, after some time, we start thinking about processes and finally about people’s roles and responsibilities.
