Data governance tools: The new snake oil?

Traditionally, data governance has been about the people and process side of data management. However, we now see tools marketed as data governance tools, either as pure play tools for data governance or as part of a wider data management suite, as told in the post Who needs a data governance tool?

The post refers to a report by Sunil Soares. In this report, data governance tools are seen as tools related to six areas within enterprise data management: data discovery, data quality, business glossary, metadata, information policy management and reference data management.

While IBM has tools for everything, according to the report it does not seem like a single tool cures it all – yet.

But will we go there? If we need tools at all, do we need an all-cure snake oil tool for data governance? Or will we be better off with different lubricants for data discovery, data quality, business glossary, metadata, information policy management and reference data management?


Omni-purpose Data Quality

A recent post on this blog was called Omni-purpose MDM. Herein it is discussed to what degree MDM solutions should cover all business cases where Master Data Management plays a part.

Master Data Management (MDM) is very much about data quality. A recurring question in the data quality realm is whether data quality should be seen as the degree to which data are fit for the purpose of use, or whether the degree of real-world alignment is a better measurement.

The other day Jim Harris published a blog post called Data Quality has a Rotating Frame of Reference. In a comment Jim takes up the example of having a valid address in your database records and how measuring address validity may make no sense for measuring how data quality supports a certain business objective.

My experience is that if you look at one business objective at a time, measuring data quality against the purpose of use is of course sound. However, if you have several different business objectives using the same data, you will usually discover that aligning with the real world fulfills all the needs. This is explained further within the concept of Data Quality 3.0.

Using the example of a valid address, measurements, and actual data quality prevention, typically work with degrees of validity, notably:

  • The validity at different levels, such as area, entrance and specific unit, as examined in the post A Universal Challenge.
  • The validity of related data elements, as when an address is valid but the addressee is not, as examined in the post Beyond Address Validation.

Data quality needs for a specific business objective also change over time. While a valid address may be irrelevant for invoicing if either the mail carrier gets it there anyway or we invoice electronically, having a valid address and addressee suddenly becomes fit for the purpose of use if the invoice is not paid and we have to chase the debt.
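As a sketch of how such graded validity could be measured, here is a minimal illustration in Python. The level names follow the area / entrance / specific unit distinction above, but the record structure and the debt-chasing rule are my own illustrative assumptions, not taken from any particular tool:

```python
from dataclasses import dataclass

@dataclass
class AddressCheck:
    area_valid: bool = False      # e.g. postal code and city exist
    entrance_valid: bool = False  # e.g. street and house number exist
    unit_valid: bool = False      # e.g. floor or suite exists
    addressee_valid: bool = False # the named party is known at the address

def validity_level(check: AddressCheck) -> str:
    """Return the deepest level at which the address is valid."""
    if check.area_valid and check.entrance_valid and check.unit_valid:
        return "unit"
    if check.area_valid and check.entrance_valid:
        return "entrance"
    if check.area_valid:
        return "area"
    return "none"

def fit_for_debt_chasing(check: AddressCheck) -> bool:
    # For chasing debt we need a deliverable address AND a valid addressee,
    # even though the same record was fit for electronic invoicing without either.
    return validity_level(check) in ("entrance", "unit") and check.addressee_valid
```

The point of the sketch is that the same record can pass one purpose-of-use test and fail another, which is exactly why real-world alignment tends to satisfy all the business objectives at once.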


Data Quality in Different Languages

The term "data quality" exists in many different languages.

As reported in the post Häagen-Dazs Datakvalitet, the Scandinavian word for data quality is datakvalitet. Well, actually there is no such language as Scandinavian, but datakvalitet is used in Danish, Swedish and Norwegian alike. Maybe even in both Norwegian languages, though Google Translate only knows of one Norwegian language.

In other Germanic languages the words for data quality are close to datakvalitet. In German: Datenqualität. In Dutch: Datakwaliteit.

The above terms are compound words. Even though English is also classified as a Germanic language, we see a Latin influence, as "data quality" is two words in English. And that goes for all English variants. It is only when it comes to whether we have to standardise this or standardize that, that we are in trouble. British English is best when we have to decide whether data quality improvement is a program or a programme.

In the Romance languages we have three words. French: Qualité des données. Spanish: Calidad de datos.

And then there are of course the terms in alphabets other than the Latin one and in other script systems:

[Image: "data quality" in different languages and scripts]
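The terms from this post can be collected in one small lookup, which also makes the compound-versus-multi-word split visible (only the languages actually mentioned above are included):

```python
# "Data quality" in the languages mentioned in this post.
DATA_QUALITY_TERMS = {
    "Danish": "datakvalitet",
    "Swedish": "datakvalitet",
    "Norwegian": "datakvalitet",
    "German": "Datenqualität",
    "Dutch": "datakwaliteit",
    "English": "data quality",
    "French": "qualité des données",
    "Spanish": "calidad de datos",
}

def is_compound(term: str) -> bool:
    """The Germanic terms are single compound words; English and the
    Romance terms use two or more separate words."""
    return " " not in term
```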


Omni-purpose MDM

The terms omni-channel banking and omni-channel retailing are becoming popular within businesses these days.

In this context omni (meaning all) is considered to be something more advanced than multi (meaning many) as in multi-channel retailing.

Data management, including Master Data Management (MDM), is always a bit behind the newest business trends. In our discipline we have hardly even entered the multi stage yet.

Some moons ago I wrote about multi-channel data matching on the Informatica Perspectives blog in the post Five Future Data Matching Trends. Today, on the same blog, Stephan Zoder has the post asking: Is your social media investment hampered by your “data poverty”?

Herein Stephan examines the possible benefits of multi-channel data matching based on a business case within the gambling industry.

Using omni in relation to MDM was seen in a vendor presentation at the Gartner MDM Summit in London last week as reported in the post Slicing the MDM Space. Omnidomain MDM was the proposed term here.

The end goal should probably be something that could be coined as omni-purpose MDM. This will be about advancing MDM capabilities to cover multiple domains and embrace multiple channels in order to obtain a single view of every core entity that can be used in every business process.



The Intersections of Big Data, Data Quality and Master Data Management

This blog has since 2009 been very much about the intersection between Master Data Management (MDM) and data quality. These two disciplines are closely related, as the vast majority of data quality improvement work going on is related to master data, taking slightly different forms depending on whether we are fighting with party master data, product master data, location master data or other master data domains.

In mid-2011 the term big data became more popular than data quality, as reported in the post Data Quality vs Big Data. After the initial euphoria about big data and the focus on the analytical side of big data, the question of big data quality has fortunately gained traction. Apart from the quality of the algorithms used in big data analytics, the quality of the big data itself is definitely a factor to be taken very seriously when deciding to act on the outcomes of big data analytics.

There are questions about the quality of the big data itself, as told for example in the post Crap, Damned Crap, and Big Data. This story is about social data and how crappy these data streams may be. Another prominent flavor of big data is sensor data, where there may also be issues of data quality, as in the example mentioned in the post Going in the Wrong Direction.

As examined in the latter example the quality of big data will in many cases have to be measured by how well big data relates to internal master data and external reference data. You may find more examples of that in the post Big Data and Multi-Domain Master Data Management.
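One minimal way to picture that measurement is the share of incoming big data records that can be anchored to an internal master data set. The sketch below is an illustration under my own assumptions (the `party_id` key and the record shapes are hypothetical, not from any cited post):

```python
def master_data_match_rate(big_data_records, master_keys):
    """Fraction of records whose 'party_id' is found in master data.

    A low rate suggests the big data stream is poorly anchored to the
    entities we actually manage, i.e. of questionable quality as a
    basis for acting on analytics outcomes.
    """
    if not big_data_records:
        return 0.0
    matched = sum(1 for r in big_data_records if r.get("party_id") in master_keys)
    return matched / len(big_data_records)

# Example: two of three social/sensor records match known master data.
stream = [{"party_id": "C1"}, {"party_id": "C2"}, {"party_id": "X9"}]
rate = master_data_match_rate(stream, {"C1", "C2", "C3"})
```

In practice the matching would of course be fuzzier than an exact key lookup, but the metric itself – big data quality expressed relative to master and reference data – is the point.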


Slicing the MDM Space

These days I am attending the Gartner MDM Summit in London.

MDM (Master Data Management) initiatives and MDM solutions are not created equal and different ways of slicing the MDM world were put forward on the first day.

Gartner is famous for its Magic Quadrants, and during the customer master data quadrant presentation I heard Bill O’Kane explain why this is a separate quadrant from the product master data quadrant and why there are no challengers and no visionaries in it.

In another session, about MDM milestones, Bill O’Kane sliced the MDM world a bit differently, based on moving between MDM styles. Here we had:

  • Business-to-consumer (B2C) Customer Data Integration (CDI)
  • Business-to-business (B2B) customer MDM, Product Information Management (PIM) and other domains.

The vendors in general seem to want to do everything MDM.

Stibo Systems, a traditional PIM vendor, presented the case for multidomain MDM based on how things have developed within eCommerce. Stibo even smuggled the term omnidomain MDM into the slides. A marketing gig in the making perhaps.

The megavendors have bought whoever they need to be multidomain.

Some new solutions are born in the multidomain age. Semarchy is an interesting example, as they go the evolutionary way.


Data Entry by Employees

A recent infographic prepared by Trillium Software highlights a fact about data quality I personally have been preaching about a lot:

[Image: Trillium infographic – 75 percent]

This number is (roughly) sourced from a study by Wayne W. Eckerson of The Data Warehouse Institute made in 2002:

[Image: TDWI study – 76 percent]

So, in the fight against bad data quality, a good place to start will be helping data entry personnel do it right the first time.

One way of achieving that is to cut down on the data being entered. This may be done by picking the data from sources already available out there instead of retyping things and introducing those annoying flaws.

If we look at the two most prominent master data domains, some ideas will be:

  • In the product domain I have seen my share of product descriptions and specifications being reentered while flowing down the supply chain of manufacturers, distributors, re-sellers, retailers and end users. Better batch interfaces with data quality controls are one way of coping with that. Social collaboration is another one, as told in the post Social PIM.
  • In the customer, or rather party, domain we have seen an uptake in the use of address validation. That is good. However, it is not good enough, as discussed in the post Beyond Address Validation.
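The "pick, don't retype" idea for party data can be sketched as below. The reference directory here is a stand-in for an external address or business registry, and the record layout and function names are hypothetical illustrations, not a real API:

```python
# Hypothetical external reference directory (in reality an address or
# business registry service) keyed by a registration number.
REFERENCE_DIRECTORY = {
    "DK12345678": {"name": "Example Trading A/S", "city": "Copenhagen"},
}

def enter_party(reg_no: str, typed_fallback: dict) -> dict:
    """Prefer the reference source over manual typing.

    Only if the party is unknown to the directory do we accept the
    manually typed record, and then we flag it for later review.
    """
    record = REFERENCE_DIRECTORY.get(reg_no)
    if record is not None:
        return {"reg_no": reg_no, **record, "source": "reference"}
    return {"reg_no": reg_no, **typed_fallback, "source": "manual", "review": True}
```

The design choice is simply that typing becomes the exception path rather than the default, which is where most first-time-right gains come from.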


Attending an MDM Summit

Going to MDM (Master Data Management) conferences is a great learning experience.

If we look at world-wide conferences there are two series of conferences going on every year:

  • The Master Data Management Summit series led by the MDM Institute, which is Aaron Zornes
  • The Master Data Management summit series organized by Gartner (the analyst firm)

Both those traveling events are coming to London this spring. First up is the Gartner event on the 12th and 13th of March. As I have been to the Zornes show several times before, I am looking forward to being at the more expensive Gartner performance this year.

The learning actually starts when you are looking at company names on the attendee list. Some master data issues are showcased here:

There will be people from these three well-known British supermarkets:

[Image: three British supermarkets on the attendee list]

The good folks at Kühne + Nagel (AG & Co.) KG are having a hard time getting their proper name in there:

[Image: Kühne + Nagel name variants on the attendee list]

And what a timely name for this Swiss company:

[Image: aptly named Swiss company on the attendee list]


Agile MDM. Using IT.

MDM (Master Data Management) projects may have a bad name as large IT projects that use huge amounts of resources, take a lot of time and end up producing very few measurable results.

This phenomenon isn’t new at all in the IT world. There are often two answers to that challenge:

  1. Don’t treat it as an IT project. It’s all about people and culture.
  2. Do it the agile way using IT.

After having a lot of fun with option one, you will sooner or later realize that the master data pain points still exist and then come to option two.

I have earlier written some agile posts about Lean MDM and Eating the MDM Elephant, and the relevance of having MDM technology that supports the agile way has in my eyes only become more apparent since then.

What are your experiences? Who is doing agile MDM – using IT? Is it good?


Data is the new petroleum

"Data is the new oil" is a well-known phrase used today to emphasize the fact that data, and your ability to exploit data, can make you rich.

The rise of big data has indeed added more fire to this burning issue, with the variant saying "Big data is the new oil".

Now, as oil is many things, data is many things too. Just as few of us actually use crude oil, also called petroleum, few of us use raw data to get rich. We use information distilled from raw data for specific purposes. One example is examined in the post Mashing Up Big Reference Data and Internal Master Data.

This brings me to the point that we have the question of the quality of oil, just as we have the question of the quality of data, as explained nicely by Ken O’Connor in the post Data is the new oil – what grade is yours?
