Extreme Data Quality

This blog post is inspired by reading a blog post called Extreme Data by Mike Pilcher. Mike is COO at SAND, a leading provider of columnar database technology.

The post circles around a Gartner approach to extreme data. While the concept of “Big Data” focuses on the volume of data, the concept of “Extreme Data” also takes into account its velocity and variety.

So how do we handle data quality with extreme data, being data of great variety, moving at high velocity and coming in huge volumes? Will we be able to chase down all the root causes of poor data quality in extreme data and prevent the issues upstream, or will we have to accept the reality of downstream cleansing of data at the time of consumption?

We might add the rise of extreme data as a sixth reason to the current Top 5 Reasons for Downstream Cleansing.


Business and Pleasure

The data quality and master data management (MDM) realm has many wistful songs about unrequited love with “the business”.

This morning I noticed yet another tweet on Twitter expressing the pain:

Here Gartner analyst Ted Friedman foresees the doom of MDM if we don’t get at least the same traction from “the business” that BI (Business Intelligence) is getting.

In my eyes everything we do in Information Technology is about “the business”. Even computer games and digital entertainment are a core part of the respective industries. I also believe that IT is part of “the business”.

“The rest of the business” does see that some disciplines belong in the IT realm. This goes for database management, programming languages and network protocols. These disciplines are not doomed at all for belonging there; “the rest of the business” couldn’t work today without them.

Certainly I have seen some IT-based disciplines and related tools emerge and then be doomed during my years in the IT business. Does anyone remember CASE tools?

With CASE tools I remember great expectations about business involvement in application design. But according to Wikipedia the main problems with CASE tools were: inadequate standardization, unrealistic expectations, slow implementation and weak repository controls.

In other words: “The rest of the business” never really got in touch with the CASE tools because they didn’t work as expected.

The business traction we see around BI (and the enabling tools) now is, in my eyes, very much down to the fact that the tools have matured, actually work, have become more user-friendly and seem to create useful results for “the rest of the business”.

Data quality tools and MDM tools must continue to follow that direction too, because one thing is for sure: Data Quality tools and MDM tools do not solve any severe problems internal to the IT part of “the business”.

It’s my pleasure being part of that.


Magic Quadrant Diversity

The Magic Quadrants from Gartner Inc. rank the tool vendors within a lot of different IT disciplines. Related to my work, the quadrants for data quality tools and master data management are the most interesting ones.

However, the quadrants examine the vendors in a global scope. How are the vendors doing in my country?

I tried to look up a few of the vendors in a local business directory for Denmark provided (free to use on the web) by the local Experian branch.

DataFlux

First up is DataFlux, the (according to Gartner) leading data quality tool vendor.

Result: No hits.

However, knowing that DataFlux is owned by SAS Institute will, with a bit of patience, finally bring you to information about the DataFlux product deep down on the local SAS website.

PS: Though SAS is better known here as the main airline (Scandinavian Airlines System), SAS Institute is actually very successful in Denmark, having a much larger share of the Business Intelligence market here than in most other places.

Informatica

Next up is Informatica, a well-positioned company in both the quadrant for data quality tools and the one for customer master data management.

Result: No hits.

Here you have to know that Informatica is represented in the Nordic area by a company called Affecto. You will find information about the Informatica products deep down on the Affecto website – along with the competing product FirstLogic owned by Business Objects (owned by SAP) also historically represented by Affecto.

Stibo Systems

Stibo Systems may not be as well known as the two above, but is tailing the mega vendors in the quadrant for Product Master Data Management, as mentioned recently in a blog post by Dan Power.

Result: Hit.

They are here with over 500 employees – at least in the legal entity called Stibo, where Stibo Systems is an alternate name and brand. And it’s no kidding; I visited them last month at the impressive headquarters near Århus (the second largest city in Denmark).


No Re-Tweets?

12 hours ago I noticed the following tweet on Twitter from the profile @GartnerTedF:

The person behind @GartnerTedF is the analyst Ted Friedman of Gartner, Inc. He is a very important person in the data quality realm as he co-writes the Magic Quadrant.

Ted’s tweets are usually re-tweeted by other tweeps.

But not this one.

I think I know why: It’s because the technology simply doesn’t work.

I have noticed this often. What happens is that Twitter somehow simply doesn’t index some tweets from time to time, so people don’t see them.

What is Data Quality anyway?

The above question might seem a bit belated after I have blogged about the subject for 9 months now. But from time to time I ask myself questions like:

Is Data Quality an independent discipline? If it is, will it continue to be that?

Data Quality actually is (or should be) a part of a lot of other disciplines.

Data Governance as a discipline is probably the best place to include general data quality skills and methodology – or to say all the people and process sides of data quality practice. Data Governance is an emerging discipline with an evolving definition, says Wikipedia. I think there is a pretty good chance that data quality management as a discipline will increasingly be regarded as a core component of data governance.

Master Data Management is a lot about Data Quality, but MDM could be dead already. Just like SOA. In short: I think MDM and SOA will survive by getting new life from the semantic web and all the data resources in the cloud. For that, MDM and SOA need Data Quality components. Data Quality 3.0 it is.

You may then replace MDM with CRM, SCM, ERP and so on, and thereby extend the use of Data Quality components from dealing not only with master data but also with transaction data.

Next questions: Are Data Quality tools an independent technology? If they are, will they continue to be that?

It’s clear that Data Quality technology is moving from stand-alone batch processing environments, over embedded modules, to, oh yes, SOA components.

If we look at what data quality tools today actually do, they in fact mostly support you with automation of data profiling and data matching, which probably covers only some of the data quality challenges you have.
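As a rough illustration of those two capabilities, here is a minimal sketch using only the Python standard library. The function names and the tiny dataset are my own inventions for this example, not taken from any of the tools mentioned; real profiling and matching engines do far more.

```python
# Illustrative sketch only: basic data profiling and naive fuzzy matching.
# Function names and sample data are hypothetical, not from any vendor tool.
from collections import Counter
from difflib import SequenceMatcher

def profile_column(values):
    """Basic profiling: row count, nulls, distinct values and top value patterns."""
    non_null = [v for v in values if v not in (None, "")]
    # Reduce each value to a pattern: letters -> A, digits -> 9, rest kept as-is
    patterns = Counter(
        "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in v)
        for v in non_null
    )
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_patterns": patterns.most_common(3),
    }

def match_score(a, b):
    """Naive fuzzy match: similarity ratio after trivial normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

names = ["Stibo Systems", "Stibo Systems A/S", "DataFlux", None, "Dataflux"]
print(profile_column(names))
print(match_score("DataFlux", "Dataflux"))  # 1.0 – identical after normalization
```

Even this toy version hints at why the automation matters: profiling surfaces nulls and inconsistent formats at a glance, and matching catches near-duplicates that exact comparison would miss.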

In recent years there has been a lot of consolidation in the market around Data Integration, Master Data Management and Data Quality, which certainly tells that the market needs Data Quality technology as a component in a bigger scheme along with other capabilities.

But some new pure-play Data Quality vendors are also being established – and I think I often see old folks from the acquired entities at these new challengers. So independent Data Quality technology is not dead and doesn’t seem to want to be.
