Proactive Data Governance at Work

A common statement in the data management realm is that data governance is 80% about people and processes and 20% (if not less) about technology.

This blog post is about the 20% (or less) technology part of data governance.

The term proactive data governance is often used to describe whether a given technology platform is able to support data governance well.

So, what is proactive data governance technology?

Obviously it must be the opposite of reactive data governance technology, which is about dealing with issues after the fact – for example discovering completeness issues through data profiling and fixing uniqueness issues through data matching.
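
As a rough illustration of that reactive flavor, here is a minimal sketch in Python of both kinds of checks run over already-captured records. The field names and sample data are made up, and real profiling and matching tools are of course far more sophisticated.

```python
# A minimal sketch of "reactive" data quality technology: profiling a batch of
# already-captured records for completeness and flagging crude duplicate
# candidates. Sample data and field names are made up for illustration.

from collections import Counter

records = [
    {"name": "A Company", "street": "Kunstlaan 99", "city": "Brussel"},
    {"name": "a company ", "street": "Kunstlaan 99", "city": "Brussel"},
    {"name": "B Company", "street": "", "city": "Antwerpen"},
]

def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    filled = sum(1 for r in records if r.get(field, "").strip())
    return filled / len(records)

for field in ("name", "street", "city"):
    print(f"{field}: {completeness(records, field):.0%} complete")

# Crude uniqueness check: flag records sharing a normalized name.
names = Counter(r["name"].lower().strip() for r in records)
duplicates = [n for n, count in names.items() if count > 1]
print("possible duplicates:", duplicates or "none")
```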

Proactive data governance technology must be implemented in data entry and other data capture functionality. The purpose of the technology is to assist people responsible for data capture in getting the data quality right from the start.

If we look at master data management (MDM) platforms we have two possible ways of getting data into the master data hub:

  • Data entry directly in the master data hub
  • Data integration by data feed from other systems such as CRM, SCM and ERP solutions and from external partners

In the first case the proactive data governance technology is a part of the MDM platform, often implemented as workflows with assistance, checks, controls and permission management. We see this most often related to product information management (PIM) and in business-to-business (B2B) customer master data management. Here the insertion of a master data entity like a product, a supplier or a B2B customer involves many different employees, each with responsibility for a set of attributes.
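
As a minimal sketch of that idea – assuming a hypothetical attribute-ownership setup, not any specific MDM product's API – onboarding a new product could be controlled like this:

```python
# A minimal sketch of attribute-level ownership in a master data onboarding
# workflow. Roles and attributes are made up for illustration.

ATTRIBUTE_OWNERS = {
    "product_manager": {"name", "category"},
    "logistics": {"weight", "dimensions"},
    "finance": {"list_price", "tax_code"},
}

def may_edit(role: str, attribute: str) -> bool:
    """Permission check: may this role fill in this attribute?"""
    return attribute in ATTRIBUTE_OWNERS.get(role, set())

def pending_work(record: dict) -> dict:
    """Which roles still have unfilled attributes before the record is complete?"""
    filled = {k for k, v in record.items() if v}
    return {role: sorted(attrs - filled)
            for role, attrs in ATTRIBUTE_OWNERS.items() if attrs - filled}

draft = {"name": "Galoshes, size 42", "category": "Footwear", "weight": ""}
print(may_edit("finance", "weight"))  # -> False, weight belongs to logistics
print(pending_work(draft))            # logistics and finance still have work to do
```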

The second case is most often seen in customer data integration (CDI) involving business-to-consumer (B2C) records, but it certainly also applies to enriching product master data, supplier master data and B2B customer master data. Here the proactive data governance technology is implemented in the data import functionality or even in the systems of entry, best done as Service Oriented Architecture (SOA) components that are hooked into the master data hub as well.
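
To make the upstream idea concrete, here is a minimal sketch of a validate-before-insert check. The rules and field names are illustrative only; the point is that the same component can be called from the hub's own data entry, from the import functionality and from the feeding CRM, SCM and ERP systems.

```python
# A minimal sketch of an upstream data quality check that can be exposed as a
# service and called before a record ever reaches the master data hub.
# The rules and field names are illustrative, not any specific product's API.

def validate_customer_entry(record: dict) -> list[str]:
    """Return a list of issues to show the user before the record is saved."""
    issues = []
    if not record.get("name", "").strip():
        issues.append("Name is required")
    postal = record.get("postal_code", "")
    if record.get("country") == "DK" and not (postal.isdigit() and len(postal) == 4):
        issues.append("Danish postal codes have exactly 4 digits")
    return issues

print(validate_customer_entry({"name": "", "postal_code": "123", "country": "DK"}))
# -> ['Name is required', 'Danish postal codes have exactly 4 digits']
```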

It is a matter of taste whether we call such technology proactive data governance support or upstream data quality. From what I have seen so far, it does work.

History of Data Quality

When did the first data quality issue occur? Wikipedia, in the section of the data quality article titled History, says that it began with the mainframe computer in the United States of America.

Fellow data quality blogger Steve Sarsfield made a blog post a few years ago called A Brief History of Data Quality, where he says: “Believe it or not, the concept of data quality has been touted as important since the beginning of the relational database”.

However, a predominant sentiment in the data quality realm is that data quality is not about technology. It is about people. People are the sinners behind data quality flaws and, as the main part of the problem, people should also be the overwhelming part, if not the only part, of the solution.

So I guess data quality challenges were introduced when people showed up in the real world. How and when that happened is itself a matter of debate, as discussed in the blog post Out of Africa.

As explained in the post Movable Types, the invention of movable type in printing some hundreds of years ago (the most important invention since someone invented the wheel for the first time) gave a big boost to knowledge sharing among people – and also a big boost to data and information quality issues.

But I think the saying “To err is human, but to really foul things up you need a computer” is valid. Consequently I also think you may need a computer to help with cleaning up the mess and to prevent the mess from happening again. End of (hi)story.    

Finding Finland

This is the fourth post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Let’s start with Finnish

Finland is situated in the northeastern corner of Europe. The Finnish language – together with Estonian and Hungarian, spoken much further south in Europe – is totally different from the languages of the neighboring countries, which are Germanic or Slavic. Swedish is also an official language in Finland, and in some parts of Finland cities and streets have both Finnish and Swedish names (usually totally different).

Galoshes

By far the largest company in Finland is the cell phone maker Nokia. Before the cell phone was invented Nokia made paper and galoshes – the old way of connecting people. From 2006 to 2008 Nokia also owned the data quality firm Identity Systems, which was then sold to Informatica. I guess Identity Systems together with the Celtic Tiger firm Similarity Systems make up the data matching capabilities at Informatica.

Syslore

One of the remaining (relatively) larger independent data matching firms in the world is Syslore. Syslore is hiding in Finland.

Previous Data Quality World Tour blog posts:

The Worst Best Sale

One of the big disappointments from my data quality tool selling days was being involved in a great license sale.

It was a new way of doing business: the initial contact was made through social media, by getting in touch with a key employee at one of the not-so-small players in worldwide multi-channel fashion selling.

From there on it was the good old way of doing business. We spent more than a year on proof of concept and price bargaining until finally standing head to head with one other competitor: the Data Quality quadrant leader owned by a company with the same name as our local airline.

Done deal – and then, a few days after, much of the business in question was outsourced. I’m actually not aware whether the outsourcing partner had some homemade data quality techniques or couldn’t care less about data quality.

But there is an unopened box with a data quality tool somewhere out there.

Business and Pleasure

The data quality and master data management (MDM) realm has many wistful songs about unrequited love with “the business”.

This morning I noticed yet another tweet on Twitter expressing the pain:

Here Gartner analyst Ted Friedman foresees the doom of MDM if we don’t get at least the traction from “the business” that BI (Business Intelligence) is getting.

In my eyes everything we do in Information Technology is about “the business”. Even computer games and digital entertainment are a core part of their respective industries. I also believe that IT is part of “the business”.

“The rest of the business” does see that some disciplines belong in the IT realm. This goes for database management, programming languages and network protocols. These disciplines are not doomed at all just because it is so. “The rest of the business” couldn’t work today without these things around.

Certainly I have seen some IT-based disciplines and related tools emerge and then be doomed during my years in the IT business. Does anyone remember CASE tools?

With CASE tools I remember great expectations about business involvement in application design. But according to Wikipedia the main problems with CASE tools are (were): inadequate standardization, unrealistic expectations, slow implementation and weak repository controls.

In other words: “The rest of the business” never really got in touch with the CASE tools because they didn’t work as they were supposed to.

The business traction we now see around BI (and the enabling tools) is in my eyes very much about the tools having matured: they actually work, have become more user-friendly and seem to create useful results for “the rest of the business”.

Data quality tools and MDM tools must continue to follow that direction too, because one thing is for sure: data quality tools and MDM tools do not solve any severe problems internally in the IT part of “the business”.

It’s my pleasure being part of that.

The Art in Data Matching

I’ve just investigated a suspicious customer data match:

A Company on Kunstlaan no 99 in Brussel

was matched with high confidence with:

The Company on Avenue des Arts no 99 in Bruxelles

At first glance it perhaps didn’t look like a confident match, but I guess the computer is right.

The diverse facts are:

  • Brussels is the Belgian capital
  • Belgium has two main languages: French and Flemish (a variant of Dutch)
  • Some parts of the country are French-speaking, some parts are Flemish-speaking and the capital is both
  • Brussels is Bruxelles in French and Brussel in Flemish
  • Kunst is Flemish meaning Art (as in Dutch, German and the Scandinavian languages too)
  • Laan is Flemish meaning Avenue (same origin as Lane I guess)
  • Avenue des Arts is French meaning Avenue of Art (French is easy)

Technically, the computer in this case did the following:

  • Compared the names like “A Company” and “The Company” and found a close edit distance between the two names.
  • Remembered from some earlier occasions that “Kunstlaan” and “Avenue des Arts” were accepted as a match.
  • Remembered from numerous earlier occasions that “Brussel” (or “Brüssel”) and “Bruxelles” were accepted as a match.

It may also have been told beforehand that “Kunstlaan” and “Avenue des Arts” are two names for the same street in some Belgian address reference data, which I guess is a must when doing heavy data matching on the Belgian market.

In this case it was a global match environment not equipped with worldwide address reference data, so luckily the probabilistic learning element in the computer program saved the day.
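
A minimal sketch of those two ingredients – an edit distance for the name comparison and a remembered table of accepted equivalences for street and city names – could look like the following. The similarity threshold and the stored pairs are illustrative only, and real probabilistic matchers are considerably more elaborate.

```python
# A minimal sketch of matching with edit distance on names plus "learned"
# equivalences for streets and cities. Thresholds and pairs are illustrative.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def name_similarity(a: str, b: str) -> float:
    return 1 - edit_distance(a.lower(), b.lower()) / max(len(a), len(b))

# Equivalences "remembered" from matches accepted on earlier occasions.
LEARNED = {
    frozenset({"kunstlaan", "avenue des arts"}),
    frozenset({"brussel", "bruxelles"}),
    frozenset({"brüssel", "bruxelles"}),
}

def same_value(x: str, y: str) -> bool:
    x, y = x.lower(), y.lower()
    return x == y or frozenset({x, y}) in LEARNED

def is_match(r1: dict, r2: dict) -> bool:
    return (name_similarity(r1["name"], r2["name"]) >= 0.7
            and same_value(r1["street"], r2["street"])
            and same_value(r1["city"], r2["city"]))

a = {"name": "A Company", "street": "Kunstlaan", "city": "Brussel"}
b = {"name": "The Company", "street": "Avenue des Arts", "city": "Bruxelles"}
print(is_match(a, b))  # -> True
```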

We Will Become More Open

Yesterday I read a post called Taking Stock Of DQ Predictions For 2011 by Clarke Patterson of Informatica Corporation. Informatica is a well-established vendor within data integration, data quality and master data management. The post is based on a post called Six Data Management Predictions for 2011 by Steve Sarsfield of Talend. Talend is an open source vendor within data integration, data quality and master data management.

One of the six predictions for 2011 is: Data will become more open.

Steve’s (open source based) take on this is:

“In the old days good quality reference data was an asset kept in the corporate lockbox. If you had a good reference table for common misspellings of parts, cities, or names for example, the mind set was to keep it close and away from falling into the wrong hands.  The data might have been sold for profit or simply not available.  Today, there really is no “wrong hands”.  Governments and corporations alike are seeing the societal benefits of sharing information. More reference data is there for the taking on the internet from sites like data.gov and geonames.org.  That trend will continue in 2011.  Perhaps we’ll even see some of the bigger players make announcements as to the availability of their data. Are you listening Google?”

Clarke’s (proprietary software based) take is as follows:

“As data becomes more open, data quality tools will need to be able to handle data from a greater number of sources used for a broader number of purposes.  Gone are the days of single domain data manipulation.  To excel in this new, open market, you’ll need a data quality tool that can profile, cleanse and monitor data regardless of domain, that is also locale-aware and has pre-built rules and reference data.”

I agree with both views, which by the way fall on each of The Two Sides To The IT Coin – Data Centric IT vs Process Centric IT – as explained by Robin Bloor in another recent post on the blog of data integration vendor Pervasive Software.

Steve’s and Clarke’s perspectives are also close to me, as my 2011 to-do list includes:

  • Involvement in a solution called iDQ (instant Data Quality). The solution is about how we can help system users doing data entry by adding some easy-to-use technology that explores the cloud for relevant data related to the entry being made – see the sketch after this list.
  • Helping to enhance a hot MDM hub solution with further data quality and multi-domain capabilities.
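
As a taste of what exploring the cloud during data entry could look like, here is a minimal sketch using the public GeoNames search service mentioned in Steve’s quote above. The function name is my own invention, and you need to register your own GeoNames username, as the “demo” account is heavily rate-limited.

```python
# A minimal sketch of cloud-assisted data entry: look the typed value up in
# open reference data and offer standardized suggestions. Uses the public
# GeoNames search service; register your own username, as the "demo"
# account is heavily rate-limited.

import json
import urllib.parse
import urllib.request

def suggest_places(query: str, username: str = "demo", max_rows: int = 5):
    params = urllib.parse.urlencode(
        {"q": query, "maxRows": max_rows, "username": username})
    url = f"http://api.geonames.org/searchJSON?{params}"
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return [(p["name"], p.get("countryName", "")) for p in data.get("geonames", [])]

# A user typing "Kobenhavn" could be offered the standardized "København, Denmark".
print(suggest_places("Kobenhavn"))
```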

Technology and Maturity

A recurring subject for me and many others is talking and writing about people, processes and technology: which one is most important, in what sequence they must be addressed and, which is my main concern, how they must be aligned.

As we practically always refer to the three elements in the same order – people, processes and technology – there is certainly an implicit sequence.

If we look at maturity models related to data quality, we recognize that order there too.

At the low maturity levels people are the most important aspect, the subject that needs the first and most attention, and the main enabler for starting to move up through the levels.

Then at the middle levels processes are the main concern, as business process reengineering enables going up the levels.

At the top levels we see implemented technology as a main component in the description of being there.    

An example of the growing role of technology is (not surprisingly, of course) found in the data governance maturity model from the data quality tool vendor DataFlux.

One thing is sure though: You can’t move your organization from the low level to the high level by buying a lot of technology.

It is an evolutionary journey where the technology part comes naturally, step by step, by taking over more and more of the either trivial or extremely complex work done by people, and where technology becomes an increasingly integrated and automated part of the business processes.

Diversity in Data Quality in 2010

Diversity in data quality is a favorite topic of mine and diversity has been my theme word in social media engagement this year.

Fortunately I’m not alone. Others have been writing about diversity in data quality in the past year. Here are some of the contributions I remember:

The Dutch data quality tool vendor Human Inference has a blog called Data Value Talk. Here several posts are about diversity in data quality, including the post World Languages Day – Linguistic diversity rules in Switserland!

Another blog based in the Netherlands is from Graham Rhind. Graham (a Brit stranded in Amsterdam) is an expert in international issues with data quality, and one of his blog posts this year is called Robert the Carrot.

The MDM vendor IBM Initiate has a lively blog about Master Data Management and Data Quality. One of the posts this year was an introduction to a webinar. The post by Scott Schumacher (in which I’m proud to be mentioned) is called Join Us to Demystify Multi-Cultural Name Matching.

Rich Murnane posted a funny but instructive video with Derek Sivers about Japanese addresses called What is the name of that block? (Again, thanks Rich for the mention).

In the eLearningCurve free webinar series there was a very educational session with Kathy Hunter called Overcoming the Challenges of Global Data.  There is also an interview with Kathy Hunter on the DataQualityPro site.

I also remember we debated the state of the art of data quality tools when it comes to international data in the post by Jim Harris called OOBE-DQ, Where Are You? As Jim mentions in his later post called Do you believe in Magic (Quadrants)?: “It must be noted that many vendors (including the “market leaders”) continue to struggle with their International OOBE-DQ”.

I guess that international capabilities in data quality tools and party master data management solutions will be on the agenda in 2011 as well.

Automation

The article on Wikipedia about automation begins like this:

“Automation is the use of control systems and information technologies to reduce the need for human work in the production of goods and services. In the scope of industrialization, automation is a step beyond mechanization. Whereas mechanization provided human operators with machinery to assist them with the muscular requirements of work, automation greatly decreases the need for human sensory and mental requirements as well. Automation plays an increasingly important role in the world economy.

Automation has had a notable impact in a wide range of industries beyond manufacturing (where it began). Once-ubiquitous telephone operators have been replaced largely by automated telephone switchboards and answering machines.”

Often we discuss the role of technology in solving data and information quality issues. Viewpoints differ between:

  • Technology may be part of the problem, but should not be part of the solution
  • Tools may solve a certain part of the problems by automating otherwise time-consuming processes

I am deliberately not stating the extreme viewpoint that tools (or a certain tool) will solve everything, as I have never seen or heard that viewpoint, as mentioned in the post Data Quality Tool Exaggerations.

So, given that range, my viewpoint is the second of the two mentioned above.

If you, surprisingly, should have a more extreme viewpoint, you may go to the OCDQ Blog post called What Does Data Quality Technology Want? and vote for the second option there.
