Liliendahl on Data Quality

A Prince and a Princess

8th January 2011, Henrik Gabs Liliendahl

Even though I’m not a royalist, I’m afraid this will be the second hypocritical blog post within a year with a royal introduction. The first one was about Royal Exceptions.

The big news on all channels today in Denmark (and Australia) is that the (Australian-born) Crown Princess Mary has given birth to twins: a boy and a girl, that is, a prince and a princess, or, as we say in blunt data quality language, a male and a female.

The gender of individuals has always been a prominent element in party master data management, not least in data matching.

Right now we are having a discussion in the LinkedIn Data Matching group concerning Data Quality of Gender / Sex Codes and the Impacts on Identity Data Matching.

So far we have covered issues such as:

Trustworthiness of assigned gender codes
Scoring mechanisms in matching including gender codes
Diversity impact in assigning/verifying gender from names
Using gender codes for salutation
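As a minimal sketch of two of the topics above, scoring and salutation, here is one way a matching routine might treat gender codes, assuming a simple M/F/U code set. The names `Gender`, `gender_adjustment` and `salutation`, and the weights, are hypothetical and not taken from any particular matching tool. An unknown code contributes nothing to the score, reflecting the trustworthiness concern: a missing code should neither support nor contradict a match.

```python
from enum import Enum


class Gender(Enum):
    MALE = "M"
    FEMALE = "F"
    UNKNOWN = "U"


# Hypothetical weights; a real matching engine would tune these on its own data.
# Agreement is weak evidence (roughly half of all people share a gender),
# while a conflict between two trusted codes is stronger evidence against a match.
AGREE_WEIGHT = 0.1
DISAGREE_WEIGHT = -0.3


def gender_adjustment(a: Gender, b: Gender) -> float:
    """Score adjustment contributed by the gender codes of two candidate records."""
    if Gender.UNKNOWN in (a, b):
        return 0.0  # missing code: neither supports nor contradicts a match
    return AGREE_WEIGHT if a == b else DISAGREE_WEIGHT


def salutation(gender: Gender, given_name: str, surname: str) -> str:
    """Pick a salutation, falling back to a neutral full-name form when gender is unknown."""
    if gender is Gender.MALE:
        return f"Dear Mr {surname}"
    if gender is Gender.FEMALE:
        return f"Dear Ms {surname}"
    return f"Dear {given_name} {surname}"
```

The asymmetric weights are only a design choice for the sketch; if gender codes were assigned from names rather than verified, even the disagreement penalty should probably be softened.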

Please join the discussion, and if you are not already a member of the LinkedIn Data Matching group: Join the group here.

Right the First Time

6th January 2011, Henrik Gabs Liliendahl

Since I have just relocated (and we have just passed the New Year's resolution point), I have become a member of the nearby fitness club.

Guess what: They got my name, address and birthday absolutely right the first time.

Now, this could have been because the young lady at the counter is a magnificent data entry person. But I think her main competency is, rightfully, being a splendid fitness instructor.

What she did was ask for my citizen ID card and take the data from there. A little less privacy, yes, but surely a lot better for data quality, or, you might say, data fitness (credit Frank Harland).

A Data Quality Immaturity Model

5th January 2011, Henrik Gabs Liliendahl

There are several maturity models related to data quality out there. I have found a good collection in this document from NASCIO.

I guess the mother of all maturity models is the Capability Maturity Model (CMM), which is related to software development.

There is also a parody of that model called the Capability Immaturity Model (CIMM). Inspired by an article by Jill Dyché, published yesterday on Information Management and called Anti-Predictions for 2011, I have found that the CIMM model is easily adapted to a data quality immaturity model with levels ranging from zero to minus three:

0 : Negligent

The organization pays lip service, often with excessive fanfare, to implementing data quality processes, but lacks the will to carry through the necessary effort. Whereas level 1 assumes eventual success in producing and measuring quality data, level 0 organizations generally have no idea how horrible the quality of their data assets actually is.

-1 : Obstructive

Processes, however inappropriate and ineffective, are implemented with rigor and tend to obstruct work. Adherence to process is the measure of success in a level -1 organization. Any actual creation of quality data is incidental. The quality of any data is not assessed, presumably on the assumption that if the proper process was followed, high quality data is guaranteed.

-2 : Contemptuous

While processes exist, they are routinely ignored by the staff and those charged with overseeing the processes are regarded with hostility. Measurements are fudged to make the organization look good.

-3 : Undermining

Not content with faking their own performance, undermining departments within the organization routinely work to downplay and sabotage the efforts of rival departments. This is worst where company policy causes departments to compete for scarce resources, which are allocated to the loudest advocates.

Technology and Maturity

4th January 2011, Henrik Gabs Liliendahl

A recurring subject for me and many others is talking and writing about people, processes and technology, including which one is most important, in what sequence they must be addressed and, my main concern, how they must be aligned.

As we practically always refer to the three elements in the same order, people, processes and technology, there is certainly an implicit sequence.

If we look at maturity models related to data quality we will recognize that order too.

In the low maturity levels, people are the most important aspect and the subject that needs the first and most attention, and people are the main enablers for starting to move up the levels.

Then, in the middle levels, processes are the main concern, as business process reengineering enables moving up the levels.

At the top levels, implemented technology is a main component of the description of being there.

An example of the growing role of technology can be seen (not surprisingly, of course) in the data governance maturity model from the data quality tool vendor DataFlux.

One thing is sure though: You can’t move your organization from the low level to the high level by buying a lot of technology.

It is an evolutionary journey where technology comes in naturally, step by step, by taking over more and more of the work done by people that is either trivial or extremely complex, and where technology becomes an increasingly integrated and automated part of the business processes.

1/1/11

1st January 2011 (updated 2nd January 2011), Henrik Gabs Liliendahl

Date formats have always been a troublemaker.

1/1/11 is one format for expressing today’s date. 2011/01/01 is another one. 1st January 2011 is a third way. January 1, 2011 is a fourth way.

That is, of course, provided you use the Gregorian calendar and don’t live far east of me, where it’s already a new day when I publish this post.

1/1/11 is not one of those days where we have the usual confusion between the American way of expressing a date, using the sequence month/day/year, and the common, straightforward European sequence, day/month/year.

But in a few hours, when it’s 2/1/11 in Europe, and some hours later, when it’s 1/2/11 in North America, we will be confused.
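The confusion can be made concrete with a small sketch using Python's standard `datetime.strptime`. The convention labels "DMY" and "MDY" and the helpers `to_iso` and `is_ambiguous` are hypothetical names for illustration. A slash-separated date is ambiguous when both readings are valid dates and differ; 1/1/11 reads the same either way, while 2/1/11 does not. Writing the result as ISO 8601 (YYYY-MM-DD) is one way to get a single unambiguous form.

```python
from datetime import datetime

# Hypothetical labels: "DMY" is the European day/month/year order,
# "MDY" the American month/day/year order. %y maps "11" to 2011.
FORMATS = {"DMY": "%d/%m/%y", "MDY": "%m/%d/%y"}


def to_iso(text: str, convention: str) -> str:
    """Parse a slash-separated date under one convention and return ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, FORMATS[convention]).date().isoformat()


def is_ambiguous(text: str) -> bool:
    """True if the DMY and MDY readings are both valid and give different dates."""
    readings = set()
    for fmt in FORMATS.values():
        try:
            readings.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not a valid date under this convention, e.g. month 13
    return len(readings) > 1
```

For example, `to_iso("2/1/11", "DMY")` gives "2011-01-02" while `to_iso("2/1/11", "MDY")` gives "2011-02-01", so the same string names two different days unless the convention is recorded alongside it.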

So, data quality folks, remember to put your dates in a unique format starting from tomorrow, the 2nd January 2011, or, if you like, January 2, 2011.

Happy New Unique Year.

Superb Bad Data

29th December 2010, Henrik Gabs Liliendahl

When working with data and information quality, we often use words such as rubbish, poor, bad and other negative terms to describe data that needs to be enhanced in order to achieve better data quality. However, what is bad may have been good in the context where a particular set of data originated.

Right now I’m having some fun with author names.

An example of good and bad involves an author I have used several times on this blog, the late fairy tale writer whose full name is:

Hans Christian Andersen

When browsing through data, you will also meet his name represented this way:

Andersen, Hans Christian

This representation is fit for purpose when, for example, looking for a book by this author in a library, where fiction is sorted by the author’s surname.

The question is then: Do you want to have the one representation, the other representation or both?

You may also meet his name in another form in a field other than the name field. For example, there is a main street in Copenhagen called:

H. C. Andersens Boulevard

This is the real-world name of the street, holding a common form of the author’s name with only initials.
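One possible answer to the “one, the other or both” question is to store the name parts separately and derive each representation on demand. The sketch below assumes a simple split into given names and a single surname; the class `PersonName` and its methods are hypothetical, and real names with particles, multiple surnames or patronymics need a richer model. It does not attempt the Danish genitive “Andersens” used in the street name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PersonName:
    given_names: Tuple[str, ...]
    surname: str

    def display_form(self) -> str:
        """Natural order, e.g. for letters and screens."""
        return " ".join((*self.given_names, self.surname))

    def sort_form(self) -> str:
        """Surname first, e.g. for sorting fiction by author in a library."""
        return f"{self.surname}, {' '.join(self.given_names)}"

    def initials_form(self) -> str:
        """Given names reduced to initials."""
        initials = " ".join(f"{name[0]}." for name in self.given_names)
        return f"{initials} {self.surname}"


def parse_sort_form(text: str) -> PersonName:
    """Parse 'Surname, Given Names' into parts; requires exactly the comma-separated form."""
    surname, sep, given = text.partition(",")
    if not sep:
        raise ValueError(f"expected 'Surname, Given names', got {text!r}")
    return PersonName(tuple(given.split()), surname.strip())
```

Parsing “Andersen, Hans Christian” and rendering it back yields “Hans Christian Andersen”, “Andersen, Hans Christian” and “H. C. Andersen” from one stored record.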

Diversity in Data Quality in 2010

27th December 2010, Henrik Gabs Liliendahl

Diversity in data quality is a favorite topic of mine and diversity has been my theme word in social media engagement this year.

Fortunately I’m not alone. Others have been writing about diversity in data quality in the past year. Here are some of the contributions I remember:

The Dutch data quality tool vendor Human Inference has a blog called Data Value Talk. Here several posts are about diversity in data quality including the post World Languages Day – Linguistic diversity rules in Switserland!

Another blog based in the Netherlands is from Graham Rhind. Graham (a Brit stranded in Amsterdam) is an expert in international issues with data quality and one of his blog posts this year is called Robert the Carrot.

The MDM vendor IBM Initiate has a lively blog about Master Data Management and Data Quality. One of the posts this year was an introduction to a webinar. The post by Scott Schumacher (in which I’m proud to be mentioned) is called Join Us to Demystify Multi-Cultural Name Matching.

Rich Murnane posted a funny but instructive video with Derek Sivers about Japanese addresses called What is the name of that block? (Again, thanks Rich for the mention.)

In the eLearningCurve free webinar series there was a very educational session with Kathy Hunter called Overcoming the Challenges of Global Data. There is also an interview with Kathy Hunter on the DataQualityPro site.

I also remember we debated the state of the art of data quality tools when it comes to international data in the post by Jim Harris called OOBE-DQ, Where Are You? As Jim mentions in his later post called Do you believe in Magic (Quadrants)?: “It must be noted that many vendors (including the “market leaders”) continue to struggle with their International OOBE-DQ”.

I guess that international capabilities in data quality tools and party master data management solutions will be on the agenda in 2011 as well.

Happy Days

23rd December 2010, Henrik Gabs Liliendahl

Whether you are celebrating Christmas or not, whether you say Merry Christmas, Feliz Navidad, Frohe Weihnachten, Joyeux Noël, God Jul or plenty of other greetings from around the world: May these days be a wonderful time for you and yours and thanks for reading this blog.

Automation

22nd December 2010, Henrik Gabs Liliendahl

The article on Wikipedia about automation begins like this:

“Automation is the use of control systems and information technologies to reduce the need for human work in the production of goods and services. In the scope of industrialization, automation is a step beyond mechanization. Whereas mechanization provided human operators with machinery to assist them with the muscular requirements of work, automation greatly decreases the need for human sensory and mental requirements as well. Automation plays an increasingly important role in the world economy.

Automation has had a notable impact in a wide range of industries beyond manufacturing (where it began). Once-ubiquitous telephone operators have been replaced largely by automated telephone switchboards and answering machines.”

Often we discuss the role of technology in solving data and information quality issues. Viewpoints differ between:

Technology may be part of the problem, but should not be part of the solution
Tools may solve a certain part of the problems by automating otherwise time-consuming processes

I am deliberately not stating the extreme viewpoint that tools (or a certain tool) will solve everything, as I have never seen or heard anyone hold that viewpoint, as mentioned in the post Data Quality Tool Exaggerations.

So, given that range, my viewpoint is the second of the two extremes mentioned above.

If you, surprisingly, hold a more extreme viewpoint, you may go to the OCDQ Blog post called What Does Data Quality Technology Want? and vote for the second option there.
