A Prince and a Princess

Even though I’m not a royalist I’m afraid this will be the second hypocritical blog post within a year with a royal introduction.  The first one was about Royal Exceptions.

The big news on all channels today in Denmark (and Australia) is that (Australian-born) Crown Princess Mary has given birth to twins: a boy and a girl, and thereby a prince and a princess, or as we say in blunt data quality language: a male and a female.

The gender of individuals has always been a prominent element in party master data management, not least in data matching.

Right now we are having a discussion in the LinkedIn Data Matching group concerning Data Quality of Gender / Sex Codes and the Impacts on Identity Data Matching.

So far we have covered issues such as:

  • Trustworthiness for assigned gender codes
  • Scoring mechanisms in matching including gender codes
  • Diversity impact in assigning/verifying gender from names
  • Using gender codes for salutation

Please join the discussion and if you are not already a member of the LinkedIn Data Matching group: Join the group here.
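To make the scoring bullet a little more tangible, here is a minimal sketch of how a gender code could nudge a match score. It is not taken from the group discussion or from any particular tool: the function names, the M/F code set and the bonus and penalty values are all assumptions for illustration. The one design choice worth noting is that a missing or untrusted code counts as neutral rather than as a mismatch.

```python
from typing import Optional

# Hypothetical code set; real sources often carry more values.
KNOWN_CODES = {"M", "F"}


def gender_adjustment(a: Optional[str], b: Optional[str],
                      bonus: float = 0.05, penalty: float = 0.2) -> float:
    """Return a score adjustment based on two gender codes.

    Unknown or missing codes are neutral (0.0), because an
    unverified code should not count against a match.
    """
    if a not in KNOWN_CODES or b not in KNOWN_CODES:
        return 0.0
    return bonus if a == b else -penalty


def adjusted_score(name_score: float,
                   gender_a: Optional[str],
                   gender_b: Optional[str]) -> float:
    """Combine a name similarity score (0..1) with the gender adjustment."""
    raw = name_score + gender_adjustment(gender_a, gender_b)
    return max(0.0, min(1.0, raw))  # keep the result within 0..1
```

How large the penalty should be depends on how trustworthy the assigned codes are, which is exactly the first issue on the list.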


Right the First Time

Since I have just relocated (and we have just passed the new year resolution point) I have become a member of the nearby fitness club.

Guess what: They got my name, address and birthday absolutely right the first time.

Now, this could have been because the young lady at the counter is a magnificent data entry person. But I think her main competency is, quite rightly, being a splendid fitness instructor.

What she did was ask for my citizen ID card and take the data from there. A little less privacy, yes, but surely a lot better for data quality – or data fitness (credit Frank Harland), you might say.


A Data Quality Immaturity Model

There are several maturity models related to data quality out there. I have found a good collection in this document from NASCIO.

I guess the mother of all maturity models is the Capability Maturity Model (CMM). This model is related to software development.

There is also a parody model for that called the Capability Immaturity Model (CIMM). Inspired by an article yesterday by Jill Dyché on Information Management called Anti-Predictions for 2011, I have found that the CIMM model is easily adapted to a data quality immaturity model with levels from zero to minus three, like this:

0 : Negligent

The organization pays lip service, often with excessive fanfare, to implementing data quality processes, but lacks the will to carry through the necessary effort. Whereas level 1 assumes eventual success in producing and measuring quality data, level 0 organizations generally fail to have any idea about the actual horrible quality of the data assets.

-1 : Obstructive

Processes, however inappropriate and ineffective, are implemented with rigor and tend to obstruct work. Adherence to process is the measure of success in a level -1 organization. Any actual creation of quality data is incidental. The quality of any data is not assessed, presumably on the assumption that if the proper process was followed, high quality data is guaranteed.

-2 : Contemptuous

While processes exist, they are routinely ignored by the staff and those charged with overseeing the processes are regarded with hostility. Measurements are fudged to make the organization look good.

-3 : Undermining

Not content with faking their own performance, undermining departments within the organization routinely work to downplay and sabotage the efforts of rival departments. This is worst where company policy causes departments to compete for scarce resources, which are allocated to the loudest advocates.


Technology and Maturity

A recurring subject for me and many others is talking and writing about people, processes and technology including which one is most important, in what sequence they must be addressed and, which is my main concern, how they must be aligned.

As we practically always refer to the three elements in the same order (people, processes and technology), there is certainly an implicit sequence.

If we look at maturity models related to data quality we will recognize that order too.

In the low maturity levels people are the most important aspect and the subject that needs the first and most attention. People are the main enablers for starting to move up the levels.

Then in the middle levels processes are the main concern, as business process reengineering enables going up the levels.

At the top levels we see implemented technology as a main component in the description of being there.    

An example of the growing role of technology is (not surprisingly of course) in the data governance maturity model from the data quality tool vendor DataFlux.

One thing is sure though: You can’t move your organization from the low level to the high level by buying a lot of technology.

It is an evolutionary journey where the technology part comes naturally step by step by taking over more and more of the either trivial or extremely complex work done by people and where technology becomes an increasingly integrated and automated part of the business processes.


1/1/11

Date formats have always been troublemakers.

1/1/11 is one format for expressing today’s date. 2011/01/01 is another one. 1st January 2011 is a third way. January 1, 2011 is a fourth way.

That is of course given you use the Gregorian calendar and you don’t live far east from me, where it’s already a new day when I post this post.

1/1/11 is not one of those days where we have the usual confusion between the American way of expressing a date, using the sequence month/day/year, and the common, straightforward European sequence of day/month/year.

But in a few hours, when it’s 2/1/11 in Europe, and some hours later, when it’s 1/2/11 in North America, we will be confused.
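The confusion can be made concrete with a small Python sketch. The helper names here are my own invention for illustration; only `datetime.strptime` and `date.isoformat` come from the standard library. The sketch reads a short date under both conventions and reports whether the two readings differ.

```python
from datetime import date, datetime


def parse_short_date(text: str, day_first: bool) -> date:
    """Parse a d/m/yy or m/d/yy string; the caller states the convention."""
    fmt = "%d/%m/%y" if day_first else "%m/%d/%y"
    return datetime.strptime(text, fmt).date()


def is_ambiguous(text: str) -> bool:
    """True when the two conventions yield two different valid dates."""
    readings = set()
    for day_first in (True, False):
        try:
            readings.add(parse_short_date(text, day_first))
        except ValueError:
            pass  # not a valid date under this convention
    return len(readings) > 1


print(is_ambiguous("1/1/11"))  # False
print(is_ambiguous("2/1/11"))  # True
# Storing the ISO 8601 form removes the guesswork:
print(parse_short_date("2/1/11", day_first=True).isoformat())  # 2011-01-02
```

A string such as 13/1/11 is not ambiguous either, since there is no month 13, but relying on that is hardly a data quality strategy.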

So, data quality folks, remember to put your dates in a unique format starting from tomorrow, the 2nd January 2011 or, if you like, January 2, 2011.

Happy New Unique Year.


Superb Bad Data

When working with data and information quality we often use words such as rubbish, poor and bad, and other negative words, when describing data that needs to be enhanced in order to achieve better data quality. However, what is bad may have been good in the context where a particular set of data originated.

Right now I am having some fun with author names.

An example of good and bad could be an author I have used several times on this blog, namely the late fairy tale writer whose full name is:

Hans Christian Andersen

When gazing through data you will meet his name represented this way:

Andersen, Hans Christian

This representation is fit for purpose, for example, when looking for a book by this author at a library where the fiction is sorted by the surname of the author.

The question is then: Do you want to have the one representation, the other representation or both?

You may also meet his name in another form in another field than the name field. For example there is a main street in Copenhagen called:

H. C. Andersens Boulevard

This is the representation of the real-world name of the street, which holds a common form of the author’s name with only initials.
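One way to have both representations is to store one form and derive the others. The following is only a sketch with hypothetical function names, and it assumes that the last word of the display form is the surname. That assumption holds for Hans Christian Andersen but certainly not for names from every culture.

```python
def to_display_order(sort_name: str) -> str:
    """'Andersen, Hans Christian' -> 'Hans Christian Andersen'."""
    if "," not in sort_name:
        return sort_name.strip()
    surname, given = (part.strip() for part in sort_name.split(",", 1))
    return f"{given} {surname}"


def to_sort_order(display_name: str) -> str:
    """'Hans Christian Andersen' -> 'Andersen, Hans Christian'.

    Assumption: the last word is the surname.
    """
    parts = display_name.split()
    if len(parts) < 2:
        return display_name.strip()
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def to_initials_form(display_name: str) -> str:
    """'Hans Christian Andersen' -> 'H. C. Andersen'."""
    parts = display_name.split()
    if len(parts) < 2:
        return display_name.strip()
    initials = " ".join(f"{p[0]}." for p in parts[:-1])
    return f"{initials} {parts[-1]}"


print(to_display_order("Andersen, Hans Christian"))  # Hans Christian Andersen
print(to_sort_order("Hans Christian Andersen"))      # Andersen, Hans Christian
print(to_initials_form("Hans Christian Andersen"))   # H. C. Andersen
```

Going the other way, from initials back to the full given names, is not possible without a reference source, which is one reason the fuller form is the safer one to store.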


Diversity in Data Quality in 2010

Diversity in data quality is a favorite topic of mine and diversity has been my theme word in social media engagement this year.

Fortunately I’m not alone. Others have been writing about diversity in data quality in the past year. Here are some of the contributions I remember:

The Dutch data quality tool vendor Human Inference has a blog called Data Value Talk. Here several posts are about diversity in data quality including the post World Languages Day – Linguistic diversity rules in Switserland!

Another blog based in the Netherlands is from Graham Rhind. Graham (a Brit stranded in Amsterdam) is an expert in international issues with data quality and one of his blog posts this year is called Robert the Carrot.

The MDM Vendor IBM Initiate has a lively blog about Master Data Management and Data Quality. One of the posts this year was an introduction to a webinar. The post by Scott Schumacher (in which I’m proud to be mentioned) is called Join Us to Demystify Multi-Cultural Name Matching.

Rich Murnane posted a funny but instructive video with Derek Sivers about Japanese addresses called What is the name of that block? (Again, thanks Rich for the mention).

In the eLearningCurve free webinar series there was a very educational session with Kathy Hunter called Overcoming the Challenges of Global Data.  There is also an interview with Kathy Hunter on the DataQualityPro site.

I also remember we debated the state of the art of data quality tools when it comes to international data in the post by Jim Harris called OOBE-DQ, Where Are You? As Jim mentions in his later post called Do you believe in Magic (Quadrants)?: “It must be noted that many vendors (including the “market leaders”) continue to struggle with their International OOBE-DQ”.

I guess that international capabilities in data quality tools and party master data management solutions will be on the agenda in 2011 as well.


Automation

The article on Wikipedia about automation begins like this:

“Automation is the use of control systems and information technologies to reduce the need for human work in the production of goods and services. In the scope of industrialization, automation is a step beyond mechanization. Whereas mechanization provided human operators with machinery to assist them with the muscular requirements of work, automation greatly decreases the need for human sensory and mental requirements as well. Automation plays an increasingly important role in the world economy.

Automation has had a notable impact in a wide range of industries beyond manufacturing (where it began). Once-ubiquitous telephone operators have been replaced largely by automated telephone switchboards and answering machines.”

Often we discuss the role of technology in solving data and information quality issues. Viewpoints differ between:

  • Technology may be part of the problem, but should not be part of the solution
  • Tools may solve a certain part of the problems by automating otherwise time-consuming processes

I am deliberately not stating the extreme viewpoint that tools (or a certain tool) will solve everything, as I have never seen or heard that viewpoint, as mentioned in the post Data Quality Tool Exaggerations.

So, given that range, my viewpoint is the second extreme viewpoint of the ones mentioned above.

If, surprisingly, you should have a more extreme viewpoint, you may go to the OCDQ Blog post called What Does Data Quality Technology Want? and vote for the second option there.


Referrers

I have earlier written about how search terms are one way people get to my blog in the post Picture This.

Another way is being referred from other sources. Lately WordPress, which is my blog service, improved the statistics so that the referring sources are consolidated, which gives you much more meaningful information about your referrers.

My current all-time statistics look like this:

At the time the total number of pageviews was 46,263.

LinkedIn seems to be my main supplier of readers. I am regularly sharing my posts as status updates and as news items in different LinkedIn groups.

But I do think that the figures for Twitter are misleading, as they are counted based on where the tweets and re-tweets are read. Twitter is probably only the Twitter site. Hootsuite is another way of reading and clicking on links to a blog in a tweet. People who read and click via TweetDeck are, as I understand it, not counted as a referring source, as TweetDeck is a desktop application.

Though I write in English, I do from time to time post user blogs and comments with links on Danish language sources such as the local Computerworld and another IT online news site called Version2.

When someone, who in my case is mainly Rich Murnane I think, StumblesUpon a blog post, you sometimes get a lot of pageviews within an hour or so.

Otherwise, Jim Harris’s blog called OCDQ Blog is a constant source of referrals, either due to Jim’s kind links to my blog posts or my self-promoting links in my comments on Jim’s blog posts.
