A Data Quality Immaturity Model

There are several maturity models related to data quality out there. I have found a good collection in this document from NASCIO.

I guess the mother of all maturity models is the Capability Maturity Model (CMM), which originated in software development.

There is also a parody of that model called the Capability Immaturity Model (CIMM). Inspired by yesterday's article by Jill Dyché on Information Management called Anti-Predictions for 2011, I have found that the CIMM model is easily adapted into a data quality immaturity model with levels from zero to minus three, as follows:

0 : Negligent

The organization pays lip service, often with excessive fanfare, to implementing data quality processes, but lacks the will to carry through the necessary effort. Whereas level 1 assumes eventual success in producing and measuring quality data, level 0 organizations generally have no idea how horrible the quality of their data assets actually is.

-1 : Obstructive

Processes, however inappropriate and ineffective, are implemented with rigor and tend to obstruct work. Adherence to process is the measure of success in a level -1 organization. Any actual creation of quality data is incidental. The quality of any data is not assessed, presumably on the assumption that if the proper process is followed, high-quality data is guaranteed.

-2 : Contemptuous

While processes exist, they are routinely ignored by the staff and those charged with overseeing the processes are regarded with hostility. Measurements are fudged to make the organization look good.

-3 : Undermining

Not content with faking their own performance, undermining departments within the organization routinely work to downplay and sabotage the efforts of rival departments. This is worst where company policy causes departments to compete for scarce resources, which are allocated to the loudest advocates.

Technology and Maturity

A recurring subject for me and many others is talking and writing about people, processes and technology: which one is most important, in what sequence they must be addressed and, my main concern, how they must be aligned.

As we practically always refer to the three elements in the same order, people, processes and technology, there is certainly an implicit sequence.

If we look at maturity models related to data quality we will recognize that order too.

At the low maturity levels people are the most important aspect, the subject that needs the first and most attention, and the main enablers for starting to move up the levels.

Then, at the middle levels, processes are the main concern, as business process reengineering enables going up the levels.

At the top levels, implemented technology is a main component in the description of being there.

An example of the growing role of technology is found (not surprisingly, of course) in the data governance maturity model from the data quality tool vendor DataFlux.

One thing is sure though: You can’t move your organization from the low level to the high level by buying a lot of technology.

It is an evolutionary journey where the technology part comes naturally, step by step, as technology takes over more and more of the either trivial or extremely complex work done by people, and becomes an increasingly integrated and automated part of the business processes.

Top 5 Reasons for Downstream Cleansing

I guess every data and information quality professional agrees that, when fighting bad data, upstream prevention is better than downstream cleansing.

Nevertheless, most work in fighting bad data quality is done as downstream cleansing, and not least the deployment of data quality tools is made downstream, where tools outperform manual work in heavy-duty data profiling and data matching, as explained in the post Data Quality Tools Revealed.
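To give a flavour of what such heavy-duty profiling automates, here is a minimal sketch in Python using pandas. The customer table, its columns and the four-digit postal code rule are made-up examples for illustration; a real profiling tool of course does far more.

```python
import pandas as pd

# Hypothetical extract of a customer table; in practice this would be
# read from a database or a flat file.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "b@example", "c@example.com"],
    "postal_code": ["1000", "10O0", "2100", None],
})

# Basic column profiling: completeness and uniqueness per column.
profile = pd.DataFrame({
    "non_null_pct": customers.notna().mean() * 100,
    "distinct_values": customers.nunique(dropna=True),
})
print(profile)

# A simple format check: flag postal codes that are not exactly four digits
# (a hypothetical rule for this example).
bad_postal = customers[~customers["postal_code"].fillna("").str.fullmatch(r"\d{4}")]
print(bad_postal)
```

The point is not the few lines of code but that these checks run unchanged over millions of rows, which is where manual inspection breaks down.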

In my experience the top 5 reasons for doing downstream cleansing are:

1) Upstream prevention wasn’t done

This is an obvious one. At the time you decide to do something about bad data quality the right way, by finding the root causes, improving business processes, affecting people's attitudes, building a data quality firewall and all that jazz, you still have to do something about the bad data already in the databases.

2) New purposes show up

Data quality is said to be about data being fit for purpose and meeting the business requirements. But new purposes will show up and new requirements will have to be met in an ever-changing business environment. Therefore you will have to deal with Unpredictable Inaccuracy.

3) Dealing with external born data

Upstream isn't necessarily in your company, as data in many cases is entered Outside Your Jurisdiction.

4) A merger/acquisition strikes

When data from two organizations that have had different requirements and different data governance maturity is to be merged, something has to be done. Some of the challenges are explained in the post Merging Customer Master Data.

5) Migration happens

Moving data from an old system to a new system is a good chance to do something about poor data quality and start over the right way, and oftentimes you can't even migrate some data without improving the data quality. You only have to figure out when to cleanse in data migration.
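To illustrate the matching side of reasons 4 and 5, here is a minimal sketch of finding duplicate candidates before loading records into a new system. The records, the crude similarity measure and the 0.6 threshold are assumptions for this example only; real matching tools use far more sophisticated techniques such as phonetic keys, tokenization and weighted field comparison.

```python
from difflib import SequenceMatcher

# Hypothetical customer records from a legacy system to be migrated.
legacy_records = [
    {"id": 101, "name": "Acme Corporation", "city": "Copenhagen"},
    {"id": 102, "name": "ACME Corp.", "city": "Copenhagen"},
    {"id": 103, "name": "Beta Industries", "city": "Aarhus"},
]

def similarity(a: str, b: str) -> float:
    """Crude name similarity on normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# Pairwise comparison within the same city to find duplicate candidates
# before the records are loaded into the new system.
candidates = []
for i, rec_a in enumerate(legacy_records):
    for rec_b in legacy_records[i + 1:]:
        if rec_a["city"] == rec_b["city"] and similarity(rec_a["name"], rec_b["name"]) > 0.6:
            candidates.append((rec_a["id"], rec_b["id"]))

# With this sample data, records 101 and 102 should be flagged for review
# and survivorship decisions before migration.
print(candidates)
```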

Data Quality Milestones

I have a page on this blog with the heading "Data Quality 2.0". The page is about what, in my opinion, the near future will bring in the data quality industry. In recent days there have been some comments on the topic. My current summing up of the subject is this:

Data Quality X.X are merely maturity milestones where:

Data Quality 0.0 may be seen as a laissez-faire state where nothing is done.

Data Quality 1.0 may be seen as projects for improving downstream data quality, typically using batch cleansing with nationally oriented techniques, in order to make data fit for purpose.

Data Quality 2.0 may be seen as agile implementation of enterprise-wide and small business data quality upstream prevention, using multi-cultural combined techniques exploiting cloud-based reference data, in order to maintain data fit for multiple purposes.
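To make the upstream idea concrete, here is a minimal sketch of validating a record against a cloud-based reference data service at the point of entry, instead of cleansing it in batch afterwards. The endpoint, parameters and response fields below are purely hypothetical assumptions for illustration and do not refer to any particular service.

```python
import requests

# Illustrative only: this endpoint and its response format are made up
# to show the pattern of checking against cloud-based reference data
# at the point of entry rather than cleansing downstream in batch.
REFERENCE_API = "https://reference-data.example.com/v1/addresses/verify"

def verify_address_at_entry(address: dict) -> dict:
    """Call an external reference-data service before the record is saved.
    Returns the (possibly standardized) address or raises on rejection."""
    response = requests.post(REFERENCE_API, json=address, timeout=5)
    response.raise_for_status()
    result = response.json()
    if not result.get("valid", False):
        raise ValueError(f"Address rejected at entry: {result.get('reason')}")
    # Prefer the standardized form returned by the reference service.
    return result.get("standardized", address)

# Hypothetical usage in a data entry flow:
# clean_address = verify_address_at_entry({"street": "Main St 1", "city": "Copenhagen"})
```

The design point is that the validation, and the standardization it brings back, happens before the record ever reaches the database, so downstream cleansing has less to do.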