A recent infographic prepared by Trillium Software highlights a fact about data quality that I have been preaching for a long time:

This number is (roughly) sourced from a 2002 study by Wayne W. Eckerson of The Data Warehouse Institute:
So, in the fight against bad data quality, a good place to start is helping data entry personnel do it right the first time.

One way of achieving that is to cut down on the amount of data being entered. This may be done by picking the data from sources already available instead of retyping it and introducing those annoying flaws.
If we look at the two most prominent master data domains, some ideas will be:
- In the product domain I have seen my share of product descriptions and specifications being reentered as they flow down the supply chain of manufacturers, distributors, re-sellers, retailers and end users. Better batch interfaces with data quality controls are one way of coping with that. Social collaboration is another, as described in the post Social PIM.
- In the customer, or rather party, domain we have seen an uptake in the use of address validation. That is good. However, it is not good enough, as discussed in the post Beyond Address Validation.
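To make the address validation idea concrete, here is a minimal sketch of checking and standardizing an address at entry time. The reference postal codes and suffix rules below are hypothetical placeholders; a real deployment would call a postal reference service or licensed address directory instead.

```python
# Hypothetical reference data standing in for a real postal directory.
REFERENCE_POSTAL_CODES = {
    "90210": "Beverly Hills",
    "10001": "New York",
}

# Hypothetical standardization rules for common street suffixes.
SUFFIX_MAP = {"st": "Street", "ave": "Avenue", "rd": "Road"}


def validate_address(street, postal_code, city):
    """Return (is_valid, standardized_address, issues)."""
    issues = []

    # Standardize the street suffix so "Main st" and "Main Street" match.
    parts = street.split()
    if parts and parts[-1].lower().rstrip(".") in SUFFIX_MAP:
        parts[-1] = SUFFIX_MAP[parts[-1].lower().rstrip(".")]
    standardized_street = " ".join(parts)

    # Check the postal code and city against the reference data.
    ref_city = REFERENCE_POSTAL_CODES.get(postal_code)
    if ref_city is None:
        issues.append("unknown postal code")
    elif ref_city.lower() != city.lower():
        issues.append(
            f"postal code {postal_code} belongs to {ref_city}, not {city}"
        )

    standardized = f"{standardized_street}, {postal_code} {city}"
    return (not issues, standardized, issues)
```

Catching the flaw while the data entry clerk is still on the record is far cheaper than cleansing it downstream, which is exactly the first-time-right point made above.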