When working with data quality improvement it is crucial to be able to monitor how your various ways of getting better data quality is actually working. Are things improving? What measures are improving and how fast? Are there things going in the wrong direction?
Recently I had a demonstration by Kasper Sørensen, the founder of the open source data quality tool called DataCleaner. The new version 3.0 of the tool has comprehensive support of monitoring how data quality key performance indicators develop over time.
What you do is that you take classic data quality assessment features as data profiling measurements of completeness and duplication counting. The results from periodic executing of these features are then attached to a timeline. You can then visually asses what is improving, at what speed and eventually if anything is not developing so well.
Continuously monitoring how data quality key performance indicators are developing is especially interesting in relation to using concepts of getting data quality right the first time and follow up by ongoing data maintenance through enrichment from external sources.
In a traditional downstream data cleansing project you will typically measure completeness and uniqueness two times: Once before and once after the executing.
With upstream data quality prevention and automatic ongoing data maintenance you have to make sure everything is running well all the time. Having a timeline of data quality key performance indicators is a great feature for doing just that.
Great point. The visibility of the trend should lead to action. Even better is to turn exceptions in to clear action items so it can be clearly seen who needs to do what and then report on that.
Thanks for commenting Diane. Indeed, the nice visualization must be followed up by action when expectations aren’t met.