Let’s look at some statements:
• Business Intelligence and Data Mining is based on looking into historical data in order to make better decisions for the future.
• Some of the best results from Business Intelligence and Data Mining are achieved when looking at data in ways not done before.
• It’s a well-known fact that Business Intelligence and Data Mining is very much dependent on the quality of the (historical) data.
• We all agree that you should not start improving data quality (like anything else) without a solid business case.
• Upstream prevention of poor data quality is superior to downstream data cleansing.
Unfortunately, the wise statements above have some serious, interrelated timing issues:
• The business case can’t be established before we start to look at the data in a different way.
• Data is already stored downstream when that happens.
• In any case, we didn’t know precisely what data quality issues we had in that context before trying out possible new ways of looking at the data.
Solutions to these timing issues may be:
• Always try to have the data reflect the real-world objects they represent as closely as possible – or at least include data elements that make enrichment from external sources possible.
• Accept that downstream data cleansing will be needed from time to time and be sure to have the necessary instruments for that.
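The two bullets above can be sketched in a few lines of Python. This is a minimal, illustrative example only: the record fields, the reference table, and the cleansing rules are assumptions, not something from the post.

```python
# Minimal sketch: keep a data element (here a postal code) that makes
# enrichment from an external source possible, and apply a simple
# downstream cleansing step. All names and rules are illustrative.

# A hypothetical external reference table (in practice: an address
# directory, a business registry, or a similar external source).
CITY_BY_POSTAL_CODE = {"1050": "Copenhagen", "2100": "Copenhagen"}

def cleanse_and_enrich(record):
    """Standardize fields and enrich the record from the external reference."""
    cleaned = dict(record)
    # Downstream cleansing: trim whitespace and normalize casing.
    cleaned["name"] = record["name"].strip().title()
    # Enrichment: the stored postal code lets us fill in the city.
    cleaned["city"] = CITY_BY_POSTAL_CODE.get(record["postal_code"], "UNKNOWN")
    return cleaned

record = {"name": "  hENRIK  ", "postal_code": "2100"}
print(cleanse_and_enrich(record))
# {'name': 'Henrik', 'postal_code': '2100', 'city': 'Copenhagen'}
```

The point of the sketch is the second bullet made concrete: because the postal code was kept close to the real-world object, enrichment stays possible later, while the cleansing step is a downstream instrument you accept you will need from time to time.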
Truly remarkable post Henrik!
So many discussions of “best practices” for data quality and its related disciplines ignore the challenging reality that time is perhaps the most important dimension of data quality – meaning that the business and technical organizational evolution (or in some cases – revolution) requires a significant time investment.
And just like time cannot be made to stand still, the organization’s current “less than best” business processes also cannot be made to stop and wait for the implementation of best practices before resuming daily business activities.
Good one, Henrik!
The post highlights some critical points:
1. Time variance (data that is 3 years old may be bad, but then how do we validate it?)
2. Lack of information or visualizing ‘the need’ in building a BI/MDM/DWH business case
Generally a business case is built based on the following assumptions
1. BI would improve my top line
2. BI would help us achieve greater compliance
3. BI would help us to design products/services akin with customer expectation
4. BI would help us to understand customer behavior better
5. BI would help us ………………………………………(the list continues)
Though these aspects are addressed by a BI/DQ solution, it should be understood that the outcome relies to a large extent on the organization’s ‘data’ rather than on the BI technology itself. And as Jim says, if your IT system just replicates the ‘less than best’ business processes, bad data is going to keep flowing in!
The solution is multifold, ranging from business process re-engineering to the organization’s IT and business teams sitting together and arriving at realistic expectations when building a business case.
On a lighter note ‘Time is what prevents everything from happening at once’ ~ John Archibald Wheeler
Maybe that is what makes BI solutions even more important 🙂
Thought provoking post – well done.
Perhaps the first BI and Data Mining to be done should be “Data Profiling”. This looks at data in different ways than done before, and should highlight to “the business” what is ACTUALLY in the data, rather than what is thought to be there.
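As a rough illustration of what such profiling can reveal, here is a minimal sketch in Python. The example rows, the field name, and the metrics chosen are assumptions for illustration, not a definitive profiling implementation.

```python
# A minimal data profiling sketch: measure completeness and distinct
# values per field, to show "the business" what is ACTUALLY in the data.
from collections import Counter

def profile(rows, field):
    """Return simple profiling metrics for one field of a list of dicts."""
    values = [r.get(field) for r in rows]
    filled = [v for v in values if v not in (None, "")]
    return {
        "completeness": len(filled) / len(values),  # share of non-empty values
        "distinct": len(set(filled)),               # number of distinct values
        "top": Counter(filled).most_common(1),      # most frequent value
    }

rows = [
    {"country": "DK"}, {"country": "dk"}, {"country": ""}, {"country": "DE"},
]
print(profile(rows, "country"))
# 75% completeness and 3 distinct spellings – e.g. "DK" vs "dk" hints at
# a standardization issue the business may not expect to be there.
```

Even metrics this crude tend to surface surprises: missing values where the business assumed full coverage, and more distinct spellings of the same real-world value than anyone expected.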
I like the timeline effect, Henrik! This touches on the circular conundrum surrounding DQ. One thing that struck me was that bullet points 1-3 seem to be the very best business case for DQ.
I really like the realistic approach of the last bullet point about the fact that we need to “Accept that downstream data cleansing will be needed from time to time” …
To me, the answer to the timing issues lies in the acceptance that data quality, data governance and MDM are a set of required and inter-dependent business processes that need to be embraced if organizations are ever to get the levels of BI you’ve alluded to.
I’m interested in following this post. I have a feeling it is going to generate a lot of comments! 🙂
Another good thought provoking article Henrik.
I like the middle bullets, because too many times the business cases are not developed. Business is very fluid, and when the cases are made, they are reverse-engineered into the operational systems; eventually, the design will catch up. The catch-22…
Real world context of the data, representing the real world objects… that is a great stepping stone. Without using a crystal ball, trying to establish the real world context at least gives you the opportunity to leverage the info when views change.
Keep it up.
Thanks Jim, Satesh, Ken, William and Garnie.
Your comments add precisely some of the thoughts I had when writing the post, but in my sometimes perhaps bad habit of being very brief didn’t – or wasn’t able to – formulate. Thanks again.
Excellent, thought provoking (and response provoking) post.
The concept of time preventing everything happening at once also means that everything cannot be fixed at once. If we accept that we cannot get all the data to the quality levels we require by the time we have to use it, this will ensure businesses use data with suitable care.
Computers can be dangerous tools when used inappropriately – just because some analysis states that the answer is 2.2163757 etc. does not actually mean that it is correct. I once witnessed the digital readout of some weighing machinery being believed to be accurate to +/- 0.001% when in fact the mechanical accuracy of the weighing mechanism was +/- 10%!
If analysts and managers accept that analysis will not necessarily find ‘the answer’ but provides an indication of the answer area, further checks can then be used to refine and validate it.
The key test is for organisations to know the quality levels of their data in order to assess the level of controls and validation required on any output.
I think you are right that analysis often only provides an indication of the answer area, and the results may lead to further investigations, where data quality issues often play a role.
It was exactly that situation that made me write this post. I’m involved in a data management project (including DQ/DW/BI) where we do all the right stuff.
One of the first results from the improved reporting, where we look at data in a new way, showed a very different reality from what was perceived by the business folks in one area. At first there was mistrust, but after further investigation it looks like reality is indeed different from the prevailing perception – and there are also some data quality issues that we wrongly didn’t give high priority until now.
So next week I guess I will do both downstream cleansing and implementing upstream prevention.
I enjoy your posts. I think that for an organization and the individuals in it to truly learn the value of high-quality data, they need to feel the consequences of losing money due to being short-sighted about maintaining data quality. Without that initial learning opportunity the organization and its individuals will never begin the spiral upward toward cleaner and more tightly defined data elements.
Thanks again for your inspirational posts.
Tom, thanks a lot for your comment.