“The average financial impact of poor data quality on organizations is $9.7 million per year.” This is a quote from Gartner, the analyst firm, used to promote their services in building a business case for data quality.
While this quote rightly emphasizes that a lot of money is at stake, the quote itself carries a full load of data and information quality issues.
On the pedantic side, the use of the $ sign in international communication is problematic. The $ sign represents a lot of different currencies, such as CAD, AUD and HKD, and of course also USD.
Then it is unclear on what basis this average is measured. Is it among the 200+ million organizations in the Dun & Bradstreet Worldbase? Is it among organizations on a certain Fortune list? In what year?
Even if you knew that this is an average in a given year for organizations like yours, such an average would not help you justify allocating resources for a data quality improvement quest in your organization.
I know that the methodology provided by Gartner is actually designed to help you calculate a specific return on investment for your organization. I also know from being involved in several business cases for data quality (as well as Master Data Management and data governance) that accurately stating how any one element of your data may affect your business is fiendishly difficult.
I am afraid there is no magic around, as told in the post Miracle Food for Thought.
There are many signs that we are entering the age of business ecosystems. A recent example is an article from Digital McKinsey. This article, well worth reading, is called Adopting an ecosystem view of business technology.
In it, the authors emphasize the need to adapt traditional IT functions to the opportunities and challenges of emerging technologies that embrace business ecosystems. I fully support that sentiment.
In my eyes, some of the emerging technologies we see are largely misunderstood as something meant to stay behind the corporate walls. My favorite example is the data lake concept. I do not think a data lake will often be a success solely within a single company, as explained in the post Data Lakes in Business Ecosystems.
The rise of technology for business ecosystems will also affect the data management roles we know today. For example, a data steward will be far more focused on external data than before, as elaborated in the post The Future of Data Stewardship.
Encompassing business ecosystems in data management is of course a huge challenge, one we have to face while most enterprises still have not reached an acceptable maturity when it comes to internal data and information governance. However, letting the outside in will also help in getting data and information right, as told in the post Data Sharing Is The Answer To A Single Version Of The Truth.
The term infonomics does not yet pass unmarked through my English spellchecker, but there is some information available on Wikipedia about infonomics. Infonomics is closely related to the often-mentioned phrase in data management about seeing data / information as an asset.
Much of what I have read about infonomics and seeing data / information as an asset is related to what we call first party data. That is data that is stored and managed within your own company.
Some information is also available in relation to third party data. That is data we buy from external parties in order to validate, enrich or even replace our own first party data. An example is a recent paper from, among others, infonomics guru Doug Laney of Gartner (the analyst firm). This paper has a high value if you want to buy it, as seen here.
Anyway, the relationship between data as an asset and the value of data is obvious when it comes to third party data, as we pay a given amount of money when acquiring it.
Second party data is data we exchange with our trading and other business partners. One example that has been close to me during recent years is product information that follows the exchange of goods in cross company supply chains. Here the value of the goods increasingly depends on the quality (completeness and other data quality dimensions) of the product information that follows the goods.
In my eyes, we will see an increasing focus on infonomics when it comes to exchanging goods – and the related second party data – in the future. Two basic factors will be:
Product Data Lake is the new solution for sharing product information between trading partners. While we see many viable in-house solutions for Product Information Management (PIM), there is a need for a solution to exchange product information within cross company supply chains between manufacturers, distributors and retailers.
Completeness of product information is a huge issue for self-service sales approaches as seen in ecommerce. 81 % of e-shoppers will leave a webshop when product information is lacking. The root cause of missing product information is often an ineffective cross company data supply chain, where exchange of product data is based on sending spreadsheets back and forth via email or on one-sided solutions such as PIM Supplier Portals.
However, due to the volume of product data, the velocity required to get data through and the variety of product data needed today, these solutions are in no way adequate, nor will they work for everyone. A dysfunctional environment for cross company product data exchange is hindering true digital transformation at many organizations within trade.
As a Product Information Management professional, or as a vendor company in this space, you can help manufacturers, distributors and retailers succeed with product information completeness by becoming a Product Data Lake ambassador.
The Product Data Lake addresses some of the most pressing issues in worldwide sharing of product data:
The first forward-looking professionals and vendors in the Product Information Management realm have already joined. I would love to see you as our next ambassador too.
Interested? Get in contact:
During my professional work, and not least when following the data management talk on social media, I often stumble upon sayings such as:
- IT should not drive a CRM / MDM / PIM / XXX project. The business should do that.
- IT should not be responsible for data quality. The business should be that.
I disagree with that. Not because the business should not do and be those things, but because IT should be a part of the business.
I have personally always disliked the concept of dividing a company into IT and the business. It is a concept practically only used by the IT (and IT-focused consulting) side. In my eyes, IT is part of the business just as much as marketing, sales, accounting and all the other departmental units.
With the rise of digitalization, the distinction between IT and the business becomes absolutely ridiculous – not to say dangerous.
We need business-minded IT people and IT-savvy business people to drive digitalization and take responsibility for data quality.
- IT = Information Technology
- CRM = Customer Relationship Management
- MDM = Master Data Management
- PIM = Product Information Management
Whether the data lake concept is a good idea or not is discussed very intensively in the data management social media community.
The fear, and the actual observations made, is that a data lake will become a data dump: no one knows what is in there, where it came from, who is going to clean up the mess, or who will eventually have a grip on how it should be handled in the future – if there is a future for the data lake concept.
Please folks. We have some concepts from the small data world that we must apply. Here are three of the important ones:
In short, metadata is data about data. Even though the great thing about a data lake is that the structure and all the purposes of the data do not have to be set in stone beforehand, at least all data delivered to a data lake must be described. An example of such an implementation is examined in the post Sharing Metadata.
You must also have the means to tag who delivered the data. If your data lake lives within a business ecosystem, this should include the legal entity that provided the data, as told in the post Using a Business Entity Identifier from Day One.
Above all, you must have a framework to govern ownership (who is Responsible, who is Accountable, who must be Consulted and who must be Informed), policies, standards and the other elements we know from a data governance framework. If the data lake expands across organizations by incorporating second party and third party data, we need a cross company data governance framework, as for example highlighted in Product Data Lake Documentation and Data Governance.
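To make these three concepts concrete, here is a minimal sketch of what a described, provenance-tagged and governed data lake entry could look like. All field names, the email addresses and the legal entity identifier value are hypothetical illustrations, not part of any specific product or standard.

```python
from dataclasses import dataclass, field

@dataclass
class LakeEntry:
    # Metadata: every dataset delivered to the lake must be described
    name: str
    description: str
    schema_hint: dict            # loose description; nothing set in stone

    # Provenance: who delivered the data
    provider: str                # department, person or trading partner
    provider_entity_id: str = "" # legal entity identifier for ecosystem data

    # Governance (RACI): ownership and escalation
    responsible: str = ""
    accountable: str = ""
    consulted: list = field(default_factory=list)
    informed: list = field(default_factory=list)

# Hypothetical entry for a raw product feed delivered by a distributor
entry = LakeEntry(
    name="supplier_products_raw",
    description="Unprocessed product feed from a distributor",
    schema_hint={"sku": "string", "weight": "grams"},
    provider="Distributor X",
    provider_entity_id="LEI-PLACEHOLDER",  # illustrative, not a real LEI
    responsible="data.steward@example.com",
    accountable="cdo@example.com",
)

print(entry.name, entry.provider)
```

Even this small amount of structure answers the data dump questions: what is in there, where it came from, and who is on the hook for it.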
Recently, Daniel O’Connor blogged about Three Keys to a Successful Product Data Project BEFORE You Start the Project. The number one key suggested by Daniel is to know what quality product data looks like. I agree.
Besides Daniel’s very valid points on this matter, I would like to bring data quality dimensions into the game. Looking at data quality from a completeness, timeliness, conformity, consistency and accuracy point of view will help you craft tangible measures and identify the root causes of where current culture, processes and technology lack the capabilities to meet the desired state of product data quality.
Here is my take on how to use data quality dimensions for product data:
Completeness of product data is essential for self-service sales approaches. A recent study revealed that 81 % of e-shoppers would leave a webshop with incomplete product information. The root cause of lacking product data is often a dysfunctional cross company data supply chain, as reported in the post The Cure against Dysfunctional Product Data Sharing.
Timeliness, or currency if you like, of product data is again an issue often related to challenges in cross company supply chains. You can learn more about this subject in the post How to avoid Stale Product Data.
Conformity of product data is first and foremost achieved by adhering to a public standard for product data. However, there are different international, national and industry standards to choose from. These standards also come in versions that change over time, and your variety of product groups may be best served by different standards.
Consistency of product data has to be solved in two scopes. First, consistency has to be achieved internally within your organisation by consolidating diverse silos of product master data. This is often done using a Product Information Management (PIM) solution. Second, you have to share your consistent product data with your flock of trading partners, as explained in the post What a PIM-2-PIM Solution Looks Like.
Accuracy is usually best at the root, meaning where the product is manufactured. Accuracy may then be challenged as the data is passed along the cross company supply chain, as examined in the post Chinese Whispers and Data Quality. Again, the remedy is about creating transparency in business ecosystems by using a modern data management approach, as proposed in the post Data Lakes in Business Ecosystems.
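To show how some of these dimensions can be turned into tangible measures, here is a minimal sketch of completeness, timeliness and conformity checks on a single product record. The field names, the list of required attributes and the staleness threshold are illustrative assumptions, not taken from any product data standard.

```python
from datetime import date

# Illustrative set of attributes a webshop might require per product
REQUIRED_FIELDS = ["sku", "name", "weight_g", "description"]

def completeness(record: dict) -> float:
    """Share of required attributes that are actually filled in."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f) not in (None, ""))
    return filled / len(REQUIRED_FIELDS)

def is_stale(record: dict, today: date, max_age_days: int = 365) -> bool:
    """Timeliness: flag records not updated within the allowed window."""
    return (today - record["last_updated"]).days > max_age_days

def conforms(record: dict) -> bool:
    """Conformity: a trivial stand-in for validation against a chosen standard."""
    return isinstance(record.get("weight_g"), (int, float)) and record["weight_g"] > 0

product = {
    "sku": "X-123",
    "name": "Garden hose 15 m",
    "weight_g": 1200,
    "description": "",  # missing attribute hurts completeness
    "last_updated": date(2015, 1, 10),
}

print(completeness(product))                # 0.75
print(is_stale(product, date(2017, 6, 1)))  # True
print(conforms(product))                    # True
```

Scores like these make the desired state of product data quality measurable per dimension, so you can see whether the root cause sits in culture, processes or technology.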