Gartner (the analyst firm), represented by Saul Judah, takes data quality back to basics in the recent post called Data Quality Improvement.
While I agree with the sentiment about measuring the facts as expressed in the post, I am cautious about assuming that everything is fine once data are fit for the purpose of business operations.
Some clues lie in the data quality dimensions mentioned in the post:
Accuracy (for now):
As said in the Gartner post, data are indeed temporal. The real world changes and so do business operations. By the time you have got your data fit for the purpose of use, the business operations have changed. And by the time you have re-fitted your data for the new purpose of use, the business operations have changed again.
Furthermore, most organizations can’t take all business operations into account at the same time. If you go down the fit for purpose track you will typically address a single business objective and make data fit for that purpose. Not least when dealing with master data, there are many business objectives and derived purposes of use. In my experience that leads to this conclusion:
“While we value that data are of high quality if they are fit for the intended use we value more that data correctly represent the real-world construct to which they refer in order to be fit for current and future multiple purposes”
Existence – an aspect of completeness:
The Gartner post mentions a data quality dimension called existence. I tend to see this as an aspect of the more broadly used term completeness.
For example, achieving fit for purpose completeness for product master data has been a huge challenge for many organizations within retail and distribution in recent years, as explained in the post Customer Friendly Product Master Data.
Back in 2010 I played around with the term Data Quality 3.0. This concept is about how we increasingly use external data within data management, as opposed to the traditional use of internal data, which are data that have been typed into our databases by employees or have been collected internally in other ways.
The rise of big data has definitely fueled the thinking around using external data as reported in the post Adding 180 Degrees to MDM.
There are other internal and external aspects, for example internal and external business rules, as examined in the post Two Kinds of Business Rules within Data Governance. This post has been discussed in the Data Governance Know How group on LinkedIn.
In a comment Thomas Tong says:
“It’s really fun when the internal components of governance are running smooth, giving the opportunity to focus on external connections to your data governance program. Finding the right balance between internal and external influences is key, as external governance partners can reduce the load/complexity of your overall governance program. It also helps clarify the difference between a “external standard” vs “internal standard”, as well as what is “reference data” vs “master data”… and a little preview of your probable integration strategy with external.”
This resonates very much with my mindset. Since 2010 my own data quality journey has increasingly embraced Master Data Management (MDM) and Data Governance as told in the recent blog post called Data Governance, Data Quality and MDM.
So, in my quest to coin one term for these three disciplines, I may, besides the word information, also put 3.0 into the name: “Information Quality 3.0”, hmmm …
Yesterday Daragh O Brien posted an Open Letter to my Information Quality Peers. The essence is that Daragh isn’t completely satisfied with how things are in The International Association for Information and Data Quality (IAIDQ).
That reminds me that I was a charter member of IAIDQ.
But checking now, I probably haven’t renewed the membership. This is not deliberate. It may just have slipped. Maybe, and this is one of Daragh’s points of critique, because broadcasting from IAIDQ has decreased in recent years.
> Correction: Double checking I am actually still a member. I renewed for 2 years last time (usually I’m not that careless with money). I just lost my Charter Mbr designation in the process.
Another point of critique raised by Daragh is the failed mission to make the organization truly international, as the organization has had difficulties maintaining chapters around the world.
Forming and maintaining regional chapters is about getting and upholding a critical mass of active members. An example showing that this is possible is the German Information Quality Society – Deutsche Gesellschaft für Informations- und Datenqualität e. V. However, this organization doesn’t seem to be an IAIDQ chapter, but rather another church obeying the same god.
The current unrest in IAIDQ is not the first of its kind. I remember that some years ago one of the founding members, Larry English, sent a strange email to members saying that he had quit the organization because he was not satisfied with something.
It is ironic that information quality practitioners are preaching communication and collaboration, but we don’t seem to get it when it comes to organizing our own little world.
The data governance discipline, the data quality discipline and the Master Data Management (MDM) discipline are closely related and happen to be my fields of work.
Data quality improvement is important within data governance and MDM. Furthermore you seldom see an MDM implementation without a (master) data governance work stream today.
Over time it has often been suggested that data quality should rightfully be named information quality as told in the post New Blog Name. In addition, data governance could be referred to as information governance as suggested in the Mike2 Open Methodology here.
Within MDM we have the term Product Information Management (PIM) which is partly, but maybe not fully, the same as Product MDM, as examined by Monica McDonnell of Informatica in the post PIM is Not Product MDM – Product MDM is not PIM.
Product is one of several domains within MDM, where customer (or rather party), location and asset are other domains going into multi-domain MDM as reported in the post Multi-Entity MDM vs Multidomain MDM.
While replacing the term data with the term information for data quality, data governance and for that matter (multi-domain) master data management has had limited success outside academic circles, I do see it as very suitable as part of a term covering these three disciplines as a whole.
So what should these three disciplines be called as a whole? Have you noticed any good terms or smart hypes out there? Or are they just three among several disciplines within data or information management?
When advising about and doing actual work within the data governance realm you often need to refer to openly available resources.
As data governance is still an emerging discipline, the available resources are of that nature too. There are plenty of good and insightful articles, blog posts and other pieces of information around. But when you try to put them together to work in a data governance journey, the recommendations may point in a lot of different directions.
When it comes to openly available resources offering a kind of consistent framework for a data governance programme, I have seen these two out there:
Have you found, or made available, other more or less complete journey plans for data governance out there?
A very common data quality issue is when a field in a data record is populated with more than one piece of information.
Sometimes this is done as a workaround, because we have a piece of information, but we don’t have a field with that distinct purpose of use. Then we find a more or less related existing field into which we can squeeze this additional piece of information.
But we also have some very common cases where this bad habit is required by external business rules or widespread tradition.
Legal Form in Company Names
This example is examined in the post Legal Forms from Hell.
One would think that it is time to change the bad (legally demanded) practice of mixing legal forms with company names and to serve the original purpose in another, more data quality friendly way.
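As a minimal sketch of what untangling such a field could look like, the Python snippet below splits a trailing legal form off a company name. The function name split_legal_form and the short LEGAL_FORMS list are assumptions made for illustration only; real legal forms differ by country, can appear in front of the name, and would need a maintained reference list.

```python
import re

# Hypothetical, deliberately tiny list of legal forms; real lists are per country.
LEGAL_FORMS = ["Ltd", "Inc", "LLC", "GmbH", "A/S", "AG"]

# The name, then whitespace or a comma, then a known legal form at the very end.
_LEGAL_FORM_PATTERN = re.compile(
    r"^(?P<name>.+?)[\s,]+(?P<form>"
    + "|".join(re.escape(form) for form in LEGAL_FORMS)
    + r")\.?$",
    re.IGNORECASE,
)


def split_legal_form(company_name):
    """Return (name, legal_form); legal_form is None when no known form is found."""
    text = company_name.strip()
    match = _LEGAL_FORM_PATTERN.match(text)
    if match is None:
        return text, None
    return match.group("name").strip(), match.group("form")


print(split_legal_form("Acme Trading, Inc."))
print(split_legal_form("Example Company"))
```

Keeping the name and the legal form in two fields makes it possible to match company names without the legal form getting in the way, while the legal form can still be printed where the law requires it.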
An Address Line
An address line will typically hold a couple of elements, such as a street (thoroughfare) name, a house number and maybe some kind of unit identification.
By the way, the order of street name and house number is reversed between two roughly equal parts of the world, with the exception of places where numbering within blocks between streets is the standard.
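The two orderings can be illustrated with a small Python sketch that accepts the house number either before or after the street name. The function parse_address_line and its two patterns are assumptions for illustration; unit identification and block-based numbering are not handled, and real address parsing needs country-specific rules.

```python
import re

# House number first, as in "10 Downing Street".
_NUMBER_FIRST = re.compile(r"^(?P<number>\d+[A-Za-z]?)\s+(?P<street>.+)$")
# House number last, as in "Hauptstrasse 5".
_NUMBER_LAST = re.compile(r"^(?P<street>.+?)\s+(?P<number>\d+[A-Za-z]?)$")


def parse_address_line(line):
    """Split an address line into street and house number where possible."""
    text = line.strip()
    for pattern in (_NUMBER_FIRST, _NUMBER_LAST):
        match = pattern.match(text)
        if match is not None:
            return {
                "street": match.group("street").strip(),
                "house_number": match.group("number"),
            }
    # No recognizable house number: keep the whole line as the street.
    return {"street": text, "house_number": None}


print(parse_address_line("10 Downing Street"))
print(parse_address_line("Hauptstrasse 5"))
```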
Education in Person Name
You can put professor in front of your name and even MBA – Master of Business Administration!! – after your name in the name field.
In the next few days I will put AFCM (Accidental Field Content Misuser) after my name.
Data are of high quality if they are fit for the purpose of use. This mantra has been around in the data management realm for many years.
In a recent article by Andy Hayler on CIO about MDM at Harrods there is a good example of a piece of data of such high quality. It is a product description:
XX 6621/74 BLK VNN SS TOP 969B S
This product description was nicely fit for the purpose of use when Harrods handled their product data in a material master in an ERP system, I guess. But when switching from buy-side focus to sell-side focus in a multi-channel world, this product description makes no sense to the customer.
Such problems with changing purposes of use for product master data are not only a luxury problem at Harrods but a common challenge within retail and distribution. The challenge involves having customer friendly product descriptions, a range of atomized product attributes that vary by product category and related digital assets that help the customer.
Organizations are, as explained by Andy Hayler, tackling this challenge by implementing Master Data Management (MDM) solutions – in this case those specialized in Product Information Management (PIM).
MDM is said to be about a single version of the truth. While this in the customer (or rather party) MDM world is much about achieving uniqueness by matching and merging several different representations of the same real world individual or legal entity, the main challenge in product MDM is a bit different. Here completeness is a big issue. This involves gathering several different pieces of the truth from different sources. And a certain level of completeness may be fit for the purpose of use today but not fit enough tomorrow.
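The idea of gathering pieces of the truth from several sources, and of completeness depending on the purpose of use, can be sketched in a few lines of Python. Everything here is hypothetical: the functions merge_sources and completeness, the attribute names, the sample values and the two lists of required attributes are assumptions for illustration, not taken from any actual MDM or PIM solution.

```python
def merge_sources(*sources):
    """Combine records from several sources; the first non-empty value wins."""
    merged = {}
    for source in sources:
        for attribute, value in source.items():
            if value not in (None, "") and attribute not in merged:
                merged[attribute] = value
    return merged


def completeness(record, required_attributes):
    """Share of the required attributes that are populated in the record."""
    required = list(required_attributes)
    if not required:
        return 1.0
    filled = sum(1 for name in required if record.get(name) not in (None, ""))
    return filled / len(required)


# Hypothetical sample data: an internal ERP record and an external supplier feed.
erp_record = {"sku": "P-001", "short_description": "BLK SS TOP S"}
supplier_feed = {"sku": "P-001", "colour": "black", "size": "S", "image_url": ""}

product = merge_sources(erp_record, supplier_feed)

# Hypothetical requirements for two different purposes of use.
buy_side = ["sku", "short_description"]
sell_side = ["sku", "customer_description", "colour", "size", "image_url"]

print(completeness(product, buy_side))
print(completeness(product, sell_side))
```

The same merged record is fully complete for the buy-side purpose and only partly complete for the sell-side purpose, which is the point about a level of completeness that fits today but not tomorrow.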
So, how can organizations overcome the huge task of gathering so much product data? I think it is much about Sharing Product Master Data.