If you search on Google for “data quality” you will find the ever-recurring discussion on how we can define data quality.
This is also true for the top-ranked non-sponsored articles, such as the Wikipedia page on data quality and an article from Profisee called Data Quality – What, Why, How, 10 Best Practices & More!
The two predominant definitions are that data is of high quality if the data:
- Is fit for the intended purpose of use.
- Correctly represents the real-world construct that the data describes.
Personally, I think it is a balance.
In theory I side with the latter. This is probably because I most often work with master data, where the same data serve multiple purposes.
However, as a consultant helping organizations get the funding in place and get the data quality improvement done within time and budget, I do end up on the fit-for-purpose side.
What about you? Where do you stand in this question?
One of my current engagements is within jewelry – or is it jewellery? The use of these two respectively US English and British English words is a constant data quality issue, when we try to standardize – or is it standardise? – to a common set of reference data and a business glossary within an international organization – or is it organisation?
Looking to international standards often does not settle the matter. For example, a shop that sells this kind of bijouterie may be classified with a SIC code of “Jewelry store” or a NACE code of “Retail sale of watches and jewellery in specialised stores”.
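When standardizing to a common set of reference data, one practical step is mapping the spelling variants to a single agreed canonical term. A minimal sketch, assuming a hand-maintained synonym table (the mapping and the canonical choices below are illustrative, not an established standard):

```python
# Hypothetical synonym table mapping variant spellings to the
# organization's agreed canonical reference-data terms.
SYNONYMS = {
    "jewellery": "jewelry",
    "standardise": "standardize",
    "organisation": "organization",
}

def to_canonical(term: str) -> str:
    """Return the canonical spelling for a term, falling back to
    the lower-cased input when no mapping is defined."""
    normalized = term.lower()
    return SYNONYMS.get(normalized, normalized)
```

So `to_canonical("Jewellery")` yields `"jewelry"`, and terms outside the table pass through unchanged, which keeps the glossary usable while it is still being grown.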
A pearl is a popular gemstone. Natural pearls, meaning they have occurred spontaneously in the wild, are very rare. Instead, most are farmed in fresh water and therefore, by regulation in many countries, must be referred to as cultured freshwater pearls.
My pearls of wisdom – or rather cultured freshwater pearls of wisdom – for building a business glossary and finding commonly accepted wording for the reference data to be used within your company are:
- Start looking at international standards and pick what makes sense for your organization. If you can live with only that, you are lucky.
- If not, grow the rest of the content for your business glossary and reference data by imitating the international or national standards for your industry, using your own better wording and additions that make the most sense across your company.
And oh, I know that pearls of wisdom are often used to imply the opposite of wisdom 🙂
One of the ways to ensure data quality for customer – or rather party – master data when operating in a business-to-business (B2B) environment is to on-board new entries using an externally defined business entity identifier.
By doing that, you tackle some of the most challenging data quality dimensions, such as:
- Uniqueness, by checking whether a business with that identifier already exists in your internal master data. This approach is superior to using data matching, as explained in the post The Good, Better and Best Way of Avoiding Duplicates.
- Accuracy, by having names, addresses and other information defaulted from a business directory, thus avoiding the spelling mistakes that are usually all over party master data.
- Conformity, by inheriting additional data, such as line-of-business codes and descriptions, from a business directory.
Having an external business identifier stored with your party master data helps a lot with maintaining data quality as pondered in the post Ongoing Data Maintenance.
When selecting an identifier there are different options, such as national IDs, the LEI, the DUNS Number and others, as explained in the post Business Entity Identifiers.
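The on-boarding flow described above can be sketched as code. This is a hedged illustration, not a real directory integration: `directory_lookup` stands in for whatever business-directory API you subscribe to, and the field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Party:
    duns: str              # externally defined business identifier
    name: str              # defaulted from the directory (accuracy)
    address: str           # defaulted from the directory (accuracy)
    line_of_business: str  # inherited code/description (conformity)

def onboard(duns: str, master_data: dict, directory_lookup) -> Party:
    """On-board a new B2B party keyed by its external identifier."""
    # Uniqueness: reject if this identifier already exists internally
    if duns in master_data:
        raise ValueError(f"Party with DUNS {duns} already exists")
    # Accuracy and conformity: default attributes from the directory
    record = directory_lookup(duns)
    party = Party(duns, record["name"], record["address"],
                  record["line_of_business"])
    master_data[duns] = party
    return party
```

The design point is that the identifier check replaces fuzzy matching at the door: a duplicate is an exact key collision, not a probabilistic guess.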
At the Product Data Lake service I am working on right now, we have decided to use an external business identifier from day one. I know this may be something a typical start-up would consider much later, if and when the party master data population has grown. But besides being optimistic about our service, I think it will be a win not to have to fight data quality issues later, with guaranteed increased costs.
For the identifier to use we have chosen the DUNS Number from Dun & Bradstreet. The reason is that this is currently the only business identifier with worldwide coverage. Also, Dun & Bradstreet offers some additional data that fits our business model. This includes consistent line-of-business information and worldwide company family trees.
As reported in the post Gravitational Waves in the MDM World, there is a tendency in the MDM (Master Data Management) market and in MDM programmes to encompass both the party domain and the product domain.
The party domain is still often treated as two separate domains, being the vendor (or supplier) domain and the customer domain. However, there are good reasons for seeing the intersection of vendor master data and customer master data as party master data. These reasons are most obvious when we look at the B2B (business-to-business) part of our master data, because:
- You will always find that many real-world entities have a vendor role as well as a customer role to you.
- The basic master data has the same structure (identification, names, addresses and contact data).
- You need the same third party validation and enrichment capabilities for customer roles and vendor roles.
These reasons also apply to other party roles, as examined in the post 360° Business Partner View.
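The intersection argued for above amounts to one party record carrying several roles rather than one row in each of two silos. A minimal sketch, with illustrative names and an assumed role vocabulary:

```python
from dataclasses import dataclass, field

@dataclass
class Party:
    external_id: str   # e.g. a DUNS Number
    name: str
    address: str
    # One real-world entity, any number of roles to our organization
    roles: set = field(default_factory=set)

# The same legal entity is both someone we sell to and buy from
acme = Party("123456789", "Acme Corp", "1 Main St")
acme.roles.add("customer")
acme.roles.add("vendor")
```

One golden party record means third-party validation and enrichment happen once, regardless of how many roles the entity plays.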
When we look at the product domain we also have a huge need to connect the buy side and the sell side of our business – and the make side for that matter where we have in-house production.
Multi-Domain MDM has, so to speak, the side effect of bringing the sell-side together with the buy- and make-side. PIM (Product Information Management), which we often see as the ancestor of product MDM, has the same challenge. Here we also need to bring the sell-side and the buy-side together – on three frontiers:
- Bringing the internal buy-side and sell-side together, not least when looking at product hierarchies
- Bringing our buy-side into synchronization with our upstream vendors'/suppliers' sell-side when it comes to product data
- Bringing our sell-side into synchronization with our downstream customers' buy-side when it comes to product data
One of the big news items this week was the detection of gravitational waves. The big thing about this huge step in science is that we will now be able to see things in space we could not see before. These are things we have plenty of clues about, but we cannot measure them because they do not emit electromagnetic radiation, and the light from them is absorbed or reflected by cosmic bodies or dust before it reaches our telescopes.
We have kind of the same situation in the MDM (Master Data Management) world. We know that there is such a thing as multi-domain Master Data Management, but our biggest telescope, the Gartner magic quadrants, has until now only clearly identified customer Master Data Management and product Master Data Management, as last touched upon in the post The Perhaps Second Most Important MDM Quadrant 2015 is Out.
Indeed, many MDM programmes that actually do encompass all MDM domains split the effort into traditional domains such as customer, vendor and product, with separate teams observing their part of the sky. It takes a lot to advocate that, despite vendors belonging to the buy side and customers belonging to the sell side of the organization, there are strong ties between these objects. We can detect gravity in the sense that a vendor and a customer can be the same real-world entity, and that vendors and customers share the same basic structure, being a party.
Products behave differently depending on the industry your organization belongs to. You may make products using raw materials you buy and transform into finished products you sell, and/or you may buy and sell the same physical product as a distributor, retailer or other value-adding node in the supply chain. In order to handle the drastically increased demand for product data related to eCommerce, PIM (Product Information Management) has been known for a long time, and many organizations everywhere in supply chains have already established PIM capabilities, with or without product Master Data Management, and inside or outside of it.
What we still need to detect is a good system for connecting the PIM portion of sell-sides upstream and buy-sides downstream in supply chains. Right now we only see a blurred galaxy of spreadsheets, as examined in the post Excellence vs Excel.
An important part of implementing Master Data Management (MDM) is to capture the business rules that exist within the implementing organization and build those rules into the solution. In addition, and maybe even more important, is the quest of crafting new business rules that help make master data more valuable to the implementing organization.
Examples of such new business rules that may come along with MDM implementations are:
- In order to open a business account you must supply a valid Legal Entity Identifier (like Company Registration Number, VAT number or whatever applies to the business and geography in question)
- A delivery address must be verified against an address directory (valid for the geography in question)
- In order to bring a product into business there is a minimum requirement for completeness of product information.
Creating new business rules to be part of the to-be master data regime highlights the interdependency of people, process and technology. New technology can often be the driver for taking such new business rules on board. Building on the above examples, such possibilities may be:
- The ability to support real time pick and check of external identifiers
- The ability to support real time auto completion and check of postal addresses
- The ability to support complex completeness checks of a range of data elements
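The three example rules above can be sketched as validation functions. This is only an illustration under loud assumptions: the VAT pattern is a simplistic stand-in for real, per-country registry formats, and the address directory is a stand-in for a real postal verification service.

```python
import re

def has_valid_legal_entity_id(vat_number: str) -> bool:
    """Rule 1: a business account requires a valid legal entity
    identifier. Illustrative EU-style VAT pattern only."""
    return re.fullmatch(r"[A-Z]{2}\d{8,12}", vat_number) is not None

def address_is_verified(address: str, address_directory: set) -> bool:
    """Rule 2: a delivery address must be verified against an
    address directory valid for the geography in question."""
    return address in address_directory

def product_is_complete(product: dict,
                        required=("sku", "name", "description")) -> bool:
    """Rule 3: minimum completeness of product information before
    the product is brought into business."""
    return all(product.get(f) not in (None, "") for f in required)
```

In a real solution these checks would run in real time at data entry, picking and checking against external sources rather than static tables.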
The term Data Quality 3.0 has been around on this blog for nearly 5 years and was recently aired again in the post Data Quality 3.0 Revisited.
A natural consequence of the concept of Data Quality 3.0 is something we may call Master Data Management (MDM) 3.0.
Master Data Management has until now largely been about how to manage master data internally within organizations. The goal has often been to merge different data silos within the organization into one trusted source of master data. But any organization in itself manages a master data silo too. The master data kept by any organization are to a large degree a description of real-world entities that are also digitalized by business partners and other third parties.
The possibility of sharing customer, or rather party, master data as well as product and location master data was examined in the post Third Party Data and MDM.
But how do popular MDM solutions facilitate the enormous potential of looking outside the implementing organization when it comes to achieving high-value master data? Very poorly, in general, I'm afraid. From my experience, MDM vendors stop at the point of creating more or less ready-made interfaces to popular data pools and, for product data, some kind of supplier portal. While the professional MDM vendors have viable methodologies for internal MDM processes, there is an open door to the blue sky when it comes to external collaboration.
I guess this is the time for blog posts about big things that are going to happen in 2015. But you see, we could also take a route away from the motorways and highways and see how the traditional way of life is still unfolding in the data quality landscape.
While the innovators and early adopters are fighting with big data quality, the late majority are still trying to get their heads around how to manage small data. And that is a good thing, because you cannot utilize big data without solving small data quality problems, not least around master data, as told in the post How important is big data quality?
Solving data quality problems is not just about fixing data. It is very much also about fixing the structures around data as explained in a post, featuring the pope, called When Bad Data Quality isn’t Bad Data.
A common roadblock on the way to solving data quality issues is that things that are everybody's problem tend to be no one's problem. Implementing a data governance programme is evolving as the answer to that conundrum. As with many things in life, data governance is about thinking big and starting small, as told in the post Business Glossary to Full-Blown Metadata Management or Vice Versa.
Data governance revolves a lot around people's roles, and there are also some specific roles within data governance. Data owners have been known for a long time, data stewards have been around for some time, and now we also see Chief Data Officers emerge, as examined in the post The Good, the Bad, and the Ugly Data Governance Role.
As experienced recently, somewhere in the countryside, while discussing how to get going with a big and shiny data governance programme, there is however indeed still a lot to do with trivial data quality issues, such as fields being too short to capture the real world, as reported in the post Everyday Year 2000 Problems.
The term evergreen is known from botany, as plants that stay green all year, and from music, as songs that are not just a hit for a few months but capable of generating royalties for years and years.
Data should also stay evergreen. I am a believer in the “first time right” principle, as explained in the post Instant Single Customer View. However, you must also keep your data quality fresh, as examined in the post Ongoing Data Maintenance.
If we look at customer – or rather party – Master Data Management (MDM), it is much about real-world alignment. In party master data management you describe entities such as persons and legal entities in the real world, and you should have descriptions that reflect the current state (and sometimes historical states) of these entities. One such reflection is The Relocation Event. And as even evergreen trees go away, and “My Way” hopefully will go away someday, you must also be able to perform Undertaking in MDM.
With product MDM it is much about data being fit for multiple future purposes of use as reported in the post Customer Friendly Product Master Data.
This is post number 666 on this blog. 666 is the number of the beast. Something diabolic.
The first post on my blog came out in June 2009 and was called Qualities in Data Architecture. This post was about how we should talk a bit less about bad data quality and instead focus a bit more on success stories around data quality. I haven't been able to stick to that all the time. There are so many good data quality train wrecks out there, such as the one told in the post called Sticky Data Quality Flaws.
Some of my favorite subjects around data quality were lined up in Post No. 100. They are:
The biggest thing that has happened in the data quality realm during the five years this blog has been live is probably the rise of big data. Or rather, the rise of the term big data. This proves to me that change usually starts with technology. Then, after some time, we start thinking about processes and finally about people's roles and responsibilities.