As said here: “When the story of Information Quality Management is written, the first sentence of the first paragraph will include the name Larry English”.
Larry pioneered the data quality discipline, or information quality as he preferred to call it.
He was an inspiration to many data and information quality practitioners back in the 90’s and 00’s, including me, and he paved the way for bringing this topic to the level of awareness that it has today.
In his teaching Larry emphasized the simple but powerful concepts which are the foundation of data quality and information quality methodologies:
Quantify the costs and lost opportunities of bad information quality
Always look for the root cause of bad information quality
Observe the plan-do-check-act cycle when solving information quality issues
Let us roll up our sleeves and continue what Larry started.
Every time there is a survey about what causes poor data quality the most ticked answer is human error. This is also the case in the Profisee 2019 State of Data Management Report where 58% of the respondents said that human error is among the most prevalent causes of poor data quality within their organization.
Even the Romans knew this as Seneca the Younger said that “errare humanum est” which translates to “to err is human”. He also added “but to persist in error is diabolical”.
So, how can we not persist in having human errors in data then? Here are three main approaches:
Better humans: There is a whip called Data Governance. In a data governance regime you define data policies and data standards. You build an organizational structure with a data governance council (or any better name), have data stewards and data custodians (or any better title). You set up a business glossary. And then you carry on with a data governance framework.
Machines: Robotic Process Automation (RPA) has, besides operational efficiency, the advantage that machines, unlike humans, do not make mistakes when they are tired and bored.
Data Sharing: Human errors typically occur when typing in data. However, most data are already typed in somewhere. Instead of retyping data, and thereby potentially introducing your own misspelling or other mistake, you can connect to data that is already digitalized and validated. This is especially doable for master data as examined in the article about Master Data Share.
Tomorrow there is a Marathon race in my home city Copenhagen. 8 years ago, a post on this blog revolved around some data quality issues connected with the Marathon race. The post was called How long is a Marathon?
However, another information quality issue is whether there ever was a first Marathon race run by Pheidippides. Historians today do not think so. It has something to do with data lineage. The written mention of the 42.195 (or so) kilometre effort from Marathon to Athens by Pheidippides is from Plutarch, whose records were made 500 years after the events. The first written source about the Battle of Marathon is from Herodotus. It was written (in a historian's perspective) only 40 years after the events. He did not mention the Marathon run. However, he wrote that Pheidippides ran from Athens to Sparta. That is 245 kilometres.
By the way: His mission in Sparta was to get help. But the Spartans did not have time. They were in the middle of an SAP roll-out (or something similarly festive).
Some people make the 245-kilometre track in what is called a Spartathlon. In data and information quality context this reminds me that improving data quality and thereby information quality is not a sprint. Not even a Marathon. It is a Spartathlon.
Oftentimes it still takes a human eye to establish if a number, year, term or other piece of information is wrong.
I had that experience today at Harvard Square in Cambridge (Boston) when looking at the sign in front of our lunch restaurant. Established 1271 it says. Hmmmm. North American natives were not known for establishing restaurants. Also, the Vikings did not stay that long or go that far south in North America.
The restaurant website actually admits the sign is wrong and this is a printing flaw (should have been 1971) that they have chosen to keep – maybe also in order to test the clever people hanging around Harvard.
Anyway, without attempting to turn this into a foodie blog, the food is OK but the waiting time for being served does resemble spans of centuries.
Hundreds of years ago the geocentric model was replaced by heliocentrism, meaning that we recognize that the earth travels around the sun and not the other way around.
When it comes to Product Information Management (PIM), we also need a Copernican Revolution, meaning that it is good to manage product information consistently inside a given company, but it is better to manage product information in the light of the business ecosystem where we participate.
Exchanging product information in the business ecosystems of manufacturers, distributors and merchants cannot work properly by asking all your trading partners to use your version of a spreadsheet – if they don’t get to you first with their version. Nor will self-centered supplier / customer product data portals work as examined in the post PIM Supplier Portals: Are They Good or Bad?
Your company is not a lonely planet. You are part of a business ecosystem, where you may be:
Upstream as the maker of goods and services. For that you need to buy raw materials and indirect goods from the parties being your vendors. In a data driven world you also need to receive product information for these items. You need to sell your finished products to the midstream and downstream parties being your B2B customers. For that you need to provide product information to those parties.
Midstream as a distributor (wholesaler) of products. You need to receive product information from upstream parties being your vendors, perhaps enrich and adapt the product information and provide this information to the parties being your downstream B2B customers.
Downstream as a retailer/etailer or large end user of product information. You need to receive product information from upstream parties being your vendors and enrich and adapt the product information so you will be the preferred seller to the parties being your B2B customers and/or B2C customers.
At Product Data Lake we support business ecosystems in Product Information Management (PIM). And this is not just a nice model. There are concrete business benefits too. 5 for you and 5 for your trading partner: Check our 10 business benefits.
There is a famous saying from the military world stating that: “No plan survives contact with the enemy.” At least one blogger has paraphrased it as: “No plan survives contact with the data.” A good read by the way.
Translating the extended military learning into data management makes a lot of sense too. You may plan data management activities using selected examples and you may test those using nice little samples. Like skirmishes before the real battle in warfare. But if your data management solution goes live on the full load of data for the first time, there is most often news for you.
From my data matching days I remember this clearly as explained in the post Seeing is Believing.
The mitigation is to test with a full load of data before going live. In data management we actually have a realistic way of overcoming the observation made by Field Marshal Helmuth Karl Bernhard Graf von Moltke: we can revisit our plan of operations before the second and serious contact with the full load of data.
Today I read a strange story about who discovered the Americas. According to the story, Turkish President Recep Tayyip Erdogan said that Muslims, not Columbus, discovered the Americas. The assumed discovery is said to have happened in the year 1178 in the Gregorian calendar.
2nd there is much speculation that someone else crossed the oceans. The only archaeological evidence (so far) is that the Vikings were on Newfoundland off the coast of Canada at a place today called L’Anse aux Meadows. That happened around the year 1000 in the Gregorian calendar. (By the way, they came from Greenland, which geographically is a part of the Americas).
3rd Christopher Columbus and his crew arrived in the Americas in the year 1492 in the Gregorian calendar.
That is the data quality part of the story. The rest is information quality.
The term anachronism is used for something misplaced in time. An example is classical paintings where a biblical event is shown with people in clothes from the time when the painting was done.
In data quality lingo such a flaw will be categorized as lack of timeliness.
The most frequent example of lack of timeliness, or should we say example of anachronism, in data management today is having an old postal address attached to a party master data entity. A remedy for avoiding this kind of anachronism is explained in the post The Relocation Event.
In a recent blog post called 3-2-1 Start Measuring Data Quality by Janani Dumbleton of Experian QAS the timeliness dimension in data quality is examined along with five other important dimensions of data quality. As said herein an impact of anachronism could be:
“Not being aware of a change in address could result in confidential information being delivered to the wrong recipient.”
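One way to put a number on this kind of timeliness is to count how many party records carry an address that has been verified recently. The following is only a minimal sketch: the field name `address_verified_on` and the 365-day threshold are assumptions made for illustration and are not taken from the posts mentioned above.

```python
from datetime import date


def is_address_timely(last_verified: date, today: date, max_age_days: int = 365) -> bool:
    """Return True if the address was verified no more than max_age_days ago."""
    return (today - last_verified).days <= max_age_days


def timeliness_ratio(records: list[dict], today: date, max_age_days: int = 365) -> float:
    """Share of records whose address counts as timely (between 0.0 and 1.0).

    Each record is assumed to be a dict with a hypothetical
    'address_verified_on' key holding a datetime.date.
    """
    if not records:
        raise ValueError("cannot measure timeliness on an empty set of records")
    timely = sum(
        is_address_timely(record["address_verified_on"], today, max_age_days)
        for record in records
    )
    return timely / len(records)
```

A ratio well below 1.0 would suggest that many party master data entities may be living in the past. What counts as "recent" is a business decision and will differ between, say, a newsletter list and a list used for confidential mail.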
Throughout history and within art we have seen some false positives too. Here are my three favorites:
The Piltdown Man
In 1912 a British amateur archeologist apparently found a fossil claimed to be the missing link between apes and man: the so-called Piltdown Man. Backed by the British Museum, it was accepted as a true discovery until 1953, when it was finally revealed as a hoax. It was, however, disputed during all those years but defended by the British establishment, maybe due to envy of the French, who had Cro-Magnon man first found there, and of the Germans, who had a true discovery named after Neandertal.
Eventually the Piltdown Man was exposed as a medieval human upper skull, an orangutan jawbone and chimpanzee teeth.
James and Jimmy Bond
As told in the post My Name is Bond. Jimmy Bond: James Bond is British intelligence and Jimmy Bond is an American agent. It’s always a question if two identities residing in different countries are the same as discussed (about me) in the post Hello Leading MDM Vendor.
Dupond et Dupont
In English they are known as Thomson and Thompson. In the original Belgian/French comics about the adventures of Tintin (and in the Danish editions of my childhood) they are known as Dupond et Dupont. They are two incompetent detectives who look alike and have names with a low edit distance and the same phonetic sound. For twin names in a lot of other languages check the Wikipedia article here.
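To make "low edit distance" and "same phonetic sound" concrete, here is a small sketch using two textbook techniques: Levenshtein distance for the edit distance and a plain American Soundex code as a stand-in for phonetic similarity. These are illustrative choices made here, not necessarily what any particular data matching tool uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Consonant groups of American Soundex; vowels, Y, H and W carry no code.
_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}


def soundex(name: str) -> str:
    """Four-character Soundex code; empty string if the name has no letters."""
    letters = [char for char in name.upper() if char.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    last_code = _SOUNDEX_CODES.get(letters[0], "")
    for char in letters[1:]:
        if char in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(char, "")
        if code and code != last_code:
            encoded += code
        last_code = code  # a vowel resets the code, so repeats count again
    return (encoded + "000")[:4]
```

With this sketch, `levenshtein("Dupond", "Dupont")` is 1 and both names get the Soundex code D153, so both measures flag the pair as a likely match even though they are two different detectives. The English pair is also one edit apart, yet this Soundex variant gives Thomson and Thompson different codes, a reminder that phonetic algorithms are approximations too.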