Diversity – Liliendahl on Data Quality

The Most Annoying Way of Presenting Data

30th March 202130th March 2021Henrik Gabs Liliendahl2 Comments

Polls are popular on LinkedIn and I have been a sinner of making a few too recently.

One was about what way of presenting data (data format) that is the most annoying.

There were the 4 mentioned above to choose from.

The MM/DD/YYYY date format is in use practically only in the United States. In the rest of the world either the DD/MM/YYYY format or the ISO recommended YYYY-MM-DD format is the chosen one. The data quality challenge appears when you see a date as 03/02/2021 in an international context, because this can be either March, 2 or 3^rd February.

The 12-hour clock with AM and PM postfix, is more commonly in use around the world. But obviously the 12-hour clock is not as well thought as the 24-hour clock. We need some digital transformation here.

Imperial units of measure like inch, foot, yard, pound, and more is far less logical and structured compared to the metric system. Only 3 countries around the world – United States, Myanmar and Liberia has not adopted the metric system. And then there is United Kingdom, who has adopted the metric system in theory, but not in practice.

The Fahrenheit temperature scale is something only used in the United States opposite to Celsius (centigrade) used anywhere else. When someone writes that it is 30 degrees outside that could be quite cold or rather hot if there is no unit of measure applied.

Another example of international trouble mentioned in the comments to the poll is decimal point. In English writing you will use a dot for the decimal point, in many other cultures you use a comma as decimal point.

Most of the annoyance are handled by that mature software have settings where you can set your preferences. The data quality issues arise when these data are part of a text including when software must convert a text into a number, date or time.

If you spot some grey colour (or is it color) in my hair, I blame varying data formats in CSV files, SQL statements, emails and more.

10 Years

29th June 2019Henrik Gabs Liliendahl4 Comments

This blog has now been online for 10 years.

Looking back at the first blog posts I think the themes touched are still valid.

The first post from June 2009 was about data architecture. 2000 years ago, the roman writer, architect and engineer Marcus Vitruvius Pollio wrote that a structure must exhibit the three qualities of firmitas, utilitas, venustas — that is, it must be strong or durable, useful, and beautiful. This is true today – both in architecture and data architecture – as told in the post Qualities in Data Architecture.

A recurring topic on this blog has been a discussion around the common definition of data quality as being that the data is fit for the intended purpose of use. The opening of this topic as made in the post Fit for what purpose?

brueghel-tower-of-babel — Tower of Babel by Brueghel

Diversity in data quality has been another repeating topic. Several old tales including in the Genesis and the Qur’an have stories about a great tower built by mankind at a time with a single language of all people. Since then mankind was confused by having multiple languages. And indeed, we still are as pondered in the post The Tower of Babel.

Thanks to all who are reading this blog and not least to all who from time to time takes time to make a comment, like and share.

A Master Data Mind Map

3rd May 20194th May 2019Henrik Gabs LiliendahlLeave a comment

Please find below a mind map with some of the data elements that are considered to be master data.

The map is in no way exhaustive and if you feel some more very important and common data elements should be there, please comment.

The data elements are grouped within the most common master data domains being party master data, product master data and location master data.

Some of the data elements have previously been examined in posts on this blog. This include:

Product classifications as GPC, ETIM, eClass, Harmonized System (HS) and UNSPSC in the post Five Product Classification Standards.
Product identification codes as GTIN, EAN and UPC in the post Visiting the Product Information Castle.
Duns-Number, SIREN and registration numbers in the post Single Company View.
SIC and NACE codes in the post The World of Reference Data.
ZIP, PLZ and PIN in the post Some Kinds of Reference Data.

The mind map has a selection of flags around where master data are geographically dependent. Again, this is not exhaustive. If you have examples of diversities within master data, please also comment.

Where a Major Tool is Not So Cool

1st February 201818th April 2019Henrik Gabs LiliendahlLeave a comment

During my engagements in selecting and working with the major data management tools on the market, I have from time to time experienced that they often lack support for specialized data management needs in minor markets.

Two such areas I have been involved with as a Denmark based consultant are:

Address verification
Data masking

Address verification:

The authorities in Denmark offers a free of charge access to very up to data and granular accurate address data that besides the envelope form of an address also comes with a data management friendly key (usually referred to as KVHX) on the unit level for each residential and business address within the country. Besides the existence of the address you also have access to what activity that takes place on the address as for example if it is a single-family house, a nursing home, a campus and other useful information for verification, matching and other data management activities.

If you want to verify addresses with the major international data managements tools I have come around, much of these goodies are gone, as for example:

Address reference data are refreshed only once per quarter
The key and the access to more information is not available
A price tag for data has been introduced

Data Masking:

In Denmark (and other Scandinavian countries) we have a national identification number (known as personnummer) used much more intensively than the national IDs known from most other countries as told in the post Citizen ID within seconds.

The data masking capabilities in major data management solutions comes with pre-build functions for national IDs – but only covering major markets as the United States Social Security Number, the United Kingdom NINO and the kind of national id in use in a few other large western countries.

So, GDPR compliance is just a little bit harder here even when using a major tool.

Data Masking National ID.png — From IBM Data Masking documentation

The Problem with English

14th June 2017Henrik Gabs Liliendahl4 Comments

– and many other languages

This blog is in English. However, as a citizen in a country where English is not the first language, I have a problem with English. Which flavour or flavor of English should I use? US English? British English? Or any of the many other kinds of English?

It is, in that context, more a theoretical question than a practical one. Despite what Grammar Nazis might think, I guess everyone understands the meaning in my blend of English variants and occasional other spelling mistakes.

The variants of English, spiced up with other cultural and administrative differences, does however create real data quality issues as told in the post Cultured Freshwater Pearls of Wisdom.

English When working with Product Data Lake, a service for sharing product information between trading partners, we also need to embrace languages. In doing that we cannot just pick English. We must make it possible to pick any combination of English and country where English is (one of) the official language(s). The same goes for Spanish, German, French, Portuguese, Russian and many other languages in the extend that products can be named and described with different spelling (in a given alphabet or script type).

You always must choose between standardization or standardisation.

Product Data Lake Reaches Multi-Lingual Milestone Number Three

3rd April 2017Henrik Gabs LiliendahlLeave a comment

Multi-lingual capabilities is one of the core capabilities in the product information sharing service Product Data Lake.

During our market introduction, we have had three milestones:

Product information can be exchanged in multiple languages – or rather cultures, being a combination of a language and a country. Product Data Lake was born with this core capability back in September 2016.
Product information can be defined in multiple languages. Our February 2017 release introduced metadata in multiple cultures.
Product information can be handled in multiple languages. Today we have released our multi-lingual user interface. The idea behind Product Data Lake is actually not, that you should spend much time in the user interface. You only need to set up the automation of product information exchange. Now, you have the possibility to do that in your preferred language.

Multi-Lingual DK

However, this is not a fait accompli. Mañana, there will be more feinschmeckerei. Next multi-lingual feature will be access to classification and metadata in many languages from various general and industry standards for product information starting with the ETIM standard for technical products.

You can learn more about Product Data Lake here.

I am afraid that Gartner does not help

21st March 2017Henrik Gabs LiliendahlLeave a comment

“The average financial impact of poor data quality on organizations is $9.7 million per year.” This is a quote from Gartner, the analyst firm, used by them to promote their services in building a business case for data quality.

Average While this quote rightfully emphasizes on that a lot of money is at stake, the quote itself holds a full load of data and information quality issues.

On the pedantic side, the use of the $ sign in international communication is problematic. The $ sign represents a lot of different currencies as CAD, AUD, HKD and of course also USD.

Then it is unclear on what basis this average is measured. Is it among the +200 million organizations in the Dun & Bradstreet Worldbase? Is it among organizations on a certain fortune list? In what year?

Even if you knew that this is an average in a given year for the likes of your organization, such an average would not help you justify allocation of resources for a data quality improvement quest in your organization.

I know the methodology provided by Gartner actually is designed to help you with specific return on investment for your organization. I also know from being involved in several business cases for data quality (as well as Master Data Management and data governance) that accurately stating how any one element of your data may affect your business is fiendishly difficult.

I am afraid that there is no magic around as told in the post Miracle Food for Thought.

Cross Border Product Data Flows

4th February 20174th February 2017Henrik Gabs LiliendahlLeave a comment

The below figure shows the cross border data flows on this planet. There are inter-regional data flows and there are flows between the worldwide regions:

cross-boarder-data-flows

Now, a small part of this data will be product data exchanged between trading partners participating in global business ecosystems. While I have no data on if product data are distributed by the same proportions as data in general, it will be a qualified guess, that the picture will look somewhat the same.

Exchanging product data across borders has some challenges:

Language is an issue. Product data will eventually have to be translated into the language of the end buyer, if this is not the language in which the product data originally are provided. The definitions (metadata) of product data will also be subject to translation. Even the language of the transmission tools would not be in English all over.
Regulations around product data are different from country to country.
The cultural content of the optimal data describing a product in structured data elements and related digital assets are different between countries and regions.

At Product Data Lake, we are, from the center of the largest green bubble, looking for ambassadors around the world who are able to link the in-house product information management solutions at trading partners.

Interested? Get in contact:

What’s in an Address (and a Product)?

2nd February 20172nd February 2017Henrik Gabs LiliendahlLeave a comment

Our company Product Data Lake has relocated again. Our new address, in local language and format, is:

Havnegade 39
1058 København K
Danmark

If our address were spelled and formatted as in England, where the business plan was drafted, the address would have looked like this:

The Old Seed Office
39 Harbour Street
Copenhagen, 1058 K
Danelaw

Across the pond, a sunny address could look like this:

39 Harbor Drive
Copenhagen, CR 1058
U.S. Virgin Islands

copenhagen_havnegade Now, the focal point of Product Data Lake is not the exciting world of address data quality, but product data quality.

However, the same issues of local and global linguistic and standardization – or should I say standardisation – issues are the same.

Our lovely city Copenhagen has many names. København in Danish. Köpenhamn in Swedish. Kopenhagen in German. Copenhague in French.

So have all the nice products in the world. Their classifications and related taxonomy are in many languages too. Their features can be spelled in many languages or be dependent of the country were to be sold. The documents that should follow a product by regulation are subject to diversity too.

Handling all this diversity stuff is a core capability for product data exchange between trading partners in Product Data Lake.

	Henrik Gabs Lilienda… on The Intersection of Data Obser…
	Shanker on The Intersection of Data Obser…
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on Data Matching Efficiency
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on From Platforms to Ecosyst…
	Michael Fieg on From Platforms to Ecosyst…
	From Platforms to Ec… on What is Collaborative Product…
	From Platforms to Ec… on MDM and Knowledge Graph
	Henrik Gabs Lilienda… on SAP and Master Data Manag…
	Conrad Greer on SAP and Master Data Manag…
	Henrik Gabs Lilienda… on SAP and Master Data Manag…
	Michael Fieg, Parsio… on SAP and Master Data Manag…
	Asifa on Data Fabric and Master Data…
	Henrik Gabs Lilienda… on Data Fabric and Master Data…