One of my current engagements is within jewelry – or is it jewellery? The use of these two words, US English and British English respectively, is a constant data quality issue when we try to standardize – or is it standardise? – on a common set of reference data and a business glossary within an international organization – or is it organisation?
Looking for international standards often does not solve the case. For example, a shop that sells this kind of bijouterie may be classified under a SIC code as “Jewelry store” or under a NACE code as “Retail sale of watches and jewellery in specialised stores”.
A pearl is a popular gemstone. Natural pearls, meaning they have occurred spontaneously in the wild, are very rare. Instead, most are farmed in fresh water and therefore, under regulations in many countries, must be referred to as cultured freshwater pearls.
My pearls of wisdom, or rather cultured freshwater pearls of wisdom, for building a business glossary and finding the commonly accepted wording for reference data to be used within your company are:
- Start looking at international standards and pick what makes sense for your organization. If you can live with only that, you are lucky.
- If not, grow the rest of the content for your business glossary and reference data by imitating the international or national standards for your industry, and use your own better wording and additions that make the most sense across your company.
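As a minimal sketch of the two pearls above, a glossary can hold one preferred term per concept plus the recognized variants, so that incoming values are mapped to the common wording. The choice of the US spellings as preferred terms, and the names GLOSSARY and preferred_term, are assumptions made for illustration only.

```python
# Hypothetical glossary: preferred term -> recognized variants.
# Picking the US spelling as the preferred term is an arbitrary assumption.
GLOSSARY = {
    "jewelry": {"jewellery"},
    "organization": {"organisation"},
    "standardize": {"standardise"},
}

# Build a reverse lookup once: every known spelling points to its preferred term.
VARIANT_TO_PREFERRED = {}
for preferred, variants in GLOSSARY.items():
    VARIANT_TO_PREFERRED[preferred] = preferred
    for variant in variants:
        VARIANT_TO_PREFERRED[variant] = preferred


def preferred_term(word):
    """Return the glossary's preferred term, or None if the word is unknown."""
    return VARIANT_TO_PREFERRED.get(word.strip().lower())


print(preferred_term("Jewellery"))
print(preferred_term("pearl"))
```

Returning None for unknown words, rather than guessing, keeps it visible which terms still need a decision in the glossary.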
And oh, I know that pearls of wisdom are often used to imply the opposite of wisdom 🙂
Right now I am working with an MDM (Master Data Management) service for sharing product data in the business ecosystems of manufacturers, distributors, retailers and end users of product information.
One of the challenges in bringing such a service to market is choosing the best term for the entities handled by the service.
Below is the current selection with the chosen term and some recognized alternate terms that are used frequently and found in the various standards that exist for exchanging product data:
Please comment if you think there are other English (or variants of English) terms that deserve to be in here.
Yesterday I popped in at the combined Master Data Management Summit Europe 2016 and Data Governance Conference Europe 2016.
This event takes place Monday to Thursday, but unfortunately I only had time and money for the Tuesday this year. Therefore, my report will only be takeaways from Tuesday’s events. On a side note, the difficulties in doing something pan-European must have troubled the organisers of this London event: avoiding the UK May bank holidays ended in starting on a Monday when most of the rest of Europe had a day off for Pentecost Monday.
Tuesday morning’s highlight for me was Henry Peyret of Forrester shocking the audience in his Data Governance keynote by busting the myth behind the good old excuse for doing nothing: the imperative of top-level management support.
Back in 2013 I wondered if graph databases would become common in MDM. Certainly graph databases have become the talk of the town, and it was good to learn from Andreas Weber how the Germany-based figurine manufacturer Schleich has made a home-grown PIM / Product MDM solution based on graph database technology.
Ivo-Paul Tummers of Jibes presented the MDM (and beyond) roadmap for the Dutch food company Sligro. I liked the alley of embracing multi-channel, then omnichannel with self-service at the end of the road, and how connect will overtake collect during this journey. This is exactly the reason for being of the Product Data Lake venture I am working on right now.
Is that piece of data wrong or right? This may very well be a question of which language we are talking about.
In an earlier double post on this blog I had a small quiz about the name of the Pope in the Catholic church. The point was that all possible answers were right, as explained in the post When Bad Data Quality isn’t Bad Data. The thing is that the Pope has local variants of the English name Francis around the world: François in French, Franziskus in German, Francesco in Italian, Francisco in Spanish, Franciszek in Polish, Frans in Danish and Norwegian and so on.
In today’s globalized, or should I say globalised, world, it is important that our data can be represented in different languages and that the systems we use to handle the data are built for that. The user interface may be in a certain flavor/flavour of English only, but the data model must cater for storing and presenting data in multiple languages and even variants of languages, such as English in its many forms. Add to that the capability of handling characters other than Latin, in script systems other than alphabets, as examined in the post called Script Systems.
This challenge is very close to me right now, when we are building a service for sharing product information in business ecosystems. So will the Product Data Lake be multilingual? Mais oui! Natürlich. Jo da.
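To make the data model point a bit more concrete, here is a minimal sketch of a value stored per language tag, with a fallback from a regional variant to the base language and then to a default language. The class name LocalizedText, the fallback order and the use of tags like "en-GB" are assumptions for illustration, not a description of how any particular product is built.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalizedText:
    """One piece of data held in several languages, keyed by language tag."""
    values: Dict[str, str] = field(default_factory=dict)

    def get(self, tag, default_tag="en"):
        # Try the exact tag ("de-AT"), then the base language ("de"),
        # then the default language. This order is an assumption.
        for candidate in (tag, tag.split("-")[0], default_tag):
            if candidate in self.values:
                return self.values[candidate]
        raise KeyError(f"No value for {tag!r} or fallback {default_tag!r}")


# The Pope's name in the variants mentioned above.
pope = LocalizedText({
    "en": "Francis",
    "fr": "François",
    "de": "Franziskus",
    "it": "Francesco",
    "es": "Francisco",
    "pl": "Franciszek",
    "da": "Frans",
})

# Variants within one language are just more specific tags.
taste = LocalizedText({"en": "flavor", "en-GB": "flavour"})

print(pope.get("de-AT"))   # falls back from de-AT to de
print(taste.get("en-GB"))  # exact match on the variant
print(taste.get("en-US"))  # falls back from en-US to en
```

The point of the sketch is that none of the stored values is the "right" one; which one is right depends on the language asked for.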
PS: The Product Data Lake will actually help with collecting product information in multiple languages through the supply chains of product manufacturers, distributors, retailers and end users.
The Data Quality Landscape – Q1 2015 from Information Difference is out. A bit ironically, the report states that the data quality market for the calendar year 2014 was worth a fraction over $1 billion. As the $ sign could mean a lot of different currencies like CAD, AUD or FJD this statement is very ambiguous, but I guess Andy Hayler means USD.
While there still is a market for standalone data quality tools, an increasing part of data quality tooling is actually delivered within other tools, be it a Master Data Management (MDM) tool, a Data Governance tool, an Extract, Transform and Load (ETL) tool, a Customer Relationship Management (CRM) tool or another kind of tool or software suite.
This topic was recently touched upon on this blog in the post called Informatica without Data Quality? That post examined why the new owners of Informatica did not mention data quality as a future goodie in the Informatica toolbox.
In a follow up mail an Informatica officer explained: “As you know Data Quality has become an integral part of multidomain MDM and of the MDM fueled Product Catalog App. We still serve pure DQ (Data Quality) use cases, but we see a lot growth in DQ as part of MDM initiatives”.
You can read the full DQ Landscape 2015 here.
Back in 1990 Michael Hammer wrote a famous article called Reengineering Work: Don’t Automate, Obliterate.
Indeed, while automation is a most wanted outcome of Master Data Management (MDM) implementations and many other IT-enabled initiatives, you should always consider the alternative: eliminating (or simplifying). This often means thinking outside the box.
As an example I today stumbled upon the Wikipedia explanation about Business Process Mapping. The example used is how to make breakfast (the food part):
You could think about different Business Process Re-engineering opportunities for that process. But you could also realize that this is an English / American breakfast. What about making a French breakfast instead? It will be as simple as:
Input money > Buy croissant > Fait accompli
PS: From the data quality and MDM world one example of making French breakfast instead of English / American breakfast is examined in the post The Good, Better and Best Way of Avoiding Duplicates.
11th of November and it’s time for the first x-mas post on this blog this year. My London gym is to blame for this early start.
Santa’s residence is disputed. As told in the post Multi-Domain MDM, Santa Style, one option is Lapland.
Yesterday this yuletide challenge was included in an eMail in my inbox:
Nice. Lapland is in Northern Scandinavia. Scandinavia belongs to that half of the world where the comma is used as the decimal mark, as shown in the post Your Point, My Comma.
So while the UK-born gym members will be near fainting doing several thousands of kilometers, I will claim the prize after an easy 3 kilometers and 546 meters on the cross trainer.
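The arithmetic behind the joke can be sketched as below. The string "3,546" is my assumption about how such a distance would be written, derived from the 3 kilometers and 546 meters above, and parse_number is a hypothetical helper, not a library function.

```python
def parse_number(text, decimal_mark):
    """Parse a number string under an explicitly stated decimal mark convention."""
    if decimal_mark == ",":
        # Comma is the decimal mark, so a point can only be a thousands separator.
        cleaned = text.replace(".", "").replace(",", ".")
    elif decimal_mark == ".":
        # Point is the decimal mark, so a comma can only be a thousands separator.
        cleaned = text.replace(",", "")
    else:
        raise ValueError("decimal_mark must be ',' or '.'")
    return float(cleaned)


distance_text = "3,546"  # assumed way of writing the distance in kilometers

uk_reading = parse_number(distance_text, decimal_mark=".")
scandinavian_reading = parse_number(distance_text, decimal_mark=",")

print(uk_reading, "km for the UK-born members")
print(scandinavian_reading, "km for me")
```

The same characters give two distances that differ by a factor of one thousand, which is why the convention has to travel with the data.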
This is post number 666 on this blog. 666 is the number of the beast. Something diabolic.
The first post on my blog came out in June 2009 and was called Qualities in Data Architecture. This post was about how we should talk a bit less about bad data quality and instead focus a bit more on success stories around data quality. I haven’t been able to stick to that all the time. There are so many good data quality train wrecks out there, such as the one told in the post called Sticky Data Quality Flaws.
Some of my favorite subjects around data quality were lined up in Post No. 100. They are:
The biggest thing that has happened in the data quality realm during the five years this blog has been live is probably the rise of big data. Or rather the rise of the term big data. This proves to me that changes usually start with technology. Then, after some time, we start thinking about processes and finally people’s roles and responsibilities.
The term American exceptionalism was born in the political realm but certainly also applies to other areas, including data management.
As a lot of software and today’s cloud services are made in the USA, the rest of the world struggles somewhat with data standards that apply only, or to a high degree, to the United States.
Some of the common ones are:
In the United States Fahrenheit is the unit of temperature. The rest of the world (with a few exceptions) uses Celsius. Fortunately many applications have the ability to switch between those two, but it certainly happens to me once in a while that I uninstall a new exciting app because it only shows temperature in Fahrenheit, and to me 30 degrees is very hot weather.
The Month-Day-Year date format is another American exceptionalism in data management. When dates are kept in databases there is no problem, as databases internally use a counter for a date. But as soon as the date slips into a text format and is used in an international sense, no one can tell if 10/9/2014 is the 10th September as it is seen outside the United States or 9th October as it is seen inside the United States. For example it took LinkedIn years before the service handled the date format in accordance with its international spread, and there are still mix-ups.
Having a state as part of a postal address is mandatory in the United States and only shared with a few other countries such as Australia and Canada, though the Canadians call the similar concept a province. The use of a mandatory state field with only US states present is especially funny when registering online for a webinar about an international data quality solution.
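The date format ambiguity described above can be sketched with the 10/9/2014 example. The helper parse_slash_date and the convention labels "DMY" and "MDY" are names I have made up for the illustration; only datetime.strptime and isoformat come from the Python standard library.

```python
from datetime import datetime

# Hypothetical labels for the two conventions discussed above.
FORMATS = {
    "DMY": "%d/%m/%Y",  # day first, as seen outside the United States
    "MDY": "%m/%d/%Y",  # month first, as seen inside the United States
}


def parse_slash_date(text, convention):
    """Parse a slash-separated date under an explicitly stated convention."""
    return datetime.strptime(text, FORMATS[convention]).date()


text = "10/9/2014"
outside_us = parse_slash_date(text, "DMY")
inside_us = parse_slash_date(text, "MDY")

# Writing the date back as ISO 8601 (year-month-day) removes the ambiguity.
print(outside_us.isoformat())
print(inside_us.isoformat())
```

Both readings parse without error, which is exactly the problem: nothing in the text itself tells you which one was meant, so the convention must be known or the date exchanged in an unambiguous format.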
In order to have all my travel arrangements in one place I use a service called TripIt. When I receive eMail confirmations from airlines, hotels, train planners and so on, I simply forward those to firstname.lastname@example.org, and within seconds they build or amend an itinerary for me that is available in an app.
Today I noticed a slight flaw though. I was going by train from London, UK up to the Midlands via a large town in the UK called Reading.
The strange thing in the itinerary was that the interchanges in Reading were placed in chronology after arriving at and leaving the final destination.
A closer look at the data revealed two strange issues:
- Reading was spelled Reading, PA
- The time zone for the interchange was set to EST
Hmmm… There must be a town called Reading in Pennsylvania across the pond. TripIt must, when automatically reading the eMail, have chosen the US Reading for this ambiguous town name and thereby attached the Eastern American time zone to the interchange.
Picking the right Reading for me in the plan made the itinerary look much more sensible.
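What probably happened can be sketched as below. I do not know how TripIt actually resolves place names, so the gazetteer, the "first match wins" rule, the "same country as the rest of the trip" rule and the train times are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are enough for a winter date; real code would use a tz database.
GMT = timezone(timedelta(hours=0), "GMT")
EST = timezone(timedelta(hours=-5), "EST")

# Hypothetical gazetteer with two candidates for the same town name.
GAZETTEER = {
    "Reading": [
        {"label": "Reading, PA", "country": "US", "tz": EST},
        {"label": "Reading, UK", "country": "GB", "tz": GMT},
    ],
}


def first_match(name):
    """Naive resolution: take whatever candidate comes first."""
    return GAZETTEER[name][0]


def match_in_country(name, country):
    """Context-aware resolution: prefer the candidate in the trip's country."""
    for candidate in GAZETTEER[name]:
        if candidate["country"] == country:
            return candidate
    return first_match(name)


def build_itinerary(reading):
    """Order the stops chronologically; the local times are invented."""
    stops = [
        ("London", datetime(2014, 1, 6, 9, 0, tzinfo=GMT)),
        (reading["label"], datetime(2014, 1, 6, 9, 30, tzinfo=reading["tz"])),
        ("Midlands destination", datetime(2014, 1, 6, 11, 0, tzinfo=GMT)),
    ]
    return [label for label, when in sorted(stops, key=lambda stop: stop[1])]


naive_order = build_itinerary(first_match("Reading"))
context_order = build_itinerary(match_in_country("Reading", "GB"))

print(naive_order)
print(context_order)
```

With the Pennsylvania candidate, 09:30 local time becomes 14:30 in UK time, so the interchange sorts after the final destination, just like in my itinerary. Using the country of the surrounding stops as context puts Reading back where it belongs.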