Data Management, Never stop learning

Welcome to the classroom, Rick Buijserd from The Netherlands, who is the next guest blog post author:


As a child, you were happy when the bell rang and the school day ended. It was time to play with your friends and not think about learning anymore, just play! Most of us look back at this time as the best time of our lives: a time without any worries, enjoying every moment. Even though it was not our main focus as children, it was also the time when we learned new ideas and things every day. Are we still learning every day? Are you learning new things about data management every day? You should, and here is why…

Gaining knowledge

Data is the new oil, and many of us make a decent living by advising or consulting companies in this area of expertise. But as time goes by, so do the developments, and in the technology world this goes fast, very fast. In the last couple of years the data environment has become bigger and bigger. First there was just data inside companies; now you have to combine sources of data to get a clear view. And the sources keep on changing. Big data used to be a word that was undefined and unusable. For many it still is, but others use big data to enrich their companies and enable growth. Just summing this up shows the changes that have happened in the last couple of years, and you have to keep up to stay relevant. Learning and gaining knowledge is the only key to success in the long term. Artificial Intelligence and Machine Learning, powered by optimal use of data and data management, will take over many tasks, but in the end human creativity and the ability to learn will provide success and the power to make the difference.

Data Management is never finished and neither is learning about it

If you have been in the world of data management for a while, you know that data management is never finished, and neither is the opportunity to gain knowledge. New books about data management are published regularly, and research firms keep on researching and making new discoveries. Many companies use the evolution of technology to grow. Communities are also built around these topics on many different platforms. The possibility to learn is everywhere! Use it to your benefit; data management is never finished…

Rick Buijserd is author and owner of the platform Data Management Experts and a young professional with experience in the world of data. He started his career at a well-known software vendor as a channel manager, where he learned the skills of indirect sales and managing partners. Financial, HR, logistics, warehousing and PSA were the main elements of his software sales. Building relationships with experts and other vendors is part of his DNA.

After a couple of years he decided to make a switch and landed in the world of accountancy firms. In this period he became a trusted advisor to many accountancy firms in The Netherlands. The areas of finance, financial reporting, tax, auditing and other accountancy-related activities are no secret to him. Together with his clients he developed many solutions to their challenges. It was in this period that his love for data management emerged. Accountancy firms are the ultimate example of being data driven. It is all they know.

In the most recent period of his career he stepped into the world of multinationals, and as of today he is still active in this world, advising on data management and selling software solutions to multinationals with challenges in this area. He is also an expert in social selling via LinkedIn, and he has brought this knowledge into practice via a LinkedIn Group for Dutch Data Management Experts, in which he gathers the top data management experts from the largest companies in The Netherlands to discuss all kinds of data-related topics.

What’s in an Address (and a Product)?

Our company Product Data Lake has relocated again. Our new address, in local language and format, is:

Havnegade 39
1058 København K
Danmark

If our address were spelled and formatted as in England, where the business plan was drafted, the address would have looked like this:

The Old Seed Office
39 Harbour Street
Copenhagen, 1058 K
Danelaw

Across the pond, a sunny address could look like this:

39 Harbor Drive
Copenhagen, CR 1058
U.S. Virgin Islands

Now, the focal point of Product Data Lake is not the exciting world of address data quality, but product data quality.

However, the issues of local and global linguistics and standardization – or should I say standardisation – are the same.

Our lovely city Copenhagen has many names. København in Danish. Köpenhamn in Swedish. Kopenhagen in German. Copenhague in French.

So do all the nice products in the world. Their classifications and related taxonomies are in many languages too. Their features can be spelled in many languages or depend on the country where they are to be sold. The documents that must accompany a product by regulation are subject to diversity too.

Handling all this diversity stuff is a core capability for product data exchange between trading partners in Product Data Lake.

Painting WWII Bombers and Product Data: It Is All in the Details

Today’s guest blog post is from Dan O’Connor, a United States-based product data taxonomy guru. Here are Dan’s thoughts on product data quality:

I have had a few days off this past week while I transition to a new role. During that time, I’ve had time to reflect on many things, as well as pursue some personal interests. I talked with peers and former co-workers, added a fresh coat of paint to my basement, and worked on some WWII era bomber models I purchased before Christmas but never had time for.

The third pursuit was a rather interesting lesson in paying attention to details. The instructions would say to paint an individual piece one color, but that piece would comprise several elements that should never be painted a single color. For example, the flight yokes on the Mitchell were supposed to be painted black, but in viewing pictures online I saw that certain parts were white, red and aluminum. I therefore painted them appropriately. These yokes are less than an inch long and a couple of millimeters wide, but became much more impressive with an appropriate smattering of color.

Flight Yokes and Product Taxonomies

It is this attention to detail that made me think about how product taxonomies are developed. Some companies just follow the instructions, and end up with figurative “black flight yokes”. These taxonomies perform adequately, allowing a base level of product detail to be established. Web sites and catalogs can be fed with data and all is well.

Other companies see past the black flight yokes. They need the red buttons, the white grips, and the silver knobs because they know these data points are what make their product data more real. They could have followed the instructions, but being better than the instructions was more important.

Imagine for a second that the instructions were the mother of the data and the plane itself was the father. According to the mother, plain black flight yokes are sufficient. The father, while capable of being so much more, ends up with the dull data the mother provides. Similarly, if the plane/father has no options that allow it to be more colorful, the instructions from the mother are meaningless beyond the most basic interpretations.

The Mother and Father of Product Data

To some my analogy might be a stretch, but think of it in these terms: your product taxonomy is the mother of your product data, and the architecture that supports that taxonomy is the father. If your taxonomy only supports a generic level of data, the architecture supporting it cannot add more detail. If the architecture is limited, the most robust product taxonomy will still only support the most basic of data. Your product data quality is limited by the taxonomy you build and the systems you use to manage it. If both are well developed, beautiful product data is born. If one or both is limited, your product data will be an ugly mess.

Why is this important? Product data does more than validate that the image has the right color on a web site, or make sure an item will fit in your kitchen or TV room. Product data feeds faceting experiences so that customers on your web site can filter down to the perfect product. Without facets, customers have to search manually through more products, and may get frustrated and leave your web site before finding the item they want.

Product data also can feed web site search, allowing customers to find your products using product descriptors instead of just product numbers and short descriptions. These search options also filter out unnecessary results, allowing a customer to find the perfect product faster.
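As an illustration of the faceting idea described above, here is a minimal sketch in Python; the product attributes, values and function names are invented for the example, not taken from any real catalog system:

```python
# A tiny in-memory catalog. In a real system these attributes would come
# from the product taxonomy discussed above.
products = [
    {"sku": "A1", "type": "drill", "brand": "Boschy", "voltage": "18V"},
    {"sku": "A2", "type": "drill", "brand": "Makito", "voltage": "18V"},
    {"sku": "A3", "type": "drill", "brand": "Boschy", "voltage": "12V"},
]


def facet_filter(items, **facets):
    """Keep only items matching every selected facet value."""
    return [p for p in items if all(p.get(k) == v for k, v in facets.items())]


# Selecting two facets narrows three drills down to one product.
matches = facet_filter(products, brand="Boschy", voltage="18V")
print([p["sku"] for p in matches])
```

The point of the sketch: faceting is only as good as the attributes behind it. If "voltage" were missing or inconsistently filled, the filter above would silently hide valid products.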

Product data might also be used by the marketplaces that sell your products, your catalogs, product data sheets, and even the shelf tags in your retail locations. Having one consistent source of data for those usages avoids customer confusion when they approach your business from an omni-channel perspective. Having to find a product on a shelf when the mobile experience shows a different description is painful and leads to bad customer experiences.

Lastly, moving data between your business and others is problematic at the best of times. Poor product data leads to bad data dissemination, which leads to bad customer experiences across your syndication channels. If you cannot represent your data in a single, logical message internally, your external message will be chaotic and confusing for your guests.

The Elements of a Product Data Program

Therefore, creating a good product taxonomy is not just about hiring a bunch of taxonomists and having them create a product taxonomy. It is about taxonomy best practices, data governance, and understanding your entire product data usage ecosystem, both internally and externally. It is understanding what role Product Information Management systems play in data management, and more importantly what role they do not.

Therefore, in the analogy of a mother product taxonomy and a father architecture creating data, there are siblings, aunts, uncles, and other relatives to understand as well. A lack of understanding in any one of these relationships can cause data quality issues to shine through. It is estimated that companies lose an average of 8 million US dollars a year (ROI on Data Quality, 2014) due to data quality issues. Can your business afford to keep ignoring your product data issues?

Dan O’Connor is a Product Taxonomy, Product Information Management (PIM), and Product Data Consultant and an avid blogger on taxonomy topics. He has developed taxonomies for major retailers as well as manufacturers and distributors, and assists with the development of product data models for large and small companies. See his LinkedIn bio for more information.

What Will You Complicate in the Year of the Rooster?

Today is the first day of the new year: the Year of the Rooster, according to the lunar calendar observed in East Asia. One of the characteristics of the Year of the Rooster is that in this year, people will tend to complicate things.

People usually like to keep things simple. The KISS principle – Keep It Simple, Stupid – has many fans. But not me. Not that I do not like to keep things simple. I do. But only as simple as it should be, as Einstein probably said. Sometimes KISS is the shortcut to getting it all wrong.

When working with data quality, I have come across the three examples below of striking the right balance between a bit complicated and too simple:

Deduplication

One of the most frequent data quality issues around is duplicates in party master data: customer, supplier, patient, citizen, member and many other roles of legal entities and natural persons, where the same real-world entity is described more than once, with different values, in our databases.

In solving this challenge, we can use methods such as match codes and edit distance to detect duplicates. However, these methods, often called deterministic, are far too simple to really automate the remedy. We can also use advanced probabilistic methods. These methods are better, but have the downside that the matching is hard to explain, repeat and reuse in other contexts.

My best experience is to use something in between these approaches. Not too simple and not too overcomplicated.
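A minimal sketch of such a middle road in Python: a crude deterministic match code for blocking, then an edit-distance-style similarity score within each block. All names, thresholds and sample records are invented for illustration, and the standard library's SequenceMatcher stands in for a proper edit-distance library:

```python
from difflib import SequenceMatcher


def match_code(name: str, city: str) -> str:
    """Deterministic blocking key built from normalized name and city."""
    def norm(s: str) -> str:
        return "".join(ch for ch in s.upper() if ch.isalnum())
    return norm(name)[:4] + "|" + norm(city)[:3]


def similarity(a: str, b: str) -> float:
    """Edit-distance-style similarity in [0, 1]."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def find_duplicates(records, threshold=0.85):
    """Two-step dedup: block on match code, then score pairs within each block."""
    blocks = {}
    for rec in records:
        blocks.setdefault(match_code(rec["name"], rec["city"]), []).append(rec)
    pairs = []
    for recs in blocks.values():
        for i in range(len(recs)):
            for j in range(i + 1, len(recs)):
                score = similarity(recs[i]["name"], recs[j]["name"])
                if score >= threshold:
                    pairs.append((recs[i]["id"], recs[j]["id"], round(score, 2)))
    return pairs


records = [
    {"id": 1, "name": "Acme Corporation", "city": "Copenhagen"},
    {"id": 2, "name": "Acme Corporaton", "city": "Copenhagen"},  # typo duplicate
    {"id": 3, "name": "Beta Ltd", "city": "Copenhagen"},
]
print(find_duplicates(records))  # only records 1 and 2 pair up
```

Blocking keeps the pairwise comparison cheap and the decision explainable, while the score gives more nuance than an exact match code alone, which is roughly the balance described above.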

Address verification

You can make a good algorithm to verify postal and visiting addresses in a database for addresses from one country. However, if you try the same algorithm on addresses from another country, it often fails miserably.

Making an algorithm for addresses from all over the world would be very complicated. I have not yet seen one that works.

My best experience is to accept the complication of having almost as many algorithms as there are countries on this planet.
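As a sketch of that per-country approach, here is a tiny dispatch table of country-specific postal code rules in Python. The rule set and function names are my own invention, and real address verification covers far more than postal codes (street formats, element ordering, reference data):

```python
import re

# One rule per country; a production setup would hold a full validator
# per country, not just a postal code pattern.
POSTAL_RULES = {
    "DK": re.compile(r"^\d{4}$"),                              # Denmark: 4 digits, e.g. 1058
    "GB": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$"),   # UK outward + inward code
    "US": re.compile(r"^\d{5}(-\d{4})?$"),                     # US ZIP or ZIP+4
}


def verify_postal_code(country: str, code: str) -> bool:
    """Dispatch to the country-specific rule; unknown countries fail closed."""
    rule = POSTAL_RULES.get(country.upper())
    return bool(rule and rule.match(code.strip().upper()))


print(verify_postal_code("DK", "1058"))      # a Danish code passes the Danish rule
print(verify_postal_code("US", "1058"))      # the same code fails the US rule
```

The design choice is exactly the complication accepted above: one small, testable rule per country instead of one heroic worldwide algorithm.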

Product classification

Classification of products controls a lot of the data quality dimensions related to product master data. The most prominent example is completeness of product information. Whether you have complete product information depends on the classification of the product. Some attributes will be mandatory for one product but make no sense at all for another product with a different classification.

If your product classification is too simple, your completeness measurement will not be realistic. A too granular or otherwise complicated classification system is very hard to maintain and will probably seem like overkill for many purposes of product master data management.

My best experience is that you have to maintain several classification systems and maintain links between them, both inside your organization and with your trading partners.
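The classification-dependent completeness measurement described above can be sketched like this in Python; the class codes and attribute names are made up for illustration:

```python
# Mandatory attributes depend on the product's classification:
# completeness can only be measured against the right profile.
MANDATORY_BY_CLASS = {
    "power_drill": {"voltage", "chuck_size_mm", "weight_kg"},
    "t_shirt": {"size", "color", "fabric"},
}


def completeness(product: dict) -> float:
    """Share of this class's mandatory attributes that are actually filled."""
    required = MANDATORY_BY_CLASS[product["class"]]
    filled = {k for k, v in product.items() if v not in (None, "")}
    return len(required & filled) / len(required)


drill = {"class": "power_drill", "voltage": "230V", "chuck_size_mm": 13, "weight_kg": None}
print(completeness(drill))  # 2 of 3 mandatory attributes are filled
```

Note that asking the same drill for "fabric" would be meaningless; a too-simple classification that merged both profiles would either over-demand or under-demand attributes, which is why the measurement is only realistic when it follows the classification.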

Happy New Lunar Year

IT is not the opposite of the business, but a part of it

During my professional work, and not least when following the data management talk on social media, I often stumble upon sayings such as:

  • IT should not drive a CRM / MDM / PIM / XXX project. The business should do that.
  • IT should not be responsible for data quality. The business should be.

I disagree with that. Not because the business should not do and be those things, but because IT should be a part of the business.

I have personally always disliked the concept of dividing a company into IT and the business. It is a concept practically only used by the IT (and IT focused consulting) side. In my eyes, IT is part of the business just as much as marketing, sales, accounting and all the other departmental units.

With the rise of digitalization, the distinction between IT and the business becomes absolutely ridiculous – not to say dangerous.

We need business-minded IT people and IT-savvy business people to drive digitalization and take responsibility for data quality.

Used abbreviations:

  • IT = Information Technology
  • CRM = Customer Relationship Management
  • MDM = Master Data Management
  • PIM = Product Information Management

Party and Product: The Core Entities in Most Data Models

Party and product are the most frequent master data domains around.

Often you meet party in one of its most frequent roles, customer and supplier (or vendor), or under another term depending on the context, for example citizen, patient, member, student, passenger and many more. These are the people and legal entities we interact with and with whom we usually exchange money – and information.

Product (or material) covers the things we buy, make and sell. The goods (or services) we exchange.

In my current venture, called Product Data Lake, our aim is to serve the exchange of information about products between trading partners who are customers and suppliers in business ecosystems.

For that, we have been building a data model. Below you see our first developed conceptual data model, which has party and product as the core entities.

[Figure: Product Data Lake conceptual data model]

As this is a service for business ecosystems, another important entity is the partnership between suppliers and customers of products and the information about the products.

The product link entity in this data model handles the identification of products by the pairs of trading partners. In the same way, this data model has link entities between the identifications of product attributes at pairs of trading partners (built on the same standards or not) as well as digital asset types.
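As an illustration of the link-entity idea, here is a hypothetical sketch in Python dataclasses. The names are mine, not Product Data Lake's actual schema; the point is only that each trading partner keeps its own product identifier and the link entity pairs them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    party_id: str
    name: str


@dataclass(frozen=True)
class ProductLink:
    """Pairs the supplier's and the customer's identifiers for the same product."""
    supplier: Party
    customer: Party
    supplier_product_id: str   # the product as the supplier identifies it
    customer_product_id: str   # the same product in the customer's numbering


supplier = Party("P1", "Gadget Works")
customer = Party("P2", "Retail House")
link = ProductLink(supplier, customer, "GW-1001", "RH-55-88")
```

The same pattern, with a link entity per pair of identifications, would repeat for product attributes and digital asset types, as described above.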

If you are offering product information management services, and thus are a potential Product Data Lake ambassador, or you are part of a business ecosystem with trading partners, I will be happy to discuss adding the handling of trading partnerships and product information exchange to your current model.


Who will become Future Leaders in the Gartner Multidomain MDM Magic Quadrant?

Gartner emphasizes that the new Magic Quadrant for Master Data Management Solutions, published 19 January 2017, is not solely about multidomain MDM or a consolidation of the two retired MDM quadrants for customer and product master data. However, a long way down the report, it still is.

If you want a free copy, both Informatica and Riversand offer one.

The Current Pole Position and the Pack

The possible positioning was the subject of a post here on the blog some while ago, called The Gartner Magic Quadrant for MDM 2016. The term 2016, though, has been omitted from the title of the final quadrant, probably because it took into 2017 to finalize the report, as reported in the post Gartner MDM Magic Quadrant in Overtime.

Below is my look at the positioning in the current quadrant:

[Figure: the current MDM Magic Quadrant with my annotations]

Starting with the multidomain MDM point, the two current leaders, Informatica and Orchestra Networks, have made their way to multidomain in two different ways. Pole position vendor Informatica has used mergers and acquisitions, with the old Siperian MDM solution and the Heiler PIM (Product Information Management) solution, to build its multidomain MDM leadership. Orchestra Networks has built a multidomain MDM solution from the ground up.

The visionary Riversand is coming in from the Product MDM / PIM world as a multidomain MDM wannabe, and so is the challenger Stibo. I think SAP is in the right place: enormous ability to execute with not so much vision.

If you go through the strengths and cautions of the various vendors, you will find a lot of multidomain MDM views from Gartner.

The Future Race

While the edges of the challengers and visionaries’ quadrants are usually empty in a Gartner magic quadrant, the top right in this first multidomain MDM quadrant from Gartner is noticeably empty too. So who will we see there in the future?

Gartner mentions some interesting upcoming vendors whose revenue is still too small to qualify. Examples are Agility Multichannel (a Product Data Lake ambassador, by the way), Semarchy and Reltio.

The future race track will, according to Gartner, go through:

  • MDM and the Cloud
  • MDM and the Internet of Things
  • MDM and Big Data

PS: At Product Data Lake we are heading there at full speed too. Therefore, it will be a win-win to see more MDM vendors joining as ambassadors or even getting more involved.

Data Born Companies and the Rest of Us

This post is the first in a new feature here on this blog: guest posts by data management professionals from all over the world. First up is Harri Juntunen, Partner at Twinspark Consulting in Finland:

Data, and the clever use of data in business, has had and will have a significant impact on value creation in the next decade. That is beyond reasonable doubt. What is less clear is how this is going to happen. Before we answer that question, I think it is meaningful to make a conceptual distinction between data born companies and the rest of us.

Data born companies are companies that were conceived from data. Their business models are based on monetising the clever use of data. They have organised everything, from their customer service to their operations, to harness data maximally. Data, and the capability to use data to create value, is their core competency. These companies are the giants of the data business: Google, Facebook, Amazon, Uber, AirBnB. The standard small talk topics in data professionals’ discussions.

However, most companies are not data born. Most companies were originally established to serve a different purpose. They were founded to serve some physical need and actually to maintain things physically, be it food, spare parts or factories. Obviously, all of these companies in, for example, manufacturing and maintenance of physical things need data to operate. Yet these companies are not organised around the principles of data born companies, with capabilities to harness data as the driving force of their businesses.

We hear a lot of stories and successful examples about how data born companies apply augmented intelligence and other recent technology achievements. Surely, technologies built around data are important. The key question to me is: what, in practice, is our capability to harness all of these opportunities in companies that are not data born?

In my daily practice I see Excel sheets floating around and between companies, and a lot of manual work caused by unstandardised data, poor governance and bad data quality. Manual data work simply prevents companies from harnessing the capabilities created by data born companies. Yet most companies follow the data born track without sufficient reflection. They adopt the latest technologies used by the data born companies. They repeat the same slogans: automation, advanced analytics, cognitive computing and so on. And yet they are not addressing the fundamental and mundane issues in their own capabilities to do business and create value with data. Humans are doing a machine’s job.

Why? Many things play into this, but data quality and standardization are still pressing problems in everyday practice in many companies, let alone between companies. We can change this. The rest of us can be reborn from data just by taking a good look at our mundane data practices instead of aspiring to go for the next big thing.

P.S. The Google Brain team held a Reddit AMA a while ago, and they were asked: “What do you think is underrated?”

The answer:

“Focus on getting high-quality data. “Quality” can translate to many things, e.g. thoughtfully chosen variables or reducing noise in measurements. Simple algorithms using higher-quality data will generally outperform the latest and greatest algorithms using lower-quality data.”

https://www.reddit.com/r/MachineLearning/comments/4w6tsv/ama_we_are_the_google_brain_team_wed_love_to/

About Harri Juntunen:

Harri is a seasoned data provocateur and an ardent advocate of getting the basics right. Harri says: people and data first, technology will follow.

You can contact Harri here:

+358 50 306 9296

harri.juntunen@twinspark.fi

www.twinspark.fi