Data Management, Never stop learning

Please welcome Rick Buijserd from The Netherlands to the classroom as the next guest blog post author:


As a child you were happy when the bell rang and the school day ended. It was time to play with your friends and not think about learning anymore, just play! Most of us look back at this time as the best time of our lives: a time without any worries, enjoying every moment of it. Even though it wasn’t our main focus as children, it was also the time when we learned new ideas and things every day. Are we still learning every day? Are you learning new things about data management every day? You should, and here is why…

Gaining knowledge

Data is the new oil, and many of us make a decent living by advising or consulting companies in this area of expertise. But as time goes by, so do the developments, and in the technology world they move fast, very fast. In the last couple of years the data environment has become bigger and bigger. First there was just data in companies; now you have to combine sources of data to get a clear view. And the sources keep on changing. Big data used to be a term that was undefined and impossible to put to use. For many it still is, but others use big data to enrich and enable growth for their companies. Just summing this up shows the changes that have happened in the last couple of years, and you have to keep up to stay relevant. Learning and gaining knowledge is the only key to success in the long term. Artificial Intelligence and Machine Learning, powered by optimal use of data and data management, will take over many tasks, but in the end human creativity and the ability to learn will provide success and the power to make the difference.

Data Management is never finished and neither is learning about it

If you have been in the world of data management, you know that data management is never finished, and neither is the opportunity to gain knowledge. New books about data management are published all the time, and research firms keep on researching and making new discoveries. Many companies use the evolution of technology to grow. Communities are also built around topics on many different platforms. The opportunity to learn is everywhere! Use it to your benefit; data management is never finished…

Rick Buijserd is author and owner of the platform Data Management Experts and a young professional with experience in the world of data. He started his career at a well-known software vendor as channel manager, where he learned the skills of indirect sales and managing partners. Financial, HR, Logistics, Warehousing and PSA were the main elements of his software sales. Building relationships with experts and other vendors is part of his DNA.

After a couple of years he decided to make a switch and landed in the world of accountancy firms. In this period he became a trusted advisor to many accountancy firms in The Netherlands. Finance, financial reporting, tax, auditing and other accountancy-related activities hold no secrets for him. Together with his clients he developed many solutions to solve their challenges. In this period his love for data management emerged. Accountancy firms are the ultimate example of being data driven. It is all they know.

In the most recent period of his career he stepped into the world of multinationals, and as of today he is still active in this world, advising on data management and selling software solutions to multinationals that have challenges in the area of data management. He is also an expert in social selling via LinkedIn, and he has put this knowledge into practice via a LinkedIn Group for Dutch Data Management Experts, in which he gathers the top data management experts from the largest companies in The Netherlands to discuss all kinds of data-related topics.

What’s in an Address (and a Product)?

Our company Product Data Lake has relocated again. Our new address, in local language and format, is:

Havnegade 39
1058 København K

If our address were spelled and formatted as in England, where the business plan was drafted, it would look like this:

The Old Seed Office
39 Harbour Street
Copenhagen, 1058 K

Across the pond, a sunny address could look like this:

39 Harbor Drive
Copenhagen, CR 1058
U.S. Virgin Islands

Now, the focal point of Product Data Lake is not the exciting world of address data quality, but product data quality.

However, the issues of local and global linguistics and standardization – or should I say standardisation – are the same.

Our lovely city Copenhagen has many names. København in Danish. Köpenhamn in Swedish. Kopenhagen in German. Copenhague in French.

So have all the nice products in the world. Their classifications and related taxonomies are in many languages too. Their features can be spelled in many languages or depend on the country where the product is to be sold. The documents that must accompany a product by regulation are subject to diversity too.

Handling all this diversity stuff is a core capability for product data exchange between trading partners in Product Data Lake.

Cross Border Master Data Management

One of the most intriguing sides of data quality and Master Data Management (MDM) is, in my eyes, how you can extend a national solution to an international solution.

Many implementations start with a national scope, and we also see many tools and services built for a national scope. Unfortunately, success on a national scale does not always guarantee success on an international scale.

Besides all the important stuff around different culture challenges and how to drive change management in an international environment, there are also some things about the master data itself that are challenging.

  • Location Master Data is probably the most obvious domain where we face issues when going international. Postal addresses are formatted differently around the world. Approximately half of the world puts the house number in front of the street name, approximately half of the world puts the house number after the street name, and in some places you don’t use house numbers on a street, but numbers within blocks. The order of city and postal code has the same issue. The worst solutions here try to squeeze the rest of the world into the first implemented national solution, as told in the post Nationally International.
  • Party Master Data, also when looking beyond postal addresses, must encompass many national constraints and opportunities, not least when it comes to exploiting third party data:
    • Utilizing business directories is one common way. Here you have to balance using many different best-of-breed national providers against taking the data from a more harmonized provider of an international directory. Where I (also) work right now, we have chosen the latter solution, as reported in the post Using a Business Entity Identifier from Day One.
    • If you, as I do, come from Scandinavia, you will also be amazed by the difficulties that arise around the world in healthcare, elections and other areas when there is no publicly available national identifier for citizens, as examined in the post Counting Citizens.
  • Product Master Data does in many ways look the same across countries. However, standards for product data are often still specific to a single country or a limited range of countries. Also, if the national implementation was not in a country with multiple languages and the international scope includes more languages, you must add multilingual capabilities for product information management.
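The location point above can be made concrete with a minimal sketch. It assumes a structured address record and a small, hypothetical rule table (`LINE_FORMATS`) holding one line layout per country. The three layouts are simplified illustrations of the house-number and postal-code ordering differences, not complete postal standards.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Address:
    street: str
    house_number: str
    postal_code: str
    city: str
    country: str  # ISO 3166-1 alpha-2 code, e.g. "DK"


# Hypothetical, deliberately tiny rule table: one template per output line.
# Real postal formats have many more elements and exceptions.
LINE_FORMATS = {
    "DK": ("{street} {house_number}", "{postal_code} {city}"),
    "FR": ("{house_number} {street}", "{postal_code} {city}"),
    "GB": ("{house_number} {street}", "{city}", "{postal_code}"),
}


def format_address(address: Address) -> str:
    """Render the address using the layout of its own country."""
    templates = LINE_FORMATS[address.country]  # KeyError for unknown countries
    fields = asdict(address)
    return "\n".join(template.format(**fields) for template in templates)


danish = Address("Havnegade", "39", "1058", "København K", "DK")
print(format_address(danish))
```

Storing the elements separately and applying the layout per country at presentation time avoids forcing every address into the format of the first implemented country.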

What have you experienced when going from national to international?

Alternatives to Product Data Lake

Within Product Information Management (PIM) there is a growing awareness that sharing product information between trading partners is a very important issue.

So, how do we do that? We could do that, on a global scale, by using:

  • 1,234,567,890 spreadsheets
  • 2,345,678 customer data portals
  • 901,234 supplier data portals

Spreadsheets are the most common means of exchanging product information between trading partners today. The typical scenario is that a receiver of product information, being a downstream distributor, retailer or large end user, will have a spreadsheet for each product group that is sent to be filled in by each supplier each time a new range of products is to be on-boarded (and potentially each time you need a new piece of information). As a provider of product information, being a manufacturer or upstream distributor, you will receive a different spreadsheet to be filled in from each trading partner each time you are to deliver a new range of products (and potentially each time they need a new piece of information).

Customer data portals are a concept a provider of product information may have, plan to have or dream about. The idea is that each downstream trading partner can go to your customer data portal, structured in your way and format, when they need product information from you. Your trading partner will then only have to deal with your customer data portal – and the 1,234 other customer data portals in their supplier range.

Supplier data portals are a concept a receiver of product information may have, plan to have or dream about. The idea is that each upstream trading partner can go to your supplier data portal, structured in your way and format, when they have to deliver product information to you. Your trading partner will then only have to deal with your supplier data portal – and the 567 other supplier data portals in their business-to-business customer range.

Product Data Lake is the sound alternative to the above options. Hailstorms of spreadsheets do not work. If everyone has either a passive customer data portal or a passive supplier data portal, no one will exchange anything. The solution is that you, as a provider of product information, push your data in your structure and format into Product Data Lake each time you have a new product or a new piece of product information. As a receiver you set up pull requests that give you data in your structure and format each time you have a new range of products, need a new piece of information or each time your trading partner has a new piece of information.
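The push and pull idea can be illustrated with a toy, in-memory sketch. The class `InMemoryLake`, its methods and the field names are all hypothetical and are not the actual Product Data Lake interface. The sketch only shows the principle: the provider pushes records in its own structure, and the receiver pulls them through a field mapping into its own structure.

```python
class InMemoryLake:
    """Toy stand-in for a shared exchange service (hypothetical, not the real API)."""

    def __init__(self):
        self._records = []  # list of (provider_id, record) pairs

    def push(self, provider_id, record):
        """Provider side: store a record exactly as the provider structures it."""
        self._records.append((provider_id, dict(record)))

    def pull(self, provider_id, field_map):
        """Receiver side: fetch a provider's records, renamed to the receiver's fields.

        field_map maps provider field names to receiver field names.
        Provider fields missing from a record are simply skipped.
        """
        pulled = []
        for pid, record in self._records:
            if pid != provider_id:
                continue
            pulled.append(
                {target: record[source]
                 for source, target in field_map.items()
                 if source in record}
            )
        return pulled


lake = InMemoryLake()
# The manufacturer pushes in its own format...
lake.push("manufacturer-a", {"ItemNo": "A-1", "Desc": "Claw hammer", "WeightKg": 0.6})
# ...and the distributor pulls in its own format.
print(lake.pull("manufacturer-a", {"ItemNo": "sku", "Desc": "title"}))
```

With this arrangement each party maintains only one structure (its own) plus a mapping per trading partner, instead of one spreadsheet layout per trading partner.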

Learn more about how that works in Product Data Lake Documentation and Data Governance.

Potential number of solutions / degree of dissatisfaction / total cost of ownership


Shipping Product Information

When looking out of the windows from Product Data Lake global headquarters (well, that is also our home office) we see our neighbour, which is the global headquarters of Maersk, a major worldwide operating shipping company.

In all humility, we are in a very parallel business. Maersk is good at moving goods. We are going to move data about the goods. Product data or product information if you like.

The raison d’être of a shipping company is that it would be very ineffective if each manufacturer of goods had to arrange and carry out the transportation of their manufactured goods to each distributor around the world. Furthermore, it would be equally ineffective if each distributor had to arrange and carry out the transportation of their range of goods to each reseller or large end buyer.

Until now, this ineffectiveness has unfortunately been the case when it comes to exchanging data about the goods. Manufacturers are asked by their distributors to provide product information in a different way for each – most often meaning in a different spreadsheet. And the same craziness repeats itself when it comes to exchanging data between distributors, resellers and large end users of product information.

At Product Data Lake we have set sail to end this insanity and bring digitalization to shipping of product information. Learn more about how exactly we will arrange that journey on Product Data Lake Documentation and Data Governance.


Sustainability Data in PIM

The collection of product data to be handled within PIM (Product Information Management) systems is ever increasing. End customers want more and more data to support purchase decisions.

This theme was pondered in the post Self-Service Ready Product Data.

One new kind of product data to be aware of in the future is information about sustainability measures related to a given product. This is information about the environmental impact and the social impact of producing and consuming a product.

As the founder of the Product Data Lake, a solution for exchanging product data in business ecosystems, I am very pleased that sustainability information will be included as an important kind of product data ready to be exchanged between trading partners.


This is due to a cooperation with Earth Accounting. The Product Data Lake will be an integrated part of the information cooperative. It will help forward-looking manufacturers provide their own sustainability measures along with all other kinds of product data, and progressive distributors and retailers can receive and eventually publish sustainability data along with all other self-service ready product data.


Using a Business Entity Identifier from Day One

One of the ways to ensure data quality for customer – or rather party – master data when operating in a business-to-business (B2B) environment is to on-board new entries using an externally defined business entity identifier.

By doing that, you tackle some of the most challenging data quality dimensions, such as:

  • Uniqueness, by checking if a business with that identifier already exists in your internal master data. This approach is superior to using data matching, as explained in the post The Good, Better and Best Way of Avoiding Duplicates.
  • Accuracy, by having names, addresses and other information defaulted from a business directory, thus avoiding the spelling mistakes that usually are all over party master data.
  • Conformity, by inheriting additional data such as line-of-business codes and descriptions from a business directory.
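The three dimensions above can be sketched as a single on-boarding step. This is a minimal illustration under stated assumptions: `master` and `directory` are plain dictionaries keyed by the external identifier, the function name `onboard` is hypothetical, and the identifier and directory entry are made-up sample values rather than real directory data.

```python
def onboard(master, directory, identifier):
    """On-board a business by external identifier.

    Returns (record, created). Uniqueness: an identifier already in `master`
    is never inserted twice. Accuracy and conformity: name, address and
    line-of-business are copied from the directory entry, not typed by hand.
    """
    if identifier in master:
        return master[identifier], False  # already known, so no duplicate

    entry = directory[identifier]  # KeyError if the directory has no such ID
    record = {
        "identifier": identifier,
        "name": entry["name"],
        "address": entry["address"],
        "line_of_business": entry["line_of_business"],
    }
    master[identifier] = record
    return record, True


# Made-up sample data for illustration only.
directory = {
    "000000001": {
        "name": "Example Trading A/S",
        "address": "Havnegade 39, 1058 København K",
        "line_of_business": "Wholesale",
    }
}
master = {}

first, created_first = onboard(master, directory, "000000001")
second, created_second = onboard(master, directory, "000000001")
print(created_first, created_second, len(master))
```

Because the lookup key is the external identifier, the duplicate check is an exact match rather than a fuzzy comparison of names and addresses.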

Having an external business identifier stored with your party master data helps a lot with maintaining data quality as pondered in the post Ongoing Data Maintenance.

When selecting an identifier there are different options, such as national IDs, LEI, DUNS Number and others, as explained in the post Business Entity Identifiers.

At the Product Data Lake service I am working on right now, we have decided to use an external business identifier from day one. I know this may be something a typical start-up will consider much later, if and when the party master data population has grown. But, besides being optimistic about our service, I think it will be a win not to have to fight data quality issues later with guaranteed increased costs.

For the identifier to use we have chosen the DUNS Number from Dun & Bradstreet. The reason is that this currently is the only business identifier with worldwide coverage. Also, Dun & Bradstreet offers some additional data that fits our business model. This includes consistent line-of-business information and worldwide company family trees.


Multilingual? Mais oui! Natürlich.

Is that piece of data wrong or right? This may very well depend on which language we are talking about.

In an earlier double post on this blog I had a small quiz about the name of the Pope in the Catholic church. The point was that all possible answers were right, as explained in the post When Bad Data Quality isn’t Bad Data. The thing is that the Pope has local variants of the English name Francis around the world: François in French, Franziskus in German, Francesco in Italian, Francisco in Spanish, Franciszek in Polish, Frans in Danish and Norwegian and so on.

In today’s globalized, or should I say globalised, world, it is important that our data can be represented in different languages and that the systems we use to handle the data are built for that. The user interface may be in a certain flavor/flavour of English only, but the data model must cater for storing and presenting data in multiple languages and even variants of languages, such as English in its many forms. Add to that the capability of handling characters other than Latin, in script systems other than alphabets, as examined in the post called Script Systems.
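One way to picture such a data model is to store each text value per language tag and resolve it with a fallback at presentation time. The sketch below is a simplified illustration, not how any particular product implements it: the function name `localized` and the fallback order (exact tag, then base language, then a default language) are assumptions.

```python
def localized(values, language_tag, default="en"):
    """Pick the best available text for a language tag such as "en-GB".

    Tries the exact tag first, then the base language ("en"), then the
    default language. Raises KeyError if none of them is stored.
    """
    base_language = language_tag.split("-")[0]
    for tag in (language_tag, base_language, default):
        if tag in values:
            return values[tag]
    raise KeyError(f"no value for {language_tag!r} or fallback {default!r}")


# One attribute value stored in several languages and language variants.
term = {"en": "standardization", "en-GB": "standardisation"}
pope = {"en": "Francis", "fr": "François", "de": "Franziskus", "it": "Francesco"}

print(localized(term, "en-GB"))  # variant-specific spelling
print(localized(term, "en-US"))  # falls back to the base language "en"
print(localized(pope, "fr"))     # exact match
print(localized(pope, "da"))     # no Danish value stored, falls back to "en"
```

The point of the design is that no single spelling is treated as the one right value; each language and variant carries its own.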

This challenge is very close to me right now, as we are building a service for sharing product information in business ecosystems. So will the Product Data Lake be multilingual? Mais oui! Natürlich. Jo da.


PS: The Product Data Lake will actually help with collecting product information in multiple languages through the supply chains of product manufacturers, distributors, retailers and end users.
