Liliendahl on Data Quality

Ecosystems are The Future of Digital and MDM

22nd April 2017, Henrik Gabs Liliendahl

A recent blog post by Dan Bieler of Forrester argues that you should Power Your Digital Ecosystems with Business Platforms.

In his post, Dan Bieler explains that such business platforms support:

· The infrastructure that connects ecosystem participants. Business platforms help organizations transform from local and linear ways of doing business toward virtual and exponential operations.

· A single source of truth for ecosystem participants. Business platforms become a single source of truth for ecosystems by providing all ecosystem participants with access to the same data.

· Business model and process transformation across industries. Platforms support agile reconfiguration of business models and processes through information exchange inside and between ecosystems.

A single source of truth (or trust) for ecosystem participants is something that rings a bell for every Master Data Management (MDM) practitioner. The news is that the single source will not be a single source within a given enterprise, but a single source that encompasses the business ecosystem of trading partners.

[Figure: Gartner Digital Platforms]

Gartner, the other analyst firm, has also recently been advocating digital platforms, where the ecosystem type is the top right one. As stated by Gartner: Ecosystems are the future of digital.

I certainly agree. This is why all of you should get involved at Master Data Share.

Multi-Domain MDM and PIM, Party and Product

20th April 2017, Henrik Gabs Liliendahl

Multi-Domain Master Data Management (MDM) and Product Information Management (PIM) are two interrelated disciplines within information management.

While we may see Product Information Management as the ancestor or sister to Product Master Data Management, we will in my eyes gain much more from Product Information Management if we treat this discipline in conjunction with Multi-Domain Master Data Management.

Party and product are the most commonly handled domains in MDM. I see their intersections as shown in the figure below:

[Figure: Multi-Side MDM]

Your company is not an island. You are part of a business ecosystem, where you may be:

· Upstream as the maker of goods and services. For that you need to buy raw materials and indirect goods from the parties being your vendors. In a data driven world you also need to receive product information for these items. You need to sell your finished products to the midstream and downstream parties being your B2B customers. For that you need to provide product information to those parties.

· Midstream as a distributor (wholesaler) of products. You need to receive product information from upstream parties being your vendors, perhaps enrich and adapt the product information and provide this information to the parties being your downstream B2B customers.

· Downstream as a retailer or large end user of product information. You need to receive product information from upstream parties being your vendors and enrich and adapt the product information so you will be the preferred seller to the parties being your B2B customers and/or B2C customers.

Knowing who your vendors and customers are, and how they see product information, is essential to how you must handle product information. How you handle product information is in turn essential to your trading partners.

You can apply party and product interaction for business ecosystems as explained in the post Party and Product: The Core Entities in Most Data Models.

3 Old and 3 New Multi-Domain MDM Relationship Types

17th April 2017 (updated 17th May 2018), Henrik Gabs Liliendahl

Master Data Management (MDM) has traditionally been mostly about party master data management (including not least customer master data management) and product master data management. Location master data management has been the third domain and then asset master data management is seen as the fourth – or forgotten – domain.

With the rise of Internet of Things (IoT), asset – seen as a thing – is seriously entering the MDM world. In buzzword language, these things are smart devices that produce big data we can use to gain much more insight about parties (in customer roles), products, locations and the things themselves.

In the old MDM world with party, product and location we had 3 types of relationships between entities in these domains. With the inclusion of asset/thing we have 3 more exciting relationship types.

[Figure: Multi-Domain MDM Relations]

The Old MDM World

1: Handling the relationship between a party and its location(s) is one of the core capabilities of a proper party MDM solution. The good old customer table is just not good enough as explained in the post A Place in Time.

2: Managing the relationship between parties and products is essential in supplier master data management and tracking the relationship between customers and products is a common use case as exemplified in the post Customer Product Matrix Management.

3: Some products are related to a location as told in the post Product Placement.

The New MDM World

4: We need to be aware of who owns, operates, maintains and has other party roles with any smart device being a part of the Internet of Things.

5: In order to make sense of the big data coming from fixed or moving smart devices we need to know the location context.

6: Further, we must include the product information of the product model for the smart devices.
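The six relationship types above are simply the six possible pairs among the four domains. A minimal sketch of that idea follows; the Entity and Relationship classes and the example keys are my own illustrative assumptions, not part of any MDM product.

```python
from dataclasses import dataclass
from itertools import combinations

# The four domains from the post; "thing" is the newcomer.
DOMAINS = ("party", "location", "product", "thing")


@dataclass(frozen=True)
class Entity:
    domain: str  # one of DOMAINS
    key: str     # hypothetical identifier within the domain


@dataclass(frozen=True)
class Relationship:
    source: Entity
    target: Entity
    role: str    # e.g. "owner", "home location", "product model"


def relationship_type(rel: Relationship) -> tuple:
    """Return the pair of domains a relationship connects, in DOMAINS order."""
    pair = (rel.source.domain, rel.target.domain)
    return tuple(sorted(pair, key=DOMAINS.index))


# Every possible cross-domain relationship type.
ALL_TYPES = list(combinations(DOMAINS, 2))
OLD_TYPES = [t for t in ALL_TYPES if "thing" not in t]
NEW_TYPES = [t for t in ALL_TYPES if "thing" in t]

# A smart device owned by a party is an instance of relationship type 4.
owned_by = Relationship(Entity("thing", "sensor-1"), Entity("party", "acme"), "owner")
```

Here OLD_TYPES holds the three pairs among party, location and product, and NEW_TYPES holds the three pairs that involve a thing.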

Expanding to Business Ecosystems

In my eyes, it is hard to handle the 3 old relationship types separately within a given enterprise. When including things and the 3 new relationship types, expanding master data management to the business ecosystems you have with trading partners will be imperative as elaborated in the post Data Management Platforms for Business Ecosystems.

How MDM, PIM and DAM Stick Together

11th April 2017 (updated 20th August 2017), Henrik Gabs Liliendahl

When working with product data I usually put the data into this five level model:

[Figure: Five levels]

The model is explained in the post Five Product Data Levels.

A recent post by Simon Walker of Gartner, the analyst firm, outlined the possible system landscape. The post is called Creating the 360-Degree view of Product.

Simon defines these three kinds of platforms for managing a 360 degree product data view:

· MDM of product master data solutions help manage structured product data for enterprise operational and analytical use cases

· PIM solutions help extend structured product data through the addition of rich product content for sales and marketing use cases

· DAM solutions help users create and manage digital multimedia files for enterprise, sales and marketing use cases

These two models fit quite well together:

[Figure: MDM PIM DAM]

And oh, when it comes to creating a business ecosystem digital platform for exchanging product data with trading partners, the best model looks like this:

[Figure: MDM PIM DAM PDL]

Learn more about Product Data Lake here.

5 Data Management Mistakes to Avoid during Data Integration Projects

6th April 2017, Henrik Gabs Liliendahl

I am very pleased to welcome today’s guest blogger. Canada-based Maira Bay de Souza of Product Data Lake Technologies shares her view on data integration and the mistakes to avoid when doing it:

Throughout my 5 years of working with Data Integration, Data Migration and Data Architecture, I’ve noticed some common (but sometimes serious) mistakes related to Data Management and Software Quality Management. I hope that by reading about them you will be able to avoid them in your future Data Integration projects.

1 Ignoring Data Architecture

Defining the Data Architecture in a Data Integration project is the equivalent of defining the Requirements in a normal (non-data-oriented) software project. A normal software application is (most of the time) defined by its actions and interactions with the user. That’s why, in the first phase of software development (the Requirements Phase), one of the key steps is creating Use-Cases (or User Stories). On the other hand, a Data Integration application is defined by its operations on datasets. Interacting with data structures is at the core of its functionality. Therefore, we need to have a clear picture of what these data structures look like in order to define what operations we will do on them.

It is widely accepted in normal software development that having well-defined requirements is key to success. The common saying “If you don’t know where you’re going, any road will get you there” also applies for Data Integration applications. When ETL developers don’t have a clear definition of the Data Architecture they’re working with, they will inevitably make assumptions. Those assumptions might not always be the same as the ones you, or worse, your customer made.

(see here and here for more examples on the consequences of not finding software bugs early in the process due to badly defined requirements)

Simple but detailed questions like “can this field be null or not?” need to be answered. If the wrong decision is made, it can have serious consequences. Most senior Java programmers like me are well aware of the infamous “Null Pointer Exception“. If you feed a null value to a variable that doesn’t accept null (but you don’t know that that’s the case because you’ve never seen any architecture specification), you will get that error message. Because it is a vague message, it can be time-consuming to debug and find the root cause (especially for junior programmers): you have to open your ETL in the IDE, go to the code view, find the line of code that is causing the problem (sometimes you might even have to run the ETL yourself), then find where that variable is located in the design view of your IDE, add a fix there, test it to make sure it’s working and then deploy it in production again. That also means that normally, this error causes an ETL application to stop functioning altogether (unless there is some sort of error handling). Depending on your domain that can have serious, life-threatening consequences (for example, healthcare or aviation), or lead to major financial losses (for example, e-commerce).

Knowing the format, boundaries, constraints, relationships and other information about your data is imperative to developing a high quality Data Integration application. Taking the time to define the Data Architecture will prevent a lot of problems down the road.
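To make the “can this field be null or not?” question concrete, here is a minimal sketch of writing the answers down as a specification and checking each record against it before it enters the ETL. The field names, the FieldSpec class and the limits are hypothetical examples, not taken from any real project.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    nullable: bool
    max_length: Optional[int] = None


# Hypothetical specification for a customer record.
CUSTOMER_SPEC = [
    FieldSpec("customer_id", nullable=False),
    FieldSpec("name", nullable=False, max_length=100),
    FieldSpec("city", nullable=True, max_length=60),
]


def violations(record: dict[str, Any], spec: list[FieldSpec]) -> list[str]:
    """List every way a record breaks the specification."""
    problems = []
    for field in spec:
        value = record.get(field.name)
        if value is None:
            # A missing key is treated the same as an explicit null.
            if not field.nullable:
                problems.append(f"{field.name}: null not allowed")
            continue
        if field.max_length is not None and len(str(value)) > field.max_length:
            problems.append(f"{field.name}: longer than {field.max_length} characters")
    return problems
```

A record with a null customer_id is then reported with a message that names the field, rather than surfacing later as a vague null pointer error somewhere inside the ETL.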

2 Doing Shallow Data Profiling

Data profiling is another key element to developing good Data Integration applications.

When doing data profiling, most ETL developers look at the current dataset in front of them, and develop the ETL to clean and process the data in that dataset. But unfortunately that is not enough. It is important to also think about how the dataset might change over time.

For example, let’s say we find a customer in our dataset with the postal code in the city field. We then add an instruction in the ETL for when we find that specific customer’s data, to extract the postal code from the city field and put it in the postal code field. That works well for the current dataset. But what if next time we run the ETL another customer has the same problem? (it could be because the postal code field only accepts numbers and now we are starting to have Canadian customers, who have numbers and letters in the postal code, so the user started putting the postal code in the city field)

Not thinking about future datasets means your ETL will only work for the current dataset. However, we all know that data can change over time (as seen in the example above) – and if it is inputted by the user, it can change unpredictably. If you don’t want to be making updates to your ETL every week or month, you need to make it flexible enough to handle changes in the dataset. You should use data profiling not only to analyse current data, but also to deduce how it might change over time.

Doing deep data profiling in the beginning of your project means you will spend less time making updates to the Data Cleaning portion of your ETL in the future.
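As an illustration of the postal code example, the cleaning rule can be written against the pattern of the problem instead of against one specific customer. This is only a sketch under my own assumptions: the two patterns are simplified (real postal code rules are stricter) and the function name is made up.

```python
import re

# Simplified, assumed patterns: Canadian style "A1A 1A1" and a five-digit code.
CANADIAN_CODE = re.compile(r"\b[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d\b")
FIVE_DIGIT_CODE = re.compile(r"\b\d{5}\b")


def split_city_and_postal_code(city, postal_code):
    """Move a postal code found in the city field into the postal code field.

    The rule applies to any record, not to one named customer, and leaves
    records alone when the postal code field is already filled.
    """
    if postal_code or not city:
        return city, postal_code
    for pattern in (CANADIAN_CODE, FIVE_DIGIT_CODE):
        match = pattern.search(city)
        if match:
            rest = city[:match.start()] + " " + city[match.end():]
            cleaned_city = " ".join(rest.split()).strip(" ,")
            return cleaned_city, match.group(0).upper()
    return city, postal_code
```

With this rule, the next customer who types “Ottawa K1A 0B1” into the city field is handled without a new release of the ETL.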

3 Ignoring Data Governance

This point goes hand-in-hand with my last one.

A good software quality professional will always think about the “what if” situations when designing their tests (as opposed to writing tests just to “make sure it works”). In my 9 years of software testing experience, I can’t tell you how many times I asked a requirements analyst “what if the user does/enters [insert strange combination of actions/inputs here]?” and the answer was almost always “the user will never do that“. But the reality is that users are unpredictable, and there have been several times when the user did what they “would never do” with the applications I’ve tested.

The same applies to data being inputted into an ETL. Thinking that “data will never come this way” is similar to saying “the user will never do that“. It’s better to be prepared for unexpected changes in the dataset instead of leaving it to be fixed later on, when the problem has already spread across several different systems and data stores. For example, it’s better to add validation steps to make sure that a postal code is in the right format, instead of making no validation and later finding provinces in the postal code field. Depending on your data structures, how dirty the data is and how widespread the problem is, the cost to clean it can be prohibitive.
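A validation step of the kind described above could look roughly like this. It is a sketch with assumed, simplified format rules per country and a made-up record layout; the point is that records failing validation are set aside with a reason instead of flowing into downstream systems.

```python
import re

# Assumed, simplified postal code rules per country code.
POSTAL_PATTERNS = {
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),
    "US": re.compile(r"\d{5}(-\d{4})?"),
}


def partition(records):
    """Split records into accepted ones and (record, reason) rejections."""
    accepted, rejected = [], []
    for record in records:
        pattern = POSTAL_PATTERNS.get(record.get("country"))
        code = (record.get("postal_code") or "").strip().upper()
        if pattern is None:
            rejected.append((record, "no postal code rule for country"))
        elif not pattern.fullmatch(code):
            rejected.append((record, "postal code has wrong format"))
        else:
            accepted.append(record)
    return accepted, rejected
```

A province name in the postal code field is caught here at the door, when it costs one rejected record, not a clean-up across several data stores.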

This also relates to my first point: a well-defined Data Architecture is the starting point to implementing Data Governance controls.

When designing a high quality Data Integration application, it’s important to think of what might go wrong, and imagine how data (especially if it’s inputted by a human) might be completely different than you expect. As demonstrated in the example above, designing a robust ETL can save hours of expensive manual data cleaning in the future.

4 Confusing Agile with Code-And-Fix

A classic mistake in startups and small software companies (especially those run by people without a comprehensive education or background in Software Engineering) is rushing into coding and leaving design and documentation behind. That’s why the US Military and CMU created the CMMI: to measure how (dis)organized a software company is, and help them move from amateur to professional software development. However, the compliance requirements for a high maturity organization are impractical for small teams. So things like XP, Agile, Scrum, Lean, etc have been used to make small software teams more organized without getting slowed down by compliance paperwork.

Those techniques, along with iterative development, proved to be great for startups and innovative projects due to their flexibility. However, they can also be a slippery slope, especially if managers don’t understand the importance of things like design and documentation. When the deadlines are hanging over a team’s head, the tendency is always to jump into coding and leave everything else behind. With time, managers start confusing agile and iterative development with code-and-fix.

Throughout my 16 years of experience in the Software Industry, I have been in teams where Agile development worked very well. But I have also been in teams where it didn’t work well at all – because it was code-and-fix disguised as Agile. Doing things efficiently is not the same as skipping steps.

Unfortunately, in my experience this is no different in ETL development. Because it is such a new and unpopular discipline (as opposed to, for example, web development), there aren’t a lot of software engineering tools and techniques around it. ETL design patterns are still in their infancy, still being researched and perfected in the academic world. So the slippery slope from Agile to code-and-fix is even more tempting.

What is the solution then? My recommendation is to use the proven, existing software engineering tools and techniques (like design patterns, UML, etc) and adapt them to ETL development. The key here is to do something. The fact that there is a gap in the industry’s body of knowledge is no excuse for skipping requirements, design, or testing, and jumping into “code-and-fix disguised as Agile“. Experiment, adapt and find out which tools, methodologies and techniques (normally used in other types of software development) will work for your ETL projects and teams.

5 Not Paying Down Your Technical Debt

The idea of postponing parts of your to-do list until later because you only have time to complete a portion of them now is not new. But unfortunately, with the popularization of agile methodologies and incremental development, Technical Debt has become an easy way out of running behind schedule or budget (and masking the root cause of the problem which was an unrealistic estimate).

As you might have guessed, I am not the world’s biggest fan of Technical Debt. But I understand that there are time and money constraints in every project. And even the best estimates can sometimes be very far from reality – especially when you’re dealing with a technology that is new for your team. So I am ok with Technical Debt, when it makes sense.

However, some managers seem to think that technical debt is a magic box where we can place all our complex bugs, and somehow they will get less complex with time. Unfortunately, in my experience, what happens is the exact opposite: the longer you owe technical debt (and the more you keep adding to it), the more complex and patchy the application becomes. If you keep developing on top of – or even around – an application that has a complex flaw, it is very likely that you will only increase the complexity of the problem. Even worse, if you keep adding other complex flaws on top of – or again, even around – it, the application becomes exponentially complex. Your developers will want to run away each time they need to maintain it. Pretty soon you end up with a piece of software that looks more like a Frankenstein monster than a clean, cohesive, elegant solution to a real-world problem. It is then only a matter of time (usually very short time) before it stops working altogether and you have no choice but to redesign it from scratch.

This (unfortunately) frequent scenario in software development is already a nightmare in regular (non-data-oriented) software applications. But when you are dealing with Data Integration applications, the impact of dirty data or ever-changing data (especially if it’s inputted by a human), combined with the other 4 Data Management mistakes I mentioned above, can quickly escalate this scenario into a catastrophe of epic proportions.

So how do you prevent that from happening? First of all, you need to have a plan for when you will pay your technical debt (especially if it is a complex bug). The more complex the required change or bug is, the sooner it should be dealt with. If it impacts a lot of other modules in your application or ecosystem, it is also important to pay it off sooner rather than later. Secondly, you need to understand why you had to go into technical debt, so that you can prevent it from happening again. For example, if you had to postpone features because you didn’t get to them, then you need to look at why that happened. Did you under-estimate another feature’s complexity? Did you fail to account for unknown unknowns in your estimate? Did sales or your superior impose an unrealistic estimate on your team? The key is to stop the problem in its tracks and make sure it doesn’t happen again. Technical Debt can be helpful at times, but you need to manage it wisely.

I hope you learned something from this list, and will try to avoid these 5 Data Management and Software Quality Management mistakes on your next projects. If you need help with Data Management or Software Quality Management, please contact me for a free 15-min consultation.

Maira holds a BSc in Computer Science, 2 software quality certifications and over 16 years of experience in the Software Industry. Her open-mindedness and adaptability have allowed her to thrive in a multidisciplinary career that includes Software Development, Quality Assurance and Project Management. She has taken senior and consultant roles at Fortune 20 companies (IBM and HP), as well as medium and small businesses. She has spent the last 5 years helping clients manage and develop software for Data Migration, Data Integration, Data Quality and Data Consistency. She is a Product Data Lake Ambassador & Technology Integrator through her startup Product Data Lake Technologies.

Product Data Lake Reaches Multi-Lingual Milestone Number Three

3rd April 2017, Henrik Gabs Liliendahl

Multi-lingual capabilities are among the core capabilities of the product information sharing service Product Data Lake.

During our market introduction, we have had three milestones:

· Product information can be exchanged in multiple languages – or rather cultures, being a combination of a language and a country. Product Data Lake was born with this core capability back in September 2016.

· Product information can be defined in multiple languages. Our February 2017 release introduced metadata in multiple cultures.

· Product information can be handled in multiple languages. Today we have released our multi-lingual user interface. The idea behind Product Data Lake is actually not that you should spend much time in the user interface. You only need to set up the automation of product information exchange. Now, you have the possibility to do that in your preferred language.
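To illustrate what a culture means as a combination of a language and a country, here is a small sketch of looking up a product text by culture code. The fallback order (exact culture, then another culture with the same language, then a default) and the function name are purely my illustrative assumptions and do not describe how Product Data Lake works internally.

```python
def localized(texts, culture, default="en-GB"):
    """Pick the best text for a culture code such as "da-DK".

    texts maps culture codes to text. Assumed fallback order: the exact
    culture, any other culture sharing the language, then the default.
    """
    language = culture.split("-")[0]
    candidates = [culture]
    candidates += sorted(
        code for code in texts
        if code.split("-")[0] == language and code != culture
    )
    candidates.append(default)
    for code in candidates:
        if code in texts:
            return texts[code]
    return None


# Hypothetical product attribute in three cultures.
colour = {"en-GB": "Colour: red", "en-US": "Color: red", "da-DK": "Farve: rød"}
```

Here "en-GB" and "en-US" share a language but differ by country, which is why language alone is not enough as a key.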

[Figure: Multi-Lingual DK]

However, this is not a fait accompli. Mañana, there will be more feinschmeckerei. The next multi-lingual feature will be access to classification and metadata in many languages from various general and industry standards for product information, starting with the ETIM standard for technical products.

You can learn more about Product Data Lake here.

From GDPR to GDPW and Beyond

31st March 2017 (updated 1st April 2017), Henrik Gabs Liliendahl

While the European data management community is fully occupied with the General Data Protection Regulation (GDPR), the United States president realDonaldTrump and his family are preparing something bigger: the Great Data Protection Wall (GDPW).

An upcoming executive order will enforce a tall and beautiful wall around each data centre. It’s true.

Hereafter, the only way to bring data past such a wall will be by accessing data on a microwave oven (aka wiretapping).

Some other core concepts will be rules for handling alternative facts as well as how to apply the term fake news to anything, which does not fit into your scheme.

A White House spokesperson is spicing it up like this: “We will repeal and replace the current way of taking care of data with something else”.

China and Russia are expected to come up with similar GDPx initiatives. As China already has a Great Wall, their implementation will be known as GDPH (General Data Piling via Huawei devices) and Russia is routinely already involved in the US implementation.

The Real Reason Why Your Business Needs a PIM Tool

27th March 2017 (updated 28th March 2017), Henrik Gabs Liliendahl

Today’s guest blog post is the second one from Dan O’Connor, a United States-based product data taxonomy guru. Here are Dan’s thoughts on why you should have a Product Information Management (PIM) tool:

Over the past year I have moved from a position of watching a Product Information Management tool, or PIM, being installed, to working for a PIM vendor, to working through the process of installing a PIM tool from the client side. In the same way that I justified buying a sports car to my wife based on the utilitarian value of having 350 horsepower at my disposal, I’ve seen many different justifications for installing a PIM tool. From “Micro Moments” to “collaborative data collection” and “syndication”, terms are tossed around that attempt to add to the value of a PIM installation.

The simple truth is there is only one reason you need a PIM tool. Every other justification solves a symptom of a data problem in a business, not the core problem. Every good management executive knows that solving symptoms is a rabbit hole that can cost time and money at an incredible rate, so understanding the core problem that requires a PIM in your business is vital to your business growth.

Controlling your Messaging

That core problem your business needs to solve is product messaging. Simply put, without a central hub for your data your business has a lack of control over how your product messaging is spread both internally and externally. If you are still working in spreadsheets or collecting data multiple times for a single product for different channels you have lost most of your product messaging structure.

PIM is a tool that solves that problem, and the symptomology that comes with it. Does your business spend too much time assembling data to meet downstream partner needs? You have a product messaging problem. Is your business’ ability to ingest data limited by spreadsheets transferred over network folders or email? You have a product messaging problem.

All the benefits of PIM can be summed up into a simple statement: If you want to be in control of your product brand and your product data quality your business needs a PIM tool. Do you want to reduce product data setup costs? You need a central location for all your product messaging to do so. Does your business have product data quality issues that occur due to poor adherence to best practices? Poor data quality affects your product messaging, and can be solved by a PIM tool. Is your business spending too much time chasing down emails with product specs and spreadsheets full of setup data? These bad workflow practices affect your ability to provide a consistent message downstream to your business partners, whether your business is B2B or B2C. They are a symptom of your poor product messaging control.

The True PIM ROI Story

The central premise of a PIM tool is to standardize and normalize your product data collection and setup workflows and processes. If your business looks at a PIM tool only for this metric your vision for PIM is limited. Syndication, the distribution of data to consuming internal and external systems, is another huge benefit to PIM. However, if the product messaging your PIM system is sending or receiving is not well controlled within your PIM your vision is incomplete. There is not a single benefit to PIM that you cannot add the terms “with a consistent approach to your product messaging” to the end of.

Why is product messaging so important? In previous blogs I have demonstrated how failures in product messaging lead to odd product experiences, especially when you look at the messaging across platforms. If your web store shows a length for a product and your channel partner shows a different length you have a product messaging problem. If that product data came from a central source that issue would not exist. It might be as simple as the downstream partner swapped length for depth and there isn’t a true data issue, but to your customers there is an inconsistent product data message.

Extrapolating this out to something as simple as web descriptions actually validates this business case. If you provide a basic web description for a product based on an individual manually typing in marketing copy into a web portal you have lost control of your product messaging. That same person may be responsible for typing that web description in 4 different places, and without a central repository for that data the chances that those 4 messages will complement each other are slim. Add to that the fact that many major retailers edit web descriptions to conform to their standards after your business has completed product setup and you are less in control of your product messaging than you imagined.

Having a PIM tool solves this. You have a single source for web descriptions that you know will be represented in a singular repeatable fashion downstream. You can map your dimension attributes to your downstream channel partner dimensions, ensuring that the appropriate data appears in each field. You can customize web descriptions in a controlled and normalized environment so that you have more control over how those descriptions are customized by your channel partners.
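The dimension mapping idea can be sketched as one central product record plus an explicit attribute mapping per channel. The attribute names, the channel names and the export function below are hypothetical and do not represent any particular PIM tool.

```python
# One central product record (hypothetical attribute names).
PRODUCT = {"sku": "A-100", "length_mm": 400, "width_mm": 250, "height_mm": 120}

# Assumed mappings: each channel states which central attribute feeds which
# of its fields. Here the partner calls our length "depth".
CHANNEL_MAPPINGS = {
    "webstore": {"length_mm": "length", "width_mm": "width", "height_mm": "height"},
    "partner_x": {"length_mm": "depth", "width_mm": "width", "height_mm": "height"},
}


def export_for(channel, product):
    """Build the record a channel receives from the central product data."""
    mapping = CHANNEL_MAPPINGS[channel]
    missing = [source for source in mapping if source not in product]
    if missing:
        raise ValueError(f"product {product.get('sku')} lacks {missing}")
    exported = {"sku": product["sku"]}
    for source, target in mapping.items():
        exported[target] = product[source]
    return exported
```

Because both exports read the same central value, the web store length and the partner depth cannot drift apart, which is the length versus depth mix-up described earlier.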

The Importance of Product Messaging

Product messaging is your voice to your customers. As B2B ecommerce follows the path blazed by B2C it has become more important to have a consistent and controlled message for your products to all your customers. Spreadsheets are not capable of that task, and email is not a mechanism for maintaining product data quality. Automated systems with proper workflows and data quality checks are paramount to ensuring the voice you expect your customers to hear is your business’ voice.

Reducing catalog printing costs, syndication of product data to channel partners, and reducing product setup headcount are valid reasons to install a PIM tool. However, they all should be part of a greater goal to control your voice to your customers. Those benefits are symptoms of a need in your business to have a unifying voice, and not including product messaging control as the overriding goal of your PIM installation is a strategic error.

Having performed many PIM installations, I have seen the impact of not treating product messaging control as the overarching goal. A company I worked with went through the process of installing a PIM tool, and we reached the point of remediating their existing product data to fit the new model. This company, which had invested heavily in this project, decided they did not want to perform any data remediation. They simply added back into their PIM tool every attribute that had existed in their old system. There was no vision to improve the data they were displaying to their customers: they simply wanted to speed up product setup.

That business has spent the last 6 months undoing the benefits of controlled product messaging. It was less costly to them in the short term to simply replicate their existing data issues in a new system. Their old product data was unwieldy, hyper-specific to channel, and involved writing product titles and web descriptions manually for each channel. There is no common theme to the product messaging they are creating, and their ability to reduce product setup costs has been hampered by these decisions.

In Summary: Product Data is Your Product Messaging

Micro moments and product experience management are just fancy terminology for what is simply an understanding of the importance of your product data. If your vision is to control your product messaging, you have to start with your product data. A PIM tool is the only functional approach that meets that goal, but it has to be looked at as a foundational piece of that product messaging. Attempting to reduce product setup costs or speed product data transfer is a valid business goal and a justification for a PIM project, but the true visionary approach has to include an overall product messaging approach. Otherwise, your business is limiting the return on investment it will achieve from any attempt to solve your product data setup and distribution problems.

Dan O’Connor is a Product Taxonomy, Product Information Management (PIM), and Product Data Consultant and an avid blogger on taxonomy topics. He has developed taxonomies for major retailers as well as manufacturers and distributors, and assists with the development of product data models for large and small companies. See his LinkedIn bio for more information.

IoT and Multi-Domain MDM

23rd March 2017, Henrik Gabs Liliendahl

The Internet-of-Things (IoT) is a hot topic and many Master Data Management (MDM) practitioners as well as tool and service vendors are exploring what the rise of the Internet-of-Things and the related Industry 4.0 themes will mean for Master Data Management in the years to come.

In my eyes, connecting these smart devices and exploiting the big data you can pull (or have pushed) from them will require a lot from all Master Data Management domains. Some main considerations will be:

· Party Master Data Management is needed to know about the many roles you can apply to a given device. Who is the manufacturer, vendor, supplier, owner, maintainer and collector of data? Privacy and security matters on that basis will have to be taken very seriously.

· Location Master Data Management is necessary at a much deeper and more precise level than what we are used to when dealing with postal addresses. You will need to know a home location with a timespan and you will need to confirm and, for moving devices, supplement with observed locations with a timestamp.

· Product and Asset Master Data Management is imperative in order to know about the product model of the smart device and individual characteristics of the given device.
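The location consideration above, a home location valid for a timespan plus observed locations with a timestamp, can be sketched as a small data structure. The class and field names are my own illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class HomeLocation:
    latitude: float
    longitude: float
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means still valid

    def covers(self, moment: datetime) -> bool:
        return self.valid_from <= moment and (
            self.valid_to is None or moment < self.valid_to
        )


@dataclass(frozen=True)
class ObservedLocation:
    latitude: float
    longitude: float
    observed_at: datetime


@dataclass
class Device:
    device_id: str
    homes: list = field(default_factory=list)
    observations: list = field(default_factory=list)

    def home_at(self, moment: datetime) -> Optional[HomeLocation]:
        """The home location whose timespan covers the given moment."""
        return next((h for h in self.homes if h.covers(moment)), None)

    def last_seen_before(self, moment: datetime) -> Optional[ObservedLocation]:
        """The most recent observation at or before the given moment."""
        earlier = [o for o in self.observations if o.observed_at <= moment]
        return max(earlier, key=lambda o: o.observed_at, default=None)
```

Keeping the two kinds of location apart makes it possible to compare where a device is supposed to be with where it was actually seen at a given time.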

It is also interesting to consider whether you will be able to manage this connectivity within an MDM platform (even multidomain and end-to-end) behind your corporate walls. I do not think so, as told in the post The Intersections of 360 Degree MDM.
