Opportunities on The Data Quality Tool Market

The latest Information Difference Data Quality Landscape is out. This is a generic ranking of major data quality tools on the market.

You can see the previous data quality landscape in the post Congrats to Datactics for Having the Happiest DQM Customers.

There are no significant changes in the relative positioning of the vendors. The only notable change is that Syncsort has been renamed Precisely.

As stated in the report, much of the data quality industry is focused on name and address validation. However, there are many opportunities for data quality vendors to spread their wings and better tackle problems in other data domains, such as product, asset and inventory data.

One explanation of why this is not happening is probably the interwoven structure of the joint Master Data Management (MDM), Product Information Management (PIM) and Data Quality Management (DQM) markets and disciplines. For example, a predominant data quality issue such as completeness of product information is addressed in PIM solutions and even better in Product Data Syndication (PDS) solutions.

Here, there are opportunities for pure-play vendors within each speciality to work together, and for the larger vendors to offer both a truly integrated overall solution and contextual solutions for each issue at a reasonable cost/benefit ratio.

Get Your Free Bespoke MDM / PIM / DQM Solution List

Many analyst market reports in the Master Data Management (MDM), Product Information Management (PIM) and Data Quality Management (DQM) space have a generic ranking of the vendors.

The trouble with generic ranking is that one size does not fit all.

On the sister site to this blog, The Disruptive MDM / PIM / DQM List, there is no generic ranking. Instead there is a service where you can provide your organization’s context, scope and requirements and within 2 to 48 hours get Your Solution List.

The selection model includes these elements:

  • Your context in terms of geographical reach and industry sector.
  • Your scope in terms of data domains to be covered and organizational scale stretching from specific business units over enterprise-wide to business ecosystem wide (interenterprise).
  • Your specific requirements covering the main capabilities that differentiate the vendors on the market.
  • Vendor capabilities.
  • A model that combines those facts into a rectangle where you can choose to:
    • Go ahead with a Proof of Concept with the best fit vendor
    • Make an RFP with the best fit vendors in a shortlist
    • Examine a longlist of best fit vendors and other alternatives like combining more than one solution.
The vendors included are both the major players on the market as well as emerging solutions with innovative offerings.

You can get your free solution list here.

Privacy and Confidentiality Concerns in Interenterprise Data Sharing

Exchange of data between enterprises – aka interenterprise data sharing – is becoming a hot topic in the era of digital transformation. As told in the post Data Quality and Interenterprise Data Sharing this approach is the cost-effective way to ensure data quality for the fast-increasing amount of data every organization has to manage when introducing new digital services.

McKinsey Digital recently elaborated on this theme in an article with the title Harnessing the power of external data. As stated in the article: “Organizations that stay abreast of the expanding external-data ecosystem and successfully integrate a broad spectrum of external data into their operations can outperform other companies by unlocking improvements in growth, productivity, and risk management.”

The arguments against interenterprise data sharing I hear most often revolve around privacy and confidentiality concerns.

Let us have a look at this challenge within the two most common master data domains: Party data and product data.

Party Data

The firm CDQ talks about the case for sharing party data in the post Data Sharing: A Brief History of a Crazy Idea. As said there: The pain can be bigger than the concern.

Privacy, as enforced through data privacy and data protection regulations such as GDPR, must (and should) be adhered to. It sets a very strict limit on exchanging Personally Identifiable Information, leaving room only for the legitimate cases of data portability.

However, information about organizations can be shared, not only by exploiting public third-party sources such as business directories but also through data pools between like-minded organizations. Here you must consider whether your typos in company names, addresses and more really are that confidential.

Product Data

The case for exchanging product data is explained in the post The Role of Product Data Syndication in Interenterprise MDM.

Though the vast majority of product data is meant to become public, concerns about confidentiality also exist with product data. Trading prices are an obvious area. The timing of releasing product data is another concern.

In the Product Data Lake syndication service I work with there are measures to ensure the right level of confidentiality. This includes encryption and controlling with whom you share what and when you do it.

Data governance plays a crucial role in orchestrating interenterprise data sharing with the right approach to data privacy and confidentiality. How this is done in, for example, product data syndication is explained on the page about Product Data Lake Documentation and Data Governance.

10 Kinds of Product Information Needed Within Customization and Personalization

When working with Product Information Management (PIM) I usually divide the different kinds of information to be managed into some levels and groups as elaborated in the post 5 Product Data Levels to Consider.

The 10 groups of data in this 5-level scheme are all relevant for personalization of product data in the following way:

  1. A (prospective) customer may have some preferred brands which are recognized either by collection of preferences or identified through previous behaviour.
  2. The shopping context may dictate whether product codes like GTIN/UPC/EAN and industry-specific product codes are relevant as part of the product presentation or will only be noise.
  3. The shopping context may guide the use of variant product descriptions as touched in the post What’s in a Product Name?
  4. The shopping context may guide the use of various product image styles.
  5. The shopping context may guide the range of product features (attributes) to be presented, typically either on a primary product presentation screen or on a detailed specification screen.
  6. The shopping context and occasion may decide the additional product description assets (such as certificates, line drawings, installation guides and more) to be presented.
  7. The shopping occasion may decide the product story to be told.
  8. The shopping occasion may decide the supplementary products, such as accessories and spare parts, to be presented along with the product in focus.
  9. The shopping occasion may decide the complementary products, such as x-sell and up-sell candidates, to be presented along with the product in focus.
  10. The shopping occasion may decide the advanced digital assets, such as brochures and videos, to be presented.

The data collection track that can enable customization and personalization of product information is examined in the post The Roles of MDM in The Data Supply Chain.

Data Quality and Interenterprise Data Sharing

When working with data quality improvement there are three kinds of data to consider:

First-party data is the data that is born and managed internally within the enterprise. This data has traditionally been the focus of data quality methodologies and tools with the aim of ensuring that data is fit for the purpose of use and correctly reflects the real-world entity that the data is describing.

Third-party data is data sourced from external providers that offer a set of data that can be utilized by many enterprises. Examples are location directories, business directories such as the Dun & Bradstreet Worldbase, public national directories, and product data pools such as the Global Data Synchronization Network (GDSN).

Enriching first-party data with third-party data is a means to ensure, in particular, better data completeness, better data consistency, and better data uniqueness.
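To make the completeness part concrete, here is a minimal Python sketch of gap-filling enrichment. The field names, the sample records and the rule that first-party values always win are my assumptions for illustration, not a description of any specific tool.

```python
from typing import Optional

Record = dict[str, Optional[str]]

# Hypothetical list of fields we consider mandatory for a business partner.
REQUIRED = ["name", "street", "city", "country", "registration_id"]


def completeness(record: Record, required: list[str]) -> float:
    """Share of required fields that hold a non-empty value."""
    filled = sum(1 for field in required if record.get(field))
    return filled / len(required)


def enrich(first_party: Record, third_party: Record) -> Record:
    """Fill only the gaps; existing first-party values are kept as they are."""
    enriched = dict(first_party)
    for field, value in third_party.items():
        if not enriched.get(field):
            enriched[field] = value
    return enriched


# Made-up sample data: an internal record and a matching directory record.
internal = {"name": "Acme Ltd", "street": None, "city": "London",
            "country": None, "registration_id": None}
directory = {"name": "ACME LIMITED", "street": "1 High Street", "city": "London",
             "country": "GB", "registration_id": "12345678"}

before = completeness(internal, REQUIRED)
after = completeness(enrich(internal, directory), REQUIRED)
print(before, after)
```

The sketch assumes the internal record has already been matched to the right directory record, which is a data quality task in its own right.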

Second-party data is data sourced directly from a business partner. Examples are supplier self-registration, customer self-registration and inbound product data syndication. Exchange of this data is also called interenterprise data sharing.

The advantage of using second-party data from a data quality perspective is that you are closer to the source, which, all things being equal, means that the data better and more accurately reflects the real-world entity it describes.

In addition to that, you will also, compared to third-party data, have the opportunity to operate with data that exactly fits your operating model and makes you unique compared to your competitors.

Finally, second-party data obtained through interenterprise data sharing will reduce the cost of capturing data compared to first-party data. Otherwise, the ever-increasing demand for more elaborate high-quality data in the age of digital transformation will overwhelm your organization.

The Balancing Act

Getting optimal data quality with the least effort is about balancing the use of internal and external data, where you can exploit interenterprise data sharing through combining second-party and third-party data in the way that makes most sense for your organization.

As always, I am ready to discuss your challenge. You can book a short online session for that here.

The Roles of MDM in The Data Supply Chain

Master Data Management (MDM) and the overlapping Product Information Management (PIM) discipline are the centre around which the end-to-end data supply chain revolves in your enterprise.

The main processes are:

Onboard Customer Data

It starts and ends with the King: The Customer. Your organization will probably have several touchpoints where customer data is captured. MDM was born out of the Customer Data Integration (CDI) discipline and a main reason of being for MDM is still to be a place where all customer data is gathered as exemplified in the post Direct Customers and Indirect Customers.

Onboard Vendor Data

Every organization has vendors/suppliers who deliver direct and indirect products such as office supplies, Maintenance, Repair and Operation (MRO) parts, raw materials, packing materials, resell products and services as well. As told in a post on this blog, you have to Know Your Supplier.

Enrich Party Data

There are good options for not having to collect all data about your customers and vendors yourself, as there are third-party sources available for enriching these data, preferably as close to capture as possible. This topic was examined in the post Third-Party Data and MDM.

Onboard Product Data

While a small portion of product data for a small portion of product groups can be obtained via product data pools, the predominant way is to have product data coming in as second-party data from each vendor/supplier. This process is elaborated in the post 4 Supplier Product Data Onboarding Scenarios.

Transform Product Data

As your organization probably does not use the same standard, taxonomy, and structure for product data as all your suppliers, you have to transform the data into your standard, taxonomy, and structure. You may do the onboarding and transformation in one go as pondered in the post The Role of Product Data Syndication in Interenterprise MDM.
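At its simplest, such a transformation is a mapping from the supplier's attribute names to your own. The Python sketch below shows only that renaming step. The attribute names and the mapping are hypothetical, and real transformations will also have to deal with units, value lists and classification.

```python
# Hypothetical mapping from one supplier's attribute names to our own standard.
SUPPLIER_TO_INTERNAL = {
    "Colour": "color",
    "Width (mm)": "width_mm",
    "EAN": "gtin",
}


def transform(
    supplier_record: dict[str, str], mapping: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Rename mapped attributes and report the ones that have no mapping yet."""
    transformed: dict[str, str] = {}
    unmapped: list[str] = []
    for attribute, value in supplier_record.items():
        target = mapping.get(attribute)
        if target is None:
            unmapped.append(attribute)  # needs a decision by a data steward
        else:
            transformed[target] = value
    return transformed, unmapped


# Made-up incoming record from a supplier.
incoming = {"Colour": "Red", "Width (mm)": "120",
            "EAN": "4006381333931", "Finish": "Matt"}
product, todo = transform(incoming, SUPPLIER_TO_INTERNAL)
print(product)
print(todo)
```

Returning the unmapped attributes instead of silently dropping them keeps the gaps visible, so the mapping can be extended over time.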

Consolidate Product Data

If your organization produces products, or you combine external and internal products and services in other ways, you must consolidate the data describing your finished products and services.

Enrich Product Data

Besides the hard facts about the products and services you sell, you must also apply competitive descriptions of the products and services that make you stand out from the crowd and ensure that the customer will buy from you when looking for products and services for a given purpose of use.

Customize Product Data

Product data will optimally have to be tailored for a given geography, market and/or channel. This includes language and culture considerations and adhering to relevant regulations.

Personalize Product Data

Personalization is one step deeper than market and channel customization. Here you at point-of-sale seek to deliver the right Customer Experience (CX) by exercising Product eXperience Management (PXM). Here you combine customer data and product data. This quest was touched in the post What is Contextual MDM?

The Most Annoying Way of Presenting Data

Polls are popular on LinkedIn, and I have been guilty of making a few recently, too.

One was about which way of presenting data (data format) is the most annoying.

There were the 4 mentioned above to choose from.

The MM/DD/YYYY date format is in use practically only in the United States. In the rest of the world either the DD/MM/YYYY format or the ISO recommended YYYY-MM-DD format is the chosen one. The data quality challenge appears when you see a date as 03/02/2021 in an international context, because this can be either March 2nd or February 3rd.
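The ambiguity is easy to reproduce. The small Python sketch below parses the same text with both conventions and returns every valid reading in ISO format. The helper name possible_dates is my own invention for this illustration.

```python
from datetime import datetime


def possible_dates(raw: str) -> set[str]:
    """Return every valid ISO date the text could mean under MDY and DMY."""
    results = set()
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            results.add(datetime.strptime(raw, fmt).date().isoformat())
        except ValueError:
            pass  # not a valid date under this convention
    return results


print(possible_dates("03/02/2021"))  # two different readings
print(possible_dates("13/02/2021"))  # only one reading is a valid date
```

Only when the day is above 12, or when day and month are equal, does the text itself settle the question. Otherwise you need to know the source.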

The 12-hour clock with AM and PM postfix is more commonly in use around the world. But obviously the 12-hour clock is not as well thought out as the 24-hour clock. We need some digital transformation here.

Imperial units of measure like inch, foot, yard, pound, and more are far less logical and structured than the metric system. Only 3 countries around the world (United States, Myanmar and Liberia) have not adopted the metric system. And then there is the United Kingdom, which has adopted the metric system in theory, but not in practice.

The Fahrenheit temperature scale is used practically only in the United States, as opposed to Celsius (centigrade), which is used everywhere else. When someone writes that it is 30 degrees outside, that could be quite cold or rather hot if there is no unit of measure applied.
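A quick Python sketch with the standard conversion formulas shows how far apart the two readings of 30 degrees are:

```python
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32) * 5 / 9


def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


reading = 30  # "30 degrees" with no unit of measure

# If the writer meant Fahrenheit, it is just below freezing.
print(round(fahrenheit_to_celsius(reading), 1), "°C")
# If the writer meant Celsius, it is a hot summer day.
print(celsius_to_fahrenheit(reading), "°F")
```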

Another example of international trouble mentioned in the comments to the poll is decimal point. In English writing you will use a dot for the decimal point, in many other cultures you use a comma as decimal point.
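The same text can therefore mean very different numbers. In the Python sketch below the decimal separator is passed in explicitly instead of being guessed. The function parse_number and its rule that the other character is a thousands separator are assumptions made for this illustration.

```python
from decimal import Decimal


def parse_number(text: str, decimal_sep: str) -> Decimal:
    """Parse a number where the decimal separator is known to be '.' or ','.

    The other of the two characters is treated as a thousands separator.
    """
    if decimal_sep not in (".", ","):
        raise ValueError("decimal_sep must be '.' or ','")
    group_sep = "," if decimal_sep == "." else "."
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


print(parse_number("1.234", "."))    # English reading: a bit more than one
print(parse_number("1.234", ","))    # comma cultures: more than a thousand
print(parse_number("1.234,5", ","))
```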

Most of the annoyance is handled by mature software having settings where you can set your preferences. The data quality issues arise when these data are part of a text, including when software must convert a text into a number, date or time.

If you spot some grey colour (or is it color) in my hair, I blame varying data formats in CSV files, SQL statements, emails and more.

Direct Customers and Indirect Customers

When working with Master Data Management (MDM) for the customer master data domain one of the core aspects to be aware of is the union, intersection and difference between direct customers and indirect customers.

Direct customers are basically those customers that your organization invoices.

Indirect customers are those customers that buy your organization's products and services from a reseller (or marketplace). In that case the reseller is a direct customer to your organization.

The stretch from your organization via a reseller organization to a consumer is referred to as Business-to-Business-to-Consumer (B2B2C). This topic is covered in the post B2B2C in Data Management. If the end user of the product or service is another organization, the stretch is referred to as Business-to-Business-to-Business (B2B2B).

The short stretch from your organization to a consumer is referred to as Direct-to-Consumer (D2C).

It does happen that someone is both a direct customer and an indirect customer, either over time and/or over various business scenarios.
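Union, intersection and difference are meant quite literally here. A minimal Python sketch with sets illustrates it. It assumes that direct and indirect customers already share a common identifier, which in practice only comes after matching, and the IDs are made up.

```python
def customer_overview(direct: set[str], indirect: set[str]) -> dict[str, set[str]]:
    """Split the customer space using plain set operations."""
    return {
        "all": direct | indirect,            # union
        "both": direct & indirect,           # intersection
        "direct_only": direct - indirect,    # difference
        "indirect_only": indirect - direct,  # difference the other way
    }


# Made-up customer IDs.
invoiced_customers = {"C001", "C002", "C003"}
reseller_served_customers = {"C003", "C004"}

overview = customer_overview(invoiced_customers, reseller_served_customers)
for name, members in overview.items():
    print(name, sorted(members))
```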

IT Systems Involved

If we look at the typical IT systems involved here, direct customers are managed in an ERP system where the invoicing takes place as part of the order-to-cash (O2C) main business process. Products and services sold through resellers are part of an order-to-cash process where the reseller places an order with you when their stock is low and pays you according to the contract between them and you. In ERP lingo, someone who pays you is an account receivable.

Typically, you will also handle the relationship and engagement with a direct customer in a CRM system. However, there are often direct customers where the relationship is purely administrative with no one from the salesforce involved. Therefore, these kinds of customers are sometimes not in the CRM system. They are purely an account receivable.

More and more organizations want to have a relationship with and engage with the end customer. Therefore, these indirect customers are managed in the CRM system as well, typically where the salesforce is involved and increasingly also where digital sales services are applied. However, most often there will be some indirect customers not encompassed by the CRM system.

The Role of Master Data Management (MDM) in the context of customer master data is to be the single source for all customer data. So, MDM holds the union of customer master data from the ERP world and the CRM world.

An MDM platform also has the capability of encompassing other sources both internal ones and external ones. When utilized optimally, an MDM platform will be able to paint a picture of the entire space of where your direct customers and indirect customers are.

Business Opportunities

Having this picture is of course only interesting if you can use it to obtain business value. Some of the opportunities I have stumbled upon are:

  • More targeted product and service development by having more insight into the whole customer space leading to growth advancements
  • Optimized orchestration of supply chain activities by having complete insight into the whole customer space and thereby fostering cost savings
  • Improved ability to analyse the consequences of market change and changes in the economic environment in geographies and industries covered leading to better risk management.

Which business opportunities do you see arise for your organization by having a complete overview of the union, intersection and difference between your direct customers and indirect customers?

Digital Twin and MDM

A digital twin is in short digital data representing a physical object.

Master Data Management (MDM) has, since the discipline emerged in the 2000s, been about managing data representing some very common physical objects like persons, products and locations, though with a layer of context in between:

  • Persons are traditionally described with data aimed for a given role like a person being a customer, patient, student, contact, employee, and many more specific roles.
  • Products are traditionally described as a product model with data that are the same for a product being mass produced.
  • Locations are typically described as a postal address and/or a given geocode.

With the rise of digitalization and Internet of Things (IoT) / Industry 4.0, the need for having a more real-world view of persons, a broader view of products, and more useful views of locations arises together with the need for similar digital twins for other object types.

The Enterprise Knowledge Graph tool provider Stardog has described this topic in the post Create your Digital Twin with an Enterprise Knowledge Graph.

As Knowledge Graph and (extended) MDM can coexist very well, the same objectives are true for MDM as well.

Some of the use cases I have stumbled on are:

  • Manage generic data about a person and belonging organizations as a digital twin encompassing all historic, current, and sought roles related to your organization. Data privacy must be adhered to here; however, issues such as opt-in and opt-out must also be handled across roles.
  • Manage specific data about each instance of a product that is a smart device, which is true for more and more product models. Such a digital twin is described in the post 3 Old and 3 New Multi-Domain MDM Relationship Types.
  • Manage complex data about a location, such as boundaries, placements in geographic hierarchies, location names and property descriptions, as a digital twin.
  • Manage data about plants, machines, vehicles, warehouses, stores as digital twins using an MDM approach.     

Which digital twins have you stumbled upon where an MDM approach is useful?

Deduplication as Part of MDM

A core intersection between Data Quality Management (DQM) and Master Data Management (MDM) is deduplication. The process here will basically involve:

  • Match master data records across the enterprise application landscape, where these records describe the same real-world entity most frequently being a person, organization, product or asset.
  • Link the master data records in the best fit / achievable way, for example as a golden record.
  • Apply the master data records / golden record to a hierarchy.

Data Matching

The classic data matching quest is to identify data records that refer to the same person being an existing customer and/or prospective customer. The first solutions for doing that emerged more than 40 years ago. Since then the more difficult task of identifying the same organization being a customer, prospective customer, vendor/supplier or other business partner has been taken on, and solutions for identifying products as being the same have also been deployed.

Besides using data matching to detect internal duplicates within an enterprise, data matching has also been used to match against external registries. Doing this serves as a means to enrich internal records while also helping to identify internal duplicates.
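As a very small illustration of the matching step, the Python sketch below compares normalised names with the standard library's difflib. This is a toy under my own assumptions (names only, a hypothetical threshold of 0.85, made-up records). Real data matching tools use far richer techniques and more attributes than this.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity between 0 and 1 based on matching character blocks."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def duplicate_candidates(
    records: dict[str, str], threshold: float = 0.85
) -> list[tuple[str, str]]:
    """Return ID pairs whose names are at least `threshold` similar."""
    return [
        (id_a, id_b)
        for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2)
        if similarity(name_a, name_b) >= threshold
    ]


# Made-up records from two source applications.
customers = {"crm-1": "John Smith", "erp-7": "Jon  Smith ", "crm-2": "Mary Jones"}
candidates = duplicate_candidates(customers)
print(candidates)
```

Comparing every pair, as done here, only works for small volumes. The result is also only a list of candidates that still need confirmation.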

Master Data Survivorship

When two or more data records have been confirmed as duplicates there are various ways to deal with the result.

In the registry MDM style, you will only store the IDs of the linked records so the linkage can be used for specific operational and analytic purposes in source and target applications.

Further, there are more advanced ways of using the linkage as described in the post Three Master Data Survivorship Approaches.

One relatively simple approach is to choose the best fit record as the survivor in the MDM hub and then keep the IDs of the records purged from the MDM hub as a link back to the sourced application records.

Probably the most used approach is to form a golden record from the best fit data elements, store this compiled record in the MDM hub and keep the IDs of the linked records from the sourced applications.

A third way is to keep the sourced records in the MDM hub and on the fly compile a golden view for a given purpose.
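The golden record approach can be sketched in a few lines of Python. The survivorship rule used here, namely that the most recently updated non-empty value wins for each attribute, is just one possible rule that I assume for the illustration, and the record layout is hypothetical.

```python
from datetime import date


def build_golden_record(records: list[dict]) -> dict:
    """Compile a golden record from confirmed duplicates.

    Assumed rule: per attribute, the newest non-empty value survives.
    The IDs of all source records are kept as links back to the sources.
    """
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden: dict[str, str] = {}
    for record in newest_first:
        for field, value in record["data"].items():
            if value and field not in golden:
                golden[field] = value
    return {"data": golden, "source_ids": [r["source_id"] for r in records]}


# Made-up duplicates from two source applications.
duplicates = [
    {"source_id": "crm-1", "updated": date(2021, 3, 1),
     "data": {"name": "John Smith", "email": "john@example.com", "phone": ""}},
    {"source_id": "erp-7", "updated": date(2020, 6, 15),
     "data": {"name": "Jon Smith", "email": "", "phone": "+45 1234 5678"}},
]

golden = build_golden_record(duplicates)
print(golden)
```

Running the same function at query time instead of storing the result would correspond to the third way, the golden view compiled on the fly.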

Hierarchy Management

When you inspect records identified as duplicate candidates, you will often have to decide if they describe the same real-world entity or if they describe two real-world entities belonging to the same hierarchy.

Instead of throwing away the latter result, this link can be stored in the MDM hub as well as a relation in a hierarchy (or graph) and thus support a broader range of operational and analytic purposes.

The main hierarchies in play here are described in the post Are These Familiar Hierarchies in Your MDM / PIM / DQM Solution?

Family consumer citizen

With persons in private roles a classic challenge is to distinguish between the individual person, a household with a shared economy and people who happen to live at the same postal address. The location hierarchy plays a role in solving this case. This quest includes having precise addresses when identifying units in large buildings and knowing the kind of building. The probability of two John Smith records being the same person differs if it is a single-family house address or the address of a nursing home.

Family company

Organizations can belong to a company family tree. A basic representation, for example used in the Dun & Bradstreet Worldbase, is having branches at a postal address. These branches belong to a legal entity with a headquarters at a given postal address, where there may be other individual branches too. Each legal entity in an enterprise may have a national ultimate parent. In multinational enterprises, there is a global ultimate parent. Public organizations have similar, often very complex, trees.

Product hierarchy
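Stored as simple child-to-parent links, such a tree lets you decide whether two records belong to the same family. The Python sketch below walks up to the top of the tree. The IDs and the flat parent_of structure are assumptions for the illustration and not how any particular directory delivers its data.

```python
def global_ultimate(entity_id: str, parent_of: dict[str, str]) -> str:
    """Follow parent links until an entity with no parent is reached."""
    seen = {entity_id}
    current = entity_id
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            raise ValueError(f"cycle in family tree at {current}")
        seen.add(current)
    return current


def same_family(a: str, b: str, parent_of: dict[str, str]) -> bool:
    """Two entities are in the same family if they share the top entity."""
    return global_ultimate(a, parent_of) == global_ultimate(b, parent_of)


# Made-up family tree: branch -> legal entity -> national top -> global top.
parent_of = {
    "branch-1": "legal-dk",
    "branch-2": "legal-dk",
    "legal-dk": "national-dk",
    "national-dk": "global-hq",
    "legal-us": "global-hq",
}

print(global_ultimate("branch-1", parent_of))
print(same_family("branch-1", "legal-us", parent_of))
print(same_family("branch-1", "other-co", parent_of))
```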

Products are also formed in hierarchies. The challenge is to identify if a given product record points to a certain level in the bottom part of a given product hierarchy. Products can have variants in size, colour and more. A product can be packed in different ways. The most prominent product identifier is the Global Trade Item Number (GTIN), which occurs in various representations, for example the Universal Product Code (UPC) popular in North America and the European (now International) Article Number (EAN) popular in Europe. These identifiers are applied by each producer (and in some cases distributor) at the product packing variant level.
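Because UPC and EAN are representations of the same identifier, they can be compared once they are validated and padded to a common length. The Python sketch below uses the GS1 check digit calculation (weights 3 and 1 alternating from the right) and left-pads with zeros to 14 digits. The function names are my own.

```python
def check_digit(body: str) -> int:
    """GS1 check digit for the digits preceding it (weights 3, 1 from the right)."""
    total = 0
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """True for a GTIN-8, UPC-A (12), EAN-13 or GTIN-14 with a correct check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    return check_digit(code[:-1]) == int(code[-1])


def to_gtin14(code: str) -> str:
    """Left-pad a valid GTIN with zeros so all representations compare equal."""
    if not is_valid_gtin(code):
        raise ValueError(f"not a valid GTIN: {code!r}")
    return code.zfill(14)


print(is_valid_gtin("4006381333931"))  # an EAN-13
print(to_gtin14("036000291452"))       # a UPC-A
print(to_gtin14("036000291452") == to_gtin14("0036000291452"))
```

Padding with leading zeros does not change the check digit, so the 14-digit form is a safe key for matching product records that carry the identifier in different representations.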

Solutions Available

When looking for a solution to support you in this conundrum the best fit for you may be a best-of-breed Data Quality Management (DQM) tool and/or a capable Master Data Management (MDM) platform.

The Disruptive MDM / PIM / DQM List has the most innovative candidates here.