Information Quality – Liliendahl on Data Quality

The Intersection of Data Observability, MDM and Data Quality

16th May 2024, Henrik Gabs Liliendahl

Data observability is a new discipline on the rise within data management. As with many new disciplines, though, not everything is new. Several capabilities that come with a data observability solution have been known for decades within Master Data Management (MDM) and not least Data Quality Management (DQM).

In brief, the raison d'être of data observability is to prevent data issues at scale. Compared to MDM and DQM, you will usually utilize a data observability solution further upstream and with more data sources in scope. The emphasis of data observability is to identify data issues early and continuously, whereas MDM and DQM are geared towards resolving the issues.

Below is a short walkthrough of the common capabilities you can deploy as part of the triangle of data observability, MDM and data quality.

Data Matching

Implementing a data observability solution will usually not extend to data matching capabilities. These capabilities will still reside in the intersection of MDM and data quality.

Data Discovery

Data discovery has been an adjacent part of many MDM solutions as touched on in the post How Data Discovery Makes a Data Hub More Valuable.

You will probably find a better home for data discovery in a data observability solution, as such a solution is better suited for covering multiple upstream data flows.

Data Profiling

In Data Quality Management (DQM) solutions, data profiling has often been seen as a one-off exercise that precedes data quality improvement, data matching, data migration and other data management initiatives.

With a data observability solution, you will be able to implement continuous data profiling and related monitoring.
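As a rough illustration of what continuous profiling can mean in practice, the Python sketch below computes a small profile (null rate and distinct count) per column for each incoming batch and compares it with a baseline profile. The function names, the in-memory list-of-dicts input and the 10 percentage point alert threshold are assumptions made for illustration, not features of any particular data observability product.

```python
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    column: str
    rows: int
    null_rate: float
    distinct_count: int


def profile_batch(rows: list[dict], columns: list[str]) -> dict[str, ColumnProfile]:
    """Compute a small profile (null rate, distinct count) per column for one batch."""
    profiles = {}
    n = len(rows)
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v not in (None, "")]
        profiles[col] = ColumnProfile(
            column=col,
            rows=n,
            null_rate=(n - len(non_null)) / n if n else 0.0,
            distinct_count=len(set(non_null)),
        )
    return profiles


def detect_drift(baseline: dict[str, ColumnProfile],
                 current: dict[str, ColumnProfile],
                 max_null_rate_increase: float = 0.1) -> list[str]:
    """Flag columns whose null rate rose more than the allowed margin versus the baseline."""
    alerts = []
    for col, base in baseline.items():
        cur = current.get(col)
        if cur is None:
            alerts.append(f"{col}: column missing in current batch")
        elif cur.null_rate - base.null_rate > max_null_rate_increase:
            alerts.append(f"{col}: null rate {base.null_rate:.0%} -> {cur.null_rate:.0%}")
    return alerts


# Hypothetical batches: the e-mail column suddenly arrives empty.
baseline = profile_batch(
    [{"email": "a@example.com", "city": "Anytown"}, {"email": "b@example.com", "city": "Anytown"}],
    ["email", "city"],
)
today = profile_batch(
    [{"email": "", "city": "Anytown"}, {"email": None, "city": "Anytown"}],
    ["email", "city"],
)
print(detect_drift(baseline, today))  # ['email: null rate 0% -> 100%']
```

Running the same profile on every batch, rather than once before a project, is what turns profiling into monitoring.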

Metadata Management

Metadata management is essential for data observability, MDM and data quality individually, and even more essential for getting the full return on investment from a triangle of data observability, MDM and data quality solutions.

Three Essential Trends in Data Management for 2024

21st December 2023 (updated 12th April 2024), Henrik Gabs Liliendahl

On the eve of the New Year, it is time to guess what the hot topics in data management will be next year. My top three candidates are:

Continued Enablement of Augmented Data Management
Embracing Data Ecosystems
Data Management and ESG

Continued Enablement of Augmented Data Management

The term augmented data management is still a hyped topic in the data management world. Here, “augmented” describes an extension of the capabilities now available for doing data management, with these characteristics:

Inclusion of Machine Learning (ML) and Artificial Intelligence (AI) methodology and technology to handle data management challenges that until now have been poorly solved using traditional methodology and technology.
Encompassing graph approaches and technology to scale and widen data management coverage towards data that is less structured and has more variation than the data that until now has been formally managed as an asset.
Aiming at automating data management tasks that until now have been solved in manual ways or simply not been solved at all due to the size and complexity of the work involved.

It is worth noting that the Artificial Intelligence theme has lately been dominated by generative AI, notably ChatGPT. However, in my eyes, generative AI will not be the most frequently used AI flavor for data management. Learn more about data management and AI in the post Three Augmented Data Management Flavors.

Embracing Data Ecosystems

The strength of data ecosystems was most recently examined here on the blog in the post From Platforms to Ecosystems.

Data ecosystems include:

The infrastructure that connects ecosystem participants and helps organizations transform from local and linear ways of doing business toward virtual and exponential operations.
A single source of truth that extends across business partner ecosystems by providing all ecosystem participants with access to the same data.
Business model and process transformation across industries to support agile reconfiguration of business models and processes through information exchange inside and between ecosystems.

In short, your organization cannot grow faster than your competitors by hiding all data behind your firewall. You must share relevant data within your business ecosystem in an effective manner.

Data Management and ESG

ESG stands for Environmental, Social and Governance. This is often called sustainability. In a business context, sustainability is about how your products and services contribute to sustainable development.

In my work as a data management consultant, I have seen more and more companies putting ESG at the top of the agenda and therefore embarking on programs to infuse ESG concepts into data management. If you can tie a proposed data management effort to ESG, you have a good chance of getting that effort approved and funded.

Capturing ESG data is very much about sharing data with your business partners. This includes getting new product data elements from upstream trading partners and providing such data to downstream trading partners. These new data elements are often not covered by traditional ways of exchanging product data. Getting the traditional product information through data supply chains is already challenging, so adding the new ESG dimension is a daunting task for many organizations.

Therefore, we are ramping up to also cover ESG data in Product Data Lake, the collaborative product data syndication service I am involved in.

Which Data Management KPIs Should You Measure?

18th July 2022, Henrik Gabs Liliendahl

Everyone agrees that the results of your data management efforts should be measured, and that the way to do so is to define Key Performance Indicators (KPIs) that can be tracked.

But what should those KPIs be? This has been a key question (so to speak) in almost all data management initiatives I have been involved with. With the tools available today, you can easily define technical indicators close to the raw data, such as the percentage of duplicate data records and the completeness of data attributes. The harder thing to do is to relate data management efforts to business terms and quantify the expected and achieved results in business value.
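The technical indicators mentioned here are indeed easy to compute. A minimal Python sketch, assuming records arrive as a list of dictionaries and that a duplicate means an exact match on a chosen set of key fields after trimming and lowercasing (a deliberately naive rule; the field names and sample records are hypothetical):

```python
def _normalize(value) -> str:
    """Trim and lowercase a value; None (and other falsy values) become an empty string."""
    return str(value or "").strip().lower()


def duplicate_percentage(records: list[dict], key_fields: list[str]) -> float:
    """Percentage of records whose normalized key fields repeat an earlier record.

    Only exact matches after normalization are counted; fuzzy duplicates
    such as "Bob" vs "Robert" would need a data matching tool.
    """
    if not records:
        return 0.0
    seen = set()
    duplicates = 0
    for record in records:
        key = tuple(_normalize(record.get(field)) for field in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return 100.0 * duplicates / len(records)


def attribute_completeness(records: list[dict], attribute: str) -> float:
    """Percentage of records where the attribute is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if _normalize(record.get(attribute)))
    return 100.0 * filled / len(records)


customers = [
    {"name": "Robert Smith", "city": "Anytown", "phone": "555-0100"},
    {"name": "robert smith ", "city": "Anytown", "phone": ""},
    {"name": "Bob Smith", "city": "Anytown", "phone": None},
    {"name": "Jane Doe", "city": "Othertown", "phone": "555-0199"},
]
print(duplicate_percentage(customers, ["name", "city"]))  # 25.0
print(attribute_completeness(customers, "phone"))         # 50.0
```

Note that "Bob Smith" is not counted as a duplicate of "Robert Smith" by this rule, which illustrates how much the definition behind a technical KPI matters before it can be tied to business value.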

A recent Gartner study points out five areas where such KPIs can be defined and measured, with the aim of making data and information a monetizable asset. The KPIs revolve around business impact, time to action, data quality, data literacy and risk.

Get a free copy of the Gartner report on 5 Data and Analytics KPIs Every Executive Should Track from the parsionate site here.

A Guide to Data Quality

12th April 2022, Henrik Gabs Liliendahl

In the exciting strategic data management projects I work on together with the data management consultancy firm parsionate, ensuring data quality in large companies is one of the key topics.

Your Success Factors

In their latest whitepaper, parsionate has put data quality in context. The idea behind this is that data quality initiatives will only be acknowledged and sustained in business operations when they are connected with business goals.

Marketing departments today want to drive more sales through online channels. To do that you will need a bunch of data quality improvements like having convincing product descriptions for all products put on sale online and having consistent and updated prices across all channels.

In operational management, you always strive to make better decisions. To do that, you need accurate, updated and well-related information about markets, products and competitors.

In strategic management your aim is to exploit economies of scale. During mergers and acquisitions, managers must pay particular attention to data quality. In the case of mergers, it must be ensured that the data quality of the previously separate systems is impeccable so that weaknesses are not ported to the new overall situation.

For HR, the key objectives are to find the best candidates and develop potential. These processes are being digitalized with machine decisions involved. This can only work if the underlying data is complete, updated and consistent.

For logistics, the future belongs to the intelligent supply chain. In many cases the data needed to support this is available, but not in the right quality at the right time. Here, the right data quality management can make a huge difference.

*Source: parsionate, data quality in context*

The Right Steps to Drive Business Forward

Your roadmap to high data quality that will pave the way to successful business should involve the following 8 steps:

1: Appoint responsible persons for the data

2: Set targets and Key Performance Indicators

3: Evaluate data quality of existing data

4: Cleanse and harmonize data inventories

5: Define standards and processes

6: Automate data quality maintenance

7: Regulate data quality across divisions, groups and borders

8: Continuously improve data quality

Learn More

To get more details on the range of success factors for the various business areas and the 8-step roadmap, you can download a free copy of the parsionate Data Quality in Context guide here.

Three Augmented Data Management Flavors

23rd March 2022, Henrik Gabs Liliendahl

What is Augmented Data Management?

The term augmented data management has become a hyped topic in the data management world. Here, “augmented” describes an extension of the capabilities now available for doing data management, with these characteristics:

Inclusion of Machine Learning (ML) and Artificial Intelligence (AI) methodology and technology to handle data management challenges that until now have been poorly solved using traditional methodology and technology
Encompassing graph approaches and technology to scale and widen data management coverage towards data that is less structured and has more variation than the data that until now has been formally managed as an asset
Aiming at automating data management tasks that until now have been solved in manual ways or simply not been solved at all due to the size and complexity of the work involved.

Augmented data management can be applied to all the data management disciplines we know. In the following I will have a look at three data management disciplines where we today see solutions and implementations emerging. These are:

Augmented Metadata Management
Augmented Master Data Management
Augmented Data Quality Management

Augmented Metadata Management

The word metadata has been around for ages, and the importance of metadata management as a prerequisite for proper data management is commonly agreed on among data management professionals. However, concrete examples of successful enterprise-wide implementations are sparse, and examples of solutions that are governed and maintained over time are even rarer.

Metadata management is a daunting task. Taking a snapshot of the metadata in play within an enterprise right now is hard enough. Maintaining it as new data types are utilized, applications are replaced, the organization changes, new standards are adopted and so on is even more daunting.

So, here augmented metadata management comes with a promise of automating this task through active metadata management, enabled by machine learning and artificial intelligence components and relying on graph approaches that are able to picture complex relationships between metadata.

Augmented Master Data Management

Master Data Management (MDM) solutions are being implemented around the clock in large and midsize organizations. As these solutions become part of business processes, there are people responsible for controlling and maintaining master data. While some of this work can be automated through Robotic Process Automation (RPA), a substantial amount of work still relies on decision making that is not easily automated that way. Add to that the fact that more and more data will become part of MDM solutions.

So, here augmented master data management comes with a promise of automating these tasks by using machine learning and artificial intelligence components that where feasible can rely on graph approaches that are able to picture complex relationships between master data.

Augmented Data Quality

The promise of automating data quality tasks through machine learning and artificial intelligence is not new at all. For decades this approach has been tried out in areas such as data matching and product classification.

What we see now is that this approach has matured and is more widely utilized, moving from standalone specialty solutions to components in broader data management solutions.

One example of how data quality, master data management and metadata management are supported by augmented data management in a mature solution is showcased in a video embedded in the Semarchy blog post How Augmented Data Management Adds Value to Your Business.

PS: The term augmented stems from music – raising a chord to new heights.

2022 Data Management Predictions

30th December 2021, Henrik Gabs Liliendahl

On the second-to-last day of the year, it is time to make predictions about next year. My predictions for the year gone by were in the post Annus Horribilis 2020, Annus Mirabilis 2021?. These predictions were fortunately fluffy enough for me to claim that they were right.

There is no reason not to believe that the wave of digitalization will go on and even intensify. Also, it seems obvious that data management will be a sweet spot of digitalization.

The three disciplines within data management focused on in this blog are:

MDM: Master Data Management
PIM: Product Information Management
DQM: Data Quality Management

So, let's look at what might happen next year within these overlapping disciplines.

MDM in 2022

MDM will keep inflating as explained in the post How MDM inflates.

More organizations will go for enterprise-wide MDM implementations, and those who accomplish that will continue with interenterprise MDM.

More business objects will be handled within the MDM discipline. Multidomain MDM will in more and more cases extend beyond the traditional customer, supplier and product domains.

Intelligent capabilities such as Machine Learning (ML) and Artificial Intelligence (AI) will augment the basic IT capabilities currently used within MDM.

PIM in 2022

As with MDM, PIM will also become more interenterprise-wide. As organizations get a grip on internal product data stores, the focus will move to collaborating with external suppliers and external consumers of product data through Product Data Syndication.

In some industries, PIM will start extending from handling the product model to also handling each instance of each product, as examined in the post Product Model vs Product Instance.

We will also see the term augmented PIM, meaning the use of Machine Learning and Artificial Intelligence to improve product data quality. In fact, classification of products using AI has been an early use case of AI in data management. This use case will be utilized more and more, alongside other AI and ML use cases for product information.

DQM in 2022

Data quality management will also go wider, as data quality requirements will increasingly be a topic in business partnerships. More and more contracts between trading partners will, besides pricing and timing, also emphasize data quality.

Data quality improvement has for many years focused on the quality of customer data. This is now extending to other business objects, and we will see data quality tools get better support for other data domains and for the data quality dimensions that are essential there.

ML and AI data quality use cases will continue to be implemented and move beyond the current trial stage to become part of operational business processes, though still in only a minority of organizations.

Happy New Year.

The Disruptive MDM/PIM/DQM List 2022: Datactics

9th November 2021, Henrik Gabs Liliendahl

A major rework of The Disruptive MDM/PIM/DQM List is in the making, as the number of visitors keeps increasing and so does the number of requests for individual solution lists.

It is good to see that some of the most innovative solution providers commit to being part of the list next year as well.

One of those is Datactics.

Datactics is a veteran data quality solution provider who is constantly innovating in this space. This year Datactics was one of the rare new entries in The Gartner Magic Quadrant for Data Quality Solutions 2021.

It will be exciting to follow the ongoing development at Datactics, which operates under the slogan “Democratising Data Quality”.

You can learn more about what their self-service data quality and matching solution looks like here.

Five Pairs of Data Quality Dimensions

17th October 2021, Henrik Gabs Liliendahl

Data quality dimensions are some of the most used terms when explaining why data quality is important, what data quality issues can be and how you can measure data quality. Ironically, we sometimes use the same data quality dimension term for two different things or use two different data quality dimension terms for the same thing. Some of the troubling terms are:

Validity / Conformity – same same but different

Validity is most often used to describe whether data filled in a data field obeys a required format or is among a list of accepted values. Databases are usually good at this, for example ensuring that an entered date has the day-month-year sequence asked for and is an actual calendar date, or cross-checking data values against another table to see if the value exists there.

The problems arise when data is moved between databases with different rules and when data is captured in textual forms before being loaded into a database.
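The kind of validity rules a database enforces can also be applied to data captured outside it, for example before loading text captured in forms. A minimal Python sketch of two such rules, assuming a day-month-year format and a small reference table; the format string, the ACCEPTED_COUNTRY_CODES set and the function names are hypothetical stand-ins for whatever the target database actually enforces:

```python
from datetime import datetime

# Hypothetical reference table of accepted values, e.g. country codes in use.
ACCEPTED_COUNTRY_CODES = {"DE", "DK", "GB", "US"}


def is_valid_date(value: str, fmt: str = "%d-%m-%Y") -> bool:
    """True if value follows the day-month-year format and is a real calendar date."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def is_accepted_value(value: str, accepted: set[str]) -> bool:
    """True if value exists in the reference list of accepted values."""
    return value in accepted


print(is_valid_date("31-12-2023"))  # True
print(is_valid_date("31-02-2023"))  # False: there is no 31st of February
print(is_valid_date("12/31/2023"))  # False: month-first text captured from a form
print(is_accepted_value("XX", ACCEPTED_COUNTRY_CODES))  # False
```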

Conformity is often used to describe whether data adheres to a given standard, such as an industry or international standard. Due to complexity and other circumstances, such a standard may be implemented only partly, or not at all, as database constraints or by other means. Therefore, a given piece of data may seem to be a valid database value while not complying with the standard.

Sometimes conformity is linked to the geography in question. For example, whether a postal code conforms depends on the country where the address is. Therefore, the postal code 12345 conforms in Germany, but not in the United Kingdom.
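A minimal sketch of such a country-dependent conformity rule, assuming simplified regular expressions: five digits for Germany and a common simplified shape for UK postcodes. The patterns and the function name are illustrative assumptions, not a complete implementation of either national standard.

```python
import re
from typing import Optional

# Simplified, assumed patterns per country; official postal code rules have more detail.
POSTAL_CODE_PATTERNS = {
    "DE": re.compile(r"[0-9]{5}"),                                 # e.g. 12345
    "GB": re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}"),  # e.g. SW1A 1AA
}


def postal_code_conforms(postal_code: str, country: str) -> Optional[bool]:
    """Check a postal code against the pattern for the address's country.

    Returns None when no rule is registered for the country.
    """
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        return None
    return pattern.fullmatch(postal_code.strip().upper()) is not None


print(postal_code_conforms("12345", "DE"))     # True
print(postal_code_conforms("12345", "GB"))     # False
print(postal_code_conforms("SW1A 1AA", "GB"))  # True
```

Keying the rule on the country, rather than on the postal code field alone, is what makes the same value conforming in one context and non-conforming in another.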

Accuracy / Precision – true, false or not sure

The difference between accuracy and precision is a well-known statistical subject.

In the data quality realm, accuracy is most often used to describe whether the data value corresponds correctly to a real-world entity. If, for example, we have a postal address for the person “Robert Smith” being “123 Main Street in Anytown”, this data value may be accurate because this person (for the moment) lives at that address.

But if “123 Main Street in Anytown” has 3 different apartments each having its own mailbox, the value does not, for a given purpose, have the required precision.

If we work with geocoordinates, we have the same challenge. A given accurate geocode may be precise enough to give directions to the nearest supermarket, but not precise enough to tell in which apartment the out-of-milk smart refrigerator is located.
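One rough way to judge whether a stored geocode is precise enough for a given purpose is to look at how many decimal places were captured. The sketch below assumes coordinates are stored as decimal strings and uses the common rule of thumb that one degree corresponds to roughly 111 km on the ground; for longitude the true distance shrinks towards the poles, so the estimate is conservative there. The function names and the required precision values are hypothetical.

```python
from decimal import Decimal

# Rule of thumb: one degree is roughly 111 km (exact for neither latitude nor longitude).
METRES_PER_DEGREE = 111_000


def decimal_places(value: str) -> int:
    """Number of decimal places recorded in a coordinate given as a string."""
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)


def approx_resolution_m(value: str) -> float:
    """Approximate ground resolution in metres implied by the recorded decimals."""
    return METRES_PER_DEGREE / 10 ** decimal_places(value)


def precise_enough(lat: str, lon: str, required_m: float) -> bool:
    """True if both coordinates are recorded finely enough for the required precision."""
    return max(approx_resolution_m(lat), approx_resolution_m(lon)) <= required_m


# Hypothetical geocodes for the same building, captured with different precision.
print(precise_enough("55.676", "12.568", required_m=500))    # True: about 111 m
print(precise_enough("55.676", "12.568", required_m=5))      # False
print(precise_enough("55.67611", "12.56839", required_m=5))  # True: about 1.1 m
```

Even a very precise latitude/longitude pair is two-dimensional, so on its own it cannot tell apart apartments stacked on different floors.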

Timeliness / Currency – when time matters

Timeliness is most often used to state if a given data value is present when it is needed. For example, you need the postal address of “Robert Smith” when you want to send a paper invoice or when you want to establish his demographic stereotype for a campaign.

Currency is most often used to state if the data value is accurate at a given time – for example if “123 Main Street in Anytown” is the current postal address of “Robert Smith”.
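To make the distinction concrete, here is a minimal Python sketch assuming addresses are kept as a history of versions with validity dates. The AddressVersion structure is hypothetical, and the earlier address is invented for the example. A timeliness check asks whether any address is available on the date it is needed, while a currency check asks whether a given address is the one valid on that date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AddressVersion:
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the address is still valid


def address_on(history: list[AddressVersion], on: date) -> Optional[str]:
    """Timeliness: return the address valid on the given date, or None if none is known."""
    for version in history:
        if version.valid_from <= on and (version.valid_to is None or on < version.valid_to):
            return version.address
    return None


def is_current(history: list[AddressVersion], address: str, on: date) -> bool:
    """Currency: is this address the one valid on the given date?"""
    return address_on(history, on) == address


history = [
    AddressVersion("1 Old Road in Anytown", date(2015, 1, 1), date(2022, 6, 1)),
    AddressVersion("123 Main Street in Anytown", date(2022, 6, 1)),
]
print(is_current(history, "123 Main Street in Anytown", date(2024, 5, 16)))  # True
print(is_current(history, "1 Old Road in Anytown", date(2024, 5, 16)))       # False
print(address_on(history, date(2010, 1, 1)))  # None: no address known yet
```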

Uniqueness / Duplication – positive or negative

Uniqueness is the positive term, whereas duplication is the negative term, for the same issue.

We strive to have uniqueness by avoiding duplicates. In data quality lingo duplicates are two (or more) data values describing the same real-world entity. For example, we may assume that

“Robert Smith at 123 Main Street, Suite 2 in Anytown”

is the same person as

“Bob Smith at 123 Main Str in Anytown”
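Deciding that these two records describe the same person is the job of data matching. The sketch below shows one simple approach, assuming small hypothetical synonym tables (nicknames and street abbreviations), a name similarity from Python's `difflib.SequenceMatcher`, an address score based on shared tokens, and a 0.85 threshold. Real matching tools use far richer rules. This rule also treats the missing “Suite 2” as compatible, which is exactly the kind of assumption a matching design must make explicit.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym tables; real matching tools use much larger curated lists.
NICKNAMES = {"bob": "robert", "rob": "robert", "bobby": "robert"}
STREET_ABBREVIATIONS = {"str": "street", "st": "street", "rd": "road", "ave": "avenue"}


def tokens(text: str, synonyms: dict[str, str]) -> list[str]:
    """Lowercase, split into alphanumeric tokens and replace known synonyms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [synonyms.get(w, w) for w in words]


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity (0..1) of the normalized names."""
    na = " ".join(tokens(a, NICKNAMES))
    nb = " ".join(tokens(b, NICKNAMES))
    return SequenceMatcher(None, na, nb).ratio()


def address_overlap(a: str, b: str) -> float:
    """Share of the shorter address's tokens found in the other address."""
    ta = set(tokens(a, STREET_ABBREVIATIONS))
    tb = set(tokens(b, STREET_ABBREVIATIONS))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def probable_duplicate(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    """Same city required; then average name and address scores against a threshold."""
    if r1["city"].strip().lower() != r2["city"].strip().lower():
        return False
    score = (name_similarity(r1["name"], r2["name"])
             + address_overlap(r1["address"], r2["address"])) / 2
    return score >= threshold


r1 = {"name": "Robert Smith", "address": "123 Main Street, Suite 2", "city": "Anytown"}
r2 = {"name": "Bob Smith", "address": "123 Main Str", "city": "Anytown"}
print(probable_duplicate(r1, r2))  # True
```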

Completeness / Existence – to be, or not to be

Completeness is most often used to tell to what degree all required data elements are populated.

Existence can be used to tell if a given dataset has all the needed data elements for a given purpose defined.

So “Bob Smith at 123 Main Str in Anytown” is complete if we need name, street address and city, but only 75 % complete if we need name, street address, city and preferred colour, given that preferred colour exists as a data element in the dataset.
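The 75 % figure is simply the share of required data elements that are populated. A minimal Python sketch, assuming a record is a dictionary keyed by hypothetical data element names and that None or an empty string counts as not populated; a separate existence check lists required elements that are not defined in the dataset at all:

```python
def completeness(record: dict, required: list[str]) -> float:
    """Share of required data elements that are populated in the record."""
    if not required:
        return 1.0
    filled = sum(1 for element in required if record.get(element) not in (None, ""))
    return filled / len(required)


def missing_elements(dataset_elements: set[str], required: list[str]) -> list[str]:
    """Existence check: required data elements not defined in the dataset at all."""
    return [element for element in required if element not in dataset_elements]


record = {"name": "Bob Smith", "street_address": "123 Main Str",
          "city": "Anytown", "preferred_colour": None}
print(completeness(record, ["name", "street_address", "city"]))                      # 1.0
print(completeness(record, ["name", "street_address", "city", "preferred_colour"]))  # 0.75
print(missing_elements(set(record), ["name", "city", "date_of_birth"]))              # ['date_of_birth']
```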

Data Quality Management

Master Data Management (MDM) solutions and specialized Data Quality Management (DQM) tools have capabilities to assess data quality dimensions and improve data quality within the different data quality dimensions.

Check out the range of the best solutions to cover this space on The Disruptive MDM & PIM & DQM List.

Opportunities on The Data Quality Tool Market

4th May 2021, Henrik Gabs Liliendahl

The latest Information Difference Data Quality Landscape is out. This is a generic ranking of major data quality tools on the market.

You can see the previous data quality landscape in the post Congrats to Datactics for Having the Happiest DQM Customers.

There are no significant changes in the relative positioning of the vendors. The only change is that Syncsort has been renamed Precisely.

As stated in the report, much of the data quality industry is focused on name and address validation. However, there are many opportunities for data quality vendors to spread their wings and better tackle problems in other data domains, such as product, asset and inventory data.

One explanation of why this is not happening is probably the interwoven structure of the joint Master Data Management (MDM), Product Information Management (PIM) and Data Quality Management (DQM) markets and disciplines. For example, a predominant data quality issue such as completeness of product information is addressed in PIM solutions, and even better in Product Data Syndication (PDS) solutions.

Here, there are opportunities for pure-play vendors within each speciality to work together, as well as for the larger vendors to offer both a truly integrated overall solution and contextual solutions for each issue with a reasonable cost/benefit ratio.

Data Quality and Interenterprise Data Sharing

13th April 2021, Henrik Gabs Liliendahl

When working with data quality improvement there are three kinds of data to consider:

First-party data is the data that is born and managed internally within the enterprise. This data has traditionally been the focus of data quality methodologies and tools, with the aim of ensuring that data is fit for its intended purpose and correctly reflects the real-world entity that the data describes.

Third-party data is data sourced from external providers who offer a set of data that can be utilized by many enterprises. Examples are location directories, business directories such as the Dun & Bradstreet Worldbase and public national directories, and product data pools such as the Global Data Synchronization Network (GDSN).

Enriching first-party data with third-party data is a means to ensure, in particular, better data completeness, better data consistency and better data uniqueness.

Second-party data is data sourced directly from a business partner. Examples are supplier self-registration, customer self-registration and inbound product data syndication. Exchange of this data is also called interenterprise data sharing.

The advantage of using second-party data from a data quality perspective is that you are closer to the source, which, all things being equal, means that the data better and more accurately reflects the real-world entity it describes.

In addition, compared to third-party data, you will have the opportunity to operate with data that exactly fits your operating model and makes you unique compared to your competitors.

Finally, second-party data obtained through interenterprise data sharing will reduce the costs of capturing data compared to first-party data, as otherwise the ever-increasing demand for more elaborate, high-quality data in the age of digital transformation will overwhelm your organization.

The Balancing Act

Getting optimal data quality with the least effort is about balancing the use of internal and external data, exploiting interenterprise data sharing by combining second-party and third-party data in the way that makes most sense for your organization.

As always, I am ready to discuss your challenge. You can book a short online session for that here.
