Big data – Liliendahl on Data Quality

Big Data vs Small and Wide Data under a Master Data Lens

20th May 2021, Henrik Gabs Liliendahl

One of the 10 trends in data and analytics in 2021 identified by Gartner, the analyst firm, is a shift from big data to small and wide data.

A press release from yesterday elaborates on this topic outside the paywall: Gartner Says 70% of Organizations Will Shift Their Focus from Big to Small and Wide Data By 2025.

As stated there: "Potential areas where small and wide data can be used are demand forecasting in retail, real-time behavioural and emotional intelligence in customer service applied to hyper-personalization, and customer experience improvement."

This topic is close to my heart. Last year I wrote about it, still using the term big data, in a Reltio whitepaper, as mentioned in the post How to Use Connected Master Data to Enable New Revenue Models.

In my eyes, small data is very much equivalent to master data, in addition to the meaning promoted by Gartner, which covers approaches involving "certain time-series analysis techniques or few-shot learning, synthetic data, or self-supervised learning".

In the retail scenario, the concrete wide data to be used and connected are customer data and product data. There is a current trend of mastering wide customer data in a Customer Data Platform (CDP). Wide product data are best handled in a Product Information Management (PIM) platform with a collaborative Product Data Syndication (PDS) add-on.

In the quest to provide hyper-personalization, you need to connect well-identified customer data with product information elements aimed at customization and personalization by applying Artificial Intelligence (AI) methodologies.

So, is the term “small and wide data” better than “big data”?

Beyond the narrow analytical purpose put forward by Gartner, I think it can help unlock the opportunities in big data underpinned by master data: opportunities that have existed for the past decade but have by far not been utilized as much as they could have been.

Five Disruptive MDM Trends

29th May 2019 (updated 31st May 2019), Henrik Gabs Liliendahl

Like any other IT-enabled discipline, Master Data Management (MDM) continuously undergoes transformation as it adopts emerging technologies. In the following I will focus on five trends that, as seen today, seem to be disruptive:

MDM in the Cloud

According to Gartner, the share of cloud-based MDM deployments increased from 19% in 2017 to 24% in 2018, and I am sure that number will increase again this year. But does it come as SaaS (Software as a Service), PaaS (Platform as a Service) or IaaS (Infrastructure as a Service)? And what about DaaS (Data as a Service)? Learn more in the post MDM, Cloud, SaaS, PaaS, IaaS and DaaS.

Extended MDM Platforms

There is a tendency in the Master Data Management (MDM) market for solution providers to aim at delivering an extended MDM platform to underpin customer experience efforts. Such a platform will not only handle traditional master data, but also reference data and big data (as in data lakes), as well as linking to transactions. Learn more in the post Extended MDM Platforms.

AI and MDM

There is an interdependency between MDM and Artificial Intelligence (AI). AI and Machine Learning (ML) depend on data quality, which is sustained with MDM, as examined in the post Machine Learning, Artificial Intelligence and Data Quality. And you can use AI and ML to solve MDM issues, as told in the post Six MDM, AI and ML Use Cases.

IoT and MDM

The scope of MDM will widen with the rise of the Internet of Things (IoT), as reported in the post IoT and MDM. We will probably see this mature first in the Industrial Internet of Things (IIoT), also referred to as Industry 4.0, as pondered in the post IIoT (or Industry 4.0) Will Mature Before IoT.

Ecosystem wide MDM

Doing Master Data Management (MDM) enterprise-wide is hard enough. But it does not stop there. Increasingly, every organization will be an integrated part of a business ecosystem where collaboration with business partners is part of digitalization, and thus we will need to work on the same foundation around master data. Learn more in the post Multienterprise MDM.

The latest and hottest trends within MDM

10th April 2019, Henrik Gabs Liliendahl

In the run-up to the Nordic Midsummer, I am pleased to join Informatica and their co-hosts Capgemini and CGI at two morning seminars on how successful organizations can leverage data to drive their digital transformation, the data strategy needed, and the urge to have a 360-degree view of data relationships and interactions.

My presentations will offer an independent view on the question: What are the latest and hottest trends within Master Data Management?

In this session, I will give the audience a quick walk-through of some in-vogue topics such as MDM in the cloud, MDM for big data, embracing the Internet of Things (IoT) within MDM, business ecosystem wide MDM and the impact of Artificial Intelligence (AI) on MDM.

The events will take place as follows, and you can register to attend:

11 June 2019: Informatica MDM and Data Governance Morning Sessions in Stockholm
14 June 2019: Informatica MDM and Data Governance Morning Sessions in Copenhagen

MDM Market News: Informatica acquires AllSight

1st March 2019, Henrik Gabs Liliendahl

As reported in the news, Informatica has acquired the AI-enabled customer insights startup AllSight to expand its intelligent data platform and help enterprises improve their customer experiences.

A while ago, Informatica's interest in pursuing this path of Master Data Management (MDM) was examined here on the blog in the post Multi-Domain MDM 360 and an Intelligent Data Lake.

AllSight is listed on the Disruptive Master Data Management Solutions List.

In my eyes, MDM vendors must embrace this kind of solution in order to deliver an extended MDM platform to underpin customer experience efforts. Such a platform will not only handle traditional master data, but also reference data and big data (as in data lakes), either directly or by linking to the data in there, as well as linking to transactions.

Traditional Master Data Management, supplemented with Reference Data Management (RDM), will enable the handling of:

Customer, supplier and product identity
Customer, supplier and product hierarchies
Customer, supplier and product locations

Additionally, the data lake concept can be used for:

Including customer footprint on websites
Including customer footprint in social media
Syndicating product data from/to trading partners

Informatica's takeover of AllSight comes at a good time for me, as I will join Informatica and speak about the latest and hottest trends in Master Data Management at the morning seminars in Copenhagen on 8 April 2019 and Stockholm on 9 April 2019.

Three Remarkable Observations about Reltio

21st June 2018, Henrik Gabs Liliendahl

The latest entry on The Disruptive Master Data Management Solutions List is Reltio. I have been following Reltio for more than 5 years and lately I have had the chance to do some hands-on work.

In doing that, I think there are three observations that make the Reltio Cloud solution a remarkable MDM offering.

More than Master Data

While the Reltio solution emphasizes master data, the platform can also include the data that revolves around master data. That means you can bring transactions and big data streams to the platform and apply analytics, machine learning, artificial intelligence and those shiny new things, moving these disciplines beyond a purely analytical world so that the data and capabilities can be exploited in the operational world.

The thinking behind this approach is that you cannot get a 360-degree view of customer, vendor and other party roles, or a 360-degree view of products, by only having a compound snapshot description of the entity in question. You also need the raw history, the relationships between entities and access to details for various use cases.

In fact, Reltio provides not just operational MDM, but through a module called Reltio IQ also brings continuously mastered data and correlated transactions into an Apache Spark environment for analytics and Machine Learning. This eliminates the traditional friction of synchronizing data models between MDM and analytical environments. It also allows aggregated results to be synchronized back into the MDM profiles by storing them as analytical attributes. These attributes are then available for use in an operational context, such as marketing segmentation, sales recommendations, GDPR exposure and more.
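To make the idea of analytical attributes concrete, here is a minimal sketch in plain Python. It is not Reltio IQ and does not use Spark; it only illustrates the pattern of aggregating correlated transactions per mastered entity and writing the results back onto the profile. The `Profile` layout, the attribute names (`total_spend`, `order_count`, `segment`) and the 1000 threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A mastered customer profile (hypothetical layout)."""
    entity_id: str
    name: str
    analytical: dict = field(default_factory=dict)  # derived attributes, written back


def sync_analytical_attributes(profiles, transactions):
    """Aggregate transactions per entity and store the results on each profile."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tx in transactions:
        totals[tx["entity_id"]] += tx["amount"]
        counts[tx["entity_id"]] += 1
    for p in profiles:
        p.analytical["total_spend"] = round(totals[p.entity_id], 2)
        p.analytical["order_count"] = counts[p.entity_id]
        # A simple segment an operational use case (e.g. marketing) could read.
        p.analytical["segment"] = "high_value" if totals[p.entity_id] >= 1000 else "standard"
    return profiles


profiles = [Profile("C1", "Acme"), Profile("C2", "Globex")]
transactions = [
    {"entity_id": "C1", "amount": 700.0},
    {"entity_id": "C1", "amount": 450.0},
    {"entity_id": "C2", "amount": 99.5},
]
sync_analytical_attributes(profiles, transactions)
print(profiles[0].analytical)  # {'total_spend': 1150.0, 'order_count': 2, 'segment': 'high_value'}
```

The point of the pattern is that the operational profile carries the analytical result, so downstream applications do not need access to the analytical environment.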

Multiple Storing Capabilities

There is an ongoing debate in the MDM community these days about whether you should use relational database technology, NoSQL technology or graph technology. Reltio utilizes all three of them for the purposes where each approach makes the most sense.

Reference data are handled as relational data. The entities are kept using a wide column store, a technique that combines the scalability known from pure column stores with some of the structure known from relational databases. Finally, the relationships are handled using graph techniques, which have been a recurring subject on this blog.

Reltio calls this multi-model polyglot persistence, and they embrace the latest technologies from multiple clouds such as AWS and Google Cloud Platform (GCP) under the covers.
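As a rough illustration of the polyglot idea, and not of how Reltio implements it, the sketch below keeps each kind of data in the store shape that suits it: reference data in a relational table (an in-memory SQLite database from the standard library), entities in a wide-column-like nested mapping where rows need not share the same columns, and relationships in an adjacency list. All entity names, columns and edge labels are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Reference data: a relational table (in-memory SQLite).
ref = sqlite3.connect(":memory:")
ref.execute("CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT)")
ref.executemany("INSERT INTO country VALUES (?, ?)", [("DK", "Denmark"), ("SE", "Sweden")])

# Entities: wide-column style, row key -> column family -> column -> value.
# Rows need not share the same set of columns.
entities = defaultdict(lambda: defaultdict(dict))
entities["P1"]["identity"].update(name="Jane Doe", country="DK")
entities["P1"]["contact"]["email"] = "jane@example.com"
entities["O1"]["identity"].update(name="Acme A/S", country="SE", industry="Retail")

# Relationships: a directed, labelled graph kept as an adjacency list.
edges = defaultdict(list)
edges["P1"].append(("employed_by", "O1"))


def describe(entity_id):
    """Combine the three stores into one view of an entity."""
    identity = entities[entity_id]["identity"]
    (country_name,) = ref.execute(
        "SELECT name FROM country WHERE code = ?", (identity["country"],)
    ).fetchone()
    related = [(label, entities[target]["identity"]["name"]) for label, target in edges[entity_id]]
    return {"name": identity["name"], "country": country_name, "relationships": related}


print(describe("P1"))
```

Each store answers the question it is good at: lookups against a governed code list, flexible entity attributes, and traversal of relationships.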

Survival of the Fit Enough

One thing MDM solutions do is make a golden record from different systems of record, where the same real-world entity is described in many ways and the records are therefore considered duplicates. Identifying those records is hard enough. But then comes the task of merging the conflicting values together, so the most accurate values survive in the golden record.

Reltio does that very elegantly by actually not doing it. Survivorship rules can be set up based on all the needed parameters, such as recency, provenance and more, and you may also allow more than one value to survive, as touched on in the post about the principle of Survival of the Fit Enough.
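A survivorship rule of this kind can be sketched in a few lines of Python. This is a generic illustration, not Reltio's rule engine: it assumes a hypothetical provenance ranking of source systems, breaks ties by recency, and lets the caller decide how many distinct values survive. The source names and phone numbers are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceValue:
    value: str
    source: str    # system of record the value came from
    updated: date  # when the source last changed the value


# Hypothetical provenance ranking: lower number = more trusted source.
SOURCE_RANK = {"crm": 0, "erp": 1, "webshop": 2}


def survive(values, keep=1):
    """Rank candidates by provenance, then by recency (newest first), and
    return the top `keep` distinct values; keep > 1 lets more than one survive."""
    ranked = sorted(
        values,
        key=lambda v: (SOURCE_RANK.get(v.source, len(SOURCE_RANK)), -v.updated.toordinal()),
    )
    survivors = []
    for v in ranked:
        if v.value not in survivors:
            survivors.append(v.value)
        if len(survivors) == keep:
            break
    return survivors


phones = [
    SourceValue("+45 1111 1111", "webshop", date(2021, 3, 1)),
    SourceValue("+45 2222 2222", "crm", date(2019, 6, 1)),
    SourceValue("+45 3333 3333", "crm", date(2020, 9, 1)),
]
print(survive(phones))          # ['+45 3333 3333']
print(survive(phones, keep=2))  # ['+45 3333 3333', '+45 2222 2222']
```

Putting provenance before recency is a design choice for the example; swapping the two elements of the sort key would let a newer value from a less trusted source win.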

In Reltio, values that do not immediately survive are not purged. The golden record is not stored physically. Instead, Reltio keeps one (or even more than one) virtual golden record by letting the original source records stay. Therefore, you can easily roll back or update the single view of the truth.

The Reltio platform allows survivorship rules to be customized in rulesets for an unlimited number of roles and personas, in effect supporting multiple personalized versions of the truth. In an operational MDM context this allows sales, marketing, compliance and other teams to see the data values that they care about most, while collaborating continuously in what Reltio calls the Self-Learning Enterprise.
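The combination of retained source records and per-persona rulesets can be sketched as a golden record that is computed on read rather than stored. This is a simplified illustration under assumed data, not Reltio's implementation: the source records, personas and preferred-source orders below are hypothetical, and a real ruleset would typically also weigh recency and other parameters.

```python
# Source records are kept as-is; golden records are computed on read.
SOURCES = [
    {"source": "crm", "email": "jane@corp.example", "segment": "SMB"},
    {"source": "marketing", "email": "jane@mail.example", "segment": "Prospect"},
    {"source": "erp", "email": "ap@corp.example", "segment": "Key account"},
]

# Hypothetical rulesets: per persona, the preferred source order for each attribute.
RULESETS = {
    "sales": {"email": ["crm", "erp", "marketing"], "segment": ["erp", "crm"]},
    "marketing": {"email": ["marketing", "crm"], "segment": ["marketing", "crm"]},
}


def golden_record(records, persona):
    """Build a virtual golden record for one persona without modifying `records`."""
    result = {}
    for attribute, preferred in RULESETS[persona].items():
        for source in preferred:
            match = next(
                (r for r in records if r["source"] == source and r.get(attribute)), None
            )
            if match is not None:
                result[attribute] = match[attribute]
                break
    return result


print(golden_record(SOURCES, "sales"))      # {'email': 'jane@corp.example', 'segment': 'Key account'}
print(golden_record(SOURCES, "marketing"))  # {'email': 'jane@mail.example', 'segment': 'Prospect'}
```

Because nothing is merged destructively, changing a ruleset or a source record simply changes the next computed view, which is what makes rollback cheap in this design.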

Interenterprise Metadata Handling

8th May 2018, Henrik Gabs Liliendahl

I am aware that the title of this blog post is a bit geeky.

However, the terms are important in data management and for your organisation's ability to prosper in an increasingly data-driven world.

The term interenterprise was part of the previous post on this blog, called What is Interenterprise Data Sharing? In a comment on LinkedIn, analyst Simon Walker of Gartner explained interenterprise data sharing this way: "Interenterprise data sharing = Organizations are increasingly required to provide data to, and receive data from, external trading partners (customers, suppliers, business partners and others)."

Metadata is data about data. Handling metadata is an important facet of data management, including data governance, data quality management and Master Data Management (MDM). When it comes to new trends in data management such as big data and handling data in data lakes, the importance of metadata management will in my eyes become even more obvious.

In a current venture (Product Data Lake) we are working on building in metadata management for business ecosystems, meaning that trading partners can share product information either using the same metadata or linking their different metadata.

Using international, national and industry standards for product information would be the perfect solution for sharing metadata within a business ecosystem, and indeed this is the preferred option we support. However, there are many competing standards for product information and they come in evolving versions, so having everyone on the same page at the same time is quite utopian.

Add to that, not everyone speaks English, and those who do do not all speak the same variant of English. Metadata originates in, and should exist in, the languages used in trading partnerships.

In Product Data Lake we have started out with these principles:

Product attributes can be tagged with an attribute type telling which standard (if any) they adhere to in terms of product identification, product classification or product feature. More about that in the post Connecting Product Information.
Attribute short and long descriptions can be represented in different languages.
Trading partners can link their product attributes and see in Product Data Lake the standards used and the descriptions in the different languages in which they exist.
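The three principles can be pictured as a small data model. The sketch below is a hypothetical illustration, not Product Data Lake's actual schema: an attribute carries an optional standard tag and short/long descriptions per language, and a link between a provider attribute and a receiver attribute shows each side's standard and description in a requested language. "ETIM" is used only as an example tag value; the attribute codes and texts are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Attribute:
    """One trading partner's product attribute (hypothetical model)."""
    owner: str
    code: str
    standard: Optional[str] = None  # which standard the attribute adheres to, if any
    # language code -> (short description, long description)
    descriptions: Dict[str, Tuple[str, str]] = field(default_factory=dict)


@dataclass
class AttributeLink:
    """A provider attribute linked to a receiver attribute."""
    provider: Attribute
    receiver: Attribute

    def visible(self, language: str) -> Dict[str, Tuple[Optional[str], str]]:
        """Standard and short description each partner sees, in one language."""
        return {
            a.owner: (a.standard, a.descriptions.get(language, ("n/a", "n/a"))[0])
            for a in (self.provider, self.receiver)
        }


mfr = Attribute("manufacturer", "W-001", "ETIM",
                {"en": ("Weight", "Net weight in kg"), "da": ("Vægt", "Nettovægt i kg")})
shop = Attribute("webshop", "net_weight", None, {"da": ("Vægt", "Varens vægt")})
link = AttributeLink(mfr, shop)
print(link.visible("da"))  # {'manufacturer': ('ETIM', 'Vægt'), 'webshop': (None, 'Vægt')}
print(link.visible("en"))  # {'manufacturer': ('ETIM', 'Weight'), 'webshop': (None, 'n/a')}
```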

A Business Oriented Data Mind Map

28th April 2018, Henrik Gabs Liliendahl

You can look at data in many ways.

Below is a mind map covering some of the ways you can picture data within your business.

data mind map

Data is often seen as the raw material that will be processed into information, which can be used to gather knowledge that over time may emerge as business wisdom.

When processing data, we may distinguish between structured data, which is already pre-processed into a workable format, and unstructured data, which is not yet easily ingested as information.

The main forms of structured data are:

Reference data, which is often defined and maintained in a wider scope than your organization, but where your organisation may still hold more specific knowledge of its own, as touched on in the post The World of Reference Data.
Master data that describes the who, where and what in your business transactions. You can drill further down into this in the post A Master Data Mind Map.
Transactions that hold the details of ongoing production events, of the purchases and sales we make, and of the financials related to all activities in the business.

Unstructured data will in the end hold much more information than our structured data. This includes communication data, digital assets and big data. Some structured data sources are, though, also big, as examined in the post Five Flavors of Big Data.

We may also store the data in different places. For historical reasons within computer technology, we have stored our data on premises, but organizations are, at different paces, increasingly deploying new data stores in the cloud.

In organisations with activities in multiple geographies and/or other organizational splits, an ongoing consideration is whether a chunk of data is to be handled locally for each unit or globally (within the organization).

I am sure there are a lot of other ways in which you can look at data. What is on your mind?

Seven Flavors of MDM

19th April 2018 (updated 25th April 2018), Henrik Gabs Liliendahl

Master Data Management (MDM) can take many forms. An exciting side of being involved in MDM implementations is that every implementation is a little bit different, which also makes room for a lot of different technology options. There is no best MDM solution out there. There are a lot of options, some of which will be the best fit for a given MDM implementation.

The available solutions also change over the years, typically by spreading to cover more ground in the MDM space.

In the following I will briefly introduce the basics of seven flavours of MDM. A given MDM implementation will typically focus on one of these flavours with some elements of the others, and a given piece of technology will originate in one of these flavours and, to a greater or lesser degree, encompass some of the others.

The traditional MDM platform

A traditional MDM solution is a hub for master data aiming to deliver a single source of truth (or trust) for master data within a given organization, either enterprise-wide or within a portion of an enterprise. The first MDM solutions were aimed at Customer Data Integration (CDI), because having multiple and inconsistent data stores for customer data with varying data quality is a well-known pain point almost everywhere. Besides that, similar pain points exist around vendor data and other party roles, product data, assets, locations and other master data domains, and dedicated solutions for those are available.

Product Information Management (PIM)

A special breed of solutions for Product Information Management, aimed at having consistent product specifications across the enterprise for publishing in multiple sales channels, has been around for years, and we have seen a continuous integration of the market for such solutions into the traditional MDM space, as many of these solutions have morphed into a kind of MDM solution.

Digital Asset Management (DAM)

Not least in relation to PIM, we have a distinct discipline around handling digital assets such as text documents, audio files, video and other rich media data, which are different from the structured and granular data we can manage in data models in common database technologies. A post on this blog examines How MDM, PIM and DAM Stick Together.

Big Data Integration

The rise of big data is having a considerable influence on what MDM solutions will look like in the future. You may handle big data directly inside MDM or link to big data outside MDM, as told in the post about The Intersection of MDM and Big Data.

Application Data Management (ADM)

Another area where you have to decide where master data stops and handling other data starts is transactional data and other forms of data handled in dedicated applications such as ERP, CRM, PLM (Product Lifecycle Management) and plenty of other industry-specific applications. This conundrum was touched on in a recent post called MDM vs ADM.

Multi-Domain MDM

Many MDM implementations focus on a single master data domain such as customer, vendor or product, or you see MDM programs that have a multi-domain vision and overall project management but quite separate tracks for each domain. We have, though, seen many technology vendors preparing for the multi-domain future either by:

Being born in the multi-domain age, as for example Semarchy
Acquiring the capabilities, as for example Informatica and IBM
Extending from PIM, as for example Riversand and Stibo Systems

MDM in the cloud

MDM follows the source applications up into the cloud. New MDM solutions naturally come as cloud solutions. The traditional vendors introduce cloud alternatives to, or based on, their proven on-premises solutions. There is only one direction here: more and more cloud MDM, also because customer and business partner engagement will take place in the cloud.

Ecosystem wide MDM

Doing MDM enterprise-wide is hard enough. But it does not stop there. Increasingly, every organization will be an integrated part of a business ecosystem where collaboration with business partners is part of digitalization, and thus we will need to work on the same foundation around master data, as reported in the post Ecosystem Wide MDM.

Why it is not a Product Data Warehouse, but a Product Data Lake

25th February 2018, Henrik Gabs Liliendahl

There is a need for a new solution to sharing product information between trading partners. Product Data Lake is that new solution. Using the term data lake as a part of the name for the solution is very deliberate. Here is why:

Volume

When setting up a warehouse, including a data warehouse, you have to estimate the storage size and the throughput. There will be a limit to how much data you can store and how much data you can upload and download within a given period.

Our vision is that Product Data Lake will be the process-driven key service for exchanging any sort of product information within business ecosystems all over the world, with the aim of optimally assisting self-service purchasing of every kind of product.

In order to achieve that vision, we need to be able to scale up drastically. Therefore, we use a document-oriented database called MongoDB to store product information.

Even if you choose to implement a Product Data Lake instance for a single business ecosystem, you will benefit from the high scalability.

Velocity

Business ecosystems change all the time. You need to be able to adapt your data management rapidly, not least when it comes to exchanging product information.

Swapping trading partners is one thing. That often means dealing with other product information requirements and opportunities and adhering to other standards.

We will also see business ecosystems in new shapes in the future. There will be fewer nodes between manufacturers and point-of-sales and point-of-sales will more likely be online marketplaces.

However, the changes will not happen as a big bang but at a varying pace for each industry, geography and organization.

The rigid consensus structure of a data warehouse, and of product information exchange solutions that resemble a data warehouse, will not cope with that change. The data lake concept, in the form of Product Data Lake, will.

In Product Data Lake you as a provider upload product information in your structure and format, and you as a receiver download it in your structure and format. The linking and transformation take place inside Product Data Lake using linked metadata.
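The upload-in-your-format, download-in-my-format idea can be sketched as a mapping driven by linked metadata. This is a hypothetical illustration, not Product Data Lake's implementation: each link pairs a provider attribute with a receiver attribute and a conversion, and attributes with no link are left out rather than guessed. The attribute names and the sample record are invented; padding a 13-digit EAN with a leading zero to 14 digits is the usual way to write it as a GTIN-14.

```python
# Hypothetical linked metadata: provider attribute -> (receiver attribute, converter).
LINKS = {
    "weight_g": ("net_weight_kg", lambda grams: grams / 1000),
    "colour": ("color", str.lower),
    "ean": ("gtin", lambda ean: ean.zfill(14)),
}


def translate(provider_record, links=LINKS):
    """Re-shape one provider record into the receiver's structure.
    Attributes without a link are dropped rather than guessed."""
    out = {}
    for attr, value in provider_record.items():
        if attr in links:
            target, convert = links[attr]
            out[target] = convert(value)
    return out


uploaded = {"ean": "5701234567892", "weight_g": 1250, "colour": "Red", "internal_note": "x"}
print(translate(uploaded))  # {'gtin': '05701234567892', 'net_weight_kg': 1.25, 'color': 'red'}
```

Keeping the links as data rather than code means a new trading partner only needs a new set of links, not a new integration.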

Variety

While everyone agrees that a common standard for all product information is the best answer, we must on the other hand accept that using a common standard for every kind of product and every piece of information needed is quite utopian. We do not even have a single, uniformly spelled English term for standardization/standardisation.

Also, we must foresee that one organization will mature at a different pace from another organisation in the same business ecosystem.

These observations are the reasons behind the launch of Product Data Lake. In Product Data Lake we encompass the use of (in prioritized order):

The same standard in the same version
The same standard in different versions
Different standards
No standards
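The prioritized order above can be expressed as a small classification of a provider/receiver attribute pair. This is a sketch under stated assumptions, not Product Data Lake code: a tag is assumed to be a (standard, version) pair, and a pair where either side has no standard is treated as the "no standards" case, since nothing is shared. "STD-A" and "STD-B" are placeholder standard names.

```python
from typing import Optional, Tuple

# A tag is (standard, version), or None when an attribute follows no standard.
Tag = Optional[Tuple[str, str]]

# The prioritized cases, most preferred first.
LEVELS = [
    "same standard, same version",
    "same standard, different versions",
    "different standards",
    "no standards",
]


def match_level(provider: Tag, receiver: Tag) -> int:
    """Index into LEVELS for a provider/receiver attribute pair.

    Assumption: if either side has no standard, nothing is shared,
    so the pair falls into the "no standards" case.
    """
    if provider is None or receiver is None:
        return 3
    if provider[0] != receiver[0]:
        return 2
    return 0 if provider[1] == receiver[1] else 1


pairs = [
    (("STD-A", "2.0"), ("STD-A", "2.0")),
    (("STD-A", "1.0"), ("STD-A", "2.0")),
    (("STD-A", "2.0"), ("STD-B", "5.1")),
    (None, ("STD-A", "2.0")),
]
for provider, receiver in pairs:
    print(LEVELS[match_level(provider, receiver)])
```

A lower level means less linking work: the first case needs no mapping at all, while the last needs a fully manual attribute link.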

Learn about some of these standards in the post Five Product Classification Standards.