Data discovery is a term probably most often mentioned in relation to business intelligence and data science. In this context, data discovery can be seen as a more experimental and preliminary activity that, once hidden data sources, relationships and patterns are identified, can lead to a more continuous and integrated form of reporting and predictive analysis.
However, data discovery is useful in other data management disciplines as well.
With the increasing awareness of data security, data protection and data privacy – and the regulatory compliance enforced in this space – it is crucial for organisations to know what kind of data flows through and is stored within the organisation. While you may argue that this should already be available in existing documentation, I have yet to meet an organisation where this is the case. And I get around a lot.
Data discovery is also a component of test data management, and tool vendors package their offerings in this space with capabilities for data masking, data subsetting and data discovery in order to answer questions such as these (a minimal masking sketch follows the list):
- Where are the data elements that should be masked when using production data in test scenarios without violating data privacy regulations?
- How can you subset (minimize) test data sets derived from production (covering several databases) and still keep the proper relationships intact?
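To make the masking question concrete, here is a minimal sketch of deterministic pseudonymization of personally identifiable columns before production data is handed over to a test environment. The DataFrame, column names and salt are hypothetical, and real test data management tools offer far richer masking rules; the point is only that hashing the same input to the same output keeps joins and relationships working while hiding the personal data.

```python
import hashlib

import pandas as pd

# Hypothetical production extract with personally identifiable columns.
production = pd.DataFrame({
    "customer_id": [1001, 1002],
    "full_name": ["Jane Smith", "Lars Jensen"],
    "email": ["jane@example.com", "lars@example.com"],
    "order_total": [120.50, 89.00],
})

def pseudonymize(value: str, salt: str = "test-env-salt") -> str:
    """Deterministically hash a value so referential joins still work,
    while the original personal data is not exposed in test systems."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

masked = production.copy()
for column in ["full_name", "email"]:  # columns flagged by data discovery
    masked[column] = masked[column].map(pseudonymize)

print(masked)
```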
Within Data Quality Management, Data Governance and Master Data Management (MDM), data discovery also plays a role similar to its role in data reporting. We can use data discovery to map data lineage, to find potential data relationships where data matching, data cleansing and/or data stewardship might help ensure data quality and business process improvement, and to explore where the same data have different labels (metadata) attached or the same labels are used for different data types.
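As a small illustration of that last point, here is a sketch of how value-overlap profiling can flag columns that probably hold the same data under different labels. The two tables and their column names are hypothetical; real data discovery tools combine this kind of overlap analysis with pattern and metadata matching.

```python
import pandas as pd

# Hypothetical extracts from two systems using different column labels.
crm = pd.DataFrame({"cust_no": ["C1", "C2", "C3"], "zip": ["1050", "2100", "8000"]})
erp = pd.DataFrame({"account_id": ["C2", "C3", "C4"], "postal_code": ["2100", "8000", "9000"]})

def column_overlap(left: pd.Series, right: pd.Series) -> float:
    """Share of distinct values in `left` that also occur in `right`."""
    left_values, right_values = set(left.dropna()), set(right.dropna())
    return len(left_values & right_values) / max(len(left_values), 1)

# Flag column pairs whose values overlap heavily - candidates for the
# "same data, different labels" situation described above.
for l_col in crm.columns:
    for r_col in erp.columns:
        overlap = column_overlap(crm[l_col], erp[r_col])
        if overlap >= 0.5:
            print(f"{l_col} <-> {r_col}: {overlap:.0%} overlap")
```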
Yesterday I had the pleasure of attending the Informatica MDM 360 and Data Governance Summit in London including being in a panel discussing best practices for your MDM 360 journey. The rise of Artificial Intelligence (AI) in Master Data Management (MDM) was a main theme at this event.
Informatica has a track record of innovating in new technologies in the data management space while also acquiring promising newcomers in order to fast track their market offering. So it is with AI and MDM at Informatica too. Informatica currently has two tracks:
- clAIre – the clairvoyant component in the Informatica portfolio that “using machine learning and other AI techniques leverages the industry-leading metadata capabilities of the Informatica Intelligent Data Platform to accelerate and automate core data management and governance processes”.
- Informatica Customer 360 Insights which is the new branding of the recent AllSight acquisition. You can learn about that over at The Disruptive Master Data Management Solutions List in the entry about Informatica Customer 360 Insights.
At the Informatica event the synergy between these two tracks was presented as the Intelligent 360 View. Naturally, marketing synergies are the first results of an acquisition. Later we will – hopefully – see actual synergies when the technologies are to be aligned, positioned and delivered to customers who want to be an intelligent enterprise of the future.
The title of this blog post is also the title of a presentation I will do at the 2019 Data Governance and Information Quality Conference in San Diego, US in June.
There is a difference between how we can exercise data governance and information quality management when we are handling data about products versus handling the most common data domain, party data (customer, vendor/supplier, employee and other roles).
This topic was touched here on the blog in the post called Data Quality for the Product Domain vs the Party Domain.
The conference session will go through these topics:
- Product master data vs. product information
- How Master Data Management (MDM), Product Information Management (PIM) and Digital Asset Management (DAM) stick together
- The roles of 1st party data, 2nd party data and 3rd party data in MDM, PIM and DAM
- Business ecosystem wide product data management
- Cross company data governance and information quality alignment
You can have a look at the full agenda for the DGIQ 2019 Conference here.
20 years ago, when I started working as a contractor and entrepreneur in the data management space, data was not on the top agenda at many enterprises. Fortunately, that has changed.
An example is given by Schneider Electric CEO Jean-Pascal Tricoire in his recent blog post on how digitization and data can enable companies to be more sustainable. You can read it on the Schneider Electric Blog in the post 3 Myths About Sustainability and Business.
Manufacturers in the building material sector naturally emphasize sustainability. In his post Jean-Pascal Tricoire says: “The digital revolution helps answering several of the major sustainability challenges, dispelling some of the lingering myths regarding sustainability and business growth”.
One of three myths dispelled is: Sustainability data is still too costly and time-consuming to manage.
From my work with Master Data Management (MDM) and Product Information Management (PIM) at manufacturers and merchants in the building material sector I know that managing the basic product data, trading data and customer self-service ready product data is hard enough. Taking on sustainability data will only make that harder. So, we need to be smarter in our product data management. Smart and sustainable homes and smart sustainable cities need smart product data management.
In his post Jean-Pascal Tricoire mentions that Schneider Electric has worked with other enterprises in their ecosystem in order to be smarter about product data related to sustainability. In my eyes the business ecosystem theme is key in the product data smartness quest as pondered in the post about How Manufacturers of Building Materials Can Improve Product Information Efficiency.
The term data monetization is trending in the data management world.
Data monetization is about harvesting direct financial results from having access to data that is stored, maintained, categorized and made accessible in an optimal manner. Traditionally, data management & analytics has contributed indirectly to financial outcomes by aiming at keeping data fit for purpose in the various business processes that produce value for the business. Today the best performers are using data much more directly to create new services and business models.
In my view there are three flavors of data monetization:
- Selling data: This is something that has been known in the data management world for years. Notable examples are the likes of Dun & Bradstreet, who sell business directory data as touched upon in the post What is a Business Directory? Another example is postal services around the world selling their address directories. This is the kind of data we know as third party data.
- Wrapping data around products: If you have a product – or a service – you can add tremendous value to these products and services and make them more sellable by wrapping data, potentially including third party data, around those products and services. These data will thus become second party data as touched in the post Infonomics and Second Party Data.
- Advanced analytics and decision making: You can combine third party data, second party data and first party data (your own data) to enable advanced analytics and fast operational decision making that help you sell more, reduce costs and mitigate risks (a small sketch follows this list).
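As a simple illustration of the third flavor, here is a sketch of combining first party customer data with a purchased third party business directory so that analytics and operational decisions can draw on both. The column names and the DUNS-style key are hypothetical.

```python
import pandas as pd

# First party data: your own customer records (hypothetical columns).
customers = pd.DataFrame({
    "duns": ["123456789", "987654321"],
    "annual_spend": [50000, 12000],
})

# Third party data: a purchased business directory (hypothetical columns).
directory = pd.DataFrame({
    "duns": ["123456789", "987654321"],
    "industry": ["Construction", "Retail"],
    "employees": [250, 30],
})

# Combine the sources so analytics and operational decisions can use both.
enriched = customers.merge(directory, on="duns", how="left")
print(enriched.sort_values("annual_spend", ascending=False))
```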
Please learn more about data monetization by downloading a recent webinar hosted by Information Builders, their expert Rado Kotorov and yours truly here.
Sometimes you may get the impression that sales, including online sales, is driven by extremely smart sales and marketing people targeting simple-minded customers.
Let us look at an example with selling a product online. Below are two approaches:
Bigger picture is available here.
My take is that the data rich approach is much more effective than the alternative (but sadly often used) one. Some proof is delivered in the post Ecommerce Suffers without Data Quality.
In many industries, the merchant who will cash in on the sale will be the one with the best and most stringent data, because this serves the overwhelming majority of buying power: buyers who do not want to be told what to buy, but what they are buying.
So, pretending to be an extremely smart data management expert, I will argue that you can monetize product data by having the most complete, timely, consistent, conforming and accurate product information in front of your customers. This approach is further explained in the piece about Product Data Lake.
This week I attended the Master Data Management Summit Europe 2018 and Data Governance Conference Europe 2018 in London.
Among the recurring sessions year after year at this conference and its sister conferences around the world is Aaron Zornes presenting the top MDM vendors as he (that is, the MDM Institute) sees them, as well as the top system integrators.
Managing an ongoing list of such entities can be hard, and doing it in PowerPoint does not make the task easier, as visualized in two different shots captured via Twitter and seen below, covering the Top 19 to 22 European MDM / DG System Integrators:
Bigger picture available here.
Now, the variations between these two versions of the truth and the real world are (at least):
- Red circles: Is number 17 (in alphabetical order) Deloitte – in Denmark – who bought Platon 5 years ago, or is it KPMG?
- Blue arrow and circles: Is SAP Professional Services in there or not – and if they are, there must be 21 Top 20 players, with two entries at number 11: Edifixio and Entity Group.
- Green arrow: Number 1 (in alphabetical order) Affecto has been bought by number 8 CGI during this year.
PS: Recently I started a disruptive list of MDM vendors maintained by the vendors themselves. Perhaps the analysts can be helped by a similar list for System Integrators?
Our February 2018 version of the Product Data Lake cloud service is live. New capabilities include:
- Subscriber clusters
- Put APIs
As a Product Data Lake customer, you can be a subscriber to our public cloud (www.productdatalake.com) or install the Product Data Lake software on your private cloud.
Now there is a hybrid option: being a member of a subscriber cluster. A subscriber cluster is an option for, for example, an affiliated group of companies, where you can share product data internally within the group while at the same time sharing product data with trading partners outside your group using the same account.
Existing means of feeding Product Data Lake include FTP file drops, traditional file upload from your desktop or network drives, and entering data directly into Product Data Lake. Now you can also use our APIs for system-to-system data exchange.
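As an illustration of what a system-to-system push could look like, here is a hedged sketch of calling a put-style product API. The endpoint path, authentication scheme and payload attributes shown are assumptions for the example, not the documented Product Data Lake API.

```python
import requests

# Hypothetical endpoint and payload - the actual Product Data Lake API,
# authentication scheme and attribute names may differ.
API_BASE = "https://www.productdatalake.com/api"  # assumed base URL
API_TOKEN = "your-api-token"

product = {
    "providerProductId": "SKU-12345",
    "attributes": {
        "name": "Copper pipe 15 mm",
        "length_mm": 3000,
    },
}

response = requests.put(
    f"{API_BASE}/products/{product['providerProductId']}",
    json=product,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
response.raise_for_status()
```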
Get the Overview
Get the full Product Data Lake Overview here (opens a PDF file).
Back in 2015 Gartner, within a Magic Quadrant for MDM, described two different ways they had observed of connecting big data and master data management, as reported in the post Two Ways of Exploiting Big Data with MDM.
In short, the two ways observed were:
- Capabilities to perform MDM functions directly against copies of big data sources such as social network data copied into a Hadoop environment. Gartner then found that there have been very few successful attempts (from a business value perspective) to implement this use case, mostly as a result of an inability to perform governance on the big datasets in question.
- Capabilities to link traditionally structured master data against those sources. Gartner then found that this use case is also sparse, but more common and more readily able to prove value. This use case is also gaining some traction with other types of unstructured data, such as content, audio and video.
In my eyes the ability to perform governance on big datasets is key. In fact, master data will tend to be more externally generated and maintained, just like big data usually is. This will change our ways of doing information governance as for example discussed in the post MDM and SCM: Inside and outside the corporate walls.
Eventually, we will see use cases at the intersection of MDM and big data. The one I am working with right now is about how you can improve the sharing of product master data (product information) between trading partners. While this quest may be used for analytical purposes, which is the stated aim with big data, this service will fundamentally serve operational purposes, which is the predominant aim with master data management.
This big data, or rather data lake, approach is about how we, by linking metadata, connect the different perceptions of product information that exist in cross-company supply chains. While everyone being on the same standard at the same time would be optimal, this is quite utopian. Therefore, we must encourage pushing product information (including rich textual content, audio and video) in the provider’s standard and do the “schema-on-read” stuff when each of the receivers pulls the product information for their purposes.
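Here is a minimal sketch of that schema-on-read idea: the provider pushes product data with its own attribute names, and each receiver applies its own mapping when it pulls the data. The attribute names and the mapping are hypothetical.

```python
# A minimal schema-on-read sketch: the provider pushes product data in its
# own terms, and each receiver applies its own mapping when pulling.
provider_record = {  # as pushed by the provider, in the provider's standard
    "ItemNo": "SKU-12345",
    "Descr": "Copper pipe 15 mm",
    "NetWeightKg": 2.4,
}

# Hypothetical mapping maintained by one receiver for this provider.
receiver_mapping = {
    "ItemNo": "supplier_item_number",
    "Descr": "product_name",
    "NetWeightKg": "net_weight_kg",
}

def read_with_mapping(record: dict, mapping: dict) -> dict:
    """Apply the receiver's schema at read time; unmapped attributes keep
    their original names so no provider data is lost."""
    return {mapping.get(key, key): value for key, value in record.items()}

print(read_with_mapping(provider_record, receiver_mapping))
```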
If you want to learn more about how that goes, you can follow Product Data Lake here.
Business outcome is the end goal of any data management activity, be that data governance, data quality management, Master Data Management (MDM) or Product Information Management (PIM).
Business outcome comes from selling more and reducing costs.
At Product Data Lake we have a simple scheme for achieving business outcome through selling more goods and reducing costs of sharing product information between trading partners in business ecosystems:
Interested? Get in touch: