Big Reference Data – Page 2 – Liliendahl on Data Quality

CDI, PIM, MDM and Beyond

16th April 2015 | Henrik Gabs Liliendahl | 3 Comments

The TLAs (Three Letter Acronyms) in the title of this blog post stand for:

Customer Data Integration
Product Information Management
Master Data Management

CDI and PIM are commonly seen as predecessors to MDM. For example, the MDM Institute was originally called The Customer Data Integration Institute and still has this website: http://www.tcdii.com/.

Today Multi-Domain MDM is about managing customer, or rather party, master data together with product master data and other master data domains, as visualized in the post A Master Data Mind Map. Some of the most frequent other master data domains are location master data and asset master data, the latter of which was explored in the post Where is the Asset? A less frequent master data domain is The Calendar MDM Domain.

You may argue that PIM (Product Information Management) is not the same as Product MDM. This question was examined in the post PIM, Product MDM and Multi-Domain MDM. In my eyes the benefits of keeping PIM as part of Multi-Domain MDM are bigger than the benefits of separating PIM and MDM. It is about expanding MDM across the sell-side and the buy-side of the business, eventually by enabling wide use of customer self-service and supplier self-service.

The external self-service theme will in my eyes be at the centre of where MDM is going in the future. Going down that path will have consequences for how we see data governance, as discussed in the post Data Governance in the Self-Service Age. Another aspect of how MDM is going to be seen from the outside in is the increased use of third party reference data and the link between big data and MDM, as touched on in the post Adding 180 Degrees to MDM.

Besides Multi-Domain MDM and the links between MDM and big data, a much mentioned future trend in MDM is doing MDM in the cloud. The latter is in my eyes a natural consequence of the external self-service theme and the increased use of third party reference data. Together with the general benefits of the SaaS (Software as a Service) and DaaS (Data as a Service) concepts, these will make MDM morph into something like MDaaS (Master Data as a Service). That idea is at least nearly ten years old by the way, as seen in this BeyeNetwork article by Dan E Linstedt.

Leads, Accounts, Contacts and Data Quality

29th July 2014 | Henrik Gabs Liliendahl | 2 Comments

Many CRM applications have the concepts of leads, accounts and contacts for registering customers or other parties with roles in sales and customer service.

Most CRM systems have a data model suited for business-to-business (B2B) operations. In a B2B environment:

A lead is someone who might become your customer some day
An account is a legal entity that is, or seems likely to become, your customer
A contact is a person who works at or in other ways represents an account

In business-to-consumer (B2C) environments there are different ways of making that model work.

The general perception is that data about a lead can be so-so, while it is of course important to have optimal data quality for accounts and contacts.

However, this approach works against the essential data quality rule of getting things right the first time.

Converting a lead into an account and/or a contact is a basic CRM process and the data quality pitfalls in that process are many. To name a few:

Is the lead a new account or did we already have that account in the database?
Is the contact new, or did we already know that person, maybe at another account?
How do we align the known data about the lead with external reference data during the conversion process?

In other words, the promise of having a 360-degree customer view is jeopardized by the concept of most CRM systems.
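
The first pitfall, checking whether a lead is already an account, can be illustrated with a minimal sketch. The `Account` class, the `normalize` rule and the sample data are hypothetical and not taken from any particular CRM system. A real conversion step would use fuzzier matching and external reference data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Account:
    account_id: str
    name: str
    city: str


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits so trivial variations compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def find_existing_account(
    lead_company: str, lead_city: str, accounts: list[Account]
) -> Optional[Account]:
    """Return the first account whose normalized name and city equal the lead's, else None."""
    key = (normalize(lead_company), normalize(lead_city))
    for account in accounts:
        if (normalize(account.name), normalize(account.city)) == key:
            return account
    return None


# Illustrative data only.
accounts = [Account("A-1", "Acme Trading Ltd.", "London")]
existing = find_existing_account("ACME Trading Ltd", "london", accounts)
missing = find_existing_account("Beta Supplies", "Leeds", accounts)
```

Here `existing` is the account `A-1`, so the lead should be linked rather than created as a new account, while `missing` is `None` and a new account is justified.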

Using External Data in Data Matching

26th May 2014 | 8th July 2014 | Henrik Gabs Liliendahl | 4 Comments

One of the things that data quality tools do is data matching. Data matching is mostly related to the party master data domain. It is about comparing two or more data records that do not have exactly the same data but describe the same real world entity.

A common approach is to compare data records in internal master data repositories within your organization. However, there are great advantages in bringing in external reference data sources to support the data matching.

Some of the ways of doing that which I have worked with include these kinds of big reference data:

Business directories:

The business-to-business (B2B) world does not have privacy issues to the degree we see in the business-to-consumer (B2C) world. Therefore there are many business directories out there with a quite complete picture of which business entities exist in a given country and even in regions and the whole world.

A common approach is to first match your internal B2B records against a business directory and obtain a unique key for each business entity. The next step, matching business entities with that unique key, is a no-brainer.

The problem, though, is that an automatic match between internal B2B records and a business directory most often does not yield a 100 % hit rate. Not even close, as examined in the post 3 out of 10.
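
The two-step approach can be sketched roughly as follows. The directory content, the key format and the exact-match rule on a normalized name are assumptions for illustration only. Real directory matching uses far more tolerant comparison, and the example deliberately includes a record that gets no hit.

```python
def normalize(name: str) -> str:
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


# Hypothetical business directory: normalized name -> unique directory key.
directory = {
    normalize("Acme Trading Ltd"): "D-100",
    normalize("Beta Supplies Ltd"): "D-200",
}

# Illustrative internal B2B records.
internal_records = [
    {"id": 1, "name": "ACME Trading Ltd."},
    {"id": 2, "name": "Acme Trading Ltd"},
    {"id": 3, "name": "Beta Supplies Limited"},  # no exact hit in the directory
]


def assign_directory_keys(records, directory):
    """Step 1: look up each record in the directory. Step 2: group record ids by key."""
    groups = {}
    unmatched = []
    for record in records:
        key = directory.get(normalize(record["name"]))
        if key is None:
            unmatched.append(record["id"])
        else:
            groups.setdefault(key, []).append(record["id"])
    return groups, unmatched


groups, unmatched = assign_directory_keys(internal_records, directory)
hit_rate = (len(internal_records) - len(unmatched)) / len(internal_records)
```

Records 1 and 2 end up under the same directory key and are thereby matched with each other, while record 3 stays unmatched and keeps the hit rate below 100 %.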

Address directories:

Address directories are mostly used in order to standardize postal address data, so that two addresses in internal master data that can be standardized to an address written in exactly the same way can be better matched.
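
As a minimal sketch of the standardization idea, the snippet below rewrites two differently written addresses to the same form before comparing them. The small `ABBREVIATIONS` table is a hypothetical stand-in for a real address directory, which would validate the address against known streets and building numbers rather than just expand abbreviations.

```python
import re

# Hypothetical stand-in for an address directory: a few street type abbreviations.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def standardize_address(address: str) -> str:
    """Lowercase, drop punctuation and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


first = standardize_address("12 Main St.")
second = standardize_address("12 main street")
addresses_agree = first == second
```

Both inputs standardize to `12 main street`, so a later name comparison can treat the two records as being on the same address.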

A deeper use of address directories is to exploit related property data. The probability of two records with “John Smith” on the same address being a true positive match is much higher if the address is a single-family house as opposed to a high-rise building, nursing home or university campus.

Relocation services:

A common cause of false negatives in data matching is that you have compared two records where one of the postal addresses is an old one.

Bringing in National Change of Address (NCOA) services for the countries in question will help a lot.

The optimal way of doing that (and utilizing business and address directories) is to make it a continuous element of Master Data Management (MDM) as explored in the post The Relocation Event.
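
How a change of address feed removes such a false negative can be sketched as below. The `moves` structure is a hypothetical simplification. Actual NCOA services have their own formats, licensing and matching rules, none of which are modelled here.

```python
# Hypothetical change of address feed: (name, old address) -> new address.
moves = {("john smith", "1 old road"): "9 new street"}


def current_address(name: str, address: str, moves: dict) -> str:
    """Follow known moves for this person until no newer address is on file."""
    seen = set()
    while (name, address) in moves and (name, address) not in seen:
        seen.add((name, address))  # guard against circular move records
        address = moves[(name, address)]
    return address


def is_match(a: dict, b: dict, moves: dict) -> bool:
    """Compare two records on name and on their most recent known address."""
    return a["name"] == b["name"] and (
        current_address(a["name"], a["address"], moves)
        == current_address(b["name"], b["address"], moves)
    )


record_a = {"name": "john smith", "address": "1 old road"}
record_b = {"name": "john smith", "address": "9 new street"}

naive_match = record_a == record_b
relocation_aware_match = is_match(record_a, record_b, moves)
```

The naive comparison misses the pair, while the relocation aware comparison finds it because the old address is first brought up to date.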

A Digital Sharing Revolution

10th April 2014 | Henrik Gabs Liliendahl | Leave a comment

The last couple of days I have been part of a so-called Innovation Camp around how to exploit open public sector data in the private sector. In one of the inspirational keynotes Professor Birgitte Andersen of the Big Innovation Centre used the term “A Digital Sharing Revolution” to describe the trend of increasingly sharing data within the public sector, between the public sector and the private sector, and within the private sector.

During the two days a lot of ideas for how to exploit open public sector data within the private sector were put on the table. I was lucky enough to win a SmartWatch as part of the group with the winning concept, which is a service for identifying buildings with potential for energy saving improvements. This service will benefit large enterprises such as building material manufacturers (and in fact energy suppliers), local small and midsize businesses, the house owners and society as a whole in fulfilling climate change prevention goals.

At iDQ we see great potential in using such a service in conjunction with our current offerings for exploiting both open public sector data and other external big reference data sources. Of course, there is a dilemma for enterprises in the private sector in using the same data provided by the same services as their competitors. However, there are still plenty of possibilities for standing out from the crowd in how data and services are actually used in the way of doing business, and for concentrating on that rather than reinventing the wheel in the way of collecting data.

There is Open Data in the Air

5th April 2014 | Henrik Gabs Liliendahl | 1 Comment

It is spring in Europe and the good news in Europe this week is that from December next year we will finally see the end of paying exorbitant fees for data access on your mobile phone outside WiFi when in another EU country, as told by BBC here. As a person travelling a lot between EU countries I find this, though years too late, fantastic news.

Being too late was unfortunately also the case as examined in the article Sale of postcodes data was a ‘mistake’ say Committee – in News from UK Parliament. When the UK Royal Mail was privatised last year the address directory, known as the PAF file, was part of the deal. It would have been a substantially better deal for society as a whole if the address data had been set free. This calculation is backed up by figures from experiences in Denmark as reported in the post The Value of Free Address Data.

In the next week I’m looking forward to being part of an innovation camp arranged by the Danish authorities as a step in an initiative to exploit open public sector data in the private sector. Here public data owners, IT students, enterprise data consumers and IT tool and service vendors including iDQ A/S will meet openly and challenge each other in the development of the most powerful ideas for new ways to create valuable knowledge based on open public sector data.

External Events, MDM and Data Stewardship

3rd April 2014 | Henrik Gabs Liliendahl | 2 Comments

Exploiting external data is an essential part of party master data management as told in the post Third-Party Data and MDM.

External data supports data quality improvement and data quality issue prevention for party master data by:

Ensuring accuracy of party master data entities, best at point of entry but sometimes also by later data enrichment
Exploring relationships between master data entities and thereby enhancing the completeness of party master data
Keeping up the timeliness of party master data by absorbing external events in master data repositories

External events around party master data are:

When someone moves to a new address as examined in the post The Relocation Event
When someone moves to another world as told in the post Undertaking in MDM
Heaps of other changes in big reference data

Updating with some of these events may be done automatically, while other events require manual intervention.
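
The split between automatic updates and manual intervention can be sketched as a simple routing step. Which event kinds are safe to apply automatically is a policy decision. The choice below (`relocation` automatic, everything else to a data steward) and all names are assumptions for illustration, not a description of any product.

```python
from dataclasses import dataclass


@dataclass
class ExternalEvent:
    party_id: str
    kind: str      # for example "relocation" or "deceased"
    payload: dict  # attribute values reported by the external source


# Assumption for illustration only: relocations are applied automatically.
AUTO_APPLY_KINDS = {"relocation"}


def route_events(events, master, auto_kinds=AUTO_APPLY_KINDS):
    """Apply trusted event kinds to the master data and queue the rest for a data steward."""
    steward_queue = []
    for event in events:
        if event.kind in auto_kinds and event.party_id in master:
            master[event.party_id].update(event.payload)
        else:
            steward_queue.append(event)
    return steward_queue


# Illustrative data only.
master = {"P-1": {"name": "John Smith", "address": "1 Old Road"}}
events = [
    ExternalEvent("P-1", "relocation", {"address": "9 New Street"}),
    ExternalEvent("P-1", "deceased", {}),
]
steward_queue = route_events(events, master)
```

After the run the address of `P-1` is updated, and the deceased event waits in `steward_queue` for manual review.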

Right now I’m working with data stewardship functionality in the instant Data Quality MDM Edition, where the relocation event, the deceased event and other important events in party master data life-cycle management are supported as part of an MDM service.

Data is the new petroleum

1st March 2014 | Henrik Gabs Liliendahl | Leave a comment

“Data is the new oil” is a well-known term today, used to emphasize the fact that data and your ability to exploit data can make you rich.

The rise of big data has indeed added more fuel to this burning issue, with the variant saying “Big data is the new oil”.

Now, as oil is many things, data is many things too. Just as few of us actually use crude oil, also called petroleum, few of us use raw data to get rich. We use information distilled from raw data for specific purposes. One example is examined in the post Mashing Up Big Reference Data and Internal Master Data.

This brings me to the point that we have the question of the quality of oil just as we have the question of the quality of data, as explained nicely by Ken O’Connor in the post Data is the new oil – what grade is yours?

Big Data Quality and Open Government Data

27th February 2014 | 5th March 2014 | Henrik Gabs Liliendahl | Leave a comment

Yesterday I participated in an information meeting at the Danish Ministry for Business and Growth related to an initiative around using open government data within business intelligence in the private sector.

Using open government data is already an essential part of the instant Data Quality concept I’m working with right now and I have earlier written about the state of open government data in Denmark in the posts Government Says So and Making Data Quality Gangnam Style.

At the meeting some well-known questions came up:

Is this big data?

The answer was that it isn’t exactly big data, mainly because the data are well structured and thereby look more like the traditional data sources that we have been used to working with for many years.

Personally, if we have to use the big word, I like to see these data as big reference data, as told in the post Four Flavors of Big Reference Data.

What about data quality?

The answer here was a hope that making these data open to the private sector will create some data quality feedback, resulting in the public sector improving the quality of the data to the benefit of both public sector and private sector data consumers.

Sharing Big Location Reference Data

19th February 2014 | Henrik Gabs Liliendahl | 2 Comments

In the post Location Data Quality for MDM the different ways of handling location master data within many companies were examined.

A typical “as is” picture could be this:

Location data are handled for different purposes using different kinds of systems. Customer data may be data quality checked by using address validation tools and services, which also serves as a prerequisite for better utilization of these data in a Geographical Information System (GIS) and for using internal customer master data in marketing research, for example by utilizing demographic classifications for current and prospective customers.

Often additional external location data are used for enrichment and for supplementing internal master data downstream in these specialized systems. It may very well be that the external location reference data used at different points do not agree in terms of precision, timeliness, conformity and other data quality dimensions.

A desired “to be” picture could be this:

In this set-up everything that can be shared across different purposes is kept as common (big) reference data and/or is accessible within a data-as-a-service environment maintained by third party data providers.

Location Data Quality for MDM

21st January 2014 | Henrik Gabs Liliendahl | Leave a comment

After the customer, or rather party, domain and the product domain, the location domain is the most frequently addressed domain for Master Data Management (MDM).

In my recent work I have seen a growing interest in handling location data as part of an MDM program.

Traditionally location data in many organizations have been handled in two main ways:

As a part of other domains typically as address attributes for customer and other party entities
As a silo for special business processes that involve spatial data using Geographic Information Systems (GIS), as for example in engineering and demographic market research.

Handling location data most often involves using external reference data, as location data don’t have the same privacy considerations as party data, not least data describing natural persons, tend to have. And as opposed to product data, location data are pretty much the same to everyone.

MDM for the location domain is very much about bringing the two above mentioned ways of working with locations together while consistently exploiting external reference data.

As in all MDM work data quality is the important factor and the usual data quality dimensions are indeed in place here as well. Some challenges are:

Uniqueness and precision: Locations come in hierarchies. As told in the post The Postal Address Hierarchy, when referring to textual addresses we have levels such as country, region, city or district, thoroughfare (street) or block, building number and unit within a building. Uniqueness may be defined within one of these levels. As discussed in the post Where is the Spot?, the precision and use case for coordinates may cause uniqueness issues too.
Timeliness and accuracy: Though it doesn’t happen too often, locations do change names, as reported in the post MDM in LED, and features on new locations do show up every day. I remember recent press coverage in the United Kingdom about people who couldn’t get car and other insurance because the address of their newly built house wasn’t in the database at the insurance company.
Completeness and conformity: Availability of all “points of interest” in reference data is an issue. The availability of all attributes of interest at the desired level is an issue too. The available formats and possible mappings between them are a usual challenge. Addresses in both local and standardized alphabets and script systems using endonyms and exonyms are a problem, as told in the posts Where the Streets have Two Names and Where the Streets have one Name but Two Spellings.
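
The point that uniqueness depends on the chosen level of the address hierarchy can be shown with a small sketch. The level names follow the hierarchy listed above. The field names, the `location_key` helper and the sample address are hypothetical.

```python
# Levels of the textual address hierarchy, from coarse to fine.
LEVELS = ("country", "region", "city", "thoroughfare", "building_number", "unit")


def location_key(location: dict, level: str) -> tuple:
    """Build a comparison key from the top of the hierarchy down to the given level."""
    depth = LEVELS.index(level) + 1
    return tuple(str(location.get(name, "")).strip().lower() for name in LEVELS[:depth])


# Illustrative data only: two units in the same building.
flat_1 = {
    "country": "DK",
    "region": "Hovedstaden",
    "city": "Copenhagen",
    "thoroughfare": "Main Street",
    "building_number": "12",
    "unit": "1A",
}
flat_2 = dict(flat_1, unit="2B")

same_building = location_key(flat_1, "building_number") == location_key(flat_2, "building_number")
same_unit = location_key(flat_1, "unit") == location_key(flat_2, "unit")
```

The two records are duplicates if uniqueness is defined at building level, but distinct locations if it is defined at unit level, so the level has to be decided per use case.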
