Using External Data in Data Matching

One of the things that data quality tools do is data matching. Data matching is mostly related to the party master data domain. It is about comparing two or more data records that do not have exactly the same data but describe the same real-world entity.

A common approach is to compare data records in internal master data repositories within your organization. However, there are great advantages in bringing in external reference data sources to support the data matching.

The kinds of big reference data I have worked with for this purpose include:

Business directories:

The business-to-business (B2B) world does not have privacy issues to the degree we see in the business-to-consumer (B2C) world. Therefore there are many business directories out there with a quite complete picture of which business entities exist in a given country, and even in regions and the whole world.

A common approach is to first match your internal B2B records against a business directory and obtain a unique key for each business entity. The next step, matching business entities on that unique key, is a no-brainer.

The problem, though, is that an automatic match between internal B2B records and a business directory most often does not yield a 100 % hit rate. Not even close, as examined in the post 3 out of 10.
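The two-step approach can be sketched roughly as below. This is a minimal illustration, not how any particular directory service works: the `DIRECTORY` contents, the `D-001` style keys, the `LEGAL_FORMS` list and the naive name normalization are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical business directory: directory key -> registered name.
DIRECTORY = {
    "D-001": "Acme Industries Ltd",
    "D-002": "Northwind Traders Inc",
}

# Assumed list of legal-form words to ignore when comparing names.
LEGAL_FORMS = {"ltd", "limited", "inc", "incorporated", "co", "corp"}


def normalize(name):
    """Lowercase, replace punctuation with spaces and drop legal-form words."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(word for word in cleaned.split() if word not in LEGAL_FORMS)


# Index the directory on the normalized name for exact lookups.
INDEX = {normalize(name): key for key, name in DIRECTORY.items()}


def group_by_directory_key(records):
    """Step 1: assign a directory key. Step 2: group records sharing a key."""
    groups = defaultdict(list)
    unmatched = []
    for record_id, name in records:
        key = INDEX.get(normalize(name))
        if key is None:
            unmatched.append(record_id)
        else:
            groups[key].append(record_id)
    return dict(groups), unmatched


internal = [
    (1, "ACME Industries Limited"),
    (2, "Acme Industries, Ltd."),
    (3, "Northwind Traders"),
    (4, "Contoso Holdings"),
]
groups, unmatched = group_by_directory_key(internal)
hit_rate = 1 - len(unmatched) / len(internal)
print(groups, unmatched, hit_rate)
```

Records 1 and 2 end up under the same key and are thereby matched to each other, while record 4 finds no directory entry, which is the less-than-100 % hit rate in miniature.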

Address directories:

Address directories are mostly used in order to standardize postal address data, so that two addresses in internal master data that can be standardized to an address written in exactly the same way can be better matched.
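As a rough sketch of the idea, two differently written addresses can be brought to one standard form before they are compared. The `ABBREVIATIONS` table below is a made-up stand-in for a real address directory, which would do far more than swap a few words.

```python
import re

# Hypothetical abbreviation table standing in for a real address directory.
ABBREVIATIONS = {"street": "st", "road": "rd", "avenue": "ave", "apartment": "apt"}


def standardize_address(address):
    """Lowercase, drop punctuation and map known words to a standard form."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def same_address(a, b):
    """Compare two addresses after standardization."""
    return standardize_address(a) == standardize_address(b)


print(same_address("12 Main Street, Apt. 4", "12 MAIN ST APARTMENT 4"))
```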

A deeper use of address directories is to exploit related property data. The probability of two records with “John Smith” at the same address being a true positive match is much higher if the address is a single-family house as opposed to a high-rise building, nursing home or university campus.
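One way to picture this is to let the property type scale the confidence of a name-and-address match. The weights in `PROPERTY_WEIGHT` are purely illustrative assumptions, not figures from any property register or study.

```python
# Illustrative weights only; they are not taken from any real data source.
PROPERTY_WEIGHT = {
    "single_family": 0.95,
    "high_rise": 0.60,
    "nursing_home": 0.50,
    "campus": 0.40,
}

# Assumed fallback when the property type is unknown.
DEFAULT_WEIGHT = 0.70


def match_confidence(name_similarity, property_type):
    """Scale a name similarity (0..1) by how exclusive the address is."""
    return name_similarity * PROPERTY_WEIGHT.get(property_type, DEFAULT_WEIGHT)


# Two "John Smith" records at the same address, with identical names.
print(match_confidence(1.0, "single_family"))
print(match_confidence(1.0, "high_rise"))
```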

Relocation services:

A common cause of false negatives in data matching is that you have compared two records where one of the postal addresses is an old one.

Bringing in National Change of Address (NCOA) services for the countries in question will help a lot.
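A minimal sketch of how change-of-address data can remove such a false negative is shown below. The `MOVES` table is a hypothetical stand-in for what a relocation service might return; real NCOA services have their own formats and licensing rules.

```python
# Hypothetical change-of-address entries: (name, old address) -> new address.
MOVES = {
    ("john smith", "1 old road"): "9 new street",
}


def current_address(name, address):
    """Follow recorded moves until no newer address is known."""
    key_name = name.lower()
    addr = address.lower()
    seen = set()
    while (key_name, addr) in MOVES and addr not in seen:
        seen.add(addr)  # guard against circular move chains
        addr = MOVES[(key_name, addr)]
    return addr


def same_party(record_a, record_b):
    """Compare name and current address instead of the stored address."""
    (name_a, addr_a), (name_b, addr_b) = record_a, record_b
    return (
        name_a.lower() == name_b.lower()
        and current_address(name_a, addr_a) == current_address(name_b, addr_b)
    )


old = ("John Smith", "1 Old Road")
new = ("John Smith", "9 New Street")
print(old[1].lower() == new[1].lower())  # naive address comparison
print(same_party(old, new))              # comparison after applying moves
```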

The optimal way of doing that (and utilizing business and address directories) is to make it a continuous element of Master Data Management (MDM) as explored in the post The Relocation Event.


Where to put Master Data?

The core of most Master Data Management (MDM) solutions is a master data hub. MDM solutions such as those appearing in analyst reports revolve around a store for master data that is a new place, different from where master data usually lives, for example in CRM, SCM and ERP systems.

For large organizations with a complex IT landscape, having an MDM hub is usually the only sensible solution.

However, for many midsize and smaller organizations, and even for large organizations with a dominant ERP system, the choice is often to name one of the application databases as the main master data hub for a given master data domain, such as customer, supplier, product or whatever else is considered a master data entity.

In such cases you may apply things such as data quality services, as described in the post Lean MDM, and other master data related services, as told in the post Service Oriented MDM.

There are arguments for and against both approaches. Probably the most used argument against the MDM hub approach is the question of why you should solve the issue of having X data silos by creating data silo X + 1. The argument against naming a given application as the place for master data is that an application is built for a specific purpose and is therefore not well suited to other uses of master data.

Where do you put your master data? Why?


Service Oriented MDM

Much of the talking and doing related to Master Data Management (MDM) today revolves around the master data repository being the central data store for information about customers, suppliers and other parties, products, locations, assets and whatever else is regarded as a master data entity.

The difficulties in MDM implementations often arise because master data is born, maintained and consumed in a range of applications such as ERP systems, CRM solutions and heaps of specialized applications.

It would be nice if these applications were MDM aware. But usually they are not.

As discussed in the post Service Oriented Data Quality, the concepts of Service Oriented Architecture (SOA) make a lot of sense in deploying data quality tool capabilities that go beyond the classic batch cleansing approach.

In the same way, we also need SOA thinking when we have to make the master data repository do useful stuff all over the scattered application landscape that most organizations live with today and probably will in the future.

MDM functionality deployed as SOA components has a lot to offer, for example:

  •  Reuse is one of the core principles of SOA. Having the same master data quality rules applied to every entry point of the same sort of master data will help with consistency.
  •  Interoperability will make it possible to deploy prevention of master data quality issues as close to the root as possible.
  •  Composability makes it possible to combine functionality with different advantages, e.g. combining internal master data lookup with external reference data lookup.
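The composability point can be illustrated with a small sketch in which one lookup service is composed from an internal master data lookup and an external reference data lookup. The stores `INTERNAL_HUB` and `REFERENCE_DATA`, their keys and the `compose` helper are hypothetical names invented for this example.

```python
# Hypothetical in-memory stand-ins for an internal hub and external reference data.
INTERNAL_HUB = {"acme industries": {"party_id": "P-17"}}
REFERENCE_DATA = {
    "acme industries": {"directory_key": "D-001"},
    "northwind traders": {"directory_key": "D-002"},
}


def compose(*lookups):
    """Build one lookup service that tries each named source in order."""
    def service(name):
        for source, lookup in lookups:
            record = lookup(name)
            if record is not None:
                return {"source": source, **record}
        return None
    return service


# Internal master data first, external reference data as the fallback.
party_lookup = compose(
    ("internal", lambda name: INTERNAL_HUB.get(name.lower())),
    ("external", lambda name: REFERENCE_DATA.get(name.lower())),
)

print(party_lookup("Acme Industries"))
print(party_lookup("Northwind Traders"))
print(party_lookup("Contoso Holdings"))
```

Because every entry point would call the same composed service, the same rules apply wherever the data is entered, which ties back to the reuse point as well.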


External Events, MDM and Data Stewardship

Exploiting external data is an essential part of party master data management as told in the post Third-Party Data and MDM.

External data supports data quality improvement and the prevention of data quality issues in party master data by:

  • Ensuring accuracy of party master data entities, best at point of entry but sometimes also by later data enrichment
  • Exploring relationships between master data entities and thereby enhancing the completeness of party master data
  • Keeping up the timeliness of party master data by absorbing external events in master data repositories

External events around party master data are:

Updating with some of these events may be done automatically, while other events require manual intervention.

Right now I’m working with data stewardship functionality in the instant Data Quality MDM Edition, where the relocation event, the deceased event and other important events in party master data life-cycle management are supported as part of an MDM service.
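The split between automatic updates and manual intervention might be sketched as a simple routing step. Which event types are safe to apply automatically is a policy decision; the `AUTO_APPLY` set and the event names used here are assumptions for illustration only.

```python
from dataclasses import dataclass, field

# Assumption: the set of event kinds applied without review is a policy choice.
AUTO_APPLY = {"relocation"}


@dataclass
class ExternalEvent:
    party_id: str
    kind: str
    payload: dict = field(default_factory=dict)


def route(events):
    """Split events into those applied automatically and those sent to a data steward."""
    applied, review_queue = [], []
    for event in events:
        if event.kind in AUTO_APPLY:
            applied.append(event)
        else:
            review_queue.append(event)
    return applied, review_queue


events = [
    ExternalEvent("P-17", "relocation", {"new_address": "9 New Street"}),
    ExternalEvent("P-42", "deceased"),
]
applied, review_queue = route(events)
print([e.party_id for e in applied], [e.party_id for e in review_queue])
```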


Winning by Sharing Data

When I changed my laptop a few months ago, it was the easiest migration to a new computer ever.

Basically, I just had to connect to all the cloud services I had been using before. For many services the path was to get connected to Google+, Twitter and Facebook and then connect to many other services via these connections.

This was a personal win.

Most of the teams I am working with are sharing their data with me in the cloud. Unlike in the bad old days, I do not have to call and ask for progress on this and that. I can check the status myself and even get notifications on my phablet when a colleague completes a task.

This is a shared win.

Within my profession, which is data quality improvement and Master Data Management (MDM), sharing data is going to be a winning path too, as told in the post Sharing is the Future of MDM.

There are several ways of sharing master data like using commercial third party data, digging into open government data, having your own data locker and relying on social collaboration. These options are examined in the post Ways of Sharing Master Data.


Omni-purpose MDM

The terms omni-channel banking and omni-channel retailing are becoming popular within businesses these days.

In this context omni (meaning all) is considered to be something more advanced than multi (meaning many) as in multi-channel retailing.

Data management, including Master Data Management (MDM), is always a bit behind the newest business trends. In our discipline we have hardly even entered the multi stage yet.

Some moons ago I wrote about multi-channel data matching on the Informatica Perspectives blog in the post Five Future Data Matching Trends. Today, on the same blog, Stephan Zoder has the post asking: Is your social media investment hampered by your “data poverty”?

Herein Stephan examines the possible benefits of multi-channel data matching based on a business case within the gambling industry.

Using omni in relation to MDM was seen in a vendor presentation at the Gartner MDM Summit in London last week as reported in the post Slicing the MDM Space. Omnidomain MDM was the proposed term here.

The end goal should probably be something that could be coined as omni-purpose MDM. This will be about advancing MDM capabilities to cover multiple domains and embrace multiple channels in order to obtain a single view of every core entity that can be used in every business process.


The Intersections of Big Data, Data Quality and Master Data Management

This blog has, since 2009, been very much about the intersection between Master Data Management (MDM) and data quality. These two disciplines are closely related, as the vast majority of data quality improvement work is related to master data, taking slightly different forms depending on whether we are fighting with party master data, product master data, location master data or other master data domains.

In mid-2011 the term big data became more popular than data quality, as reported in the post Data Quality vs Big Data. After the initial euphoria about big data and the focus on its analytical side, the question of big data quality has fortunately gained traction. Apart from the quality of the algorithms used in big data analytics, the quality of the big data itself is definitely a factor to be taken very seriously when deciding to act on the outcomes of big data analytics.

There are questions about the quality of the big data itself as for example told in the post Crap, Damned Crap, and Big Data. This story is about social data and how crappy these data streams may be. Another prominent flavor of big data is sensor data where there also may be issues of data quality as in the example mentioned in the post Going in the Wrong Direction.

As examined in the latter example the quality of big data will in many cases have to be measured by how well big data relates to internal master data and external reference data. You may find more examples of that in the post Big Data and Multi-Domain Master Data Management.


Slicing the MDM Space

These days I am attending the Gartner MDM summit in London.

MDM (Master Data Management) initiatives and MDM solutions are not created equal and different ways of slicing the MDM world were put forward on the first day.

Gartner is famous for the magic quadrants and during the customer master data quadrant presentation I heard Bill O’Kane explain why this is a separate quadrant from the product master data quadrant and why there are no challengers and no visionaries.

In another session about MDM milestones, Bill O’Kane sliced the MDM world a bit differently for that context, based on moving between MDM styles. Here we had:

  • Business-to-consumer (B2C) Customer Data Integration (CDI)
  • Business-to-business (B2B) customer MDM, Product Information Management (PIM) and other domains.

The vendors in general seem to want to do everything MDM.

Stibo Systems, a traditional PIM vendor, presented the case for multidomain MDM based on how things have developed within eCommerce. Stibo even smuggled the term omnidomain MDM into the slides. A marketing gig in the making perhaps.

The megavendors have bought whoever they needed in order to be multidomain.

Some new solutions are born in the multidomain age. Semarchy is an interesting example, as they do so the evolutionary way.


Data Entry by Employees

A recent infographic prepared by Trillium Software highlights a fact about data quality I personally have been preaching about a lot:

[Image: Trillium Software infographic, 75 percent]

This number is (roughly) sourced from a study by Wayne W. Eckerson of The Data Warehouse Institute made in 2002:

[Image: TDWI study, 76 percent]

So, in the fight against bad data quality, a good place to start will be helping data entry personnel do it right the first time.

One way of achieving that is to cut down on the data being entered. This may be done by picking the data from sources already available out there instead of retyping things and making those annoying flaws.

If we look at the two most prominent master data domains, some ideas will be:

  • In the product domain I have seen my share of product descriptions and specifications being reentered when flowing down the supply chain of manufacturers, distributors, re-sellers, retailers and end users. Better batch interfaces with data quality controls are one way of coping with that. Social collaboration is another one, as told in the post Social PIM.
  • In the customer, or rather party, domain we have seen an uptake of using address validation. That is good. However, it is not good enough as discussed in the post Beyond Address Validation.


Attending a MDM Summit

Going to MDM (Master Data Management) conferences is a great learning experience.

If we look at world-wide conferences there are two series of conferences going on every year:

  • The Master Data Management Summit series led by the MDM Institute, which is Aaron Zornes
  • The Master Data Management summit series organized by Gartner (the analyst firm)

Both of these traveling events are coming to London this spring. First up is the Gartner event on the 12th and 13th of March. As I have been to the Zornes show several times before, I am looking forward to being at the more expensive Gartner performance this year.

The learning actually starts when you are looking at company names on the attendee list. Some master data issues are showcased here:

There will be people from these three well-known British supermarkets:

[Image: attendee list excerpt 1]

The good folks at Kühne + Nagel (AG & Co.) KG are having a hard time putting their proper name in there:

[Image: attendee list excerpt 2]

And what a timely name for this Swiss company:

[Image: attendee list excerpt 3]
