What to do in 2012

The time between Christmas and New Year is a good time to think about whether you are going to do the right things next year. In doing so, you will have to look back at the current year and see how you can develop from there.

In my professional life as a data quality and master data management practitioner, my 2011 to-do list included these three main activities:

  • Working with Multi-Domain Master Data Quality
  • Exploiting rich external reference data sources in the cloud
  • Doing downstream data cleansing

In a press release from May 2011 titled “Gartner Highlights Three Trends That Will Shape the Master Data Management Market”, the analyst firm names these trends:

  • Growing Demand for Multidomain MDM Software
  • Rising Adoption of MDM in the Cloud
  • Increasing Links Between MDM and Social Networks

It looks like I was working in the right space for the first two things but stayed in the past regarding the third activity, downstream data cleansing.

The third thing to embrace in the future, which we may call social MDM, has been an area of interest for me for the last couple of years. In fact, some downstream data cleansing projects have touched on making master data useful for including social media networks in the loop.

I’m not sure if 2012 will be a breakthrough for social MDM, but I think there will be some exciting opportunities out there for paving the road for social MDM.


Party, Product, Place. Period.

In a recent post here on this blog, the Master Data Management domain usually called locations was examined, and the post was followed by excellent comments.

Also in the DAMA International LinkedIn group there was a great discussion around the location domain.

The comments touched two subjects:

  • Are locations just geographic locations, or can we deal with “digital locations” such as email addresses, phone numbers, websites, go-to-meeting IDs, social network IDs and so on as locations as well?
  • How do we model the relations between parties, products and locations?

Sometimes I like to use the word places instead of locations as we then have a P-trinity of parties, products and places.

I’m not sure if places have a stronger semantic link to geography than locations have. Anyway, my thoughts on the location domain were mainly connected to geography. In my eyes, the digital locations mentioned are more related to parties and not so much to products. The same is true for another good old substitute for a location or address, the mailbox (like “Postbox 1234”), which is a valid notion for the destination of a letter or small package and is often seen in database columns otherwise filled with geographic locations.

So, sticking to places being physical, geographic locations: How do we model parties, products and places?

First of all it’s important that we are able to model different concepts within each domain in one single way. A very common situation in many enterprise data landscapes is that different forms of parties exist with different models, like a model for customers, a model for suppliers, a model for employees and other models for other business partner roles.

The association between a party entity and a location entity is in most cases a time-dependent relation, for example: this consumer was billed at this address in this period. The relation between the party and the product is the good old basic data model: we invoiced this product on that date. The product and place relation is very industry specific. One example is that an on-site service contract applies to this address in this period.

Time, often handled as a period, will indeed add a fourth P to the P-trinity of party, product and place.
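As a rough illustration of the time-dependent relation between a party and a place described above, here is a minimal Python sketch. The class names (`Period`, `PartyPlaceRelation`), the `role` field and the helper `places_for` are hypothetical names chosen for this example; they are not taken from any particular MDM product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Period:
    """A validity period; a missing end date means the relation is still current."""
    start: date
    end: Optional[date] = None

    def covers(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


@dataclass(frozen=True)
class PartyPlaceRelation:
    """Hypothetical link: a party used a place in a given role during a period."""
    party_id: str
    place_id: str
    role: str  # for example "billing address"
    period: Period


def places_for(relations: Iterable[PartyPlaceRelation], party_id: str,
               role: str, day: date) -> List[str]:
    """Return the places linked to the party in the given role on the given day."""
    return [r.place_id for r in relations
            if r.party_id == party_id and r.role == role and r.period.covers(day)]


# Made-up data: the same consumer was billed at two addresses in two periods.
relations = [
    PartyPlaceRelation("party-1", "place-A", "billing address",
                       Period(date(2010, 1, 1), date(2011, 6, 30))),
    PartyPlaceRelation("party-1", "place-B", "billing address",
                       Period(date(2011, 7, 1))),
]
```

The same pattern, with a period on the relation rather than on the entities, could be reused for the product and place relation, such as a service contract applying to an address in a period.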


The Location Domain

When talking master data management we usually divide the discipline into domains, where the two most prominent domains are:

  • Customer, or rather party, master data management
  • Product, sometimes also named “things”, master data management

One of the most frequently mentioned additional domains is locations.

But although locations are all around, we seldom see a business initiative aimed at enterprise-wide location data management under a slogan of having a 360-degree view of locations. Most often locations are seen as a subset of either the party master data or, in some cases, the product master data.

Industry diversity

The need for having locations as a focus area varies between industries.

In some industries like public transit, where I have been working a lot, locations are implicit in the delivered services. Travel and hospitality is another example of a tight connection between the product and a location. Also some insurance products have a location element. And do I have to mention real estate: Location, Location, Location.

In other industries the location has a more moderate relation to the product domain. There may be some considerations around plant and warehouse locations, but that’s usually not high volume and complex stuff.  

Locations as a main factor in exploiting demographic stereotypes are important in retailing and other business-to-consumer (B2C) activities. When doing B2C you often want to see your customer as the household where the location is a main, but treacherous, factor in doing so. We had a discussion on the house-holding dilemma in the LinkedIn Data Matching group recently.

Whenever you, or a partner of yours, are delivering physical goods or a physical letter of any kind to a customer, it’s crucial to have high quality location master data. The impact of not having that is of course dependent on the volume of deliveries.   

Globalization

If you ask me about London, I will instinctively think about the London in England. But there is a pretty big London in Canada too, that would be top of mind to other people. And there are other smaller Londons around the world.

Master data with location attributes increasingly comes in populations covering more than one country. It’s not that ambiguous place names don’t exist in single-country sets; ambiguous place names were the main reason why many countries have a postal code system. However, the British and the Canadians invented systems that include letters, as opposed to most other systems, which have only numbers, typically with an embedded geographic hierarchy.
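To make the point about differing postal code systems concrete, here is a small Python sketch that checks the shape of a postal code per country. The patterns are deliberately simplified assumptions for illustration (real postal rules have more exceptions), and the function name `looks_like_postal_code` is made up for this example.

```python
import re

# Simplified, illustrative patterns only; they check the shape of a code,
# not whether the code actually exists.
POSTAL_CODE_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),                    # digits only
    "DK": re.compile(r"\d{4}"),                             # digits only
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),           # letters and digits
    "GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),  # letters and digits
}


def looks_like_postal_code(country_code: str, postal_code: str) -> bool:
    """Return True if the postal code has the expected shape for the country."""
    pattern = POSTAL_CODE_PATTERNS.get(country_code)
    if pattern is None:
        raise ValueError(f"No pattern defined for country {country_code!r}")
    return pattern.fullmatch(postal_code.strip().upper()) is not None
```

A single-country rule such as "a postal code is five digits" breaks as soon as the data set spans several countries, which is why the check above needs the country code as input.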

Apart from the different standards used, the possibilities for exploiting external reference data vary widely between countries with respect to data quality dimensions such as timeliness, consistency, completeness and conformity, and with respect to price.

Handling location data from many countries at the same time ruins many best practices that have worked for handling location data for a single country.

Geocoding

Instead of identifying locations in a textual way by having country codes, state/province abbreviations, postal codes and/or city names, street names and types or blocks and house numbers and names, it has become increasingly popular to use geocoding as a supplement or even an alternative.

There are different types of geocodes out there suitable for different purposes. Examples are:

  • Latitude and longitude picturing a round world
  • UTM X, Y coordinates picturing peels of the world
  • Web Mercator X, Y coordinates (built on the WGS84 datum) picturing a world as flat as your computer screen

While geocoding has a lot to offer in identification and global standardization, we of course have a gap between geocodes and everyday language. If you want to learn more, then come and visit me at N55°38′47″, E12°32′58″.
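As a small worked example of the gap between notations, the coordinates quoted above, read as 55 degrees 38 minutes 47 seconds north and 12 degrees 32 minutes 58 seconds east, can be converted to the decimal degrees most geocoding services use. This is a minimal sketch; the function name `dms_to_decimal` is an assumption made for this example.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees, minutes and seconds to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"Unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in the usual decimal convention.
    return -value if hemisphere in ("S", "W") else value


latitude = dms_to_decimal(55, 38, 47, "N")
longitude = dms_to_decimal(12, 32, 58, "E")
print(round(latitude, 6), round(longitude, 6))
```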


The Database versus the Hub

In the LinkedIn Multi-Domain MDM group we have an ongoing discussion about why you need a master data hub when you already have some workflow, a UI and a database.

I have been involved in several master data quality improvement programs without having the opportunity of storing the results in a genuine MDM solution, for example as described in the post Lean MDM. And of course this may very well result in a success story.

However, there are some architectural reasons why many more organizations than those using an MDM hub today may sooner or later find benefits in having a master data hub.

Hierarchical Completeness

If we start with product master data, the main issue with storing it is the diversity in the requirements for which attributes are needed and when they are needed, depending on the categorization of the products involved.

Typically you will have hundreds or thousands of different attributes, where some are crucial for one kind of product and absolutely ridiculous for another kind of product.

Modeling a single product table with thousands of attributes is not a good database practice, and pre-modeling tables for each conceivable categorization is very inflexible.

Setting up mandatory fields at database level for product master data tables is asking for data quality issues, as you can’t avoid either over-killing or under-killing.

Also, product master data entities are seldom created in one single insertion, but are inserted and updated by several different employees, each responsible for a set of attributes, until the entity is ready to be approved as a whole.

A master data hub, not least one born in the product domain, is built for those realities.

The party domain has hierarchical issues too. One example is whether a state/province is mandatory on an address, which depends on the country in question.
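The idea that completeness depends on context, rather than on one fixed set of mandatory columns, can be sketched in Python as follows. The rule tables, the attribute names and the list of countries requiring a state/province are illustrative assumptions, not a reference to how any specific hub stores its rules.

```python
# Hypothetical rule tables: which attributes are mandatory depends on context.
REQUIRED_BY_PRODUCT_CATEGORY = {
    "television": {"screen_size_inches", "resolution"},
    "shirt": {"size", "color", "fabric"},
}
COUNTRIES_REQUIRING_STATE = {"US", "CA", "AU"}  # illustrative, not exhaustive


def missing_product_attributes(product: dict) -> set:
    """Return the attributes still missing for the product's own category."""
    required = REQUIRED_BY_PRODUCT_CATEGORY.get(product.get("category"), set())
    return {name for name in required if not product.get(name)}


def missing_address_fields(address: dict) -> set:
    """Return missing address fields; state is only required for some countries."""
    required = {"country", "city"}
    if address.get("country") in COUNTRIES_REQUIRING_STATE:
        required.add("state")
    return {name for name in required if not address.get(name)}
```

Because the check reports what is missing instead of rejecting the record, a product can be saved half-filled by one employee and completed by another before it is approved as a whole.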

Single Business Partner View

I like the term “single business partner view” as a higher vision for the more common “single customer view”, as we have the same architectural requirements for supplier master data, employee master data and other master data concerning business partners as we have for the of course extremely important customer master data.

The uniqueness dimension of data quality has a really hard time in common database managers. Having duplicate customer, supplier and employee master data records is the most frequent data quality issue around.

In this sense, a duplicate party is not a record with exactly the same fields filled and with exactly the same values spelled exactly the same way, as a database will see it. A duplicate is one record reflecting the same real world entity as another record, and a duplicate group is several records reflecting the same real world entity.

Even though some database managers have fuzzy capabilities, they are still very inadequate in finding these duplicates based on several attributes at one time and, not least, in finding duplicate groups.

Finding duplicates when inserting supposedly new entities into your customer list and other party master data containers is only the first challenge concerning uniqueness. Next you have to solve the so-called survivorship questions, that is, which values will survive the unavoidable differences.
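The notions of matching on several attributes, duplicate groups and survivorship can be sketched in a few lines of Python. This is a deliberately naive illustration under stated assumptions: records count as a match when at least two of three normalized attributes agree (a stand-in for real fuzzy matching), groups are formed transitively, and the survivorship rule "keep the longest value" is a made-up example rule. All function names are hypothetical.

```python
from collections import defaultdict

ATTRIBUTES = ("name", "address", "email")


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join((value or "").lower().split())


def is_match(a, b, minimum_agreements=2):
    """Assumed rule: two records match if enough non-empty attributes agree."""
    agreements = sum(
        1 for attr in ATTRIBUTES
        if normalize(a.get(attr)) and normalize(a.get(attr)) == normalize(b.get(attr))
    )
    return agreements >= minimum_agreements


def duplicate_groups(records):
    """Group record ids transitively: if A matches B and B matches C, all three group."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if is_match(a, b):
                parent[find(a["id"])] = find(b["id"])

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(sorted(group) for group in groups.values())


def survive(group_records):
    """Assumed survivorship rule: per attribute, keep the longest trimmed value."""
    return {attr: max(((r.get(attr) or "").strip() for r in group_records), key=len)
            for attr in ATTRIBUTES}


records = [
    {"id": 1, "name": "John Smith", "address": "12 Main Street", "email": "john@example.com"},
    {"id": 2, "name": "JOHN SMITH ", "address": "12 Main St.", "email": "john@example.com"},
    {"id": 3, "name": "J. Smith", "address": "12 Main St.", "email": "john@example.com"},
    {"id": 4, "name": "Mary Jones", "address": "5 High Road", "email": "mary@example.com"},
]
```

Here records 1 and 3 only agree on the email, so they do not match directly; they end up in the same duplicate group because both match record 2. This is the kind of result a plain database uniqueness constraint does not give you.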

Finally, the results to be stored may take several forms. Maybe a new insertion must be split into two entities belonging to two different hierarchy levels in your party master data universe.

A master data hub will have the capabilities to solve this complexity, some for customer master data only, some also for supplier master data combined with similar challenges with product master data and eventually also other party master data.

Domain Real World Awareness

Building hierarchies, filling incomplete attributes, consolidating duplicates and other forms of real world alignment are most often accomplished by including external reference data.

There are many sources available for party master data, such as address directories, business directories and citizen information, depending on the countries in question.

With product master data, global data synchronization involving common product identifiers and product classifications is becoming very important when doing business the lean way.

Master data hubs know these sources of external reference data so you, once again, don’t have to reinvent the wheel.


Mutating Platforms or Intelligent Design

How do we go from single-domain master data management to multi-domain master data management? Will it be through evolution of single-domain solutions or will it require a completely new intelligent design?

The MDM journey

My previous blog post was a book review of “Master Data Management in Practice” by Dalton Cervo and Mark Allen. The full title of the book is in fact “Master Data Management in Practice: Achieving True Customer MDM”.

The customer domain has until now been the most frequent and proven domain for master data management and, as said in the book, the domain where most organizations start the MDM journey, in particular by doing what is usually called Customer Data Integration (CDI).

However, some organizations do start with Product Information Management (PIM). This is mainly due to the magic numbers: some organizations have a higher number of products than customers in the database.

Sooner or later most organizations will continue the MDM journey by embracing more domains.

Achieving Multi-Domain MDM

John Owens made a blog post yesterday called “Data Quality: Dead Crows Kill Customers! Dead Crows also Kill Suppliers!” The post explains how some data structures are similar between sales and purchasing. For example a customer and a supplier are very similar as a party.

Customer Data Integration (CDI) has a central entity being the customer, which is a party. Product Information Management (PIM) has an important entity being a supplier, which is a party. The data structures and the workflows needed to Create, Read, Update and perhaps Delete these entities are very similar, not least in business-to-business (B2B) environments.

So, when you are going from PIM to CDI, you don’t have to start from scratch, not least in a B2B environment.

The trend in the master data management technology market is that many vendors are working their way from being a single domain vendor to being a multi-domain vendor – and some are promoting their new intelligent design embracing all domains from day one.

Some other vendors are breeding several platforms (often based on acquisition) from different domains into one brand, and some vendors are developing from a single domain into new domains.

Each strategy has its pros and cons. It seems there will be plenty of philosophies to choose from when organizations are going to select the platform(s) to support the multi-domain MDM journey.


Psychographic Data Quality

I have just read an article on Mashable by Jamie Beckland called The End of Demographics: How Marketers Are Going Deeper With Personal Data.

The article explains how new sources of available data make it possible for marketers to get a much closer look at potential customers and thereby go from delivering a broad message to a huge crowd to delivering a very targeted message to a small group of people with a high probability of getting a response. In short: Marketers are going from demographic marketing to psychographic marketing.

I believe this is true and ongoing (as I have also been involved in such activities).

The data quality issues we have always known in direct marketing are surely very similar in the psychographic marketing going on in the social media realm and in connection with eBusiness.

In my eyes, the concept of a single customer view is also a key to getting success in psychographic marketing.  

You are not delivering a targeted message if you are delivering two different messages to two user profiles belonging to the same real world individual.

Your message will be very frustrating if you treat someone as a prospective customer when that someone is already an existing customer, perhaps in another channel.

The effectiveness of psychographic marketing depends on a match between the psychographic variables, the behavioral variables and the demographic variables. As seen in the example in the Mashable article, a good old thing like geocoding will be needed here.

An exciting thing in the rise of psychographic marketing is that it will add to the trend in data quality technology where it’s about much more than simple name and address cleansing and deduplication. Rich location data will, despite the virtual playground, become even more important. The relations between customers and products, as described in the post Customer Product Matrix Management, will be further refined in psychographic marketing.


B2C versus B2B Data Quality

The data quality issues in doing business with private consumers (business-to-consumer = B2C) and doing business with other businesses (business-to-business = B2B) have a lot of similar challenges but also differ in a lot of ways.

Some of my experiences (and thoughts) related to different master data domains are:

Customer master data

In B2C the number of customers, prospects and leads is usually high and characterized by relatively few interactions with each entity.  In B2B you usually have a relatively small number of customers with a high number of interactions.

One of the most automated activities in data quality improvement is matching master data records with information about customers. Many of the examples we see in marketing material, research documents, blog posts and so on are about matching in the B2C realm. This is natural, since the high number of records, typically with a low attached value, calls for automation.

Data matching in the B2B realm is indeed more complex due to numerous challenges like less standardized names of companies and typically more options in what constitutes a single customer. The high value attached to each customer also makes the risk of mistakes a showstopper for too much automation.

So in B2B we see an increasing adoption of workflows that ensure data quality during data capture, often by exploiting external reference data, which in general is also more available for business entities.

Location master data

The location of B2C customers means a lot. Accurate and timely delivery addresses for everything from direct mails to bringing goods to the premises are essential. Location data are used to recognize household relations, assign demographic stereotypes and in many cases calculate fees of different kinds. I had a near disaster experience with a really bad address in my early career.

Even though location data for B2B activities theoretically is just as important, I have often seen that a little less precision is fit for purpose or anyway lower prioritized than more pressing issues.

Product master data

Theoretically there should be no difference between B2C and B2B here, but I guess there is in practice?

The most interesting aspect is probably the multi-domain aspect examining the relations between customers and products.   

I had some experiences some years ago with the B2B realm as described in the post What is Multi-Domain MDM?: 1,000 B2B customers buying 1,000 different finished products can be a quite complicated data quality operation.

Within the B2C realm the most predominant multi-domain data quality issues I have met are related to analytics. As discussed in the post Customer/Product Matrix Management, it is about typifying your customers correctly and categorizing your products adequately at the same time.


Non-Obvious Entity Relationship Awareness

In a recent post here on this blog it was discussed: What is Identity Resolution?

One angle was the interchangeable use of the terms “Identity Resolution” and “Entity Resolution”. There are three ways to see this: the terms are truly interchangeable; “Identity Resolution” is more advanced than “Entity Resolution”; or (my suggestion) “Identity Resolution” relates mainly to party master data, while “Entity Resolution” can be about all master data domains, such as parties, locations and products.

Another term sometimes used in this realm is “Non-Obvious Relationship Awareness”. This term, too, relates mainly to finding relationships between parties, for example individuals at a casino who seem to do better than the croupiers. There is a (rather old) O’Reilly Radar post on Non-Obvious Relationship Awareness.

Going Multi-Domain

So “Non-Obvious Entity Relationship Awareness” could be about finding these hidden relationships in a multi-domain master data scope.

An example could be non-obvious relationships in a customer/product matrix.

The data supporting this discovery will actually not be found in the master data itself, but in transaction data probably being in an Enterprise Data Warehouse (EDW). But a multi-domain master data management platform will be needed to support the complex hierarchies and categorizations needed to make the discovery.   

One technical aspect of discovering such non-obvious relationships is how chains of keys are stored in the multi-domain master data hub.

Customer Master Data

The transactions, or sums thereof, in the data warehouse will have keys referencing customer accounts. These accounts can be stored in staging areas in the master data hub with references to a golden record for each individual or company in the real world. Depending on the identity resolution available, the golden records will have golden relations to each other, forming hierarchies of households, company family trees, contacts within companies and their movements between companies and so on.
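A chain of keys from transaction to account to golden record to household can be sketched as a series of lookups. The identifiers, the mapping names and the helper `totals_along_chain` below are all hypothetical and only serve to illustrate the principle.

```python
from collections import defaultdict

# Hypothetical key chains as they might be held in a master data hub.
ACCOUNT_TO_GOLDEN_RECORD = {"ACC-1": "GOLD-100", "ACC-2": "GOLD-100", "ACC-3": "GOLD-200"}
GOLDEN_RECORD_TO_HOUSEHOLD = {"GOLD-100": "HH-1", "GOLD-200": "HH-1"}


def totals_along_chain(transactions, *key_maps):
    """Sum transaction amounts after following the account key through each mapping."""
    totals = defaultdict(float)
    for account_id, amount in transactions:
        key = account_id
        for key_map in key_maps:
            key = key_map[key]  # follow one link in the chain of keys
        totals[key] += amount
    return dict(totals)


# Made-up warehouse facts: (account key, amount).
transactions = [("ACC-1", 100.0), ("ACC-2", 50.0), ("ACC-3", 25.0)]

per_individual = totals_along_chain(transactions, ACCOUNT_TO_GOLDEN_RECORD)
per_household = totals_along_chain(
    transactions, ACCOUNT_TO_GOLDEN_RECORD, GOLDEN_RECORD_TO_HOUSEHOLD)
```

Seen per account, the three transactions look unrelated; following the chain shows that two accounts belong to one individual and that all three belong to one household.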

My guess as described in the post Who is working where doing what? is that this will increasingly include social media data.

Product Master Data

Some of the same transactions, or sums thereof, in the data warehouse will have keys referencing products. These products will exist in the master data hub as members of various hierarchies with different categorizations.

My guess is that future developments in this field will further embrace not just your own products but also competitor products and market data available in the cloud all attached to your hierarchies and categorizations.   


Multi-Commerce Data Quality

A month ago I wrote about Multi-Channel Data Quality. Multi-Commerce and the related data quality is pretty much another term covering the same challenges: although we talk a lot today about eCommerce, that is, doing business online, we still have a lot of business going on offline. So we have challenges with online data quality, offline data quality and, not least, a single view of online/offline data quality.

According to the Gartner Hype Cycle there is such a thing as Multicommerce Master Data Management. This discipline has just passed the expectation peak but will, according to Gartner, be absorbed by Multidomain Master Data Management on the descent before climbing up again towards enlightenment and productivity.

As data quality and master data management are best friends I find it very likely that Multi-Commerce Data Quality will be all about Multi-Domain Master Data Management, including:

  • Having a single business partner view (that includes single customer view) encompassing all online and offline activities
  • Having a unified way of maintaining and exposing product data online and offline
  • Having the means for doing content management (that includes unstructured data) embracing online presentation as well as offline distribution.    

I also see Multi-Domain Master Data Management as not only doing master data management for several data domains at the same time (with the same software brand), but also exploring the intersections between the different domains.

If you for example look at a customer/product matrix you may add a third dimension being a channel where we examine the relations between a customer type, a product type/attribute and a given channel, thus having a 3D picture of doing business in a multi-commerce environment.
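The customer/product/channel picture described above can be sketched as a simple count keyed by three dimensions. The sample rows and the helper `slice_by_channel` are made up for illustration.

```python
from collections import Counter

# Hypothetical transaction facts: (customer type, product type, channel).
sales = [
    ("household", "toys", "online"),
    ("household", "toys", "store"),
    ("household", "toys", "online"),
    ("small business", "office supplies", "online"),
]

# The "3D picture": one cell per combination of the three dimensions.
cube = Counter(sales)


def slice_by_channel(cube, channel):
    """Return the customer/product matrix for a single channel."""
    return {(customer, product): count
            for (customer, product, ch), count in cube.items()
            if ch == channel}
```

Slicing the cube by channel gives back an ordinary customer/product matrix, one per channel, which can then be compared across online and offline.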

If you are interested in Multi-Domain Master Data Management, including how Multi-Commerce Master Data Management and related data quality are developing right now, then please join the LinkedIn group for Multi-Domain MDM.


Single Business Partner View

If you search Google for “single customer view” you’ll get over 20,000 hits. If you search for “single business partner view” you’ll get zero, at least until I posted this blog post.

Some time ago I wrote about getting a 360° Business Partner View, elaborating on extending the 360° Customer View or Single Customer View (SCV) to embrace all sorts of party master data managed within the organization.

In fact, there are at least as many shared techniques between

  • managing supplier master data and business-to-business (B2B) customer master data

as there is between

  • managing business-to-business (B2B) customer master data and business-to-consumer (B2C) customer master data.

If you look at Customer Relationship Management (CRM) systems, almost every package is aimed at managing B2B data, as the data model and the functionality support real world B2B structures and how the sales force and other employees interact with B2B customers and prospects.

Interacting with B2C customers and prospects is much more diverse and often supported by operational systems specialized for the industry in question like solutions for financial services, healthcare and so on.

A business partner is a party acting in a role such as customer, prospect, supplier, reseller, distributor, agent or another form of partnership. Sometimes the same party is acting in several roles at the same time, thus potentially being on both the sell side and the buy side of master data quality management.
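One way to picture a single business partner view is a party record that carries a set of roles, instead of separate customer and supplier tables. In this minimal Python sketch, the `Party` class, the helper `on_both_sides` and the split of roles into sell side and buy side are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set

# Assumed classification of roles; a real organization may draw the line differently.
SELL_SIDE_ROLES = {"customer", "prospect", "reseller", "distributor"}
BUY_SIDE_ROLES = {"supplier"}


@dataclass
class Party:
    """One real world party, stored once, acting in any number of roles."""
    party_id: str
    name: str
    roles: Set[str] = field(default_factory=set)


def on_both_sides(parties: Iterable[Party]) -> List[str]:
    """Return ids of parties that act on both the sell side and the buy side."""
    return [p.party_id for p in parties
            if p.roles & SELL_SIDE_ROLES and p.roles & BUY_SIDE_ROLES]


parties = [
    Party("P-1", "Acme Ltd", {"customer", "supplier"}),
    Party("P-2", "Bolt Inc", {"prospect"}),
    Party("P-3", "Crate Co", {"supplier"}),
]
```

With separate customer and supplier tables, finding the first party above would itself require duplicate detection across the two tables.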

Just as the sell side and the buy side have intersections within party master data, in some industries we may also go deeper into identity resolution and find intersections between B2B entities and B2C entities. I’ve described these matters in the post So, how about SOHO homes. The business case is that some products in some industries are aimed at the households of business owners and the small businesses at the same time. This is for example true for industries such as banking, insurance, telco, real estate and law.

All in all achieving a single view of business partners is a task going beyond traditional customer data integration (CDI) and stretching into areas traditionally belonging to Product Information Management (PIM). This is a business case for multi-domain master data management.
