Fit for repurposing

23rd February 2012

Reading a blog post by David Loshin called Data Governance and Quality: Data Reuse vs. Data Repurposing I was, perhaps a bit off topic, inspired to pose the question about if data are of high quality if they are:

  • Fit for the purpose of use
  • Fit for repurposing

The first definition has been around for many years and has been adapted by many data quality practitioners. I have however often encountered situations where the reuse of data for other purposes than the original purpose has raised data quality issues with else cleared data. One of my first pieces on my own blog discussed that challenge in a post called Fit for what purpose?

Not at least within master data management where data are maintained for multiple uses, this problem is very common.

Data in a master data hub may either:

  • Be entered directly into the hub where multiple uses is handled
  • Be loaded from other sources where data capture was done

In the latter case the data governance necessary to ensure fitness for multiple uses must stretch to the ingestion in these sources.

Now, if repurposing is seen as a future not yet discovered purpose of use, what can you then do to ensure that data today are fit for future repurposing?

The only answer is probably real world alignment as discussed here on a page called Data Quality 3.0. Make sure your data are reflecting the real world as close as we can when captured and make sure data can be maintained in order to keep that alignment. And make sure this is done and facilitated where data are entered.

Bookmark and Share


Sharing Social Master Data

21st February 2012

If a company runs a Customer Relationship Management (CRM) system all employees are supposed to enter their interactions with customers and prospects including adding new accounts and contacts if it’s the first engagement.

With the rise of social networks first engagements are increasingly done in those networks. Furthermore new employees often bring old contacts from former employments with them thus utilizing an established relationship that probably is manifested in one or more already existing social network connections.

As explained in the post Social Master Data Management the term ”Social CRM” has been around for a while. We now see CRM solutions where the account and contact master data primarily is build on extracting those data from social networks.

I have just tried out such a solution called Nimble.

If you are more than a one-man-band company it’s interesting in what degree you are willing (or forced) to share your connections as master data entities for the CRM solution.

In Nimble you have the choice of differentiate for each network. I would probably freely choose a setup with Twitter and LinkedIn as shared with the team, but Facebook as private:

But that is just how I think based on my way of using social networks.

There is a fundamental data quality versus privacy issue around utilizing employee’s social network connections as master data for CRM and eventually enterprise wide Master Data Management (MDM).

All things equal data quality will be best if everyone contributes within reason. Not at least in sales, but also more or less in other functions, you are hired also because of your relations.

What do you think?

Bookmark and Share


Informatics for adding value to information

20th February 2012

Recently the Global Agenda Council on Emerging Technologies within the World Economic Forum has made a list of the top 10 emerging technologies for 2012. According to this list the technology with the greatest potential to provide solutions to global challenges is informatics for adding value to information.

As said in the summary: “The quantity of information now available to individuals and organizations is unprecedented in human history, and the rate of information generation continues to grow exponentially. Yet, the sheer volume of information is in danger of creating more noise than value, and as a result limiting its effective use. Innovations in how information is organized, mined and processed hold the key to filtering out the noise and using the growing wealth of global information to address emerging challenges.”

Big data all over

Surely “big data” is the buzzword within data management these days and looking for extreme data quality will be paramount.   

Filtering out the noise and using the growing wealth of global information will help a lot in our endurance to make a better world and to make better business.

In my focus area, being master data management, we also have to filtering out the noise and exploit the growing wealth of information related to what we may call Big Master Data.

Big external reference data

The growth of master data collections is also seen in collections of external reference data.

For example the Dun & Bradstreet Worldbase holding business entities from around the world has lately grown quickly from 100 million entities to over 200 millions entities. Most of the growth has been due to better coverage outside North America and Western Europe, with the BRIC countries coming in fast. A smaller world resulting in bigger data.

Also one of the BRICS, India, is on the way with a huge project for uniquely identifying and holding information about every citizen – that’s over a billion. The project is called Aadhaar.

When we extend such external registries also to social networking services by doing Social MDM, we are dealing with very fast growing number of profiles in Facebook, LinkedIn and other services.

Surely we need informatics for adding the value of big external reference data into our daily master data collections.

Bookmark and Share


Five Moments of Truth in Subscriber Data Management

16th February 2012

The term “Subscriber Data Management” with SDM as the TLA is the industry flavor in the telecommunication sector of the general term “Customer Data Management”.

Recently Teresa Cottam, research director of Telesperience, made a good introduction to the subject in an interview on DataQualityPro.com.

As we have a term as “Customer Master Data Management” we will then also have a term as “Subscriber Master Data Management”.

Based on my experience with phone companies “Subscriber Master Data Management” will be very much about (better) handling the subscriber’s life circle.

These are probably the five most important moments in a subscriber’s life circle(s):

  • A lead is born
  • Engaging a prospect
  • One more subscriber
  • Churn happens
  • Win-Back happiness

A lead is born

One of the most important things to do when capturing the data at this point is ensuring if you already have the person/business behind the subscriber somewhere in the life circle or maybe even in other party roles as examined in the post 360° Business Partner View.

Engaging a prospect

Much of the information prospects are asked about already exist somewhere in the cloud. Why not take advantage of these rich sources as described in Reference Data at Work in the Cloud. By doing that you will have fewer keystrokes and a much better chance of getting it right the first time.  

One more subscriber

After a successful sales process a new subscriber can be added to the subscriber list often with more data being captured as adding a billing address and stating credit risk as credit limit and terms of payment.

This is the point where many party entities are split into data silos. Maybe the current subscriber master data lives on in sales oriented systems while new subscriber data are reentered and enriched in an ERP system and other business applications.

Keeping these data silos aligned is the master data challenge as discussed in the post Boiling Data Silos.

Churn happens

A churn is often seen as the termination of a given subscription. But did the person/business behind the subscription really quit or is the service still covered by other subscriptions by the same person, by the household or within a company family tree?

Isn’t the person among us anymore or did a business dissolve?  

Such questions can be answered better if you are practicing Ongoing Data Maintenance

Win-Back happiness

If a person or business really did quit, but then comes back, then be sure to build on the data from the first engagement and not start from scratch again capturing master data and history. Avoiding this covers up for some of the 55 reasons to improve data quality related to party master data uniqueness.

Bookmark and Share


Wildcard Search versus Fuzzy Search

13th February 2012

My last post about search functionality in Master Data Management (MDM) solutions was called Search and if you are lucky you will find.

In the comments the use of wildcards versus fuzzy search was touched.

The problem with wildcards

I have a company called “Liliendahl Limited” as this is the spelling of the name as it is registered with the Companies House for England and Wales.

But say someone is searching using one of the following strings:

  • “Liliendahl Ltd”,
  • “Liliendal Limited” or
  • “Liljendahl Limited”

Search functionality should in these situations return with the hit “Liliendahl Limited”.

Using wildcard characters could, depending on the specific syntax, produce a hit in all combinations of the spelling with a string like this: “lil?enda*l l*”.

The problem is however that most users don’t have the time, patience and skills to construct these search strings with wildcard characters. And maybe the registered name was something slightly else not meeting the wildcard characters used.  

Matching algorithms

Tools for batch matching of name strings have been around for many years. When doing a batch match you can’t practically use wildcard characters. Instead matching algorithms typically rely of one, or in best case a combination, of these techniques:

The same techniques can be used for interactive search thus reaching a hit in one fast search.

Fuzzy search

I have worked with the Omkron FACT algorithm for batch matching. This algorithm has morphed into being implemented as a fuzzy search algorithm as well.

One area of use for this is when webshop users are searching for a product or service within your online shop. This feature is, along with other eCommerce capabilities, branded as FACT-Finder.

The fuzzy search capabilities are also used in a tool I’m involved with called iDQ. Here external reference data sources, in combination with internal master data sources, are searched in an error tolerant way, thus making data available for the user despite heaps of spelling possibilities.

Bookmark and Share


Search and if you are lucky you will find

9th February 2012

This morning I was following the tweet stream from the ongoing Gartner Master Data Management (MDM) conference here in London, when another tweet caught my eyes:

 

This reminded me about that (error tolerant) search is The Overlooked MDM Feature.

Good search functionality is essential for making the most out of your well managed master data.

Search functionality may be implemented in these main scenarios:

Inside Search

You should be able to quickly find what is inside your master data hub.

The business benefits from having fast error tolerant search as a capacity inside your master data management solution are plenty, including:

  • Better data quality by upstream prevention against duplicate entries as explained in this post.
  • More efficiency by bringing down the time users spends on searching for information about entities in the master data hub.
  • Higher employee satisfaction by eliminating a lot of frustration else coming from not finding what you know must be inside the hub already.

MDM inside search capabilities applies to multiple domains: Party, product and location master data.

Search the outside

You should be able to quickly find what you need to bring inside your master data hub.

Data entry may improve a lot by having fast error tolerant search that explores the cloud for relevant data related to the entry being done. Doing that has two main purposes:

  • Data entry becomes more effective with less cumbersome investigation and fewer keystrokes.
  • Data quality is safeguarded by better real world alignment.

Preferably the inside and the outside search should be the same mash-up.

Searching the outside is applies especially to location and party master data.

Search from the outside

Website search applies especially to product master data and in some cases also to related location master data as described in the post Product Placement.

Your website users should be able to quickly find what you publish from your master data hub be that description of physical products, services or research documents as in the case of Gartner, which is an analyst firm.

As said in the tweet on the top of this post, (good) search makes the life of your coming and current customers much easier. Do I need to emphasize the importance of good customer experience?

Bookmark and Share


The Big ABC of Reference Data

7th February 2012

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data is those data defined and (initially) maintained outside a single organisation. Examples from the party master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend is that organisations seek to benefit from having reference data in more depth than those often modest populated lists mentioned above.

In the party master data realm such reference data may be core data about:

  • Addresses being every single valid address typically within a given country.
  • Business entities being every single business entity occupying an address in a given country.
  • Consumers (or Citizens) being every single person living on an address in a given country.

There is often no single source of truth for such data. Some of the challenges I have met for each type of data are:

Addresses

The depth (or precision if you like) of an address is a common problem. If the depth of address data is at the level of building numbers on streets (thoroughfares) or blocks, you have issues as described in the blog post called Multi-Occupancy.  

Address reference data of course have issues with the common data quality dimensions as:

  • Timeliness, because for example new addresses will exist in the real world but not yet in a given address directory.
  • Accuracy, as you are always amazed when comparing two official sources which should have the same elements, but haven’t.

Business Entities

Business directories have been accessible for many years and are often used when handling business-to-business (B2B) customer master data and supplier master data management. Some hurdles in doing this are:

  • Uniqueness, as your view of what a given business entity is occasionally don’t match the view in the business directory as discussed in the post 3 out of 10
  • Conformity, because for example an apparently simple exercise as assigning an industry vertical can be a complex matter as mentioned in the post What are they doing?   

Consumers (or Citizens)

In business-to-consumer (B2C) or other activities involving citizens a huge challenge is identifying the individuals living on this planet as pondered in the post Create Table Homo Sapiens. Some troubles are:

  • Consistency isn’t easy, as governments around the world have found 240 (or so) different solutions to balancing privacy concerns and administrative effectiveness.
  • Completeness, as the rules and traditions not only between countries, but also within different industries, certain activities and various channels, are different.

External Reference Data as a Service

Even though I have emphasized on some data quality dimensions for each type of data, all dimensions apply to all types of data.

For organisations operating multinational and/or multichannel exploiting the wealth and diversity of external reference data is a daunting task.  

This is why I see reference data as a service embracing many sources as a good opportunity for getting data quality right the first time. There is more on this subject in the post Reference Data at Work in the Cloud.

Bookmark and Share


Small Business Owners

1st February 2012

A challenge I encounter over and over again within Data Matching and customer Master Data Management is what to do with small business owners.

Examples of small business owners are:

  • Farmers
  • Healthcare professionals with an own clinic
  • Small family driven shop owners
  • Modest membership organisation administrators
  • Local hospitality providers as Basil Fawlty of Fawlty Towers
  • Independent Data Quality consultants as myself

When handling customer master data we often like to divide those into Business-to-consumer (B2C) or Business-to-business (B2B). We may have different source systems, different data models and different data owners and data stewards for each of the two divisions.

But small business owners usually belong to both divisions. In some transactions they act as private persons (B2C) and in some other transactions they act as a business contact (B2B). If you like to know your customer, have a single customer view , engage in social media and all that jazz, you must have a unique view of the person, the business and the household.

In several industries small business owners, the business and the household is a special target group with unique product requirements. This is true for industries as banking, insurance, telco, real estate, law.

So here are plenty of business cases for multi-domain Master Data Management embracing customer master data and product master data.

The capability to handle a single customer view of small business owners is in my experience very poorly fulfilled in Data Quality and Master Data Management solutions around. Here is certainly room for improvement and entrepreneurship.

Bookmark and Share


Indulgent Moderator or Ruthless Terminator?

30th January 2012

I am the founder/moderator of two small niche LinkedIn groups in the data quality and Master Data Management (MDM) realm:

As a moderator I feel responsible for keeping the discussions in the group on target.

I guess my challenges in doing so resemble what nearly every other moderator on LinkedIn groups are faced with.

The postings that keep creating trouble are related to:

  • Jobs
  • Promotions

LinkedIn does have a facility to place entries into these two alternative tabs. But people seldom do that voluntary.

Jobs

In fact I’m pleased when a job is posted in one of the groups. But I also know that many people don’t like job postings coming up among the “normal” discussions in the groups.

I’m not so naive that I think recruiters forget to post as a job or don’t know how to do it. Many recruiters don’t respect the rules even if reminded. And some recruiters keep on entering the same job over and over again.

Therefore I have to mark recruiters, who twice “forget”, as subject to indulgent moderation. As said, I like job postings, so until now I haven’t practiced ruthless termination apart from deleting double entries – but that is also a destination of data matching anyway.

Promotions

With the relative small number of members in the groups in question, and recognising that most participants are tool vendors and service providers, I find it refreshing and informative with entries with promotional content, however most pleased when it’s done with limited marketing triviality.     

My indulgence may be explained by that I’m interconnected with tool makers and service providers myself. So these promotions are great ready-made competitor monitoring.

However, my indulgence has its limits when it comes to off topic promotion.

A special case here is outsourcing promotions. I find it peculiar that those people practicing this trade don’t target the message for the group where posted. It shouldn’t be too hard to make an angle with data matching or Multi-Domain MDM for your services. But I find that most out-sourcing people copy-paste their usual stuff.

So, in this area I mostly am the ruthless terminator. And there is seldom any hasta la vista, baby.

Bookmark and Share


Multi-Occupancy

26th January 2012

The fact that many people doesn’t live in a single family house but live in a flat sharing the same building number on a street with people living in other flats in the same building is a common challenge in data quality and data matching.

The same challenge also applies to companies sharing the same building number with other companies and not to say when companies and households are in the same building. So this is a common party master data issue.

Address verification and geocoding is seen as important methods for achieving data quality improvement related to the top data quality pain all over being quality of party master data and aiming at getting a single customer view.

Multi-occupancy is a pain in the (you know) getting there.

My pain

I have had some personal experiences living at multi-occupancy addresses lately.

One and a half years ago I was living a painless life in single family house in a Copenhagen suburb.

Then I moved closer to downtown Copenhagen in a flat as mentioned in post Down the Street.     

The tradition in Denmark is to send letters and make deliveries and register master data with a common format of units within a building and having separate mailboxes with flat ID and names for each flat. I have received most of my post since then and got all deliveries I’m aware of.

Then I moved to London in a flat. Here the flats in my building have numbers. But the postman delivers the letters in one batch in the street door, and there are no names on the doorbells in front of the door.

So now I sense I don’t get many letters and today I had to order the same stuff trice from amazon.co.uk, because I haven’t received the first two packages despite of their state of the art online accessible package tracking systems that tells me that delivery was successful.    

Master data pains unresolved

Address reference data at building number level and related geocodes are becoming commonly available many places around these days.

But having reference data and real world aligned location and related party master data at the unit level is still a challenge most places. Therefore we are still struggling with using address verification and geocoding for single customer view where a given building number has more than a single occupancy.

Bookmark and Share


Follow

Get every new post delivered to your Inbox.

Join 109 other followers