The New Year in Identity Resolution

You may divide identity resolution into these categories:

  • Hard core identity check
  • Light weight real world alignment
  • Digital identity resolution

Hard Core Identity Check

Some business processes require a solid identity check. This is usually the case, for example, for credit approval and employment enrolment. Identity checks are also part of criminal investigation and fighting terrorism.

Services for identity checks vary from country to country because of different regulations and different availability of reference data.

An identity check usually involves the entity being checked.

Light Weight Real World Alignment

In data quality improvement and Master Data Management (MDM) you often include some form of identity resolution in order to keep your data aligned with the real world. For example, when evaluating the result of a data matching activity on names and addresses, you perform a lightweight identity resolution that leads to marking the matched results as true or false positives.

This kind of identity resolution usually doesn’t involve the entity being examined.
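A minimal sketch of that marking step (Python; the records and the deliberately naive decision rule are made up for illustration, not taken from any specific tool):

```python
# Lightweight identity resolution: mark each automated match result as a
# true or false positive. The rule below is deliberately naive and only
# serves to illustrate the evaluation step.

candidate_matches = [
    ({"name": "John Smith", "address": "1 Main St"},
     {"name": "Jon Smith",  "address": "1 Main St"}),
    ({"name": "John Smith", "address": "1 Main St"},
     {"name": "John Smith", "address": "9 Oak Rd"}),
]

def is_true_positive(a, b):
    # Hypothetical rule: a match is confirmed when the addresses agree.
    return a["address"] == b["address"]

for a, b in candidate_matches:
    verdict = "true positive" if is_true_positive(a, b) else "false positive"
    print(f"{a['name']} / {b['name']}: {verdict}")
```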

Digital Identity Resolution

Our existence has increasingly moved to the online world. As discussed in the post Addressing Digital Identity, this means that we will also need means to include digital identity in traditional identity resolution.

There are of course discussions out there about how far digital identity resolution should go. For example, real name policy enforcement in social networks is indeed a hot topic.

Future Trends

With regard to digital identity resolution the jury is still out. In my eyes we can’t avoid that the economic consequences of the rising social sphere will affect the demand for knowing who is out there. The opportunities in establishing identity via digital footprints will also be exploited.

My guess is that the distinction between hard core identity checks and real world alignment in data quality improvement and MDM will disappear as reference data becomes more available and its price goes down.

That’s why I’m currently working on a solution (www.instantdq.com) that combines identity check features and a data universe with master data management, with the possibility of adding digital identity into the mix.


Multi-Domain MDM, Santa Style

What would a Multi-Domain Master Data Management (MDM) solution look like at Santa Claus’s organization?

I think it may look like this:

Santa’s MDM solution covers all four classic domains:

  • Party
  • Product
  • Location
  • Calendar

Party

A main business improvement achieved through Santa’s MDM solution is better Nice or Naughty management. The old CRM system didn’t have a dedicated field for Nice or Naughty assignment, so this information was found in many different fields used over the years, including as part of a street address or as a “send Christmas card” check mark. Today Santa handles Nice and Naughty information with historical tracking, as a kid may be Nice one year but Naughty the next. This also helps with predictive analysis of future present demand. Ho ho ho.

Party master data management at Santa’s also includes keeping track of all the business partners, such as the manufacturers of toys and other stuff, the shopping malls where Santa has to sit in December and so on. A given legal entity may have different roles in different business processes. For example, a reindeer insurance company may also require Santa’s presence at the company’s Christmas tree family party.

Product

Product Information Management (PIM) has always been a complex operation at Santa’s. In Wish List Fulfillment (Wishful) you may have kids wishing for the same thing with different wording. The new MDM solution’s flexible hierarchy management features help a lot when the wishes are matched with specifications obtained by the purchasing elves. At Santa’s they increasingly work with the suppliers on sharing complete and timely product descriptions and specifications.

Location

Handling location information relates to the different locations where Santa is supposed to live, be that at the North Pole, in Greenland, in Lapland or any other belief, as discussed in the post Notes about the North Pole.

Also related to knowing where to deliver all the presents, Santa has realized that maintaining an address as part of the record for each boy and girl isn’t the best way. Today each boy and girl record has a relation, with a start and end date, to a location entity where location specific information, including precise chimney positions, is kept.
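A minimal sketch of such an effective-dated relation (Python dataclasses; the field names are my own illustration, not Santa’s actual schema):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Location:
    location_id: int
    address: str
    chimney_position: str        # location specific information lives here

@dataclass
class PartyLocationRelation:
    party_id: int                # the boy or girl record
    location_id: int
    valid_from: date
    valid_to: Optional[date]     # open-ended while the kid still lives there

# A kid who moved: two dated relations instead of overwriting one address field.
history = [
    PartyLocationRelation(party_id=1, location_id=100,
                          valid_from=date(2010, 1, 1), valid_to=date(2012, 6, 30)),
    PartyLocationRelation(party_id=1, location_id=101,
                          valid_from=date(2012, 7, 1), valid_to=None),
]
```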

Calendar

Christmas present delivery timing is crucial for Santa. In some countries Christmas morning on the 25th of December is the right time for the stuff to be there. In other countries it is Christmas evening on the 24th of December. Add to that doing present delivery across all time zones. Ho ho ho.

The MDM implementation at Santa’s has indeed helped a lot with Santa Quality. But it is an ongoing journey.

Right now Santa is looking for a smart information management firm to help with defining which time zone the North Pole belongs to.

Anyone out there?


Beyond True Positives in Deduplication

The most frequently performed data quality improvement process around is deduplication of party master data.

A core functionality of many data quality tools is the capability to find duplicates in large datasets with names, addresses and other party identification data.

When evaluating the result of such a process we usually divide the found duplicates into:

  • False positives being automated match results that actually do not reflect real world duplicates
  • True positives being automated match results reflecting the same real world entity

The difficulties in reaching the above result aside, you would think the rest is easy: take the true positives, merge them into a golden record and purge the unneeded duplicate records in your database.
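As a sketch of that seemingly easy step (Python; the simple first-non-empty survivorship rule is just one possible choice among many):

```python
# Merge a group of true positive duplicates into one golden record and
# leave the rest for purging. "First non-empty value wins" is one simple
# survivorship rule; real implementations offer many more.

duplicates = [
    {"name": "John Smith", "phone": None,          "email": "js@example.com"},
    {"name": "J. Smith",   "phone": "+1 555 0100", "email": None},
]

def merge_to_golden(records):
    golden = {}
    for record in records:
        for field, value in record.items():
            if golden.get(field) is None and value is not None:
                golden[field] = value
    return golden

golden_record = merge_to_golden(duplicates)
# golden_record now holds a name, a phone and an email; the source
# duplicates would then be purged or redirected to the golden record.
```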

Well, I have seen many well executed deduplication jobs end just there, because there are a lot of reasons for not making the golden records.

Sure, a lot of duplicates “are bad” and should be eliminated.

But many duplicates “are good” and have actually been put into the databases for a good reason, supporting different kinds of business processes where one view is needed in one case and another view in another case.

Many, many operational applications, including very popular ERP and CRM systems, have inferior data models that are not able to reflect the complexity of the real world.

Only a handful of MDM (Master Data Management) solutions are able to do so, but even then things aren’t easy, as most enterprises have an IT landscape with all kinds of applications with other business relevant functionality that isn’t replaced by an MDM solution.

What I like to do when working on getting business value from true positives is to build a so-called Hierarchical Single Source of Truth.
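A minimal sketch of the idea (Python; all names are illustrative): instead of purging, each true positive keeps its record but points to a shared master, so every business process can still use the view it needs:

```python
# A Hierarchical Single Source of Truth: duplicate records survive for the
# business processes that need them, but each is linked to a common master
# entity that provides the single point of truth.

master = {"master_id": "M1", "legal_name": "John Smith"}

linked_records = [
    {"source": "CRM",     "record_id": "C-42", "master_id": "M1",
     "view": "John Smith (marketing consent)"},
    {"source": "Billing", "record_id": "B-7",  "master_id": "M1",
     "view": "J. Smith (invoice address)"},
]

# One real-world entity, several purpose-specific views:
for rec in linked_records:
    print(f"{rec['source']} record {rec['record_id']} -> master {rec['master_id']}")
```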


The Three Big M’s of Data Quality

Most organizations have a lot of data quality issues, and there is a wealth of possible solutions for dealing with these challenges.

What you usually do is categorize the problems into three different types of best resolution:

Mañana

You could go ahead with solving the data quality problems today, but you probably have better and more important things to do right now.

Your organization may have a global SAP rollout going on or other resource-demanding implementations. Therefore it is most wise to deal with the data quality issues when everything is running smoothly.

Mission impossible

Maybe a resolution has been tried before and didn’t work. The chances that alternative people management, different orchestration of processes and developments in available technology will change that are very slim.

May the force be with you

Many problems solve themselves over time or hopefully don’t get noticed by anyone. If things get ugly you always have your lightsaber.


Killing Keystrokes

Keystrokes are evil. Every keystroke represents a potential root cause of poor data quality: spelling things wrongly, putting the right thing in the wrong place, putting the wrong thing in the right place and so on. Besides that, every keystroke is a cost of work, summing up with all the other keystrokes to gigantic amounts of work costs.

In master data management (MDM) you will be able to get things right, and reduce working costs, by killing keystrokes wherever possible.

Killing keystrokes in Product Information Management (PIM)

I have seen my share of business processes where product master data is reentered, or copied and pasted from different sources, extracted from one product master data container and, often via spreadsheets, captured into another product master data container.

This happens inside organizations and it happens in the ecosystem of business partners in supply chains encompassing manufacturers, distributors and retailers.

As touched upon in the post Social PIM, there might be light at the end of the tunnel with the rise of tools, services and platforms setting up collaboration possibilities for sharing product master data and thus avoiding those evil keystrokes.

Killing keystrokes in Party Master Data Management

With party master data there are good possibilities for exploiting external data from big reference data sources and thus avoiding the evil keystrokes. The post instant Data Quality at Work tells how a large utility company has gained better data quality, and reduced working costs, by using the iDQ™ service in that way within customer on-boarding and other business processes related to customer master data maintenance.

The next big thing in this area will be the customer data integration (CDI) part of what I call Social MDM, where you may avoid the evil keystrokes by utilizing the keystrokes already made in social networks by those the master data is about.


Return on Investment in Big Reference Data

Currently I’m working with a cloud based service where we exploit available data about addresses, business entities and consumers/citizens from all over the world.

The cost of such data varies a lot around the world.

In Denmark, where the product was born, the costs of such data are relatively low. The joys of the welfare state also apply to access to open public sector data, as reported in the post The Value of Free Address Data. You are also able to check the identity of an individual in the citizen hub. Doing it online on a green screen you will be charged (what resembles) 50 cents, but doing it with cloud service brokerage, as in iDQ™, will only cost you 5 cents.

In the United Kingdom the prices for public sector data about addresses, business entities and citizens are still relatively high. The Royal Mail has a license tag on the PAF file even for government bodies. Ordnance Survey gives the rest of AddressBase free to the public sector, but there is a big price tag for the rest of society. The electoral roll has a price tag too, even if the data quality isn’t considered fit for other uses than the intended immediate purpose of use, as told in the post Inaccurately Accurate.

At the moment I’m looking into similar services for the United States and a lot of other countries. Generally speaking you can get your hands on most data for a price, and the prices have come down since I last checked. There is also a tendency towards lowering or abandoning the price for the most basic data, such as names, addresses and other identification data.

As poor data quality in contact data is a big cost for most enterprises around the world, decreasing prices for big reference data is good news.

However, if you are doing business internationally it is a daunting task to keep up with where to find the best and most cost effective big reference data sources for contact data and, not least, how to use the sources in business processes.

On Wednesday the 25th of July I’m giving a presentation, in the cloud, on how iDQ™ comes to the rescue. More information is available on DataQualityPro.


Beyond Address Validation

The quality of contact master data is the number one data quality issue around.

Lately there has been a lot of momentum among data quality tool providers in offering services for getting at least the postal address in contact data right. The new services are improved by:

  • Being cloud based, offering validation services that are implemented at data entry (see the sketch after this list) and based on fresh reference data.
  • Being international, thus providing address validation for customer and other party data embracing a globalized world.
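As a sketch of how such data entry validation could be wired in (Python; the endpoint URL and response fields are hypothetical, not any specific vendor’s API):

```python
import requests

# Hypothetical cloud address validation endpoint, called at data entry
# time rather than in a later batch run. URL and fields are invented.
VALIDATE_URL = "https://api.example-address-service.com/v1/validate"

def validate_address(entered_address: str, country: str) -> dict:
    response = requests.get(
        VALIDATE_URL,
        params={"q": entered_address, "country": country},
        timeout=5,
    )
    response.raise_for_status()
    return response.json()   # e.g. {"valid": true, "standardized": "..."}

# At data entry the form would call, before the record is saved:
#   result = validate_address("10 downing st london", "GB")
#   if result.get("valid"): use result["standardized"] instead of the raw input
```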

Capturing an address that is aligned with the real world may have a significant effect on business outcomes as reported by the tool vendor WorldAddresses in a recent blog post.

However, a valid address based on address reference data only tells you that the address is valid, not whether the addressee is (still) at the address, and you cannot be sure that the name and other master data elements are accurate and complete. Therefore you often need to combine address reference data with other big reference data sources such as business directories and consumer/citizen reference sources.

Using business directories is not new at all. Big reference sources such as the D&B WorldBase and many other directories have been around for many years and have been a core element in many data quality initiatives with customer data in business-to-business (B2B) environments and with supplier master data.

Combining address reference data and business entity reference data makes things even better, also because business directories don’t always come with a valid address.

Using publicly available reference data when registering private consumers, employees and other citizen roles has until now been practiced in some industries and for special reasons. The big reference data and the services are thus out there and being used today in some business processes.

Mashing up address reference data, business entity reference data and consumer/citizen reference data is a big opportunity for many organizations in the quest for high quality contact master data, as most organizations actually interact with both companies and private persons when we look at the total mix of business processes.
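A sketch of what such a mashup could look like (Python; the three lookup functions are placeholders for whatever address, business and consumer/citizen sources are available in a given country):

```python
# Assemble one contact record from three big reference data sources.
# Each lookup function stands in for a real external service.

def lookup_address(query):    # address reference data
    return {"address": "1 Main St, Springfield", "address_valid": True}

def lookup_business(query):   # business directory
    return {"company": "Acme Ltd", "company_id": "ACME-001"}

def lookup_consumer(query):   # consumer/citizen reference source
    return {"person": "John Smith", "deceased": False}

def mashup_contact(query):
    contact = {}
    for source in (lookup_address, lookup_business, lookup_consumer):
        contact.update(source(query))   # each source fills in its part
    return contact

print(mashup_contact("john smith, 1 main st"))
```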

The next big source is going to be exploiting social network profiles as well. As told in the post Social Master Data Management, social media will be an additional source of knowledge about our business partners. Again, you won’t find the full truth here either. You have to mash up all the sources.


Mashing Up Big Reference Data and Internal Master Data

Right now I’m working on a cloud service called instant Data Quality (iDQ™).

It is basically a very advanced search engine capable of being integrated into business processes in order to get data quality right the first time, while at the same time reducing the time needed for looking up and entering contact data.

With iDQ™ you are able to look up what is known about a given address, company and individual person in external sources (I call these big reference data) and what is already known in internal master data.

From a data quality point of view this mashup helps solve some of the core data quality issues almost every organization has to deal with:

  • Avoiding duplicates
  • Getting data as complete as possible
  • Ensuring maximal accuracy

The mashup is also a very good foundation for making real-time decisions about master data survivorship.
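A sketch of such a real-time survivorship decision (Python; the trust ranking of sources is invented for illustration):

```python
# Decide at entry time which candidate value should survive, based on a
# trust ranking of sources. The ranking below is purely illustrative.

SOURCE_TRUST = {"citizen_hub": 3, "business_directory": 2, "user_entry": 1}

def surviving_value(candidates):
    """candidates: list of (source, value); the highest-trust source wins."""
    candidates = [c for c in candidates if c[1] is not None]
    return max(candidates, key=lambda c: SOURCE_TRUST[c[0]])[1]

name = surviving_value([
    ("user_entry", "Jon Smith"),
    ("citizen_hub", "John Smith"),
])
print(name)   # "John Smith" survives; the entered value is corrected
```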

The iDQ™ service helps with getting data quality right the first time. However, you also need Ongoing Data Maintenance in order to keep data at a high quality. Therefore iDQ™ is built to plug into subscription services for external reference data.

At iDQ we are looking for partners worldwide who see the benefit of having such a cloud based master data service connected to providing business-to-business (B2B) and/or business-to-consumer (B2C) data services, data quality services and master data management solutions.

Here’s the contact data: http://instantdq.com/contact/


Big Data and Multi-Domain Master Data Management

The possible connection between today’s hot buzz within IT, “big data”, and the good old topic of master data management has been discussed a lot lately. An example from CIO UK today is this article called Big data without master data management is a problem.

As said in the article, there is a connection through big master data (and big reference data) to big transaction data. Big transaction data is what we would usually call big data, because these are the really big ones.

The two most mentioned kinds of big transaction data are:

  • Social data
  • Sensor data

I have also seen a lot of connections between these kinds of big data and master data in multiple domains.

Social Data

Connecting social data to Master Data Management (MDM) is an ongoing discussion I have been involved in for the last three years, lately through the new LinkedIn group called Social MDM.

The customer master data domain is in focus here, as the immediate connection is how to relate traditional systems of record holding customer master data to the systems of engagement where the big social data are waiting to be analyzed and eventually become a part of day-to-day customer centric business processes.

However, being able to analyze, monitor and take action on what is being said about specific products in social data is another option, and eventually that has to be linked to product master data. In product master data management the focus has traditionally been on your own (resell) products. Effectively listening to social data will mean that you also have to manage data about competing products.

Attaching location to social data has been around for a long time. Connecting social data to your master data will also require that your location master data are well aligned with the real world.

Sensor Data

For many years I have been involved in data management within public transportation, where we have big data coming in from sensors of different kinds.

The big problem has for sure been connecting these transactions correctly to master data. The challenges here are described in the post Multi-Entity Master Data Quality.

The biggest problem is that all the different pieces of equipment generating the sensor data in practice can’t be in the same state at the same time, and this will eventually create data that, if related without care, will show very wrong information about who the passenger(s) were, what kind of trip it was, where the journey happened and under which timetable.
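To illustrate the kind of care needed, here is a small sketch (Python; the tolerance and the event shapes are my own invention): sensor events are only related to a scheduled trip when their timestamps agree within a tolerance, instead of being blindly joined to the nearest record:

```python
from datetime import datetime, timedelta

# Relate sensor events to a scheduled trip only when the event time falls
# within a tolerance window, since the various devices are never in
# exactly the same state at the same time. All values are invented.

CLOCK_TOLERANCE = timedelta(minutes=2)
scheduled_departure = datetime(2012, 7, 1, 8, 30)

sensor_events = [
    {"device": "ticket_machine", "time": datetime(2012, 7, 1, 8, 31)},
    {"device": "door_counter",   "time": datetime(2012, 7, 1, 9, 15)},
]

for event in sensor_events:
    if abs(event["time"] - scheduled_departure) <= CLOCK_TOLERANCE:
        print(event["device"], "-> related to the 08:30 trip")
    else:
        print(event["device"], "-> flagged for manual resolution")
```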


Data Driven Data Quality

In a recent article Loraine Lawson examines how a vast majority of executives describe their business as “data driven” and how the changing world of data must change our approach to data quality.

As said in the article the world has changed since many data quality tools were created. One aspect is that “there’s a growing business hunger for external, third-party data, which can be used to improve data quality”.

Embedding third-party data into data quality improvement, especially in the party master data domain, has been a big part of my data quality work for many years.

Some of the interesting new scenarios are:

Ongoing Data Maintenance from Many Sources

As explained in the Wikipedia article about data quality, services such as the US National Change of Address (NCOA) service and similar services around the world have been around for many years as a basic use of external data for data quality improvement.

Using updates from business directories like the Dun & Bradstreet WorldBase and other national or industry-specific directories is another example.

In the post Business Contact Reference Data I predict that professional social networks may be a new source of ongoing data maintenance in the business-to-business (B2B) realm.

Using social data in business-to-consumer (B2C) activities is another option, though one haunted by complex privacy considerations.

Near-Real-Time Data Enrichment

Besides providing updates of basic master data, business directories typically also contain a lot of other data of value for business processes and analytics.

Address directories may also hold further information like demographic stereotype profiles, geo codes and property data elements.

Appending phone numbers from phone books and checking national suppression lists for mailing and phoning preferences are other forms of data enrichment used a lot in direct marketing.

Traditionally these services have been implemented by sending database extracts to a service provider and receiving enriched files back for uploading.

Lately I have worked with a new breed of self-service data enrichment tools placed in the cloud, making it possible for end users to easily configure what to enrich from a palette of address, business entity and consumer/citizen related third-party data and to execute the request as close to real time as the volume makes possible.

Such services also include the good old duplicate check, now much better informed by including third-party reference data.
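A sketch of how such a self-service enrichment request could be expressed (Python; the palette entries and their canned results are hypothetical):

```python
# An end user picks enrichments from a palette and the request is executed
# near real time. The palette names and canned results are illustrative.

ENRICHMENT_PALETTE = {
    "geo_code":     lambda rec: {**rec, "geo": "55.676,12.568"},
    "phone_append": lambda rec: {**rec, "phone": "+45 5555 0100"},
    "suppression":  lambda rec: {**rec, "do_not_mail": False},
}

def enrich(record, selected):
    for name in selected:
        record = ENRICHMENT_PALETTE[name](record)
    return record

customer = {"name": "John Smith", "address": "1 Main St, Copenhagen"}
print(enrich(customer, ["geo_code", "suppression"]))
```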

Instant Data Quality in Data Entry

As discussed in the post Avoiding Contact Data Entry Flaws, third-party reference data such as address directories, business directories and consumer/citizen directories placed in the cloud may be used very efficiently in data entry functionality in order to get data quality right the first time and at the same time reduce the time spent on data entry work.

Not least in a globalized world, where people’s names reflect the diversity of almost any nation, business names become more and more creative, and data entry is done at shared service centers manned with people from cultures with other address formatting rules, there is an increased need for data entry assistance based on external reference data.

When mashing up advanced search in third-party data and internal master data during data entry, you will solve most of the common data quality issues around avoiding duplicates and getting data as complete and timely as needed from day one.
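As a closing sketch (Python; both search functions are placeholders for real services), this is the data entry flow in miniature: search internal master data and external big reference data first, and only fall back to manual entry when neither yields a hit:

```python
# First-time-right data entry: before creating a record, search both
# internal master data and external big reference data.

def search_internal_master(query):
    # Would query the organization's own master data hub.
    return []   # no internal hit in this example

def search_big_reference_data(query):
    # Would query external address/business/consumer sources.
    return [{"name": "John Smith", "address": "1 Main St"}]

def enter_contact(query):
    internal = search_internal_master(query)
    if internal:
        return internal[0]    # reuse the existing record: no duplicate created
    external = search_big_reference_data(query)
    if external:
        return external[0]    # prefill: complete and accurate from day one
    return {"name": query}    # last resort: manual entry

print(enter_contact("john smith"))
```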
