How to Avoid True Positives in Data Matching

Now, this blog post title might sound silly. We generally consider true positives to be the cream of data matching: a true positive means we have found a match between two data records that reflect the same real world entity, the match has been confirmed, and we can therefore eliminate a harmful and costly duplicate from our records.

The reason this still isn’t an optimal situation is that the duplicate shouldn’t have entered our data store in the first place. Avoiding duplicates up front is by far the best option.

So, how do you do that?

You may aim for low latency duplicate prevention, catching duplicates in (near) real-time by running duplicate checks after records have been captured but before they are committed to whatever data store holds the entities in question. But this is still about finding true positives while at the same time being aware of false positives.
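As a rough illustration of such a check before commit, here is a minimal Python sketch. The record layout (`name`, `address`), the function names and the 0.85 threshold are assumptions made for this example only; a real solution would use far more robust matching than the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_possible_duplicates(candidate, store, threshold=0.85):
    """Return (score, record) pairs from the store that resemble the candidate."""
    hits = []
    for record in store:
        name_score = similarity(candidate["name"], record["name"])
        address_score = similarity(candidate["address"], record["address"])
        score = (name_score + address_score) / 2  # hypothetical equal weighting
        if score >= threshold:
            hits.append((score, record))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


def commit_if_new(candidate, store, threshold=0.85):
    """Commit the candidate only when no likely duplicate is found."""
    hits = find_possible_duplicates(candidate, store, threshold)
    if hits:
        # Hand the hits to a person: each one may be a true or a false positive.
        return False, hits
    store.append(candidate)
    return True, []
```

Note that the hits are returned rather than silently rejected, since someone still has to decide whether each one is a true or a false positive.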

Killing Keystrokes

The best way is to aim for instant data quality. That is, instead of keying in data for the (supposedly) new records, you pick the data from data stores already available, presumably in the cloud, through an error tolerant search that covers external data as well as data records already in the internal data store.

This is exactly the kind of solution I’m working with right now. And oh yes, it is indeed called instant Data Quality.
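To give an idea of what an error tolerant search across internal and external sources could look like, here is a small Python sketch. The source names, the record layout and the `min_score` cut-off are illustrative assumptions, and the sketch is not a description of how any particular service works.

```python
from difflib import SequenceMatcher


def search(query, sources, limit=5, min_score=0.6):
    """Rank records from every source by similarity to a possibly misspelled query."""
    wanted = query.lower().strip()
    results = []
    for source_name, records in sources.items():
        for record in records:
            score = SequenceMatcher(None, wanted, record["name"].lower()).ratio()
            if score >= min_score:
                results.append({"source": source_name, "score": score, "record": record})
    # Best candidates first, so the user can pick instead of typing.
    results.sort(key=lambda result: result["score"], reverse=True)
    return results[:limit]


# Hypothetical sources: one internal data store and one external reference source.
sources = {
    "internal": [{"name": "Acme Trading Ltd"}],
    "external": [{"name": "Acme Trading Limited"}, {"name": "Zenith Shipping"}],
}
matches = search("Acme Tradng Ltd", sources)
```

The point is that the user picks a record from `matches` rather than typing a new one, so a misspelled query still leads to the existing entity.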


Social Data Quality

A cornerstone in the social sphere around data quality is the site DataQualityPro founded by Dylan Jones.

This week the site had a major facelift. As Dylan explains:

“We’ve moved over to one of the most advanced content hosting sites available to make it easier for you to discover, share and engage with the huge amounts of educational content and resources we now have on the site.”

You may read more about the changes in the post Welcome to the New Look Data Quality Pro.

I remember joining DataQualityPro even before it was a site, as it started as a section of the sister site called DataMigrationPro.

Over the years I have learned a lot by being a member of DataQualityPro, and as with most things social you don’t pay anything for being one. The only difference compared to other services is that there are no paid upgrades. You get the full package when joining.

There are sponsors too of course.

Here too I, representing the data quality service provider iDQ, have had very good experiences with DataQualityPro. Last summer we had a technology briefing on the site with a massive response.

So, if you haven’t seen the new design or you are not a member (or a sponsor) yet, hurry over and visit

DataQualityPro


What’s so special about your party master data?

My last blog post was called Is Managing Master Data a Differentiating Capability? The post introduces a conference session presenting a case story about managing master data at Philips.

During my years working with data quality and master data management it has always struck me how differently organizations manage the party master data domain while in fact the issues are almost the same everywhere.

First of all, party master data describe real world entities that are the same to everyone. Everyone is gathering data about the same individuals and the same companies at the same addresses and with the same digital identities. The real world also comes in hierarchies such as households, company families and contacts belonging to companies, which are the same to everyone. We may call that the external hierarchy.

Based on that everyone has some kind of demand for intended duplicates as a given individual or company may have several accounts for specific purposes and roles. We may call that the internal hierarchy.

A party master data solution will optimally reflect the internal hierarchy while most of the business processes around are supported by CRM-systems, ERP-systems and special solutions for each industry.

Reflecting the external hierarchy will be the same for everyone, and there is no need for anyone to reinvent the wheel here. There are already plenty of data models, data services and data sources out there.

Right now I’m working on a service called instant Data Quality that is capable of embracing and mashing up external reference data sources for addresses, properties, companies and individuals from all over the world.

The iDQ™ service already fits in at several places as told in the post instant Data Quality and Business Value. I bet it fits your party master data too.


The Real Estate Domain

In the comments on the recent blog post about multidomain MDM (Master Data Management) it was discussed to what degree multidomain MDM is much more than CDI (Customer Data Integration) and PIM (Product Information Management).

While customer (or rather party) and product are important master entity types, there are of course a lot of other master entity types. The location domain is often mentioned as the third domain in MDM, and then there are some entity types most relevant for specific industries like an insurance policy or a vehicle in public transit, and in public transit we also have the calendar as an important master entity type.

One of the entity types that doesn’t belong to the party domain and in many ways is a different thing than a product is real estate (or real property or just property if you like).

For a realtor a piece of real estate looks like a product of course. And it’s all about location, location, location.

Right now I’m working with the instant Data Quality framework. Here we are embracing the party domain by having access to external reference sources about individuals and companies, we are embracing the location domain by having access to external reference sources about addresses and then we are also embracing the real estate domain by having access to external reference sources about properties.

Real properties have addresses in many cases and are therefore close to the location domain. For some business processes a property is a product with a product key, as mentioned for realtors. For some business processes it is a security, often identified by other keys than the postal address. It is related to different party roles like an occupier (or several) and an owner (or several) that may or may not be the same party (or parties).

What about you? Do you feel at home with the real estate entity type?


While we are waiting for the LEI

As told in the post Business Entity Identifiers there has been a new global numbering system for business entities on the way for some time. The wonder is called LEI (Legal Entity Identifier).

The implementation work has been adopted by the Financial Stability Board. The latest developments are reported in a publication called Fifth progress note on the Global LEI Initiative.

Surely, while the implementation may be in good hands, the set up doesn’t give hope for a speedy process where every legal entity in the world will have a LEI within a short time.

And then the next question will be how long it will take before organizations will have enriched existing databases with that LEI and implemented on-boarding processes where a LEI is captured with every new insertion of party master data describing a legal entity.

A good way to start preparing is to implement features in on-boarding business processes where available external reference data are captured when new party entities are added to your databases. Having the best available information about names, addresses and business entity identifiers available today, and a culture of capturing such information, will be a great starting point.

And oh, the instant Data Quality concept is precisely all about doing that.


The New Year in Identity Resolution

You may divide doing identity resolution into these categories:

  • Hard core identity check
  • Light weight real world alignment
  • Digital identity resolution

Hard Core Identity Check

Some business processes require a solid identity check. This is usually the case, for example, for credit approval and employment enrolment. Identity checks are also part of criminal investigation and fighting terrorism.

Services for identity checks vary from country to country because of different regulations and different availability of reference data.

An identity check usually involves the entity who is being checked.

Light Weight Real World Alignment

In data quality improvement and Master Data Management (MDM) you often include some form of identity resolution in order to have your data aligned with the real world. For example when evaluating the result of a data matching activity with names and addresses, you will perform a lightweight identity resolution which leads to marking the matched results as true or false positives.
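A minimal sketch of that evaluation step in Python could look as follows. It assumes a hypothetical reference lookup that maps a normalized name and address to a real world entity identifier; the identifiers, the record layout and the function names are made up for the example.

```python
def resolve(record, reference):
    """Look up the real world entity id for a record, or None when unknown."""
    key = (record["name"].lower().strip(), record["address"].lower().strip())
    return reference.get(key)


def label_matches(pairs, reference):
    """Mark each matched pair as a true positive, a false positive or unresolved."""
    labels = []
    for left, right in pairs:
        left_id = resolve(left, reference)
        right_id = resolve(right, reference)
        if left_id is None or right_id is None:
            labels.append("unresolved")  # reference data cannot settle it
        elif left_id == right_id:
            labels.append("true positive")  # same real world entity
        else:
            labels.append("false positive")  # similar records, different entities
    return labels


# Hypothetical reference data keyed by (name, address).
reference = {
    ("bob smith", "1 main st"): "P1",
    ("robert smith", "1 main st"): "P1",
    ("robert smith jr", "1 main st"): "P2",
}
```

Pairs that the reference data cannot settle are kept apart as unresolved rather than guessed at.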

This kind of identity resolution usually doesn’t involve the entity being examined.

Digital Identity Resolution

Our existence has increasingly moved to the online world. As discussed in the post Addressing Digital Identity this means that we also will need means to include digital identity into traditional identity resolution.

There are of course discussions out there about how far digital identity resolution should be possible. For example real name policy enforcement in social networks is indeed a hot topic.

Future Trends

With regard to digital identity resolution the jury is still out. In my eyes we can’t avoid that the economic consequences of the rising social sphere will affect the demand for knowing who is out there. Also the opportunities in establishing identity via digital footprints will be exploited.

My guess is that the distinction between hard core identity check and real world alignment in data quality improvement and MDM will disappear as reference data will become more available and the price of reference data will go down.

That’s why I’m right now working with a solution (www.instantdq.com) that combines identity check features and data universe into master data management with the possibility of adding digital identity into the mix.


Rising Adoption of MDM in the Cloud

When I looked ahead to 2012 back in December 2011 and considered what I was going to do, the topics were very well aligned with what Gartner (the analyst firm) has predicted for MDM, being:

What turned out to go faster than I expected was the rising adoption of MDM in the Cloud.

I remember that back when CRM in the Cloud started to grow, not least driven by the success of Salesforce.com, many voices predicted a slow adoption, as most people couldn’t believe that companies would put one of their best secrets, the customer database, up in the cloud where everyone might be able to have a look.

Right now I’m working with implementing my first cloud MDM solution. This solution is based on the instant Data Quality service, which now consequently has an MDM edition. We didn’t expect to be this far already, but here we are.


MDM meets MDM

The three letter acronym MDM may mean several different things. It can for example be:

This morning I read an article on metering.com reporting that Meter Data Management varies across Europe. The article was about the different approaches taken in various European countries when it comes to accessing and storing meter readings in the energy sector.

I noticed that these different approaches very much resemble how public sector basic data about addresses, companies and citizens are made accessible and stored in the same countries. Scandinavia has a centralized approach, some other countries have hybrid solutions and Germany has a strictly decentralized approach.

As told in the post instant Data Quality at Work I am right now working with Master Data Management in the energy sector. These projects certainly have some bordering zones to Meter Data Management.

So it’s good to see that reading the article about MDM (in this case Meter Data Management) makes just as much sense if you thought it was about MDM (being Master Data Management):

“The most important factors for supporting and choosing a particular model are cost efficiency, transparency, data security and efficient processes. The rationale for centralized MDM also seems to be strengthened .. because of the increased amount of information exchanged.

There is also a clear understanding .. that the chosen MDM model needs clear rules regarding data access, privacy and security, while enabling proportionate access to data by authorized parties to ensure that benefits can be delivered.

As such many regulatory changes are occurring .. that .. considers that efficient and secure information and data access for relevant stakeholders is fundamental for a proper .. market functioning and customer protection and empowerment .. and indeed different countries might require different MDM models on the basis of their market design specificities.”

So, MDM is just like MDM.


Social MDM and Complex Sales

Social Master Data Management (Social MDM) is about linking the increasing trend of doing business via social media, using what we may call “systems of engagement”, with the traditional way of supporting business using what we call “systems of record”.

Doing social MDM is a natural consequence of adopting social CRM (Social Customer Relationship Management). Many CRM solutions support Business-to-Business (B2B) activities, helping with keeping track of what’s going on with a lot of contacts related to a business account within so-called complex sales processes.

Traditional MDM in B2B environments has been much about a single view of the business account and the legal entity behind. As social CRM is much about the relations to the business contacts, the people side of business, we need a solid master data foundation behind the people being those contacts.

The same individual may in fact be an important influencer related to a range of business accounts, each being a legal entity with whom you are aiming for a sales contract. You need a single view of that. So many sales contracts are based on a relation to a buyer moving from one business account to another. You need to be the winner in that game and the answer to that may very well be your ability to do better social MDM.

Social MDM adds a new external source of reference data to MDM solutions for B2B customer master data management. This new source is professional social network profiles where LinkedIn is the most known and used service around.

It is early days for social MDM solutions, so it is quite exciting for me to work with designing the first solutions of this kind around the MDM edition of the instant Data Quality service.

Stay tuned for more news in this field on this blog in the times to come.


My Name is Bond. Jimmy Bond.

Right now the 23rd James Bond film called Skyfall is out in cinemas. And oh yes, he does say that his name is Bond. James Bond.

There were actually some films before the current series of James Bond films based on Ian Fleming’s character. The first one was Casino Royale from 1954. This was a pure American production and herein James Bond was an American agent mostly referred to as Jimmy Bond.

There are plenty of examples around of how films and TV series are adapted for a foreign audience by changing the characters to have local names and habits.

When preparing software, including data quality tools and master data management solutions, you have the same balancing to do. Should you emphasize the strength of the product based on a particular advantage within the country where the product was born, or do you have to rewrite some features and unique selling points to make it understandable and feasible in another part of the world?

This challenge is close to me as I’m working with internationalization of the iDQ service. This service was born in a Scandinavian context where there is good availability of public sector master data identifying and describing addresses, companies and individuals, which helps with getting high quality contact master data.

But this may not resonate as well in a British context, where the ability to do rapid addressing and to support vanity addressing may be the current hot stuff, or in an American context, where external reference data are much more privatized.

Technically the services will be pretty much the same, but the service has to be twisted a bit and so does the storytelling around it.
