You probably won’t find the truth (and salsa) inside your firewall

In a Data Roundtable blog post published today called Big Data in Your Kitchen, Phil Simon says:

“CXOs who believe that “data” is simply the content in their own internal databases are increasing being seen as anachronistic. More progressive leaders understand that data is everywhere, including–and especially–external to the enterprise.”

Bringing in external data was also touched on recently by Kim Loughead of Informatica in the post Bring The Outside In: Why Integrating External Data Sources Should Be Your Next Data Integration Project.

Herein Kim emphasizes that: “Innovation is driven by data and that data largely resides outside your firewall”.

My humble work in bringing in the outside revolves around a service called instant Data Quality (iDQ™). This service is about exploiting the increasing choice of external directories holding valuable information about the individuals, companies, addresses and properties we have so much trouble reflecting in our party master data hubs.

What about you? Are you anachronistic or do you bring in the outside? Or as it will sound in Phil’s Big Data Kitchen: Will you miss salsa tonight?

Tomorrow’s Data Quality Tool

In a blog post called JUDGEMENT DAY FOR DATA QUALITY, published yesterday, Forrester analyst Michele Goetz writes about the future of data quality tools.

Michele says:

“Data quality tools need to expand and support data management beyond the data warehouse, ETL, and point of capture cleansing.”

and continues:

“The real test will be how data quality tools can do what they do best regardless of the data management landscape.”

As described in the post Data Quality Tools Revealed there are two things data quality tools do better than other tools:

  • Data profiling and
  • Data matching

Some of the new challenges I have worked with in designing tomorrow’s data quality tools are:

  • Point of capture profiling
  • Searching using data matching techniques
  • Embracing social networks

Point of capture profiling:

The sweet thing about profiling your data while you are entering it is that analysis and cleansing become part of the on-boarding business process. The emphasis moves from correction to assistance as explained in the post Avoiding Contact Data Entry Flaws. Exploiting big external reference data sources at the point of capture is a core element in getting it right before judgment day.

Searching using data matching techniques:

Error tolerant searching is often the forgotten capability when core features of Master Data Management solutions and data quality tools are outlined. Applying error tolerant search to big reference data sources is, as examined in the post The Big Search Opportunity, a necessity for getting it right before judgment day.
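To make the idea of error tolerant search a bit more concrete, here is a minimal sketch in Python. It is not how any particular data quality tool works: the tiny name list is a hypothetical stand-in for a big reference data source, the function name is made up for illustration, and the similarity measure is simply the one built into the standard library module difflib.

```python
import difflib

# Hypothetical stand-in for a big external reference data source.
REFERENCE_NAMES = [
    "Informatica Corporation",
    "Forrester Research",
    "Dun & Bradstreet",
    "Financial Stability Board",
]


def error_tolerant_search(query, candidates, cutoff=0.6, limit=3):
    """Return candidates that resemble the query, best match first."""
    # Compare case-insensitively but return the original spelling.
    by_lowercase = {name.lower(): name for name in candidates}
    hits = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=limit, cutoff=cutoff
    )
    return [by_lowercase[hit] for hit in hits]


# A misspelled query still finds the intended entry.
print(error_tolerant_search("Forester Reserch", REFERENCE_NAMES))
```

A real service would need an index rather than a scan of every candidate, but the principle is the same: the user is allowed to be a little wrong and still be helped.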

Embracing social networks:

The growth of social networks in recent years has been almost unbelievable. Traditionally data matching has been about comparing names and addresses. As told in the post Addressing Digital Identity, it will be a must to be able to link the new systems of engagement with the old systems of record in order to get it right before judgment day.

How have you prepared for judgment day?

The Real Estate Domain

In the comments on the recent blog post about multidomain MDM (Master Data Management) it was discussed to what degree multidomain MDM is much more than CDI (Customer Data Integration) and PIM (Product Information Management).

While customer (or rather party) and product are important master entity types, there are of course a lot of other master entity types. The location domain is often mentioned as the third domain in MDM, and then there are some entity types most relevant for specific industries like an insurance policy or a vehicle in public transit, and in public transit we also have the calendar as an important master entity type.

One of the entity types that doesn’t belong to party and in many ways is a different thing than a product is real estate (or real property or just property if you like).

For a realtor, real estate looks like a product of course. And it’s all about location, location, location.

Right now I’m working with the instant Data Quality framework. Here we are embracing the party domain by having access to external reference sources about individuals and companies, we are embracing the location domain by having access to external reference sources about addresses and then we are also embracing the real estate domain by having access to external reference sources about properties.

Real properties have addresses in many cases and are therefore close to the location domain. For some business processes a property is a product with a product key, as mentioned for realtors. For some business processes it is a security, often identified by keys other than the postal address. It is related to different party roles like an occupier (or several) and an owner (or several) that may or may not be the same party (or parties).

What about you? Do you feel at home with the real estate entity type?

While we are waiting for the LEI

As told in the post Business Entity Identifiers there has been a new global numbering system for business entities on the way for some time. The wonder is called LEI (Legal Entity Identifier).

The implementation work has been adopted by the Financial Stability Board. The latest developments are reported in a publication called Fifth progress note on the Global LEI Initiative.

Surely, while the implementation may be in good hands, the setup doesn’t give hope for a speedy process where every legal entity in the world will have an LEI in a short time.

And then the next question will be how long it will take before organizations have enriched existing databases with that LEI and implemented on-boarding processes where an LEI is captured with every new insertion of party master data describing a legal entity.

A good way to start to be prepared will be to implement features in on-boarding business processes where available external reference data are captured when new party entities are added to your databases. Having best available information about names, addresses and business entity identifiers available today and a culture of capturing such information will be a great starting point.

And oh, the instant Data Quality concept is precisely all about doing that.
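As a small illustration of capturing an LEI during on-boarding, here is a hedged sketch of a format check in Python. It assumes the format laid down in ISO 17442: 20 alphanumeric characters where the last two are check digits calculated with the ISO 7064 MOD 97-10 scheme. The function names are made up for this example, and passing the check only says that the value is well formed, not that the LEI has actually been issued to anyone.

```python
import string

# Digits keep their value and A-Z become 10-35 (ISO 7064 MOD 97-10 style).
_CHAR_VALUES = {
    char: str(value)
    for value, char in enumerate(string.digits + string.ascii_uppercase)
}


def _to_number(text):
    """Turn an alphanumeric string into one big integer."""
    return int("".join(_CHAR_VALUES[char] for char in text))


def lei_check_digits(first_18):
    """Compute the two check digits for the first 18 characters."""
    remainder = _to_number(first_18 + "00") % 97
    return f"{98 - remainder:02d}"


def looks_like_lei(value):
    """Format check only; it does not prove the LEI has been issued."""
    value = value.strip().upper()
    if len(value) != 20 or any(char not in _CHAR_VALUES for char in value):
        return False
    if not value[18:].isdigit():
        return False
    return _to_number(value) % 97 == 1


# A made-up 18 character base, not a real identifier.
base = "ABCD00EXAMPLE12345"
candidate = base + lei_check_digits(base)
print(candidate, looks_like_lei(candidate))
```

In an on-boarding form a check like this can reject typing errors on the spot, while the lookup against an external reference source confirms who the legal entity really is.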

Making Data Quality Gangnam Style

The 21st December 2012 wasn’t the end of the world. But it was the day a music video for the first time passed one billion views on YouTube. It has been said that a reason for the success of Gangnam Style was that the Korean pop singer PSY hasn’t pursued any copyright claims related to the video. But that doesn’t mean that PSY doesn’t earn money from the video. On the contrary, related commercials are making money Gangnam Style.

A hindrance for better data quality by better real world alignment has traditionally been the lack of free and open reference data. Some issues have been availability and heavy price tags on government collected data.

In my current daily work I mostly use such data within the United Kingdom and Denmark. And here the authorities are taking different paths.

The prices on UK public reference data have traditionally been fairly high and there’s certainly room for innovation around open government data as reported on DataQualityPro in the post Introduction to the Open Data User Group UK.

In Denmark the 21st December 2012 was the day it was published that a unanimous parliament had agreed on the laws behind having Free and Open Public Sector Master Data. From the 1st January 2013 there are no price tags on reference data about addresses, properties, companies (and citizens) and there are plans for making those data even more available, consistent and timely.

Great news for data quality, Gangnam Style.

The New Year in Identity Resolution

You may divide identity resolution into these categories:

  • Hard core identity check
  • Light weight real world alignment
  • Digital identity resolution

Hard Core Identity Check

Some business processes require a solid identity check. This is usually the case for example for credit approval and employment enrolment. Identity check is also part of criminal investigation and fighting terrorism.

Services for identity checks vary from country to country because of different regulations and different availability of reference data.

An identity check usually involves the entity who is being checked.

Light Weight Real World Alignment

In data quality improvement and Master Data Management (MDM) you often include some form of identity resolution in order to have your data aligned with the real world. For example when evaluating the result of a data matching activity with names and addresses, you will perform a lightweight identity resolution which leads to marking the matched results as true or false positives.

Doing this kind of identity resolution usually doesn’t involve the entity being examined.

Digital Identity Resolution

Our existence has increasingly moved to the online world. As discussed in the post Addressing Digital Identity this means that we also will need means to include digital identity into traditional identity resolution.

There are of course discussions out there about how far digital identity resolution should be possible. For example real name policy enforcement in social networks is indeed a hot topic.

Future Trends

With regard to digital identity resolution the jury is still out. In my eyes we can’t avoid that the economic consequences of the rising social sphere will affect the demand for knowing who is out there. Also the opportunities in establishing identity via digital footprints will be exploited.

My guess is that the distinction between hard core identity check and real world alignment in data quality improvement and MDM will disappear as reference data becomes more available and the price of reference data goes down.

That’s why I’m right now working with a solution (www.instantdq.com) that combines identity check features and data universe into master data management with the possibility of adding digital identity into the mix.

Doing Census versus doing Master Data Management

“In those days Caesar Augustus issued a decree that a census should be taken of the entire Roman world. This was the first census that took place while Quirinius was governor of Syria. And everyone went to their own town to register.”

These are the famous words from the Gospel According to Luke that you, if you belong to the part of the world where Christianity is practiced, hear every Christmas.

Today scholars don’t think that there actually was a census for the whole Roman Empire, but there is evidence that a local census in Syria and Judea took place around year 1. This was in order to collect taxes in those provinces. As you know: The taxman is data quality’s best friend.

Today doing a census is still the most practiced method of knowing about the people living in a given country. The alternative is a public registry that is constantly updated with all the information needed about you. I had the chance to describe such a method in a post on a Canadian blog some years ago. The post is called How Denmark does it.

India has a similar scheme with a centralized citizen registry on the go. This program is called Aadhaar.

As reported in the post Citizen ID and Biometrics, the United Kingdom was close to adopting citizen Master Data Management some years ago. But it didn’t happen, so it’s still possible to have multiple names and multiple addresses at the same time in different registries while Cameron is Prime Minister of the United Kingdom, First Lord of the Treasury and Minister for the Civil Service.

Merry Christmas.

Some Kinds of Reference Data

The term “reference data” and the related Reference Data Management (RDM) are commonly used in the data quality and Master Data Management (MDM) realm.

As with most terms it may be used with slightly different meanings. Usually, but not necessarily always, reference data are core data entities defined outside a given organization.

I have come across the kinds of reference data discussed below:

Reference Data in Investment Banking

The term “reference data” is well established in investment banking. Reference data are core master data entities such as counterparties, securities and currencies. These are the things you deal with in investment banking. They are not made up for a given bank or other single financial institution but are shared across the whole market and should optimally be the same to every institution at exactly the same point of time.

Small Reference Data

In Master Data Management in general we usually see reference data as value lists that help describe and standardize internal master data.

One example is a country list. A list of countries should be the same for every organization in the world. However, available lists do differ, though most variations don’t have any business impact, such as the academic question of whether Antarctica should be in the list or not.

A list of codes describing to which industry a given company belongs is another example of reference data. As examined in the post What are they doing? you may choose to standardize on SIC codes, standardize on NACE codes or develop your own set of codes for that purpose.
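Using a small value list to standardize internal master data can be sketched like this in Python. The five entries use two-letter country codes in the style of ISO 3166-1 and are only an extract, since a real list would be far longer. The alias table and the function name are hypothetical and only there for illustration.

```python
# Tiny illustrative extract of a country value list.
COUNTRIES = {
    "DK": "Denmark",
    "GB": "United Kingdom",
    "US": "United States",
    "IN": "India",
    "DE": "Germany",
}

# Hypothetical local spellings found in internal data.
ALIASES = {
    "uk": "GB",
    "great britain": "GB",
    "usa": "US",
    "danmark": "DK",
    "deutschland": "DE",
}


def standardize_country(raw):
    """Return the reference code for a raw value, or None if unknown."""
    cleaned = raw.strip()
    if cleaned.upper() in COUNTRIES:
        return cleaned.upper()  # Already a valid code.
    lowered = cleaned.lower()
    if lowered in ALIASES:
        return ALIASES[lowered]
    for code, name in COUNTRIES.items():
        if name.lower() == lowered:
            return code
    return None  # Leave it to a data steward to decide.


for value in [" uk ", "Denmark", "Atlantis"]:
    print(repr(value), "->", standardize_country(value))
```

The point is that the value list, not the individual application, decides what counts as a valid country.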

Big Reference Data

In geography a country list is at the top level of defining locations. Further down we may have postal code systems within each country such as ZIP codes in the United States, PLZ codes in Germany and PIN codes in India. Yet further down we have every single valid postal address, eventually all over the world. This is what I call big reference data.

A way of sourcing industry codes for your customers, suppliers and other business partners will be picking from or enriching from a business directory like for example the D&B WorldBase or any other of the many business directories around. Such directories may also be seen as big reference data.

With the dramatic increase in the use of social media, the related social network profiles have emerged as a new kind of big reference data serving as links to our internal master data.

My Name is Bond. Jimmy Bond.

Right now the 23rd James Bond film called Skyfall is out in cinemas. And oh yes, he does say that his name is Bond. James Bond.

There were actually some films before the current row of James Bond films based on Ian Fleming’s character. The first one was Casino Royale from 1954. This was a pure American production and herein James Bond was an American agent mostly referred to as Jimmy Bond.

There are plenty of examples around of how films and TV series are adapted for a foreign audience by changing the characters to have local names and habits.

When preparing software, including data quality tools and master data management solutions, you have the same balancing to do. Should you emphasize the strength of the product based on a particular advantage within the country where the product is born or do you have to rewrite some features and unique selling points to make it understandable and feasible in another part of the world?

This challenge is close to me as I’m working with internationalization of the iDQ service. This service is born in a Scandinavian context where there is good availability of public sector master data identifying and describing addresses, companies and individuals, which helps with getting high quality contact master data.

But this may not resonate as well in a British context where ability to do rapid addressing and support vanity addressing may be the current hot stuff or in an American context where external reference data are much more privatized.

Technically the service will be pretty much the same, but it has to be twisted a bit and so does the storytelling around the service.

Rapid and Vanity Addressing – and the Apple Hotel

Mid next month iDQ will move our London office to a new address:

iDQ A/S
2nd Floor
Berkeley Square House
Berkeley Square
London
W1J 6BD
United Kingdom

It’s a good old English address including a lot of lines on an envelope.

The address could be either shorter or longer.

The address below will in fact be enough to have a letter delivered:

iDQ A/S
2nd Floor
W1J 6BD
UK

Due to the granular UK postal code system a single post code may cover either a single address, a part of a long road or a small street.

This structure is also what is exploited in what is called rapid addressing, where you only type in the needed data and the rest is supplied by a (typically cloud) service.

But sometimes people want their addresses presented in a different way than the official way. Maybe I want our address to be:

iDQ A/S
2nd Floor
Berkeley Square House
Berkeley Square
Mayfair
London
W1J 6BD
United Kingdom

Mayfair is a nice part of London. Insisting on including this element in the address is an example of vanity addressing.
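Rapid addressing and vanity addressing as described above can be sketched together in a few lines of Python. This is only an illustration: the dictionary is a hypothetical in-memory stand-in for a cloud address service, it holds nothing but the address used in this post, and it assumes that the post code points to exactly one building, which is not the case for every UK post code.

```python
# Hypothetical stand-in for a cloud address service, keyed by post code
# without spaces. It assumes one building per post code.
POSTCODE_DIRECTORY = {
    "W1J6BD": {
        "building": "Berkeley Square House",
        "street": "Berkeley Square",
        "town": "London",
        "postcode": "W1J 6BD",
        "country": "United Kingdom",
    },
}


def rapid_address(organisation, sub_premise, postcode, vanity_locality=None):
    """Build the address lines from the few elements the user typed in."""
    key = postcode.replace(" ", "").upper()
    official = POSTCODE_DIRECTORY.get(key)
    if official is None:
        return None  # Unknown post code: fall back to manual entry.
    lines = [organisation, sub_premise, official["building"], official["street"]]
    if vanity_locality:
        # Presentation-only element, placed before the official town.
        lines.append(vanity_locality)
    lines += [official["town"], official["postcode"], official["country"]]
    return lines


# Typing three elements is enough to get the full address.
print("\n".join(rapid_address("iDQ A/S", "2nd Floor", "w1j 6bd")))
print()
# The same address with the vanity element added.
print("\n".join(rapid_address("iDQ A/S", "2nd Floor", "W1J 6BD", "Mayfair")))
```

Keeping the vanity element as a separate, optional argument means the official address stays intact for matching purposes while the presentation can still please the addressee.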

Here’s the map of the area:

Notice the place in the upper right corner of the Google Map: Apple Store Regent Street. With an icon with a bed. This means it’s a hotel. Is the Apple Store really a hotel? No – except for some while ago when people slept in front of the store waiting for a product with a notable map service as reported by Richard Northwood (aka The Data Geek) in the post Data Quality Failure – Apple Style.

Well Google, you can’t win them all.