How to Avoid Losing 5 Billion Euros

Two years ago I made a blog post about how 5 billion Euros were lost due to bad identity resolution at European authorities. The post was called Big Time ROI in Identity Resolution.

In the carbon trade scam, criminals were able to trick authorities with fraudulent names and addresses.

One possible way to discover the fraudsters’ pattern of interrelated names and physical and digital locations would have been, as explained in the post, to use an “off the shelf” data matching tool in order to achieve what is sometimes called non-obvious relationship awareness. When examining the data I used the Omikron Data Quality Center.
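A generic sketch of that kind of matching (not the Omikron tool itself, just Python’s standard library, with made-up registrations) could look like this:

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Normalised similarity ratio between two strings (0.0 - 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical registrations that look unrelated at first glance.
registrations = [
    ("Acme Trading Ltd", "12 High Street, London"),
    ("ACME Trading Limited", "12 High St, London"),
    ("Novel Carbon GmbH", "Hauptstrasse 5, Berlin"),
]

# Flag pairs whose name AND address similarity both exceed a threshold.
suspicious = [
    (r1, r2)
    for r1, r2 in combinations(registrations, 2)
    if similarity(r1[0], r2[0]) > 0.8 and similarity(r1[1], r2[1]) > 0.8
]

for r1, r2 in suspicious:
    print(f"Possible same party: {r1[0]!r} <-> {r2[0]!r}")
```

A real data matching tool adds phonetic keys, address standardization and much smarter scoring on top, but the principle of surfacing non-obvious links between records is the same.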

Another and more proactive way would have been upstream prevention by screening identity at data capture.

Identity checking may be a lot of work you don’t want to include in business processes with a high volume of master data capture, and not least screening the identity of companies and individuals at foreign addresses can seem a daunting task.

One way to reduce the time spent on identity screening covering many countries is to use a service that embraces many data sources from many countries at the same time. A core technology for doing so is cloud service brokerage. Here your IT department only has to deal with one interface, as opposed to having to find, test and maintain hundreds of different cloud services for getting the right data available in business processes.
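A rough sketch of the brokerage idea (all class names and validation rules are invented for illustration; a real broker would call actual national services) might be:

```python
from abc import ABC, abstractmethod

class AddressProvider(ABC):
    """One national cloud service hidden behind a common interface."""
    @abstractmethod
    def validate(self, address: str) -> bool: ...

class DanishAddressProvider(AddressProvider):
    def validate(self, address: str) -> bool:
        # Placeholder rule; a real provider would query a national register.
        return "DK" in address

class UkAddressProvider(AddressProvider):
    def validate(self, address: str) -> bool:
        return "UK" in address  # placeholder rule for the sketch

class Broker:
    """The single interface the IT department integrates against."""
    def __init__(self) -> None:
        self._providers: dict[str, AddressProvider] = {}

    def register(self, country: str, provider: AddressProvider) -> None:
        self._providers[country] = provider

    def validate(self, country: str, address: str) -> bool:
        # One call signature, regardless of which country service answers.
        return self._providers[country].validate(address)

broker = Broker()
broker.register("DK", DanishAddressProvider())
broker.register("GB", UkAddressProvider())
print(broker.validate("DK", "Rådhuspladsen 1, DK-1550 København"))
```

The design point is that adding a new country means registering one more adapter, while every business process keeps calling the same interface.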

Right now I’m working with such a solution called instant Data Quality (iDQ).

I really hope there are more organisations and organizations out there wanting to avoid losing 5 billion Euros, Pounds, Dollars, Rupees, Whatever or even a little bit less.


Big Reference Data as a Service

This morning I read an article called The Rise of Big Data Apps and the Fall of SaaS by Raj De Datta on TechCrunch.

I think the first part of the title is right while the second part is misleading. Software as a Service (SaaS) will be a big part of Big Data Apps (BDA).

The article also includes a description of LinkedIn merely as a social recruitment service. While recruiters, as reported in the post Indulgent Moderator or Ruthless Terminator?, certainly are visible on this social network, LinkedIn is much more than that.

Among other things LinkedIn is a source of what I call big reference data as examined in the post Social MDM and Systems of Engagement.

Besides social network profiles, big reference data also includes big directory services, being services with large amounts of data about addresses, business entities and citizens/consumers, as told in the post The Big ABC of Reference Data.

Right now I’m working with a Software as a Service solution embracing Big (Reference) Data as a Service thus being a Big Data App called instant Data Quality.

And hey, I have made a pin about that:


Data Quality vs Big Data

If you go to Google Insights for Search and compare the search interest for “data quality” with that for “big data” you’ll get this graph:

“Data quality” (blue line) is a bear market. The interest is slowly but steadily decreasing. “Big data” (red line) is a bull market with a steep rising curve of interest starting in early 2011 and exploding in 2012.

So, what can you do if your blog is about data quality? For my part I’m writing a blog post on my data quality blog mentioning the term “big data” as many times as possible 🙂

I’m not saying “big data” is uninteresting. Not at all. I even use the term “big reference data” when describing how to exploit big directories and social network profiles in the quest for improving party master data quality.

In the short period of the “big data” hype it has often been asked why we should start working with “big data” when we can’t even manage small data yet.

While this makes some sense, it would in my eyes be a mistake not to explore which data quality techniques we can apply to “big data” and which data quality advantages we can harvest within “big data”.

We have known for years that the amount of data being available is drastically increasing. Now we just have a term to be used when searching for and talking about it. Like it or not; that term is “big data”.


255 Reasons for Data Quality Diversity

255 is one source of truth about how many countries we have on this planet. Even with this modest list of reference data there are several sources of the truth. Another list may have 262 entries and a third list 240 entries.
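Even a simple diff of two such lists shows where the versions of the truth disagree. The entries below are invented for illustration, picking territories whose status often varies between lists:

```python
# Two hypothetical "authoritative" country lists that disagree.
list_a = {"Denmark", "United Kingdom", "Kosovo", "Taiwan"}
list_b = {"Denmark", "United Kingdom", "Greenland"}

only_in_a = list_a - list_b   # entries a system using list_b would reject
only_in_b = list_b - list_a   # and vice versa
shared = list_a & list_b      # the part both lists agree on

print(sorted(only_in_a | only_in_b))  # the disputed entries
```

Scale that disagreement from a handful of entries up to full address, business and citizen directories, and the case for country-specific handling becomes clear.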

As I made a blog post some years ago called 55 reasons to improve data quality, I think 255 fits nicely in the title of this post.

The 55 reasons to improve data quality in the former post revolve around name and address uniqueness. In the quest for uniqueness, and for fulfilling other data quality dimensions such as completeness and timeliness, I have often advocated using deep (or big) reference data sources such as address directories, business directories and consumer/citizen directories.

Doing so in a best of breed way involves dealing with a huge number of reference data sources. Services claiming to have worldwide coverage often fall a bit short compared to local services using local reference sources.

For example, when I lived in Denmark, a tiny place in one corner of the world, I was often amazed that address correction services from abroad only had (sometimes outdated) street level coverage, while local reference data sources provided building number and even suite level validation.

Another example was discussed in the post The Art in Data Matching, where the multi-lingual capacities needed to do well in Belgium were stressed in the comments.

Every country has its own special requirements for getting name and address data quality right, the data quality dimensions for reference data are different, and governments have found 255 (or so) different solutions for balancing privacy and administrative effectiveness.

Right now I’m working on internationalization and internationalisation of a data and software service called instant Data Quality. This service makes big reference data from all over the world available in a single mashup. For that we need at least 255 partners.


Social MDM and Systems of Engagement

Social Master Data Management has been an interest of mine over the last couple of years, and last week I tried to reach out to others exploring this new era of Master Data Management by creating a group on LinkedIn called Social MDM.

When reading a nice blog with the slogan ”Welcome to the Real (IT) World!” by Max J. Pucher I came across a good illustration by John Mancini showing the history of IT and how the term “Systems of Record” is being replaced (or at least supplemented) by the term “Systems of Engagement”:

Master Data Management (MDM) includes having a System of Record (SOR) describing the core entities that take part in the transactional systems of record supporting the daily business in every organization. For example, a golden MDM record describes the party that acts as the customer on an order record, while the products in the underlying order lines are described by golden MDM records for the things dealt with within the organization.

Social Master Data Management (Social MDM) will be about supplementing that System of Record so we are able to further describe the parties taking part in the new Systems of Engagement and link them with the old Systems of Record. These parties are reflected as social network profiles owned by the same human beings who are our (prospective) customers, are part of the same household or are a contact at a company being a (prospective) customer or any other business partner.
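As a minimal sketch of that linking (all field names and handles are made up), a golden record from the system of record could carry pointers to profiles in the systems of engagement:

```python
# A golden party record (system of record) extended with links to
# social profiles (systems of engagement). Data is illustrative only.
golden_record = {
    "party_id": "CUST-001",
    "name": "Jane Example",
    "social_profiles": [],
}

def link_profile(record: dict, network: str, handle: str) -> None:
    """Attach a social identity owned by the same human being."""
    record["social_profiles"].append({"network": network, "handle": handle})

link_profile(golden_record, "linkedin", "jane-example")
link_profile(golden_record, "twitter", "@jane_example")
```

The hard part in practice is of course not the data structure but establishing with confidence that a given profile really is owned by the same human being as the golden record.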

For a guy like me who started in IT in the mainframe era (just after it had ended, according to the above illustration) and went on with minicomputers, PCs and the internet, it’s very exciting to be moving on into the social and cloud era.

It will be good to be joined by even more data quality and MDM practitioners and anyone else in the LinkedIn Social MDM group.


At Least Two Versions of the Truth

Precisely one year ago I wrote a post called Single Company View examining the challenges of getting a single business partner view in business-to-business (B2B) party master data.

Yesterday Robert Hawker of Vodafone gave a keynote at the MDM Summit Europe 2012 about supplier master data management.

One of the points was that sometimes you really want the exact same real world entity to be two golden records in your master data hub, as there may be totally different business activities with the same legal entity. The Vodafone example was:

  • Having an antenna placed on the top of a building owned by a certain company and thus paying a fee for that
  • Buying consultancy services from the same company
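A minimal data model sketch of that idea (company name and registration number are fictitious) could be:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LegalEntity:
    """The single real world company behind both records."""
    registration_number: str
    name: str

@dataclass
class GoldenRecord:
    """One business relationship with that company."""
    entity: LegalEntity
    role: str  # e.g. "landlord" or "supplier"

acme = LegalEntity("12345678", "Example Property & Consulting Ltd")
records = [
    GoldenRecord(acme, role="landlord"),  # antenna site fee
    GoldenRecord(acme, role="supplier"),  # consultancy services
]

# Two golden records, deliberately kept apart, yet linked to one legal entity.
assert records[0].entity is records[1].entity
```

The link between the two golden records is what saves you when, as in the story below, one relationship suddenly affects the other.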

I have met such examples many times when doing data matching as told in the post Entity Revolution vs Entity Evolution.

However, on one occasion many years ago, I worked in a company where not having a single business partner view nearly caused a small disaster.

Our company delivered software for membership administration and was at the same time a member of an employer organisation that also happened to be a customer.

A new director got the brilliant idea that cancelling the membership of the employer organization would be an obvious cost reduction.

The cancellation was sent. The employer organisation confirmed the cancellation, adding that they were very sorry that internal business rules at the same time forced them to stop being a customer.

The cancellation was cancelled, of course, and damage control was initiated.


Finding Me

Many people have many names and addresses. So do I.

A search for me within Danish reference sources in the iDQ tool gives the following result:

A green T is a positive hit in the Danish telephone books. A red C is a negative hit in the Danish Citizen Hub. A green C is a positive hit in the Danish Citizen Hub.

Even though I have left Denmark I’m still registered with some phone subscriptions there. And my phone company hasn’t fully achieved single customer view yet, as I’m registered there with two slightly different middle (sur)names.

Following me to the United Kingdom, I’m registered here under even more different names.

It’s not that I’m attempting some kind of fraud, but as my surname contains The Letter Ø, and that letter isn’t part of the English alphabet, my National Insurance Number (kind of similar to the Social Security Number in the US) is registered by the name “Henrik Liliendahl Sorensen”.

But as the United Kingdom doesn’t have a single citizen view, I am separately registered at the National Health Service with the name “Henrik Sorensen”. This is due to a sloppy realtor, who omitted my middle (sur)name on a flat rental contract. That name was taken further by British Gas onto my electricity bill. That document is (surprisingly to me) my most important identity paper in the UK, and it was used as proof of address when registering for health service.
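The root of the problem is that different systems fold Ø to ASCII differently, or not at all. A crude sketch of such a fold (my own simplistic mapping for Danish letters, not what any UK authority actually does) is:

```python
import unicodedata

def ascii_fold(name: str) -> str:
    """Crude ASCII fold: map Danish letters by hand, then strip diacritics.

    Ø/Æ/Å have no Unicode decomposition, so NFKD alone won't remove them;
    they need an explicit mapping before normalisation.
    """
    manual = str.maketrans(
        {"Ø": "O", "ø": "o", "Æ": "AE", "æ": "ae", "Å": "AA", "å": "aa"}
    )
    folded = name.translate(manual)
    # Strip remaining combining marks (e.g. é -> e).
    return unicodedata.normalize("NFKD", folded).encode("ascii", "ignore").decode()

print(ascii_fold("Henrik Liliendahl Sørensen"))  # Henrik Liliendahl Sorensen
```

Two systems making different choices in that mapping (or skipping the middle name entirely) is exactly how one person ends up with several official identities.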

How about you, do you also have several identities?


MDM Summit Europe 2012 Preview

I am looking forward to being at the Master Data Management Summit Europe 2012 next week in London. The conference runs in parallel with the Data Governance Conference Europe 2012.

Data Governance

As I live within short walking distance of the venue, I won’t have as much time for thinking as Jill Dyché had when she recently was at a conference within driving distance, as reported in her blog post After Gartner MDM, in which Jill considers MDM and takes the road less traveled. In London Jill will be delivering a keynote called: Data Governance, What Your CEO Needs to Know.

On the Data Governance tracks there will be a panel discussion called Data Governance in a Regulatory Environment with some good folks: Nicola Askham, Dylan Jones, Ken O’Connor and Gwen Thomas.

Nicola is currently writing an excellent blog post series on the Six Characteristics Of A Successful Data Governance Practitioner. Dylan is the founder of DataQualityPro. Ken was the star of the OCDQblog radio show today, discussing Solvency II and Data Quality.

Gwen, being the founder of The Data Governance Institute, is chairing the Data Governance Conference while Aaron Zornes, the founder of The MDM Institute, is chairing the MDM Summit.

Master Data, Social MDM and Reference Data Management

The MDM Institute lately had an “MDM Alert” with Master Data Management & Data Governance Strategic Planning Assumptions for 2012-13 with the subtitle: Pervasive & Pandemic MDM is in Your Future.

Some of the predictions are about reference data and Social MDM.

Social master data management has been a favorite subject of mine over the last couple of years, and I hope to catch up with fellow MDM practitioners and learn how far this has come outside my circles.

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data is data defined and initially maintained outside a single enterprise. Examples from the customer master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend as I see it is that enterprises seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above. In the customer master data realm such big reference data may be core data about:

  • Addresses, being every single valid address, typically within a given country.
  • Business entities, being every single business entity occupying an address in a given country.
  • Consumers (or citizens), being every single person living at an address in a given country.

There is often no single source of truth for such data.
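The difference in depth between a modestly populated list and big reference data can be sketched like this (all table entries are invented; real directories would come from national sources):

```python
# Modest reference data: a postal code table (shallow).
postal_codes = {"1550": "København V", "2100": "København Ø"}

# Big reference data: a full address directory (deep), down to building
# and suite level.
address_directory = {
    ("1550", "Rådhuspladsen", "1"): {"suites": ["st", "1"]},
}

def validate_shallow(postal_code: str) -> bool:
    """Only confirms that the postal code exists."""
    return postal_code in postal_codes

def validate_deep(postal_code: str, street: str, number: str) -> bool:
    """Confirms the full address down to the building number."""
    return (postal_code, street, number) in address_directory

print(validate_shallow("1550"))                       # passes the shallow check
print(validate_deep("1550", "Rådhuspladsen", "999"))  # fails the deep check
```

A made-up building number sails straight through the shallow check but is caught by the deep one, which is the whole point of going beyond the modest lists.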

As I’m working with an international launch of a product called instant Data Quality (iDQ™) I look forward to exploring how MDM analysts and practitioners see this field developing.


Iceberg, Right Ahead!

Tonight it is 100 years since Titanic hit an iceberg and sank. So I guess it is rush hour for Titanic related blog posts. I’m going on board as well with some musings on lessons from Titanic to be learned within data management, be it migration projects, master data management implementations or data quality improvement programs.

From A to B

Why did Titanic have to sail through icy waters? There are no icebergs around Southampton, Cherbourg or Cork, from where she departed, and no icebergs around New York, where she was heading. Unfortunately there are in the Iceberg Alley off Newfoundland, which she passed through.

In data management (and enterprise architecture too) we are often focused on the AS-IS and TO-BE states, while the dangers are on the route between these points.

Maturity

1,100 lifeboat seats are good enough for 2,200 people on an unsinkable ship, right? And why waste time and money on training the crew in evacuation? Unfortunately, omitting that meant the lifeboats available were only half filled when Titanic was going down.

The maritime industry has improved a lot since then. The data management industry and discipline has a way to go still.

Real time decision making

When the lookout reported “Iceberg, right ahead!” the officer in charge on Titanic had to make a swift decision. “Hard a’starboard!” was unfortunately the worst option, causing the ship’s side to be opened below the waterline. The ship would have been better off if it had sailed directly into the iceberg.

Supporting better real time decision making is a great challenge within data management today.


Bat-and-ball Data Quality

Lately Jim Harris of the OCDQblog has written two excellent blog posts, or may I say home runs, discussing data quality with inspiration from baseball.

In the post Quality Starts and Data Quality Jim explains that you may have a tough loss in business despite stellar data quality and a cheap win in business despite horrible data quality, but in the long run, by starting off with good data quality, your organization has a better chance to succeed.

In the follow-up post called Pitching Perfect Data Quality Jim ponders that business success is achievable without perfect data quality, but that data quality has a role to play.

Now, even though baseball is a very popular sport in the United States but largely unknown in the rest of the world, I think we all understand the metaphors.

Also, we have different but similar sports around the world, with other rules, statistics and terms attached. The common name for these sports is bat-and-ball games.

In Britain, where I live now, cricket is huge and can be used to attract awareness of data issues. As late as yesterday the Ordnance Survey, a government body that has registries with addresses, coordinates and maps, made a blog post called Anyone for cricket? British blogger Peter Thomas also wrote, among others, a post on cricket and data quality called Wager.

Before coming to Britain I lived in Denmark, where we don’t know baseball and don’t know cricket, but sometimes at family picnics, perhaps after a Carlsberg and a snaps or two, play a similar game called rundbold, with kid and grandpa friendly rules and scoring, usually using a tennis ball.

Data quality, not least data quality in relation to party master data, which is the most prominent domain within the discipline, is also a same same but different game around the world, as told in the post Partnerships for the Cloud.

Understanding the rules, statistics and terms of baseball, cricket, rundbold and all the other bat-and-ball games of the world is a daunting task, even though we all know how to hit a ball with a bat.
