The Taxman: Data Quality’s Best Friend

Collection of taxes has always been a main driver for having registries and means of identifying people, companies and properties.

5,000 years ago the Egyptians conducted the first known census in order to collect taxes effectively.

As reported on the Data Value Talk blog, the Netherlands has had family names for 200 years thanks to Napoleon and the higher cause of collecting taxes.

Today the taxman goes cross-border and wants to help with international data quality, as examined in the post Know Your Foreign Customer. The US FATCA regulation is about collecting taxes from activities abroad, and as said on the Trillium blog: Data Quality is The Core Enabler for FATCA Compliance.

My guess is that this is only the beginning of a tax-based opportunity for better data quality in relation to international data.

In a tax agenda for the European Union it is said: “As more citizens and companies today work and operate across the EU’s borders, cooperation on taxation has become increasingly important.”

The EU has a program called FISCALIS in the making. Soon we will not only have to identify Americans doing something abroad but practically everyone taking part in globalization.

For that we all need comprehensive accessibility to the wealth of global reference data through “cutting-edge IT systems” (a FISCALIS choice of wording).

I am working on that right now.


Happy Easter

If you are in a country with Western Christian roots, this weekend is Easter weekend. Countries with Eastern Christian roots have it the following weekend.

Many countries (and states or provinces within them) have holidays around Easter. For many, Easter Monday is a day off. Some had Good Friday as a non-working day and a few countries even had Maundy Thursday as a non-productive day for most people.

The just passed Maundy Thursday was the day of The Last Supper. The famous Last Supper painting by Leonardo da Vinci has in my eyes, as told in a post from last year, something in common with Data Quality Evangelism.

Happy Easter.


Updating a Social Business Directory

Business directories have been around for ages. In the old days they were paper based, as with the yellow pages in a phone book. The yellow pages have since become searchable online. We also know commercial business directories such as the Dun & Bradstreet WorldBase, as well as government-operated nationwide directories of companies and industry-specific business directories.

Such business directories often take a crucial role in master data quality work as sources for data enrichment in the quest for getting as close as possible to a single version of the truth when dealing with B2B customer master data, supplier master data and other business partner master data.

A classic core data model for Master Data in CRM systems, SCM solutions and Master Data hubs when doing B2B is that you have the following (sketched in code after the list):

  • Accounts being the BUSINESS entities that are your customers, suppliers, prospects and all kinds of other business partners
  • Contacts being the EMPLOYEES working there and acting in roles as decision makers, influencers, gatekeepers, users and so on
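
To make the model concrete, here is a minimal sketch of those two entities in Python. The class and field names are illustrative assumptions, not the schema of any particular CRM or MDM product:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Account:
    """A business entity: customer, supplier, prospect or other partner."""
    account_id: str                     # internal master data key
    legal_name: str
    duns_number: Optional[str] = None   # optional external key, e.g. from D&B
    roles: List[str] = field(default_factory=list)   # e.g. ["customer"]

@dataclass
class Contact:
    """An employee at an account, acting in one or more business roles."""
    contact_id: str
    account_id: str                     # link to the employing Account
    full_name: str
    roles: List[str] = field(default_factory=list)   # e.g. ["decision maker"]
```

The point of the shared account_id is that enrichment from a business directory can land on the Account once and benefit every Contact linked to it.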

Today we also have to think about social master data management, meaning exploiting reference data in social media as a supplementary source of external data.

As with all social activity, this exercise goes two ways:

  • Finding and monitoring your existing and wanted business partners in the social networks
  • Updating your own data

Most business entities in this world are actually one-man bands. So is mine. Therefore I went to the LinkedIn company pages this morning and updated the data about my company, Liliendahl Limited: Unlimited Data Quality and Master Data Management consultancy for tool and service vendors.


The Big Search Opportunity

The other day Bloomberg Businessweek ran an article called Facebook Delves Deeper Into Search.

I have always advocated better search functionality in order to get more business value from your data. That certainly also applies to big data.

In a recent post called Big Reference Data Musings here on the blog, the challenge of utilizing large external data sources to get better master data quality was discussed. In a comment, Greg Leman pointed out that there often isn’t a single source of truth, as you could for example expect from a huge reference data source such as the Dun & Bradstreet WorldBase, holding information about business entities from all over the world.

Indeed, our search capabilities must optimally span several sources. In the business directory search realm you may include several sources at a time, like supplementing the D&B WorldBase with for example the EuroContactPool, if you do business in Europe, or the source called Wiki-Data (being renamed to AvoxData) if you are in financial services and want to utilize the new Legal Entity Identifier (LEI) for counterparty uniqueness in conjunction with other more complete sources.

As examined in the post Search and if you are lucky you will find, combining search on external reference data sources and internal master data sources is a big opportunity too. In doing that you must, as described in the follow-up piece named Wildcard Search versus Fuzzy Search, get the search technology right.
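
As a minimal sketch of the difference, using only the Python standard library; the directory content, the query and the 0.8 threshold below are made-up illustrations, not a recommendation for any particular matching product:

```python
import fnmatch
from difflib import SequenceMatcher

directory = ["Liliendahl Limited", "Lilliendal Ltd", "Omikron Data Quality"]

# Wildcard search: the pattern must match character for character.
print(fnmatch.filter(directory, "Liliendahl*"))
# -> ['Liliendahl Limited']; the misspelled variant is missed

# Fuzzy search: score similarity and keep candidates above a threshold.
def fuzzy_search(query, candidates, threshold=0.8):
    scored = ((c, SequenceMatcher(None, query.lower(), c.lower()).ratio())
              for c in candidates)
    return [(c, round(score, 2)) for c, score in scored if score >= threshold]

print(fuzzy_search("Liliendal Limited", directory))
# -> both spelling variants score above the threshold; Omikron does not
```

In real directory search you would of course combine the scoring with blocking or indexing, as comparing against every candidate doesn’t scale to a WorldBase-sized source.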

I see in the Bloomberg article that Facebook doesn’t intend to completely reinvent the wheel for searching big data, as they have hired a Google veteran, the Danish computer scientist Lars Rasmussen, for the job.


Credit Ratings Turned Upside Down

In a recent reform, the usual way of expressing credit ratings (assigning AAA as the top rating, AA+ as the next best and so on) has been changed.

If we look at sovereign credit ratings being those ratings assigned to countries, the world picture looks somewhat different than before.

The new top rating is LMAO followed by LOL+ and so on. As of 1st April 2012 only three countries have the top rating. These countries are Zimbabwe, Greece and Wales.

The improved Zimbabwean rating is due to a simplification (Keep It Simple, Stupid) in the way of handling currencies. Now the Zimbabwean dollar equals the US dollar. Much easier, indeed.

Until now Greece has been a bit of a scapegoat for the Eurozone problems. With a new way of measuring things that has certainly changed. As early as tomorrow, German chancellor Merkel must go to Athens and present a plan for how to pay back the balance.

Wales has until now been rated as part of the United Kingdom. But as a credit bureau spokesman says: “If you have a national soccer team and a national rugby team you should definitely also have your own sovereign credit rating”. As the main reason for the Welsh economic strength, most analysts point to the new Welsh shadow currency called Nidwyfynrhoicachufelltithamarianfijystyneudefnyddiowrthfiwedieu – or just Nidwyfynrhoicachufelltithamarianbasta for short.


Eating the MDM Elephant

The idiom of eating the elephant one bite at a time is often used when trying to envision a roadmap for Master Data Management (MDM).

It’s a bit of a contradiction to look at it that way, because the essence of MDM is an enterprise-wide single source of truth, eventually for all master data domains.

But it may be the only way.

To use a cliché, MDM is (as is any discipline) about people, processes and technology.

In an earlier post called Lean MDM, an approach to start consuming the elephant with a focus on data quality and entity resolution technology was described, starting with building universal data models for party master data and rationalizing the data within a short time frame.
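
As an illustration of what a universal party data model could look like, here is a minimal sketch assuming the classic party supertype approach; the names are my own and not taken from the Lean MDM post:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Party:
    """Supertype for any real-world party with shared identity attributes."""
    party_id: str
    name: str
    address: str

@dataclass
class Person(Party):
    """A natural person, e.g. a contact or a B2C customer."""
    national_id: Optional[str] = None

@dataclass
class Organization(Party):
    """A legal entity, e.g. a B2B customer or a supplier."""
    registration_number: Optional[str] = None

# Roles like customer, supplier or employee are then modelled as relations
# to the party rather than as copies of the party in separate silos.
```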

I have often encountered that many organizations actually don’t want an entity revolution but are more comfortable with entity evolution when it comes to entity resolution, as examined in the post Entity Revolution vs Entity Evolution.

The term “Evolutionary MDM” is used by the MDM vendor Semarchy, as seen on the page called What is Evolutionary MDM?

The idea is to have technology that supports an evolutionary way of implementing MDM. This is in my eyes very important, as people, processes and technology may be prioritized in the said order, but shouldn’t be handled in a serial manner that reveals the opportunities and restrictions related to technology at a very late stage of implementing MDM.


Costs of a Single Citizen View

Recently Andrew Dean made a blog post called National Identity Numbers. The post generated some comments in the Data Matching group on LinkedIn.

Andrew’s post is based on the ongoing project in India called Aadhaar, where every citizen is assigned a unique identification number to be used for multiple purposes when interacting with the government and financial institutions.

As Andrew mentions, the United Kingdom cancelled such a project a few years ago. This cancellation was, in some part, due to fear of excessive costs. The question Andrew, and the comments in the LinkedIn group, pose is whether the benefits of getting a “single citizen view” will justify the (feared) costs.

Indeed large governmental projects have a bad name these days all over the world as I know it.

Back in the late 60’s the United States was able to put a man on the moon.

It was at the same time that the Scandinavian countries implemented their “single citizen view”.

Besides digitalizing the national identification number, Sweden also managed, in 1967, to change from driving on the left side of the road to driving on the right side. I’m not sure Sweden could afford switching to the right side today, not to mention the United Kingdom doing the same.


Big Reference Data Musings

The term “big data” is huge these days. As Steve Sarsfield suggests in a blog post from yesterday called Big Data Hype is an Opportunity for Data Management Pros, well, let’s ride on the wave (or is it a tsunami?).

The definition of “big data” is, as with many buzzwords, not crystal clear, as examined in a post called It’s time for a new definition of big data on Mike2.0 by Robert Hillard. The post suggests that big may be about volume, but is actually more about big complexity.

As I have worked intensively with large amounts of rich reference data, I have a homemade term called “big reference data”.

Big Reference Data Sets

Reference data is a term often used either instead of Master Data or as related to Master Data. Reference data is data defined and (initially) maintained outside a single organization. Examples from the party master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend is that organizations seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above.

An example of a big reference data set is the Dun & Bradstreet WorldBase. This reference data set holds around 300 different attributes describing over 200 million business entities from all over the world.

This data set is at first glance well structured, with a single (flat) data model for all countries. However, when you work with it you learn that the actual data are very different depending on the original sources for each country. For example, addresses from some countries are standardized, while this isn’t the case for other countries. Completeness and other data quality dimensions vary a lot too.
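
To show how that kind of variance can be surfaced, here is a minimal profiling sketch; the rows and field names are hypothetical and not the actual WorldBase layout:

```python
from collections import defaultdict

# Hypothetical rows from a flat, multi-country business directory extract.
rows = [
    {"country": "GB", "name": "Acme Ltd", "postal_code": "SW1A 1AA"},
    {"country": "GB", "name": "Widget plc", "postal_code": None},
    {"country": "IN", "name": "Example Pvt Ltd", "postal_code": None},
]

def completeness_by_country(rows, field):
    """Share of rows per country where the given field is populated."""
    filled, total = defaultdict(int), defaultdict(int)
    for row in rows:
        total[row["country"]] += 1
        if row.get(field):
            filled[row["country"]] += 1
    return {country: filled[country] / total[country] for country in total}

print(completeness_by_country(rows, "postal_code"))
# -> {'GB': 0.5, 'IN': 0.0}
```

Running such a profile per attribute and per country is a simple way to document that a single data model doesn’t mean uniform data quality.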

Another example of a large reference data set is the United Kingdom electoral roll, which is mentioned in the post Inaccurately Accurate. As told in the post, there are fit-for-purpose data quality issues. The data set is pretty big, not least if you span several years, as there is a distinct roll for every year.

Big Reference Data Mashup

Complexity, and opportunity, also arises when you relate several big reference data sets.

Lately DataQualityPro had an interview called What is AddressBase® and how will it improve address data quality? Here Paul Malyon of Experian QAS explains a new combined address reference source for the United Kingdom.

Now, let’s mash up the AddressBase, the WorldBase and the Electoral Rolls – and all the likes.

Image called Castle in the Sky found on photobotos.


Real World Identity

How far do you have to go when checking your customer’s identity?

This morning I read an article in the Danish Computerworld about a ferry line now dropping a solution for checking whether the passenger using an access card is in fact the paying customer, by means of a lightweight fingerprint stored on the card. The reason for dropping it was, by the way, the cost of upgrading the solution compared to the future business value, and not any renewed privacy concerns.

I have been involved in some balancing of real world alignment versus fitness for use and privacy in public transport as well, as described in the post Real World Alignment. Here the question was about using a national identification number when registering customers in public transportation.

As citizens of the world we are today used to sometimes having our iris scanned when flying, as our passport holds our unique identification that way. Some of the considerations around using biometrics in general public registration were discussed in the post Citizen ID and Biometrics.

In my eyes, or should we say iris, there is no doubt that we will meet an increasing demand for confirming and registering our identity. Doing so in the fight against terrorism has long been the case. Regulatory compliance will add to that trend, as told in the post Know Your Foreign Customer, mentioning the consequences of the FATCA regulation and other regulations.

When talking about identity resolution in the data quality realm, we usually deal with strings of text such as names, addresses, phone numbers and national identification numbers. Things that reflect the real world, but aren’t the real world.

We will however probably adopt more facial recognition, as examined in the post The New Face of Data Matching. We do have access to pictures in the cloud, as you may find your B2C customer’s picture on Facebook and your B2B customer contact’s picture on LinkedIn or other similar services. It’s still not the real world itself, but a bit closer than a text string. And of course the picture could be false or outdated and thus more suitable for traction on a dating site.

Fingerprints are maybe a bit old fashioned, but as said, more and more biometric passports are issued and the technology for iris and retinal scanning is used for access control, even on mobile devices.

In the story starting this post the business value of reinvesting in a biometric solution wasn’t deemed positive. But looking from the prints on my fingers down to the lines of my hand, I foresee some more identity resolution going beyond name and address strings into things closer to the real world, such as facial recognition and biometrics.


Data Quality at Terminal Velocity

Recently the investment bank Saxo Bank made a marketing gimmick with a video showing a BASE jumper trading foreign currency with the bank’s mobile app at terminal velocity (i.e. the maximum speed when free falling).

Today business decisions have to be made faster and faster in the quest for staying ahead of the competition.

When making business decisions you rely on data quality.

Traditionally, data quality improvement has been done by downstream cleansing, meaning that data has been corrected a long time after data capture. There may be some good reasons for that, as explained in the post Top 5 Reasons for Downstream Cleansing.

But most data quality practitioners will say that data quality prevention upstream, at data capture, is better.

I agree; it is better. Also, it is faster. And it supports faster decision making.

The most prominent domain for data quality improvement has always been data quality related to customer and other party master data. Also in this quest we need instant data quality as explained in the post Reference Data at Work in the Cloud.
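
As a minimal sketch of the upstream idea, here a record is validated and enriched against reference data at the very moment of capture; the tiny in-memory table and the function name are made up for illustration and stand in for a real cloud reference data service:

```python
# A tiny in-memory reference table standing in for a cloud directory service.
POSTAL_CODES = {"DK": {"2000": "Frederiksberg", "8000": "Aarhus C"}}

def capture_customer(name, country, postal_code):
    """Validate and enrich at data capture, before the record is stored."""
    cities = POSTAL_CODES.get(country, {})
    if postal_code not in cities:
        raise ValueError(f"Unknown postal code {postal_code!r} for {country}")
    # Enrich with the city from reference data instead of free-text entry.
    return {"name": name, "country": country,
            "postal_code": postal_code, "city": cities[postal_code]}

print(capture_customer("Example ApS", "DK", "2000"))
# -> {'name': 'Example ApS', 'country': 'DK',
#     'postal_code': '2000', 'city': 'Frederiksberg'}
```

Rejecting or enriching the record here, rather than in a downstream cleansing batch, is what makes the data fit for fast decision making.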
