Partnerships for the Cloud

Earlier this month Loraine Lawson was kind enough to quote me in an article on IT Business Edge called New Partnerships Create Better Customer Data via the Cloud.

The article mentions some cloud services from StrikeIron and Melissadata. These services currently focus on improving North American (that is, US and Canadian) customer data.

I am involved in similar services that currently focus on improving Danish customer data, which then covers the rest of North America, namely Greenland.

Improving customer data from all over the world is surely a daunting task that needs partnerships.

The cloud is the same, but the reference data isn’t, and the rules and traditions aren’t either, as governments around the world have found 240 (or so) different solutions to balancing privacy concerns and administrative efficiency.

So, if not partnering, you risk getting solutions that are nationally international.


The Big ABC of Reference Data

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data are those data that are defined and (initially) maintained outside a single organisation. Examples from the party master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend is that organisations seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above.

In the party master data realm such reference data may be core data about:

  • Addresses being every single valid address typically within a given country.
  • Business entities being every single business entity occupying an address in a given country.
  • Consumers (or Citizens) being every single person living at an address in a given country.

There is often no single source of truth for such data. Some of the challenges I have met for each type of data are:

Addresses

The depth (or precision if you like) of an address is a common problem. If the depth of address data is at the level of building numbers on streets (thoroughfares) or blocks, you have issues as described in the blog post called Multi-Occupancy.

Address reference data of course have issues with the common data quality dimensions such as:

  • Timeliness, because for example new addresses will exist in the real world but not yet in a given address directory.
  • Accuracy, as you are always amazed when comparing two official sources which should have the same elements, but don’t.

Business Entities

Business directories have been accessible for many years and are often used when handling business-to-business (B2B) customer master data and supplier master data management. Some hurdles in doing this are:

  • Uniqueness, as your view of what a given business entity is occasionally doesn’t match the view in the business directory, as discussed in the post 3 out of 10.
  • Conformity, because for example such an apparently simple exercise as assigning an industry vertical can be a complex matter, as mentioned in the post What are they doing?

Consumers (or Citizens)

In business-to-consumer (B2C) or other activities involving citizens a huge challenge is identifying the individuals living on this planet as pondered in the post Create Table Homo Sapiens. Some troubles are:

  • Consistency isn’t easy, as governments around the world have found 240 (or so) different solutions to balancing privacy concerns and administrative effectiveness.
  • Completeness, as the rules and traditions not only between countries, but also within different industries, certain activities and various channels, are different.

Big Reference Data as a Service

Even though I have emphasized some data quality dimensions for each type of data, all dimensions apply to all types of data.

For organisations operating multinationally and/or across multiple channels, exploiting the wealth and diversity of external reference data is a daunting task.

This is why I see reference data as a service embracing many sources as a good opportunity for getting data quality right the first time. There is more on this subject in the post Reference Data at Work in the Cloud.


Nationally International

I am right now in the process of moving most of my business from the Kingdom of Denmark to the United Kingdom.

During that process I have become a regular customer of the Gatwick Express, the (sometimes) fast train going from London’s second largest airport to central London.

When buying tickets online they require you to enter a billing address. Here you can choose between entering a UK address or an international address.

If you enter a UK address the site takes advantage of the UK postal code system where you just have to enter a postcode, which is very granular in the UK, and a house number, and then the system will know your address.

Alternatively you can choose to enter an international address. In that case you will get a form with more fields for you to enter. But, in order not to be too international, the form still has the UK way of formatting an address.

Also, the default country is United Kingdom, which I guess is the only value that should not be applicable for this form.


Citizen Master Data Management

Citizen Master Data Management in the public sector is the equivalent of Customer Master Data Management in the private sector.

Where are we?

Just as private organizations find different solutions for managing customer master data, governments around the world have also found their particular solutions for managing citizen master data.

Most descriptions of data management originate in the United States, and so do many examples and issues related to citizen master data management. One example is this blog post from IBM Initiate called The End of the Social Security Number?

As mentioned in the post there are different administrative practices around the world where governments may learn from experiences with alternative solutions in other countries.

During last year’s discussion in Canada about the census form I had the chance to write a guest blog post on a Canadian blog about How Denmark does it.

The way of the world does change. One example is the program in India called Aadhaar aiming at providing a unique national ID for the over one billion people living in India.

When to register?

The question about when a citizen has to be included in a citizen master data registry of course depends on the purpose of the registry. If the single purpose for example is driving license administration it will depend on when a citizen may obtain a driving license and that will exclude citizens under a certain age depending on the rules in place. The same applies to an electoral roll.

In my country we have an all-purpose citizen master data hub, which today means that a newborn is registered and provided with a unique Citizen ID within seconds.

Similar considerations apply to immigration and cross-border employment.

What to store?

Citizen master data registries typically hold attributes such as an identifier, name, address and status information.

As new technologies mature, governments of course consider whether such technologies may be feasible and may add benefits as part of the master data stored about citizens.

Using biometrics is a controversial topic here. The pros and cons were discussed, based on the cancelled program in the United Kingdom, in the post Citizen ID and Biometrics.

Who will share?

Privacy considerations are paramount in most discussions around citizen master data hubs.

Even if you have an all-purpose citizen registry there will be laws limiting how the public sector may exploit data identified with the registry and the identifier in use.

On the other hand, in some countries even private sector organizations may benefit from such a master data hub.

An example from Sweden is shown here in the post No Privacy Customer Onboarding.


Some Flyover Information

My Follow Friday World Tour stop today was at some Flyover States, that is, states in the United States that bicoastal people only see from above when flying over them going from coast to coast.

If I were to fly from (A) Copenhagen to (B) Los Angeles one should, by looking at a traditional flat world map, think that the flight also would pass over these inland states.

But the world isn’t flat. The shortest route for an east to west flight will tend to follow the so-called great circle, which swings much further north.
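The difference between the flat-map view and the round-world view can be put into numbers. The following is a minimal sketch, not a flight planning tool: it assumes a perfectly spherical Earth with a mean radius of 6,371 km, and the airport coordinates are approximate values I have filled in for illustration. It compares the great-circle distance (haversine formula) with the constant-bearing rhumb line, which is the path that looks straight on a Mercator map.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Shortest distance on a sphere between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rhumb_line_km(lat1, lon1, lat2, lon2):
    """Length of the constant-bearing path (a straight line on a Mercator map)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    # Take the shorter way around the globe in longitude.
    if abs(dlon) > math.pi:
        dlon -= math.copysign(2 * math.pi, dlon)
    # Difference in "stretched" Mercator latitude.
    dpsi = math.log(math.tan(math.pi / 4 + p2 / 2) / math.tan(math.pi / 4 + p1 / 2))
    # For a pure east-west course dpsi is zero, so fall back to cos(latitude).
    q = dlat / dpsi if abs(dpsi) > 1e-12 else math.cos(p1)
    return EARTH_RADIUS_KM * math.hypot(dlat, q * dlon)


# Approximate airport coordinates (illustrative assumptions).
CPH = (55.618, 12.656)
LAX = (33.942, -118.408)

print(f"Great circle: {great_circle_km(*CPH, *LAX):.0f} km")
print(f"Rhumb line:   {rhumb_line_km(*CPH, *LAX):.0f} km")
```

Under these assumptions the great circle comes out at roughly 9,000 km, while the line that looks straight on the flat map is well over 1,000 km longer.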

This is the idea behind the polar route, the shortcut over the Arctic in the real round world. Actually the Copenhagen (CPH) to Los Angeles (LAX) connection established in 1954 was the world’s first commercial polar route.

I find great analogies in looking at a map and solving data and information quality issues like in the post Sharing data is key to a single version of the truth which was a blog-bout with a UK guy and a Flyover guy.


World Population Excluding Greenland?

According to a newly published paper called The population of the world (2011) we are now 6,987 million citizens on the planet Earth.

However something makes me wonder if they counted Greenland. It’s not that inclusion or exclusion of the 57,564 Greenlanders will rock the figure, but I think we should all be in there.

Greenland does cover a great deal of area on a world map as the big white island on top of the world, not least when the projection makes areas close to the poles bigger than on a globe.

But is Greenland visible in the population statistics at all?

First I looked for Greenland in North America where Greenland belongs in a geophysical context.

 

Not there.

Then I looked for Greenland in Northern Europe where Greenland belongs in a political context.

 

Not there – or maybe there as part of (the Kingdom of) Denmark?

The population of Denmark is stated as 5.6 million citizens.

If I look up the Kingdom of Denmark on Wikipedia we have these numbers:

It’s a close call. If we round the numbers, the 5.6 million citizens are without the North Atlantic dependencies, and Greenland and the Faroe Islands aren’t anywhere else. And anyway the area clearly suggests that Greenland isn’t included as part of Denmark. So it could be a case of rounding or a case of timeliness, or most probably a case of incompleteness.

Maybe we have passed 7 billion people on earth already if someone else (also) is missing in the statistics.


The 20 Million Rupees Question

Here we go again. The same old question: “What is the definition of customer?” Most recently Informatica (a data quality, master data management and data integration firm) has hired David Loshin to find out, starting with the blog post The Most Dangerous Question to Ask Data Professionals.

In short, my take is that this question in practice has two major implications for data quality and master data management, but in theory it should only have one:

  • The first one is real world alignment. In theory real world alignment is independent of the definition of a customer as it is about the party behind the customer.
  • The second is party roles. It’s actually here we can have an endless discussion.

In practice we of course mix things up as discussed in the post Entity Revolution vs Entity Evolution.

And Now for Something Completely Different

Instead of saying that “What is the definition of customer?” is the million dollar question, it’s probably more like the 20 million rupees question, as most data management these days is taking place in India.

The amount of money involved is taken from the film Slumdog Millionaire where 20 million rupees is the top prize in the local “Who Wants to Be a Millionaire?” (Kaun Banega Crorepati), which by the way has the same jingle and graphics as all over the world.

And oh, how much is 20 million rupees? It’s nearly ½ million US dollars or 300.000 euro (with a dot as thousand separator), but a lot more in buying power for a local customer. It is exactly 2 crore (2,00,00,000 rupees).
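The different ways of writing the same amount are a small data diversity exercise in themselves. Here is a minimal sketch of the two grouping styles used above; the function names are my own, and it only handles whole numbers, not decimals or currency symbols.

```python
def group_indian(n: int) -> str:
    """Group digits the Indian way: last three digits, then groups of two."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    sign = "-" if n < 0 else ""
    return sign + ",".join(groups + [tail])


def group_thousands(n: int, sep: str = ",") -> str:
    """Group digits in threes with a configurable separator."""
    return f"{n:,}".replace(",", sep)


print(group_indian(20_000_000))       # 2,00,00,000
print(group_thousands(300_000, "."))  # 300.000
print(group_thousands(500_000))       # 500,000
```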

Party on.


Big Master Data

Right now I am overseeing the processing of yet another master data file with millions of records. In this case it is product master data that also has customer master data kinds of attributes, as we are working with a big pile of author names and related book titles.

The Big Buzz

Having such high numbers of master data records isn’t new at all and compared to the size of data collections we usually are talking about when using the trendy buzzword BigData, it’s nothing.

Data collections that qualify as big will usually be files with transactions.

However master data collections are increasing in volume and most transactions have keys referencing descriptions of the master entities involved in the transactions.

The growth of master data collections is also seen in collections of external reference data.

For example the Dun & Bradstreet Worldbase holding business entities from around the world has lately grown quickly from 100 million entities to nearly 200 million entities. Most of the growth has been due to better coverage outside North America and Western Europe, with the BRIC countries coming in fast. A smaller world resulting in bigger data.

Also one of the BRIC countries, India, is on the way with a huge project for uniquely identifying and holding information about every citizen, which is over a billion people. The project is called Aadhaar.

When we extend such external registries also to social networking services by doing Social MDM, we are dealing with a very fast growing number of profiles in Facebook, LinkedIn and other services.

Extreme Master Data

Gartner, the analyst firm, has a concept called “extreme data” that rightly points out that this “big data” thing is not only about volume; it is also about velocity and variety.

This is certainly true also for master data management (MDM) challenges.

Master data are exchanged between organizations more and more often in higher and higher volumes. Data quality focus and maturity are probably not the same within the exchanging parties. The velocity and volume make it hard to rely on people-centric solutions in these situations.

Add to that increasing variety in master data. The variety may be international variety as the world gets smaller and we have collections of master data embracing many languages and cultures. We also add more and more attributes each day as for example governments are releasing more data along with the open data trend and we generally include more and more attributes in order to make better and more informed decisions.

Variety is also an aspect of Multi-Domain MDM, a subject that according to Gartner (the analyst firm once again) is one of the Three Trends That Will Shape the Master Data Management Market.


Data Diversity

As part of my work I deal with data from different countries. In the figure below I have put in some examples of different presentations of the same data from some of the countries I meet most often, namely Denmark (DK), Germany (DE), France (FR), United States (US) and United Kingdom (GB):

 

I have some more information on the issues regarding the different attributes:


No NOT NULL

A basic way of ensuring data quality in a database is to define that a certain attribute must be filled. This is done by specifying that the value “null” isn’t allowed or, in SQL terms, by setting the NOT NULL constraint.

A common data quality issue is that such constraints almost always are too rigid.

In my last post, called Notes about the North Pole, it was discussed that every place on earth has a latitude and a longitude, except that the North Pole and the South Pole have no longitude. So if you have a table with geocodes you can’t set NOT NULL for the longitude if you (though it is very unlikely) should store the coordinates for the poles. Alternatively you could store 0 for longitude to make it complete, but then it would be very inaccurate. 360 degrees inaccurate, so to speak.
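One way to avoid both the too rigid NOT NULL and the misleading zero is a conditional rule: longitude may be empty only at the poles. The sketch below uses an in-memory SQLite database; the table name, column names and the exact rule are my own illustrative assumptions, not a recommended standard schema.

```python
import sqlite3


def make_geocode_table(conn):
    """Create a geocode table where longitude is optional only at the poles."""
    conn.execute(
        """
        CREATE TABLE geocode (
            place     TEXT NOT NULL,
            latitude  REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),
            longitude REAL CHECK (longitude BETWEEN -180 AND 180),
            -- No plain NOT NULL on longitude: NULL is allowed at the poles only.
            CHECK (longitude IS NOT NULL OR abs(latitude) = 90)
        )
        """
    )


def try_insert(conn, place, latitude, longitude):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO geocode VALUES (?, ?, ?)", (place, latitude, longitude)
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
make_geocode_table(conn)
print(try_insert(conn, "North Pole", 90.0, None))       # accepted
print(try_insert(conn, "Copenhagen", 55.676, None))     # rejected
print(try_insert(conn, "Copenhagen", 55.676, 12.568))   # accepted
```

The design choice here is to keep the missing value honest: a NULL longitude says "not applicable", while a stored 0 would claim a precision that isn't there.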

Another infrequent example from this blog is that every person in my country has a given (first) name and a family (last) name. But there are a few Royal Exceptions. So, no NOT NULL for the family name.

Related to people and places there are plenty of more frequent examples. If you only expect addresses from the United States, Australia or India, setting NOT NULL for the state attribute seems wise. But expect foolish values in here when you get addresses from most other parts of the world. So, no NOT NULL for the state.
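A softer alternative is to make the state mandatory only where it belongs. This is a minimal sketch under my own assumptions: the country set only contains the three examples mentioned above and is far from complete, and the function name is hypothetical.

```python
# Illustrative assumption: only the three countries named in the text.
# A real list of countries using states or provinces in addresses is longer.
STATE_REQUIRED = {"US", "AU", "IN"}


def state_is_acceptable(country_code, state):
    """Require a state only for countries whose addresses use one."""
    cleaned = (state or "").strip()
    if country_code.upper() in STATE_REQUIRED:
        return bool(cleaned)
    # Elsewhere an empty state is fine; nobody is forced to invent a value.
    return True


print(state_is_acceptable("US", "Alaska"))  # True
print(state_is_acceptable("US", ""))        # False
print(state_is_acceptable("DK", None))      # True
```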

A common variant of the mandatory state value is when you register for data quality webinars, white papers and so on. Most often you must select from a value list containing the United States of America – in some cases also mixed in with Canadian Provinces. The NULL option to be used by strangers may hide as “Not Applicable” way down the list among states beginning with N.

I usually select Alaska, which is among the first states in alphabetical order, and which also brings me back close to the North Pole, making my data close to 360 degrees inaccurate.
