There is Open Data in the Air

It is spring in Europe, and the good news this week is that from December next year we will finally see the end of paying exorbitant fees for mobile data access outside a WiFi when in another EU country, as told by the BBC here. As a person travelling a lot between EU countries this is, though years too late, fantastic news.

Being too late was unfortunately also the case as examined in the article Sale of postcodes data was a ‘mistake’ say Committee – in News from UK Parliament. When the UK Royal Mail was privatised last year, the address directory, known as the PAF file, was part of the deal. It would have been a substantially better deal for society as a whole if the address data had been set free. This calculation is backed up by figures from experiences in Denmark, as reported in the post The Value of Free Address Data.

Next week I’m looking forward to being part of an innovation camp arranged by the Danish authorities as a step in an initiative to exploit open public sector data in the private sector. Here, public data owners, IT students, enterprise data consumers, and IT tool and service vendors, including iDQ A/S, will meet openly and challenge each other to develop the most powerful ideas for new ways to create valuable knowledge based on open public sector data.


Big Data Quality and Open Government Data

Yesterday I participated in an information meeting at the Danish Ministry for Business and Growth related to an initiative around using open government data within business intelligence in the private sector.

Using open government data is already an essential part of the instant Data Quality concept I’m working with right now and I have earlier written about the state of open government data in Denmark in the posts Government Says So and Making Data Quality Gangnam Style.

At the meeting some well-known questions came up:

Is this big data?

The answer was that it isn’t exactly big data, mainly because the data are well structured and thereby look more like the traditional data sources we have been used to working with for many years.

Personally, if we have to use the big word, I like to see these data as big reference data, as told in the post Four Flavors of Big Reference Data.

What about data quality?

The answer here was a hope that opening these data to the private sector will create data quality feedback, leading the public sector to improve the quality of the data to the benefit of both public and private sector data consumers.


Sharing is the Future of MDM

Over at the DataRoundtable blog Dylan Jones recently posted an excellent piece called The Future of MDM?

Herein Dylan examines how a lot of people in different organizations spend a lot of time trying to get complete, timely and unique data about customers and other business partners.

A better future for MDM (Master Data Management) could certainly be one where every organization doesn’t have to do the work over and over again. While self-registration by customers is a way of lifting the burden from private enterprises and public sector bodies, we may do even better by not having the customer act as the data entry clerk, typing in the same information over and over again.

Today there are several available options for customer and other business partner reference data:

  • Public sector registries, which are becoming more and more open, for example for the address part, or even deeper with due respect for privacy considerations, which may differ between business entities and individuals.
  • Commercial directories, often built on top of public registries.
  • Personal data lockers like the Mydex service mentioned by Dylan.
  • Social network profiles.

My guess is that the future of MDM is going to be a mashup exploiting the above options.
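By way of illustration, such a mashup could apply a simple survivorship rule across sources like those listed above: for each attribute, keep the value from the most trusted source that has one. This is a minimal sketch, and the source names, precedence order and record layout are all illustrative assumptions:

```python
# Illustrative survivorship rule for mashing up business partner data.
# The precedence order below is an assumption, not a recommendation.
PRECEDENCE = ["public_registry", "commercial_directory",
              "personal_data_locker", "social_profile"]

def mash_up(records):
    """records maps a source name to a partial customer record (dict)."""
    golden = {}
    for source in PRECEDENCE:
        for attr, value in records.get(source, {}).items():
            # Keep the first (most trusted) non-empty value per attribute.
            if value and attr not in golden:
                golden[attr] = value
    return golden

sources = {
    "public_registry": {"name": "Jensen A/S", "address": "Nytorv 2, Copenhagen"},
    "commercial_directory": {"name": "Jensen AS", "industry": "Retail"},
    "social_profile": {"name": "Jensen", "website": "jensen.example"},
}
print(mash_up(sources))
```

A real mashup would of course also have to handle conflicting values, per-attribute trust and timeliness, but the principle of combining the sources remains the same.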

Oh, and as representatives of such a mashup service, we at iDQ recently made sure we had accurate, complete and timely information filled in on our LinkedIn Company profile.


Making Data Quality Gangnam Style

The 21st December 2012 wasn’t the end of the world. But it was the day a music video for the first time passed one billion views on YouTube. It has been said that one reason for the success of Gangnam Style was that the Korean pop singer PSY hadn’t pursued any copyrights related to the video. But that doesn’t mean that PSY doesn’t earn money from the video. On the contrary, related commercials are making money Gangnam Style.

A hindrance to better data quality through better real world alignment has traditionally been the lack of free and open reference data. Some issues have been availability and heavy price tags on government-collected data.

In my current daily work I mostly use such data within the United Kingdom and Denmark. And here the authorities are taking different paths.

The prices of UK public reference data have traditionally been fairly high, and there’s certainly room for innovation around open government data, as reported on DataQualityPro in the post Introduction to the Open Data User Group UK.

In Denmark the 21st December 2012 was the day it was announced that a unanimous parliament had agreed on the laws behind having Free and Open Public Sector Master Data. From the 1st January 2013 there are no price tags on reference data about addresses, properties, companies (and citizens), and there are plans for making those data even more available, consistent and timely.

Great news for data quality, Gangnam Style.



The New Year in Identity Resolution

You may divide identity resolution into these categories:

  • Hard core identity check
  • Light weight real world alignment
  • Digital identity resolution

Hard Core Identity Check

Some business processes require a solid identity check. This is usually the case for example for credit approval and employment enrolment. Identity checks are also part of criminal investigation and fighting terrorism.

Services for identity checks vary from country to country because of different regulations and different availability of reference data.

An identity check usually involves the entity who is being checked.

Light Weight Real World Alignment

In data quality improvement and Master Data Management (MDM) you often include some form of identity resolution in order to keep your data aligned with the real world. For example, when evaluating the result of a data matching activity with names and addresses, you will perform a lightweight identity resolution that leads to marking the matched results as true or false positives.

Such identity resolution usually doesn’t involve the entity being examined.
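As a minimal sketch of such lightweight matching, the snippet below scores a candidate pair on name and address similarity and marks it as a true or false positive. The similarity measure, the equal weights and the 0.85 threshold are illustrative assumptions, not a production matching rule:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Simple string similarity; real matching engines use richer measures.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def is_true_positive(rec_a, rec_b, threshold=0.85):
    # Weight name and address equally (an assumption for illustration).
    score = (0.5 * similarity(rec_a["name"], rec_b["name"])
             + 0.5 * similarity(rec_a["address"], rec_b["address"]))
    return score >= threshold

a = {"name": "John Smith", "address": "1 High Street, Leeds"}
b = {"name": "Jon Smith", "address": "1 High St, Leeds"}
print(is_true_positive(a, b))
```

In practice the marking of true and false positives is often done by a human reviewer, with rules like the above only pre-sorting the candidates.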

Digital Identity Resolution

Our existence has increasingly moved to the online world. As discussed in the post Addressing Digital Identity, this means that we will also need means to include digital identity in traditional identity resolution.

There are of course discussions out there about how far digital identity resolution should be possible. For example real name policy enforcement in social networks is indeed a hot topic.

Future Trends

With regard to digital identity resolution the jury is still out. In my eyes, the economic consequences of the rising social sphere will inevitably affect the demand for knowing who is out there. The opportunities in establishing identity via digital footprints will also be exploited.

My guess is that the distinction between hard core identity check and real world alignment in data quality improvement and MDM will disappear as reference data will become more available and the price of reference data will go down.

That’s why I’m right now working with a solution (www.instantdq.com) that combines identity check features and data universe into master data management with the possibility of adding digital identity into the mix.


Free and Open Public Sector Master Data

Yesterday the Danish Ministry of Finance announced an agreement between local authorities and the central government to improve and link public registers of basic data and to make data available to the private sector.

Once the public authorities have tidied up, merged the data and put a stop to parallel registration, annual savings in public administration could amount to 35 million EUR in 2020.

Basic open data includes private addresses, companies’ business registration numbers, cadastral numbers of real properties and more. These master data are used for multiple purposes by public sector bodies.

Private companies and other organizations can look forward to large savings when they no longer have to buy their basic data from the public authorities.

In my eyes this is a very clever move by the authorities exactly because of the two main opportunities mentioned:

  • The public sector will see savings and related synergies from a centralized master data management approach
  • The private sector will gain a competitive advantage from better and affordable reference data accessibility and thereby achieve better master data quality.

Denmark has, along with the other Nordic countries, always had a more mature public sector master data approach than we see in most other countries around the world.

I remember I worked with the committee that prepared a single registry for companies in Denmark back in the 80’s as mentioned in the post Single Company View.

Today I work with a solution called iDQ (instant Data Quality), which is about mashing up internal master data and a range of external reference data from social networks and, not least, public sector sources. In that realm there is certainly nothing rotten in Denmark. Rather, there is a good answer to the question of whether to be free and open or not to be.


Citizen Master Data Management

Citizen Master Data Management in the public sector is the equivalent of Customer Master Data Management in the private sector.

Where are we?

As private organizations find different solutions for managing customer master data, governments around the world have also found their particular solutions for managing citizen master data.

Most descriptions of data management originate in the United States, and so do many examples and issues related to citizen master data management. One example is this blog post from IBM Initiate called The End of the Social Security Number?

As mentioned in the post there are different administrative practices around the world where governments may learn from experiences with alternative solutions in other countries.

During last year’s discussion in Canada about the census form I had the chance to write a guest blog post on a Canadian blog about How Denmark does it.

The way of the world does change. One example is the program in India called Aadhaar aiming at providing a unique national ID for the over one billion people living in India.

When to register?

The question of when a citizen has to be included in a citizen master data registry of course depends on the purpose of the registry. If the single purpose is, for example, driving license administration, it will depend on when a citizen may obtain a driving license, which will exclude citizens under a certain age depending on the rules in place. The same applies to an electoral roll.

In my country we have an all-purpose citizen master data hub, which today means that a newborn is registered and provided with a unique Citizen ID within seconds.

Similar considerations apply to immigration and cross-border employment.

What to store?

Citizen master data registries typically hold attributes such as an identifier, name and address, and status information.

As new technologies mature, governments of course consider whether such technologies may be feasible and may add benefits as part of the master data stored about citizens.

Using biometrics is a controversial topic here. The pros and cons were discussed, based on the cancelled program in the United Kingdom, in the post Citizen ID and Biometrics.

Who will share?

Privacy considerations are paramount in most discussions around citizen master data hubs.

Even if you have an all-purpose citizen registry, there will be laws limiting how the public sector may exploit data identified with the registry and the identifier in use.

On the other hand, in some countries even private sector organizations may benefit from such a master data hub.

An example from Sweden is shown here in the post No Privacy Customer Onboarding.


Big Master Data

Right now I am overseeing the processing of yet another master data file with millions of records. In this case it is product master data, also with customer-master-data-like attributes, as we are working with a big pile of author names and related book titles.

The Big Buzz

Having such high numbers of master data records isn’t new at all, and compared to the size of the data collections we usually talk about when using the trendy buzzword big data, it’s nothing.

Data collections that qualify as big will usually be files with transactions.

However, master data collections are increasing in volume, and most transactions have keys referencing descriptions of the master entities involved in the transactions.

The growth of master data collections is also seen in collections of external reference data.

For example, the Dun & Bradstreet Worldbase, holding business entities from around the world, has lately grown quickly from 100 million entities to nearly 200 million entities. Most of the growth has been due to better coverage outside North America and Western Europe, with the BRIC countries coming in fast. A smaller world resulting in bigger data.

Also one of the BRICS, India, is on the way with a huge project for uniquely identifying and holding information about every citizen – that’s over a billion. The project is called Aadhaar.

When we extend such external registries to social networking services by doing Social MDM, we are dealing with the very fast growing number of profiles on Facebook, LinkedIn and other services.

Extreme Master Data

Gartner, the analyst firm, has a concept called “extreme data” that rightly points out that this “big data” thing is not only about volume; it is also about velocity and variety.

This is certainly true also for master data management (MDM) challenges.

Master data are exchanged between organizations more and more often and in higher and higher volumes. Data quality focus and maturity may not be the same within the exchanging parties. The velocity and volume make it hard to rely on people-centric solutions in these situations.

Add to that the increasing variety in master data. The variety may be international, as the world gets smaller and we have collections of master data embracing many languages and cultures. We also add more and more attributes each day, as for example governments are releasing more data along with the open data trend, and we generally include more and more attributes in order to make better and more informed decisions.

Variety is also an aspect of Multi-Domain MDM, a subject that according to Gartner (the analyst firm once again) is one of the Three Trends That Will Shape the Master Data Management Market.


Psychographic Data Quality

I have just read an article on Mashable by Jamie Beckland called The End of Demographics: How Marketers Are Going Deeper With Personal Data.

The article explains how new sources of available data make it possible for marketers to get a much closer look at potential customers, thereby going from delivering a broad message to a huge crowd to delivering a very targeted message to a small group of people with a high probability of getting a response. In short: marketers are going from demographic marketing to psychographic marketing.

I believe this is true and ongoing (as I have also been involved in such activities).

The data quality issues we have always known in direct marketing are surely very similar in the psychographic marketing going on in the social media realm and in connection with eBusiness.

In my eyes, the concept of a single customer view is also a key to getting success in psychographic marketing.  

You are not delivering a targeted message if you are delivering two different messages to two user profiles belonging to the same real world individual.

Your message will be very frustrating if you treat someone as a prospective customer when that someone is already an existing customer, perhaps in another channel.

The effectiveness of psychographic marketing depends on a match between the psychographic variables, the behavioral variables and the demographic variables. As seen in the example in the Mashable article, a good old thing like geocoding will be needed here.

An exciting thing in the rise of psychographic marketing is that it will add to the trend in data quality technology of being much more than simple name and address cleansing and deduplication. Rich location data will remain important despite the virtual playground. The relations between customers and products, as described in the post Customer Product Matrix Management, will be further refined in psychographic marketing.
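To make the idea concrete, here is a minimal sketch of scoring the fit between a single customer view and a campaign by combining psychographic (shared interests), behavioral (purchase history) and demographic (geocoded region) variables. The variable names and weights are illustrative assumptions, not a real scoring model:

```python
# Illustrative campaign-fit score; weights (0.5 / 0.3 / 0.2) are assumptions.
def campaign_fit(customer, campaign):
    score = 0.0
    # Psychographic: fraction of the campaign's interests the customer shares.
    shared = set(customer["interests"]) & set(campaign["interests"])
    score += 0.5 * (len(shared) / max(len(campaign["interests"]), 1))
    # Behavioral: has the customer bought in this category before?
    if campaign["category"] in customer["purchase_categories"]:
        score += 0.3
    # Demographic: geocoded location inside the campaign region.
    if customer["region"] == campaign["region"]:
        score += 0.2
    return score

customer = {"interests": ["cycling", "travel"],
            "purchase_categories": ["sports"],
            "region": "Copenhagen"}
campaign = {"interests": ["cycling", "fitness"],
            "category": "sports",
            "region": "Copenhagen"}
print(campaign_fit(customer, campaign))
```

The point is not the particular weights, but that all three kinds of variables must refer to the same deduplicated real world individual, the single customer view, for the score to be meaningful.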


No Privacy Customer Onboarding

This post is a follow-up to today’s #DataKnightsJam happening on Twitter. Today’s subject was data quality and data privacy.

Diversity in data quality is a subject discussed many times on this blog.

So I want to share a real life example of a good upstream, get-it-right-first-time data sharing approach that might compromise privacy thresholds in other places.

The image to the right is the data entry form from a Swedish webshop used for customer self-registration. The main flow is:

  • You type your national ID (personnummer in Swedish)
  • You press the following button
  • The system fetches your name and address data from the public citizen hub
  • The webshop gets an accurate, complete single customer view  

The webshop www.jula.se sells tools for home improvement.
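The flow above can be sketched as follows. The citizen hub lookup is stubbed with an in-memory dictionary, since the real lookup is a service provided by the authorities, and the national ID and record shown are made up for illustration:

```python
# Stand-in for the public citizen registry; a real integration would
# call the authority's lookup service instead of this dictionary.
CITIZEN_HUB = {
    "19800101-1234": {"name": "Anna Andersson",
                      "address": "Storgatan 1, 111 22 Stockholm"},
}

def onboard_customer(national_id):
    record = CITIZEN_HUB.get(national_id)
    if record is None:
        raise ValueError("Unknown national ID")
    # The webshop stores an accurate, complete single customer view
    # without the customer acting as data entry clerk.
    return {"national_id": national_id, **record}

print(onboard_customer("19800101-1234"))
```

Besides saving typing, the approach means the name and address are correct at the source from day one, which is exactly the upstream, get-it-right-first-time idea.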
