The trees never grow into heaven

This morning most of digital Denmark was closed. You couldn’t do anything at the online bank, you couldn’t do much at public sector websites and you couldn’t read electronic mail from your employer, pension fund and others.

It wasn’t because someone cut a big cable or a computer virus scored a lucky strike. The problem was that the centralized internet login service had a three-hour outage. It was a classic single point of failure incident.

In Denmark we have a single sign-on identity solution used by the public sector, financial services and other organizations. The service is called NemID (Easy ID) and is based on an all-purpose unique national ID for every citizen.

As more and more interaction with the public sector and financial services, along with online shopping, takes place in the cloud, we are of course more and more vulnerable to these kinds of problems.

The benefit of having a single source of truth about who you are became a single point of failure here.

Well, we have this local saying: “The trees never grow into heaven”. All good things have their limit. Even in instant Identity Resolution.

Managing Client On-Boarding Data

This year I will be joining FIMA: Europe’s Premier Financial Reference Data Management Conference for Data Management Professionals. The conference is held in London from 8th to 10th November.

I will present “Diversities In Using External Registries In A Globalised World” and take part in the panel discussion “Overcoming Key Challenges In Managing Client On-Boarding Data: Opportunities & Efficiency Ideas”.

As said in the panel discussion introduction: The industry clearly needs to normalise (or is it normalize?) regional differences and establish global standards.

The concept of using external reference data to improve data quality within master data management has been a favorite topic of mine for a long time.

I’m not saying that external reference data is a single source of truth. Clearly external reference data may have data quality issues as exemplified in my previous blog post called Troubled Bridge Over Water.

However I think there is a clear trend toward encompassing external sources, increasingly found in the cloud, as a shortcut to keeping up with data quality. I call this Data Quality 3.0.

The Achilles heel, though, has always been how to smoothly integrate external data into data entry functionality and other data capture processes and, not to forget, how to ensure ongoing maintenance in order to avoid the otherwise inevitable erosion of data quality.

Lately I have worked with a concept called instant Data Quality. The idea is to make simple yet powerful functionality that helps hook up with many external sources at the same time when on-boarding clients and makes continuous maintenance possible.

One aspect of such a concept is how to exploit the different opportunities available in each country, as public administrative practices and privacy norms vary a lot around the world.

I’m looking forward to presenting and discussing these challenges and getting a lot of feedback.

Typos in the Cloud

On 1st January this year the second largest city in Denmark changed its name. It was only a minor change from “Århus” to “Aarhus” – replacing the Scandinavian letter Å with a double A, which is the normal conversion to the English alphabet.

Data quality would be a lot easier if people, companies and cities stopped changing names. It always goes wrong. First of all, a lot of data will be out of sync. And then the change itself may go wrong.

That is what happened at Google Maps. They introduced a typo, so the name of the city on the map is now “Aahrus” – swapping the r and the h in the middle of the name.
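
Swapped letters like this are exactly what fuzzy matching in data quality tools can catch. As a sketch, the optimal string alignment variant of Damerau-Levenshtein distance counts an adjacent transposition as a single edit:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert/delete/substitute plus
    adjacent transpositions, each counted as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

print(osa_distance("Aarhus", "Aahrus"))  # the r/h swap counts as one edit: 1
```

So “Aahrus” sits only one edit away from “Aarhus”, close enough that a matching tool should flag it as a probable variant of the real city name.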

For those out there not sure where on earth Århus/Aarhus/Aahrus is: it is the red dot in the upper right corner of the map below, with London and Paris in the lower left corner.

Non-Obvious Entity Relationship Awareness

In a recent post here on this blog the question was discussed: What is Identity Resolution?

One angle was the interchangeable use of the terms “Identity Resolution” and “Entity Resolution”. These terms can be seen as truly interchangeable, or “Identity Resolution” can be seen as more advanced than “Entity Resolution”, or (my suggestion) “Identity Resolution” relates only to party master data while “Entity Resolution” can be about all master data domains: parties, locations and products.

Another term sometimes used in this realm is “Non-Obvious Relationship Awareness”. This term, too, mainly relates to finding relationships between parties, for example individuals at a casino who seem to do better than the croupiers. Here’s a link to a (rather old) O’Reilly Radar post on Non-Obvious Relationship Awareness.

Going Multi-Domain

So “Non-Obvious Entity Relationship Awareness” could be about finding these hidden relationships in a multi-domain master data scope.

An example could be non-obvious relationships in a customer/product matrix.

The data supporting this discovery will actually not be found in the master data itself, but in transaction data, probably residing in an Enterprise Data Warehouse (EDW). But a multi-domain master data management platform will be needed to support the complex hierarchies and categorizations needed to make the discovery.
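
As a sketch (with invented data) of what discovering a non-obvious relationship in a customer/product matrix might look like, counting product categories that co-occur across customers in transaction data:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction rows: (customer golden record key, product category)
transactions = [
    ("CUST-1", "garden tools"), ("CUST-1", "bird feed"),
    ("CUST-2", "garden tools"), ("CUST-2", "bird feed"),
    ("CUST-3", "power tools"),  ("CUST-3", "bird feed"),
]

# Group the purchased categories per customer.
baskets = {}
for cust, category in transactions:
    baskets.setdefault(cust, set()).add(category)

# Count how often each pair of categories is bought by the same customer.
pair_counts = Counter()
for cats in baskets.values():
    pair_counts.update(combinations(sorted(cats), 2))

print(pair_counts.most_common(1))  # [(('bird feed', 'garden tools'), 2)]
```

The counting itself runs on transaction data, but note that it only works because each transaction references a resolved customer key and a consistent product category – which is exactly what the multi-domain hub has to supply.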

One technical aspect of discovering such non-obvious relationships is how chains of keys are stored in the multi-domain master data hub.

Customer Master Data

The transactions, or sums hereof, in the data warehouse will have keys referencing customer accounts. These accounts can be stored in staging areas in the master data hub with references to a golden record for each individual or company in the real world. Depending on the identity resolution available, the golden records will have golden relations to each other, forming hierarchies of households, company family trees, contacts within companies and their movements between companies and so on.

My guess as described in the post Who is working where doing what? is that this will increasingly include social media data.
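
Such a chain of keys could be stored roughly like this (all table, key and field names here are invented for illustration):

```python
# Hypothetical key chain: transaction -> account (staging) -> golden record -> hierarchy
transactions = [{"txn_id": 9001, "account_key": "ACC-17", "amount": 250.0}]

# Staging area: source system accounts, each pointing at a golden record.
accounts = {"ACC-17": {"source": "CRM", "golden_key": "GOLD-3"}}

# Golden records with golden relations forming hierarchies (here: a household).
golden_records = {
    "GOLD-3": {"name": "John Doe", "parent_key": "HOUSEHOLD-1"},
    "HOUSEHOLD-1": {"name": "Doe household", "parent_key": None},
}

def resolve_hierarchy(account_key: str) -> list:
    """Walk from an account key up through the golden record relations."""
    chain = []
    key = accounts[account_key]["golden_key"]
    while key is not None:
        chain.append(golden_records[key]["name"])
        key = golden_records[key]["parent_key"]
    return chain

print(resolve_hierarchy("ACC-17"))  # ['John Doe', 'Doe household']
```

The point of keeping the whole chain navigable is that a transaction against one account can then be rolled up to the individual and further to the household or company family tree.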

Product Master Data

Some of the same transactions or sums hereof in the data warehouse will have keys referencing products. These products will exist in the master data hub as members of various hierarchies with different categorizations.

My guess is that future developments in this field will further embrace not just your own products but also competitor products and market data available in the cloud all attached to your hierarchies and categorizations.   

We Will Become More Open

Yesterday I read a post called Taking Stock Of DQ Predictions For 2011 by Clarke Patterson of Informatica Corporation. Informatica is a well established vendor within data integration, data quality and master data management. The post is based on a post called Six Data Management Predictions for 2011 by Steve Sarsfield of Talend. Talend is an open source vendor within data integration, data quality and master data management.

One of the six predictions for 2011 is: Data will become more open.

Steve’s (open source based) take on this is:

“In the old days good quality reference data was an asset kept in the corporate lockbox. If you had a good reference table for common misspellings of parts, cities, or names for example, the mind set was to keep it close and away from falling into the wrong hands.  The data might have been sold for profit or simply not available.  Today, there really is no “wrong hands”.  Governments and corporations alike are seeing the societal benefits of sharing information. More reference data is there for the taking on the internet from sites like data.gov and geonames.org.  That trend will continue in 2011.  Perhaps we’ll even see some of the bigger players make announcements as to the availability of their data. Are you listening Google?”

Clarke’s (proprietary software based) take is as follows:

“As data becomes more open, data quality tools will need to be able to handle data from a greater number of sources used for a broader number of purposes.  Gone are the days of single domain data manipulation.  To excel in this new, open market, you’ll need a data quality tool that can profile, cleanse and monitor data regardless of domain, that is also locale-aware and has pre-built rules and reference data.”

I agree with both views, which by the way represent each of The Two Sides To The IT Coin – Data Centric IT vs Process Centric IT – as explained by Robin Bloor in another recent post on the blog of data integration vendor Pervasive Software.

Steve’s and Clarke’s perspectives are also close to mine, as my 2011 to-do list includes:

  • Involvement in a solution called iDQ (instant Data Quality). The solution is about how we can help system users doing data entry by adding some easy-to-use technology that explores the cloud for relevant data related to the entry being made.
  • Helping enhance a hot MDM hub solution with further data quality and multi-domain capabilities.

instant Data Quality

My last blog post was all about how data quality issues are in most cases solved by doing data cleansing downstream in the data flow within an enterprise, and the reasons for doing that.

However, solving the issues upstream wherever possible is of course the better option. Therefore I am very optimistic about a project I’m involved in called instant Data Quality.

The project is about how we can help system users doing data entry by adding some easy-to-use technology that explores the cloud for relevant data related to the entry being made. Doing that has two main purposes:

  • Data entry becomes more effective. Less cumbersome investigation and fewer keystrokes.
  • Data quality is safeguarded by better real world alignment.
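
A minimal sketch of the idea, where the registry below is a hypothetical in-memory stand-in for a real cloud source:

```python
# Hypothetical external registry, standing in for a cloud directory service.
BUSINESS_REGISTRY = {
    "12345678": {"name": "Example Trading A/S", "city": "Aarhus"},
    "87654321": {"name": "Sample Services ApS", "city": "Copenhagen"},
}

def instant_lookup(registration_number: str) -> dict:
    """Instead of typing name and address, the user enters one key and
    the remaining fields are prefilled from the external source."""
    record = BUSINESS_REGISTRY.get(registration_number)
    if record is None:
        raise ValueError(f"Unknown registration number: {registration_number}")
    # The fields come from the registry, not from keystrokes: fewer typos,
    # and the entry is aligned with the real world from the start.
    return {"reg_no": registration_number, **record}

print(instant_lookup("12345678"))
```

Both purposes show up in the same call: one key entered instead of many fields (effectiveness), and the stored values match the external source (real world alignment).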

The combination of a more effective business process that also results in better data quality seems to be good – like a sugar-coated vitamin pill. By the way: The vitamin pill metaphor also serves well as vitamin pills should be supplemented by a healthy life style. It’s the same with data management.

Implementing improved data quality by better real world alignment may go beyond the usual goal for data quality of meeting the requirements for the intended purpose of use. This means that you are instantly getting more by doing less.

Out of Facebook

Some while ago it was announced that Facebook signed up member number 500,000,000.

If you are working with customer data management you will know that this doesn’t mean that 500,000,000 distinct individuals are using Facebook. Like any customer table, the Facebook member table will suffer from a number of different data quality issues, like:

  • Some individuals are signed up more than once using different profiles.
  • Some profiles are not an individual person, but a company or other form of establishment.
  • Some individuals who created a profile are not among us anymore.
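
As a toy illustration of the first issue (all profile data invented): the same individual behind several profiles can often be caught by normalizing names before comparing them.

```python
import unicodedata

def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation so near-identical
    profile names collide on the same key."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if c.isalnum() or c.isspace())
    return " ".join(stripped.lower().split())

profiles = ["John O. Doe", "john o doe", "Jöhn O Doe"]
keys = {normalize(p) for p in profiles}
print(keys)  # all three profiles collapse to {'john o doe'}
```

Real identity resolution adds far more evidence than the name alone, but even this crude key shows why a raw member count overstates the number of distinct individuals.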

Nevertheless the Facebook member table is a formidable collection of external reference data representing the real world objects that many companies are trying to master when doing business-to-consumer activities.

For those companies doing business-to-business activities, a similar representation of real world objects will be the 70,000,000+ profiles on LinkedIn plus profiles in other social business networks around the world, which may act as external reference data for the business contacts in master data hubs, CRM systems and so on.

Customer Master Data sources will expand to embrace:

  • Traditional data entry from field work like a sales representative entering prospect and customer master data as part of Sales Force Automation.
  • Data feed and data integration with traditional external reference data like using a business directory. Such integration will increasingly take place in the cloud and the trend of governments releasing public sector data will add tremendously to this activity.
  • Self registration by prospects and customers via webforms.
  • Social media master data captured during social CRM and probably harvested in more and more structured ways as a new wave of exploiting external reference data.

Doing “Social Master Data Management” will become an integrated part of customer master data management offering both opportunities for approaching a “single version of the truth” and some challenges in doing so.

Of course privacy is a big issue. Norms vary between countries, so do the legal rules. Norms vary between individuals and by the individuals as a private person and a business contact. Norms vary between industries and from company to company.

But the fact that 500,000,000 profiles have been created on Facebook in a very few years by people from all over the world shows that people are willing to share and that much information can be collected in the cloud. However, no one wants to be spammed by sharing, and indeed there have been some controversies around how data in Facebook is handled.

Anyway, I have no doubt that we will see fewer data entry clerks entering the same information in each company’s separate customer tables and that we will increasingly share our own master data attributes in the cloud.

Linked Data Quality

The concept of linked data within the semantic web is in my eyes a huge opportunity for getting data and information quality improvement done.

The premises for that are described on the page Data Quality 3.0.

Until now data quality has been largely defined as: Fit for purpose of use.

The problem however is that most data – not least master data – have multiple uses.

My thesis is that there is a breakeven point where, as more and more purposes are included, it becomes less cumbersome to reflect the real world object than to align fitness for all known purposes.

If we look at the different types of master data and what possibilities that may arise from linked data, this is what initially comes to my mind:

Location master data

Location data is among the data types already used the most on the web. Linking a hotel, a company, a house for sale and so on to a map is an immediate visual feature appealing to most people. Many databases around, however, have poor location data, for example inadequate postal addresses. The demand for making these data “mappable” will become nearly unavoidable, but fortunately the services for doing so with linked data will help.
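
A sketch of what “mappable” could mean in practice, with a tiny invented gazetteer standing in for a linked data service like geonames.org (the coordinates and lookup are illustrative only):

```python
# Hypothetical linked-data gazetteer, standing in for a cloud service.
GAZETTEER = {
    ("8000", "Aarhus", "DK"): (56.1572, 10.2107),
}

def make_mappable(postal_code: str, city: str, country: str):
    """Return coordinates if the address resolves against the reference
    source, or None if the record needs cleansing first."""
    return GAZETTEER.get((postal_code, city, country))

print(make_mappable("8000", "Aarhus", "DK"))   # resolves to coordinates
print(make_mappable("8000", "Aahrus", "DK"))   # typo: no link, flag for cleansing
```

The failed lookup is as valuable as the successful one: it pinpoints exactly which records have the inadequate addresses mentioned above.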

Hopefully increased open government data will help solve the data supply issue here.

Party master data

Linking party master data to external data sources is not new at all, but unfortunately not as widespread as it could be. The main obstacle until now has been smooth integration into business processes.

Having linked data describing real world entities on the web will make this game a whole lot easier.

Actually I’m working on implementations in this field right now.

Product master data

Traditionally the external data sources available for describing product master data have been few – and hard to find. But surely, a lot of data is already out there waiting to be found, categorized, matched and linked.

Business Directory Musings

This coming Sunday I will have worked professionally within Information Technology for 30 years. As I will be on a (well deserved!) vacation in Andalusia on Sunday, I’d better post my thoughts today.

I have had a lot of different positions and worked in a lot of different domains. The single subject I have worked with the most is business directories.

My first job was at the Danish Tax Authorities, and one of my assignments was being secretary to the committee working for a joint registration of companies in Denmark. Besides learning a lot about working in politically driven organizations and about aligning business and technology, I feel good about having been part of the start of building a public sector master data directory. Such directories are essential for an effective public administration and can be used as external reference data in private enterprises as a valuable means to improve data quality of business partner master data.

Later I worked a lot with improving data quality through matching solutions around business directories. This ranges from the Dun & Bradstreet WorldBase, holding nearly 170 million business entities from all over the world, over databases like the EuroContactPool, to national databases holding either all (available) businesses in a single country or given industry segments.

I guess I will also be spending some additional years integrating business directory information into business processes as smoothly as possible, preferably along with a range of other kinds of external reference data.

One of the new sources building up in the cloud in the realm of business directories is master data references in social networks. The LinkedIn Companies feature is a prominent example. Of course such directories have some data quality issues. This can be seen by looking at the companies where I currently work:

  • DM Partner A/S seems OK
  • Omikron Data Quality has 90 employees according to the company profile (filled out by yours truly). Then it’s strange that there are only 25 profiles in the network. But that’s because most employees are in Germany where the competing network called Xing is stronger.
  • Trapeze Group Europe has not been updated with a recent merger, and not all employees have updated their profiles accordingly yet. But I’m sure that will be done as time goes by.

I have no doubt though that including information from social networks will become a part of integrating business partner master data in my future.

Social Master Data Management

The term “Social CRM” has been around for a while. Just as traditional CRM (Customer Relationship Management) is heavily dependent on proper MDM (Master Data Management), enterprise wide social CRM will depend on a proper social MDM element in order to be a success.

The challenge in social MDM will be that we are not replacing the existing data sources for MDM; we are actually adding more sources and must handle the integration of these sources with the sources for traditional CRM and MDM and other new sources coming from the cloud.

Customer Master Data sources will expand to embrace:

  • Traditional data entry from field work like a sales representative entering prospect and customer master data as part of Sales Force Automation.
  • Data feed and data integration with external reference data like using a business directory. Such integration will increasingly take place in the cloud and the trend of governments releasing public sector data will add tremendously to this activity.
  • Self registration by prospects and customers via webforms.
  • Social media master data captured during social CRM and probably harvested in more and more structured ways.

Social media master data are found as profiles in services such as Facebook, mainly for business-to-consumer activities, LinkedIn, mainly for business-to-business activities, and Twitter somewhere in between. These are only some prominent examples of such services. Where LinkedIn may be dominant for professional use in English speaking countries and countries where English is widely spoken, such as Scandinavia and the Netherlands, other regions are far less penetrated by LinkedIn. For example, in German speaking countries the similar network service called Xing is much more crowded. So, when embracing global business you will have to acknowledge the diversity found in social network services.

A good way to integrate all these sources in business processes is using mashups. An example would be a mashup for entering customer data. If you are entering a business entity you may want to know:

  • What is already known in internal databases about that entity – either via a centralized MDM hub or throughout disparate databases?
  • Is the visit address correct according to public sector data?
  • How is the business account related to other business entities learned from a business directory?
  • Do we recognize the business contact in social networks – maybe we have had contact before in another relation?

If you are entering a consumer entity you may want to know:

  • Does that person already exist in our internal databases – as an individual and as a household?
  • What do we know about the residence address from public sector data?
  • Can we obtain additional data from phone book directories, nixie lists and whatever else is available, affordable and legal in the country in question?
  • How do we connect in social media?
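
The mashup idea above can be sketched as follows; every lookup here is a hypothetical stub standing in for a real internal or cloud service:

```python
# Each function is a stub for one panel in the data entry mashup.
def known_internally(name: str) -> bool:
    return name in {"Example Trading A/S"}           # MDM hub / internal DB stub

def validate_address(address: str) -> bool:
    return address.endswith("Denmark")               # public sector data stub

def directory_relations(name: str) -> list:
    return ["Example Holding A/S"]                   # business directory stub

def social_profiles(contact: str) -> list:
    return ["linkedin:" + contact.lower().replace(" ", "-")]  # social network stub

def onboarding_mashup(name: str, address: str, contact: str) -> dict:
    """Combine all sources into one view shown during data entry."""
    return {
        "already_known": known_internally(name),
        "address_valid": validate_address(address),
        "related_entities": directory_relations(name),
        "social_matches": social_profiles(contact),
    }

result = onboarding_mashup("Example Trading A/S", "Aarhus, Denmark", "John Doe")
print(result)
```

The design point is that the person doing the data entry sees all four answers in one screen, instead of having to query each source separately after the fact.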

Of course privacy is a big issue. Norms vary between countries, so do the legal rules. Norms vary between individuals and by the individuals as a private person and a business contact. Norms vary between industries and from company to company.

If aligning people, processes and technology didn’t matter before, it will when dealing with social master data management.
