Survey Data Laundering

There are a lot of different words for data quality improvement activities like data cleaning, data cleansing, data scrubbing and data hygiene.

Today I stumbled upon “data laundering” and the site http://www.datalaundering.com that is owned by an old colleague of mine from way back when we were doing stuff not focused on data quality.

Joseph specializes in laundering data from surveys. The issue is that surveys always have some unreliable responses, which lead to wrong conclusions, which in turn lead to wrong decisions. This is a trail well known in data and information quality.

Unreliable responses resemble outliers in business intelligence. These are responses from respondents that provide answers distant from the most conceivable result. What I like about the presentation of the business value is that the example is about food: what we say that we eat and what we actually consume. Then there is a lot of math and even an induction mechanism to support the proposition. Read all about it here.
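To make the outlier analogy concrete, here is a minimal sketch of flagging responses that sit far from the bulk of the answers. It is not the method used on the data laundering site; it is the commonly used modified z-score based on the median absolute deviation, and the function name, the sample answers and the 3.5 cutoff are all assumptions chosen for illustration.

```python
from statistics import median

def flag_unreliable(responses, threshold=3.5):
    """Return the indices of responses whose modified z-score exceeds the threshold.

    Illustrative only: the score uses the median and the median absolute
    deviation (MAD), so a few extreme answers do not distort the baseline.
    """
    med = median(responses)
    mad = median(abs(x - med) for x in responses)
    if mad == 0:
        # More than half of the answers are identical; this rule cannot rank the rest.
        return []
    return [i for i, x in enumerate(responses)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical survey answers: self-reported servings of vegetables per day.
servings = [2, 3, 2, 4, 3, 2, 25, 3]
print(flag_unreliable(servings))  # the answer 25 at index 6 is flagged
```

A flagged response is only a candidate for review. Whether it is unreliable or just unusual still takes judgment about the respondent and the question.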


Questions about Quebec

This is the third post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Is Quebec a country?

No. Quebec is a province of Canada. But it came close on 30/10/1995, when a referendum on sovereignty produced only a very slim majority against it. Quebec is the only province in Canada where French is the only official language.

What’s that date: 30/10/1995?

Besides having a different language, Quebec also uses a different date format than the rest of North America. Where North Americans write month-day-year (like 10/30/1995), Quebecers write day-month-year, as in most other parts of the world. I learned that from this blog post comment here.
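The trouble with the two conventions is that many dates are valid in both. As a small sketch (the function name and the second sample date are made up for illustration), you can try both readings of a slash-separated date and see how many survive:

```python
from datetime import datetime

def interpretations(text):
    """Return every valid reading of a slash-separated date string."""
    formats = (
        ("day-month-year", "%d/%m/%Y"),
        ("month-day-year", "%m/%d/%Y"),
    )
    readings = {}
    for label, fmt in formats:
        try:
            readings[label] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            # This reading is impossible, for example a month number above 12.
            pass
    return readings

print(interpretations("30/10/1995"))  # only the day-month-year reading is valid
print(interpretations("05/10/1995"))  # both readings are valid, so the value is ambiguous
```

The referendum date can only be read one way because there is no month 30. Any date with a day of 12 or lower needs knowledge about where the data came from.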

The North American multi-cultural sandbox

A lot of software, including tools for data quality and master data management, comes from North America. When the international (and non-English) capabilities of the software and related stuff are questioned, a good answer is always: Well, we did something in Quebec. Like here.

Previous Data Quality World Tour blog posts:


Do You Have an Official SnoopBook Account?

I have earlier written about how Facebook resembles a typical Business-to-Consumer customer table in the post Out of Facebook.

Like any customer table the Facebook member table will suffer from a number of different data quality issues like:

  • Some individuals are signed up more than once using different profiles.
  • Some individuals who created a profile are not among us anymore.
  • Some profiles are not an individual person, but a company or other form of establishment.

One type of the latter seems to be government bodies and other authorities who want to snoop into your daily whereabouts in order to see if you are paying the taxes you should and not receiving welfare services you shouldn’t.

Recently I read a story about a British woman who got jailed on such an account. Link here.

The story did not say whether the authorities used a special account for the investigation or the civil servants’ personal accounts.

This morning I read an article (in Danish) about the Danish tax authority’s activities in this field. They have realized that they have illegally used personal accounts for such activities, but have stopped that now. However, they will now create an account for the organization to be used for snooping.


Inside India

This is the second post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Cultural Diversity

India’s culture is marked by a high degree of syncretism and cultural pluralism. Every state and union territory has its own official languages, and the constitution also recognizes 21 languages.

National Identification Number for 1.2 Billion People

The government of India has initiated a program for assigning a unique citizen ID for the over 1.2 billion people living in India. The program called Aadhaar is the largest of that kind in the world.

A System Integration Superpower

Tata, Satyam, Infosys and Wipro are just some of the many mega system integrators within master data management and data quality with headquarters in India. Add to that the fact that companies like Cognizant and many others have most of their professionals based in India.


Does One Size Fit Anyone?

Following up on a recent post about data silos I have been thinking (and remembering) a bit about the idea that one company can have all master data stored in a single master data hub.

Supply Chain Musings

If you look at a manufacturer, for example, the procurement of raw materials is of course an important business process.

Besides purchasing raw materials the manufacturer also buys machinery, spare parts for the machinery and maintenance services for the machinery.

Like everyone else the manufacturer also buys office supplies, including such rare stuff as data quality tools and master data management consultancy.

If you look at the vendor table in such a company, the number of “supporting suppliers” is much higher than the number of essential suppliers of raw materials. The business processes, data structures and data quality metrics for on-boarding and maintaining supplier data and product data are “same same but very different” for these groups of suppliers and the product data involved.

Supply Chain Centric Selling

I remember one manufacturing client where a side function in procurement was selling by-products from production to a completely different audience than the customers for the finished products. They had a wonderful multi-domain data silo for that.

Hierarchical Customer Relations

A manufacturer may have a golden business rule saying that all sales of finished products go through channel partners. That will typically mean a modest number of customers in the basic sense of someone who pays you. Here you typically need a complex data structure and advanced workflows for business-to-business (B2B) customer relationship management.

Your channel partners will then have customers being either consumers (B2B2C) or business users within a wider range of companies. I have noticed an increasing interest in keeping some kind of track of the interaction with end users of your products, and I guess embracing social media will only add to that trend. The business processes, data structures and data quality metrics for doing that are “same same but very different” from your basic customer relationship management.

Conclusion

The above musings revolve around manufacturing companies, but I have met similar ranges of primary and secondary constructs related to master data management in all other industry verticals.

So, can all master data in a given company be handled in a single master data hub?

I think it’s possible, but it has to be an extremely flexible hub either having a lot of different built-in functionality or being open for integration with external services.


Check out the Czech Republic

This is the first post in a planned series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Companies all over

Last time I checked, the Czech Republic had the highest number of Duns Numbers (unique company IDs in the Dun & Bradstreet WorldBase) per capita in the world. I wonder whether this is because of very effective public sector registration, some special rules for incorporation, or duplicates.

Exonyms, endonyms and beers

Many Czech cities are known by their English exonyms (the name in English) but of course have a local endonym (the name in Czech). The capital Prague is Praha in Czech. The town Pilsen is called Plzeň in Czech, but there are several towns around the world called Pilsen – and then of course there is a sort of beer called pilsener. (České) Budějovice is Czech for Budweis in German and English. We are certainly talking beer here also.

Ataccama

The data quality and master data management firm Ataccama was founded in the Czech Republic.


Using X Factor in Data Quality

Lately I have been experimenting with the X Factor (or Idol) approach to data quality – and I must say, with very promising results.

The basic idea with the X Factor approach to data quality is that it is not about accuracy of data, but all about data appeal.

Data appeal is initially measured by a panel of judges in a data audition. Usually you have 3 or 4 judges, where at least one judge is unbelievably nice and friendly and at least one judge is extremely rude (aka honest). After a following rootcamp the surviving data records are knocked out one by one by the users until we have a golden record as the winner. A secret data steward is usually hosting the show. 

The great thing about the X Factor approach is that the so called “xingle version of the truth” doesn’t last very long. Soon we will have a new season where data is going through the same process again with a completely new golden record as the winner.

Wonder about what Simon says?   


Typos in the Cloud

On 1 January this year the second largest city in Denmark changed its name. It was only a minor change from “Århus” to “Aarhus” – replacing the Scandinavian letter Å with a double A, which is the normal conversion to the English alphabet.

Data quality would be a lot easier if people, companies and cities stopped changing names. It always goes wrong. First of all a lot of data will be out-of-sync. And then the change may go wrong.

That is what happened at Google Maps. They introduced a typo, so the name of the city on the map is now “Aahrus”, with the r and the h in the middle of the name swapped.
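Both steps in this story are easy to check in code. The sketch below is only an illustration with made-up function names: one helper does the Å to Aa conversion, and the other tells whether two names differ by exactly one swap of neighbouring letters, which is the kind of typo that crept into the map.

```python
def to_english_alphabet(name):
    """Replace the Scandinavian letter Å with a double A (illustrative helper)."""
    return name.replace("Å", "Aa").replace("å", "aa")

def is_adjacent_swap(a, b):
    """Return True if b is a with exactly one pair of neighbouring letters swapped."""
    if len(a) != len(b):
        return False
    # Positions where the two names disagree.
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )

new_name = to_english_alphabet("Århus")
print(new_name)                              # Aarhus
print(is_adjacent_swap(new_name, "Aahrus"))  # True
```

A check like this against the official name would catch the swapped letters, but only if the reference list itself has already been updated with the new name.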

For those out there not sure where on earth Århus/Aarhus/Aahrus is, it is the red dot in the upper right corner of the map, where you have London and Paris in the lower left corner.


Boiling Data Silos

Yesterday there were some blog posts dealing with data silos.

Graham Rhind posted: Data silos – learn to live with them.

Rob Karel posted: Stop trying to put a monetary value on data – it’s the wrong path. Though it was not the main subject, there was a remark saying: “Attempting to boil the ocean and trying to solve Customer, Product, or Financial data for all processes and decisions across the whole organization is too big an effort destined to fail before it starts”.

Mark Montgomery made a comment on Rob’s post saying: “I also have trouble with the boil the ocean metaphor, which is used too often these days to justify all kinds of protectionist policies in the enterprise. You can’t have it both ways in the enterprise– either you have data silos or you don’t, and I argue that increasingly the world cannot afford them, albeit in highly secure formats in most situations”.

I guess we have to go for the golden mean on this one also. We shouldn’t accept data silos, but we must expect them. We could aim to eliminate them, probably not in one big bang but slice by slice, as we climb up the levels in an information maturity model.

I would definitely expect to see fewer and smaller data silos at the top level of an information maturity model than on a bottom level of a data quality immaturity model.


Holistic Accuracy

In community economics you have two terms called

  • Partitive accuracy and
  • Holistic accuracy

In short, partitive accuracy is the accuracy of a single measure being part of a model while holistic accuracy is the accuracy of the model structure and its use. More information here.

I find these terms very useful in data quality and master data management as well.

The distinction between partitive accuracy and holistic accuracy resembles the distinction between data quality and information quality.

One problem with the term information quality is that it implies a certain context of use. That makes it hard to prepare data for high quality across multiple uses, other than by assuring the accuracy of the single data elements, which is similar to partitive accuracy.

One clue for assuring better information quality is looking at the model structure of data, which is similar to holistic accuracy. Here I am thinking beyond traditional data modeling, which is anchored in the technical world, and into how end users of master data hubs are able to build structures of data (with partitive accuracy) that fit the daily business use.

Examples of such holistic information capabilities in master data management are building flexible product hierarchies and hierarchies of party master data that at the same time reflect hierarchies in the real world, such as households and company family trees, and hierarchies of related accounts and addresses used within the enterprise.

While a single data element, such as an address component like a postal code, may be partitively accurate, holistic accuracy is about how data elements contribute as part of a data structure that fits multiple purposes of use.
