2010 predictions

Today this blog has been live for half a year, Christmas is just around the corner in countries with Christian cultural roots, and a new year – even a new decade – is closing in according to the Gregorian calendar.

It’s time for my 2010 predictions.

Football

Over at the Informatica blog Chris Boorman and Joe McKendrick are discussing who’s going to win next year’s largest sporting event: the football (soccer) World Cup. I don’t think England, the USA, Germany (or my team Denmark) will make it. Co-favourite Brazil takes the victory – and home team South Africa will go to the semi-finals.

Climate

Brazil and South Africa also had main roles in the recent Climate Summit in my hometown Copenhagen. Despite heavy executive buy-in, a very weak deal with no operational Key Performance Indicators was reached. Money was on the table – but assigned to reactive approaches.

Our hope of avoiding climate catastrophes now rests on national responsibility and technological improvements.

Data Quality

A reactive approach, lack of enterprise-wide responsibility and reliance on technological improvements are also well-known circumstances in the realm of data quality.

I think we will have to deal with this next year too. We have to get better at working under these conditions. That means being able to perform reactive projects faster and better while also implementing prevention upstream. Aligning people, processes and technology is as key as ever in doing that.

In my eyes, some areas where we will see improvements are:

  • Exploiting rich external reference data
  • International capabilities
  • Service orientation
  • Small business support
  • Human-like technology

The page Data Quality 2.0 has more content on these topics.

Merry Christmas and a Happy New Year.


Data Quality and Climate Change Management

A month ago I made a blog post titled “Data Quality and climate politics”. In this post I highlighted some similarities between data governance/data quality and climate politics, mainly focusing on why sometimes nothing is done.

Today, one day before the United Nations climate change summit commences in my hometown Copenhagen, it seems that executive buy-in has come through. Over 100 heads of state and government will attend the conference, among them key stakeholders such as Indian prime minister Singh and US president Obama.

The plan for how to manage climate change seems at this moment to share some ingredients with how to manage data quality change.

The bill

Related to my previous post Eugene Desyatnik commented on LinkedIn:

In both cases, everyone in their heart agrees it’s a noble cause, and sees how they can benefit — but in both cases, everyone also hopes someone else will pay for most of it.

Progress in fighting climate change seems closely tied to whether the rich countries can agree on paying a fair share.

With enterprise data quality you also can’t rely on one business unit paying for solving all enterprise-wide data quality issues related to common data domains.

Key Performance Indicators

Reductions in greenhouse gas emissions are key performance indicators and goals in fighting climate change – measuring temperatures is more like looking at the final outcome.

For data quality we also know that the business outcome is related to information in context, but in order to track improvement we have to measure (raw) data quality at the root.
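As a sketch of what measuring at the root can look like, here is a minimal field-level completeness KPI in Python (the records and fields are invented for illustration, not taken from any specific tool):

```python
# A minimal sketch of measuring raw data quality at the root:
# field-level completeness as a Key Performance Indicator.

def completeness(records, field):
    """Share of records where the given field is filled in."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records) if records else 0.0

# Invented sample records for illustration.
customers = [
    {"name": "Margaret Smith", "phone": "555-0101"},
    {"name": "John Smith", "phone": ""},
    {"name": "Peggy Smith"},
]

print(completeness(customers, "name"))   # 1.0
print(completeness(customers, "phone"))  # ≈ 0.33
```

Tracking such a number over time says more about improvement progress than looking only at downstream business outcomes.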

Using technology

This article from the BBC, “Tackling climate change with technology”, points at a wealth of different technologies that may help fight global warming while we still get the power we need. There are pros and cons for each. Some technologies work in some geographies but not in others. Some technologies are mature now and some will be in the future. There is no silver bullet but a range of different possibilities.

Very similar to data quality technology.

Santa Quality

On the 3rd of December I feel inspired to relate some data quality issues to Mr. Santa Claus – or what exactly is the name? Is it:

  • Saint Nicholas or
  • Père Noël as they say in French or
  • Weihnachtsmann as they say in German or
  • Julemand as we say in Denmark or
  • Plenty of other local names?

Santa Claus versus Saint Nicholas is an example of the use of nicknames, which is a major issue in name matching in many cultures.

It’s also important to observe that the German and Danish names are one word versus two words in English and French. Many company names and other names in the respective languages share the same linguistic characteristic.
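As an illustration of how nickname handling might enter a name match, here is a small sketch (the synonym table and the canonicalisation approach are my own simplified assumptions, not how any particular matching tool works):

```python
# A sketch of nickname-aware name matching: local names for Santa Claus
# are mapped to one canonical identity before comparing.

SYNONYMS = {
    "saint nicholas": "santa claus",
    "père noël": "santa claus",
    "weihnachtsmann": "santa claus",
    "julemand": "santa claus",
    "father christmas": "santa claus",
}

def canonical(name):
    """Normalize a name and map known synonyms to a canonical form."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)

def same_person(a, b):
    return canonical(a) == canonical(b)

print(same_person("Julemand", "Père Noël"))  # True
print(same_person("Santa Claus", "Grinch"))  # False
```

Real matching engines combine such synonym tables with fuzzy comparison, but the principle is the same.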

Father Christmas is an alternative identification maybe more being a job title.

Another question is where he lives.

The North Pole is acknowledged as the correct geographical address in Anglo countries – but there seem to be alternative mailing possibilities such as:

  • Santa Claus, North Pole, Canada, H0H 0H0
  • Father Christmas, North Pole, SAN TA1 (UK)

However, the Finnish claim the valid address to be:

In my home country Denmark we will accept nothing but:

  • Julemanden, Box 1615, 3900 Nuuk, Greenland

Finally, I can imagine some data quality issues the Santa business has to face:

  • Too many duplicates on the “nice list” leading to heavy overhead in gift spending as well as extra costs in reindeer management.
  • Inaccurate product masters resulting in complaints from nice boys and girls and a lot of scrap and rework.
  • Fraud entries from children already on the ‘naughty list’ may be a challenge.
  • A lot of missing chimney positions may cause severe delivery problems.

But then, why should Santa be smarter than everyone else?


Data Quality and Climate Politics

In one month and one day the United Nations Climate Change Conference commences in my hometown Copenhagen. Here the people of the Earth will decide if we want to save the planet now or wait a while and see what happens.

The Data Quality issue might seem of little importance compared to the climate issue. Nevertheless I have been thinking about some similarities between Data Governance/ Data Quality and climate politics.

It goes like this:

CEO buy-in

It’s often said that CEOs don’t buy in to data quality improvements because it’s a loser’s game. In climate politics the CEOs are the heads of state. It’s still a question how many heads of state will attend the Copenhagen conference. There is a great deal of attention around whether United States president Barack Obama will attend. His last visit to Copenhagen in early October didn’t turn out to be a success, as his recommendation of Chicago as Olympic host city was fruitless. I guess he will only come again if success is very likely.

Personal agendas  

On the other hand, British Prime Minister Gordon Brown has urged all world leaders to come to Copenhagen. While I think this is great for the success of the conference, I also have a personal reason to think it’s a very bad idea. Having all the world’s heads of state driving around the Copenhagen streets surrounded by hordes of police bikes will create traffic jams interfering with my daily work and, more seriously, my Christmas shopping.

It’s no secret that much of the climate problem is caused by us as individuals not being more careful about our energy consumption in daily routines. Data quality suffers in the same way from individuals not thinking ahead but focusing on getting daily work done as quickly and comfortably as possible.

The business perspective

My fellow countryman Bjørn Lomborg is a prominent proponent of the view that we should focus more on battling starvation, diseases and other evils, because resources will be spent more effectively there than on the marginal effects the same resources would have in fighting climate change.

Data quality improvement is often omitted from Business Process Reengineering when the scope of these initiatives is prioritized toward worthy, measurable short-term wins.

Final words

My hope for my planet – and my profession – is that we are able to look ahead and do what is best for the future while we take personal responsibility and care in our daily work and life.


Slowly Changing Hierarchies

The term “slowly changing dimensions” is known from building data warehouses and attempting to make sense of data with business intelligence using reference data.

The fact that the world is changing all the time is also present when we look at Master Data Management and the essential hierarchy building that takes place when structuring these data.

Company family trees are a common hierarchy structure in Master Data. One source of information about company family trees is the D&B Worldbase – a database operated by Dun & Bradstreet holding over 150 million business entities from all over the world.

I used to have Dun & Bradstreet as a customer. I don’t anymore – but I’m still working on the very same project. Since I started this assignment, US-based Dun & Bradstreet handed over the operation in a range of European countries to the Swedish publishing group Bonnier, who later handed it over to the Swedish company Bisnode. I started the project when I worked for the Swedish consultancy group Sigma, continued in my Danish sole proprietorship and now serve Bisnode through the German data quality tool vendor Omikron. Slowly changing relationships indeed.

As with many other activities in the realm of data quality, establishing the “golden view” – the “single version of the truth” – is only the beginning. If that “golden view” is not put under ongoing maintenance, the shiny gold will fade – slowly but steadily.
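A classic way of keeping such a changing hierarchy honest is the type 2 “slowly changing dimension”: instead of overwriting a record, close the old version and add a new one so history such as former names survives. A minimal sketch (the company names and dates are invented for illustration):

```python
# A minimal type-2 slowly changing dimension sketch: updates never
# overwrite; they close the current row and append a new current row.

from datetime import date

def update_dimension(history, key, new_attrs, as_of):
    """Close the current row for `key` and append a new current row."""
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            row["valid_to"] = as_of
    history.append({"key": key, **new_attrs,
                    "valid_from": as_of, "valid_to": None})

history = [{"key": "DUNS-1", "name": "Local Charity",
            "valid_from": date(2005, 1, 1), "valid_to": None}]
update_dimension(history, "DUNS-1", {"name": "Anytown Angels"},
                 date(2009, 12, 1))

current = [r for r in history if r["valid_to"] is None]
print(current[0]["name"])  # Anytown Angels
print(len(history))        # 2 – the former name is kept as history
```

The same pattern applies whether the dimension holds company family trees or any other hierarchy built on master data.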


Man versus Computer. Special Edition.

Following up on my previous post on Man versus Computer, I am actually reminded most workday mornings about how man sucks.

Most workday mornings I leave home in my car heading into the following traffic:

  • A 4-lane motorway rolling in from southern Copenhagen, the rest of Denmark, Germany and ultimately the rest of Eurasia.
  • A 5th lane coming in from a local area.

These 5 lanes then split into:

  • 2 lanes heading for the Danish answer to Silicon Valley (called Ballerup)
  • 3 lanes leading to downtown Copenhagen or the main fair (called Bella Center), the airport, Sweden and the rest of Scandinavia.

Of course you would expect some mingling here. What happens every morning is rather a complete stop in traffic, and the cause is not the merging and splitting but humans being the drivers:

  • Experienced local selfish drivers staying in the fastest lane until they suddenly want to switch lanes according to their route.
  • Inexperienced (in this area) foreign drivers coming up from crowded central Europe in search of tranquility deep in the Swedish forests, having no clue where to position themselves in this intersection. The same goes for Swedes returning for the opposite reason.
  • Everyone else having fun blocking the switching from the selfish types and the foreign ones, who should know better than to pass through in rush hour.

Some solutions to this problem might be:

  • Change management teaching people better driving habits.
  • Onboard computers in every car taking care of lane positioning. It should go smoothly splitting 5 lanes into 2 + 3 lanes.

Now I am waiting to see which solution will be implemented first.

Master Data Survivorship

A Master Data initiative is often described as making a “golden view” of all Master Data records held by an organization in various databases used by different applications serving a range of business units.

In doing that (either in the initial consolidation or the ongoing insertion and update) you will time and again encounter situations where two versions of the same element must be merged into one version of the truth.

In some MDM hub styles the decision is taken at consolidation time; in other styles the decision is postponed until the data (links) are consumed in a given context.

In the following I will talk about Party Master Data, the most common entity in Master Data initiatives.

This spring Jim Harris made a brilliant series of articles on DataQualityPro on the subject of identifying duplicate customers, ending with part number 5 dealing with survivorship. Here Jim describes all the basic considerations on how some data elements survive a merge/purge while others are forgotten, and gives good examples with US consumers/citizens.

Taking it from there Master Data projects may have the following additional challenges and opportunities:

  • Global Data adds diversity to the rule set for consolidating data on record level as well as field level. You will have to compromise between simple global rules and complex optimized rules (and supporting knowledge data) for each country/culture.
  • Multiple types of Party Master Data must be handled when Business Partners include business entities with departments and employees, and not least when these are present together with consumers/citizens.
  • External Reference Data is becoming more and more common as part of MDM solutions, adding valid, accurate and complete information about Business Partners. Here you have to set rules (on field level) for whether it overrides internal data, fills in the blanks or only supplements internal data.
  • Hierarchy building is closely related to survivorship. Rules may be set for whether two entities go into two hierarchies with surviving parts from both or merge as one with survivorship. Even an original entity may be split into two hierarchies with surviving parts.

What is essential in survivorship is not losing any valuable information while not creating information redundancy.

An example of complex survivorship processing may be this:

A membership database holds the following record (Name, Address, City):

  • Margaret & John Smith, 1 Main Street, Anytown

An eShop system has the following accounts (Name, Address, Place):

  • Mrs Margaret Smith, 1 Main Str, Anytown
  • Peggy Smith, 1 Main Street, Anytown
  • Local Charity c/o Margaret Smith, 1 Main Str, Anytown

A complex process of consolidation including survivorship may take place. As part of this example the company Local Charity is matched with an external source telling it has a new name being Anytown Angels. The result may be this “golden view”:

ADDRESS in Anytown on Main Street no 1 having
• HOUSEHOLD having
– CONSUMER Mrs. Margaret Smith aka Peggy
– CONSUMER Mr. John Smith
• BUSINESS Anytown Angels having
– EMPLOYEE Mrs. Margaret Smith aka Peggy

Observe that everything survives in a global applicable structure in a fit hierarchy reflecting local rules handling multiple types of party entities using external reference data.

But OK, we didn’t have funny names, dirt, misplaced data…
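The field-level part of such survivorship might be sketched like this (the records, fields and rule names are my own illustration, not a particular MDM product’s API):

```python
# A sketch of field-level survivorship: when an internal record meets
# external reference data, each field survives according to a rule,
# e.g. 'override' (external wins) or 'fill' (external only fills blanks).

def merge_records(internal, external, rules):
    """Merge field by field; `rules` maps field -> 'override' | 'fill'."""
    golden = dict(internal)
    for field, rule in rules.items():
        ext = external.get(field)
        if ext and (rule == "override" or not golden.get(field)):
            golden[field] = ext
    return golden

internal = {"name": "Local Charity", "address": "1 Main Str",
            "phone": "555-0101"}
external = {"name": "Anytown Angels", "address": "1 Main Street",
            "phone": ""}
rules = {"name": "override", "address": "override", "phone": "fill"}

print(merge_records(internal, external, rules))
# {'name': 'Anytown Angels', 'address': '1 Main Street', 'phone': '555-0101'}
```

Here the external source wins on name and address, while the internal phone number survives because the external field is empty.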


Sweden meets United States


Finding duplicate customers may be a very different task depending on which country you are from and which country the data originates from.

Besides all the various character sets, naming traditions and address formats, the alternative possibilities with external reference data also make some things easy – and other things very hard.

Most technology, descriptions and presented examples around are from the United States.

But say you are a Swedish company with Swedish persons in your database, and among those these two rows (name, address, postal code and city):

  • Oluf Palme, Sveagatan 67, 10001 Stockholm
  • Oluf Palme, Savegatan 76, 10001 Stockholm

What you do is plug into the government-provided citizen master data hub and ask for a match. The outcome can be:

  • The same citizen ID is returned because the person has relocated. It’s a duplicate.
  • Two different citizen IDs are returned. It’s not a duplicate.
  • Either only one or no citizen ID is returned. Leave it or do fuzzy matching.

If you go for fuzzy matching you had better be good, because all the easy ones are handled and you are left with the ones where false positives and false negatives are most likely. Often you will only do fuzzy matching if you have phone numbers, email addresses or other data to support the match.
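The lookup-then-fuzzy flow could be sketched as below. The hub lookup is mocked with a dictionary – I don’t know the real Swedish service’s API – and the fuzzy fallback uses the Python standard library:

```python
# A sketch of citizen-hub lookup with a fuzzy-matching fallback.
# The hub and its IDs are invented stand-ins for the real service.

from difflib import SequenceMatcher

# Hypothetical hub responses: (name, address) -> citizen ID.
HUB = {
    ("Oluf Palme", "Sveagatan 67"): "CIT-001",
    ("Oluf Palme", "Savegatan 76"): "CIT-001",  # same person, relocated
}

def is_duplicate(row_a, row_b, threshold=0.85):
    id_a, id_b = HUB.get(row_a), HUB.get(row_b)
    if id_a and id_b:
        return id_a == id_b  # authoritative answer from the hub
    # Fallback: fuzzy comparison of name + address strings.
    score = SequenceMatcher(None, " ".join(row_a), " ".join(row_b)).ratio()
    return score >= threshold

print(is_duplicate(("Oluf Palme", "Sveagatan 67"),
                   ("Oluf Palme", "Savegatan 76")))  # True – same citizen ID
```

The threshold is deliberately high: as noted above, the pairs left over after the hub lookup are exactly the ones where false positives are most likely.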

Another angle is that it is almost only Swedish companies who use this service with the government-provided reference data – but anyone with Swedish data may use it upon approval.

Data quality solutions for party master data are not only about fuzzy matching but also about integrating with external reference data, exploiting all the various worldwide possibilities and supporting the logic and logistics in doing that. Also, we know that upstream prevention as close to the root as possible is better than downstream cleansing.

Deployment of such features as composable SOA components is described in a previous post here.

Master Data meets the Customer

In the old days Master Data was predominantly created, maintained and used by the staff of the organisation holding these data. In many cases this is not so anymore. Besides exchanging data with partners in doing business, today the customer – and the prospect – has become an important person to consider when doing Data Governance and implementing technology around Master Data.

In the online world the customer works with your Master Data when:

  • The customer creates and maintains name, address and communication information by using registration functions
  • The customer searches for and reads product information on web shops and information sites

Having prospects and customers help with the name and address (party) data is apparently great news for lowering costs in the organisation. But in the long run you have got yourself another data silo, and your Data Quality issues have become yet more challenging.

The first thing to do is to optimise your registration forms. An important thing to consider here is that online is worldwide (unless you restrict your site to visitors from a single country). When doing business online with multinational customers, take care that the sequence, formats and labels are useful to everyone and that mandatory checks and other validations are in line with the rules for the country in question.
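A minimal sketch of such country-aware validation (the patterns below cover only a few countries and are deliberately simplified, not authoritative postal specifications):

```python
# A sketch of country-specific postal code validation for a
# registration form. Patterns are simplified illustrations.

import re

POSTAL_PATTERNS = {
    "DK": r"\d{4}",                          # Denmark: 4 digits
    "SE": r"\d{3} ?\d{2}",                   # Sweden: 5 digits, optional space
    "US": r"\d{5}(-\d{4})?",                 # US ZIP / ZIP+4
    "GB": r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}",  # UK, simplified
}

def validate_postal(country, code):
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        return True  # no rule for this country: don't block registration
    return re.fullmatch(pattern, code.strip().upper()) is not None

print(validate_postal("DK", "2100"))  # True
print(validate_postal("US", "2100"))  # False
```

The default of accepting unknown countries reflects the point above: a mandatory check that is wrong for a visitor’s country is worse than no check at all.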

External reference data may be used for lookup and validation integrated in the registration forms.

The concept of “one version of the truth” is a core element in most Master Data Management solutions. Doing deduplication within online registration has privacy considerations. When asking for personal data you can’t prompt “Possible duplicate found” and then present the data about someone else. Here you need more than one data quality firewall.

Many organisations are not just either offline or online but are operating in both worlds. To maintain the 360 degree view on customer in this situation you need strong data matching techniques capable of working with offline and online captured data. As the business case for online registration is very much about reducing staff involvement, this is about using technology and keeping human interaction to a minimum.

Search and navigation

When a prospect comes to your site and tries to find information about your products, the first thing they do is very often use the search function. From deduplication of names and addresses we know that spelling is difficult and that we sometimes use synonyms other than those used in the Master Data descriptions. Add to that the multi-cultural aspect. The solution here is to use the same fuzzy search techniques that we use for data matching. This is a kind of reuse. I like that.
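As a sketch of that reuse, the standard library’s fuzzy matching can drive a typo-tolerant product search (the product names are invented for illustration):

```python
# A sketch of reusing fuzzy matching techniques for product search.

from difflib import get_close_matches

PRODUCTS = ["garden hose", "garden gnome", "hose clamp", "lawn mower"]

def fuzzy_search(query, catalogue, n=3, cutoff=0.6):
    """Return up to n catalogue entries similar to the (misspelled) query."""
    return get_close_matches(query.lower(), catalogue, n=n, cutoff=cutoff)

print(fuzzy_search("gardn hose", PRODUCTS)[0])  # garden hose
```

A production search engine would add synonym dictionaries and multi-lingual normalization, but the core comparison is the same one used in data matching.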


The art of Business Directory Matching

A business directory is a list of companies in a given area and perhaps a given industry. One very useful type of such directory, related to data quality, is a list of all companies in a given country. In many countries the authorities maintain such a list; in other places it’s a matter of assembling local lists or other forms of data capture. Many private service providers offer such lists, often with added information value of different kinds.

If you take the customer/prospect master table from an enterprise doing B2B in a given country, one would believe that the rows in that table would match 100% to the business directory of that country. I am not talking about all data being spelled exactly as in the directory, but “only” about the same real-world object being reflected.

During many years of providing solutions for business directory matching and tuning them, as well as handling such match services from colleagues in the business, I have very, very seldom seen a 100% match – even 90% matches are very rare.

Why is that so? Some of the reasons – related to the classic data quality dimensions – I have stumbled over have been:

Completeness of business directories varies from country to country and between the lists provided by vendors. Some countries, like those of the old Czechoslovakia, some English-speaking countries in the Pacific, the Nordics and others, have tight registration, while it is less tight in countries in North America, other European countries and the rest of the world.

Actuality in business directories also differs a lot. It is also important whether the business directory covers dissolved entities and includes history tracking like former names and addresses. Then take the actuality of the customer/prospect table to be matched, and once again the time dimension has a lot to say.

Validity, accuracy and consistency, both concerning the directory and the table to be matched, are natural causes of mismatches. Also, many B2B customer/prospect tables hold a lot of entities that are not formal business entities but many other types of party master data.

Uniqueness may be defined differently in the directory and in the table to be matched. This includes the perception of hierarchies of legal entities and branches – not least governmental and local authority bodies are a fuzzy crowd. Different roles, such as that of a small business owner, also make for challenges. The same is true of franchisee roles and the use of trading styles.

Then, of course, the applied automated match technique and the human interaction executed are factors in the resulting match rate and in the quality of the match, measured as the frequency of false positives.
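Those two resulting figures might be computed like this (all the numbers below are invented for illustration; false positives are typically estimated from a manually reviewed sample of the matches):

```python
# A sketch of the two key figures in directory matching:
# match rate, and false-positive frequency from a reviewed sample.

def match_rate(matched, total):
    return matched / total if total else 0.0

def false_positive_rate(reviewed_matches, wrong_matches):
    return wrong_matches / reviewed_matches if reviewed_matches else 0.0

# Say 10,000 B2B prospect rows, 8,200 matched to the directory,
# and a review of 500 matches found 15 pointing at the wrong company.
print(f"match rate: {match_rate(8200, 10000):.0%}")            # 82%
print(f"false positives: {false_positive_rate(500, 15):.1%}")  # 3.0%
```

Reporting both numbers matters: a tuning change that raises the match rate while silently raising the false-positive frequency is not an improvement.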