Bad word?: Data Owner

When reading a recent excellent blog post called “How to Assign a Data Owner” by Rayk Fenske I once again came to think about how I dislike the word owner in “Data Owner” and “Data Ownership”.

I am not alone. Recently Milan Kucera expressed the same feelings on DataQualityPro. I also remember that Paul Woodward from British Airways said at MDM Summit Europe 2009: Data is owned by the entire company – not any individuals.

My thoughts are:

  • Owner is a good word where we strive for fit for a single purpose of use in one silo
  • Owner may be a word of choice where we strive for fit for single purposes of use in several silos
  • Owner is a bad word where we strive for fit for multiple purposes of use in several silos

Well, I of course don’t expect all the issues raised by Rayk will disappear if we are able to find a better term than “Data Owner”.

Nevertheless I will welcome better suggestions for a term that covers what is really meant by “Data Ownership”.


Under new Master Data Management

“Under new management” is a common sign in the window of a restaurant. The purpose of the sign is to tell: Yes, we know: Really bad food was served in a really bad way here. But from now on we have a new management dedicated to serving really good food in a really good way.

By the way: Restaurants are one of the more challenging business entities to handle in Party Master Data Management:

  • They do change owner more often than most other business entities making them a new legal entity each time which is important for some business contexts like credit risk.
  • On the other hand it’s the same address despite a new owner, which makes it the same entity in the eyes of other business contexts like logistics.
  • In many cases you may have a name (trade style) of the restaurant and another official name of the business – a variant of this is when the restaurant is franchised.
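
One way to picture the three points above is to keep the legal entity, the location and the trade name as separate pieces of data. The following is only a minimal sketch in Python. All class names, field names and sample values are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LegalEntity:
    """The registered business: what a credit risk view cares about."""
    registration_id: str  # hypothetical public registry number
    official_name: str


@dataclass(frozen=True)
class Location:
    """The physical address: what a logistics view cares about."""
    address: str


@dataclass(frozen=True)
class Restaurant:
    """A restaurant as seen in party master data at one point in time."""
    trade_name: str
    location: Location
    operator: LegalEntity
    franchisor: Optional[LegalEntity] = None  # set when the restaurant is franchised


def same_for_credit_risk(a: Restaurant, b: Restaurant) -> bool:
    # A new owner means a new legal entity.
    return a.operator == b.operator


def same_for_logistics(a: Restaurant, b: Restaurant) -> bool:
    # Deliveries still go to the same address.
    return a.location == b.location


site = Location("1 Harbour Street")
before = Restaurant("The Golden Fork", site, LegalEntity("REG-001", "Old Owner Ltd"))
after = Restaurant("The Golden Fork", site, LegalEntity("REG-002", "New Owner Ltd"))

print(same_for_credit_risk(before, after))  # False
print(same_for_logistics(before, after))    # True
```

The same sign in the window thus gives two different answers to the question "is this the same party?", depending on the business context asking.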

Master Data Management is not trivial – serving restaurants or not.

Improving Master Data Management starts with the sign in the window: Yes, we know: Really bad information was served here in a really bad way. But from now on we have a new master data management dedicated to serving really good information in a really good way.

Then you may have a look at the menu. Do we have the right mix of menu items for the guests we like to serve? How are we going to govern a steady flow of fresh raw data that’s going to be prepared and selected from the menu and end up at the tables?

What about the waiters’ attitude? Serving is much more fun if you are proud of the dishes coming from the kitchen. It’s pleasant to bring compliments from guests back to the kitchen – not least when given along with great tips.

The information chef has to be very much concerned about the raw data quality and the tools available for what may be similar to rinsing, slicing, mixing and boiling food.

Bon appetit.


The Myth about a Myth

A sentiment repeated again and again related to Data (Information) Quality improvement goes like this:

“It’s a myth that Data Quality improvement is all about technology”.

In fact you see the same related to a lot of other disciplines as:

  • “It’s a myth that Master Data Management is all about technology”.
  • “It’s a myth that Business Intelligence is all about technology”.
  • “It’s a myth that Customer Relationship Management is all about technology”.

I have a problem with that: I have never heard anyone say that DQ/MDM/BI/CRM… is all about technology and I have never seen anyone writing so.

When I make the above remark the reaction is almost always this:

“Of course not, but I have seen a lot of projects carried out as if they were all about technology – and of course they failed”.

Unquestionably true.

But the next question is then about root cause. Why did those projects seem to be all about technology? I think it was:

  • Poor project management or
  • Bad balance between business and IT involvement or
  • Immature technology alienating business users.

In my eyes there is no myth that Data Quality (and a lot of other things) is all about technology. It’s a myth that it’s a myth.


Bon Appetit

If I enjoy a restaurant meal it is basically unimportant to me which raw ingredients from where were used and which tools the chef used in preparing the meal. My concerns are whether the taste meets my expectations, the plate looks delicious in my eyes, the waiter seems nice and so on.

This is comparable to when we talk about information quality. The raw data quality and the tools available for exposing the data as tasty information in a given context are basically not important to the information consumer.

But in the daily work you and I may be the information chef. In that position we have to be very much concerned about the raw data quality and the tools available for what may be similar to rinsing, slicing, mixing and boiling food.

Let’s look at some analogies.

Best before

Fresh raw ingredients are similar to up-to-date raw data. Raw data also has a best before date depending on the nature of the data. Raw data older than that date may be spiced up but will eventually make bad tasting information.
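
A best before date for raw data could be sketched as below. This is just an illustration in Python. The kinds of data and their shelf lives are made-up assumptions, and real values will depend on the nature of the data.

```python
from datetime import date, timedelta

# Hypothetical shelf lives per kind of data; None means "does not expire".
BEST_BEFORE = {
    "birth_date": None,
    "postal_address": timedelta(days=365),
    "phone_number": timedelta(days=180),
}


def is_fresh(kind: str, verified_on: date, today: date) -> bool:
    """True if the data element is still within its best before period."""
    shelf_life = BEST_BEFORE[kind]
    return shelf_life is None or today - verified_on <= shelf_life


today = date(2009, 12, 6)
print(is_fresh("postal_address", date(2009, 3, 1), today))  # True
print(is_fresh("phone_number", date(2009, 3, 1), today))    # False
print(is_fresh("birth_date", date(1990, 1, 1), today))      # True
```

The sketch assumes that each data element carries the date it was last verified. Without that date there is nothing to compare the shelf life against.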

One-stop-shopping

Buying all your raw ingredients and tools for preparing food – or taking the shortcut with ready made cookie cutting stuff – from a huge supermarket is fast and easy (and then never mind the basket usually also is filled with a lot of other products not on the shopping list).

A good chef always selects the raw ingredients from the best specialized suppliers and uses what he considers the most professional tools in the preparing process.

Making information from raw data has the same options.

Compliance

Governments around the world have long implemented regulations and inspections regarding food, mainly focused on receiving, handling and storing raw ingredients.

The same is now going on regarding data. Regulations and inspections will naturally be directed at data as it is originated, stored and handled.

Diversity

Have you ever tried to prepare your favorite national meal in a foreign country?

Many times this is not straightforward. Some raw ingredients are simply not available and even some tools may not be among the kitchen equipment.

When making information from raw data under varying international conditions you often face the same kind of challenges.

Data Quality and Climate Change Management

A month ago I made a blog post titled “Data Quality and climate politics”. In this post I highlighted some similarities between data governance / data quality and climate politics mainly focussing on why sometimes nothing is done.

Today, 1 day before the United Nations climate change summit commences in my hometown Copenhagen, it seems that executive buy-in has come through. Over 100 heads of state and government will attend the conference, among them key stakeholders such as Indian prime minister Singh and US president Obama.

The plan for how to manage climate change seems at this moment to have some ingredients with similarities to how to manage data quality change.  

The bill

Related to my previous post Eugene Desyatnik commented on LinkedIn:

In both cases, everyone in their heart agrees it’s a noble cause, and sees how they can benefit — but in both cases, everyone also hopes someone else will pay for most of it.

Progress in fighting climate change seems to be closely tied to the rich countries seeming to agree on paying a fair share.

With enterprise data quality you also can’t rely on one business unit paying for solving all enterprise wide data quality issues related to common data domains.

Key Performance Indicators

Reductions in greenhouse gas emissions are key performance indicators and goals in fighting climate change – measuring temperatures is more like looking at the final outcome.

For data quality we also know that the business outcome is related to information in context, but in order to track progress in improvement we have to measure (raw) data quality at the root.

Using technology

This article from BBC, “Tackling climate change with technology”, points at a wealth of different technologies that may help fight global warming while we still get the power we need. There are pros and cons for each. Some technologies work in some geographies but not in others. Some technologies are mature now and some will be in the future. There is no silver bullet but a range of different possibilities.

Very similar to data quality technology.

55 reasons to improve data quality

The business value in data quality improvement is an ever recurring topic in the realm of data quality.

In the following I will list the first 55 reasons that come to my mind for improving data quality related to the single most frequent data quality issue around, which is duplicates (and unresolved hierarchies) in party master data – names and addresses.

It goes like this:

1.  It’s a waste of money sending the same printed material twice or more times to the same individual consumer.

2.  Allowing the same customer to enter twice or more times for an introduction offer challenges the return on investment in such campaigns.

3.  When measuring churn and win-back two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

4.  Sending the same promotion eMail twice or more times to the same individual consumer looks like spam even if different eMail addresses are used. Spam has more offending than selling power.

5.  It’s probably a waste of money sending the same printed material with presentation and offerings to a household already having a customer.

6.  Assigning different credit terms for two or more unrelated accounts for the same business hierarchy will make uncontrolled financial risk.

7.  When measuring cross selling results two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

8.  When measuring life time value two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

9.  It’s probably a waste of money sending the same printed material twice or more times to the same household.

10.  When measuring life time value two or more unrelated accounts for the same individual being a consumer and a business owner will produce an incomplete result leading to a wrong decision.

11.  When wanting a 1-1 dialogue two or more unrelated accounts for the same individual consumer will not lead to a 1-1 dialogue.

12.  Having companies represented in two or more unrelated accounts for the same company with a different line-of-business assigned will produce an incomplete segmentation.

13.  When trying to point at your best customers being households in order to find similar households two or more unrelated accounts for the same household will produce an incomplete segmentation.

14.  When measuring cross selling results two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

15.  It’s a waste of money sending printed material with presentation and offerings to an individual consumer already being a customer.

16.  When wanting a 1-1 dialogue two or more unrelated accounts for the same business hierarchy will not lead to a complete 1-1 dialogue.

17.  When measuring life time value two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

18.  Assigning different credit terms for two or more unrelated accounts for the same individual consumer will increase financial risk.

19.  When measuring cross selling results two or more unrelated accounts for the same individual being a consumer and a business owner will produce only an incoherent result leading to a wrong decision.

20.  When wanting a 1-1 dialogue two or more unrelated accounts for the same household will not lead to a true 1-1 dialogue.

21.  Assigning different credit terms for two or more unrelated accounts for the same business entity could increase financial risk.

22.  Having activities related to companies attached to two or more unrelated accounts for the same company will show an incomplete customer history with the risk of taking damaging actions.

23.  It’s a waste of money and credibility sending printed material with presentation and offerings to an individual business decision maker in a business entity already being a customer.

24.  When buying from a supplier having two or more unrelated accounts despite being the same business entity you may miss discount opportunities.

25.  Having companies represented in two or more unrelated accounts for the same company with a different lead source assigned will produce a false measure of marketing and sales performance.

26.  Sending the same promotion eMail or newsletter twice or more times to the same individual business decision maker looks like spam even if different eMail addresses are used. Spam has more offending than selling power.

27.  When measuring churn and win-back two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

28.  Having activities related to influencers attached to two or more unrelated business contact records for the same person will show an incomplete business partner history with the risk of retaking already made actions.

29.  When buying from a supplier having two or more unrelated accounts despite their belonging to the same business hierarchy you could miss discount opportunities.

30.  Having activities related to households attached to two or more unrelated accounts for the same household will show an incomplete customer history with the risk of taking insufficient actions.

31.  When trying to point at your best customers being individual consumers in order to find similar individuals two or more unrelated accounts for the same individual consumer will produce a wrong segmentation.

32.  Having companies represented in two or more unrelated accounts for the same company with a different address assigned will produce an incomplete segmentation.

33.  When measuring life time value two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

34.  Having activities related to decision makers in companies attached to two or more unrelated contacts for the same person will show an incomplete customer contact history with the risk of not taking appropriate actions.

35.  When wanting a 1-1 dialogue two or more unrelated accounts for the same business entity will not lead to a real 1-1 dialogue.

36.  When trying to point at your best customers being companies in order to find similar companies two or more unrelated accounts for the same company will produce a false segmentation.

37.  Maintaining data related to two or more unrelated accounts for the same real world entity will probably be more costly than necessary when exploiting external reference data.

38.  It’s probably a waste of money sending printed material with presentation and offerings to a business entity already being a customer at a higher or lower hierarchy level.

39.  Having individual consumers represented in two or more unrelated accounts for the same individual consumer with a different lead source assigned will produce a wrong measure of marketing and sales performance.

40.  Allowing the same customer to re-enter for an offer already turned down (e.g. credit services) will create unnecessary double validation work.

41.  When measuring churn and win-back two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

42.  When wanting a 1-1 dialogue two or more unrelated accounts for the same individual being a consumer and a business owner will not lead to a sensible 1-1 dialogue.

43.  When measuring cross selling results two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

44.  Having activities related to individual consumers attached to two or more unrelated accounts for the same individual consumer will show an incomplete customer history with the risk of taking wrong actions.

45.  When measuring life time value two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

46.  Having activities related to customers attached to two or more unrelated accounts for the same real world entity may lead to different sales representatives working against each other.

47.  Allowing sales representatives to create new accounts for already existing customers may create time consuming commission disputes.

48.  Having households represented in two or more unrelated accounts for the same household with a different lead source assigned will produce an incomplete measure of marketing and sales performance.

49.  Maintaining data related to two or more unrelated accounts for the same real world entity will consume more manual work than necessary.

50.  When measuring churn and win-back two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

51.  When buying from a supplier having two or more unrelated accounts despite being the same business entity you may have multiple unnecessary inventory costs.

52.  It’s a waste of money and credibility sending the same printed material twice or more times to the same individual business decision maker.

53.  When measuring churn and win-back two or more unrelated accounts for the same individual being a consumer and a business owner will produce only an incoherent result leading to a wrong decision.

54.  Assigning different credit terms for two or more unrelated accounts for the same household may increase financial risk.

55.  When measuring cross selling results two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.
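
All 55 reasons start from the same place: two or more unrelated accounts for the same real world entity. Finding candidates for such duplicates could, in its most naive form, look like the Python sketch below. The account layout and the sample records are made up, and the match key is deliberately crude. Real matching needs fuzzy comparison, handling of international characters (this key simply drops anything outside a-z and 0-9) and resolution of households and business hierarchies.

```python
import re
from collections import defaultdict


def match_key(name: str, address: str) -> str:
    """Crude match key: lower-case, drop punctuation, collapse whitespace."""
    text = f"{name} {address}".lower()
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())


def duplicate_candidates(accounts):
    """Group account ids sharing a match key; keep groups of two or more."""
    groups = defaultdict(list)
    for account_id, name, address in accounts:
        groups[match_key(name, address)].append(account_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical (account id, name, address) records.
accounts = [
    (1, "John Smith", "12 Main Street"),
    (2, "JOHN SMITH", "12, Main Street"),
    (3, "Jane Doe", "7 High Road"),
]
print(duplicate_candidates(accounts))  # [[1, 2]]
```

An exact match key like this only catches the easy cases. A misspelled name or an abbreviated street type would already slip through.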


Ongoing Data Maintenance

Getting the right data entry at the root is important, and most (if not all) data quality professionals agree that this approach is superior to doing cleansing operations downstream.

The problem, however, is that most data erodes as time passes. What was right at the time of capture will at some point in time not be right anymore.

Therefore data entry ideally must not only be a snapshot of correct information but should also include raw data elements that make the data easily maintainable.

An obvious example: If I tell you that I am 49 years old, that may be just the piece of information you needed to complete a business process. But if you ask me for my birth date instead, you will still have my age after a bit of calculation. On top of that, based on that raw data you will know when I turn 50 (all too soon), and your organization will know my age if we should do business again later.
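
The "bit of calculation" could look like this small Python sketch. The function names and the birth date are made up for illustration.

```python
from datetime import date


def age_on(birth_date: date, on: date) -> int:
    """Age in whole years on a given day."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def turns(birth_date: date, years: int) -> date:
    """The day a given age is reached.

    A 29 February birthday is moved to 1 March in non-leap years,
    which is consistent with age_on above.
    """
    try:
        return birth_date.replace(year=birth_date.year + years)
    except ValueError:
        return date(birth_date.year + years, 3, 1)


born = date(1960, 5, 17)  # made-up birth date
print(age_on(born, date(2009, 12, 1)))  # 49
print(age_on(born, date(2012, 12, 1)))  # 52
print(turns(born, 50))                  # 2010-05-17
```

Storing the age alone gives the first answer only. Storing the birth date gives all three, today and at any later time.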

Birth dates are stable personal data. Gender is pretty much too. But most other data changes over time. Names change in many cultures in case of marriage and maybe divorce, and people may change names when discovering bad numerology. People move or a street name may be changed.

There are many privacy concerns around identifying individual persons, and the norms differ between countries. In Scandinavia we are used to being identified by our unique citizen ID, though also here within debatable limitations. Still, solutions are on offer for maintaining raw data that will yield valid and timely B2C information at the precision asked for when needed.

By contrast, it is broadly accepted everywhere to identify a business entity. Public sector registrations are a basic source of identifying IDs, with varying uniqueness and completeness around the world. Private providers have developed proprietary ID systems like the Duns-Number from D&B. All in all such solutions are good sources for ongoing maintenance of your B2B master data assets.

Addresses belonging to business or consumer/citizen entities – or just being addresses – are contained as external reference data covering more and more spots on the Earth. Ongoing development in open government data helps with availability and completeness, and these data are often deployed in the cloud. Right now it is much about visual presentation on maps, but no doubt more services will follow.

Getting data right at entry and being able to maintain the real world alignment is the challenge if you don’t look at your data asset as a throw-away commodity.

Figure 1: one year old prime information

PS: If you forgot to maintain your data: Before dumping, Data Cleansing might be a sustainable alternative.


Sharing data is key to a single version of the truth

This post is involved in a good-natured contest (i.e., a blog-bout) with two additional bloggers: Charles Blyth and Jim Harris. Our contest is a Blogging Olympics of sorts, with Great Britain, the United States and Denmark competing for the Gold, Silver, and Bronze medals in an event we are calling “Three Single Versions of a Shared Version of the Truth.”

Please take the time to read all three posts and then vote for who you think has won the debate (see poll below). Thanks!

My take

According to Wikipedia data may be of high quality in two alternative ways:

  • Either they are fit for their intended uses
  • Or they correctly represent the real-world construct to which they refer

In my eyes the term “single version of the truth” relates best to the real-world way of data being of high quality while “shared version of the truth” relates best to the hard work of making data fit for multiple intended uses of shared data in the enterprise.

My thesis is that there is a break even point when including more and more purposes where it will be less cumbersome to reflect the real world object rather than trying to align all known purposes.  

The map analogy

In search for this truth we will go on a little journey around the world.

For a journey we need a map.

Traditionally we have the challenge that the real-world being the planet Earth is round (3 dimensions) but a map shows a flat world (2 dimensions). If a map shows a limited part of the world the difference doesn’t matter that much. This is similar to fitting the purpose of use in a single business unit.

If the map shows the whole world we may have all kinds of different projections offering different kinds of views on the world, each having some advantages and disadvantages. A classic world map is the rectangle where Alaska, Canada, Greenland, Svalbard, Siberia and Antarctica are presented much larger than in the real world if compared to regions closer to the equator. This is similar to the problems in fulfilling multiple uses embracing all business units in an enterprise.
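
How much larger can be put into numbers. The sketch below assumes that the classic rectangular map is the Mercator projection, where both the east-west and the north-south scale grow with 1/cos(latitude), so areas are inflated by 1/cos²(latitude) compared to the equator. The function name and the rounded latitudes are my own choices for illustration.

```python
import math


def mercator_area_inflation(latitude_deg: float) -> float:
    """Factor by which Mercator enlarges areas at a latitude, relative to the equator."""
    phi = math.radians(latitude_deg)
    return 1.0 / math.cos(phi) ** 2


# Approximate latitudes in degrees north.
for place, latitude in [("Equator", 0.0), ("Copenhagen", 55.7), ("Svalbard", 78.0)]:
    print(f"{place}: areas shown about {mercator_area_inflation(latitude):.1f} times too large")
```

At 60 degrees the factor is 4, and it grows without limit towards the poles, which is why no single flat map can serve every purpose equally well.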

Today we have new technology coming to the rescue. If you go into Google Earth the world indeed looks round and you may have any high altitude view of an apparently round world. If you go closer the map tends to be more and more flat. My guess is that the solutions to the multiple uses conundrum will be offered from the cloud.

Exploiting rich external reference data

But Google Earth offers more than powerful technology. The maps are connected with rich information on places, streets, companies and so on obtained from multiple sources – and also some crowdsourced photos not always placed with accuracy. Even if external reference data is not “the truth”, these data, if used by more and more users (one instance, multiple tenants), will tend to be closer to “the truth” than any data collected and maintained solely in a single enterprise.

Shared data makes fit for purpose information

You may divide the data held by an enterprise into 3 pots:

  • Global data that is not unique to operations in your enterprise but shared with other enterprises in the same industry (e.g. product reference data) and eventually the whole world (e.g. business partner data and location data). Here “shared data in the cloud” will make your “single version of the truth” easier and closer to the real world.
  • Bilateral data concerning business partner transactions and related master data. If you for example buy a spare part then also “share the describing data” making your “single version of the truth” easier and more accurate.    
  • Private data that is unique to operations in your enterprise. This may be a “single version of the truth” that you find superior to what others have found, data supporting internal business rules that make your company more competitive and data referring to internal events.

While private data, followed by bilateral data, makes up the largest amount of data held by an enterprise, it is often the data that could be global that has the most obvious data quality issues, such as duplicated, missing, incorrect and outdated party master data.

Here “a global or bilateral shared version of the truth” helps approaching “a single version of the truth” to be shared in your enterprise. This way accurate raw data may be consumed as valuable information in a given context at once when needed.  

Call to action

If not done already, please take the time to read posts from fellow bloggers Charles Blyth and Jim Harris and then vote for who you think has won the debate. A link to the same poll is provided on all three blogs. Therefore, wherever you choose to cast your vote, you will be able to view an accurate tally of the current totals.

The poll will remain open for one week, closing at midnight on 19th November so that the “medal ceremony” can be conducted via Twitter on Friday, 20th November. Additionally, please share your thoughts and perspectives on this debate by posting a comment below.  Your comment may be copied (with full attribution) into the comments section of all of the blogs involved in this debate.

Vote here.


Data Quality and Climate Politics

In 1 month and 1 day the United Nations Climate Change Conference commences in my hometown Copenhagen. Here the people of the Earth will decide if we want to save the planet now or we will wait a while and see what happens.

The Data Quality issue might seem of little importance compared to the climate issue. Nevertheless I have been thinking about some similarities between Data Governance/ Data Quality and climate politics.

It goes like this:

CEO buy-in

It’s often said that CEOs don’t buy in on data quality improvements because it’s a loser’s game. In climate politics the CEOs are the heads of state. It’s still a question how many heads of state will attend the Copenhagen conference. There is a great deal of attention around whether United States president Barack Obama will attend. His last visit to Copenhagen in early October didn’t turn out a success, as his recommendation for Chicago as Olympic host city was fruitless. I guess he will only come again if success is very likely.

Personal agendas  

On the other hand British Prime Minister Gordon Brown has urged all world leaders to come to Copenhagen. While I think this is great for the conference being a success, I also have a personal reason to think that it’s a very bad idea. Having all the world’s heads of state driving around in the Copenhagen streets surrounded by a horde of police bikes will make traffic jams interfering with my daily work and, more seriously, my Christmas shopping.

It’s no secret that much of the climate problem is caused by us as individuals not being more careful about our energy consumption in daily routines. Data Quality is all the same about individuals not thinking ahead but focusing on having daily work done as quickly and comfortably as possible.

The business perspective

My fellow countryman Bjørn Lomborg is a prominent proponent of focusing more on battling starvation, diseases and other evils, because the resources will be spent more effectively there than on the marginal effects the same resources would have on fighting climate change.

Data Quality improvement is often dropped from Business Process Reengineering when the scope of these initiatives is prioritized with a focus on worthy, measurable short term wins.

Final words

My hope for my planet – and my profession – is that we are able to look ahead and do what is best for the future while we take personal responsibility and care in our daily work and life.
