Star Bucks

Occasionally there are stories in the press about how multinational companies don’t pay taxes according to where they earn their money.

Lately there has been a row in the UK about how Starbucks, despite being very successful, is officially losing money in the UK and therefore doesn’t pay taxes there. The Guardian’s latest entry on that is here.

The Guardian article quotes a call for more international co-operation.

I wonder if that will happen, as we can’t even agree on simple concepts such as:

  • Having the same format for a date across the globe: Today is 13/12/2012 in most parts of the world but 12/13/2012 in the United States.
  • Using a comma or a period as the decimal mark. I have said that 1,731 times in the UK and 1.731 times when I lived in Denmark.
  • Agreeing on whether a house number comes before or after the street name:

[Figure: UPU S42]
And there are many, many more such fundamental differences in how we present data.
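To make the first two points concrete, here is a minimal Python sketch. The helper names `format_date` and `parse_number` are hypothetical, and the sketch assumes just two conventions for each (day-first vs. month-first dates, period vs. comma as decimal mark), ignoring everything else real locale handling deals with. It shows how the same date is written differently and how the same string can be read as two very different numbers.

```python
from datetime import date

# Two date conventions, chosen for illustration
DAY_FIRST = "%d/%m/%Y"    # most parts of the world
MONTH_FIRST = "%m/%d/%Y"  # the United States


def format_date(d: date, day_first: bool) -> str:
    """Render a date using the day-first or month-first convention."""
    return d.strftime(DAY_FIRST if day_first else MONTH_FIRST)


def parse_number(text: str, decimal_mark: str) -> float:
    """Parse a number written with the given decimal mark ('.' or ',').

    The other symbol is treated as a thousands separator and dropped."""
    thousands = "," if decimal_mark == "." else "."
    return float(text.replace(thousands, "").replace(decimal_mark, "."))


today = date(2012, 12, 13)
print(format_date(today, day_first=True))   # 13/12/2012
print(format_date(today, day_first=False))  # 12/13/2012

# The same count written the UK way and the Danish way
print(parse_number("1,731", decimal_mark="."))  # 1731.0
print(parse_number("1.731", decimal_mark=","))  # 1731.0
# ...but reading the Danish string with UK rules gives a very different number
print(parse_number("1.731", decimal_mark="."))  # 1.731
```

The point is not the code itself but that the convention has to be known (or stored) alongside the data; the string alone does not tell you which reading is right.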

What Happened in 1013

At this time of year it is very popular to try to predict what will happen next year, 2013, within your field of expertise.

However, predictions, especially about the future, may fail. And within data quality we don’t like flaws. So instead I will tell a little bit about what happened in the year 1013 with respect to data quality.

As always, Wikipedia is your friend when seeking knowledge, so I have picked a few of the highlights from the Wikipedia article about 1013:

Diversity

In 1013 the Viking warlord Sweyn Forkbeard replaced Æthelred the Unready as King of England. These were the happy days when the letter Æ was part of the English alphabet. Today Æ only exists in some of the Viking alphabets.

Definition

Kaifeng, the capital of China, became the largest city in the world in 1013, taking the lead from Córdoba in Al-Andalus. However, this is an estimate. And even today, as reported by the BBC, we actually can’t tell which city is the largest in the world.

Multiple versions of the truth

The anti-pope John XVI died in 1013. An anti-pope is a person who, in opposition to the one who is generally seen as the legitimately elected Pope, makes a significantly accepted competing claim to be the Pope. Even today we can’t always establish a single version of the truth.

Who is accountable for melting ice?

When working with data quality issues some of the big questions are: How bad is it? Is it getting worse? Can we do something about it? Who should do something about it?

These questions are basically the same as those around the changing climate on this planet, including rising sea levels.

This morning I read an article on BBC News reporting that several scientific teams have joined forces in an attempt to quantify exactly how much melting ice sheets are raising the sea level. The short answer is that melting of the ice sheets has added 11.1 millimeters (7⁄16 of an inch) to the sea level since 1992.

The sea is rising because of melting ice, primarily on Antarctica and Greenland, as seen below:

[Figure: ice sheet contribution]

So I think it’s high time to ask the people of Antarctica, and not least the people of Greenland, to do something serious about their ice melting and flooding innocent people in the rest of the world.

Data that is not aligned with the real world usually provides bad information

The shortcomings of data that is fit for a given purpose of use, compared to data that is aligned with the real world, are a recurring topic on this blog, most recently in the post “Fitness for Use” is Dead.

Today I had a reminder of that when waiting for baggage at Copenhagen Airport.

There is an information screen telling you when your baggage will start rolling in. What actually seems to happen is that a fixed time is assigned to every flight and the screen then counts down the minutes. Most baggage starts rolling in (and this is shown on the screen) before the countdown reaches zero. If, as with my flight, zero is reached without delivery, the screen shows that the baggage from this flight is delayed, but not for how long.

So the information provided is when you could expect your baggage, probably according to some service level goal. OK, fit for that purpose. But in fact that doesn’t help you much as a passenger, and it doesn’t help at all when the goal isn’t reached.

End of rant.

Hotel Rating Data Quality

Whether you are traveling for business or pleasure, you like to stay in a hotel that suits your expectations.

What is good and what is bad differs between us as individuals. But we may all fit some stereotype depending on where in the world we are from. For example, if I walk into even a modestly rated American-run hotel anywhere in the world, I am pretty sure there will be a bed much larger than I actually need. In a locally run hotel I’m not so sure.

The most commonly used hotel rating methodology is a one to five star rating system. However, the classification criteria are not universal; they differ from country to country. Some countries have a publicly regulated system, in some countries the industry sets the standards, and in some countries there are competing systems.

So I can’t be sure that three stars in one country means the same as three stars in another country. One of my foremost personal requirements is that Wi-Fi is available. In the Swiss criteria that counts for only 2 out of 863 possible points, so I couldn’t be sure even in a five-star hotel. Using the English criteria I would have to go for a four-star hotel to be sure.

Besides official ratings, social ratings have become more and more popular. Typically guests rate the hotel on the portal where they booked, using a scale from 1 to 10, and may add verbal descriptions of the appealing things and, even more popular, the appalling things.

Going in the Wrong Direction

When travelling on the London Underground I have several times noticed that the onboard passenger information system is set wrong, typically as if we were going in the opposite direction of what was announced at the station and of where the train is actually heading.

People’s reactions

The passengers’ reactions to this data quality flaw vary. Most people who seem to be frequent commuters don’t bother; they keep calm and carry on. Tourists, on the other hand, get confused and immediately try to identify the culprit among them who apparently got them on the wrong train.

As the information system keeps announcing the next station as the one we just left, everyone except the new passengers keeps calm and carries on in the opposite direction of the data presented.

Big data quality issues

Wrong journey settings in data collection within public transportation are actually a challenge I have worked with a lot.

Besides confusing the passengers when presented on the onboard passenger information display and in voice announcements, wrong settings may also corrupt the data collection. This leads to data quality issues when the data is stored in a data warehouse, or by other techniques, in order to facilitate analysis of passenger travel patterns, of how well the services adhere to schedules, and other reporting based on the big numbers of transactions collected every day.

Aligning with master data

The challenge is to correctly join the transaction data with the right master data entities. A vehicle stop, and in some cases the passenger boarding and alighting, must be associated with the right product, that is, a given journey on a given service according to a given time schedule.

Many other exploitations of big data share the same basic data quality challenge. If we don’t get the transaction data joined correctly with the master data entities involved, any analysis and reporting may be going in the wrong direction.
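The join described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions of my own, not how any real transit system does it: the `Journey` and `StopEvent` models, the service code `S1` and the ten-minute tolerance are all hypothetical. A stop event is joined to the journey on the same service that calls at the same stop closest in time, and only afterwards is the direction set on the vehicle compared with the matched journey, so a wrong onboard setting is flagged rather than used as a join key.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class Journey:
    """Master data: one scheduled journey on a service (hypothetical model)."""
    journey_id: str
    service: str
    direction: str                   # e.g. "outbound" or "inbound"
    stop_times: dict[str, datetime]  # stop code -> scheduled time


@dataclass
class StopEvent:
    """Transaction data recorded on board (hypothetical model)."""
    service: str
    direction: str  # as set on the vehicle; may be wrong
    stop: str
    observed: datetime


def match_journey(
    event: StopEvent,
    journeys: list[Journey],
    tolerance: timedelta = timedelta(minutes=10),
) -> Optional[Tuple[Journey, bool]]:
    """Join a stop event to the closest-in-time journey calling at the same
    stop on the same service. Returns (journey, direction_ok), or None when
    no journey is within the tolerance."""
    candidates = [
        (abs(j.stop_times[event.stop] - event.observed), j)
        for j in journeys
        if j.service == event.service and event.stop in j.stop_times
    ]
    in_window = [(gap, j) for gap, j in candidates if gap <= tolerance]
    if not in_window:
        return None  # unmatched: a case for data quality follow-up
    _, journey = min(in_window, key=lambda c: c[0])
    # Direction is checked, not joined on, so a wrong setting is exposed
    return journey, journey.direction == event.direction


# Usage: two journeys on the same service, one in each direction
t = datetime(2012, 5, 1, 8, 0)
journeys = [
    Journey("J1", "S1", "outbound", {"A": t, "B": t + timedelta(minutes=3)}),
    Journey("J2", "S1", "inbound", {"B": t + timedelta(minutes=20),
                                    "A": t + timedelta(minutes=23)}),
]
# The vehicle claims to be inbound, but the timing only fits outbound J1
event = StopEvent("S1", "inbound", "B", t + timedelta(minutes=4))
journey, direction_ok = match_journey(event, journeys)
print(journey.journey_id, direction_ok)  # J1 False
```

Matching on time rather than on the recorded direction is a deliberate choice here: had the join trusted the onboard setting, the event above would have landed on the wrong journey and every report built on it would inherit the error.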

The Secret Behind Good Data Quality

This post is inspired by a little tweet chat I had with Daragh O Brien this morning.

The data quality angle was that a simple data quality rule around age (or date of birth) for living persons would be a check that raises a warning if the age is above 122, because this would, if true, be a new entry in the record books.

Jeanne Louise Calment of France had the longest confirmed human lifespan: 122 years.

Your data quality age check may even be refined by sex, as the record for a man is 115 years.

Christian Mortensen, who was born in Denmark and died in the United States, holds that record.
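The age rule discussed above can be sketched in Python as follows. It is a minimal sketch assuming each record holds a date of birth and an optional sex code of "M" or "F" (hypothetical field conventions); the limits of 122 and 115 years are the records mentioned above. The check only returns a warning rather than rejecting the record, since an age above the record is suspicious but, if true, would be a new record.

```python
from datetime import date
from typing import Optional

# Plausibility limits from the longest confirmed lifespans cited above
MAX_AGE_ANY = 122   # Jeanne Calment
MAX_AGE_MALE = 115  # Christian Mortensen


def age_on(birth: date, on: date) -> int:
    """Age in completed years on the given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def age_warning(birth: date, sex: Optional[str], today: date) -> Optional[str]:
    """Return a warning for a living person whose age would beat the record,
    or None if the age is plausible. sex is "M", "F" or None (unknown)."""
    age = age_on(birth, today)
    if age > MAX_AGE_ANY:
        return f"age {age} exceeds the confirmed record of {MAX_AGE_ANY}"
    if sex == "M" and age > MAX_AGE_MALE:
        return f"age {age} exceeds the confirmed male record of {MAX_AGE_MALE}"
    return None


check_date = date(2012, 6, 1)  # an example check date
print(age_warning(date(1889, 1, 1), None, check_date))
# age 123 exceeds the confirmed record of 122
print(age_warning(date(1895, 1, 1), "M", check_date))
# age 117 exceeds the confirmed male record of 115
print(age_warning(date(1895, 1, 1), "F", check_date))  # None
```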

Both Jeanne Calment and Christian Mortensen have shared their secret behind a long life.

Surprisingly, both recipes include something usually not considered good for your health.

Jeanne Calment recommended a diet of port wine and she ate nearly one kilogram of chocolate every week.

Christian Mortensen, on the other hand, recommended lots of good water and no alcohol, but then a good cigar.

Even though there are lots of recipes and examples out there for good health and a long life, there is probably no single way, as told in the post Miracle Food for Thought:

“The facts about the latest dietary discoveries are rarely as simple as the headlines imply. Accurately testing how any one element of our diet may affect our health is fiendishly difficult. And this means scientists’ conclusions, and media reports of them, should routinely be taken with a pinch of salt.”

It’s about the same with data quality, isn’t it?

Accurately testing how any one element of our data may affect our business is fiendishly difficult. So predictions of return on investment (ROI) from data quality improvement are, unfortunately, routinely taken with a big spoonful of salt.

Also, as discussed in the post Turning a Blind Eye to Data Quality, there are plenty of examples of business success despite poor data quality.

So, no, there is no single secret behind good data quality. But there is a wealth of good practices, tools and services to choose from out there.

For example, I’m not sure I like instant oatmeal, but Instant Data Enrichment for instant Data Quality is good for you. I promise.

Most Times the Home Team Wins

This summer is going to be huge if you like sports. The Olympics are coming to London, and in only 14 days we have the European football (soccer) championship in Poland and Ukraine.

As usual, hopes are high for the England soccer team. But the statistics don’t support those hopes. The England team hasn’t really succeeded since the World Cup victory on home ground at Wembley in 1966. That victory was mainly (and now I’m going to be shot in the streets of London) due to a ghost goal.

In business, and in data quality and MDM business too, the home team usually also wins.

Yesterday I noticed a tweet saying that the MDM tool vendor Orchestra Networks has been selected as tool vendor by a large bank. The bank is Credit Agricole, a big financial service provider based in France. Orchestra Networks is also based in France. A home win, so to say.

The post The Pond described how American tool vendors, otherwise dominant, may at first succeed in expanding to Europe by coming to London, but then have a hard time competing in continental Europe due to diversity issues.

European tool vendors going to North America often try to disguise themselves as a home team. Orchestra Networks, for example, uses Boston & Paris as its place of origin in its messaging. Other examples are the leading open source data management tool vendor Talend, with dual headquarters in Paris and California; the hot Danish MDM vendor Stibo Systems, messaging out of Atlanta; and the Swedish business intelligence success QlikTech, which has officially moved to Pennsylvania.

The Data Quality Tool Vendor Difference

How do analysts look at the data quality tool vendor market? As with everything in data quality, there are differences and apparently no single source of truth.

Gartner has its Magic Quadrant. Gartner sells it, but you can usually get a free copy from the leading vendors.

The Information Difference has its DQ Landscape in the cloud for free.

It is interesting to compare which vendors are included in the latest main pictures, as I have tried below:

The number of x’s is a rough measure of the ability to execute / market strength.

Three smaller vendors are considered by Gartner but not by The Information Difference, and vice versa. Two midsize vendors are included by The Information Difference but not by Gartner. Experian QAS is included as a big one by The Information Difference but did not (yet) meet Gartner’s inclusion criteria.

Bat-and-ball Data Quality

Lately Jim Harris of the OCDQblog has written two excellent blog posts, or should I say home runs, discussing data quality with inspiration from baseball.

In the post Quality Starts and Data Quality, Jim explains that you may have a tough loss in business despite stellar data quality, or a cheap win despite horrible data quality, but in the long run, by starting off with good data quality, your organization has a better chance to succeed.

In the follow-up post, called Pitching Perfect Data Quality, Jim ponders that business success is achievable without perfect data quality, but that data quality has a role to play.

Now, although baseball is a very popular sport in the United States but largely unknown in the rest of the world, I think we all understand the metaphors.

Also, around the world we have different but similar sports, with other rules, statistics and terms attached. The common name for these sports is bat-and-ball games.

In Britain, where I live now, cricket is huge and can be used to raise awareness of data issues. Just yesterday the Ordnance Survey, a government body that holds registries of addresses, coordinates and maps, published a blog post called Anyone for cricket? British blogger Peter Thomas has also written, among others, a post on cricket and data quality called Wager.

Before coming to Britain I lived in Denmark, where we don’t know baseball or cricket, but sometimes at family picnics, perhaps after a Carlsberg and a snaps or two, we play a similar game called rundbold, with kid- and grandpa-friendly rules and scoreboard, usually using a tennis ball.

Data quality, not least in relation to party master data, which is the most prominent domain within the discipline, is also a same same but different game around the world, as told in the post Partnerships for the Cloud.

Understanding the rules, statistics and terms of baseball, cricket, rundbold and all the other bat-and-ball games of the world is a daunting task, even though we all know how to hit a ball with a bat.
