The Many Worlds of Data Quality

This morning I had some fun reading the Wikipedia articles about data quality.

I tried to compare the texts available in English, French, German and Japanese.

I am afraid that the quality of the texts, and the differences in how the subject is presented across languages, reflects the immaturity of the data quality discipline and, not least, the lack of global consensus seen in the literature, published articles and the available technology.

Three observations from the Wikipedia articles:

The French piece is in some parts a translation of the English text. However, the translation apparently became difficult in the History section, as the English text there has a well-known, narrowly United States scope.

The German text is completely different from the English text. Even the title is different: Information Quality. The references are largely from German authors.

The Japanese text seems to be a Google Translate rendering of the (former) English text. This is strange, as much of the quality inspiration originally came from Japan.


LinkedIn and the other Thing

I have a profile in two different business oriented social networking services: LinkedIn and XING.

I have far more connections in LinkedIn than in XING.

My connections on LinkedIn are mainly from English-speaking countries (US, UK, IE, IN, AU) and from Scandinavia (DK, NO, SE), where I live and where English is widely spoken, not least by white-collar workers.

My connections on XING are almost exclusively with people from Germany.

This picture matches very well with how the two services are positioned.

The US-based LinkedIn is strong in “English speaking” countries, with the most profiles per capita in:

  • Denmark, Netherlands and USA followed by
  • Norway, Sweden, United Kingdom and Australia

(I have some figures from last year when LinkedIn passed 50 million profiles).

XING is strong in Germany, where XING was founded, and through acquisitions also in Spain and Turkey.

Now, it’s not that you can’t operate LinkedIn in German and Spanish; you can. You can also operate XING in English.

It’s about meeting your connections where they are.


Birthday Party

Today this blog has been online one year. It’s time for a birthday party.

The economy around a birthday party usually goes like this:

  • You, the guest, spend some money on a nice birthday present
  • I, the host, spend some money on fine food and beverage

Now, a blog is a virtual thing, and I reckon that most of my readers live far, far away from the Copenhagen South Coast. So it’s going to be a remote birthday party, and, as with most other things happening in the social media realm, no money will actually change hands.

Anyway, here is what I would have liked to serve in the real world:

Paella

The dish I have prepared the most times when we have guests is the Spanish paella. I love paella very much and so do all our polite guests.

I am also a shrimp addict, so I usually like to add two or three different kinds of shrimp, from the small but extremely tasty Greenlandic shrimp to delicious giant Thai tiger prawns.

Steak

My second favorite meal is a steak. You probably don’t get a better steak than one from cattle grazing on the Argentine pampas.

As I live in the Northern Hemisphere it’s summertime now and perfect weather for preparing the steak outside on the grill.

Wine

There is so much good wine coming from many places around the world. I like Californian wine, wine from Chile, South African wine, Australian wine, French wine and last but not least Italian wine including the unbeatable Amarone.

Beer

As I am a native Dane, you will probably expect me to propose a Carlsberg. Don’t get me wrong: Carlsberg is probably a good beer. But there are many other good beers around. When I am in England I like the ultimate mainstream beer: a John Smith’s (now owned by Dutch Heineken). The best mainstream beer, in my opinion, is the Belgian Leffe.

Cheers

Thanks to everyone who has read this blog, subscribed, re-tweeted and, not least, commented.


Picture This

How do people find their way to your blog? I use Twitter and LinkedIn to say: Hey, I made a new post. And then I pretty much rely on people finding my blog when searching with terms such as:

  • Data Quality
  • Master Data Survivorship
  • Fit for purpose

But honestly, the search terms that hit my blog many times more often than the above are the little captions I add to the images I include in every post. And I am pretty sure those people were not looking for data quality and master data management.

The top term is pearls, including the same word in Russian (жемчуг), Turkish (inci) and Arabic (لآلئ). This word was the caption of the image in the post “Universal Pearls of Wisdom”, where I wrote about the new SOA manifesto and how this manifesto might as well be about data quality and a lot of other disciplines and concepts. Probably not very interesting for someone trying to buy pearls. But maybe one or two of the 2,000+ pearl seekers were caught in the data quality net.

The second most used term is Gorilla. This was used as the caption for the image in the post “Gorilla Data Quality”. Personally I like this gorilla picture, and it seems that approximately 1,600 other people do too. Whether they also like the philosophical ideas around “Gorilla Data Quality” and “Guerilla Data Quality”, I am not so sure.

Other terms hitting big are Brueghel and Tower of Babel, used in a post about international challenges in data quality called “The Tower of Babel”, as it was illustrated by a painting by Brueghel. Also, Penny Black, used in a post about “Postal Address Hierarchy, Granularity, Precision and History”, raised the page view counter.

But it doesn’t seem that every little common word will do. Once I used the word traffic, but it didn’t generate any traffic at all.



The Next Level

A quote about data quality from Thomas Redman says:

“It is a waste of effort to improve the quality (accuracy) of data no one ever uses.”

I learned the quote from Jim Harris, who mentioned it most recently in his post: DQ-Tip: “There is no point in monitoring data quality…”

In a comment Phil Simon said: “I love that. I’m jealous that I didn’t think of something so smart.”

I’m guessing Phil was being a bit ironic. If so, I can see why. The statement seems pretty obvious, and at first glance you can’t imagine anyone taking the opposite stance: let’s cleanse some data no one ever uses.

I also think it was meant to be obvious in Redman’s book Data Driven.

Well, taking it to the next level I can think of the following elaboration:

  1. If you find some data that no one ever uses, you should not only avoid improving the quality of that data, you should actually delete the data and make sure that no one spends time and resources entering or importing the same data in the future.
  2. That is, unless the reason no one ever uses the data is that its quality is poor. Then you must compare the benefits of improving the data against the costs of doing so. If the costs are greater, proceed with point 1. If the benefits are greater, go to point 3.
  3. It is not a waste of effort to improve the quality of some data no one ever uses.
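The elaboration above is really a small decision procedure. Here is a minimal sketch of it in Python; the cost and benefit figures are hypothetical estimates you would have to supply yourself, not anything Redman provides:

```python
def unused_data_action(is_used: bool, cleansing_cost: float, expected_benefit: float) -> str:
    """Decide what to do with data, following the three points above.

    cleansing_cost and expected_benefit are hypothetical estimates of
    what improving the data would cost and what it would yield.
    """
    if is_used:
        # Not the case Redman's quote describes: keep maintaining it.
        return "keep and maintain"
    if expected_benefit <= cleansing_cost:
        # Point 1: delete it and stop collecting it.
        return "delete and stop collecting"
    # Point 3: improvement is not a waste of effort after all.
    return "improve quality"

print(unused_data_action(False, cleansing_cost=100, expected_benefit=500))
```

The interesting branch is the middle one: the comparison in point 2 is what turns the seemingly obvious rule into something worth monitoring over time.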


Information and Data Quality Blog Carnival, February 2010


El Festival del IDQ Bloggers is another name for the monthly recurring roundup of submitted blog posts on information and data quality, started last year by the IAIDQ.

This is the February 2010 edition covering posts published in December 2009 and January 2010.

I will go straight to the point:

Daragh O Brien shared the story about a leading Irish hospital that has come under scrutiny for retaining data without any clear need. This highlights an important relationship between Data Protection/Privacy and Information Quality. Daragh’s post explores some of this relationship through the “Information Quality Lens”. Here’s the story: Personal Data – an Asset we hold on Trust.

Former Publicity Director of the IAIDQ, Daragh has over a decade of coal-face experience in Information Quality Management at the tactical and strategic levels from the Business perspective. He is the Taoiseach (Irish for chieftain) of Castlebridge Associates. Since 2006 he has been writing and presenting about legal issues in Information Quality amongst other topics.

Jim Harris is an independent consultant, speaker, writer and blogger with over 15 years of professional services and application development experience in data quality. Obsessive-Compulsive Data Quality is an independent blog offering a vendor-neutral perspective on data quality.

If you are a data quality professional, know the entire works of Shakespeare by heart and are able to wake up at night and promptly explain Einstein’s theories, you probably know Jim’s blogging already. On the other hand, if you don’t know Shakespeare and don’t understand Einstein, then: Jim to the rescue. Read The Dumb and Dumber Guide to Data Quality.

In another post Jim discusses the out-of-box-experience (OOBE) provided by data quality (DQ) software under the title: OOBE-DQ, Where Are You? Jim also posted part 8 of Adventures in Data Profiling – a great series of knowledge sharing on this important discipline within data quality improvement.

Phil Wright is a consultant based in London, UK who specialises in Business Intelligence and Data Quality Management. With 10 years’ experience in the Telecommunications and Financial Services industries, Phil has implemented data quality management programs, led data cleansing exercises and enabled organisations to realise their data management strategy.

The Data Factotum blog is a newcomer in the Data Quality blogosphere, but Phil has kick-started it with 9 great posts during the first month. A balanced approach to scoring data quality is the start of a series on using the balanced scorecard concept to measure data quality.

Jan Erik Ingvaldsen is a colleague and good friend of mine. In a recent competition scam, cheap flight tickets from Norwegian Air Shuttle were booked by employees of competitor Cimber Sterling using all kinds of funny names. As usual, Jan Erik not only has a nose for a good story but is also able to propose solutions, as seen in Detecting Scam and Fraud.

In his position as Nordic Sales Manager at Omikron Data Quality, Jan Erik is actually a frequent flyer with Norwegian Air Shuttle. Now he is waiting to see whether he will end up on their vendor list or on the no-fly list.

William Sharp is a writer on technology focused blogs with an emphasis on data quality and identity resolution.

Informatica Data Quality Workbench Matching Algorithms is part of a series of postings where William details the various algorithms available in Informatica Data Quality (IDQ) Workbench. In this post William starts by giving a quick overview of the available algorithms and some typical uses for each. The subsequent postings get more detailed, outlining the math behind each algorithm, and the series will be finished up with some baseline comparisons using a single set of data.
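To give a taste of the kind of math such a series covers, here is a minimal sketch of one classic matching algorithm, Levenshtein edit distance, turned into a normalized similarity score. This is a generic textbook implementation for illustration, not Informatica’s actual code:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized score in [0, 1]; a threshold on this drives match/no-match."""
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)

print(similarity("Jon Smith", "John Smith"))  # → 0.9
```

In a data quality tool the interesting part is exactly the baseline comparison William promises: which algorithm, at which threshold, catches your real duplicates without flagging distinct parties.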

Personally I really like this kind of ready-made industrial espionage.

IQTrainwrecks hosted the previous blog carnival edition. From this source we also have a couple of postings.

The first was submitted by Grant Robinson, the IAIDQ’s Director of Operations. He shares an amusing but thought provoking story about the accuracy of GPS systems and on-line maps based on his experiences working in Environmental Sciences. Take a dive in the ocean…

Also it is hard to avoid including the hapless Slovak border police and their accidental transportation of high explosives to Dublin due to a breakdown in communication and a reliance on inaccurate contact information. Read all about it.

And finally, we have the post about the return of the Y2K bug, as systems failed to properly handle the move into a new decade – highlighting the need for tactical solutions to information quality problems to be kept under review in a continuous improvement culture, in case the problem reoccurs in a different way. Why 2K?

If you missed them, here’s a full list of previous carnival posts:

April 2009 on Obsessive-Compulsive Data Quality by Jim Harris

May 2009 on The DOBlog by Daragh O Brien

June 2009 on Data Governance and Data Quality Insider by Steve Sarsfield

July 2009 on AndrewBrooks.co.uk by Andrew Brooks

August 2009 on The DQ Chronicle by William E Sharp

September 2009 on Data Quality Edge by Daniel Gent

October 2009 on Tooling around in the IBM Infosphere by Vincent McBurney

November 2009 on IQTrainwrecks.com by IAIDQ


2010 predictions

Today this blog has been live for half a year, Christmas is just around the corner in countries with Christian cultural roots, and a new year – even a new decade – is closing in according to the Gregorian calendar.

It’s time for my 2010 predictions.

Football

Over at the Informatica blog, Chris Boorman and Joe McKendrick are discussing who’s going to win next year’s largest sporting event: the football (soccer) World Cup. I don’t think England, the USA, Germany (or my team, Denmark) will make it. Brazil takes a co-favorite victory – and home team South Africa will go to the semi-finals.

Climate

Brazil and South Africa also had main roles in the recent Climate Summit in my hometown, Copenhagen. Despite heavy executive buy-in, a very weak deal with no operational Key Performance Indicators was reached. Money was on the table – but assigned to reactive approaches.

Our hope for avoiding climate catastrophes now rests on national responsibility and technological improvements.

Data Quality

A reactive approach, lack of enterprise-wide responsibility and reliance on technological improvements are also well-known circumstances in the realm of data quality.

I think we will have to deal with this next year too. We have to become better at working under these conditions. That means being able to perform reactive projects faster and better while also implementing prevention upstream. Aligning people, processes and technology is as key as ever in doing that.

Some areas where we will see improvements are, in my eyes:

  • Exploiting rich external reference data
  • International capabilities
  • Service orientation
  • Small business support
  • Human like technology

The page Data Quality 2.0 has more content on these topics.

Merry Christmas and a Happy New Year.


Sharing data is key to a single version of the truth

This post is involved in a good-natured contest (i.e., a blog-bout) with two additional bloggers: Charles Blyth and Jim Harris. Our contest is a Blogging Olympics of sorts, with Great Britain, the United States and Denmark competing for the Gold, Silver, and Bronze medals in an event we are calling “Three Single Versions of a Shared Version of the Truth.”

Please take the time to read all three posts and then vote for who you think has won the debate (see poll below). Thanks!

My take

According to Wikipedia data may be of high quality in two alternative ways:

  • Either they are fit for their intended uses
  • Or they correctly represent the real-world construct to which they refer

In my eyes the term “single version of the truth” relates best to the real-world way of data being of high quality while “shared version of the truth” relates best to the hard work of making data fit for multiple intended uses of shared data in the enterprise.

My thesis is that, as you include more and more purposes, there is a break-even point where it becomes less cumbersome to reflect the real-world object than to try to align all the known purposes.

The map analogy

In search for this truth we will go on a little journey around the world.

For a journey we need a map.

Traditionally we have the challenge that the real world, being the planet Earth, is round (3 dimensions), while a map shows a flat world (2 dimensions). If a map shows a limited part of the world, the difference doesn’t matter that much. This is similar to fitting the purpose of use in a single business unit.

If the map shows the whole world, we may have all kinds of different projections offering different views of the world, each with its advantages and disadvantages. A classic world map is the Mercator rectangle, where Alaska, Canada, Greenland, Svalbard, Siberia and Antarctica are presented much larger than in the real world compared to regions closer to the equator. This is similar to the problems in fulfilling multiple uses embracing all business units in an enterprise.
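To put a rough number on that distortion: on a Mercator map the local area scale grows as 1/cos²(latitude) relative to the equator, so a back-of-the-envelope sketch like this shows land at Greenland’s latitudes inflated roughly tenfold:

```python
import math

def mercator_area_inflation(latitude_deg: float) -> float:
    """Area scale factor of the Mercator projection relative to the equator."""
    return 1 / math.cos(math.radians(latitude_deg)) ** 2

# Illustrative latitudes; the exact figures depend on where you measure.
for place, lat in [("Equator", 0.0), ("Copenhagen", 55.7), ("Central Greenland", 72.0)]:
    print(f"{place}: x{mercator_area_inflation(lat):.1f}")
```

No single number fixes the whole map, just as no single cleansing rule fits every business unit; you pick the projection, or the purpose, you can live with.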

Today we have new technology coming to the rescue. If you go into Google Earth, the world indeed looks round, and you may take any high-altitude view of an apparently round world. As you go closer, the map becomes more and more flat. My guess is that the solutions to the multiple-uses conundrum will be offered from the cloud.

Exploiting rich external reference data

But Google Earth offers more than powerful technology. The maps are connected with rich information on places, streets, companies and so on obtained from multiple sources – and also some crowdsourced photos, not always placed with accuracy. Even if external reference data is not “the truth”, these data, when used by more and more users (one instance, multiple tenants), will tend to be closer to “the truth” than any data collected and maintained solely within a single enterprise.

Shared data makes fit-for-purpose information

You may divide the data held by an enterprise into 3 pots:

  • Global data that is not unique to operations in your enterprise but shared with other enterprises in the same industry (e.g. product reference data) or eventually the whole world (e.g. business partner data and location data). Here “shared data in the cloud” will make your “single version of the truth” easier and closer to the real world.
  • Bilateral data concerning business partner transactions and related master data. If, for example, you buy a spare part, then also “share the describing data”, making your “single version of the truth” easier and more accurate.
  • Private data that is unique to operations in your enterprise. This may be a “single version of the truth” that you find superior to what others have found, data supporting internal business rules that make your company more competitive, and data referring to internal events.

While private data, followed by bilateral data, makes up the largest share of the data held by an enterprise, it is often the data that could be global that has the most obvious data quality issues, such as duplicated, missing, incorrect and outdated party master data.

Here “a globally or bilaterally shared version of the truth” helps in approaching “a single version of the truth” to be shared in your enterprise. This way accurate raw data may be consumed as valuable information in a given context the moment it is needed.

Call to action

If not done already, please take the time to read posts from fellow bloggers Charles Blyth and Jim Harris and then vote for who you think has won the debate. A link to the same poll is provided on all three blogs. Therefore, wherever you choose to cast your vote, you will be able to view an accurate tally of the current totals.

The poll will remain open for one week, closing at midnight on 19th November so that the “medal ceremony” can be conducted via Twitter on Friday, 20th November. Additionally, please share your thoughts and perspectives on this debate by posting a comment below.  Your comment may be copied (with full attribution) into the comments section of all of the blogs involved in this debate.

Vote here.


Data Quality and Climate Politics

In 1 month and 1 day the United Nations Climate Change Conference commences in my hometown, Copenhagen. Here the people of the Earth will decide whether we want to save the planet now or wait a while and see what happens.

The Data Quality issue might seem of little importance compared to the climate issue. Nevertheless, I have been thinking about some similarities between Data Governance/Data Quality and climate politics.

It goes like this:

CEO buy-in

It’s often said that CEOs don’t buy in on data quality improvements because it’s a loser’s game. In climate politics the CEOs are the heads of state. It’s still an open question how many heads of state will attend the Copenhagen conference. There is a great deal of attention around whether United States president Barack Obama will attend. His last visit to Copenhagen in early October didn’t turn out to be a success, as his recommendation of Chicago as Olympic host city was fruitless. I guess he will only come again if success is very likely.

Personal agendas  

On the other hand, British Prime Minister Gordon Brown has urged all world leaders to come to Copenhagen. While I think this is great for making the conference a success, I also have a personal reason to think it’s a very bad idea. Having all the world’s heads of state driving around the Copenhagen streets surrounded by hordes of police motorcycles will cause traffic jams, interfering with my daily work and, more seriously, my Christmas shopping.

It’s no secret that much of the climate problem is caused by us as individuals not being more careful about our energy consumption in daily routines. Data quality suffers in just the same way from individuals not thinking ahead but focusing on getting daily work done as quickly and comfortably as possible.

The business perspective

My fellow countryman Bjørn Lomborg is a prominent proponent of focusing more on battling starvation, disease and other evils, because resources will be spent more effectively there than on the marginal effects the same resources would have in fighting climate change.

Data quality improvement is often omitted from Business Process Reengineering when the scope of such initiatives is prioritized toward worthy, measurable short-term wins.

Final words

My hope for my planet – and my profession – is that we are able to look ahead and do what is best for the future while we take personal responsibility and care in our daily work and life.


Man versus Computer. Special Edition.

Following up on my previous post on Man versus Computer, I am reminded most workday mornings about how man sucks.

Most workday mornings I leave home in my car heading into the following traffic:

  • A 4-lane motorway rolling in from southern Copenhagen, the rest of Denmark, Germany and ultimately the rest of Eurasia.
  • A 5th lane coming in from a local area.

These 5 lanes then split into:

  • 2 lanes heading for the Danish answer to Silicon Valley (called Ballerup)
  • 3 lanes leading to downtown Copenhagen or the main fairground (called Bella Center), the airport, Sweden and the rest of Scandinavia.

Of course you would expect some mingling here. What happens every morning is rather a complete stop in traffic, and the cause is not the merging and splitting but the humans doing the driving:

  • Experienced local selfish drivers staying in the fastest lane until they suddenly want to switch lanes according to their route.
  • Inexperienced (in this area) foreign drivers coming up from crowded central Europe in search of tranquility deep in the Swedish forests, having no clue where to position themselves in this interchange. The same goes for Swedes returning for the opposite reason.
  • Everyone else having fun blocking the switching attempts of the selfish types and the foreigners, who should know better than to pass through in rush hour.

Some solutions to this problem might be:

  • Change Management, teaching people better driving habits.
  • An onboard computer in every car taking care of lane positioning. Splitting 5 lanes into 2 + 3 should then go smoothly.

Now I am waiting to see which solution will be implemented first.