Guerrilla Data Quality

Oh yes, in my crazy berserkergang of presenting stupid buzzword suggestions it’s time for “Guerrilla Data Quality”. And this time there are no previous hits on Google to point at as the original source.

But I noticed that “Guerrilla Data Governance” is in use, and as Data Governance and Data Quality are closely related disciplines, I think there could be something to “Guerrilla Data Quality”.

Also, an article called “How to set data quality goals any business can achieve” was recently published by Dylan Jones on DataQualityPro. Here the need for setting short-term, realistic goals is emphasised, in contrast to launching a massive, full-size, enterprise-wide, all-domain initiative. This article focuses on the people and process side of what may be “Guerrilla Data Quality”.

Recently I wrote a blog post called “Driving Data Quality in 2 Lanes” focusing on tool selection for what may be “Guerrilla Data Quality” and on the enterprise-wide follow-up.

Actually I guess most Data Quality activity going on is in fact “Guerrilla Data Quality”. The problem then is that most literature and teaching on Data Quality are aimed at the massive enterprise-wide implementations.

Any thoughts?

Multi-Purpose Data Quality

Say you are an organisation within charity fundraising. For many years you have had a membership database, and recently you also introduced an eShop with related accessories.

The membership database holds the following record (Name, Address, City, YearlyContribution):

  •  Margaret & John Smith, 1 Main Street, Anytown, 100 Euro

The eShop system has the following accounts (Name, Address, Place, PurchaseInAll):

  • Mrs Margaret Smith, 1 Main Str, Anytown, 12 Euro
  • Peggy Smith, 1 Main Street, Anytown, 218 Euro
  • Local Charity c/o Margaret Smith, 1 Main Str, Anytown, 334 Euro
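The membership record and the three eShop accounts above probably describe the same household. A minimal sketch of how normalisation and fuzzy comparison could link such records, assuming a hypothetical nickname map and abbreviation list (real Data Quality tools ship with far richer reference data):

```python
import re
from difflib import SequenceMatcher

# Hypothetical reference data for this example only; real tools
# ship with much richer nickname and abbreviation lists.
NICKNAMES = {"peggy": "margaret"}
ABBREVIATIONS = {"str": "street"}

def normalize(text: str) -> str:
    """Lower-case, tokenise and expand known nicknames/abbreviations."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(w, NICKNAMES.get(w, w)) for w in words)

def similarity(a: str, b: str) -> float:
    """Fuzzy match score between 0.0 and 1.0 on the normalised forms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# "Peggy Smith" normalises to "margaret smith", so the eShop account
# scores high against the membership record despite the different name.
print(similarity("Peggy Smith", "Mrs Margaret Smith"))
```

With something like this, “1 Main Str” and “1 Main Street” become the same address, and the 564 Euro of eShop turnover can be seen in the context of the 100 Euro yearly contribution.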

Now the new management wants to double contributions from members and triple eShop turnover. Based on the recommendations from “The One Truth Consulting Company” you plan to do the following:

  • Establish a platform for 1-1 dialogue with your individual members and customers
  • Analyze member and customer behaviour and profiles in order to:
    • Support the 1-1 dialogue with existing members and customers
    • Find new members and customers who are like your best members and customers

As the new management intends to stay for many years ahead, the solution must not be a one-shot exercise but must be implemented as business process reengineering with a continuous focus on the best-fit data governance, master data management and data (information) quality.

So, what are you going to do with your data so that they are fit for action with both the old purposes and the new purposes?

Recently I wrote some posts related to these challenges.

Any other comments on how to do it are welcome.


Driving Data Quality in 2 Lanes

Yesterday I visited a client in order to participate in a workshop on extending the use of a Data Quality desktop tool to more users within that organisation.

This organisation makes use of two different Data Quality tools from Omikron:

  • The Data Quality Server, a complete framework of SOA-enabled Data Quality functionality, where the IT department is a critical part of the implementation.
  • The Data Quality Desktop tool, a user-friendly piece of Windows software installable by any PC user, but with sophisticated cleansing and matching features.

During the few hours of this workshop we were able to link several different departmental data sources to the server based MDM hub, setting up and confirming the business rules for this and reporting the foreseeable outcome of this process if it were to be repeated.

Some of the scenarios exercised will continue to run as ad hoc departmental processes and others will be upgraded into services embraced by the enterprise wide server implementation.

As I – for various reasons – drove a long distance to this event, I had time to compare the data quality progress made by different organisations with the traffic on the roads, where we have:

  • Large buses with persons and large lorries with products, the most sustainable way of transport – but they are slow-going and not too dynamic. Like the enterprise-wide server implementations of Data Quality tools.
  • Private cars heading to different destinations at different but faster speeds. Like the desktop Data Quality tools.

I noticed that:

  • One lane with buses or lorries works fine but slowly.
  • One lane with private cars is a bit of a mess with some hazardous driving.
  • One lane with buses, lorries and private cars tends to be mortal.
  • Two (or more) lanes work nicely with good driving habits.

So, encouraged by the workshop and the ride, I feel comfortable with the idea of using both kinds of Data Quality tools, having coherent, user-involved agile processes backed by some tools and a sustainable enterprise-wide solution at the same time.


When to cleanse in Data Migration?

About a year ago I posted a question on DataMigrationPro about the best time to execute data cleansing in a migration project. Is it:

  • Before the migration?
  • During the migration?
  • After the migration?
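For illustration, here is a minimal sketch of the “during” option, with cleansing as a transform step between extract and load. The single trimming-and-casing rule is purely an assumption for the example; a real migration would apply the agreed business rules here:

```python
def cleanse(record: dict) -> dict:
    """Illustrative cleansing rule: trim whitespace and standardise casing."""
    return {key: value.strip().title() if isinstance(value, str) else value
            for key, value in record.items()}

def migrate(source: list, target: list) -> None:
    """Extract from source, cleanse each record in flight, load into target."""
    for record in source:
        target.append(cleanse(record))

# Invented legacy data for the example.
legacy = [{"name": "  margaret smith ", "city": "ANYTOWN"}]
new_system: list = []
migrate(legacy, new_system)
```

The attraction of this option is that no dirty data ever lands in the target system; the drawback is that cleansing failures block the migration itself, which is part of what the before/during/after question is about.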

In a recent excellent blog post, Dalton Cervo explains some of the points considered on this question in a particular MDM migration project.

As I am preparing a speech that includes this subject, I will be very pleased to receive additional considerations on this matter. Please comment here or on DataMigrationPro.



Data Quality and Common Sense

My favourite story is the fairytale “The Emperor’s new clothes” by Hans Christian Andersen.

In this tale an emperor hires two swindlers (aka consultants) who offer him the finest dress made from the most beautiful cloth. This cloth, they tell him, is invisible to anyone who is either stupid or unfit for his position. In fact there is no cloth at all, but no one (until, at the end, a little child) dares to say so.

The Data Quality discipline is tormented by belonging to both the business side and the technology side of practice. This means that we have to live with the buzzwords and the smartness coming from both the management consultants and the technology consultants and vendors – including myself.

So you really have to profess belief in a lot of the things and terms said in order not to look stupid or unfit for your position.

A way to cope with this is to look behind all the fine terms and recognize that most things said and presented are just another way of expressing common sense. Some examples:

Business Process: What you do at work – e.g. selling some stuff and putting data about it into a database so it’s ready for invoicing.

Referential Integrity Error: When you sold something that is not in the database. You may pick another item from the current list.

Bad Change Management: When someone tells you to do it in another way. Now.

Organisational Resistance: When you find that way completely ridiculous because no one tells you why.

Fuzzy logic: This is about the common nature of most questions in life. Statements are not absolutely true or absolutely false but somewhere in between depending on the angle from where you observe.
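The in-between nature of fuzzy statements can be sketched as a membership function returning a degree between 0 and 1 rather than true or false; the height thresholds below are purely illustrative:

```python
def degree_tall(height_cm: float) -> float:
    """Degree of membership in the fuzzy set "tall", from 0.0 to 1.0.

    The 160/190 cm thresholds are invented for illustration.
    """
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    # Linear ramp between the two thresholds.
    return (height_cm - 160) / 30.0
```

A statement like “she is tall” thus gets a value of, say, 0.5 rather than a hard yes or no, which is exactly how fuzzy matching of names and addresses produces scores instead of verdicts.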

Business Intelligence: When someone puts your data along with some other data into a new context visualised in a graph in order to replace human gut feeling.

Poor Enterprise Wide Data Quality: The invoicing went well. The decision made from the graph didn’t. 

Data Governance: Meetings and documents about what went wrong with the data and how we can do better.

My experience is that the most successful data quality improvements are made when they are guided by common sense and expressed as being that. From there you may find great inspiration and practical skills and tools in each area of expertise.

The Statue of Liberty versus The Little Mermaid

The Statue of Liberty in New York harbour is 46 metres (151 ft) high – 93 metres (305 ft) with foundation and pedestal.

The Little Mermaid sits on a rock in the Copenhagen harbour. The relatively small size of the statue typically surprises tourists visiting for the first time. The Little Mermaid statue is only 1.25 metres (4 ft) high.

Actually most things in Denmark are smaller than in the US – also the size of companies. Of course there are Maersk, Carlsberg and Lego, but most companies from there are SMBs (small and medium-sized businesses) in a global sense.

As Graham Rhind points out in his blog http://grcdi.blogspot.com/2009/05/what-about-rest-of-data.html most literature about data quality is fixed completely on data held in large corporate entities. Statistically the relative number of SMBs is probably close to the same – but having only a few large companies somehow shifts the focus more towards SMBs in my country (and our Nordic neighbours).

This is why I have actually worked with data quality improvement both at SMBs and at large companies.

The most significant differences I have seen are, probably not surprisingly, on the data governance part, where you have to use much more agile (guerrilla) approaches with SMBs.

The technology part is pretty much the same – but ROI is king as ever. With SMBs, results must show up almost immediately; there is no room for months of tuning. Software must be user-friendly; there is no room for excessive consultancy.

I can recommend that all data quality professionals do an SMB implementation in order to sharpen their skills and tools.


Qualities in Data Architecture

Data architecture describes the structure of data used by a business and its applications by mapping the data artifacts to data qualities, applications, locations etc.

Two thousand years ago the Roman writer, architect and engineer Marcus Vitruvius Pollio wrote that a structure must exhibit the three qualities of firmitas, utilitas, venustas – that is, it must be strong or durable, useful, and beautiful.

I have worked with data quality for many years and have always been a bit disappointed about the lack of (at)traction around data quality issues. Perhaps the lack of attraction is due to the fact that we focus so much on strength, durability and usefulness and too little on beauty – or at least attractiveness.

But how do the three qualities apply to data quality?

  • Firmitas, strength and durability, is connected to technology and how we tend to make our data reflect real-world objects as closely as possible in terms of uniqueness, completeness, timeliness, validity, accuracy and consistency.
  • Utilitas, usefulness, is connected to how we use data as information in business processes. Often “fit for purpose” is stated as a goal for data quality improvement – which makes it hard when multiple purposes exist in an organization.
  • Venustas – beauty or attractiveness – is connected to the mindset of people. Often we blame poor data quality on the people putting data into the data stores and direct initiatives that way using a whip called data governance. But probably we will get more attraction from people if we make quality data more attractive and show it off.
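On the firmitas side, two of the dimensions named above, completeness and uniqueness, can be measured with a few lines of code; the sample records here are of course invented for the illustration:

```python
# Invented sample records for illustration.
records = [
    {"name": "Margaret Smith", "city": "Anytown"},
    {"name": "Peggy Smith", "city": None},
    {"name": "Margaret Smith", "city": "Anytown"},
]

def completeness(rows, field):
    """Share of rows holding a non-empty value for the field."""
    return sum(1 for row in rows if row.get(field)) / len(rows)

def uniqueness(rows, field):
    """Share of distinct values among the rows filled for the field."""
    values = [row[field] for row in rows if row.get(field)]
    return len(set(values)) / len(values)

# "city" is filled in 2 of 3 rows; only 2 of 3 names are distinct.
print(completeness(records, "city"), uniqueness(records, "name"))
```

Numbers like these are the durable, useful side of data quality; making them attractive to people is the venustas part that the measurements alone do not solve.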

Compared to buildings, data quality is often the sewers beneath the old cathedrals and new opera houses – which may also explain the lack of attraction.

If you consider yourself a data quality professional – be it a tool maker, an expert, whatever – you have got to come up from the sewers and make and show some attractive data in the halls of the fine buildings. You know how hard it is to make quality data – but do tell about the success stories.
