The GlobalMatchBox

10 years ago I spent most of the summer delivering my first large project after becoming a sole proprietor. The client – or rather the partner – was Dun & Bradstreet's Nordic operation, who needed an agile solution for matching customer files against their Nordic business reference data sets. The application was named MatchBox.

This solution has grown over the years, and D&B's operation in the Nordics and other parts of Europe is now run by Bisnode.

Today matching is done against the entire WorldBase, holding close to 150 million business entities from all over the world – with all the diversity you can imagine. On the technology side the application has been bundled with the indexing capabilities of www.softbool.com and the similarity cleverness of www.omikron.net (disclosure: today I work for Omikron), all built with the RAD tool www.magicsoftware.com. The application is now called GlobalMatchBox.

It has been a great but daunting pleasure for me to set up and tune such a data matching engine and environment. Everybody who has worked with data matching knows the scars you get from avoiding false positives and false negatives. You know it is just not good enough to say that you can only automatically match 40% of the records when it is supposed to be 100%.
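The trade-off between false positives and false negatives can be illustrated with a minimal sketch of threshold-based matching. This is not the GlobalMatchBox algorithm – real matching engines use far more advanced similarity measures than the simple string ratio used here, and the threshold values are purely illustrative:

```python
# A minimal, hypothetical sketch of threshold-based name matching.
# Real engines use more sophisticated similarity measures; difflib's
# ratio and the thresholds below are illustrative assumptions only.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalized names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def classify(a: str, b: str, auto_match=0.85, auto_reject=0.60) -> str:
    """Two thresholds split candidate pairs into three buckets.

    Raising auto_match cuts false positives but sends more pairs to
    manual review; lowering auto_reject cuts false negatives but has
    the same effect. The gap between them is the manual workload.
    """
    score = similarity(a, b)
    if score >= auto_match:
        return "match"
    if score < auto_reject:
        return "no match"
    return "review"

print(classify("Dun & Bradstreet", "Dun and Bradstreet"))
print(classify("Dun & Bradstreet", "Bisnode"))
```

Pushing the automatic match rate from 40% toward 100% is essentially a fight to narrow the "review" band without letting wrong pairs slip into "match".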

So this project has been a very different experience from the occasional SMB (Small and Medium-sized Business) hit-and-run data quality improvement projects I also do, as described in my previous post. With D&B we are not talking about months but years of tuning, and I have been guilty of practicing excessive consultancy.


Service Oriented Data Quality

Service Oriented Architecture (SOA) has been a buzzword for some years.

In my opinion SOA is a golden opportunity for getting benefits from data quality tools that we have not been able to achieve with the technology and approaches seen until now (besides the other SOA benefits, which are independent of any particular technology).

Many data quality implementations until now have been batch cleansing operations with very little sustainability. I have seen lots of well-cleansed data never make it back to the sources, or only be partially updated in operational databases. And even then, much of the updated cleansed data was not maintained, and new errors were not prevented at the source.

Embedded data quality functionality in different ERP, CRM and ETL solutions has been around for a long time. These solutions may serve their purpose very well when implemented. But often they are not implemented, because ERP, CRM and ETL solutions and consultancies with specific advantages come bundled with data quality tools with specific advantages, and the combination is not always a perfect match. Also, having different ERP, CRM and ETL solutions often means different data quality tools and functionality that probably do not do the same thing the same way.

Data quality functionality deployed as SOA components has a lot to offer:

  • Reuse is one of the core principles of SOA. Applying the same data quality rules at every entry point for the same sort of data helps with consistency.
  • Interoperability makes it possible to deploy data quality prevention as close to the root as possible.
  • Composability makes it possible to combine functionality with different advantages – e.g. combining internal checks with external reference data.
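The composability principle can be sketched in a few lines. This is a hypothetical illustration, not any particular product's API: each check is a small, reusable function, and an internal rule is combined with a stand-in for an external reference-data lookup behind one service interface:

```python
# A hypothetical sketch of composable data quality services. All names
# are illustrative; the "external" reference check is simulated with a
# hard-coded lookup table instead of a real web service call.
from typing import Callable

Check = Callable[[dict], list[str]]

def completeness_check(record: dict) -> list[str]:
    """Internal rule: required fields must be present and non-empty."""
    required = ("name", "country")
    return [f"missing {f}" for f in required if not record.get(f)]

def reference_check(record: dict) -> list[str]:
    """Stand-in for an external reference-data service
    (e.g. a business registry lookup)."""
    known_countries = {"DK", "SE", "NO", "FI"}
    if record.get("country") and record["country"] not in known_countries:
        return [f"unknown country code {record['country']}"]
    return []

def compose(*checks: Check) -> Check:
    """Combine independent checks into one reusable service endpoint."""
    def combined(record: dict) -> list[str]:
        return [issue for check in checks for issue in check(record)]
    return combined

validate = compose(completeness_check, reference_check)
print(validate({"name": "Acme A/S", "country": "XX"}))
```

Because `validate` is one composed service, the same rules can be reused at every entry point – a web form, a batch load, a CRM integration – instead of being re-implemented per application.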

During the last few years I have been on projects implementing data quality as SOA components. The results seem very promising so far, but I think we have only just started.


Qualities in Data Architecture

Data architecture describes the structure of data used by a business and its applications by mapping the data artifacts to data qualities, applications, locations etc.

2000 years ago the Roman writer, architect and engineer Marcus Vitruvius Pollio wrote that a structure must exhibit the three qualities of firmitas, utilitas, venustas – that is, it must be strong or durable, useful, and beautiful.

I have worked with data quality for many years and have always been a bit disappointed about the lack of (at)traction around data quality issues. Perhaps the lack of attraction is because we focus so much on strength, durability and usefulness and too little on beauty – or at least attractiveness.

But how do the three qualities apply to data quality?

  • Firmitas, strength and durability, is connected to technology and how we try to make our data reflect real-world objects as closely as possible in terms of uniqueness, completeness, timeliness, validity, accuracy and consistency.
  • Utilitas, usefulness, is connected to how we use data as information in business processes. Often "fit for purpose" is stated as a goal for data quality improvement – which becomes hard when multiple purposes exist in an organization.
  • Venustas, beauty or attractiveness, is connected to the mindset of people. Often we blame poor data quality on the people putting data into the data stores and direct initiatives that way, using a whip called data governance. But we will probably get more attraction from people if we make quality data more attractive – and show it.
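The firmitas dimensions above can actually be measured. As a minimal sketch, assuming records are simple dicts, two of the dimensions (completeness and uniqueness) reduce to simple ratios over a data set:

```python
# A minimal sketch, assuming records are plain dicts. It measures two
# of the firmitas dimensions mentioned above as ratios between 0 and 1.
def completeness(records: list[dict], field: str) -> float:
    """Share of records where the field is present and non-empty."""
    return sum(1 for r in records if r.get(field)) / len(records)

def uniqueness(records: list[dict], field: str) -> float:
    """Share of distinct values among the populated records."""
    values = [r[field] for r in records if r.get(field)]
    return len(set(values)) / len(values) if values else 1.0

records = [
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "Acme"},  # duplicate name
    {"id": 3, "name": ""},      # incomplete record
]
print(completeness(records, "name"))  # 2 of 3 populated
print(uniqueness(records, "name"))    # 1 distinct value among 2 populated
```

Numbers like these are the "strong and durable" side of data quality; making them visible and attractive to the business is the venustas part.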

Compared to buildings, data quality is often the sewers beneath the old cathedrals and new opera houses – which may also explain the lack of attraction.

If you consider yourself a data quality professional – tool maker, expert, whatever – you have got to come up from the sewers and make and show some attractive data in the halls of the fine buildings. You know how hard it is to make quality data – but do tell the success stories.
