Data Quality 2.0 meets MDM 2.0

My current “Data Quality 2.0” endeavor started as a spontaneous heading on the topic of where, in my opinion, the data quality industry is going in the near future. Partly encouraged by being slammed, in a friendly way, for playing buzzword bingo, I have surfed Web 2.0 looking for other 2.0s. There are plenty of them, and they turn up frequently.

This piece by Mehmet Orun called “MDM 2.0: Comprehensive MDM” really caught my interest. Data Quality and MDM (Master Data Management) are closely related. When you do MDM you spend much of your time working on Data Quality issues, and doing Data Quality is most often doing Master Data Quality.

So, assuming that “Data Quality 2.0” and “MDM 2.0” are about what is referenced in the links above, it is quite natural that the two terms share many points.

Service Oriented Architecture (SOA) is one of the binding elements, as Data Quality solutions and MDM solutions will share Reference and Master Data Management services. These services handle data stewardship, match-link, match-merge, address lookup, address standardization, address verification, and data change management through Information Discrepancy Resolution Processes embracing both internal and external data.
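To illustrate the difference between two of these shared services, match-link and match-merge, here is a minimal Python sketch. Everything in it is hypothetical: the `PartyRecord` type, the `normalize` function (a crude stand-in for a standardization service), the exact-key matching (real match engines use fuzzy comparison) and the "first record survives" rule are assumptions for illustration, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartyRecord:
    source_id: str  # id of the record in its source system (e.g. CRM, ERP)
    name: str
    address: str


def normalize(text: str) -> str:
    # Hypothetical, crude standardization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def match_key(rec: PartyRecord) -> tuple:
    # Exact match on standardized name + address (assumption; a real
    # match engine would use fuzzy comparison instead).
    return (normalize(rec.name), normalize(rec.address))


def match_link(records: list) -> dict:
    """Match-link: keep every source record, link matching ones to a golden id."""
    key_to_golden = {}
    links = {}
    for rec in records:
        key = match_key(rec)
        if key not in key_to_golden:
            key_to_golden[key] = f"G{len(key_to_golden) + 1}"
        links[rec.source_id] = key_to_golden[key]
    return links


def match_merge(records: list) -> list:
    """Match-merge: collapse matching records into one surviving record."""
    survivors = {}
    for rec in records:
        # Hypothetical survivorship rule: the first record seen survives.
        survivors.setdefault(match_key(rec), rec)
    return list(survivors.values())


records = [
    PartyRecord("crm-1", "Alpha Trading Ltd", "1 High Street"),
    PartyRecord("erp-7", "ALPHA  Trading Ltd", "1 high street"),
    PartyRecord("crm-2", "Beta Consulting", "2 Low Road"),
]
links = match_link(records)    # {'crm-1': 'G1', 'erp-7': 'G1', 'crm-2': 'G2'}
merged = match_merge(records)  # two records: crm-1 and crm-2
```

The design difference is the point: match-link leaves every source record in place and only adds a shared identifier, while match-merge produces a single golden record and therefore needs a survivorship rule.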

The mega-vendors will certainly bundle their Data Quality and MDM offerings, using SOA to a greater or lesser degree, and the ongoing vendor consolidation adds to that wave. But hopefully we will also see some true SOA, where best-of-breed “Data Quality 2.0” and “MDM 2.0” technology is implemented with strong business support under a broader solution plan. Such a plan should meet the intended business need by focusing on how information is created, used and managed for multiple purposes in a multi-cultural environment.

Actually, I should have added “(part 1)” to the heading of this post. But in following posts on the next-generation milestones in Data Quality and MDM coexistence, I will try to use 2.0-free headings. It is possible: I did that in my previous post called Master Data Quality: The When Dimension.


Master Data Quality: The When Dimension

We often use the terms who, what and where to define master data as opposed to transaction data, for example by saying:

  • Transaction data accurately identifies who, what, where and when and
  • Master data accurately describes who, what and where
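To make the distinction concrete, here is a minimal Python sketch in which master data types describe who, what and where, while a transaction refers to them and adds the when. The class and field names are hypothetical and not taken from any particular MDM product.

```python
from dataclasses import dataclass
from datetime import datetime


# Master data: describes who, what and where.
@dataclass(frozen=True)
class Party:          # who
    party_id: str
    name: str


@dataclass(frozen=True)
class Product:        # what
    sku: str
    description: str


@dataclass(frozen=True)
class Location:       # where
    location_id: str
    address: str


# Transaction data: identifies who, what and where by referring to
# master data entities, and adds the when.
@dataclass(frozen=True)
class SalesTransaction:
    customer: Party
    product: Product
    location: Location
    occurred_at: datetime  # when
    quantity: int


shop = Location("L1", "1 High Street")
customer = Party("P1", "Alpha Trading Ltd")
widget = Product("SKU-42", "Widget")
sale = SalesTransaction(customer, widget, shop, datetime(2009, 6, 1, 10, 30), 3)
```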

Who is easily related to our business partners, what to the products we sell, buy and use, and where to the locations of the events.

In some industries, when is also easily related to master data entities, as with a public transportation timetable valid for a given period. A fiscal year in financial reporting also belongs to the when side of things.

But when is also a factor in improving data quality, and in preventing data quality issues, for our business partners, products, locations and assigned categories, because the descriptions of these entities change over time.

This phenomenon is known as “slowly changing dimensions” when building data warehouses and trying to make sense of data with business intelligence.

The when dimension also matters in matching, deduplication and identity resolution. Having data with the best actuality doesn’t necessarily lead to a good match, as you may be comparing it with data that does not have the same actuality. Here history tracking is a solution: storing former names, addresses, phones, e-mail addresses, descriptions, roles and relations.
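As a sketch of how history tracking helps matching, the Python example below keeps each address with a validity period and compares a candidate address against the address that was valid when the candidate was captured, falling back to any former address. The types, the `normalize` helper and the result labels are hypothetical, and exact comparison of normalized strings stands in for real fuzzy matching.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AddressVersion:
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still current


@dataclass
class PartyHistory:
    party_id: str
    addresses: List[AddressVersion] = field(default_factory=list)

    def address_on(self, when: date) -> Optional[str]:
        """Return the address that was valid on the given date, if any."""
        for v in self.addresses:
            if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
                return v.address
        return None


def normalize(text: str) -> str:
    # Hypothetical standardization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def match_address(party: PartyHistory, candidate: str, captured_on: date) -> str:
    """Compare an address captured on a given date with a party's history."""
    cand = normalize(candidate)
    as_of = party.address_on(captured_on)
    if as_of is not None and normalize(as_of) == cand:
        return "valid when captured"
    if any(normalize(v.address) == cand for v in party.addresses):
        return "other historical address"
    return "no match"


party = PartyHistory("P1", [
    AddressVersion("1 Old Road", date(2001, 1, 1), date(2007, 5, 1)),
    AddressVersion("9 New Street", date(2007, 5, 1)),
])
print(match_address(party, "1 old road", date(2005, 3, 10)))   # valid when captured
print(match_address(party, "1 Old Road", date(2008, 1, 1)))    # other historical address
print(match_address(party, "5 Other Lane", date(2008, 1, 1)))  # no match
```

Comparing only against the current address ("9 New Street") would miss both of the first two cases, which is exactly the actuality problem described above.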

Such complexity is often not handled in the master data containers out there, and even less so in matching environments.

My guess is that the future will bring publicly accessible reference data in the cloud, describing our master data entities with rich complexity that includes the when (time) dimension, along with capable matching environments.


The art of Business Directory Matching

A business directory is a list of companies in a given area, and perhaps in a given industry. One very useful type of directory in relation to data quality is a list of all companies in a given country. In many countries the authorities maintain such a list; elsewhere it is a matter of assembling local lists or using other forms of data capture. Many private service providers offer such lists, often with added information value of different kinds.

If you take the customer/prospect master table from an enterprise doing B2B in a given country, you would expect the rows in that table to match the business directory of that country 100%. I am not talking about all data being spelled exactly as in the directory, but “only” about each row reflecting the same real-world object as a directory entry.

During many years of providing and tuning solutions for business directory matching, as well as handling such match services from colleagues in the business, I have very, very seldom seen a 100% match rate; even 90% is very rare.

Why is that so? Some of the reasons I have stumbled over, related to the classic data quality dimensions, have been:

Completeness of business directories varies from country to country and between the lists provided by vendors. Some countries, such as those of the former Czechoslovakia, some English-speaking countries in the Pacific, the Nordics and others, have tight registration, while registration is less tight in North America, other European countries and the rest of the world.

Actuality in business directories also differs a lot. It also matters whether the business directory covers dissolved entities and includes history tracking such as former names and addresses. Then consider the actuality of the customer/prospect table to be matched, and once again the time dimension has a lot to say.

Validity, accuracy and consistency, both in the directory and in the table to be matched, are natural causes of mismatch. Also, many B2B customer/prospect tables hold a lot of entities that are not formal business entities but other types of party master data.

Uniqueness may be defined differently in the directory and in the table to be matched. This includes the perception of hierarchies of legal entities and branches; not least, governmental and local authority bodies are a fuzzy crowd. Different roles, such as those of a small business owner, also create challenges. The same is true of roles such as franchisee and of the use of trading styles.

Then, of course, the automated match technique applied and the human interaction performed are factors in the resulting match rate and in the quality of the match, measured as the frequency of false positives.
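The two measures mentioned here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `MatchResult` type is hypothetical, the match rate is taken as the share of table rows linked to any directory entry, and the false-positive frequency is estimated only from the matches a human reviewer has checked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MatchResult:
    customer_id: str
    directory_id: Optional[str]              # None: no directory entry matched
    reviewed_correct: Optional[bool] = None  # set only if a human reviewed it


def match_rate(results: List[MatchResult]) -> Optional[float]:
    """Share of customer/prospect rows matched to some directory entry."""
    if not results:
        return None
    return sum(r.directory_id is not None for r in results) / len(results)


def false_positive_frequency(results: List[MatchResult]) -> Optional[float]:
    """Share of human-reviewed matches that turned out to be wrong."""
    reviewed = [r for r in results
                if r.directory_id is not None and r.reviewed_correct is not None]
    if not reviewed:
        return None
    return sum(not r.reviewed_correct for r in reviewed) / len(reviewed)


results = [
    MatchResult("C1", "D100", reviewed_correct=True),
    MatchResult("C2", "D200", reviewed_correct=True),
    MatchResult("C3", "D300", reviewed_correct=False),  # matched the wrong company
    MatchResult("C4", "D400"),                          # matched, not reviewed
    MatchResult("C5", None),                            # no match found
]
print(match_rate(results))                # 0.8
print(false_positive_frequency(results))  # 0.3333333333333333
```

A high match rate is only worth something if the false-positive frequency stays low, so the two numbers are best reported together.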

Fit for what purpose?

The goal of data quality improvement is often set as “fit for purpose”. The first purpose addressed will almost naturally be within the domain where the data in question are captured. Then you address other domains where the same data may also be used, but probably for other purposes, leading to additional or different measures of fitness.

If an organisation identifies several domains where the same data are used, the normal approach will be to gather all purposes and then start aligning all the needs, finding the highest common denominators and so on. This may be a very cumbersome process, as you need to consider all the different dimensions of data quality: uniqueness, completeness, timeliness, validity, accuracy and consistency.

Another way is to assume that if you gather many purposes, the total needs will almost certainly tend to reflect the real-world objects to which the data refer.

So my thesis is that, as more and more purposes are included, there is a break-even point beyond which it is less cumbersome to reflect the real-world object than to try to align all known purposes.

Master data are often used in many different functions in an organisation, and party data (names and addresses) in particular are known to be a focus area for data quality improvement. Here it is very obvious that real-world objects exist and that they are basically the same for every organisation.

Earlier this year I wrote an entry on dataqualitypro about the possibilities of using external party reference data: http://www.dataqualitypro.com/data-quality-home/external-reference-data-an-overview.html

In my previous post on this blog I noted that governments around the world are releasing data stores that will surely add traction to the real-world approach to data quality improvement.

I will certainly touch on this subject in forthcoming posts on this blog.
