The Data Quality Tool Vendor Difference

How do analysts look at the data quality tool vendor market? As with everything in data quality, there are differences and apparently no single source of truth.

Gartner has its Magic Quadrant. It is sold for money, but usually you are able to get a free copy from one of the leading vendors.

The Information Difference has its DQ Landscape in the cloud for free.

It is interesting to compare which vendors are included in the latest versions of the two pictures, as I have tried below:

The number of x’s is a rough measure of the ability to execute / market strength.

Three smaller vendors are considered by Gartner but not by The Information Difference, and vice versa. Two midsize vendors are included by The Information Difference but not by Gartner. Experian QAS is included as a big one by The Information Difference, but did not (yet) meet the inclusion criteria used by Gartner.


The Taxman: Data Quality’s Best Friend

Collection of taxes has always been a main driver for having registries and means of identifying people, companies and properties.

5,000 years ago the Egyptians conducted the first known census in order to effectively collect taxes.

As reported on the Data Value Talk blog, the Netherlands has had 200 years of family names thanks to Napoleon and the higher cause of collecting taxes.

Today the taxman goes cross-border and wants to help with international data quality, as examined in the post Know Your Foreign Customer. The US FATCA regulation is about collecting taxes from activities abroad, and as said on the Trillium blog: Data Quality is The Core Enabler for FATCA Compliance.

My guess is that this is only the beginning of a tax-based opportunity for having better data quality in relation to international data.

In a tax agenda for the European Union it is said: “As more citizens and companies today work and operate across the EU’s borders, cooperation on taxation has become increasingly important.”

The EU has a program called FISCALIS in the making. Soon we will not only have to identify Americans doing something abroad, but practically everyone taking part in globalization.

For that we all need comprehensive accessibility to the wealth of global reference data through “cutting-edge IT systems” (a FISCALIS choice of wording).

I am working on that right now:


Know Your Foreign Customer

I’m not saying that Customer Master Data Management is easy. But in most companies the capabilities for handling domestic customer records are often stellar compared to the capabilities for handling foreign customer records.

It’s not that the knowledge, services and tools don’t exist. If you for example are headquartered in the USA, you will typically use the best practices and services available there for domestic records. If you are headquartered in France, you will use the best practices and services available there for domestic records. Using the best practices and services for foreign (seen from where you are) records is rarer, and if done, it is often done outside enterprise-wide data management.

This situation can’t, and will not, continue. With globalization running at full speed and more and more enterprise-wide data management programs being launched, we will need best practices and services embracing worldwide customer records.

New regulatory compliance will also add to this trend. Taking effect next year, the US Foreign Account Tax Compliance Act (FATCA) will urge both US companies and foreign financial institutions to better know their foreign customers and other business partners.

In doing that, you have to know about addresses, business directories and consumer/citizen hubs for an often large range of countries, as described in the post The Big ABC of Reference Data.

It may seem a daunting task for each enterprise to be able to embrace big reference data for all the countries where you have customers and other business partners.

My guess (well, actually my plan) is that there will be services, based in the cloud, helping with that, as indicated in the post Partnerships for the Cloud.


Partnerships for the Cloud

Earlier this month Loraine Lawson was so kind to quote me in an article on IT Business Edge called New Partnerships Create Better Customer Data via the Cloud.

The article mentions some cloud services from StrikeIron and Melissadata. These services are currently based on improving North American, that is US and Canadian, customer data.

I am involved in similar services that are currently based on improving Danish customer data, which then covers the rest of North America, namely Greenland.

Improving customer data from all over the world is surely a daunting task that needs partnerships.

The cloud is the same, but the reference data isn’t, and the rules and traditions aren’t either, as governments around the world have found 240 (or so) different solutions for balancing privacy concerns and administrative efficiency.

So, if not partnering, you risk getting solutions that are nationally international.


Wildcard Search versus Fuzzy Search

My last post about search functionality in Master Data Management (MDM) solutions was called Search and if you are lucky you will find.

In the comments the use of wildcards versus fuzzy search was touched upon.

The problem with wildcards

I have a company called “Liliendahl Limited” as this is the spelling of the name as it is registered with the Companies House for England and Wales.

But say someone is searching using one of the following strings:

  • “Liliendahl Ltd”,
  • “Liliendal Limited” or
  • “Liljendahl Limited”

Search functionality should in these situations return the hit “Liliendahl Limited”.

Using wildcard characters could, depending on the specific syntax, produce a hit in all combinations of the spelling with a string like this: “lil?enda*l l*”.

The problem is however that most users don’t have the time, patience and skills to construct these search strings with wildcard characters. And maybe the registered name was spelled slightly differently, not matching the wildcard characters used.
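To make the wildcard mechanics concrete, here is a minimal sketch using Python’s standard fnmatch module. The “?”/“*” syntax happens to fit the example pattern above, though actual MDM search tools each have their own wildcard syntax:

```python
# Minimal sketch of wildcard matching with Python's standard fnmatch
# module. The "?"/"*" syntax mirrors the example pattern in the text;
# real search tools may use a different wildcard syntax.
from fnmatch import fnmatchcase

PATTERN = "lil?enda*l l*"

candidates = [
    "Liliendahl Limited",
    "Liliendal Limited",
    "Liljendahl Limited",
    "Omikron Data Quality GmbH",
]

# fnmatchcase is case-sensitive, so normalize to lowercase first.
hits = [name for name in candidates if fnmatchcase(name.lower(), PATTERN)]
# hits contains the three Liliendahl spellings, but not the unrelated name.
```

Constructing such a pattern still demands the very skill and patience from the user that most users don’t have, which is exactly the problem described above.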

Matching algorithms

Tools for batch matching of name strings have been around for many years. When doing a batch match you can’t practically use wildcard characters. Instead matching algorithms typically rely on one, or in the best case a combination, of well-known techniques such as phonetic encoding, edit distance and n-gram comparison.

The same techniques can be used for interactive search thus reaching a hit in one fast search.
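As a rough illustration of such error-tolerant matching, here is a minimal sketch using the edit-based similarity in Python’s standard difflib. This shows the general technique only, not the algorithm of any tool mentioned here, and the 0.8 threshold is an arbitrary choice for the example:

```python
# Minimal sketch of fuzzy (error-tolerant) matching using the
# edit-based similarity in Python's standard difflib. This illustrates
# the general technique, not the algorithm of any specific tool.
from difflib import SequenceMatcher

def best_match(query, names, threshold=0.8):
    """Return the registered name most similar to the query,
    or None if nothing scores at or above the threshold."""
    score, name = max(
        (SequenceMatcher(None, query.lower(), n.lower()).ratio(), n)
        for n in names
    )
    return name if score >= threshold else None

registered = ["Liliendahl Limited", "Omikron Data Quality GmbH"]

# A misspelled query resolves to the registered spelling:
match = best_match("Liliendal Limited", registered)
```

All three misspellings from the wildcard example resolve to “Liliendahl Limited” in a single search, with no pattern construction required from the user.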

Fuzzy search

I have worked with the Omikron FACT algorithm for batch matching. This algorithm has morphed into being implemented as a fuzzy search algorithm as well.

One area of use for this is when webshop users are searching for a product or service within your online shop. This feature is, along with other eCommerce capabilities, branded as FACT-Finder.

The fuzzy search capabilities are also used in a tool I’m involved with called iDQ. Here external reference data sources, in combination with internal master data sources, are searched in an error-tolerant way, thus making data available to the user despite heaps of spelling possibilities.


Reference Data at Work in the Cloud

One of the product development programs I’m involved in is about exploiting rich external reference data in order to get data quality right the first time and to maintain optimal data quality over time.

The product is called instant Data Quality (abbreviated as iDQ ™). I have briefly described the concept in an earlier post called instant Data Quality.

iDQ ™ combines two concepts:

  • Software as a Service
  • Data as a Service

While most similar solutions are bundled with one specific data provider, the iDQ ™ concept embraces a range of data sources. The current scope is customer master data, where iDQ ™ may include Business-to-Business (B2B) directories, Business-to-Consumer (B2C) directories, real estate directories, postal address files and even social media network data from external sources, as well as internal master data, all presented at the same time in a compact mash-up.

The product has already gained substantial success in my home country Denmark, leading to the formation of a company solely working with development and sales of iDQ ™.

The results iDQ ™ customers gain may seem simple, but they are the core advantages of better data quality most enterprises are looking for, as said by one of Denmark’s largest companies:

“For DONG Energy iDQ ™ is a simple and easy solution when searching for master data on individual customers. We have 1,000,000 individual customers. They typically relocate a few times during the time they are customers of ours. We use iDQ ™ to find these customers so we can send the final accounts to the new address. iDQ ™ also provides better master data because here we have an opportunity to get names and addresses correctly spelled.

iDQ ™ saves time because we can search many databases at the same time. Earlier we had to search several different databases before we found the right master data on the customer.”

Please find more testimonials here.

I hope to be able to link to testimonials in more languages in the future.


The Pond

The term “The Pond” is often used as an informal term for the Atlantic Ocean, especially the North Atlantic Ocean, the waters that separate North America and Europe.

Within information technology, and not least my focus areas of data quality and master data management, there is a lot of exchange going on over the pond, as European companies are using North American technology and sometimes vice versa. European companies are also setting up operations in North America, and of course the other way around too.

Some technologies work pretty much the same regardless of the country in which they are deployed. A database manager product is an example of that kind of technology. Other pieces of software must be heavily localized. An ERP application belongs to that category. Data quality and master data management tools and implementation practices are indeed also subject to diversity considerations.

When North American companies go to Europe, my gut feeling is that an overwhelming part of them choose to start with a European or EMEA-wide headquarters in the British Isles, and that again means mostly in the London area.

The reasons for that may be many. However, I guess that the fact that people in the British Isles don’t speak a strange language has a lot to say. What many North American companies with headquarters in London often have to realize, then, is that this move only got them halfway over the pond.


Nonprofit Data Quality

One of the industries where I have worked a lot with data quality issues is nonprofit organizations such as charities and other forms of membership-based organizations.

A general characteristic of such organizations is that they have databases with as many “customers” as huge global enterprises; however, the number of employee records is only a fraction of that in those large companies.

So the emphasis is often not on creating well-manned data governance organizational structures but on implementing the best automation available in order to achieve optimal party master data management, where the parties involved are members and other roles played by individuals and companies with a common interest.

Many nonprofit organizations have several different fundraising activities going on at the same time. This means that real world individuals, households, organizations and their contacts are registered through different channels. The challenges of getting a “single view of customer” from the data streams created in these processes are discussed in the post Multi-Purpose Data Quality.

There are many nonprofit organizations working internationally. The often decentralized management structures in nonprofit organizations mean that ways of doing things will naturally differ between the countries where nonprofits are operating. The differences in legislation and culture are also important. Some examples related to how to exploit master data are examined in the post Feasible Names and Addresses.

When it comes to creating business cases for data quality, nonprofits are of course basically no different from any other organization. The main goals are increased fundraising and lower administration costs. As said, the low number of employees often leads to using technology. The low amount of money available often leads to using agile technology.


Lean MDM

With a discipline such as master data management there will of course always be an agile or lean way of doing things.

What is lean MDM?

A document from 2008 called A LEAN APPROACH TO MASTER DATA MANAGEMENT by Duff Bailey examines the benefits of lean MDM.

The document has a view close to mine, saying that: “While there is little argument over what constitutes an individual person, many existing data models make the mistake of modeling “roles” (customer, employee, stock-holder, vendor contact, etc.) instead”.

As discussed in the document, similar views can be taken of organization entities, location entities and product entities.

In conclusion Duff says that: “Because of their universality and their abstract nature, these core data models can be established quickly, without the need for lengthy review that normally accompanies an enterprise data model. Thereafter, the focus of the lean data management effort will be to grow the models and populate the repositories in support of specific business objectives”.

MDM in high gear

The fast time-to-value for lean MDM was also emphasized by MDM guru Aaron Zornes in a tweet yesterday:

The mentioned LeanMDM offer from Omikron Data Quality (which is one of my employers) is described in the link (in German). A short summary of the text is that you will, among other things, get this from lean MDM:

  • An increase in the corporate value of customer data
  • Short project times and fast results
  • Lower implementation costs through service-oriented architecture (SOA)

I have been involved in one of the implementations of the LeanMDM concept, as described in this article (in English) about how the car rental giant Avis achieved lean MDM for its Scandinavian business.


Who Is Not Using Data Quality Magic?

The other day the latest Gartner Magic Quadrant for Data Quality Tools was released.

If you are interested in knowing what it says, it’s normally possible to download a copy from a leading vendor’s website.

Among the information in the paper you will find estimated numbers of customers who have purchased the tools from the vendors included in the quadrant.

If you sum up these numbers, it is estimated that 16,540 organizations worldwide are customers of an included vendor.

So, if I matched that compiled customer list with the Dun & Bradstreet WorldBase, holding at least 100 million active business entities worldwide, I would have a group of at least 99,983,460 companies that are not using magical data quality tools.

And that number is probably even too low, as some customers have more than one vendor and are therefore counted twice.

Anyway, what do all the others do then?

Well, of course the overwhelming number of companies will be too small to have any chance of investing in a data quality tool from a vendor that made it into the quadrant.

The quadrant also lists a range of other vendors of data quality tools, typically operating locally around the world. These vendors also have customers, probably more in number, but not of the size of the companies that choose a vendor in the quadrant.

A lot of data quality technology is also used by service providers who either use a tool from a data quality tool vendor or have made a homegrown solution. So a lot of companies benefit from such services when processing large numbers of data records to be standardized, deduplicated and enriched.

Then we must not forget that technology doesn’t solve all your data quality issues, as stated by Dylan Jones, the founder of DataQualityPro, in a recent post on a data quality forum operated by the (according to Gartner) leading data quality tool vendor. The post is called Finding the Passion for Data Quality.

My take is that it’s totally true that data quality tools don’t solve most of your data quality issues, but those issues they do address, typically data profiling and data matching, are hard to solve without a tool. So there is still a huge market out there, currently covered by the true leader in the data quality market: Laissez-Faire.
