Data Quality Dimensions and Real World Alignment

Real-world alignment is often seen as a measure of data quality that competes with the popular definition of data quality as fitness for the purpose of use.

When we try to narrow down what constitutes quality of data, we may use data quality dimensions. So, what do data quality dimensions look like in the light of real-world alignment? Here are a few thoughts:

  • Uniqueness is probably the data quality dimension most closely related to real-world alignment, as the opposite of uniqueness is duplication, which in the data quality world means that two or more different data records describe the same real-world entity.
  • Accuracy is best measured as the degree to which data describes something in the real world.
  • Credibility was recently proposed as an important data quality dimension by Malcolm Chisholm on Information Management in the article called Data Credibility: A New Dimension of Data Quality? Here, credibility means that data is free of malicious manipulation performed to fulfil an evil purpose of use.
Some data quality dimensions
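The dimensions above can be scored on actual records. Here is a minimal sketch, assuming a toy record set where a hypothetical entity_id field stands in for the real-world entity a record describes; the field names are made up for illustration.

```python
# Toy scoring of two data quality dimensions: uniqueness and completeness.
from collections import Counter

records = [
    {"entity_id": "E1", "name": "Ann Smith", "city": "London"},
    {"entity_id": "E1", "name": "Anne Smith", "city": "London"},  # duplicate of E1
    {"entity_id": "E2", "name": "Bo Jensen", "city": None},       # incomplete
]

def uniqueness(recs):
    """Share of records that are the sole description of their real-world entity."""
    counts = Counter(r["entity_id"] for r in recs)
    sole = sum(1 for r in recs if counts[r["entity_id"]] == 1)
    return sole / len(recs)

def completeness(recs, fields=("name", "city")):
    """Share of required fields that are actually populated."""
    filled = sum(1 for r in recs for f in fields if r.get(f))
    return filled / (len(recs) * len(fields))

print(round(uniqueness(records), 2))    # 0.33 - two records describe E1
print(round(completeness(records), 2))  # 0.83 - one record misses city
```

Accuracy, by contrast, cannot be computed from the records alone; it requires comparison with a real-world source, which is why it is the harder dimension to measure.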


Service Oriented MDM

Much of the talking and doing related to Master Data Management (MDM) today revolves around the master data repository being the central data store for information about customers, suppliers and other parties, products, locations, assets and whatever else is regarded as master data entities.

The difficulties in MDM implementations are often experienced because master data are born, maintained and consumed in a range of applications such as ERP systems, CRM solutions and heaps of specialized applications.

It would be nice if these applications were MDM aware. But usually they are not.

As discussed in the post Service Oriented Data Quality, the concepts of Service Oriented Architecture (SOA) make a lot of sense when deploying data quality tool capabilities that go beyond the classic batch cleansing approach.

In the same way, we also need SOA thinking when we have to make the master data repository do useful stuff all over the scattered application landscape that most organizations live with today and probably will in the future.

MDM functionality deployed as SOA components has a lot to offer, for example:

  •  Reuse is one of the core principles of SOA. Having the same master data quality rules applied to every entry point of the same sort of master data will help with consistency.
  •  Interoperability will make it possible to deploy master data quality prevention as close to the root as possible.
  •  Composability makes it possible to combine functionality with different advantages – e.g. combining internal master data lookup with external reference data lookup.
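The composability point can be sketched in a few lines. This is a hypothetical illustration, not a real MDM API: one lookup interface, two interchangeable components, composed so the internal repository is tried before the external reference source. All names and the sample data are invented.

```python
# Composing an internal master data lookup with an external reference lookup.
from typing import Callable, Optional

Lookup = Callable[[str], Optional[dict]]

def internal_repository(key: str) -> Optional[dict]:
    repo = {"CUST-1": {"name": "Acme Ltd", "source": "internal"}}
    return repo.get(key)

def external_reference(key: str) -> Optional[dict]:
    directory = {"CUST-2": {"name": "Globex A/S", "source": "external"}}
    return directory.get(key)

def compose(*lookups: Lookup) -> Lookup:
    """Return a lookup that tries each component service in order."""
    def combined(key: str) -> Optional[dict]:
        for lookup in lookups:
            result = lookup(key)
            if result is not None:
                return result
        return None
    return combined

master_data_service = compose(internal_repository, external_reference)
print(master_data_service("CUST-2"))  # falls through to the external source
```

Because every entry point calls the same composed service, the same master data rules apply everywhere, which is exactly the reuse and consistency argument made above.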


Completeness is still bad, while uniqueness is improving

In a recent report called The State of Marketing Data, prepared by NetProspex, over 60 million B2B records were analyzed in order to assess the quality of the data, measured as fitness for use related to marketing purposes.

An interesting finding was that out of a maximum score of 5.0, duplication, the dark side of uniqueness, was given the average score 4.2, while completeness was given the average score 2.7.

The State of Marketing Data

This corresponds well with my experience. In the data quality realm we have worked very hard with deduplication tools using data matching approaches over the years, and the results are showing up. We are certainly not there yet, but it seems that completeness, and in my experience also accuracy, are the data quality dimensions currently suffering more.
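The data matching approach behind those deduplication results can be sketched very simply: build a normalized match key per record and group records sharing a key as duplicate candidates. Real matching engines use far more sophisticated comparisons; the field names and key recipe here are illustrative assumptions.

```python
# Crude batch deduplication via normalized match keys.
import re
from collections import defaultdict

def match_key(record):
    """Lowercased name stripped of punctuation, paired with the postcode."""
    name = re.sub(r"[^a-z0-9]", "", record["name"].lower())
    return f"{name}|{record['postcode']}"

records = [
    {"id": 1, "name": "J. Smith",  "postcode": "SW1A 1AA"},
    {"id": 2, "name": "J Smith",   "postcode": "SW1A 1AA"},
    {"id": 3, "name": "Ann Brown", "postcode": "EC1A 1BB"},
]

groups = defaultdict(list)
for rec in records:
    groups[match_key(rec)].append(rec["id"])

duplicates = [ids for ids in groups.values() if len(ids) > 1]
print(duplicates)  # [[1, 2]]
```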

In my eyes the remedy for improvement in completeness and accuracy goes hand in hand with even better uniqueness. It is about getting the basic data right the first time, as described in the post instant Single Customer View, and being able to keep up completeness and accuracy, as told in the post External Events, MDM and Data Stewardship.


Now We Have a Data Governance Tool Market

Do we need data governance tools? This was a question discussed recently here on the blog in the comments to the post called Data Governance Tools: The New Snake Oil?

As mentioned in a comment one analyst firm, Bloor, has actually made a data governance market update with vendors positioned in their bulls-eye style of visualization. Both a data quality market update and the data governance market update can be fetched via Trillium Software here.

The data governance report states that especially regulations have urged organizations to focus on data quality and thereby data governance. Furthermore, Bloor says: “Previously, compliance was typically process-focused: you had to prove the lineage of data, for example, but not its accuracy.”

The vendors positioned in the data governance market are pretty much the usual suspects known from the analyst reports on the data quality tool market. It is interesting, though, that Experian makes one of its not so frequent appearances in such a report. That must be about accuracy, since Experian is not so well known for process-focused tools but indeed for tools using external reference data in order to improve accuracy.

Market Update Data Governance


A Digital Sharing Revolution

The last couple of days I have been part of a so-called Innovation Camp around how to exploit open public sector data in the private sector. In one of the inspirational keynotes, Professor Birgitte Andersen of the Big Innovation Centre used the term “A Digital Sharing Revolution” to describe the trend of increasingly sharing data within the public sector, between the public and private sectors, and within the private sector.

During the two days a lot of ideas for how to exploit open public sector data within the private sector were put on the table. I was lucky enough to win a SmartWatch as part of the group with the winning concept: a service for identifying buildings with potential for energy-saving improvements. This service will benefit large enterprises such as building material manufacturers (and in fact energy suppliers), local small and midsize businesses, house owners and society as a whole in order to fulfil climate change prevention goals.

At iDQ we see great potential in using such a service in conjunction with our current offerings for exploiting both open public sector data and other external big reference data sources. Of course, there is a dilemma for enterprises in the private sector in using the same data provided by the same services as their competitors. However, there are still a lot of possibilities for standing out from the crowd in how data and services are actually used in the way of doing business, concentrating on that rather than reinventing the wheel in the way of collecting data.


There is Open Data in the Air

It is spring in Europe, and the good news in Europe this week is that from December next year we will finally see the end of paying exorbitant fees for data access on your mobile phone outside a WiFi when in another EU country, as told by the BBC here. As a person travelling a lot between EU countries this is, though years too late, fantastic news.

Being too late was unfortunately also the case, as examined in the article Sale of postcodes data was a ‘mistake’ say Committee – in News from UK Parliament. When the UK Royal Mail was privatised last year, the address directory, known as the PAF file, was part of the deal. It would have been a substantially better deal for society as a whole if the address data had been set free. This calculation is backed up by figures from experience in Denmark, as reported in the post The Value of Free Address Data.

Next week I’m looking forward to being part of an innovation camp arranged by the Danish authorities as a step in an initiative to exploit open public sector data in the private sector. Here public data owners, IT students, enterprise data consumers and IT tool and service vendors including iDQ A/S will meet openly and challenge each other in the development of the most powerful ideas for new ways to create valuable knowledge based on open public sector data.


External Events, MDM and Data Stewardship

Exploiting external data is an essential part of party master data management as told in the post Third-Party Data and MDM.

External data supports data quality improvement and prevention for party master data by:

  • Ensuring accuracy of party master data entities, best at point of entry but sometimes also by later data enrichment
  • Exploring relationships between master data entities and thereby enhancing the completeness of party master data
  • Keeping up the timeliness of party master data by absorbing external events in master data repositories
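The third bullet, absorbing external events, can be sketched as follows. The event types and the rule for which events update automatically versus which go to a data steward are illustrative assumptions, not a description of any particular product.

```python
# Absorbing external events into a toy master data repository.
AUTOMATIC = {"relocation"}  # e.g. a new address from an authoritative source

repository = {
    "P-1": {"name": "Ann Smith", "address": "Old Street 1", "status": "active"}
}
steward_queue = []

def absorb(event):
    """Apply automatic events directly; queue sensitive ones for a steward."""
    if event["type"] in AUTOMATIC:
        repository[event["party"]]["address"] = event["new_address"]
    else:
        steward_queue.append(event)  # e.g. a deceased event needs manual review

absorb({"type": "relocation", "party": "P-1", "new_address": "New Road 2"})
absorb({"type": "deceased", "party": "P-1"})

print(repository["P-1"]["address"])  # New Road 2
print(len(steward_queue))            # 1
```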

External events around party master data include, for example, relocation and deceased events. Updating with some of these events may be done automatically, while other events require manual intervention.

Right now I’m working with data stewardship functionality in the instant Data Quality MDM Edition, where the relocation event, the deceased event and other important events in party master data life-cycle management are supported as part of an MDM service.


EU to regulate the term “big data”

Today it has been announced that the European Union will regulate the use of the term “big data”.

“Volumes of misuse of the term big data have gone way over what is acceptable,” says an EU spokesperson. Therefore the Commission will initiate a snap roadmap for legislation, leading to every use of the term big data having to be approved by the authorities beforehand.

A variety of ways to declare that your use of the term big data has been approved will be put into force for the different languages used within the Union. So far France has announced that “big data appellation d’originalité contrôlée” will be used there.

Velocity is the word that best describes the planned process for clamping down on the misuse of the term big data. As early as 2020 every member state must have started the legislation process, and no later than 2025 the rules must be implemented in national laws. However, there is a great deal of skepticism over whether things could move that fast.

Say big data one more time


Winning by Sharing Data

When I changed my laptop a few months ago, it was the easiest migration to a new computer ever.

Basically I just had to connect to all the cloud services I had been using before, and for many services the path was to get connected to Google+, Twitter and Facebook and then connect to many other services via these connections.

This was a personal win.

Most of the teams I am working with are sharing their data with me in the cloud. Unlike in the bad old days, I do not have to call and ask for progress on this and that. I can check the status myself and even get notifications on my phablet when a colleague completes a task.

This is a shared win.

Within my profession, being data quality improvement and Master Data Management (MDM), sharing data is going to be a winning path too, as told in the post Sharing is the Future of MDM.

There are several ways of sharing master data like using commercial third party data, digging into open government data, having your own data locker and relying on social collaboration. These options are examined in the post Ways of Sharing Master Data.


Identity Resolution and Social Data

Identity Resolution

Identity resolution is a hot potato when we look into how we can exploit big data, and within that frame not least social data.

Some of the most frequently mentioned use cases for big data analytics revolve around listening to social data streams and combining that with traditional sources within customer intelligence. In order to do that, we need to know who is talking out there, and that must be done by using identity resolution features encompassing social networks.

The first challenge is what we are able to do: how we can technically expand our data matching capabilities to use profile data and other clues from social media. This subject was discussed in a recent post on DataQualityPro called How to Exploit Big Data and Maintain Data Quality, an interview with Dave Borean of InfoTrellis. Here the InfoTrellis “contextual entity resolution” approach was mentioned by Dave.

The second challenge is what we are allowed to do. Social networks have a natural interest in protecting members’ privacy, and besides, they also have a commercial interest in doing so. The degree of privacy protection varies between social networks. Twitter is quite open but on the other hand holds very little usable stuff for identity resolution, and making sense of the streams is an issue. Networks such as Facebook and LinkedIn are, for good reasons, not so easy to exploit due to the (changing) game rules applied.
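On the technical side of the first challenge, a common pattern is to combine several weak clues from a profile (name similarity, location, handle overlap) into one match score. The weights and threshold below are illustrative guesses, not a known method, and the field names are assumptions for the sketch.

```python
# Scoring a CRM record against a social profile with weighted clues.
from difflib import SequenceMatcher

def clue_score(crm, social):
    name_sim = SequenceMatcher(None, crm["name"].lower(),
                               social["display_name"].lower()).ratio()
    same_city = 1.0 if crm["city"].lower() == social["location"].lower() else 0.0
    handle_hit = 1.0 if crm["email"].split("@")[0] in social["handle"] else 0.0
    # Weighted sum of the clues; the weights are picked for illustration.
    return 0.5 * name_sim + 0.3 * same_city + 0.2 * handle_hit

crm_record = {"name": "Ann Smith", "city": "London", "email": "asmith@example.com"}
profile = {"display_name": "Ann Smith", "location": "London", "handle": "@asmith"}

score = clue_score(crm_record, profile)
print(score >= 0.8)  # above this threshold, treat as the same identity
```

In practice each clue would itself be a probabilistic comparison, and the second challenge above still decides which profile fields you may use at all.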

As said in my interview on DataQualityPro called What are the Benefits of Social MDM: it is a kind of goldmine in a minefield.
