Gartner (the analyst firm), represented by Saul Judah, takes data quality back to basics in the recent post called Data Quality Improvement.
While I agree with the sentiment around measuring the facts as expressed in the post, I am cautious about assuming that everything is good once data are fit for the purpose of business operations.
Some clues lie in the data quality dimensions mentioned in the post:
Accuracy (for now):
As said in the Gartner post, data are indeed temporal. The real world changes and so do business operations. By the time you have made your data fit for the purpose of use, the business operations have changed. And by the time you have re-fitted your data for the new purpose of use, the business operations have changed again.
Furthermore, most organizations can’t take all business operations into account at the same time. If you go down the fit-for-purpose track you will typically address a single business objective and make data fit for that purpose. Not least when dealing with master data there are many business objectives and derived purposes of use. In my experience that leads to this conclusion:
“While we value that data are of high quality if they are fit for the intended use we value more that data correctly represent the real-world construct to which they refer in order to be fit for current and future multiple purposes”
Existence – an aspect of completeness:
The Gartner post mentions a data quality dimension called existence. I tend to see this as an aspect of the more broadly used term completeness.
For example, achieving fit-for-purpose completeness for product master data has been a huge challenge for many organizations within retail and distribution during recent years, as explained in the post Customer Friendly Product Master Data.
This weekend I’m in Copenhagen where, opposite to when in London, I enjoy a bicycle ride.
In the old days I had a small cycle computer that gave you a few key performance indicators about your ride, such as riding time, distance covered, and average and maximum speed. Today you can use an app on your smartphone and have current figures displayed on your smartwatch along the way.
As explained in the post American Exceptionalism in Data Management, the first thing I do when installing an app is to change Fahrenheit to Celsius, the date format to a usable one and, not least in this context, miles to kilometers.
The cool thing is that the user interface on my smartwatch reports my usual speed in kilometers per hour as miles per hour, making me 60 % faster than I used to be. So next year I will join the Tour de France, making Jens Voigt (aka Der Alte) look like a youngster.
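The arithmetic behind that flattering figure can be sketched in a few lines of Python (a toy illustration, not the app’s actual code):

```python
# Why a km/h figure displayed unchanged but labelled mph flatters you by
# roughly 60 %: the number stays the same, but a mile is 1.609 km.
KM_PER_MILE = 1.609344

def apparent_speedup(speed_kmh: float) -> float:
    """Fractional 'improvement' implied by mislabelling km/h as mph."""
    implied_kmh = speed_kmh * KM_PER_MILE  # what the mph label claims, in km/h
    return implied_kmh / speed_kmh - 1

print(round(apparent_speedup(25.0) * 100))  # about 61 % faster, at any speed
```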
Using such an app is also a good example of why we have big data today. The app tracks a lot of data, such as the detailed route on a map with x, y and z coordinates, split speed per kilometer and other useful stuff. Analyzing these data tells me the Tour de France maybe isn’t a good idea. After what I thought was 100 miles, but was 100 kilometers, my speed went from slow to grandpa.
That’s a bit like IT projects, by the way. Regardless of timeframe, they slow down after 80 % of the plan has been covered.
Usually data models are made to fit a specific purpose of use. As reported in the post A Place in Time, this often leads to data quality issues when the data are going to be used for purposes different from the originally intended one. Among many examples, we not least have heaps of customer tables like this one:
Compared to how the real world works, this example has some diversity flaws, like:
state code as a key to a state table will only work with one country (the United States)
zipcode is a United States-only term, as opposed to the more generic “Postal Code”
fname (First name) and lname (Last name) don’t work in cultures where given name and surname appear in the opposite sequence
The lengths of the state, zipcode and most other fields are obviously too small almost anywhere else
More seriously we have:
fname and lname (First name and Last name), and probably also phone, should belong to a party entity of their own, acting as a contact related to the company
company name should belong to a party entity of its own, acting in the role of customer
address1, address2, city, state and zipcode should belong to a place entity of its own, probably as the current visiting place related to the company
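A minimal sketch of the separation suggested above, using hypothetical Python data classes (the entity and field names are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch: parties and places are entities in their own right,
# linked by roles, rather than columns flattened into one customer table.

@dataclass
class Place:
    address_lines: list[str]        # free-form lines instead of address1/address2
    city: str
    region: Optional[str]           # "state" only where the country has states
    postal_code: Optional[str]      # generic, not US-only "zipcode"
    country: str

@dataclass
class Party:
    name: str                       # full name; no fname/lname ordering assumption
    party_type: str                 # "person" or "organization"

@dataclass
class Role:
    party: Party
    role_type: str                  # e.g. "customer", "contact"
    place: Optional[Place] = None   # e.g. current visiting place
    related_to: Optional[Party] = None  # e.g. contact person related to a company

company = Party("Acme Ltd", "organization")
contact = Party("Li Wei", "person")
roles = [
    Role(company, "customer",
         Place(["1 High Street"], "London", None, "SW1A 1AA", "GB")),
    Role(contact, "contact", related_to=company),
]
print(len(roles))  # 2
```

The point of the sketch is that the same party can later take on new roles (supplier, prospect, contact) without remodelling, which is exactly what a single-purpose customer table cannot do.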
In my experience, looking at the real world helps a lot when making data models that can survive for years and support use cases different from the one in immediate question. I’m not talking about introducing scope creep, but just thinking a little bit about what the real world looks like when you are modelling something in that world, which usually is the case when working with Master Data Management (MDM).
Real world alignment is often seen as a measure of data quality competing with the popular approach of seeing data quality as fitness for the purpose of use.
When we try to narrow down what constitutes quality of data, we may use data quality dimensions. So, what do data quality dimensions look like in the light of real world alignment? Here are a few thoughts:
Uniqueness is probably the data quality dimension that most closely relates to real world alignment, as the opposite of uniqueness is duplication, which in the data quality world means that two or more different data records describe the same real world entity.
Accuracy is best measured as the degree to which data describe something in the real world.
Credibility was recently proposed as an important data quality dimension by Malcolm Chisholm on Information Management in the article called Data Credibility: A New Dimension of Data Quality? Here credibility means that data are free of malicious manipulation performed to fulfill an evil purpose of use.
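As a toy illustration of the uniqueness dimension, here is a minimal Python sketch that flags duplicate candidates, meaning records that probably describe the same real world entity, using simple string similarity (the records and the 0.8 threshold are invented examples, not a production matching approach):

```python
from difflib import SequenceMatcher

# Two records are duplicate candidates when they likely describe the same
# real world entity, even though the strings differ.

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

records = ["Acme Limited, London", "ACME Ltd., London", "Beta Corp, Leeds"]

pairs = [
    (r1, r2)
    for i, r1 in enumerate(records)
    for r2 in records[i + 1:]
    if similarity(r1, r2) > 0.8   # arbitrary example threshold
]
print(pairs)  # the two Acme records are flagged as one candidate pair
```

Real matching tools use far more sophisticated comparison (phonetics, tokenization, reference data), but the principle of comparing records against a model of the real world entity is the same.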
Right now I’m working with data stewardship functionality in the instant Data Quality MDM Edition, where the relocation event, the deceased event and other important events in party master data life-cycle management are supported as part of an MDM service.
Updating with some of these events may be done automatically, while other events require manual intervention.
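A hypothetical sketch of how such life-cycle events might be routed, with event names and routing rules invented for illustration and not reflecting any actual product API:

```python
# Hypothetical routing of party life-cycle events: some update the master
# record automatically, others are queued for a data steward to confirm.

AUTO_EVENTS = {"relocation"}            # e.g. applied from an authoritative source
MANUAL_EVENTS = {"deceased", "merger"}  # e.g. needs steward confirmation

def route_event(event_type: str) -> str:
    if event_type in AUTO_EVENTS:
        return "auto-update"
    if event_type in MANUAL_EVENTS:
        return "steward-queue"
    return "reject"

print(route_event("relocation"))  # auto-update
print(route_event("deceased"))    # steward-queue
```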
A recent post on this blog was called Omni-purpose MDM. Herein it is discussed to what degree MDM solutions should cover all business cases where Master Data Management plays a part.
Master Data Management (MDM) is very much about data quality. A recurring question in the data quality realm is whether data quality should be seen as the degree to which data are fit for the purpose of use, or whether the degree of real world alignment is a better measurement.
The other day Jim Harris published a blog post called Data Quality has a Rotating Frame of Reference. In a comment Jim takes up the example of having a valid address in your database records and how measuring address validity may make no sense for measuring how data quality supports a certain business objective.
My experience is that if you look at one business objective at a time, measuring data quality against the purpose of use is of course sound. However, if you have several different business objectives using the same data, you will usually discover that aligning with the real world fulfills all the needs. This is explained further within the concept of Data Quality 3.0.
Using the example of a valid address, measurements, and actual data quality prevention, typically work with degrees of validity, notably:
The validity at different levels, such as area, entrance and specific unit, as examined in the post A Universal Challenge.
The validity of related data elements, as when an address is valid but the addressee is not, as examined in the post Beyond Address Validation.
Data quality needs for a specific business objective also change over time. While a valid address may be irrelevant for invoicing if the mail carrier gets it there anyway, or we invoice electronically, having a valid address and addressee suddenly becomes fit for the purpose of use if the invoice is not paid and we have to chase the debt.
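The idea of degrees of validity can be sketched as follows, with invented level names and a hypothetical check structure:

```python
# Sketch of "degrees of validity": an address can verify at one level
# (area, entrance, unit, addressee) while failing at a finer one, and the
# level a business objective needs changes over time. Levels are illustrative.

LEVELS = ["area", "entrance", "unit", "addressee"]

def validity_degree(checks: dict[str, bool]) -> str:
    """Return the finest level up to which the address verifies."""
    passed = "none"
    for level in LEVELS:
        if not checks.get(level, False):
            break
        passed = level
    return passed

invoice_address = {"area": True, "entrance": True, "unit": False}
print(validity_degree(invoice_address))  # entrance
```

For electronic invoicing, "entrance" may be good enough; for debt collection, nothing short of "addressee" will do.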
In a recent tweet Ted Friedman of Gartner (the analyst firm) said:
I think he is right.
Duplicates have always been pain number one in most places when it comes to the cost of poor data quality.
Though I have been in the data matching business for many years and have fought duplicates with deduplication tools in numerous battles, the war doesn’t seem to be won by using deduplication tools alone, as told in the post Somehow Deduplication Won’t Stick.
Eventually deduplication always comes down to entity resolution: you have to decide which results are true positives and which are useless false positives, and wonder how many false negatives you didn’t catch, which means how much money you didn’t get back from your deduplication investment.
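The end game described above can be illustrated with a tiny Python sketch using invented record pairs (in practice the true pairs are of course not known up front, which is exactly why entity resolution takes human judgement):

```python
# Why deduplication ends in entity resolution: pairs flagged by a matching
# tool must be judged true or false positives, while false negatives are the
# duplicates the tool never surfaced. All pairs here are invented examples.

tool_pairs = {("r1", "r2"), ("r3", "r4")}   # pairs the tool flagged
true_pairs = {("r1", "r2"), ("r5", "r6")}   # pairs that really are duplicates

true_positives = tool_pairs & true_pairs    # correctly caught
false_positives = tool_pairs - true_pairs   # useless matches to review
false_negatives = true_pairs - tool_pairs   # money left on the table

print(sorted(true_positives), sorted(false_positives), sorted(false_negatives))
```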
The article is about the implications for marketing caused by the rise of social media, which now finally seems to eliminate what we have known as business-to-business (B2B) and more or less merges B2B and business-to-consumer (B2C).
As discussed here on the blog several times, starting way back in 2009 in the post Echoes in the Database, a problem with B2B is indeed that while business transactions take place between legal entities, a lot of business processes take place between employees related to the selling and buying entities. You may call that employee-to-employee (E2E), people-to-people (P2P) or indeed human-to-human (H2H).
Related to databases, data quality and Master Data Management (MDM), this means we need real world alignment with two kinds of parties:
The legal entities between which the business transactions take place
The natural persons (the employees) between which the business processes take place
While B2B and B2C may merge in the way we do messaging, the distinction between B2B and B2C will remain in many other aspects. Even in social media we see it, as for example two of the most used social networks, Facebook and LinkedIn, clearly belong mainly to B2C and B2B respectively for marketing and social selling purposes.
The location domain is, after the customer (or rather party) domain and the product domain, the most frequently addressed domain for Master Data Management (MDM).
In my recent work I have seen a growing interest in handling location data as part of an MDM program.
Traditionally, location data in many organizations have been handled in two main ways:
As part of other domains, typically as address attributes for customer and other party entities
As a silo for special business processes that involve spatial data, using Geographic Information Systems (GIS), for example in engineering and demographic market research.
Handling location data most often involves using external reference data, as location data don’t have the same privacy considerations that party data, not least data describing natural persons, tend to have, and, as opposed to product data, location data are pretty much the same to everyone.
MDM for the location domain is very much about bringing the two above-mentioned ways of working with locations together while consistently exploiting external reference data.
As in all MDM work, data quality is the important factor, and the usual data quality dimensions indeed apply here as well. Some challenges are:
Uniqueness and precision: Locations come in hierarchies. As told in the post The Postal Address Hierarchy, when referring to textual addresses we have levels such as country, region, city or district, thoroughfare (street) or block, building number and unit within a building. Uniqueness may be defined within one of these levels. As discussed in the post Where is the Spot?, the precision and use case for coordinates may cause uniqueness issues too.
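A small sketch, with invented field names, of how uniqueness can be defined within a chosen level of the hierarchy: two records may be duplicates at building level yet distinct at unit level.

```python
# Illustrative address hierarchy: a record's uniqueness key depends on
# which level of the hierarchy you define uniqueness within.

LEVELS = ["country", "city", "street", "building", "unit"]

def key_at(record: dict, level: str) -> tuple:
    """Uniqueness key for a record down to and including the given level."""
    idx = LEVELS.index(level) + 1
    return tuple(record.get(l) for l in LEVELS[:idx])

a = {"country": "DK", "city": "Copenhagen", "street": "Nygade",
     "building": "4", "unit": "1th"}
b = {"country": "DK", "city": "Copenhagen", "street": "Nygade",
     "building": "4", "unit": "2tv"}

print(key_at(a, "building") == key_at(b, "building"))  # True: same building
print(key_at(a, "unit") == key_at(b, "unit"))          # False: distinct units
```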
Yesterday we had a call from British Gas (or probably a call centre hired by British Gas) explaining the great savings possible by switching from our current provider, which by the way is: British Gas. This is a classic data quality issue in direct marketing operations: accurately separating your current customers from entities belonging to the new market.
As I have learned that your premier identity proof in the United Kingdom is your utility bill, this incident may be seen as somewhat disturbing – or on further thought, maybe a business opportunity 🙂
At iDQ we develop a solution that may be positioned in the space between data quality prevention and identity checking, by addressing the identity resolution aspect during data capture.