Disciplines come and go in the data management world. Here is a mind map of the disciplines at the top of my mind today. Some of the disciplines go back to the emergence of IT in the previous millennium and some have risen in recent years.

What is data quality anyway? This question has been touched upon many times on this blog.
Data quality can be assessed using a range of data quality dimensions – the ones coloured green in the above mind map. These dimensions relate in different ways to various data domains as examined in the post Multi-Domain MDM and Data Quality Dimensions.
Data quality can be managed using a toolbox of sub disciplines, such as the ones coloured turquoise in the above mind map. The reasons for data cleansing were discussed in the blog post Top 5 Reasons for Downstream Cleansing. Data profiling, along with data matching, was covered in the post Data Quality Tools Revealed. The relationship between data matching and identity resolution was recently described in the post Data Matching and Real-World Alignment.
The data quality discipline is closely related to other disciplines, coloured yellow, such as data modelling, Reference Data Management (RDM), Master Data Management (MDM), metadata management and data governance (if data quality is not itself a sub discipline of data governance), as also shown in the post A Data Management Mind Map.
This blog is about Data Quality 3.0, Product Data Syndication Freedom, Multienterprise MDM – and many more data management topics.
These topics, and the many other data management topics I have worked with, look like the mind map below:
If I can be of any help to you in the data management realm, here are some Popular Offerings.
A core attribute in customer master data, when dealing with business entities, is the industry vertical (or Line-of-Business, market segment or whatever metadata name you like) of your customers and prospects.
When handling this particular data element, you will come across many of the classic choices in data and information management.
Unstructured versus structured
Many early CRM (Customer Relationship Management) implementations offered a free text field for the industry vertical. While this approach may have been good for the free flow of data entry, it has of course created havoc when business intelligence was applied to the CRM data. Countless cleansing projects have been carried out (and are still going on) to fix this basic mistake.
Most data entry forms today that capture an industry vertical offer a value list to choose from.
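As a minimal sketch of the two sides of this choice, the code below enforces a value list at data entry and maps legacy free-text entries onto the same list during cleansing. The list members and the synonym table are hypothetical examples, not taken from any real CRM.

```python
from enum import Enum
from typing import Dict, Optional


class Industry(Enum):
    """Hypothetical in-house value list for the industry vertical."""
    RETAIL = "Retail"
    BANKING = "Banking"
    SOFTWARE = "Software"
    MANUFACTURING = "Manufacturing"


# Hypothetical free-text variants found in a legacy CRM, keyed in lower case.
SYNONYMS: Dict[str, Industry] = {
    "retail": Industry.RETAIL,
    "retailer": Industry.RETAIL,
    "bank": Industry.BANKING,
    "banking": Industry.BANKING,
    "software": Industry.SOFTWARE,
    "it/software": Industry.SOFTWARE,
    "manufacturing": Industry.MANUFACTURING,
    "factory": Industry.MANUFACTURING,
}


def parse_entry(value: str) -> Industry:
    """Data entry: accept only exact members of the value list."""
    return Industry(value)  # raises ValueError for anything not on the list


def cleanse_free_text(raw: str) -> Optional[Industry]:
    """Cleansing: map a legacy free-text value to the list, or None if unknown."""
    key = " ".join(raw.lower().split())  # case-fold and collapse whitespace
    return SYNONYMS.get(key)
```

For example, `cleanse_free_text("  Retailer ")` gives `Industry.RETAIL`, while `cleanse_free_text("Bakery")` gives `None` and is left for manual review.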
Your list versus an external standard
A value list may be one of your own creation or be based on an external standard, for example SIC or NACE codes.
A list of your own tends to fulfill the data quality principle of fitness for the intended use, while an external standard tends to fulfill the data quality principle of reflecting the real-world construct.
The main weaknesses of a list of your own are that it requires continuous manual maintenance and may cause conflicts. Deep down in a discussion on the Initiate MDM blog, Julian Schwarzenbach offered a good example:
“I have also come across ‘flip-flop’ data – which is typically subjective data where two users cannot agree what the correct value is and it keeps getting changed between two values. This could be the classification of a customer by market sector where two different territories are reflecting different capabilities in their territories.”
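Flip-flop data of this kind can be spotted in a change log. The sketch below assumes a hypothetical per-attribute history of (timestamp, new value) changes ordered by time. It flags an attribute that keeps switching back and forth between the same two values. The threshold of three reversals is an arbitrary assumption.

```python
from typing import List, Tuple

# Hypothetical change log entry for one attribute of one customer: (timestamp, new value).
Change = Tuple[str, str]


def count_reversals(history: List[Change]) -> int:
    """Count changes that revert the value to the one held two steps earlier."""
    values = [value for _, value in history]
    return sum(
        1
        for i in range(2, len(values))
        if values[i] == values[i - 2] and values[i] != values[i - 1]
    )


def is_flip_flop(history: List[Change], min_reversals: int = 3) -> bool:
    """Flag the attribute if it toggles between exactly two values often enough."""
    distinct = {value for _, value in history}
    return len(distinct) == 2 and count_reversals(history) >= min_reversals
```

A market sector that goes Retail, Wholesale, Retail, Wholesale, Retail has three reversals and is flagged. A single correction from Retail to Wholesale is not.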
The main weaknesses of an external standard are that it seldom offers the granularity you need, and that for global data the different standards (SIC versions, different national NACE implementations and others) are a pain in the…
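One way to see the granularity problem is a crosswalk from your own list to an external standard. The sketch below maps a hypothetical in-house list to NACE Rev. 2 section letters only. The standard itself has finer levels below the sections. At section level, two in-house values collapse into one code, so the mapping cannot be reversed without losing information.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical in-house segments mapped to NACE Rev. 2 sections (top level only).
CROSSWALK: Dict[str, str] = {
    "Manufacturing": "C",  # C: Manufacturing
    "Retail": "G",         # G: Wholesale and retail trade; repair of motor vehicles and motorcycles
    "Software": "J",       # J: Information and communication
    "Banking": "K",        # K: Financial and insurance activities
    "Insurance": "K",      # K: Financial and insurance activities
}


def ambiguous_codes(crosswalk: Dict[str, str]) -> Dict[str, List[str]]:
    """Return external codes that more than one in-house segment maps to."""
    by_code: Dict[str, List[str]] = defaultdict(list)
    for segment, code in crosswalk.items():
        by_code[code].append(segment)
    return {code: sorted(segs) for code, segs in by_code.items() if len(segs) > 1}
```

Here `ambiguous_codes(CROSSWALK)` returns `{'K': ['Banking', 'Insurance']}`: the external code cannot tell the two in-house segments apart.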
One versus several values
Many companies have more than one distinct activity. Capturing only one (the primary) value for each company keeps it simple, stupid. Allowing more than one value in relevant cases adds complexity but may lead to better decisions.
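The trade-off can be sketched with a hypothetical model that keeps one mandatory primary industry plus optional secondary ones. Counting by primary value only is the simple option. Counting all values shows activities the simple option hides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CompanyIndustry:
    """Hypothetical model: one mandatory primary industry plus optional others."""
    primary: str
    secondary: List[str] = field(default_factory=list)

    def all_values(self) -> List[str]:
        """Primary value first, then secondary values without duplicates."""
        result = [self.primary]
        for value in self.secondary:
            if value not in result:
                result.append(value)
        return result


def count_by_industry(companies: List[CompanyIndustry], primary_only: bool) -> Dict[str, int]:
    """Count companies per industry, using either the primary value or all values."""
    counts: Dict[str, int] = {}
    for company in companies:
        values = [company.primary] if primary_only else company.all_values()
        for value in values:
            counts[value] = counts.get(value, 0) + 1
    return counts
```

Consider a manufacturer that also runs shops, together with a pure retailer. Counting primary values finds one retail company. Counting all values finds two.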
Today we are only one month from the start of the biggest single-sport event in the world this year: The 2010 FIFA World Cup taking place in South Africa.
Now, shouldn’t the name be The Football World Cup?
Well, the problem is that football is a different game in some parts of the world: in North America football means American football, and down under it means Australian football. The football we know in most other parts of the world is called soccer in these areas. Association Football is the technically correct name, which is also why FIFA is an abbreviation of Fédération Internationale de Football Association, which is French for International Federation of Association Football.
So, to avoid confusion the FIFA World Cup is the common – and official – name of the event.
Such naming difficulties are a very common source of information quality issues. In my work with global party master data I meet the naming issue daily – or on a daily basis as some might put it. Examples:
The discipline concerned with the unique naming of data is called Metadata Management (or Meta Data Management by some).
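A small glossary can make the regional meaning of a label explicit. The sketch below is hypothetical. The glossary entries follow the World Cup example above. The region codes and the fallback meaning for "football" are assumptions, not a standard.

```python
from typing import Dict, Tuple

# Hypothetical glossary: (local term, region) -> unambiguous canonical name.
GLOSSARY: Dict[Tuple[str, str], str] = {
    ("football", "US"): "American football",
    ("football", "AU"): "Australian football",
    ("soccer", "US"): "Association football",
    ("soccer", "AU"): "Association football",
}

# Assumed default meaning of a term when no regional entry exists.
DEFAULTS: Dict[str, str] = {"football": "Association football"}


def canonical_name(term: str, region: str) -> str:
    """Resolve a local term to its canonical name, falling back to the term itself."""
    key = term.strip().lower()
    try:
        return GLOSSARY[(key, region.upper())]
    except KeyError:
        return DEFAULTS.get(key, term)
```

With this glossary, "Football" entered in the US resolves to American football, while the same word entered in Denmark resolves to association football.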
Today I would like to invent a new word.
The word “Meterencedata” is a combination of the two terms:
Metadata is data about data. Roughly speaking, in relation to databases and spreadsheets, metadata describes what is in the columns.
Reference Data are high-level value lists that categorize the data. Roughly speaking, in relation to databases and spreadsheets, reference data explains what is in the rows.
Data Management activities, like Data Quality improvement, Master Data Management and Data Migration, will be (and, as I have seen, are) like working in the dark if you don’t know the Metadata and the Reference Data.
Data Models may look different. Some information may be understood through metadata in one model but through reference data in another.
Example:
In the latter case the original phone types may have been the classic fixed line, cell and fax, but the entries may have been extended over time as the real world changed. This model also reflects the reality of several numbers of the same type attached to a single party.
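The two kinds of model can be sketched as below. This assumes, hypothetically, that the first model has one column per phone type on the party record and the second has a separate phone table whose type column points to a reference list. In the first model the phone type is metadata (a column name). In the second it is reference data (a value in a row).

```python
from dataclasses import dataclass
from typing import List, Optional


# Model 1: phone type as metadata, one column per type on the party record.
@dataclass
class PartyWithPhoneColumns:
    name: str
    fixed_line: Optional[str] = None
    cell: Optional[str] = None
    fax: Optional[str] = None


# Model 2: phone type as reference data, a value list referenced from each row.
PHONE_TYPES = {"FIXED", "CELL", "FAX"}  # can be extended without a schema change


@dataclass
class Phone:
    party_name: str
    phone_type: str
    number: str

    def __post_init__(self) -> None:
        # Reject types that are not on the reference list.
        if self.phone_type not in PHONE_TYPES:
            raise ValueError(f"unknown phone type: {self.phone_type}")


def to_rows(party: PartyWithPhoneColumns) -> List[Phone]:
    """Migrate model 1 to model 2: each filled phone column becomes a typed row."""
    columns = {"FIXED": party.fixed_line, "CELL": party.cell, "FAX": party.fax}
    return [Phone(party.name, phone_type, number)
            for phone_type, number in columns.items() if number]
```

Adding a new phone type is a schema change in the first model. In the second, it only needs a new entry in `PHONE_TYPES`, which is also the model that allows two cell numbers for the same party.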
Conclusion: One man’s Metadata is another man’s Reference Data, as you don’t meet and mete out the data in the same way.
If you ask me the question “How many people live in your town?”, I could give you a correct answer that is 5,000 % off from what you are looking for.
I live in Greve Municipality in Denmark, population close to 48,000. Greve is a suburb south of Copenhagen. According to Wikipedia, the Copenhagen urban area has a population of 1.2 million and the Copenhagen metro area a population of 1.9 million.
The Copenhagen metro area stretches from 40 km (25 miles) south of the city to 40 km (25 miles) north at Elsinore and Kronborg Castle (immortalized in Shakespeare’s Hamlet; always remember to include Shakespeare in a blog).
Furthermore, from Copenhagen you can look across the water to the east and see Sweden and the city of Malmoe. The Copenhagen-Malmoe bi-national urban agglomeration has a total population of 2.5 million people.
The real data quality issue in my initial question is not the precision, validity and timeliness of the number given in the answer but the shared understanding of the label attached to the number.
I noticed that Wikipedia has developed a good metadata habit when stating town populations: it gives three distinct labels, City, Urban and Metro.
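Pairing every number with its label removes the ambiguity. The sketch below uses the rounded figures quoted above. The label strings and the `gap_percent` helper are hypothetical. The helper shows how far apart two "correct" answers can be when the labels are not shared.

```python
from typing import Dict

# Rounded figures quoted above; each number is stored with the label it measures.
POPULATION: Dict[str, int] = {
    "Greve Municipality": 48_000,
    "Copenhagen urban area": 1_200_000,
    "Copenhagen metro area": 1_900_000,
    "Copenhagen-Malmoe agglomeration": 2_500_000,
}


def gap_percent(given_label: str, meant_label: str) -> float:
    """How much larger (in %) the figure that was meant is than the figure given."""
    given = POPULATION[given_label]
    meant = POPULATION[meant_label]
    return (meant - given) / given * 100
```

Answering with Greve when the Copenhagen-Malmoe agglomeration was meant gives `gap_percent` of about 5,108. That is the roughly 5,000 % from the opening question.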