The Countryside Data Quality Journey Through 2015

I guess this is the time for blog posts about big things that are going to happen in 2015. But you see, we could also take a route away from the motorways and highways and see how the traditional way of life is still unfolding in the data quality landscape.

While the innovators and early adopters are fighting with big data quality, the late majority are still trying to get their heads around how to manage small data. And that is a good thing, because you cannot utilize big data without solving small data quality problems, not least around master data, as told in the post How important is big data quality?

Solving data quality problems is not just about fixing data. It is very much also about fixing the structures around data, as explained in a post, featuring the pope, called When Bad Data Quality isn’t Bad Data.

A common roadblock on the way to solving data quality issues is that what is everybody’s problem tends to be no one’s problem. Implementing a data governance programme is evolving as the answer to that conundrum. As with many things in life, data governance is about thinking big and starting small, as told in the post Business Glossary to Full-Blown Metadata Management or Vice Versa.

Data governance revolves a lot around people’s roles, and there are also some specific roles within data governance. Data owners have been known for a long time, data stewards have been around for some time, and now we also see Chief Data Officers emerge, as examined in the post The Good, the Bad, and the Ugly Data Governance Role.

As I experienced recently, somewhere in the countryside, while discussing how to get going with a big and shiny data governance programme, there is still a lot to do with trivial data quality issues, such as fields being too short to capture the real world, as reported in the post Everyday Year 2000 Problems.

Putting Two Things in One Field

A very common data quality issue is when a field in a data record is populated with more than one piece of information.

Sometimes this is done as a workaround, because we have a piece of information but no field with that distinct purpose of use. Then we find a more or less related existing field into which we can squeeze this additional piece of information.

But we also have some very common cases where this bad habit is required by external business rules or widespread tradition.

Legal Form in Company Names

This example is examined in the post Legal Forms from Hell.

One should think that it is time to change the bad (legally demanded) practice of mixing legal forms with company names and to serve the original purpose in another, more data quality friendly way.
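As an illustration of what separating the two pieces of information could look like, here is a minimal Python sketch. The function name split_legal_form and the short LEGAL_FORMS list are assumptions made for this example only; a real list of legal forms is much longer and country specific, and matching on suffixes alone will miss many cases.

```python
# Hypothetical and deliberately tiny list of legal form suffixes.
# A real list is far longer and differs from country to country.
LEGAL_FORMS = ("Limited", "Ltd", "Inc", "GmbH", "A/S", "AB", "LLC")


def split_legal_form(company_name):
    """Return (name, legal_form); legal_form is None if no known suffix is found."""
    cleaned = company_name.strip()
    # Ignore a trailing dot so that "Inc." is treated like "Inc"
    candidate = cleaned.rstrip(".")
    # Try the longest suffixes first so "Limited" wins over shorter forms
    for form in sorted(LEGAL_FORMS, key=len, reverse=True):
        if candidate.lower().endswith(" " + form.lower()):
            name = candidate[: -len(form)].rstrip(" ,")
            return name, form
    return cleaned, None


if __name__ == "__main__":
    for raw in ("Acme Ltd", "Example, Inc.", "iDQ A/S", "Plain Name"):
        print(raw, "->", split_legal_form(raw))
```

With the two values held apart, the company name can be matched and searched without the legal form getting in the way, while the legal form is still available where it is required.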

An Address Line

An address line will typically hold a couple of elements such as a street (thoroughfare) name, a house number and maybe some kind of unit identification.

By the way, the order of street name and house number is reversed between two roughly equal parts of the world, with the exception of places where numbering within blocks between streets is the standard.

Education in Person Name

You can put professor in front of your name and even MBA – Master of Business Administration!! – after your name in the name field.

In the next few days I will put AFCM (Accidental Field Content Misuser) after my name.

MDM Aware MDM Solutions

The concept of MDM aware applications has been around for some time. What the Master Data Management establishment, including yours truly, is hoping for is that applications like CRM, ERP and other systems will start to utilize the master entities in MDM solutions, and exploit other good structures and services in the MDM realm, instead of having their own more or less useful data models within data silos around master data entities such as parties, products, locations and assets.

But what about MDM solutions themselves? Are MDM solutions that smug that they don’t take in good capabilities from other MDM solutions?

One reason to do so is if an MDM vendor has several MDM solutions to offer. I experienced an example of that recently when attending the Informatica MDM day for EMEA in London. Informatica has recently acquired the Product MDM specialist firm Heiler and therefore has two MDM solutions to offer to the market. It has been too early for the newest version 10 of the general Informatica MDM solution to embrace the Heiler solution, so what I learned from one of the good folks now at Informatica was that the Heiler solution is becoming MDM aware, or at least aware of the Informatica MDM version 10 solution, I guess.

On another front I’m working with the iDQ™ MDM Edition. Here we do have a default data model for party master entities, but we are not that smug that we can’t be aware of other MDM solutions and their capabilities in a given IT landscape. Even in the party domain.

Bringing the Location to Multi-Domain MDM

When we talk about multi-domain Master Data Management (MDM), we often focus on the two dominant MDM domains: customer (or rather party) MDM and product (or maybe things) MDM.

The location domain is the third big domain within MDM. Location management can be more or less complex depending on the industry vertical we are looking at. In the utility and telco sectors location management is a big thing. Handling installations, assets and networks is typically supported by a Geographical Information System (GIS).

Master Data Management is very much about ensuring that different applications can have a unified view of the same core business entities. Therefore, in the utility and telco sectors a challenge is to bring the GIS application portfolio into step with other applications that also use locations, as explained in the post Sharing Big Location Reference Data.

Over the last couple of days I enjoyed taking part in the Nordic user conference for a leading GIS solution in the utility and telco sector. This solution is called Smallworld.

It is good to see that at least one forward looking organization in the utility and telco sector is working with how location master data management can be shared between business functions and applications and aligned with party master data management and product master data management.

CRM systems and Customer MDM

Last week I had some fun making a blog post called The True Leader in Product MDM. This post was about how product Master Data Management is still, in most places, executed by having heaps of MS Excel spreadsheets flowing around within the enterprise and between business partners, as I have seen it.

When it comes to customer Master Data Management, MS Excel may not be so dominant. Instead we have MS CRM and competing offerings such as Salesforce.com and a lot of other similar Customer Relationship Management solutions.

CRM systems are said to deliver a Single Customer View. Usually they don’t. One of the reasons is explained in the post Leads, Accounts, Contacts and Data Quality. The way CRM systems are built, used and integrated is a sure path to creating duplicates.

Some remedies out there include periodic duplicate checks within CRM databases or creating a federated Customer Master Data Hub with entities coming from CRM systems and other databases with customer master data. This is good, but not good enough, as told in the post The Good, Better and Best Way of Avoiding Duplicates.

During the last couple of years I have been working with the instant Data Quality service. This MDM service sits within or beside CRM systems and/or Master Data Hubs in order to achieve the only sustainable way of having a Single Customer View, which is an instant Single Customer View.

Data Models and Real World Alignment

Usually data models are made to fit a specific purpose of use. As reported in the post A Place in Time, this often leads to data quality issues when the data is going to be used for purposes different from the one originally intended. Among many examples, not least we have heaps of customer tables like this one:

(Image: Customer Table)

Compared to how the real world works this example has some diversity flaws, like:

  • state code as a key to a state table will only work with one country (the United States)
  • zipcode is a United States-only term, as opposed to the more generic “Postal Code”
  • fname (First name) and lname (Last name) don’t work in cultures where given name and surname have the opposite sequence
  • The lengths of the state, zipcode and most other fields are obviously too small almost anywhere

More seriously we have:

  • fname and lname (First name and Last name), and probably also phone, should belong to a separate party entity acting as a contact related to the company
  • company name should belong to a separate party entity acting in the role of customer
  • address1, address2, city, state, zipcode should belong to a separate place entity, probably as the current visiting place related to the company
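The points above can be sketched in Python as separate party and place entities. This is only a rough illustration under stated assumptions: the class names (Party, Place, CustomerRecord), the helper split_flat_row and the "company" key are made up for the example, and the country code is passed in separately because the flat table does not appear to carry one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Party:
    party_type: str               # "organization" or "person"
    name: str                     # one name string, stored as received
    phone: Optional[str] = None


@dataclass
class Place:
    address_lines: List[str]
    city: str
    postal_code: str              # generic term instead of "zipcode"
    region: Optional[str]         # state, province etc., not a US-only key
    country_code: str


@dataclass
class CustomerRecord:
    customer: Party               # the company, in the role of customer
    contact: Party                # the person, acting as contact for the company
    visiting_place: Place


def split_flat_row(row: dict, country_code: str) -> CustomerRecord:
    """Split one flat customer row (hypothetical keys) into party and place entities."""
    company = Party("organization", row["company"])
    # Joining fname and lname keeps the order of the source table;
    # it says nothing about which part is the given name.
    full_name = f'{row["fname"]} {row["lname"]}'.strip()
    contact = Party("person", full_name, row.get("phone"))
    lines = [line for line in (row.get("address1"), row.get("address2")) if line]
    place = Place(lines, row["city"], row["zipcode"], row.get("state") or None, country_code)
    return CustomerRecord(company, contact, place)
```

A row could then be converted with something like split_flat_row(row, "US"), after which the contact, the customer and the visiting place can each be maintained and reused on their own.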

In my experience, looking at the real world will help a lot when making data models that can survive for years and stand use cases different from the one in immediate question. I’m not talking about introducing scope creep, but just about thinking a little bit about what the real world looks like when you are modelling something in that world, which usually is the case when working with Master Data Management (MDM).

Foreign Addresses

There is a famous poster called The New Yorker. This poster perfectly illustrates the centricity we often have about the town, region or country we live in.

The same phenomenon is often seen in data management as told in the post Foreign Affaires.

If we, for example, work with postal addresses, we tend to think that postal addresses in our own country have a well-known structure while foreign addresses are a total mess.

In Denmark, where I was born and raised and have worked most of my life, we have two ways of expressing an address:

  • The envelope way, where there is a certain range of possibilities, especially on how to spell a street name and how to write the exact unit within a high rise building, though there is a structure more or less known to native people.
  • The code way, as every street has a code too and there is a defined structure for units (known as the KVHX code). This code is used by the public sector as well as in private sectors such as financial services and utility companies, and this helps tremendously with data quality.

But around 3.5 percent of Danes, including yours truly, have a foreign address. And until now the way of registering and storing those addresses in the public sector and elsewhere has been totally random.

This is going to change. The public authorities have, with a little help from yours truly, made the first standard and governance principles for foreign addresses as seen in this document (in Danish).

At iDQ A/S we have simultaneously developed Master Data Management (MDM) services that help utility companies, financial services and other industries in getting foreign addresses right as well.

Where to put Master Data?

The core of most Master Data Management (MDM) solutions is a master data hub. MDM solutions such as those appearing in analyst reports revolve around a store for master data that is a new, different place from where master data usually resides, which is for example in CRM, SCM and ERP systems.

For large organizations with a complex IT landscape, having an MDM hub is usually the only sensible solution.

However, for many midsize and smaller organizations, and even large organizations with a dominant ERP system, the choice is often to name one of the application databases as the main master data hub for a given master data domain such as customer, supplier, product and whatever else is considered a master data entity.

In such cases you may apply things such as data quality services, as described in the post Lean MDM, and other master data related services, as told in the post Service Oriented MDM.

There are arguments for and against both approaches. Probably the most used argument against the MDM hub approach is to ask why you should solve the issue of having X data silos by creating data silo X + 1. The argument against naming a given application as the place of master data is that an application is built for a specific purpose and therefore is not good for other purposes of master data use.

Where do you put your master data? Why?

Do You Like the Lake?

Today Capgemini, as a result of a co-innovation partnership with Pivotal, released their take on information management in the big data era in a piece called The Principles of the Business Data Lake.

The business data lake concept is a new attempt at getting rid of all the Excel spreadsheets business people operate because of limitations in today’s enterprise data warehouses and the business intelligence solutions sitting on top of that extracted, transformed and loaded data.

In the business data lake you load raw data, including unstructured data sources. Single view and related governance are restricted to master and reference data.

It’s not that you are going to load all the data in the world in your business data lake. You will link internal and external data based on where and when needed.

Thomas Redman has made a famous metaphor in the data quality realm about a polluted lake, where the best way to deal with the pollution is to prevent polluted water from streaming into the lake. I guess the rise of big data challenges that take, as told some years ago in the post Extreme Data Quality.

In the business data lake we will have polluted data. In that view I think it’s a good thing that master and reference data have a special place in the lake.

What do you think? Do you like the lake – the old and/or the new one?

The Postal Address Hierarchy

Using postal addresses is a core element in many data quality improvement and master data management (MDM) activities.

As touched on many times on this blog, postal addresses are formatted very differently around the world. However, they may all be arranged in a sort of hierarchy with up to 6 general levels:

  • Country
  • Region
  • City or district
  • Thoroughfare (street) or block
  • Building number
  • Unit within building

In addition to that the postal code (postcode or zip code) is part of many address formats. Seen in the hierarchical light the postal code is a tricky concept as it may identify a city, district, thoroughfare, a single building or even a given unit within or section of a building. The latter is true for my company address in the United Kingdom, where we have a very granular postcode system.
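A minimal Python sketch of how the six levels might be held in a record follows. The class name PostalAddress and its field names are assumptions for illustration. The postal code is kept as a separate attribute rather than a level, since it may point at different levels depending on the country.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PostalAddress:
    country: str
    region: Optional[str] = None
    city_or_district: Optional[str] = None
    thoroughfare_or_block: Optional[str] = None
    building_number: Optional[str] = None   # kept as text: "21 A", "21-23"
    unit: Optional[str] = None
    postal_code: Optional[str] = None       # deliberately outside the hierarchy

    def hierarchy(self) -> List[Tuple[str, str]]:
        """Return the levels that are present, from the top of the hierarchy down."""
        levels = [
            ("country", self.country),
            ("region", self.region),
            ("city_or_district", self.city_or_district),
            ("thoroughfare_or_block", self.thoroughfare_or_block),
            ("building_number", self.building_number),
            ("unit", self.unit),
        ]
        return [(name, value) for name, value in levels if value]
```

Every level except the country is optional here, which reflects that address formats differ in which levels they require.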

Country

As discussed in the post The Country List even the top level of a postal address hierarchy isn’t a simple list fit for every purpose. Some issues are:

  • There are different sources with different perceptions of which are the countries on this planet
  • What we regard as countries comes in hierarchies
  • Several coding systems are available

Region

The region is an element in some address formats, like the states in the United States and the provinces in Canada, while other countries, like Germany, which is divided into quite independent Länder, do not have the region as a required part of the postal address. The same goes for Swiss cantons.

City or district

I once read that if you used the label city in a web form in Australia, you would get a lot of values like: “I do not live in a city”.

Anyway this level is often (but as mentioned certainly not always) where the postal code is applied. The postal code district may be a single town with surroundings, several villages or a district within a big city.

Thoroughfare (street) or block

Most countries use thoroughfares such as streets, roads, lanes, avenues, mews, boulevards or whatever they are called locally. Beware that the same street may have several spellings and even several names.

Japan is a counterexample to the use of thoroughfares, as here it’s the blocks between the thoroughfares that are part of the postal address.

Building number

Usually this element will be an integer. However, formats with a letter after the integer (example: 21 A) or a range of integers (example: 21-23) are most annoying. And then this British classic: One Main Grove. OMG.
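The annoying formats mentioned here can at least be recognized with a small parser. The Python sketch below is an assumption-laden illustration: the function name parse_building_number and the returned dictionary keys are invented for the example, and it handles only a plain integer, an integer with one letter, and a simple range. Spelled-out numbers like "One" are reported as not parsed.

```python
import re

# An integer, optionally followed by either "-<integer>" (a range)
# or a single letter suffix such as the "A" in "21 A".
_BUILDING_NUMBER = re.compile(r"^\s*(\d+)\s*(?:-\s*(\d+)|([A-Za-z]))?\s*$")


def parse_building_number(text):
    """Return a dict with start, end and suffix, or None if the format is unknown."""
    match = _BUILDING_NUMBER.match(text)
    if not match:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    suffix = match.group(3).upper() if match.group(3) else None
    return {"start": start, "end": end, "suffix": suffix}


if __name__ == "__main__":
    for raw in ("21", "21 A", "21-23", "One"):
        print(repr(raw), "->", parse_building_number(raw))
```

Returning None for anything unrecognized, instead of guessing, leaves it to the caller to keep the original text for manual review.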

Unit within a building

This element may or may not be present in a postal address, depending on whether the building is a single family house or company site, whether the postal delivery sees it as such, or whether you may actually indicate where within the building the delivery goes or you go. The ups and downs of this level are examined in the post A Universal Challenge.
