The term American exceptionalism was born in the political realm but certainly also applies to other areas, including data management.
As a lot of software, and today cloud services, is made in the USA, the rest of the world struggles with data standards that apply only, or to a high degree, to the United States.
Some of the common ones are:
In the United States Fahrenheit is the unit of temperature. The rest of the world (with a few exceptions) uses Celsius. Fortunately, many applications have the ability to switch between the two, but it certainly happens to me once in a while that I uninstall an exciting new app because it only shows temperature in Fahrenheit, and to me 30 degrees is very hot weather.
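The conversion between the two scales is simple arithmetic; a minimal sketch in Python:

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9

def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return c * 9 / 5 + 32

# 30 degrees Celsius (very hot weather) corresponds to 86 Fahrenheit
print(celsius_to_fahrenheit(30))   # 86.0
print(fahrenheit_to_celsius(86))   # 30.0
```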
The Month-Day-Year date format is another American exceptionalism in data management. When dates are kept in databases there is no problem, as databases internally use a counter for a date. But as soon as a date slips into a text format and is used in an international context, no one can tell if 10/9/2014 is the 10th of September, as it is read outside the United States, or the 9th of October, as it is read inside the United States. For example, it took LinkedIn years before the service handled the date format in accordance with its international spread, and there are still mix-ups.
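The ambiguity is easy to demonstrate: the same text string parses into two different dates depending on the assumed format. A small Python sketch:

```python
from datetime import datetime

raw = "10/9/2014"

# The same string, two legitimate readings:
us_reading = datetime.strptime(raw, "%m/%d/%Y")    # Month-Day-Year
intl_reading = datetime.strptime(raw, "%d/%m/%Y")  # Day-Month-Year

print(us_reading.strftime("%d %B %Y"))    # 09 October 2014
print(intl_reading.strftime("%d %B %Y"))  # 10 September 2014

# The only safe text exchange format is an unambiguous one, e.g. ISO 8601:
print(intl_reading.date().isoformat())    # 2014-09-10
```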
Having a state as part of a postal address is mandatory in the United States and only shared with a few other countries such as Australia and Canada, though the Canadians call the similar concept a province. The use of a mandatory state field with only US states present is especially funny when registering online for a webinar about an international data quality solution.
In order to have all my travel arrangements in one place I use a service called TripIt. When I receive email confirmations from airlines, hotels, train planners and so on, I simply forward those to email@example.com, and within seconds the service builds or amends an itinerary for me that is available in an app.
Today I noticed a slight flaw though. I was going by train from London, UK up to the Midlands via a large town in the UK called Reading.
The strange thing in the itinerary was that the interchanges in Reading were placed chronologically after arriving at and leaving the final destination.
A closer look at the data revealed two strange issues:
- Reading was spelled Reading, PA
- The time zone for the interchange was set to EST
Hmmm… There must be a town called Reading in Pennsylvania across the pond. TripIt must, when automatically reading the email, have chosen the US Reading for this ambiguous town name and thereby attached the Eastern American time zone to the interchange.
Picking the right Reading for me in the plan made the itinerary look much more sensible.
Usually data models are made to fit a specific purpose of use. As reported in the post A Place in Time, this often leads to data quality issues when the data is going to be used for purposes different from the one originally intended. Among many examples we not least have heaps of customer tables like this one:
Compared to how the real world works this example has some diversity flaws, like:
- state code as a key to a state table will only work with one country (the United States)
- zipcode is a United States term only, as opposed to the more generic “Postal Code”
- fname (First name) and lname (Last name) don’t work in cultures where given name and surname have the opposite sequence
- The lengths of the state, zipcode and most other fields are obviously too small almost anywhere else in the world
More seriously we have:
- fname and lname (First name and Last name), and probably also phone, should belong to a separate party entity acting as a contact related to the company
- company name should belong to a separate party entity acting in the role of customer
- address1, address2, city, state, zipcode should belong to a separate place entity, probably as the current visiting place related to the company
In my experience, looking at the real world helps a lot when making data models that can survive for years and withstand use cases different from the one in immediate question. I’m not talking about introducing scope creep but just thinking a little bit about what the real world looks like when you are modelling something in that world, which usually is the case when working with Master Data Management (MDM).
Many CRM applications have the concepts of leads, accounts and contacts for registering customers or other parties with roles in sales and customer service.
Most CRM systems have a data model suited for business-to-business (B2B) operations. In a B2B environment:
- A lead is someone who might become your customer some day
- An account is a legal entity that is, or seems likely to become, your customer
- A contact is a person who works at or in other ways represents an account
In business-to-consumer (B2C) environments there are different ways of making that model work.
The general perception is that data about a lead can be so-so, while it of course is important to have optimal data quality for accounts and contacts.
However, this approach works against the essential data quality rule of getting things right the first time.
Converting a lead into an account and/or a contact is a basic CRM process and the data quality pitfalls in that process are many. To name a few:
- Is the lead a new account or did we already have that account in the database?
- Is the contact new, or do we already know that person, perhaps at another account?
- How do we align the known data about the lead with external reference data during the conversion process?
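The first pitfall, checking whether the lead already exists as an account, is at its core a matching problem. A bare-bones sketch using simple string similarity (a real MDM matching engine would also normalize legal forms, use phonetic keys and compare addresses; the account names are made up):

```python
from difflib import SequenceMatcher

existing_accounts = ["Acme Corporation", "Globex Inc", "Initech"]

def find_possible_duplicate(lead_name: str, threshold: float = 0.65):
    """Return the best-matching existing account above the threshold, or None.

    The threshold is an illustrative value; in practice it is tuned
    against known duplicate and non-duplicate pairs."""
    best, best_score = None, 0.0
    for account in existing_accounts:
        score = SequenceMatcher(None, lead_name.lower(), account.lower()).ratio()
        if score > best_score:
            best, best_score = account, score
    return best if best_score >= threshold else None

print(find_possible_duplicate("ACME Corp."))    # Acme Corporation
print(find_possible_duplicate("Umbrella Ltd"))  # None, a genuinely new account
```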
In other words, the promise of having a 360-degree customer view is jeopardized by the design of most CRM systems.
Every year Information Difference publishes a report about the Master Data Management (MDM) landscape. This year’s report celebrates the 10th year of MDM solutions being around. Of course, the MDM industry didn’t start on a certain date 10 years ago, but it was around then that MDM became a commonly accepted term for a branch of IT solutions within data management, and in my eyes a much needed spinoff of the data quality discipline.
A birthday is a good occasion to look ahead. The Information Difference report takes on some of the trends in the MDM solutions around, namely that:
- Most MDM vendors today claim to be multi-domain MDM providers, but they are certainly at different stages, coming from different places
- Providing MDM in the cloud is slowly but steadily being adopted
- Integrating big data into MDM solutions has, in my words, reached the marketing and R&D departments at the MDM vendors and will someday also reach the professional service and accounting folks there
Read the MDM landscape Q2 2014 report from Information Difference here.
A problem in data cleansing I have come across several times is when you have some name and address registrations where it is uncertain to which country the different addresses belong.
Many address-cleansing tools and services require a country code as the first parameter in order to utilize external reference data for address cleansing and verification. Most business cases for address cleansing are indeed about a large number of business-to-consumer (B2C) addresses within a particular country. But sometimes you have a batch of typical business-to-business (B2B) addresses with no clear country registration.
The problem is that many location names apply to many different places. That is true even within a given country – which was the main driver for having postal codes around. If a non-interactive tool or service has to look for a location all over the world, it gets really difficult.
For example, I’m in Richmond today. That could actually be one of a lot of places all over the world, as seen on Wikipedia.
I am actually in the Richmond in the London, England, UK area. If I were in the state capital of the US state of Virginia, I could have written that I’m in “Richmond, VA”. If an international address-cleansing tool looked at that address, I guess it would first look for a country code, quickly find VA as a two-character country code at the end of the string and firmly conclude that I’m at something called Richmond in the Vatican City State.
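The flawed heuristic is easy to reproduce. The lookup tables below are tiny hypothetical stand-ins for the full ISO 3166 and US state reference data a real tool would carry:

```python
# Hypothetical lookup tables; real tools use complete reference data.
ISO_COUNTRY = {"VA": "Vatican City State", "GB": "United Kingdom"}
US_STATES = {"VA": "Virginia", "PA": "Pennsylvania"}

def naive_country_guess(address: str) -> str:
    """Treat the trailing two-letter token as an ISO country code,
    the flawed heuristic described above."""
    suffix = address.rsplit(",", 1)[-1].strip()
    return ISO_COUNTRY.get(suffix, "unknown")

def better_guess(address: str) -> str:
    """Prefer the US-state reading when the address follows the
    US-style 'City, ST' pattern, before trying country codes."""
    suffix = address.rsplit(",", 1)[-1].strip()
    if suffix in US_STATES:
        return f"United States ({US_STATES[suffix]})"
    return ISO_COUNTRY.get(suffix, "unknown")

print(naive_country_guess("Richmond, VA"))  # Vatican City State
print(better_guess("Richmond, VA"))         # United States (Virginia)
```

Even the better guess is only a heuristic; a full solution would score candidate countries against the whole address line.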
Have you tried using or constructing an international address cleansing process? Where did you end up?
When laying out data policies and data standards within a data governance program, one of the most important inputs is the business rules that exist within your organization.
I have often found that it is useful to divide business rules into two different types:
- External business rules, which are rules based on laws, regulations within industries and other rules imposed from outside your organization.
- Internal business rules, which are rules made up within your organization in order to make your business more competitive than your industry peers.
Externally imposed business rules are most often different from country to country (or group of countries such as the EU). Internal business rules may be too, but they tend to be rules that apply worldwide within an organization.
The scope of external business rules tends to be fairly fixed, and so does the deadline for implementing the derived data policies and standards. With internal business rules you may minimize or maximize the scope and be flexible about the timetable for bringing them into force and formalizing the data governance around them. It is often a matter of prioritizing against other short-term business objectives.
The distinction between these two kinds of business rules may not be so important in the first implementation of a data governance program but comes very much into play in the ongoing management of data policies and data standards.