Multi-Domain MDM, Santa Style

What would a Multi-Domain Master Data Management (MDM) solution look like at Santa Claus’s organization?

I think it may look like this:

Santa’s MDM solution covers all 4 classic domains:

  • Party
  • Product
  • Location
  • Calendar

Party

A main business improvement achieved through Santa’s MDM solution is better Nice or Naughty management. The old CRM system didn’t have a dedicated field for Nice or Naughty assignment, so this information ended up in many different fields over the years, including as part of a street address or as a “send Christmas card” check mark. Today Santa handles Nice and Naughty information with historical tracking, as a kid may be Nice one year but Naughty the next. This also helps with predictive analysis of future present demand. Ho ho ho.

Party master data management at Santa’s also includes keeping track of all the business partners, such as manufacturers of toys and other stuff, the shopping malls where Santa has to sit in December and so on. A given legal entity may have different roles in different business processes. For example, a reindeer insurance company may also require Santa’s presence at the company’s Christmas tree family party.

Product

Product Information Management (PIM) has always been a complex operation at Santa’s. In Wish List Fulfillment (Wishful) you may have kids wishing for the same thing with different wording. The new MDM solution’s flexible hierarchy management features help a lot when the wishes are matched with specifications obtained by the purchase elves. At Santa’s they increasingly work with the suppliers on sharing complete and timely product descriptions and specifications.

Location

Handling location information relates to the different locations where Santa is supposed to live, be that at the North Pole, in Greenland, in Lapland or anywhere else according to other beliefs, as discussed in the post Notes about the North Pole.

When it comes to knowing where to deliver all the presents, Santa has realized that maintaining an address as part of the record for each boy and girl isn’t the best way. Today each boy and girl record has a relation, with a start and end date, to a location entity where location specific information, including precise chimney positions, is kept.
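The dated relation between a kid and a location could be sketched like this in Python. This is only an illustration: the class names, the field names and the `location_on` helper are my own assumptions and not taken from Santa’s actual solution.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Location:
    # Location specific information lives here, not on the kid record.
    location_id: str
    address: str
    chimney_position: str  # hypothetical attribute


@dataclass(frozen=True)
class Residence:
    # Relation between a kid and a location, valid for a period of time.
    child_id: str
    location_id: str
    start: date
    end: Optional[date] = None  # None means the kid still lives there


def location_on(residences: Iterable[Residence], child_id: str, day: date) -> Optional[str]:
    """Return the id of the location where the kid lived on the given day."""
    for r in residences:
        if r.child_id == child_id and r.start <= day and (r.end is None or day <= r.end):
            return r.location_id
    return None
```

When a kid moves, the old relation gets an end date and a new relation is added, so the chimney position stays with the house and last year’s delivery history stays intact.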

Calendar

Christmas present delivery timing is crucial for Santa. In some countries Christmas morning, the 25th of December, is the right time for the stuff to be there. In other countries Christmas evening, the 24th of December, is the right time. Add to that doing present delivery across all time zones. Ho ho ho.
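A minimal sketch of how such calendar master data could drive the delivery sequence: each country gets a local deadline and a UTC offset, and the deadlines are compared in UTC. The table, the year, the times of day and the fixed December offsets are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical delivery rules: local deadline and UTC offset in hours (December).
DELIVERY_RULES = {
    "DK": (datetime(2012, 12, 24, 18, 0), 1),   # Christmas evening, the 24th
    "GB": (datetime(2012, 12, 25, 6, 0), 0),    # Christmas morning, the 25th
    "NZ": (datetime(2012, 12, 25, 7, 0), 13),   # Christmas morning, far ahead of UTC
}


def deadline_utc(country: str) -> datetime:
    """Convert the local delivery deadline of a country to UTC."""
    local, offset_hours = DELIVERY_RULES[country]
    local_zone = timezone(timedelta(hours=offset_hours))
    return local.replace(tzinfo=local_zone).astimezone(timezone.utc)


def delivery_order() -> list:
    """Return the country codes sorted by how soon the deadline occurs."""
    return sorted(DELIVERY_RULES, key=deadline_utc)
```

With these assumed rules a Christmas morning country far to the east comes before a Christmas morning country at UTC, and only just after a Christmas evening country, which is exactly why the local date alone is not enough.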

The MDM implementation at Santa’s has indeed helped a lot with Santa Quality. But it is an ongoing journey.

Right now Santa is looking for a smart information management firm to help with defining which time zone the North Pole belongs to.

Anyone out there?


The Letter Æ

This blog is written in English. Therefore the letters used are normally restricted to A to Z.

The English alphabet is one of many alphabets using Latin (or Roman) letters. Other alphabets, like the Russian, use Cyrillic letters. Besides alphabets, the script systems of the world include abjads, abugidas, syllabic scripts and symbol scripts. Learn more about these in the post Script Systems.

Æ, which in lower case is æ, was part of the old English alphabet. For example an old English king was called Æthelred the Unready.

The letter Æ is a combined AE and is pronounced in English as the first letter in Edmund and Edward.

Today Æ exists in a few alphabets: the Danish/Norwegian, the Faroese and the Icelandic. Names of people and places from the corresponding Viking territories may have the letter Æ/æ as part of the string. For example, the home of Microsoft Dynamics AX and NAV is the town Vedbæk north of Copenhagen. When represented in the English alphabet the town name will be Vedbaek.

So Vedbæk and Vedbaek should be a 100% match when doing data matching. And so should Vedbæk and Vedb%C3%A6k when systems are as bad as Æthelred the Unready was in handling the Vikings.

And oh, Æthelred wasn’t actually unready. He was unræd meaning bad-counseled.
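A minimal sketch of a match key that treats the spellings of Vedbæk as equal, including the percent-encoded form Vedb%C3%A6k. The function name `match_key` and the transliteration table are my own assumptions. The table only holds the æ to ae rule from this post and would have to be extended for other letters and conventions.

```python
import unicodedata
from urllib.parse import unquote

# Assumed transliteration table; only the rule discussed in this post.
TRANSLITERATIONS = {"æ": "ae"}


def match_key(name: str) -> str:
    """Return a key that is equal for the variant spellings of a name."""
    text = unquote(name)                        # undo percent-encoding (UTF-8)
    text = unicodedata.normalize("NFC", text)   # compose letters consistently
    text = text.casefold()                      # compare case-insensitively
    text = "".join(TRANSLITERATIONS.get(ch, ch) for ch in text)
    # Drop any remaining diacritics by decomposing and removing combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Comparing `match_key("Vedbæk")`, `match_key("Vedbaek")` and `match_key("Vedb%C3%A6k")` gives the same key for all three, so an exact comparison on the key yields the 100% match.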


Cross Border Data Quality

In data quality improvement you always have to find a balance between the almost impossible, and usually not sensible, vision of achieving zero percent defects and the good old 80-20 rule about aiming at the 80% most frequent issues and leaving the 20% not so frequent issues to a random fate.

One of the issues that usually falls into the 20% neglected issues is cross border challenges with contact master data.

In a recent blog post on the Postcode Anywhere blog Graham Rhind describes the data quality flaws arising from his relocation from Holland in the Netherlands to Germany. The post is called Validate … intelligently.

Personally I have had a lot of similar issues when moving from Denmark to England in the United Kingdom as for example described in the post Staying in Doggerland.

My guess is that we will see an increasing demand for cross border data quality services, not least as regulators are increasingly looking into cross border issues. The FATCA regulation from the United States tax authorities is an example, as described in the post The Taxman: Data Quality’s Best Friend.

As globalization moves forward, organizations will increasingly work across borders, and people will move between countries and more frequently live in one country, work in another and buy services in a third. In coping with this reality you can’t keep up with data quality by just using a National Change of Address service and other data quality services focused on and optimized for a single country.


Sometimes Google Translate is a Foolish Friendship

This morning I stumbled upon an article in a Norwegian online newspaper. A rather unlikely incident actually happened to a driver, as he avoided hitting an elk on the road, but then ran into a bear.

The original text in Norwegian is here:

As I wanted to see how that would be in English, I hit the Google Translate button:

In the headline the two animals are translated from “elg” to “elk” and from “bjørn” to “bear”. Very well.

But in the subtitle the two words are translated differently. Now “elg” is “moose” and “bjørn” is “disservice”.

Hmmm…

Not sure why elk is replaced by moose. The two words are used synonymously. As I understand it, it must have been a moose, which is called an elk. Wikipedia has the details here.

But how did the bear become a disservice? Well, I guess it relates to an old fable called “The Bear and the Gardener” or the variant “The Hermit and the Bear”. Here a human befriends a bear. While the man takes a nap, the bear helps by driving off the flies, but eventually crushes the man’s head in doing so. The moral is that you should not make foolish friendships.

In Danish/Norwegian such a well-meant but very bad attempt to help is a “bear’s service” (bjørnetjeneste), also known in German as a Bärendienst. Just like Google Translate in this case became a disservice.


Naming the Olympians

The British newspaper The Guardian has a feature on their website where you can get data about the Olympians. Link here: London 2012 Olympic athletes: the full list.

Browsing the list is a good reminder of the world-wide diversity we have with person names.

The names are here formatted with the surname(s) followed by the given name(s). The surname is in upper case.

For the Chinese and other East Asian Olympians this is the sequence of names they are used to, as opposed to Olympians from places where the first name is the given name and the last name is the surname.

Having the surname in upper case also shows where Olympians have two surnames, as is the custom in Spanish cultures.

And oh yes. The South African guy has JIM as his surname.

Finally, this screen shot raises a good question. Is JIANG Wenwen superb at both synchronized swimming and track cycling, or is it two different Olympians with the same name? Some names are very common in China. A little googling tells me it is two different persons. The synchronized swimmer is more related to her twin sister and swimming partner JIANG Tingting.

Let’s check if there is more than one “John Smith”.

Nope.

But it could be fun if “Kim Smith” and “Kimberley Smith” came from the same country.

Many Olympians actually don’t have the names reflected in this sheet as many have names in a different alphabet or script system.

The Danish cycling rider “SORENSEN Nicki” actually shares my last name, as we know him as “Nicki Sørensen”. The Serbian, Ukrainian and Russian Olympians have their original names in the Cyrillic alphabet, but these have been transliterated to the English alphabet, and Olympians from countries with script systems other than an alphabet have had their names transcribed to the (English) alphabet.

So, is the list bad data quality?


The Big Tower of Babel

3 years ago one of the first blog posts on this blog was called The Tower of Babel.

This post was the first of many posts about multi-cultural challenges in data quality improvement. These challenges include not only language variations but also different character sets reflecting different alphabets and script systems, naming traditions, address formats, units of measure, privacy norms and government registration practices, to name some of the ones I have experienced.

When organizations are working internationally it may be tempting to build a new Tower of Babel, imposing the same language for metadata (probably English) and the same standards for names, addresses and other master data (probably the ones of the country where the headquarters is).

However, building such a high tower may end up the same way as the Tower of Babel known from the old religious tales.

Alternatively a mapping approach may be technically a bit more complex but much easier when it comes to change management.

The mapping approach is used in the Universal Postal Union’s (UPU) attempt to make a “standard” for worldwide addresses. The UPU S42 standard is mentioned in the post Down the Street. The S42 standard does not impose the same way of writing on envelopes all over the world, but facilitates mapping the existing ways into a common tagging mapped to a common structure.

Building such a mapping based “standard” for addresses, and other master data with international diversity, in your organization may be a very good way to cope with balancing the need for standardization and the risks in change management including having trusted and actionable master data.
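A minimal sketch of the mapping idea in Python: each country keeps its own way of naming address elements, and a per-country map translates them into common tags. The tag names, the field names and the sample map are hypothetical and are not the actual UPU S42 element names.

```python
# Hypothetical common tags; these are NOT the real UPU S42 element names.
FIELD_MAPS = {
    "GB": {
        "house_number": "premise",
        "street": "thoroughfare",
        "post_town": "locality",
        "postcode": "postcode",
    },
    "DK": {
        "vejnavn": "thoroughfare",
        "husnummer": "premise",
        "postnummer": "postcode",
        "by": "locality",
    },
}


def to_common(country: str, address: dict) -> dict:
    """Map a country specific address record to the common tagging."""
    mapping = FIELD_MAPS[country]
    return {mapping[field]: value for field, value in address.items()}
```

Nobody in Denmark or the UK has to change how they capture an address. Supporting a new country means adding a map, not retraining the people doing data entry.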

The principle of embracing and mapping international diversity is a core element in the service I’m currently working with. It’s not that the instant Data Quality service doesn’t stretch into the clouds. Certainly it is a cloud service pulling data quality from the cloud. It’s not that it isn’t big. Certainly it is based on big reference data.


Sharing Bigger Data

Yesterday I attended an event called Big Data Forum 2012 held in London.

Big data seems to be yet another buzz term with many definitions. Anyway, surely it is about datasets that are bigger (and more complex) than before.

The Olympics is Going to be Bigger

One session at the Big Data Forum was about how the BBC will use big data in covering the upcoming London Olympics on the BBC website.

James Howard, who I know as speckled_jim on Twitter, explained that the bulk of the content on the BBC Sports website is not produced by the BBC. The data is sourced from external data providers, and the structure of the content is actually also based on the external sources.

So for the Olympics there will be rich content about all the 10,000 athletes coming from all over the world. The BBC editorial stuff will be linked to this content, with emphasis of course on the British athletes.

I guess that other broadcasting bodies and sports websites from all over the world will base the bulk of their content on the same sources and then, more or less, link their own targeted content in the same way and with their own look and feel.

There are some data quality issues related to sourcing such data, Jim said. For example, you may have your own guideline for how to spell names from other script systems.

I have noticed exactly that issue in the news from major broadcasters. For example, the BBC spells the name of the new Egyptian president Mursi while CNN says his name is Morsi.

Bigger Data in Party Master Data Management

The postal validation firm Postcode Anywhere recently had a blog post called Big Data – What’s the Big Deal?

The post has the well known sentiment that you may use your resources better by addressing data quality in “small data” rather than fighting with big data and that getting valid addresses in your party master data is a very good place to start.

I can’t agree more about getting valid addresses.

However I also see some opportunities in sharing bigger datasets for valid addresses. For example:

  • The reference dataset for UK addresses typically based on the Royal Mail Postal Address File (PAF) is not that big. But the reference dataset for addresses from all over the world is bigger and more complex. And along with increasing globalization we need valid addresses from all over the world.
  • Rich address reference data will be more and more available. The UK PAF file is not that big. The AddressBase from Ordnance Survey in the UK is bigger and more complex. So are similar location reference data with more information than basic postal attributes from all over the world, not least when they are handled together.
  • A valid address based on address reference data only tells you if the address is valid, not if the addressee is (still) on the address. Therefore you often need to combine address reference data with business directories and consumer/citizen reference sources. That means bigger and more complex data as well.

Similar to how the BBC is covering the Olympics, my guess is that organizations will increasingly share bigger public address, business entity and consumer/citizen reference data and link it to private master data that they find more accurate (like the spelling example), along with essential data elements that better support their way of doing business and make them more competitive.

My recent post Mashing Up Big Reference Data and Internal Master Data describes a solution for linking bigger data within business processes in order to get a valid address and beyond.


Data Driven Data Quality

In a recent article Loraine Lawson examines how a vast majority of executives describe their business as “data driven” and how the changing world of data must change our approach to data quality.

As said in the article the world has changed since many data quality tools were created. One aspect is that “there’s a growing business hunger for external, third-party data, which can be used to improve data quality”.

Embedding third-party data into data quality improvement especially in the party master data domain has been a big part of my data quality work for many years.

Some of the interesting new scenarios are:

Ongoing Data Maintenance from Many Sources

As explained in the article on Wikipedia about data quality, services such as the US National Change of Address (NCOA) service and similar services around the world have been around for many years as a basic use of external data for data quality improvement.

Using updates from business directories like the Dun & Bradstreet WorldBase and other national or industry specific directories is another example.

In the post Business Contact Reference Data I have a prediction saying that professional social networks may be a new source of ongoing data maintenance in the business-to-business (B2B) realm.

Using social data in business-to-consumer (B2C) activities is another option, though one also haunted by complex privacy considerations.

Near-Real-Time Data Enrichment

Besides providing updates to basic master data, business directories typically also contain a lot of other data of value for business processes and analytics.

Address directories may also hold further information like demographic stereotype profiles, geo codes and property data elements.

Appending phone numbers from phone books and checking national suppression lists for mailing and phoning preferences are other forms of data enrichment used a lot related to direct marketing.

Traditionally these services have been implemented by sending database extracts to a service provider and receiving enriched files for uploading back from the service provider.

Lately I have worked with a new breed of self-service data enrichment tools placed in the cloud. They make it possible for end users to easily configure what to enrich from a palette of address, business entity and consumer/citizen related third-party data, and to execute the request as close to real-time as the volume allows.

Such services also include the good old duplicate check now much better informed by including third-party reference data.

Instant Data Quality in Data Entry

As discussed in the post Avoiding Contact Data Entry Flaws, third-party reference data such as address directories, business directories and consumer/citizen directories placed in the cloud may be used very efficiently in data entry functionality in order to get data quality right the first time and at the same time reduce the time spent on data entry work.

Not least in a globalized world, where names of people reflect the diversity of almost any nation today, where business names become more and more creative and where data entry is done at shared service centers manned with people from cultures with other address formatting rules, there is an increased need for data entry assistance based on external reference data.

By mashing up advanced search in third-party data and internal master data during data entry, you will solve most of the common data quality issues around avoiding duplicates and getting data as complete and timely as needed from day one.


Obscure Date and Time Formats

Date and time can be represented in many ways.

Here are some of the peculiar ones:

Roman Numerals

The Romans had a numbering system where letters from the Latin alphabet signified a value. Roman numerals are still used around the clock and often for expressing the year something was built, written or made.

This year being 2012 in Arabic numerals is MMXII in Roman numerals. Next year is MMXIII and the year after is of course MMXIIII. No wait, it is MMXIV.
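The reason 2014 is MMXIV and not MMXIIII is the subtractive notation, where IV stands for 4 and IX for 9. Here is a small sketch of the conversion in Python. The function name `to_roman` is my own, and it only covers the usual range from 1 to 3999.

```python
# Value and symbol pairs, including the subtractive forms such as IV and IX.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(year: int) -> str:
    """Convert a year written in Arabic numerals to Roman numerals."""
    if not 0 < year < 4000:
        raise ValueError("only years from 1 to 3999 are supported")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        # Use each symbol as many times as it fits, largest values first.
        count, year = divmod(year, value)
        parts.append(symbol * count)
    return "".join(parts)
```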

The 12-Hour Clock

A day consists of 24 hours. So naturally 5 hours into the day will be 5:00 and 17 hours into the day will be 17:00. But no. Several countries around the world still stick to the 12-hour clock writing 5:00 AM and 5:00 PM. And in most countries verbal use of the 12-hour clock is common.

The American Date Format

A date consists of three elements: Day, Month and Year.

So to most of the world yesterday the 1st June 2012 will be: 01/06/2012

If you insist on using an ISO standard, you’ll do it backward: 2012-06-01

However, if you are from the United States, you’ll do it awkward: 06/01/2012

Even if you are a US data quality tool vendor selling to the whole world, you will still do it awkward:

Blog post published 1st June 2012. Flip that date! – as it will be 6th January to the rest of the world.

Best practice is to write June 1st 2012 or to avoid ambiguity in some other way.
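A small sketch of how you could flag the dates that need flipping. It tries both the day-first and the month-first reading of a slash separated date and reports whether they disagree. The function names are my own.

```python
from datetime import datetime

# The two competing readings of a date written as NN/NN/YYYY.
FORMATS = {"day-first": "%d/%m/%Y", "month-first": "%m/%d/%Y"}


def interpretations(text: str) -> dict:
    """Return every valid reading of the text as a date, keyed by convention."""
    readings = {}
    for label, fmt in FORMATS.items():
        try:
            readings[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # not a valid date under this convention
    return readings


def is_ambiguous(text: str) -> bool:
    """True if the two conventions give different valid dates."""
    return len(set(interpretations(text).values())) > 1
```

With this, 06/01/2012 is ambiguous, as it reads as either 6th January or 1st June. 13/06/2012 is not, as there is no month 13. Storing and exchanging the unambiguous reading in the ISO form, via `isoformat()`, gives 2012-06-01.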


Häagen-Dazs Datakvalitet

There is a term called foreign branding. Foreign branding describes an implied cachet or superiority of products and services with foreign-sounding names.

Häagen-Dazs ice cream is an example of foreign branding. Though the brand was established in New York the name was supposed to sound Scandinavian.

However, Häagen-Dazs does sound and look somewhat strange to a Scandinavian. The reason is probably that the letter combinations “äa” and “zs” are not part of any native Scandinavian words.

By the way, datakvalitet is the Scandinavian compound word for data quality.

Getting datakvalitet right in worldwide data isn’t easy. What works in some countries doesn’t work in other countries, not least when we are talking datakvalitet regarding party master data such as customer master data, supplier master data and employee master data.

One of the reasons why datakvalitet for party master data is different is the varying possibilities for applying big reference data sources. For example, the availability of citizen data is different in New York than in Scandinavia. This affects the ways of reaching optimal datakvalitet, as reported in the post Did They Put a Man on the Moon.

As part of the ongoing globalization, handling international datakvalitet is becoming more and more common. Many enterprises try to deploy enterprise wide datakvalitet initiatives, and shared service centers handle party master data unfamiliar to the people working there. This often results in finding a strange word like Häagen-Dazs.
