Precisely one year ago I wrote a post called Single Company View examining the challenges of getting a single business partner view in business-to-business (B2B) party master data.
Yesterday Robert Hawker of Vodafone gave a keynote at the MDM Summit Europe 2012 on supplier master data management.
One of the points was that sometimes you really want the exact same real-world entity to be two golden records in your master data hub, as totally different business activities may be conducted with the same legal entity. The Vodafone example was:
- Having an antenna placed on the top of a building owned by a certain company and thus paying a fee for that
- Buying consultancy services from the same company
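The Vodafone point above can be sketched as a simple data model in which one real-world legal entity deliberately backs two golden records, one per business relationship. The class and field names below are my own illustration, not Vodafone's actual model:

```python
from dataclasses import dataclass

@dataclass
class LegalEntity:
    # The real-world company, identified by e.g. a national registry number
    registry_id: str
    name: str

@dataclass
class GoldenRecord:
    # One business relationship with that company
    record_id: int
    entity: LegalEntity
    relationship: str

# A hypothetical company with two different relationships to us
company = LegalEntity("GB-12345678", "Example Property & Consulting Ltd")
records = [
    GoldenRecord(1, company, "landlord (antenna site fee)"),
    GoldenRecord(2, company, "supplier (consultancy services)"),
]

# Two golden records, one real-world entity:
assert records[0].entity is records[1].entity
assert records[0].record_id != records[1].record_id
```

The key design choice is that the link to the real-world entity is kept explicit, so the hub knows the two records are the same company even though they are managed separately.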
I have met such examples many times when doing data matching as told in the post Entity Revolution vs Entity Evolution.
However, on one occasion many years ago, I worked in a company where not having a single business partner view nearly became a small disaster.
Our company delivered software for membership administration and was at the same time a member of an employer organisation that also happened to be a customer.
A new director got the brilliant idea that cancelling the membership of the employer organisation was an obvious cost reduction.
The cancellation was sent. The employer organisation confirmed the cancellation, adding that they were very sorry that internal business rules forced them, at the same time, to stop being a customer.
The cancellation was cancelled, of course, and damage control was initiated.
I always wanted to use the above headline, but unfortunately one of the hardest things to do is documenting the direct link between data quality improvement and competitive advantage. Apart from the classic calculation of the cost of returned direct mail, most other examples rest on circumstantial evidence; there is no smoking gun.
Then yesterday I stumbled upon an example with a different angle. A travel company issued a press release stating that strict new rules require the name on your flight ticket to be spelled exactly the same, and to hold the same name elements, as the name in your passport. So if you made a typo or missed a middle name during self-registration, you have to make a correction. Traditional travel companies do that for free, but low-cost airlines may charge up to 100 Euros (often more than the original ticket price) for making the correction.
So traditional travel companies gain a competitive advantage by enabling better data quality – and the low-cost airlines make a profit from bad data quality.
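The strict name rule can be illustrated with a minimal (and deliberately naive) comparison, assuming the rule simply demands identical spelling and identical name elements:

```python
def ticket_matches_passport(ticket_name: str, passport_name: str) -> bool:
    """Strict rule: the ticket must hold exactly the same name
    elements, spelled exactly the same, as the passport."""
    return ticket_name.split() == passport_name.split()

assert ticket_matches_passport("Margaret Ann Smith", "Margaret Ann Smith")
# A missing middle name fails the check:
assert not ticket_matches_passport("Margaret Smith", "Margaret Ann Smith")
# So does a single typo:
assert not ticket_matches_passport("Margret Ann Smith", "Margaret Ann Smith")
```

Real airline checks are of course more involved (case, diacritics, truncation), but the point stands: the rule leaves no room for fuzzy matching, so the data quality burden falls entirely on the person entering the name.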
When establishing the baseline for customer data in a new master data management hub, you often involve heavy data matching in order to de-duplicate the current stock of customer master data, so that you, so to speak, start with a cleansed, duplicate-free set of data.
I have been involved in such a process many times, and the result has never been free of duplicates, for two reasons:
- Even with the best data matching tool and the best external reference data available, you obviously can't settle all real-world alignments with the confidence needed, and manual verification is costly and slow.
- In order to make data fit for business purposes, duplicates are required for a lot of good reasons.
Being able to store the full story from the result of the data matching efforts is what makes me, and the database, most happy.
The notion of a “golden record” is often not in fact a single record but a hierarchical structure that reflects both the real-world entity, as far as we can get, and the instances of this real-world entity in forms that are suitable for different business processes.
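Such a hierarchical golden record might be sketched like this: the best real-world view at the top, and the purpose-specific source instances below it, each carrying the match confidence that linked it. All names and confidence figures are made up for illustration:

```python
golden_record = {
    # The best view of the real-world entity we could establish
    "real_world_entity": {
        "name": "Example Trading Ltd",
        "address": "1 High Street, Anytown",
    },
    # The source instances, kept because different business
    # processes still need them in their own form
    "instances": [
        {"source": "CRM", "name": "Example Trading",
         "match_confidence": 0.95, "fit_for": "sales"},
        {"source": "ERP", "name": "Example Trading Ltd.",
         "match_confidence": 0.99, "fit_for": "invoicing"},
    ],
}

# The hub keeps the full story: both source instances survive,
# linked to the single real-world view.
assert len(golden_record["instances"]) == 2
```

Storing the full match story like this, rather than collapsing everything into one survivor record, is what lets you answer later questions about why two records were (or were not) merged.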
Some of the tricky constructions that exist in the real world and are usual suspects for multiple instances of the same real world entity are described in the blog posts:
The reasons for having business rules leading to multiple versions of the truth are discussed in the posts:
I’m looking forward to yet another party master data hub migration next week under the above conditions.
The title of this blog post is stolen from/was inspired by a post on the Nation of Why Not blog. The Nation of Why Not is the branded name of Royal Caribbean. Royal Caribbean operates, among a lot of other vessels, the world’s two largest cruise ships: ‘Oasis of the Seas’ and ‘Allure of the Seas’. The younger ship, ‘Allure of the Seas’, has just left the shipyard in Turku, Finland and passed under the Great Belt Bridge in grey Danish waters on its way to the blue Caribbean Sea.
The Oasis and Allure are sister ships supposed to have exactly the same dimensions. But according to the official measurements by DNV, Allure is 50 millimetres longer than Oasis. This has led to some teasing between the crews, and it has now been suggested that NASA should make a new measurement (from up above, I guess).
This is a good old classic data quality issue. Is it acceptable to assume that two similar things have the same attributes? Or do you need to measure each thing separately? And is an eventual difference a difference in the real world or a difference in measurement?
Now, with the ships, I think they are a bit different anyway, as I see that the new ship Allure, unlike Oasis, also has a Samba Grill, Rita’s Cantina and a Starbucks café on board.
A while ago I wrote a short blog post about a tweet from the Gartner analyst Ted Friedman saying that clients are disappointed with the ability of popular data quality tools to support wide deployment of complex business rules.
Speaking of popular data quality tools: on the DataFlux Community of Experts blog, DataQualityPro founder Dylan Jones posted a piece this Friday asking: Are Your Data Quality Rules Complex Enough?
Dylan says: “Many people I speak to still rely primarily on basic data profiling as the backbone of their data quality efforts”.
The classic answers to the challenge of complex business rules are:
- Relying on people to enforce complex business rules. Unfortunately people are not as consistent in enforcing complex rules as computer programs are.
- Making less complex business rules. Unfortunately the complexity may be your competitive advantage.
In my eyes there is no doubt that data quality tool vendors have a great opportunity in researching and developing tools that are better at deploying complex business rules. In my current involvement in doing so, we work with features such as:
- Deployment as Service Oriented Architecture components. More on this topic here.
- Integrating multiple external sources. Further explained here.
- Combining the best algorithms. Example here.
I am currently involved in a data management program dealing with multi-entity (multi-domain) master data management described here.
Besides covering several different data domains, such as business partners, products, locations and timetables, the data also serve multiple purposes of use. The client is within public transit, so the subject areas go by terms such as production planning (scheduling), operation monitoring, fare collection and use of service.
A key principle is that the same data should only be stored once, but in a way that makes them serve as high quality information in the different contexts. Doing that often means balancing between the two ways data may be of high quality:
- Either they are fit for their intended uses
- Or they correctly represent the real-world construct to which they refer
Some of the balancing has been:
For some intended uses you don’t have to know the precise identity of a passenger. For other intended uses you must know the identity. The latter cases at my client include giving discounts based on age and transport needs, like when attending educational activities. Knowing the identity also helps when fighting fraud. So the data governance policy (and a business rule) is that customers for most products must provide a national identification number.
Like it or not: having the ID makes a lot of things easier. Uniqueness isn’t a big challenge, as it is in many other master data programs. Enriching your data is also a straightforward process. An example here is accurately geocoding where your customers live, which is rather essential when you provide transportation services.
You may use a range of different coordinate systems to express a position, as explained here on Wikipedia. Some systems refer to a round globe (and yes, the real world, the earth, is round), but it is a lot easier to use a system like UTM, where you may easily calculate the distance between two points directly in metres, assuming the real world is as flat as your computer screen.
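As a minimal sketch, assuming both points lie in the same UTM zone, the flat-earth distance is just the planar (Pythagorean) distance between the eastings and northings; the coordinates below are made up:

```python
import math

def utm_distance_m(easting1: float, northing1: float,
                   easting2: float, northing2: float) -> float:
    """Planar distance in metres between two points expressed
    in the same UTM zone (flat-earth assumption)."""
    return math.hypot(easting2 - easting1, northing2 - northing1)

# Hypothetical coordinates of two stops in one UTM zone:
d = utm_distance_m(720000, 6170000, 721500, 6172000)
print(round(d))  # 2500
```

This simplicity is exactly the appeal of UTM for local distance work; for points far apart or in different zones you would be back to geodesic calculations on the round globe.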
I am not a royalist, but anyway: today, 16th April 2010, is the 70th birthday of Queen Margrethe II of Denmark. Congratulations, Your Majesty.
Having a queen (or king) and a royal family is a good example of the fact that there are always exceptions. As a matter related to data quality: I would say that every person in our country has a first (given) name and a last (family) name. But the royal family hasn’t got a last name – they only have first names, like those of Her Majesty being Margrethe Alexandrine Þórhildur Ingrid. (By the way: the third name is actually Icelandic; I guess that explains the ash cloud sent as a greeting from there.)
There are always exceptions. We may define data quality validation rules from here to doomsday – there will always be exceptions. We may write down business rules from now to eternity – tomorrow you will encounter the first exception. Data quality (and democracy) is never perfect – but it’s worth striving for.
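A validation rule that anticipates its own exceptions might be sketched like this; the exception group label is of course my own invention:

```python
def validate_last_name(person: dict,
                       exceptions: frozenset = frozenset({"royal family"})) -> bool:
    """Require a family name, unless the person belongs to a
    documented exception group."""
    if person.get("group") in exceptions:
        return True  # e.g. the Danish royal family has no family name
    return bool(person.get("last_name"))

assert validate_last_name({"first_name": "John", "last_name": "Smith"})
assert validate_last_name({"first_name": "Margrethe Alexandrine Þórhildur Ingrid",
                           "group": "royal family"})
assert not validate_last_name({"first_name": "Madonna"})
```

The exception list will never be complete, which is the point of the post: the rule plus its exceptions is a living thing, not a one-off definition.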
When finding or avoiding duplicates, or doing similar kinds of consolidation with party master data, you will encounter lots of situations where it is disputable what to do.
The “politically correct” answer is: it depends on your business rules.
Yeah, right. Easier said than done.
Often you face the following:
- Business rules don’t exist. Decisions are based on common sense.
- Business rules differ between data providers.
Let’s take an example.
We have these business rules (Owner, Brief):
Finance, No sales and deliveries to dissolved business entities
Logistics, Access to premises must be stated in Address2 if different from Address1
Sales, Every event must be registered with an active contact
Customer Service, In case of duplicate contacts the contact with the first event date wins
In a CRM system we have these 2 accounts (AccountID, CompanyName, Address1, Address2, City):
1, Restaurant San Remo, 2 Main Street, entrance thru no 4, Anytown
2, Ristorante San Remo, 2 Main Street, , Anytown
Also we have some contacts (AccountID, ContactID, JobTitle, ContactName, Status, StartYear, EventCount):
1, 1, Manager, Luigi Calda, Inactive, 2001, 2
1, 2, Chef de la Cusine, John Hothead, Active, 2002, 87
2, 1, Chef de la Cuisine, John Hothead, Duplicate, 2008, 2
2, 2, Owner, Gordon Testy, Active, 2008, 7
We are so lucky that a business directory is available now. Here we have (NationalID, Name, Address, City, Owner, Status):
3, Ristorante San Remo, 2 Main Street, Anytown, Luigi Calda, Dissolved
4, Ristorante San Remo, 2 Main Street, Anytown, Gordon Testy, Active
So, I don’t think we will produce a golden view of this business relationship based on the data (structure) available and the business rules available.
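To see why, consider a sketch of the matching problem. The CRM holds no owner or national ID, so both directory entries – one Dissolved, one Active – remain candidates for both accounts, and the Finance rule (“no sales and deliveries to dissolved business entities”) cannot be enforced automatically. The data below is the example above reduced to the fields that matter:

```python
crm_accounts = [
    {"account_id": 1, "name": "Restaurant San Remo", "address": "2 Main Street"},
    {"account_id": 2, "name": "Ristorante San Remo", "address": "2 Main Street"},
]
directory = [
    {"national_id": 3, "name": "Ristorante San Remo", "address": "2 Main Street",
     "owner": "Luigi Calda", "status": "Dissolved"},
    {"national_id": 4, "name": "Ristorante San Remo", "address": "2 Main Street",
     "owner": "Gordon Testy", "status": "Active"},
]

def directory_candidates(account: dict) -> list:
    """Directory entries a CRM account could refer to, matching on
    address only, since the CRM holds no owner or national ID."""
    return [d for d in directory if d["address"] == account["address"]]

for account in crm_accounts:
    statuses = {d["status"] for d in directory_candidates(account)}
    # Each account matches both a Dissolved and an Active entity,
    # so the Finance rule cannot pick a side.
    assert statuses == {"Dissolved", "Active"}
```

Adding an owner field or a national ID to the CRM data structure is what would break the tie; that is the "aligning business rules and data structures" part of the next paragraph.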
Building and aligning business rules and data structures to solve this example – and a lot of other examples with different challenges – may seem difficult and is often omitted in the name of simplicity. But:
- Master data – not least business partners – are a valuable asset in the enterprise, so why treat them with simplicity while we do complex handling of a lot of other (transaction) data?
- Common sense may help you a lot. Many of these questions are not specific to your business but are shared among most other enterprises in your industry and many others in the whole real world.
- I guess the near future will bring an increasing number of available services, with software and external data support, that help a lot in selecting common business rules and applying them in the master data processing landscape.
Say you are an organisation within charity fundraising. For many years you have had a membership database, and recently you also introduced an eShop with related accessories.
The membership database holds the following record (Name, Address, City, YearlyContribution):
- Margaret & John Smith, 1 Main Street, Anytown, 100 Euro
The eShop system has the following accounts (Name, Address, Place, PurchaseInAll):
- Mrs Margaret Smith, 1 Main Str, Anytown, 12 Euro
- Peggy Smith, 1 Main Street, Anytown, 218 Euro
- Local Charity c/o Margaret Smith, 1 Main Str, Anytown, 334 Euro
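A first matching pass might group these records by a crudely normalised address, which collapses all four into one candidate household but leaves the real decisions to business rules or manual review. The normalisation below is deliberately simplistic:

```python
import re

def normalize_address(address: str) -> str:
    """Crude address normalisation: lower-case and expand 'Str' to 'street'."""
    address = address.lower().strip()
    return re.sub(r"\bstr\b\.?", "street", address)

records = [
    ("Margaret & John Smith", "1 Main Street", "membership"),
    ("Mrs Margaret Smith", "1 Main Str", "eshop"),
    ("Peggy Smith", "1 Main Street", "eshop"),
    ("Local Charity c/o Margaret Smith", "1 Main Str", "eshop"),
]

households: dict[str, list[str]] = {}
for name, address, source in records:
    households.setdefault(normalize_address(address), []).append(name)

# All four records collapse into one candidate household...
assert len(households) == 1
assert len(households["1 main street"]) == 4
# ...but deciding whether "Peggy Smith" is Margaret (Peggy is a common
# nickname for Margaret), and whether the charity account should count
# as a member contribution, still needs business rules or human eyes.
```

So the matching technology gets you to the hard questions quickly; it does not answer them. That is exactly where the data governance and business rules of the following paragraphs come in.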
Now the new management wants to double contributions from members and triple eShop turnover. Based on the recommendations from “The One Truth Consulting Company” you plan to do the following:
- Establish a platform for 1-1 dialogue with your individual members and customers
- Analyze member and customer behaviour and profiles in order to:
- Support the 1-1 dialogue with existing members and customers
- Find new members and customers who are like your best members and customers
As the new management wants to stay for many years ahead, the solution must not be a one-shot exercise but must be implemented as business process reengineering with a continuous focus on best-fit data governance, master data management and data (information) quality.
So, what are you going to do with your data so that they are fit for action with both the old purposes and the new purposes?
Recently I wrote some posts related to these challenges:
Any other comments on the issues in how to do it are welcome.
The goal of data quality improvement is often set as ”fit for purpose”. The first purpose addressed will almost naturally be within the domain where the data in question are captured. Then you address other domains where the same data may also be used, but probably with other purposes, leading to additional or varying measures of fitness.
If an organisation identifies several domains where the same data are used, the normal approach will be to gather all purposes and then start to align all the needs, find the highest common denominators and so on. This may be a very cumbersome process, as you need to consider all the different dimensions of data quality: uniqueness, completeness, timeliness, validity, accuracy and consistency.
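As a tiny sketch, some of these dimensions can be expressed as simple ratios over a record set; the data and the chosen measures are made up for illustration:

```python
records = [
    {"id": 1, "name": "Margaret Smith", "postcode": "1000"},
    {"id": 2, "name": "", "postcode": "10O0"},   # missing name, letter O in postcode
    {"id": 3, "name": "Margaret Smith", "postcode": "1000"},  # possible duplicate of 1
]

# Completeness: share of records with a name filled in
completeness = sum(bool(r["name"]) for r in records) / len(records)
# Validity: share of records with an all-digit postcode
validity = sum(r["postcode"].isdigit() for r in records) / len(records)
# Uniqueness: share of distinct (name, postcode) combinations
uniqueness = len({(r["name"], r["postcode"]) for r in records}) / len(records)

assert round(completeness, 2) == 0.67
assert round(validity, 2) == 0.67
assert round(uniqueness, 2) == 0.67
```

Measuring accuracy, the degree to which the data reflect the real-world person, is the dimension that cannot be computed from the data alone, which is what makes the real-world approach below attractive.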
Another approach is to assume that if you gather many purposes, the total needs will almost certainly tend to be a reflection of the real-world objects to which the data refer.
So my thesis is that there is a break-even point, when including more and more purposes, where it becomes less cumbersome to reflect the real-world object than to try to align all known purposes.
Master data are often used in many different functions in an organisation, and not least party data – names and addresses – are known to be a focus area for data quality improvement. Here it is very obvious that real-world objects exist, and they are basically the same to every organisation.
Earlier this year I wrote an entry on dataqualitypro about possibilities with external party reference data: http://www.dataqualitypro.com/data-quality-home/external-reference-data-an-overview.html
In my previous post on this blog I noticed that governments around the world are releasing data stores that surely add traction to the real world approach to data quality improvement.
I will for sure touch this subject in forthcoming posts on this blog.