Out-of-Africa

Besides being a memoir by Karen Blixen (or her literary double Isak Dinesen), Out-of-Africa is a hypothesis about the origin of modern humans (Homo sapiens). Of course there is a competing scientific hypothesis called the Multiregional Origin of Modern Humans. Besides that, there are of course religious beliefs.

The Out-of-Africa hypothesis suggests that modern humans emerged in Africa 150,000 years ago or so. A small group migrated to Eurasia about 60,000 years ago. Some made it across the Bering Strait to America maybe 40,000 years ago or maybe 15,000 years ago. The Vikings said hello to the Native Americans 1,000 years ago, but cross-Atlantic movement only gained pace some 500 years ago, when Columbus discovered America yet again.

Half a year ago (or so) I wrote a blog post called Create Table Homo_Sapiens. The follow-up comments added to the nerdish angle by discussing subjects such as mutating tables versus intelligent design and MAX(GEEK) counting.

But on the serious side, the comments also touched on the intended subject: making data models reflect real-world individuals.

Tables holding persons are the most common entity type in databases. As in the Out-of-Africa hypothesis, they could all have shared one simple, global structural origin. But that is not the way of the world. Some of the basic differences in how the person entity is modeled are:

  • Cultural diversity: Names, addresses, national IDs and other basic attributes are formatted differently country by country and to some degree within countries. Most data models with a person entity are built on the format(s) of the country where they are designed.
  • Intended purpose of use: Person master data is often stored in tables made for specific purposes, like a customer table, a subscriber table, a contact table and so on. Therefore the data identifying the individual is directly linked with attributes describing a specific role of that individual.
  • “Impersonal” use: Person data is often stored in the same table as other party master types such as business entities, projects, households et cetera.

Many, many data quality struggles around the world are caused by how we have modeled real-world – old world and new world – individuals.
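One way to picture the alternative is to keep the data identifying the individual apart from the data describing a role. The following is only a minimal sketch of that idea, not a reference model: the class names, the field names and the free-form name_parts dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Person:
    """Identity of one real-world individual (hypothetical structure)."""
    person_id: int
    # Name parts are kept as labeled pieces rather than fixed columns,
    # because naming conventions differ from country to country.
    name_parts: Dict[str, str]
    country_code: str
    national_id: Optional[str] = None


@dataclass
class Role:
    """One purpose of use (customer, subscriber, contact...) for a person."""
    person_id: int
    role_type: str
    attributes: Dict[str, str] = field(default_factory=dict)


def role_types_of(person: Person, roles: List[Role]) -> List[str]:
    """Return the role types attached to a person, in input order."""
    return [r.role_type for r in roles if r.person_id == person.person_id]


karen = Person(1, {"given": "Karen", "family": "Blixen"}, "DK")
roles = [
    Role(1, "customer", {"segment": "books"}),
    Role(1, "subscriber", {"newsletter": "weekly"}),
    Role(2, "contact"),
]
print(role_types_of(karen, roles))
```

With this split, a second purpose of use adds a role row instead of a second row for the same individual.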


Follow Friday Data Quality

Every Friday on Twitter people recommend other tweeps to follow using the #FollowFriday (or simply #FF) hashtag.

My username on Twitter is @hlsdk.

Sometimes I notice tweeps I follow are recommending the username @hldsk or @hsldk or other usernames with my five letters swapped.

Maybe they meant me but misspelled the username. Or maybe they meant someone else with a username close to mine?

As the other usernames weren’t taken, I have taken the liberty of creating some duplicate (shame on me) profiles and having a bit of (nerdish) fun with it:

@hsldk

For this profile I have chosen an image of the Swedish Chef from The Muppet Show. To make the Swedish connection real, the location on the profile is set as “Oresund Region”, which is the binational metropolitan area around the Danish capital Copenhagen and the 3rd largest Swedish city Malmoe, as explained in the post The Perfect Wrong Answer.

@hldsk

For this profile I have chosen an image of a gorilla originally used in the post Gorilla Data Quality.

This Friday @hldsk was recommended thrice.

But I think only by two real-life individuals: Joanne Wright from Vee Media and Phil Simon, who also tweets as his new (one-man-band, I guess) publishing company.

What’s the point?

Well, one of my main activities in business is hunting duplicates in party master databases.

What I sometimes find is that duplicates (several rows representing the same real-world entity) have been entered for a good reason: to fulfill the immediate purpose of use.
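The swapped-letter usernames are a small case of what duplicate hunting looks for. As a rough sketch, an edit distance that counts a swap of two adjacent letters as a single edit (the optimal string alignment variant of Damerau-Levenshtein) puts both look-alike usernames one step away from the real one. The function name is my own, and this is just one of many possible similarity measures.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and swaps of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "sd" versus "ds".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for candidate in ("hldsk", "hsldk"):
    print(candidate, osa_distance("hlsdk", candidate))
```

A plain Levenshtein distance would count each swap as two edits, so the transposition rule is what makes these typos stand out as near matches.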

The thing with Phil and his one-man-band company is explained further in the post So, What About SOHO Homes.

By the way, Phil is going to publish a book called The New Small. It’s about: How a New Breed of Small Businesses is Harnessing the Power of Emerging Technologies.


360° Share of Wallet View

I have found this definition of Share of Wallet on Wikipedia:

Share of Wallet is the percentage (“share”) of a customer’s expenses (“of wallet”) for a product that goes to the firm selling the product. Different firms fight over the share they have of a customer’s wallet, all trying to get as much as possible. Typically, these different firms don’t sell the same but rather ancillary or complementary product.

Measuring your share of given wallets – and your performance in increasing it – is a multi-domain master data management exercise as you have to master both a 360° view of customers and a 360° view of products.
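The arithmetic itself is simple; the hard part is getting trustworthy numbers on both sides of the division. Here is a minimal sketch, assuming you have your own sales per product category and an (external) estimate of the customer's total spend in the same categories. The function names, category names and figures are made up for illustration.

```python
from typing import Dict, Optional


def share_of_wallet(own_sales: float, total_spend: float) -> Optional[float]:
    """Fraction of the customer's spend that goes to us.

    Returns None when the total spend is unknown or not positive.
    """
    if total_spend <= 0:
        return None
    return own_sales / total_spend


def wallet_shares(own: Dict[str, float],
                  total: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Share of wallet per product category found in the total-spend data."""
    return {
        category: share_of_wallet(own.get(category, 0.0), spend)
        for category, spend in total.items()
    }


# Hypothetical figures for one customer.
own_sales = {"printers": 250.0, "ink": 400.0}
estimated_spend = {"printers": 1000.0, "ink": 500.0, "paper": 200.0}
print(wallet_shares(own_sales, estimated_spend))
```

Both inputs only line up if customers are deduplicated and your own product categories are mapped to the external ones.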

With customer master data you are forced to handle uniqueness (consolidate duplicates) of customers and handle hierarchies of customers, which is further explained in the post 360° Business Partner View.

With product master data you are not only forced to categorize your own products and handle hierarchies within, but you also need to adapt to external categorizations in order to get access to external spending data, which is probably available at a high level for a segment of customers but sometimes even down to the single customer.

Location master data may be important here for geographical segmentations and identification.

My educated guess is that companies will increasingly rely on having better data quality and master data management processes and infrastructure in order to measure precise shares of wallets and thereby gain advantages in stiff competition, rather than relying on gut feelings and best guesses.


Linked Data Quality

The concept of linked data within the semantic web is in my eyes a huge opportunity for getting data and information quality improvement done.

The premises for that are described on the page Data Quality 3.0.

Until now data quality has been largely defined as: Fit for purpose of use.

The problem, however, is that most data – not least master data – have multiple uses.

My thesis is that as more and more purposes are included, there is a break-even point beyond which it is less cumbersome to reflect the real-world object than to try to align fitness for all known purposes.

If we look at the different types of master data and the possibilities that may arise from linked data, this is what initially comes to my mind:

Location master data

Location data is among the data types already used the most on the web. Linking a hotel, a company, a house for sale and so on to a map is an immediate visual feature appealing to most people. Many databases, however, have poor location data, for example inadequate postal addresses. The demand for making these data “mappable” will become nearly unavoidable, but fortunately the services for doing so with linked data will help.

Hopefully increased open government data will help solve the data supply issue here.

Party master data

Linking party master data to external data sources is not new at all, but unfortunately not as widespread as it could be. The main obstacle until now has been smooth integration into business processes.

Having linked data describing real world entities on the web will make this game a whole lot easier.

Actually I’m working on implementations in this field right now.
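To give an impression of what linking a party record could look like, here is a small sketch that wraps an internal record in a JSON-LD style document using the schema.org vocabulary (Organization, PostalAddress, sameAs). It is an illustration only: the record layout, the example.org identifiers and the external URIs are assumptions, not part of any real implementation.

```python
import json
from typing import Dict, Iterable


def link_party(record: Dict[str, str], external_uris: Iterable[str]) -> dict:
    """Describe an internal party record as a linked-data document.

    "sameAs" points at external descriptions of the same real-world entity.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        # Hypothetical internal identifier scheme.
        "@id": "https://example.org/party/" + record["party_id"],
        "name": record["name"],
        "address": {
            "@type": "PostalAddress",
            "addressLocality": record["city"],
            "addressCountry": record["country"],
        },
        "sameAs": sorted(external_uris),
    }


party = {"party_id": "42", "name": "Acme Trading",
         "city": "Copenhagen", "country": "DK"}
doc = link_party(party, ["https://example.org/directory/acme-trading"])
print(json.dumps(doc, indent=2))
```

The internal row stays as it is; the links carry the reference to the real-world entity.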

Product master data

Traditionally the external data sources available for describing product master data have been few – and hard to find. But surely, a lot of data is already out there waiting to be found, categorized, matched and linked.


Data Quality Is Like Parenting

Thinking about it: Data Quality has a lot of similarities with parenting.

Some similarities that come to my mind are:

  • Parenting must be done by everyone who has children; you are not supposed to have an education in education before becoming a parent. The same goes for data. You are not supposed to be a data quality expert before working with data; some common sense will take you a long way.
  • Some parenting experts never had their own children. I have seen the same with data quality experts too.
  • Many people are more knowledgeable about how other people should raise children than about raising their own children. Same same with data quality.
  • While we may have some noise internally in the family when parenting, we keep that to ourselves and keep up appearances to the outside. I think everyone has seen the same with data quality.
  • There may be different styles in parenting, going from “because I said so” to talking about it. The same is true around data quality improvement efforts.
  • We do see more and more regulation around parenting; in my country it is now forbidden to slap your kids. I think it should be forbidden to slap your naughty data too.


Same Same But Different

The two most common master data types are:

  • Party master data (customers, prospects, suppliers and other business partners)
  • Product master data

When working with data quality within master data management you may of course encounter some similarities between these two master data types, but you will certainly also meet a range of differences.

The basic activities such as standardization, consolidation and hierarchy building are the same.

Some of the differences I have learned are:

Multi-cultural issues:

  • Party master data is often stored in a single global format but should be transformed to embrace multi-cultural diversities.
  • Product master data may have multi-cultural issues but should be transformed into a single global format (of course embracing multi-language hierarchies and so on).

External reference data available:

  • For party master data the possibilities for real world alignment with external data sources are plenty.
  • For product master data the possibilities for real world alignment with external data sources are few.

Industry specific requirements:

  • Requirements for party master data quality are pretty much the same across industries with a few variations, the most prominent being whether you deal with B2B (corporate customers), B2C (private customers) or both.
  • Requirements for product master data quality vary tremendously across different industries.

Your say:

What are your examples of (similarities and) differences between party master data quality and product master data quality?


3 out of 10

Just before I left for summer vacation I noticed a tweet by MDM guru Aaron Zornes saying:

This is a subject very close to me, as I have worked a lot with business directory matching during the last 15 years, not least matching with the D&B WorldBase.

The problem is that if you match your B2B customers, suppliers and other business partners with a business directory like the D&B WorldBase you could naively expect a 100% match.

If your result is only a 30% hit rate, the question is: How many among the remaining 70% are false negatives and how many are true negatives?

True negatives

There may be a lot of reasons for true negatives, namely:

  • Your business entity isn’t listed in the business directory. Some countries, like those of the old Czechoslovakia, some English-speaking countries in the Pacific, the Nordic countries and others, have a tight public registration of companies, while registration is less tight in North America, other European countries and the rest of the world.
  • Your supposed business entity isn’t a business entity. Many B2B customer/prospect tables hold a lot of entities that are not formal business entities but other types of party master data.
  • Uniqueness may be defined differently in the business directory and in your table to be matched. This includes the perception of hierarchies of legal entities and branches – not least governmental and local authority bodies are a fuzzy crowd. Also the different roles, such as those of small business owners, are a challenge. The same is true of roles such as franchisees and the use of trading styles.

False negatives

In business directory matching the false negatives are those records that should have been matched by an automated function, but aren’t.

The number of false negatives is a measure of the effectiveness of the automated matching tool(s) and rules applied. Big companies often use the magic quadrant leaders in data quality tools, but these aren’t necessarily the best tools for business directory matching.
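To make the 3 out of 10 arithmetic concrete, here is a small sketch that splits the unmatched records into true and false negatives. The figures are invented, the number of false negatives would in practice have to be estimated (for example by manually reviewing a sample), and the sketch assumes that all automated matches are correct.

```python
def match_stats(total: int, matched: int, false_negatives: int) -> dict:
    """Break a directory matching result into hit rate and recall.

    Assumes every automated match is a true positive.
    """
    if total <= 0 or matched < 0 or false_negatives < 0:
        raise ValueError("counts must be non-negative and total positive")
    if matched + false_negatives > total:
        raise ValueError("matched + false negatives exceeds total")
    unmatched = total - matched
    matchable = matched + false_negatives  # records present in the directory
    return {
        "hit_rate": matched / total,
        "true_negatives": unmatched - false_negatives,
        "recall": matched / matchable if matchable else 0.0,
    }


# Hypothetical run: 1,000 records, 300 matched, 200 missed matches.
stats = match_stats(total=1000, matched=300, false_negatives=200)
print(stats)
```

In this made-up case the hit rate is 30%, but the tool found 60% of what was there to find; the other 500 records were never in the directory in a matchable form.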

Personally I have found that you need a very complex mix of tools and rules to get a decent match rate in business directory matching, including combining both deterministic and probabilistic matching. Some different techniques are explained in more detail here.
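A heavily simplified sketch of such a combination is shown here: first a deterministic pass on a registration number, then a fuzzy pass on normalized names within the same country. Note that a plain string similarity score from Python's difflib stands in for a real probabilistic model, and that the field names, the list of legal suffixes and the 0.85 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, far from complete list of legal-form tokens to ignore.
LEGAL_SUFFIXES = {"ltd", "inc", "as", "aps", "ab"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and strip common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def match(record: dict, directory: list, threshold: float = 0.85):
    """Return (directory entry or None, how the match was made)."""
    # Pass 1, deterministic: exact registration number.
    reg_no = record.get("reg_no")
    if reg_no:
        for entry in directory:
            if entry.get("reg_no") == reg_no:
                return entry, "deterministic"
    # Pass 2, fuzzy: best name similarity within the same country.
    best, best_score = None, 0.0
    for entry in directory:
        if entry["country"] != record["country"]:
            continue
        score = SequenceMatcher(
            None, normalize(record["name"]), normalize(entry["name"])
        ).ratio()
        if score > best_score:
            best, best_score = entry, score
    if best is not None and best_score >= threshold:
        return best, "fuzzy"
    return None, "no match"


directory = [
    {"id": "D1", "name": "Acme Trading Ltd.", "reg_no": "12345678", "country": "DK"},
    {"id": "D2", "name": "Nordic Widgets A/S", "reg_no": "87654321", "country": "DK"},
]
print(match({"name": "ACME", "reg_no": "12345678", "country": "DK"}, directory))
print(match({"name": "Nordic Widget AS", "country": "DK"}, directory))
```

Everything the fuzzy pass leaves unmatched is still a mix of true and false negatives, which is where the more complex rules come in.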


Consultants

Having just arrived home from summer vacation, I have been thinking a bit about how we consultants act at work. On our vacation we used local guides at some places. These guides were our consultants at places they knew very well and we didn’t know at all. But I also noticed they had some habits which may be considered common weaknesses in practicing consultancy.

Different language

Francisco Caballero has lived all his long life in the beautiful town of Ronda in Southern Spain. He shared his great knowledge about the town with us in his distinguished blend of English and Spanish, spiced up with some Russian, German and probably also Dutch words. I think we understood most of it, though we did have some variances when we compared our perceptions afterwards.

Personal opinions

Besides telling us about the town and its history, Señor Caballero also shared his views about politics. He told us about problems with young people today and increasing crime. He remembered things being much better when Generalissimo Franco was in charge. He admitted though that today there are no “bandidos” in the mountains as in the old days, but as he put it: “Today all bandidos in Madrid”. I guess he was referring to recent governments.

Assessing risk

Robert is a fifth-generation Gibraltarian of British descent. Gibraltar is the small British enclave around the marvelous rock on the Southern tip of Spain, facing Africa across the narrow strait. I remember the opening scene of the James Bond film The Living Daylights is a hazardous car ride down the rock. Robert took us in his taxi on the very same narrow roads, practicing pretty much the same style of driving while explaining that, as we had to get in and out of the car all the time at the different sights, there was really no point in using the safety belts.

Personal commercial agenda

Salam seemed to know everyone and everything in Tangier, the Moroccan city on the Northern tip of Africa on the other side of the Strait of Gibraltar. Salam offered us a guided tour where we would go everywhere we wanted and look at everything we fancied, taking as much time as we pleased. Only, when going around, he strongly urged us to go to exactly that spice shop he knew and strongly recommended not sitting at the café we spotted but proceeding to a much better one. As infidels we of course couldn’t go into a mosque, unless (of course) we gave some extra euros.
