MDM – Page 54 – Liliendahl on Data Quality

Out of Facebook

1st September 2010 | 5th September 2010 | Henrik Gabs Liliendahl | 7 Comments

A while ago it was announced that Facebook had signed up member number 500,000,000.

If you work with customer data management you will know that this doesn’t mean that 500,000,000 distinct individuals are using Facebook. Like any customer table, the Facebook member table will suffer from a number of data quality issues, such as:

Some individuals are signed up more than once using different profiles.
Some profiles are not an individual person, but a company or other form of establishment.
Some individuals who created a profile are not among us anymore.

Nevertheless, the Facebook member table is a formidable collection of external reference data representing the real world objects that many companies are trying to master when doing business-2-consumer activities.

For companies doing business-2-business activities, a similar representation of real world objects is the 70,000,000+ profiles on LinkedIn, plus profiles in other social business networks around the world, which may act as external reference data for the business contacts in master data hubs, CRM systems and so on.

Customer Master Data sources will expand to embrace:

Traditional data entry from field work like a sales representative entering prospect and customer master data as part of Sales Force Automation.
Data feed and data integration with traditional external reference data like using a business directory. Such integration will increasingly take place in the cloud and the trend of governments releasing public sector data will add tremendously to this activity.
Self registration by prospects and customers via webforms.
Social media master data captured during social CRM and probably harvested in more and more structured ways as a new wave of exploiting external reference data.

Doing “Social Master Data Management” will become an integrated part of customer master data management offering both opportunities for approaching a “single version of the truth” and some challenges in doing so.

Of course privacy is a big issue. Norms vary between countries, so do the legal rules. Norms vary between individuals and by the individuals as a private person and a business contact. Norms vary between industries and from company to company.

But the fact that 500,000,000 profiles have been created on Facebook in very few years by people from all over the world shows that people are willing to share and that much information can be collected in the cloud. However, no one wants to be spammed as a result of sharing, and indeed there have been some controversies around how data in Facebook is handled.

Anyway, I have no doubt that we will see fewer data entry clerks entering the same information in each company’s separate customer tables and that we will increasingly share our own master data attributes in the cloud.

Out-of-Africa

30th August 2010 | 27th March 2012 | Henrik Gabs Liliendahl | 4 Comments

Besides being a memoir by Karen Blixen (or the literary double Isak Dinesen), Out-of-Africa is a hypothesis about the origin of the modern human (Homo sapiens). Of course there is a competing scientific hypothesis called Multiregional Origin of Modern Humans. Besides that there are of course religious beliefs.

The Out-of-Africa hypothesis suggests that modern humans emerged in Africa 150,000 years ago or so. A small group migrated to Eurasia about 60,000 years ago. Some made it across the Bering Strait to America maybe 40,000 years ago or maybe 15,000 years ago. The Vikings said hello to the Native Americans 1,000 years ago, but cross Atlantic movement first gained pace from 500 years ago, when Columbus discovered America again again.

½ year ago (or so) I wrote a blog post called Create Table Homo_Sapiens. The follow-up comments added to the nerdish angle by discussing subjects such as mutating tables versus intelligent design and MAX(GEEK) counting.

But on the serious side comments also touched the intended subject about making data models reflect real world individuals.

Tables with persons are the most common entity type in databases around. As in the Out-of-Africa hypothesis, there could have been a single, common structural origin shared across the globe. But that is not the way of the world. Some of the basic differences practiced in modeling the person entity are:

Cultural diversity: Names, addresses, national IDs and other basic attributes are formatted differently country by country and to some degree within countries. Most data models with a person entity are built on the format(s) of the country where they are designed.
Intended purpose of use: Person master data is often stored in tables made for specific purposes, like a customer table, a subscriber table, a contact table and so on. Therefore the data identifying the individual is directly linked with attributes describing a specific role of that individual.
“Impersonal” use: Person data is often stored in the same table as other party master types, such as business entities, projects, households et cetera.

Many, many data quality struggles around the world are caused by how we have modeled real world – old world and new world – individuals.
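One way to picture the "intended purpose" and "impersonal use" points is a model that stores the individual once and attaches roles to it, instead of one table per role. The sketch below is only an illustration under my own assumptions: the class and field names (`Party`, `Person`, `Organization`, `roles`) are hypothetical and not taken from any particular product or from the earlier post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    """Anything we can have a relation with (hypothetical base type)."""
    party_id: int
    roles: List[str] = field(default_factory=list)  # e.g. customer, subscriber


@dataclass
class Person(Party):
    given_name: str = ""
    family_name: str = ""
    country: str = ""  # decides which name and address format rules apply


@dataclass
class Organization(Party):
    legal_name: str = ""
    country: str = ""


# One real world individual, stored once, playing two roles.
karen = Person(party_id=1, given_name="Karen", family_name="Blixen", country="DK")
karen.roles.append("customer")
karen.roles.append("subscriber")
print(karen)
```

The point of the design is that the identifying attributes live in one place, while role-specific attributes can hang off the role rather than being copied into a customer table, a subscriber table and a contact table.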

360° Share of Wallet View

26th August 2010 | 23rd February 2011 | Henrik Gabs Liliendahl | 4 Comments

I have found this definition of Share of Wallet on Wikipedia:

Share of Wallet is the percentage (“share”) of a customer’s expenses (“of wallet”) for a product that goes to the firm selling the product. Different firms fight over the share they have of a customer’s wallet, all trying to get as much as possible. Typically, these different firms don’t sell the same but rather ancillary or complementary product.

Measuring your share of given wallets – and your performance in increasing it – is a multi-domain master data management exercise as you have to master both a 360° view of customers and a 360° view of products.

With customer master data you are forced to handle uniqueness (consolidate duplicates) of customers and handle hierarchies of customers, which is further explained in the post 360° Business Partner View.
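To make the arithmetic concrete, here is a minimal sketch of why duplicates matter for the measure. All data is made up for illustration, and the names (`own_sales`, `master_of`, `wallet`) are hypothetical; in practice the total spend per customer would be an external estimate.

```python
from collections import defaultdict

# Hypothetical sales rows: (customer record id, amount sold by us).
own_sales = [("C1", 300.0), ("C2", 200.0), ("C3", 150.0)]

# Hypothetical result of deduplication: record id -> master customer id.
# C1 and C2 turned out to be the same real world customer.
master_of = {"C1": "M1", "C2": "M1", "C3": "M2"}

# Hypothetical external estimate of each customer's total spend in the category.
wallet = {"M1": 2000.0, "M2": 500.0}


def share_of_wallet(own_sales, master_of, wallet):
    """Return our share of wallet in percent per master customer."""
    sold = defaultdict(float)
    for record_id, amount in own_sales:
        sold[master_of[record_id]] += amount
    return {m: 100.0 * sold[m] / wallet[m] for m in wallet if wallet[m] > 0}


shares = share_of_wallet(own_sales, master_of, wallet)
print(shares)
```

Without the consolidation step, C1 and C2 would each be measured against a wallet of their own, and the share for that customer would be understated.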

With product master data you are not only forced to categorize your own products and handle hierarchies within them, but you also need to adapt to external categorizations in order to get access to external data about spending. Such data is probably available at a high level for a segment of customers, but sometimes even down to the single customer.

Location master data may be important here for geographical segmentations and identification.

My educated guess is that companies will increasingly rely on having better data quality and master data management processes and infrastructure in order to measure precise shares of wallets, and thereby gain advantages in stiff competition, rather than relying on gut feelings and best guesses.

Same Same But Different

21st August 2010 | 15th December 2010 | Henrik Gabs Liliendahl | 7 Comments

The two most common master data types are:

Party master data (customers, prospects, suppliers and other business partners)
Product master data

When working with data quality within master data management you may of course encounter some similarities between these two master data types, but you will certainly also meet a range of differences.

Basic activities such as standardization, consolidation and hierarchy building are the same.

Some of the differences I have learned are:

Multi-cultural issues:

Party master data is often stored in a single global format but should be transformed to embrace multi-cultural diversities.
Product master data may have multi-cultural issues but should be transformed into a single global format (of course embracing multi-language hierarchies and so on).

External reference data available:

For party master data the possibilities for real world alignment with external data sources are plenty.
For product master data the possibilities for real world alignment with external data sources are few.

Industry specific requirements:

Requirements for party master data quality are pretty much the same across industries, with a few variations; whether you deal with B2B (corporate customers), B2C (private customers) or both is the most prominent.
Requirements for product master data quality vary tremendously across different industries.

Your say:

What are your examples of (similarities and) differences between party master data quality and product master data quality?

3 out of 10

17th August 2010 | 1st September 2010 | Henrik Gabs Liliendahl | Leave a comment

Just before I left for summer vacation I noticed a tweet by MDM guru Aaron Zornes saying:

This is a subject very close to me, as I have worked a lot with business directory matching during the last 15 years, not least matching with the D&B WorldBase.

The problem is that if you match your B2B customers, suppliers and other business partners with a business directory like the D&B WorldBase you could naively expect a 100% match.

If your result is only a 30% hit rate, the question is: how many among the remaining 70% are false negatives and how many are true negatives?

True negatives

There may be a lot of reasons for true negatives, namely:

Your business entity isn’t listed in the business directory. Some countries, like those of the old Czechoslovakia, some English speaking countries in the Pacific, the Nordic countries and others, have a tight public registration of companies, while registration is less tight in North America, other European countries and the rest of the world.
Your supposed business entity isn’t a business entity. Many B2B customer/prospect tables hold a lot of entities that are not formal business entities but other types of party master data.
Uniqueness may be defined differently in the business directory and in your table to be matched. This includes the perception of hierarchies of legal entities and branches; not least governmental and local authority bodies are a fuzzy crowd. Also, the different roles played by small business owners are a challenge. The same is true of roles such as franchise takers and the use of trading styles.

False negatives

In business directory matching the false negatives are those records that should have been matched by an automated function, but weren’t.

The number of false negatives is a measure of the effectiveness of the automated matching tool(s) and rules applied. Big companies often use the magic quadrant leaders in data quality tools, but these aren’t necessarily the best tools for business directory matching.

Personally I have found that you need a very complex mix of tools and rules for getting a decent match rate in business directory matching, including combining both deterministic and probabilistic matching. Some different techniques are explained in more details here.
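As a rough illustration of combining the two styles, the sketch below first tries a deterministic rule (an exact hit on a registration number) and only then falls back to a fuzzy name comparison within the same country. This is a toy under stated assumptions, not a description of any real tool: the directory rows, field names, registration numbers and the 0.85 threshold are all invented, and the string similarity from Python's standard `difflib` is only a crude stand-in for a proper probabilistic score.

```python
from difflib import SequenceMatcher

# Hypothetical directory extract; all names and numbers are made up.
directory = [
    {"dir_id": "D1", "reg_no": "10000001", "name": "Acme Trading Ltd", "country": "GB"},
    {"dir_id": "D2", "reg_no": "20000002", "name": "Nordic Widgets A/S", "country": "DK"},
]


def normalize(name):
    """Lower-case, drop dots and commas, collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def match(record, directory, threshold=0.85):
    # Step 1: deterministic rule, an exact hit on the registration number.
    for entry in directory:
        if record.get("reg_no") and record["reg_no"] == entry["reg_no"]:
            return entry["dir_id"], "deterministic"
    # Step 2: fuzzy rule, best name similarity within the same country.
    best_id, best_score = None, 0.0
    for entry in directory:
        if entry["country"] != record["country"]:
            continue
        score = SequenceMatcher(
            None, normalize(record["name"]), normalize(entry["name"])
        ).ratio()
        if score > best_score:
            best_id, best_score = entry["dir_id"], score
    if best_score >= threshold:
        return best_id, "fuzzy"
    return None, "no match"


records = [
    {"reg_no": "10000001", "name": "Acme Trading", "country": "GB"},
    {"reg_no": None, "name": "Nordic Widgets A.S.", "country": "DK"},
    {"reg_no": None, "name": "Completely Different Co", "country": "DK"},
]
results = [match(r, directory) for r in records]
hit_rate = sum(1 for dir_id, _ in results if dir_id) / len(results)
print(results, hit_rate)
```

The unmatched record still has to be judged by other means: the code cannot tell whether it is a true negative (not in the directory) or a false negative (in the directory but missed by the rules).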

Business Directory Musings

27th July 2010 | 1st September 2010 | Henrik Gabs Liliendahl | 3 Comments

This coming Sunday I will have worked professionally within Information Technology for 30 years. As I will be on a (well deserved!) vacation in Andalusia on Sunday, I’d better post my thoughts today.

I have had a lot of different positions and worked in a lot of different domains. The single subject I have worked with the most is business directories.

My first job was at the Danish Tax Authorities, and one of the assignments was being a secretary to the committee working for a joint registration of companies in Denmark. Besides learning a lot about working in politically driven organizations and about aligning business and technology, I feel good about having been part of the start of building a public sector master data directory. Such directories are both essential for an effective public administration and can be used as external reference data in private enterprises as a valuable means to improve data quality in business partner master data.

Later I have been working a lot with improving data quality through matching solutions around business directories. This ranges from the Dun & Bradstreet WorldBase, holding nearly 170 million business entities from all over the world, through databases like the EuroContactPool, to national databases holding either all (available) businesses in a single country or given industry segments.

I guess I will also be spending some additional years from now integrating business directory information into business processes as smoothly as possible, preferably along with a range of other kinds of external reference data.

One of the new sources building up in the cloud in the realm of business directories is master data references in social networks. The LinkedIn Companies feature is a prominent example. Of course such directories have some data quality issues. This is seen in looking at the companies where I currently work:

DM Partner A/S seems OK
Omikron Data Quality has 90 employees according to the company profile (filled out by yours truly). Then it’s strange that there are only 25 profiles in the network. But that’s because most employees are in Germany where the competing network called Xing is stronger.
Trapeze Group Europe has not been updated with a recent merger, and not all members have changed their profiles accordingly yet. But I’m sure that will be done as time goes by.

I have no doubt though that including information from social networks will become a part of integrating business partner master data in my future.

Social Master Data Management

24th July 2010 | 29th May 2012 | Henrik Gabs Liliendahl | 3 Comments

The term “Social CRM” has been around for a while. Just as traditional CRM (Customer Relationship Management) is heavily dependent on proper MDM (Master Data Management), we will also see that enterprise wide social CRM will be dependent on a proper social MDM element in order to be a success.

The challenge in social MDM will be that we are not going to replace some data sources for MDM, but we are actually going to add some more sources and handle the integration of these sources with the sources for traditional CRM and MDM and other new sources coming from the cloud.

Customer Master Data sources will expand to embrace:

Traditional data entry from field work like a sales representative entering prospect and customer master data as part of Sales Force Automation.
Data feed and data integration with external reference data like using a business directory. Such integration will increasingly take place in the cloud and the trend of governments releasing public sector data will add tremendously to this activity.
Self registration by prospects and customers via webforms.
Social media master data captured during social CRM and probably harvested in more and more structured ways.

Social media master data is found as profiles in services such as Facebook, mainly for business-to-consumer activities, LinkedIn, mainly for business-to-business activities, and Twitter somewhere in between. These are only some prominent examples of such services. While LinkedIn may be dominant for professional use in English speaking countries and in countries where English is widely spoken, such as Scandinavia and the Netherlands, other regions are far less penetrated by LinkedIn. For example, in German speaking countries the similar network service called Xing is much more crowded. So, when embracing global business you will have to acknowledge the diversity found in social network services.

A good way to integrate all these sources in business processes is using mashups. An example will be a mashup for entering customer data. If you are entering a business entity you may want to know:

What is already known in internal databases about that entity – either via a centralized MDM hub or throughout disparate databases?
Is the visit address correct according to public sector data?
How is the business account related to other business entities learned from a business directory?
Do we recognize the business contact in social networks – maybe we did have contact before in another relation?

If you are entering a consumer entity you may want to know:

Does that person already exist in our internal databases – as an individual and as a household?
What do we know about the residence address from public sector data?
Can we obtain additional data from phone book directories, nixie lists and whatever else is available, affordable and legal in the country in question?
How do we connect in social media?
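The two checklists above could be wired into a data entry screen roughly as sketched below: every source is a small lookup function, and the mashup simply asks each of them about the entity being entered. This is a sketch under my own assumptions. The function names, the stub data and the answer texts are all hypothetical; real sources would be service calls to an MDM hub, an address register, a business directory and so on.

```python
from typing import Callable, Dict, List

# A source is any function taking the entered data and returning findings.
Lookup = Callable[[dict], List[str]]


def internal_hub(entry: dict) -> List[str]:
    """Stub for 'what is already known in internal databases?'."""
    known = {"nordic widgets a/s": "already known as customer 42"}  # made-up data
    hit = known.get(entry["name"].lower())
    return [hit] if hit else []


def public_sector_address(entry: dict) -> List[str]:
    """Stub for 'is the visit address correct according to public sector data?'."""
    valid = {("DK", "Main Street 1")}  # made-up data
    found = (entry["country"], entry["address"]) in valid
    return ["address confirmed" if found else "address not found"]


def mashup(entry: dict, sources: Dict[str, Lookup]) -> Dict[str, List[str]]:
    """Ask every source about the entry and collect the answers per source."""
    return {label: lookup(entry) for label, lookup in sources.items()}


entry = {"name": "Nordic Widgets A/S", "country": "DK", "address": "Main Street 1"}
answers = mashup(entry, {"internal": internal_hub, "public sector": public_sector_address})
print(answers)
```

Keeping each source behind the same small interface means a business directory or social network lookup can be added per country without touching the data entry flow itself.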

If aligning people, processes and technology didn’t matter before, it will when dealing with social master data management.

Feasible Names and Addresses

17th July 2010 | 29th May 2012 | Henrik Gabs Liliendahl | Leave a comment

Most data quality technology was born in relation to the direct marketing industry back in the good old offline days. Main objectives have been deduplication of names and addresses and making names and addresses fit for mailing.

When working with data quality you have to embrace the full scope of business value in the data, here being the names and addresses.

Back in the 90’s I worked with an international fundraising organization. A main activity was sending direct mail with greeting cards for optional sale, with motifs related to seasonal feasts. Deduplication was a must regardless of the country (though the means were very different, but that’s for another day). Obviously the timing of the campaigns and the motifs on the cards differed between countries, but also within countries, based on the names and addresses.

Two examples:

German addresses

When selecting motifs for Christmas cards it’s important to observe that Protestantism is concentrated in the north and east of the country and Roman Catholicism is concentrated in the south and west. (If you think I’m out of season, well, such campaigns are planned in summertime.) So, in the north and east most people prefer Christmas cards with secular motifs such as a lovely winter landscape. In the south and west most people will like a motif with Madonna and Child. Having well organized addresses with a connection to demographics was important.

Malaysian names

Malaysia is a very multi-ethnic society. The two largest groups, the ethnic Malays and the Malaysians of Chinese descent, have different seasonal feasts. The best way of handling this in order to fulfill the business model was to assign the names and addresses to the different campaigns based on whether the name was an ethnic Malay name or a Chinese name. Surely an exercise on the edge of what I earlier described in the post What’s in a Given Name?

New Blog Name?

16th July 2010 | 18th September 2010 | Henrik Gabs Liliendahl | 11 Comments

As reported by Mark Goloboy here, “Data Quality” is becoming a dirty word. “Information Quality” is in vogue.

Maybe I will soon have to change the name of my blog?

Also one may expect other related terms will be changed, like:

Data Governance becomes Information Governance
Master Data Management becomes Master Information Management
Data Matching becomes Information Matching
Data Warehouse becomes Information Warehouse
Database becomes Informationbase
Information Technology becomes Data Technology

But changing the name of a blog is a serious thing you shouldn’t do too often. I think I will wait and see if the term renaming stops at simply swapping data and information. Some guesses for further renaming:

Information Fitness replaces Data Quality as Data quality is often defined as “fit for intended purpose of use” and by replacing data with information that trail is even more clear – opposed to the other trail being real world alignment.

Information Political Correctness replaces Data Governance as Data Governance is a lot about policies and the Data Governance practice is a lot about maneuvering in the corporate political landscape.

Master Information Technology (MIT) replaces Master Data Management (MDM)

Data Quality is an Ingredient, not an Entrée

9th July 2010 | 9th July 2010 | Henrik Gabs Liliendahl | 8 Comments

Fortunately it is more and more recognized that you don’t get success with Business Intelligence, Customer Relationship Management, Master Data Management, Service Oriented Architecture and many more disciplines without starting with improving your data quality.

But it will be a big mistake to see Data Quality improvement as an entrée before the main course being BI, CRM, MDM, SOA or whatever is on the menu. You have to have ongoing prevention against having your data polluted again over time.

Improving and maintaining data quality involves people, processes and technology. Now, I am not neglecting the people and process side, but as my expertise is in the technology part I would like to mention some of the technological ingredients that help with keeping data quality at a tasty level in your IT implementations.

Mashups

Many data quality flaws are (not surprisingly) introduced at data entry. Enterprise data mashups with external reference data may help during data entry, like:

An address may be suggested from an external source.
A business entity may be picked from an external business directory.
Various rules exist in different countries for using consumer/citizen directories. Why not use the best available where you do business?

External IDs

Getting the data entry right at the root is important, and most (if not all) data quality professionals agree that this is a superior approach as opposed to doing cleansing operations downstream.

The problem, however, is that most data erodes as time passes. What was right at the time of capture will at some point in time not be right anymore.

Therefore data entry ideally must not only be a snapshot of correct information but should also include raw data elements that make the data easily maintainable.

Error tolerant search

A common workflow when in-house personnel are entering new customers, suppliers, purchased products and other master data is that you first search the database for a match. If the entity is not found, you create a new entity. When the search fails to find an actual match, we have a classic and frequent cause of introducing duplicates.

An error tolerant search is able to find matches despite spelling differences, alternatively arranged words, various concatenations and many other challenges we face when searching for names, addresses and descriptions.
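A minimal sketch of the idea, using only Python's standard library, could look like this. It is an illustration under my own assumptions rather than how any particular product works: the helper names (`search_key`, `tolerant_search`), the sample names and the 0.8 cutoff are made up, and real tools use far more elaborate techniques. Sorting the words handles rearranged words, and `difflib.get_close_matches` handles small spelling differences.

```python
from difflib import get_close_matches


def search_key(text):
    """Lower-case, replace punctuation with spaces and sort the words,
    so that word order and separators do not matter."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(sorted(cleaned.split()))


def tolerant_search(query, names, cutoff=0.8, limit=5):
    """Return the stored names whose search key is close to the query's key."""
    keys = {}
    for name in names:
        keys.setdefault(search_key(name), []).append(name)
    hits = get_close_matches(search_key(query), list(keys), n=limit, cutoff=cutoff)
    return [name for key in hits for name in keys[key]]


# Made-up customer names for illustration.
customers = ["Hansen, Anders", "Jensen & Sons Ltd", "Nordic Widgets A/S"]
print(tolerant_search("Anders Hanssen", customers))
```

An exact search for "Anders Hanssen" would find nothing here and invite the user to create a duplicate; the tolerant search surfaces the existing record instead.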
