Proactive Data Governance at Work

A common statement in the data management realm is that data governance is 80 % about people and processes and 20 % (if not less) about technology.

This blog post is about the 20 % (or less) technology part of data governance.

The term proactive data governance is often used to describe whether a given technology platform is able to support data governance well.

So, what is proactive data governance technology?

Obviously it must be the opposite of reactive data governance technology, which is about discovering completeness issues, as in data profiling, and fixing uniqueness issues, as in data matching.

Proactive data governance technology must be implemented in data entry and other data capture functionality. The purpose of the technology is to assist people responsible for data capture in getting the data quality right from the start.

If we look at master data management (MDM) platforms we have two possible ways of getting data into the master data hub:

  • Data entry directly in the master data hub
  • Data integration by data feeds from other systems such as CRM, SCM and ERP solutions and from external partners

In the first case the proactive data governance technology is a part of the MDM platform, often implemented as workflows with assistance, checks, controls and permission management. We see this most often related to product information management (PIM) and in business-to-business (B2B) customer master data management. Here the insertion of a master data entity like a product, a supplier or a B2B customer involves many different employees, each with responsibility for a set of attributes.

The second case is most often seen in customer data integration (CDI) involving business-to-consumer (B2C) records, but certainly also applies to enriching product master data, supplier master data and B2B customer master data. Here the proactive data governance technology is implemented in the data import functionality or even in the systems of entry, best done as Service Oriented Architecture (SOA) components that are hooked into the master data hub as well.
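
As a minimal sketch of what such an upstream check could look like, consider the hypothetical validation below, run at data entry or data import time before a record enters the master data hub. The attribute names, the duplicate key and the rules are illustrative assumptions, not features of any particular MDM platform.

```python
# A minimal sketch of an upstream data quality check hooked into data capture.
# All names and rules here are illustrative assumptions.

REQUIRED_ATTRIBUTES = {"name", "street", "postal_code", "city"}

def validate_party(record: dict, existing_ids: set) -> list[str]:
    """Return issues to fix *before* the record enters the master data hub."""
    issues = []
    # Completeness checked proactively at entry, not reactively in profiling.
    for attribute in sorted(REQUIRED_ATTRIBUTES - record.keys()):
        issues.append(f"Missing required attribute: {attribute}")
    # Uniqueness checked proactively at entry, not reactively in data matching.
    if record.get("business_id") in existing_ids:
        issues.append("Possible duplicate: business_id already in the hub")
    return issues
```

Run as an SOA-style service, the same check could serve both direct entry in the hub and feeds from CRM, SCM and ERP systems.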

It is a matter of taste whether we call such technology proactive data governance support or upstream data quality. From what I have seen so far, it does work.


Phishing in Wrong Waters

Yesterday a lot of Danes received an e-mail apparently coming from the tax authorities, but it was in fact a phishing attempt.

The form to be filled in may have seemed professional at first glance, but it actually had errors all over.


While such errors may be common in phishing, as the people behind it only need a fraction of the recipients to take the bait, you actually do see many of the same errors in lawful activities.

Some of the errors in the phishing attempt were:

  • It is very unlikely that the public sector would communicate in English instead of Danish
  • They got our national ID for every citizen right; it is called CPR-NR. But why ask for date of birth when this is already included in the national ID?
  • Asking for “Mother Maiden Name” and “The name of your son” seems ridiculous to me. I don’t know if it’s some kind of custom anywhere else in the world.
  • The address format is (as usual) a United States standard. Here it would be: Address, Postal Code, Town/City.
  • You would never expect the public sector to pay anything to your credit/debit card. Our national ID is connected to a bank account selected for that purpose.

As the tax authorities stated in a warning e-mail today: “We do not know of anyone who has been cheated by the mail”.

I guess they are right.

Also, if you are doing lawful activities but committing the same kind of diversity errors in your forms: Don’t expect a whole lot of conversion.


Data Quality and Decision Intelligence

“The substitute for Business Intelligence is called Decision Intelligence” was the headline of an article on the Danish IT site Version2 last month. The article was an interview with Michael Borges, head of the Copenhagen-based data management system integrator Platon. The article is introduced in English on Platon’s Australian site.

The term Decision Intelligence as a successor to Business Intelligence (BI) has been around for a while. In an article from 2008, Claudia Imhoff and Colin White explain what Decision Intelligence does that Business Intelligence doesn’t. Very simplified, it embraces and integrates operational Business Intelligence, traditional Data Warehouse based Business Intelligence and (Business) Content Analytics.

It is said in the article: “This, of course, has implications for both data integration and data quality. This aspect of decision intelligence will be covered in a future article.” I haven’t been able to find that future article. Maybe it’s still pending.

Anyway, certainly this – call it Decision Intelligence or something else – has implications for data quality.

The operational BI side is about supporting, and maybe having the systems make, decisions on events taking place here and now, based on incoming transactions and related master data. This calls for data quality prevention at data collection time, as opposed to data cleansing downstream, which may have served well for informed decisions in traditional Data Warehouse based BI.
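
As a minimal sketch of what prevention at collection time could mean here, the hypothetical gate below checks an incoming transaction event before it reaches any automated decision logic. The keys, bounds and names are illustrative assumptions only.

```python
# Hypothetical gate checking transaction events as they are collected,
# instead of cleansing them downstream in a data warehouse.

KNOWN_CUSTOMER_KEYS = {"C-1001", "C-1002", "C-1003"}  # stand-in for a master data lookup

def fit_for_decision(event: dict) -> bool:
    """Only events passing here may drive an automated, operational decision."""
    # Referential integrity: the event must point at known master data.
    if event.get("customer_key") not in KNOWN_CUSTOMER_KEYS:
        return False
    # Plausibility: catch obvious capture errors here and now.
    amount = event.get("amount", 0)
    return 0 < amount < 1_000_000
```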

The content analysis side, which according to the Imhoff/White article includes information expertise, makes me consider the ever-recurring discussion in the data quality realm about the difference between data quality and information quality. Maybe we will come to an intelligent decision on that one when Business Intelligence is succeeded by Decision Intelligence.


Big Master Data

Right now I am overseeing the processing of yet another master data file with millions of records. In this case it is product master data, but with customer master data kinds of attributes as well, as we are working with a big pile of author names and related book titles.

The Big Buzz

Having such high numbers of master data records isn’t new at all, and compared to the size of the data collections we usually talk about when using the trendy buzzword Big Data, it’s nothing.

Data collections that qualify as big will usually be files with transactions.

However, master data collections are increasing in volume, and most transactions have keys referencing descriptions of the master entities involved in the transactions.

The growth of master data collections is also seen in collections of external reference data.

For example, the Dun & Bradstreet Worldbase, holding business entities from around the world, has lately grown quickly from 100 million entities to nearly 200 million entities. Most of the growth has been due to better coverage outside North America and Western Europe, with the BRIC countries coming in fast. A smaller world resulting in bigger data.

Also, one of the BRIC countries, India, is under way with a huge project for uniquely identifying and holding information about every citizen – that’s over a billion people. The project is called Aadhaar.

When we extend such external registries to social networking services by doing Social MDM, we are dealing with the very fast growing numbers of profiles in Facebook, LinkedIn and other services.

Extreme Master Data

Gartner, the analyst firm, has a concept called “extreme data” that rightly points out that this “big data” thing is not only about volume; it is also about velocity and variety.

This is certainly true also for master data management (MDM) challenges.

Master data are exchanged between organizations more and more often and in higher and higher volumes. Data quality focus and maturity will probably not be the same within the exchanging parties. The velocity and volume make it hard to rely on people-centric solutions in these situations.
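
As a minimal sketch of what automating such a people-centric task could look like, the hypothetical matcher below auto-handles the clear cases and only routes the uncertain middle band to a person. The thresholds and the crude name comparator are illustrative assumptions.

```python
from difflib import SequenceMatcher

def name_similarity(incoming: str, existing: str) -> float:
    """Crude similarity in [0, 1]; real matchers use far richer comparators."""
    return SequenceMatcher(None, incoming.lower(), existing.lower()).ratio()

def route(incoming: str, existing: str) -> str:
    """Decide automatically when confident; people only see the grey zone."""
    score = name_similarity(incoming, existing)
    if score > 0.95:
        return "auto-merge"    # confident duplicate
    if score < 0.60:
        return "auto-insert"   # confident new record
    return "manual review"     # the few cases left for people
```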

Add to that the increasing variety in master data. The variety may be international, as the world gets smaller and we have collections of master data embracing many languages and cultures. We also add more and more attributes each day, as for example governments are releasing more data along with the open data trend, and we generally include more and more attributes in order to make better and more informed decisions.

Variety is also an aspect of Multi-Domain MDM, a subject that according to Gartner (the analyst firm once again) is one of the Three Trends That Will Shape the Master Data Management Market.


More Social Master Data Management

Yesterday my American cyberspace friend Jim Harris was so kind as to send me an invitation for Google+ – the new social network service you must hook into. Thanks Jim, now I had to fill in yet another profile, upload the same picture as always and start networking from scratch once again 🙂

Like many people, I have several profiles in different social network services such as Twitter, Facebook and LinkedIn. As I’m also doing business with German speaking countries, I use XING as an alternative to LinkedIn, as told in the post LinkedIn and the other Thing.

In a comment to that post my Austria based French connection Olivier Mathurin noted: “Disconnected duplicated siloed professional profiles, mmm…”

In a post on this blog called Social Master Data Management, written one year ago, I discussed how social CRM will add new sources from social networks to the external reference data sources we already know from old time CRM.

With all the different faces everyone is wearing in the social media realm, this isn’t going to be easy, and one may consider whether social master data management is a wrong path given the individual nature of and built-in privacy in social networking services.
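
Still, to illustrate what linking such siloed profiles would involve, here is a minimal sketch that consolidates attributes from several profiles of the same person into one record, using a simple per-source precedence. The sources, precedence order and profile data are illustrative assumptions.

```python
# Hypothetical consolidation of one person's profiles from several
# social networks, letting the most trusted source win per attribute.

PRECEDENCE = ["linkedin", "xing", "twitter", "facebook"]  # most trusted first

def consolidate(profiles: dict) -> dict:
    """Merge {source: profile} dicts into one master profile."""
    master = {}
    for source in reversed(PRECEDENCE):          # apply least trusted first...
        master.update(profiles.get(source, {}))  # ...so trusted sources overwrite
    return master

golden = consolidate({
    "twitter":  {"name": "J. Doe", "handle": "@jdoe"},
    "linkedin": {"name": "John Doe", "employer": "Example Ltd"},
})
# golden == {"name": "John Doe", "handle": "@jdoe", "employer": "Example Ltd"}
```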

Well, Gartner (the analyst firm) says that increasing links between MDM and social networks is one of the Three Trends That Will Shape the Master Data Management Market.

So, acknowledging that Gartner predictions are self-fulfilling, you better get moving into LinkedIn, Xing, Viadeo, Twitter, Facebook, (forget MySpace), Google+  and what’s next.


Hors Catégorie

Right now the yearly pinnacle of cycling sport, Le Tour de France, is going on, and today is probably the hardest stage in the race with three extraordinary climbs. In cycling races the climbs are categorized on a scale from 4 (the easiest) to 1 (the hardest) depending on length and steepness. And then there are climbs beyond category, being longer and steeper than usual, like the three climbs today. The description in French for such extreme climbs is “hors catégorie”.

Within master data management, categorization is an important activity.

We categorize our customer master data, for example, depending on what kind of party we are dealing with, as in the list called Party Master Data Types that I usually use within customer data integration (CDI). Another way of categorizing is by geography, as the data quality challenges may vary depending on where the party in question resides.

In product information management (PIM), categorization of products is one of the most basic activities. Here too, the categorization is important for establishing the data quality requirements, as they may be very different between various categories, as told in the post Hierarchical Completeness.
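
As a minimal sketch of how categorization can drive data quality requirements, the hypothetical check below looks up the required attributes per product category before judging completeness. The categories and attribute names are illustrative assumptions.

```python
# Hypothetical category-specific completeness rules for product master data.
REQUIRED_BY_CATEGORY = {
    "books":   {"title", "author", "isbn"},
    "apparel": {"name", "size", "colour", "material"},
}

def completeness_issues(category: str, product: dict) -> set:
    """Return the required attributes missing for this product's category."""
    required = REQUIRED_BY_CATEGORY.get(category, set())
    return required - product.keys()

# A book needs an ISBN; apparel does not.
print(completeness_issues("books", {"title": "Out of Africa", "author": "Karen Blixen"}))
# -> {'isbn'}
```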

But there are always some master data records that are beyond categorization when it comes to fulfilling otherwise accepted requirements for data quality, as I experienced in the post Big Trouble with Big Names.


Big Business

A recent blog post called Hamsterdam and Data Anarchy by Phil Simon on The Data Roundtable describes how rules, policies, and procedures are sometimes suspended in an unusual situation and how dangerous that may be.

I remember being part of such a situation back in the 80’s. It also meant that I, an IT guy, became “the business”, and the situation could have been big business for me – or big time jail for that matter.

Quick-and-dirty

My first real job was at the Danish Tax Authorities. The government is always looking for new ways of collecting taxes, and at that time a new kind of tax was invented, as a new law imposed taxation on the big money piling up in pension funds.

As the tax revenue was needed quickly the solution was a simple construction for the first year and a more complex permanent construction for the following years.

The burden of implementing the collection on the authority’s side wasn’t that big, so the operating team was basically a legal guy and me, the IT guy. We collected the names and addresses of a few hundred companies in financial services that might administer pension funds and sent them a letter with instructions on calculating their contribution for the first year and turning over the money.

Money on the table

Because no one else in the organization was involved in the one-off solution for the first year, the returned statements and checks ended up at my desk. So at that time my morning drill was opening envelopes containing:

  • A statement that I registered in a little data silo I controlled myself and then passed on to the archive
  • A check that I passed on to the treasury

Some of the checks were pretty big – as I remember, some amounting to more than 50 million Euros.

So I did consider an alternative workflow for just one of the big ones. It could have gone like this:

  • Deleting the company in the data silo I controlled myself
  • Archiving the statement in my kitchen bin at home
  • Cashing the check for myself

Well, I would probably have been handcuffed while executing activity number three.


Managing Client On-Boarding Data

This year I will be joining FIMA: Europe’s Premier Financial Reference Data Management Conference for Data Management Professionals. The conference is held in London from 8th to 10th November.

I will present “Diversities In Using External Registries In A Globalised World” and take part in the panel discussion “Overcoming Key Challenges In Managing Client On-Boarding Data: Opportunities & Efficiency Ideas”.

As said in the panel discussion introduction: The industry clearly needs to normalise (or is it normalize?) regional differences and establish global standards.

The concept of using external reference data in order to improve data quality within master data management has been a favorite topic of mine for a long time.

I’m not saying that external reference data is a single source of truth. Clearly external reference data may have data quality issues as exemplified in my previous blog post called Troubled Bridge Over Water.

However, I think there is a clear trend of encompassing external sources, increasingly found in the cloud, as a shortcut to keeping up with data quality. I call this Data Quality 3.0.

The Achilles heel, though, has always been how to smoothly integrate external data into data entry functionality and other data capture processes and, not to forget, how to ensure ongoing maintenance in order to avoid the otherwise inevitable erosion of data quality.

Lately I have worked with a concept called instant Data Quality. The idea is to make simple yet powerful functionality that helps with hooking up to many external sources at the same time when on-boarding clients, and that makes continuous maintenance possible.

One aspect of such a concept is how to exploit the different opportunities available in each country, as public administrative practices and privacy norms vary a lot around the world.
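
As a minimal sketch of the idea, the hypothetical lookup below selects the registries available for the client’s country and queries them in one go during on-boarding. The registry names, country codes and functions are illustrative assumptions, not an actual instant Data Quality implementation.

```python
# Hypothetical per-country registry selection for client on-boarding.
# Which sources exist (and what privacy rules allow) varies by country.

REGISTRIES_BY_COUNTRY = {
    "DK": ["cvr_lookup", "address_validation"],        # Danish business register
    "GB": ["companies_house_lookup", "address_validation"],
    "US": ["address_validation"],                      # fewer open registries
}

def onboard(client: dict, registry_clients: dict) -> dict:
    """Enrich a new client record from every registry available in its country."""
    enriched = dict(client)
    for registry in REGISTRIES_BY_COUNTRY.get(client["country"], []):
        lookup = registry_clients[registry]            # injected callable per source
        enriched.update(lookup(client))                # merge returned attributes
    return enriched
```

The same selection logic could be reused for ongoing maintenance by re-querying the sources on a schedule.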

I’m looking forward to presenting and discussing these challenges and getting a lot of feedback.


Troubled Bridge Over Water

In the recent blog post A pain in the… I described my summer holiday fun: a cycling tour around the Baltic coast.

You meet a lot of data quality issues on such a tour.

One experience was when we arrived in the Polish town Świnoujście. I planned the tour using Google Maps. According to the plan we would arrive in Świnoujście from the west, cross the bridge over the river Świna and reach the ferry to Sweden on the east bank close to the railway station.

Nice plan. Only one thing: contrary to what’s shown on Google Maps – and told in the route planner – there is no bridge across the river in the real world.

Fortunately there was a free ferry service across the river. So we did catch the once-a-day big ferry to Sweden in time.

PS: The road name on the bridge on Google Maps is by the way Wodna. Wodna is Polish for (something with) water.


Mutating Platforms or Intelligent Design

How do we go from single-domain master data management to multi-domain master data management? Will it be through evolution of single-domain solutions or will it require a completely new intelligent design?

The MDM journey

My previous blog post was a book review of “Master Data Management in Practice” by Dalton Cervo and Mark Allen – the full title of the book is in fact “Master Data Management in Practice: Achieving True Customer MDM”.

The customer domain has until now been the most frequent and proven domain for master data management and, as said in the book, the domain where most organizations start the MDM journey, in particular by doing what is usually called Customer Data Integration (CDI).

However, some organizations do start with Product Information Management (PIM). This is mainly due to the magic of numbers: some organizations simply have a higher number of products than customers in the database.

Sooner or later most organizations will continue the MDM journey by embracing more domains.

Achieving Multi-Domain MDM

John Owens made a blog post yesterday called “Data Quality: Dead Crows Kill Customers! Dead Crows also Kill Suppliers!” The post explains how some data structures are similar between sales and purchasing. For example, a customer and a supplier are very similar as parties.

Customer Data Integration (CDI) has the customer as its central entity, which is a party. Product Information Management (PIM) has the supplier as an important entity, which is also a party. The data structures and the workflows needed to Create, Read, Update and perhaps Delete these entities are very similar, not least in business-to-business (B2B) environments.

So, when you are going from PIM to CDI, you don’t have to start from scratch, not least in a B2B environment, as the sketch below illustrates.
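
As a minimal sketch of that shared structure, the hypothetical party model below keeps one Party record and attaches customer and supplier roles to it, so the same name and address are created, read and updated only once. All class and field names are illustrative assumptions.

```python
# Hypothetical party model shared between CDI and PIM: one party record,
# with customer and supplier as roles rather than separate structures.
from dataclasses import dataclass, field

@dataclass
class Party:
    name: str
    address: str
    roles: set = field(default_factory=set)  # e.g. {"customer", "supplier"}

hub = {}

def register(key: str, name: str, address: str, role: str) -> Party:
    """Create the party once; later registrations just add another role."""
    party = hub.setdefault(key, Party(name, address))
    party.roles.add(role)
    return party

register("acme", "ACME Ltd", "1 High Street", "supplier")   # created via PIM
register("acme", "ACME Ltd", "1 High Street", "customer")   # reused via CDI
# hub["acme"].roles == {"supplier", "customer"}
```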

The trend in the master data management technology market is that many vendors are working their way from being a single domain vendor to being a multi-domain vendor – and some are promoting their new intelligent design embracing all domains from day one.

Some other vendors are breeding several platforms (often based on acquisition) from different domains into one brand, and some vendors are developing from a single domain into new domains.

Each strategy has its pros and cons. It seems there will be plenty of philosophies to choose from when organizations are going to select the platform(s) to support the multi-domain MDM journey.
