Service Oriented MDM

Much of the talking and doing related to Master Data Management (MDM) today revolves around the master data repository being the central data store for information about customers, suppliers and other parties, products, locations, assets and whatever else is regarded as master data entities.

The difficulties in MDM implementations are often experienced because master data are born, maintained and consumed in a range of applications such as ERP systems, CRM solutions and heaps of specialized applications.

It would be nice if these applications were MDM aware. But usually they are not.

As discussed in the post Service Oriented Data Quality, the concepts of Service Oriented Architecture (SOA) make a lot of sense when deploying data quality tool capabilities that go beyond the classic batch cleansing approach.

In the same way, we also need SOA thinking when we have to make the master data repository do useful stuff all over the scattered application landscape that most organizations live with today and probably will in the future.

MDM functionality deployed as SOA components has a lot to offer, for example:

  •  Reuse is one of the core principles of SOA. Having the same master data quality rules applied to every entry point of the same sort of master data will help with consistency, as sketched after this list.
  •  Interoperability will make it possible to deploy master data quality prevention as close to the root as possible.
  •  Composability makes it possible to combine functionality with different advantages – e.g. combining internal master data lookup with external reference data lookup.
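
As a minimal sketch of the reuse principle (the field names and rules below are hypothetical, not a specific product's API), the same master data quality rules can be packaged as one callable component that every entry point invokes:

```python
import re

# One shared rule set, maintained in one place and reused by every entry point.
MASTER_DATA_RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "") is not None,
    "country_code": lambda v: isinstance(v, str) and len(v) == 2 and v.isalpha(),
    "name": lambda v: bool(v and str(v).strip()),
}

def validate_party(record: dict) -> list[str]:
    """Return the rule violations for a party master data record."""
    return [
        f"Rule failed for field '{field}'"
        for field, rule in MASTER_DATA_RULES.items()
        if not rule(record.get(field))
    ]

# The same call is made from the CRM entry form, the ERP feed and the web shop,
# so a customer record is judged by identical rules no matter where it is born.
print(validate_party({"name": "ACME GmbH", "email": "info@acme.example", "country_code": "DE"}))
print(validate_party({"name": "", "email": "not-an-email", "country_code": "Germany"}))
```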

Developing LEGO® bricks and SOA components

These days the Lego company is celebrating 80 years in business. The celebration includes a YouTube video telling The LEGO® Story.

As I was born close to the Lego home in Billund, Denmark, I also remember having a considerable amount of Lego bricks to play with as a child in the 60’s.

In computer software Lego bricks are often used as a metaphor for building systems with Service Oriented Architecture (SOA) components, as discussed for example in the article Can SOA and architecture really be described with ‘Lego blocks’?

Today using SOA components in order to achieve data quality improvement with master data is a playground for me.

As described in the post Service Oriented Data Quality, SOA components have a lot to offer:

• Reuse is one of the core principles of SOA. Having the same data quality rules applied to every entry point of the same sort of data will help with consistency.

• Interoperability will make it possible to deploy data quality prevention as close to the root as possible.

• Composability makes it possible to combine functionality with different advantages – e.g. combining internal checks with external reference data.

Pulling Data Quality from the Cloud

In a recent post here on the blog the benefits of instant data enrichment were discussed.

In the contact data capture context, here are some examples (a small sketch follows the list):

  • Getting a standardized address at contact data entry makes it possible for you to easily link to sources with geo codes, property information and other location data.
  • Obtaining a company registration number or other legal entity identifier (LEI) at data entry makes it possible to enrich with a wealth of available data held in public and commercial sources.
  • Having a person’s name spelled according to available sources for the country in question helps a lot with typical data quality issues such as uniqueness and consistency.
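
As a rough sketch of the first two bullets, with purely hypothetical lookup sources standing in for national registers and commercial directories: once the entry form holds a standardized address and a registration number, those values become keys for instant enrichment.

```python
# Hypothetical reference sources; in practice these would be calls to national
# address registers, business registers or commercial directories.
GEO_BY_ADDRESS = {
    "MAIN STREET 1, 1000 SPRINGFIELD": {"lat": 39.80, "lon": -89.64},
}
COMPANY_BY_REG_NO = {
    "DK12345678": {"legal_name": "Example A/S", "industry_code": "6201"},  # fictitious
}

def enrich_contact(standardized_address: str, registration_number: str) -> dict:
    """Instantly enrich a freshly captured contact using the identifiers as keys."""
    enrichment = {}
    if standardized_address in GEO_BY_ADDRESS:
        enrichment["geo"] = GEO_BY_ADDRESS[standardized_address]        # location data via the address
    if registration_number in COMPANY_BY_REG_NO:
        enrichment["company"] = COMPANY_BY_REG_NO[registration_number]  # attributes via the registration number
    return enrichment

print(enrich_contact("MAIN STREET 1, 1000 SPRINGFIELD", "DK12345678"))
```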

However, if you are doing business in many countries it is a daunting task to connect with the best-of-breed sources of big reference data. On top of that, many enterprises are doing both business-to-business (B2B) and business-to-consumer (B2C) activities, including interacting with small business owners. This means you have to link to the best sources available for addresses, companies and individuals.

A solution to this challenge is using Cloud Service Brokerage (CSB).

An example of a Cloud Service Brokerage suite for contact data quality is the instant Data Quality (iDQ™) service I’m working with right now.

This service can connect to big reference data cloud services from all over the world. Some services are open data services in the contact data realm, some are international commercial directories, some are the wealth of national reference data services for addresses, companies and individuals, and even social network profiles are on the radar.
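
The brokerage pattern can be pictured as a thin routing layer that knows, per country and entity type, which reference data service to call. The registry and service functions below are invented stand-ins for illustration, not the actual iDQ™ interfaces.

```python
from typing import Callable

# Illustrative stand-ins for big reference data cloud services.
def national_address_service_dk(query: str) -> dict:
    return {"source": "Danish national address data (illustrative)", "query": query}

def national_business_register_gb(query: str) -> dict:
    return {"source": "UK business register (illustrative)", "query": query}

def international_directory(query: str) -> dict:
    return {"source": "international commercial directory (illustrative)", "query": query}

# The broker's registry: the preferred source per (country, entity type).
REGISTRY: dict[tuple[str, str], Callable[[str], dict]] = {
    ("DK", "address"): national_address_service_dk,
    ("GB", "company"): national_business_register_gb,
}

def broker_lookup(country: str, entity_type: str, query: str) -> dict:
    """Route a lookup to the best national source, falling back to a global directory."""
    service = REGISTRY.get((country, entity_type), international_directory)
    return service(query)

print(broker_lookup("DK", "address", "Main Street 1"))
print(broker_lookup("DE", "company", "Example GmbH"))  # no national source registered -> fallback
```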

Proactive Data Governance at Work

“Data governance is 80 % about people and processes and 20 % (if not less) about technology” is a common statement in the data management realm.

This blog post is about the 20 % (or less) technology part of data governance.

The term proactive data governance is often used to describe whether a given technology platform is able to support data governance in a good way.

So, what is proactive data governance technology?

Obviously it must be the opposite of reactive data governance technology, which must be something like discovering completeness issues as in data profiling and fixing uniqueness issues as in data matching.

Proactive data governance technology must be implemented in data entry and other data capture functionality. The purpose of the technology is to assist people responsible for data capture in getting the data quality right from the start.

If we look at master data management (MDM) platforms we have two possible ways of getting data into the master data hub:

  • Data entry directly in the master data hub
  • Data integration by data feed from other systems such as CRM, SCM and ERP solutions and from external partners

In the first case the proactive data governance technology is a part of the MDM platform, often implemented as workflows with assistance, checks, controls and permission management. We see this most often related to product information management (PIM) and in business-to-business (B2B) customer master data management. Here the insertion of a master data entity like a product, a supplier or a B2B customer involves many different employees, each with responsibility for a set of attributes.

The second case is most often seen in customer data integration (CDI) involving business-to-consumer (B2C) records, but certainly also applies to enriching product master data, supplier master data and B2B customer master data. Here the proactive data governance technology is implemented in the data import functionality or even in the systems of entry, best done as Service Oriented Architecture (SOA) components that are hooked into the master data hub as well.
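
As a minimal sketch of that SOA variant (field names and rules are made up for illustration), one shared check component can be called both from the master data hub and from the feeding systems, so flawed records are stopped or routed to stewardship before they ever land in the hub:

```python
def check_customer(record: dict) -> list[str]:
    """Shared proactive check, callable from the MDM hub UI and from feeding systems alike."""
    issues = []
    if not record.get("name", "").strip():
        issues.append("Name is mandatory")
    if record.get("type") == "B2B" and not record.get("registration_number"):
        issues.append("B2B customers must carry a company registration number")
    return issues

def import_to_hub(record: dict, hub: list) -> bool:
    """Data feed into the hub: reject records that fail the shared check."""
    issues = check_customer(record)
    if issues:
        print("Rejected:", issues)  # in real life: route to a data stewardship workflow
        return False
    hub.append(record)
    return True

hub: list = []
import_to_hub({"name": "Example Ltd", "type": "B2B", "registration_number": "GB0001"}, hub)
import_to_hub({"name": "", "type": "B2C"}, hub)
print(hub)
```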

It is a matter of taste whether we call such technology proactive data governance support or upstream data quality. From what I have seen so far, it does work.

Complicated Matters

A while ago I wrote a short blog post about a tweet from the Gartner analyst Ted Friedman saying that clients are disappointed with the ability to support wide deployment of complex business rules in popular data quality tools.

Speaking about popular data quality tools: on the DataFlux Community of Experts blog, Dylan Jones, founder of DataQualityPro, posted a piece this Friday asking: Are Your Data Quality Rules Complex Enough?

Dylan says: “Many people I speak to still rely primarily on basic data profiling as the backbone of their data quality efforts”.

The classic answers to the challenge of complex business rules are:

  • Relying on people to enforce complex business rules. Unfortunately people are not as consistent in enforcing complex rules as computer programs are.
  • Making less complex business rules. Unfortunately the complexity may be your competitive advantage.

In my eyes there is no doubt that data quality tool vendors have a great opportunity in researching and developing tools that are better at deploying complex business rules. In my current involvement in doing so, we work with features such as (a small illustration follows the list):

  • Deployment as Service Oriented Architecture components. More on this topic here.
  • Integrating multiple external sources. Further explained here.
  • Combining the best algorithms. Example here.
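
To illustrate the difference from basic profiling, here is a hypothetical composite rule that only fires when conditions across several fields and an external source stand-in agree – the kind of complexity Dylan is asking about. All names, sources and thresholds are invented for the sketch.

```python
from typing import Optional

# Stand-in for an external business directory of confirmed registration numbers.
KNOWN_REG_NOS = {"DK12345678", "GB0001"}

def rule_high_credit_b2b(record: dict) -> Optional[str]:
    """B2B customers above a credit limit must have a registration number
    that is also confirmed by the external directory."""
    if record.get("type") != "B2B" or record.get("credit_limit", 0) <= 50_000:
        return None
    if record.get("registration_number") not in KNOWN_REG_NOS:
        return "High-credit B2B customer lacks a confirmed registration number"
    return None

def rule_consumer_birth_date(record: dict) -> Optional[str]:
    """Consumer records must carry a birth date."""
    if record.get("type") == "B2C" and not record.get("birth_date"):
        return "Consumer record is missing a birth date"
    return None

RULES = [rule_high_credit_b2b, rule_consumer_birth_date]

def apply_rules(record: dict) -> list[str]:
    """Run all complex rules and collect the violations."""
    return [msg for rule in RULES if (msg := rule(record)) is not None]

print(apply_rules({"type": "B2B", "credit_limit": 100_000, "registration_number": "XX999"}))
```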

Data Quality is an Ingredient, not an Entrée

Fortunately it is increasingly recognized that you don’t get success with Business Intelligence, Customer Relationship Management, Master Data Management, Service Oriented Architecture and many more disciplines without starting with improving your data quality.

But it would be a big mistake to see Data Quality improvement as an entrée before a main course of BI, CRM, MDM, SOA or whatever is on the menu. You have to have ongoing prevention against having your data polluted again over time.

Improving and maintaining data quality involves people, processes and technology. Now, I am not neglecting the people and process side, but as my expertise is in the technology part I would like to mention some of the technological ingredients that help with keeping data quality at a tasty level in your IT implementations.

Mashups

Many data quality flaws are (not surprisingly) introduced at data entry. Enterprise data mashups with external reference data may help during data entry, for example (a small sketch follows the list):

  • An address may be suggested from an external source.
  • A business entity may be picked from an external business directory.
  • Various rules exist in different countries for using consumer/citizen directories – why not use the best available where you do business.
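
A minimal sketch of the first bullet, with an in-memory list standing in for a real external address source: the entry form asks for suggestions as the user types and offers a pick list instead of free-text entry.

```python
# Stand-in for an external address reference source.
REFERENCE_ADDRESSES = [
    "Main Street 1, 1000 Springfield",
    "Main Street 10, 1000 Springfield",
    "Maple Avenue 5, 2000 Shelbyville",
]

def suggest_addresses(partial: str, limit: int = 5) -> list[str]:
    """Return reference addresses matching what the user has typed so far."""
    needle = partial.strip().lower()
    return [a for a in REFERENCE_ADDRESSES if needle in a.lower()][:limit]

# The entry form calls this on every keystroke and offers the hits as a pick list.
print(suggest_addresses("main str"))
```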

External IDs

Getting the data entry right at the root is important, and most (if not all) data quality professionals agree that this is a superior approach compared to doing cleansing operations downstream.

The problem, though, is that most data erodes as time passes. What was right at the time of capture will at some point not be right anymore.

Therefore data entry ideally should not only be a snapshot of correct information but also include raw data elements that make the data easily maintainable.
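
One way to read that advice in code, using a hypothetical external source: store the external identifier next to the captured values, so the record can be refreshed from the source later instead of silently eroding.

```python
from datetime import date

# Hypothetical external source keyed by a stable identifier.
EXTERNAL_COMPANY_SOURCE = {
    "DK12345678": {"legal_name": "Example A/S", "status": "active"},
}

customer = {
    "name": "Example A/S",        # snapshot captured at entry time
    "external_id": "DK12345678",  # raw element that keeps the record maintainable
    "captured_on": date(2010, 1, 1),
}

def refresh(record: dict) -> dict:
    """Re-resolve the record against the external source using the stored identifier."""
    fresh = EXTERNAL_COMPANY_SOURCE.get(record["external_id"])
    if fresh:
        record.update(name=fresh["legal_name"], status=fresh["status"], captured_on=date.today())
    return record

print(refresh(customer))
```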

Error tolerant search

A common workflow when in-house personnel are entering new customers, suppliers, purchased products and other master data is that you first search the database for a match. If the entity is not found, you create a new entity. When the search fails to find an actual match, we have a classic and frequent cause of introduced duplicates.

An error tolerant search is able to find matches despite spelling differences, alternatively arranged words, various concatenations and many other challenges we face when searching for names, addresses and descriptions.
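
A minimal sketch of the idea using only the Python standard library: normalize the query and the stored names, then rank candidates by a similarity ratio instead of demanding an exact hit. A real implementation would add phonetic keys, address parsing and more.

```python
from difflib import SequenceMatcher

EXISTING_CUSTOMERS = ["Smith & Sons Trading Ltd", "Jonsson Byg A/S", "ACME International"]

def normalize(name: str) -> str:
    """Lowercase and sort the words so rearranged names still resemble each other."""
    return " ".join(sorted(name.lower().replace("&", "and").split()))

def error_tolerant_search(query: str, threshold: float = 0.7) -> list[tuple[str, float]]:
    """Return existing records that are probably the same despite spelling differences."""
    hits = []
    for candidate in EXISTING_CUSTOMERS:
        score = SequenceMatcher(None, normalize(query), normalize(candidate)).ratio()
        if score >= threshold:
            hits.append((candidate, round(score, 2)))
    return sorted(hits, key=lambda h: -h[1])

# 'Smith and Sons Ltd Trading' still finds the existing record, so no duplicate is created.
print(error_tolerant_search("Smith and Sons Ltd Trading"))
```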

What is Data Quality anyway?

The above question might seem a bit belated after I have blogged about it for 9 months now. But from time to time I ask myself some questions like:

Is Data Quality an independent discipline? If it is, will it continue to be that?

Data Quality actually is (or should be) a part of a lot of other disciplines.

Data Governance as a discipline is probably the best place to include general data quality skills and methodology – that is, all the people and process sides of data quality practice. Data Governance is an emerging discipline with an evolving definition, says Wikipedia. I think there is a pretty good chance that data quality management as a discipline will increasingly be regarded as a core component of data governance.

Master Data Management is a lot about Data Quality, but MDM could be dead already. Just like SOA. In short: I think MDM and SOA will survive, getting new life from the semantic web and all the data resources in the cloud. For that, MDM and SOA need Data Quality components. Data Quality 3.0 it is.

You may then replace MDM with CRM, SCM, ERP and so on, and thereby extend the use of Data Quality components from dealing not only with master data but also with transaction data.

Next questions: Are Data Quality tools an independent technology? If they are, will they continue to be that?

It’s clear that Data Quality technology is moving from stand-alone batch processing environments, over embedded modules, to, oh yes, SOA components.

If we look at what data quality tools today actually do, they in fact mostly support you with automation of data profiling and data matching, which probably covers only some of the data quality challenges you have.

In recent years there has been a lot of consolidation in the market around Data Integration, Master Data Management and Data Quality, which certainly tells that the market needs Data Quality technology as components in a bigger scheme along with other capabilities.

But some new pure Data Quality players are also being established – and I think I often see some old folks from the acquired entities at these new challengers. So independent Data Quality technology is not dead and doesn’t seem to want to be.

Deploying Data Matching

As discussed in my last post, a core part of many Data Quality tools is Data Matching. Data Matching is about linking entities within or between databases, where these entities are not already linked with unique keys.
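
As a small illustration of the linking problem (the weights and threshold are invented for the sketch), a pairwise comparison can weigh name and address similarity into one match score; above the threshold the two records are treated as the same real-world entity even though they share no key.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude string similarity between two field values."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted comparison of two party records that share no unique key."""
    return (0.6 * similarity(rec_a["name"], rec_b["name"])
            + 0.4 * similarity(rec_a["address"], rec_b["address"]))

a = {"name": "Jon Andersen", "address": "Main Street 1, Springfield"}
b = {"name": "John Andersen", "address": "Main St. 1, Springfield"}

score = match_score(a, b)
print(round(score, 2), "-> link as same entity" if score >= 0.85 else "-> keep separate")
```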

Data Matching may be deployed in different ways; I have been involved in the following ones:

External Service Provider

Here your organization sends extracted data sets to an external service provider, where the data are compared, and in many cases also related to other reference sources, all through matching technology. The provider sends back a “golden copy” ready for uploading into your databases.

Some service providers use a Data Matching tool from the market and others have developed their own solutions. Many solutions grown at the providers are country specific, equipped with a lot of tips and tricks learned from handling data from that country over the years.

The big advantage here is that you gain from the experience – and the reference data collection – at these providers.

Internal Processing

You may implement a data quality tool from the market and use it for comparing your own data, often from disparate internal sources, in order to grow the “golden copy” at home.

Many MDM (Master Data Management) products have some matching capabilities built in.

Also, many leading Business Intelligence tool providers supplement their offering with an (integrated) Data Quality tool with matching capabilities, as an answer to the fact that Business Intelligence on top of duplicated data doesn’t make sense.

Embedded Technology

Many data quality tool vendors provide plug-ins to popular ERP, CRM and SCM solutions so that data are matched with existing records at the point of entry. For the most popular solutions, such as SAP and MS CRM, there are multiple such plug-ins from different Data Quality technology providers. Then again, many implementation houses have a favorite combination – so in that way you select the matching tool by selecting an implementation house.

SOA Components

The embedded technology is of course not optimal where you operate with several databases, and the commercial bundling may also not be the best solution for you.

Here Service Oriented Architecture thinking helps, so that matching services are available as SOA components at any point in your IT landscape based on centralized rule setting.

Cloud Computing

Cloud computing services offered by external service providers take the best from these two worlds into one offering.

Here the SOA component resides at the external service provider – in the best case combining an advanced matching tool, rich external reference data and the tips and tricks for the particular country and industry in question.

Universal Pearls of Wisdom

When we are looking for what is really important and absolutely necessary to get data quality right, some sayings could be:

  •  “Change management is a critical factor in ensuring long-term data quality success”.
  •  “Focussing only on technology is doomed to fail”.
  •  “You have to get buy-in from executive sponsors”.

These statements are in my eyes very true, and I guess everyone else will agree.

But I also notice that they are true for many other disciplines like MDM, BI, CRM, ERP, SOA, ITIL… you name it.

Also take the new SOA manifesto. I have tried to swap SOA (and the full words) with XYZ, and this is the result:

 XYZ Manifesto

XY is a paradigm that frames what you do. XYZ is a type of Z that results from applying XY. We have been applying XY to help organizations consistently deliver sustainable business value, with increased agility and cost effectiveness, in line with changing business needs. Through our work we have come to prioritize:

Business value over technical strategy

Strategic goals over project-specific benefits

Intrinsic interoperability over custom integration

Shared services over specific-purpose implementations

Flexibility over optimization

Evolutionary refinement over pursuit of initial perfection

That is, while we value the items on the right, we value the items on the left more.

I think a Data Quality manifesto and several other manifestos could be very close.

But what I am looking for in Data Quality are the specific pearls of wisdom related to Data and Information Quality – while I of course value being reminded about the universal ones.

Driving Data Quality in 2 Lanes

Yesterday I visited a client in order to participate in a workshop on extending the use of a Data Quality Desktop tool to more users within that organisation.

This organisation makes use of two different Data Quality tools from Omikron:

  • The Data Quality Server, a complete framework of SOA-enabled Data Quality functionality, where the IT department needs to be a critical part of the implementation.
  • The Data Quality Desktop tool, a user-friendly piece of Windows software installable by any PC user, but with sophisticated cleansing and matching features.

During the few hours of this workshop we were able to link several different departmental data sources to the server-based MDM hub, setting up and confirming the business rules for this, and reporting the foreseeable outcome of this process if it were to be repeated.

Some of the scenarios exercised will continue to run as ad hoc departmental processes and others will be upgraded into services embraced by the enterprise wide server implementation.

As I – for various reasons – went to this event by car over a long distance, I had time to compare the data quality progress made by different organisations with the traffic on the roads, where we have:

  • Large buses with people and large lorries with goods, being the most sustainable way of transport – but they are slow and not too dynamic. Like the enterprise wide server implementations of Data Quality tools.
  • Private cars heading for different destinations at different but faster speeds. Like the desktop Data Quality tools.

 I noticed that:

  • One lane with buses or lorries works fine but slowly.
  • One lane with private cars is a bit of a mess with some hazardous driving.
  • One lane with buses, lorries and private cars tends to be deadly.
  • Two (or more) lanes work nicely with good driving habits.

So, encouraged by the workshop and the ride, I feel comfortable with the idea of using both kinds of Data Quality tools, to have coherent, user-involved agile processes backed by some tools and a sustainable enterprise wide solution at the same time.
