When to cleanse in Data Migration?

Tibor Bossanyi 2nd September 2009 / 22:44

Hi,

in my opinion: cleansing in the legacy system means a separate project. –> more cost + effort
during the migration: here I would put into 2 parts: before the mapping or after the mapping. After my experience a hibrid solution is also acceptable: the mapping can filter the metadata problems (simple syntax checking), and after the mapping we can clean the data which is already in the new structure.
That means the iterations enhances not only the mapping quality, but also the data (semantic) quality.

Reply
Olivier Mathurin 3rd September 2009 / 10:01

In a Customer-Hub service-oriented implementation using a matching engine, we simply migrated the data using the production interface, meaning that we passed legacy data one-by-one to the Hub through a messaging interface. It was a good opportunity for:

1) Saving costs and time by:
a- Not implementing a dedicated migration architecture that would have been thrown away after the initial load of the CDH (licenses+design+infrastructure+implementation+etc.)

b- Not having to implement twice validation and matching rules in two different tools (ETL and Match Engine).

2) Testing and optimizing our validation rules and our matching rules with real and tricky duplicate records, enhancing the overall quality controls of the Hub,

3) Testing the infrastructure on high volumes,

Our volumetry was less than 1M recordsengaging 3 legacy-systems, approx. 15% of duplicate records, and we had to test our procedure several times. But it worked well.

Reply
Dylan Jones 3rd September 2009 / 10:31

Thanks for reminding me about this Henrik, I’ll add some comments on the forum too.

The simple answer is – it depends.

The key here is that you simply cannot take a universal standpoint and say “We will cleanse in the target (after the migration”. This sounds bizarre but I see it in so many migrations, companies are unwilling to foot the bill for DQ but they end up paying far more in delays or failure.

The point is that irrespective of where you resolve DQ defects you still need to assess data quality thoroughly right at the start of the project.

So many people view DQ as data migration testing, they create their mapping logic and then do some basic testing close to runtime, only when they execute live do they hit the problems.

By performing a DQ assessment up front you will be able to classify the various defects as…

a) Must be resolved in the legacy system
b) Cannot be resolved in legacy so must be resolved in a staging area
c) Can be resolved “in-flight” using DQ software or ETL tools etc.
d) Can be resolved in the target or some further downstream processing

Remember also that some data defects mean that the data won’t physically migrate at all, another danger point.

Hope that helps.

Reply
Alan Snow 4th September 2009 / 14:53

In my view, as Dylan says, ‘It depends’, however I have been involved in several legacy to ERP migrations and I know what often happens. In many cases, responsibility for data quality (let alone defining what is ‘acceptable’ DQ), is not explicitly assigned. The result is that Migration Teams, who usually have to migrate to a strict timetable, don’t allow enough time for assessing DQ and making a plan on how to address the issues uncovered. The result is that they ‘do the best they can in the allotted time’, but this usually means that data is incorrectly migrated (or not migrated at all). It may be several weeks or months before the knowledge workers raise the alarm (that data in the new system is missing or incorrect), and by that time the old legacy system may have been switched out, and its data is not easily accessible. Result: a new system with lower data quality than the old one. All you succeed in doing is getting the wrong answer faster! The moral is: NEVER wait until after the migration, its too late then to recover.
Should data always be ‘corrected at source’? I don’t think so, sometimes it is not viable due to unclear business rules or relationships in legacy systems, and if you are migrating soon, what’s the point? It will be a lot of effort and expense for a short-lived benefit. Ultimately a pragmatic balance has to be found between the reletive cost and practicality of addressing DQ at source, and addressing it further downstream.

Reply
Henrik Liliendahl Sørensen 2nd September 2010 / 21:23

One year later (shame on me): Thanks for the input.

Reply

	Henrik Gabs Lilienda… on Balancing the Business Partner…
	Jeppe Thing Sørensen on Balancing the Business Partner…
	peolsolutions on MDM, Cloud, SaaS, PaaS, IaaS a…
	Henrik Gabs Lilienda… on Is the Holiday Season called C…
	Michael D. on Is the Holiday Season called C…
	Jay Ram on The Disruptive MDM List is…
	Henrik Gabs Lilienda… on The Intersection of Data Obser…
	Shanker on The Intersection of Data Obser…
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on Data Matching Efficiency
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on From Platforms to Ecosyst…
	Michael Fieg on From Platforms to Ecosyst…
	From Platforms to Ec… on What is Collaborative Product…
	From Platforms to Ec… on MDM and Knowledge Graph

Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more

5 thoughts on “When to cleanse in Data Migration?”

Leave a comment Cancel reply

Related

5 thoughts on “When to cleanse in Data Migration?”

Leave a comment Cancel reply