Big Data and Data Matching

Data matching has been an established discipline for many years. Most data quality tools have more or less sophisticated data matching features, and many MDM (Master Data Management) platforms have data matching capabilities as well.

BigDataQuality — The LinkedIn Big Data Quality group

In a way, the data matching realm has become slightly dull in recent years. People don’t get excited anymore over a discussion about whether deterministic matching or probabilistic matching is the right way. Soundex is old, edit distance has been around for ages and matchcodes may have outlived themselves.
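For readers who have not met these old friends, here is a minimal sketch of two of them in Python: American Soundex and Levenshtein edit distance. The function names `soundex` and `edit_distance` are my own choices for illustration and are not taken from any particular data quality tool.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name (sketch)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")  # vowels and uncoded letters map to ""
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

Soundex groups names that sound alike under one code, while edit distance counts how many keystrokes separate two strings. A deterministic rule might demand equal Soundex codes, whereas a probabilistic approach would rather treat both as evidence to be weighed.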

So, it’s good to see a new beast turning up. Data matching with big data.

It may be about deduplicating (deduping) volumes that are bigger than traditional data matching can handle. You know: Dedoop’ing.
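One common way to keep deduplication feasible as volumes grow is blocking: only records that share a cheap key are compared with each other, instead of comparing every record with every other record. The sketch below is a plain, single-machine illustration of that idea and not a description of how any specific tool works. The record fields (`id`, `name`, `postal_code`), the blocking key and the 0.85 threshold are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first letter of the name plus the postal code.
    return (record["name"][:1].upper(), record["postal_code"])


def candidate_pairs(records):
    """Yield only pairs of records that fall into the same block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def find_duplicates(records, threshold=0.85):
    """Return id pairs whose names are similar enough within a block."""
    dupes = []
    for a, b in candidate_pairs(records):
        score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if score >= threshold:  # threshold is an assumed cut-off
            dupes.append((a["id"], b["id"]))
    return dupes
```

The trade-off is the usual one: a tighter blocking key means fewer comparisons, but also a bigger risk that true duplicates land in different blocks and are never compared.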

But it is also very much about matching big data with small data, first and foremost master data, and about having well-matched master data. Kimmo Kontra wrote a good post about that recently. The post is called Big Grease, Big Data, and Big Apple – manholes and MDM.

The case presented by Kimmo holds many exciting implementations of data matching, for example proximity matching of locations.
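To give an idea of what proximity matching of locations can look like, here is a minimal sketch that treats two coordinates as a match when the great-circle (haversine) distance between them is below a threshold. This is not taken from Kimmo's post. The function names, the 50 metre default and the spherical Earth radius are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_proximity_match(p, q, max_m=50.0):
    """True when points p and q, given as (lat, lon), lie within max_m metres."""
    return haversine_m(*p, *q) <= max_m
```

For example, `is_proximity_match((40.7128, -74.0060), (40.7129, -74.0060))` compares two points roughly 11 metres apart. The right threshold depends entirely on the use of the matched data and on how precise the coordinates in both sources are.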

Richard Ordowich 2nd April 2013 / 17:00

Before looking at matching, it is critical to understand why the matching is required. What are the uses of the matched data? What is the required quality of the matched data? For each use, is there a need for an identical, similar, equivalent or corresponding match? Then it is necessary to define which factors determine what constitutes an identical, similar or corresponding match. This is a process we refer to as harmonizing data.

Once each use and the required match type are defined, solutions to meet these requirements can be examined. What is the workflow of the data to achieve the match? This involves both technology and human activities. No matching technology works perfectly.

Once this work is done, there will be a realization that achieving the desired results is challenging. Exceptions will occur. The desired match quality may not be achievable. Changes to the environment will occur and matching may degrade.
Adding data to the environment is not necessarily a better solution. The quality of big data should be suspect. Match conflicts will increase, resulting in more labor.



Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more
