This morning I noticed this lightbulb joke in a tweet from @mortensax:
Besides finding it amusing I also related to it since I have used an example with light bulbs in a webinar about data matching as seen here:
The use of synonyms in Search Engine Optimization (SEO) is very similar to the techniques we use in data matching.
Here the problem is that, for example, these two product descriptions may have a fairly high edit distance (they are very different character by character) yet describe the same product:
- Light bulb, A 19, 130 Volt long life, 60 W
- Incandescent lamp, 60 Watt, A19, 130V
while these two product descriptions have an edit distance of only a single character substitution, yet are not the same product (though they belong to the same category):
- Light bulb, 60 Watt, A 19, 130 Volt long life
- Light bulb, 40 Watt, A 19, 130 Volt long life
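To make the contrast concrete, here is a minimal Levenshtein edit distance sketch (my own illustration, not something from the original post) applied to the two pairs above:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

same_product = ("Light bulb, A 19, 130 Volt long life, 60 W",
                "Incandescent lamp, 60 Watt, A19, 130V")
different_products = ("Light bulb, 60 Watt, A 19, 130 Volt long life",
                      "Light bulb, 40 Watt, A 19, 130 Volt long life")

print(levenshtein(*same_product))        # large distance, yet the same product
print(levenshtein(*different_products))  # distance 1, yet different products
```

So a raw character-level distance threshold would get both pairs wrong, which is exactly why the techniques below matter.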
Working with product data matching is indeed very enlightening.
Always a challenge. I think the key to this is standardization before matching. You have to help the algorithms when you can. The standard description should be either light bulb or incandescent lamp, for example. A reference table would help standardize this and the other attributes. If the data can’t be changed for some reason, it’s not so uncommon to create new attributes that will only be used for matching.
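As a rough sketch of what Steve describes, standardization via a reference table could derive new attributes used only for matching. Everything here (the synonym table, the regular expressions, the `match_key` helper) is my own illustrative assumption, not a real package or the commenter's actual implementation:

```python
import re

# Hypothetical reference table mapping raw product-type terms to a standard term.
TYPE_SYNONYMS = {"incandescent lamp": "light bulb",
                 "light bulb": "light bulb"}

def match_key(description: str) -> tuple:
    """Derive a standardized matching key (type, shape, watts, volts)."""
    text = description.lower()
    # Standardize the product type via the reference table.
    ptype = next((std for raw, std in TYPE_SYNONYMS.items() if raw in text), None)
    # Shape code, tolerating an optional space: "A 19" or "A19".
    shape = re.search(r"\ba\s?(\d+)\b", text)
    # Units in short or long form: "60 W", "60 Watt", "130V", "130 Volt".
    watts = re.search(r"(\d+)\s?(?:w\b|watt)", text)
    volts = re.search(r"(\d+)\s?(?:v\b|volt)", text)
    return (ptype,
            "a" + shape.group(1) if shape else None,
            int(watts.group(1)) if watts else None,
            int(volts.group(1)) if volts else None)

# The two "same product" descriptions now collapse to the same key,
# while the 40 W bulb gets a different key.
print(match_key("Light bulb, A 19, 130 Volt long life, 60 W"))
print(match_key("Incandescent lamp, 60 Watt, A19, 130V"))
```

With keys like these, matching becomes an equality test on standardized attributes instead of a fuzzy comparison of raw strings.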
Thanks Steve. I agree. These are exactly the techniques I know are used for solving challenges in product master data matching.
Building databases containing compound fields such as the examples you have given here is extremely poor design. It is the type of effort that might be expected from a 1st year computing student, one that would give the lecturer the opportunity to point out the plethora of errors it contained and why any good database designer should, and could, avoid them.
Attempting to work on and sort such data is also 1st year student level. It brings to mind the wonderful Irish saying: "If I wanted to get to Dublin, I wouldn't start from here!"
All of the embedded attributes need to be identified and extracted into properly structured tables, where each column represents a single attribute.
The next step is to define which attributes represent the Unique Identifier of the entity and then sort on these to highlight duplicate rows. In a more sophisticated solution, many of these columns will be foreign keys to other reference and domain entities.
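A minimal sketch of that duplicate-highlighting step, assuming the embedded attributes have already been extracted into columns; the sample rows, column layout, and the choice of unique-identifier columns are illustrative assumptions of mine:

```python
from collections import defaultdict

# Rows after extraction: (source_system, product_type, shape, watts, volts).
# Sample data invented for illustration.
rows = [
    ("ERP",     "light bulb", "A19", 60, 130),
    ("webshop", "light bulb", "A19", 60, 130),  # same product from another system
    ("ERP",     "light bulb", "A19", 40, 130),  # different wattage -> distinct entity
]

# Treat (product_type, shape, watts, volts) as the unique identifier;
# the source system is not part of the entity's identity.
groups = defaultdict(list)
for row in rows:
    groups[row[1:]].append(row)

# Groups with more than one row are candidate duplicates to review/merge.
duplicates = {key: rws for key, rws in groups.items() if len(rws) > 1}
for key, rws in duplicates.items():
    print(key, "->", len(rws), "rows")
```

In a real solution the grouping would typically be a `GROUP BY ... HAVING COUNT(*) > 1` over the structured table, with the reference and domain attributes held as foreign keys, as John notes.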
Jim Harris has done an excellent video showing how to start this process at http://www.youtube.com/watch?v=bdzD4Kp7weg
Thanks for the example, Henrik.
Thanks John. I agree about the data modeling practice we need in handling this kind of data.
But I have come across multiple situations where even commercial software packages and home-grown solutions built by veterans handle these data in compound fields. I have also been involved in defining the requirements for some of the best product master data hubs in the world. Not because the hub doesn't handle the breakdown – it does. But data is coming in from outside your jurisdiction, in great numbers, from the solutions mentioned above (and from spreadsheets and flat files of all kinds).
And oh yes, I like Jim’s video too.