Three Ways of Finding a Product

One goal of Product Information Management (PIM) is to ensure that consumers of product information can find the product they are looking for. That requires both capable functionality and optimally organized data.


There is a whole industry making software that helps with searching for products, as touched upon in the post Search and if you are lucky you will find.

However, even the best error-tolerant and super-elastic search engines depend on the data they search, and they are challenged by differences between the taxonomy used by the person searching and the taxonomy used in the product data.

As we get better at providing more and more data about products, searching becomes harder too: we get more and more hits, many of which are irrelevant to the intention of a given search.

Drill down

You can start by selecting the main group of products in which you are looking for something and then drill down through an increasingly narrow classification.

Again, this approach is challenged by different perspectives on product grouping, and even if we look to standards, there are too many of them, as described in the post Five Product Classification Standards.

Traverse

The term traverse has become (or will become) trendy with the introduction of graph technology. By using graph technology in Product Information Management (PIM) you have a way of overcoming the challenges of using search or drill down when looking for a product.

Finding a product in many use cases has the characteristic that we know some pieces of information and want to find a product matching those pieces, though they are often expressed in a different way. This fits very well with the way graph technology works: starting from a given set of root nodes, we traverse through edges and nodes (also called vertices) until we reach nodes of the wanted type.

In doing that we are able to translate between different wordings, classifications and languages.
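As a toy illustration of the idea, here is a minimal sketch in Python. All node names, the synonym edges and the SKU are invented for the example: a search term in one wording or language is translated across synonyms and classifications by traversing a small graph until product nodes are reached.

```python
from collections import deque

# A toy product graph: nodes are search terms, classifications and
# products; edges connect synonyms, translations, categories and the
# products filed under them. All names and the SKU are invented.
EDGES = {
    "glühbirne": ["light bulb"],           # German -> English
    "incandescent lamp": ["light bulb"],   # synonym
    "light bulb": ["lighting"],            # term -> classification
    "lighting": ["SKU-60W-A19"],           # classification -> product
}

def traverse(root, wanted_prefix="SKU-"):
    """Breadth-first traversal from a root node to reachable product nodes."""
    seen, queue, products = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        if node.startswith(wanted_prefix):
            products.append(node)
            continue
        for neighbour in EDGES.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return products

print(traverse("glühbirne"))  # -> ['SKU-60W-A19']
```

A dedicated graph database would of course replace the dictionary with persisted nodes and edges, but the traversal principle is the same.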

At Product Data Lake we are currently exploring – or should I say traversing – this space. I will very much welcome your thoughts on this subject.

Painting WWII Bombers and Product Data: It Is All in the Details

Today’s guest blog post is from Dan O’Connor, a United States based product data taxonomy guru. Here are Dan’s thoughts on product data quality:

I have had a few days off this past week while I transition to a new role. During that time, I’ve had time to reflect on many things, as well as pursue some personal interests. I talked with peers and former co-workers, added a fresh coat of paint to my basement, and worked on some WWII era bomber models I purchased before Christmas but never had time for.

The third pursuit was a rather interesting lesson in paying attention to details. The instructions would say to paint an individual piece one color, but that piece would comprise several elements that should never be painted a single color. For example, the flight yokes on the Mitchell were supposed to be painted black, but in viewing pictures online I saw that certain parts were white, red and aluminum. I therefore painted them appropriately. These yokes are less than an inch long and a couple of millimeters wide, but became much more impressive with an appropriate smattering of color.

Flight Yokes and Product Taxonomies

It is this attention to detail that made me think about how product taxonomies are developed. Some companies just follow the instructions, and end up with figurative “black flight yokes”. These taxonomies perform adequately, allowing a base level of product detail to be established. Web sites and catalogs can be fed with data and all is well.

Other companies see past the black flight yokes. They need the red buttons, the white grips, and the silver knobs because they know these data points are what make their product data more real. They could have followed the instructions, but being better than the instructions was more important.

Imagine for a second that the instructions were the mother of the data and the plane itself was the father. According to the mother plain black flight yokes are sufficient. The father, while capable of being so much more, ends up with the dull data the mother provides. Similarly, if the plane/father has no options that allow it to be more colorful the instructions from the mother are meaningless beyond the most basic interpretations.

The Mother and Father of Product Data

To some my analogy might be a stretch, but think of it in these terms: your product taxonomy is the mother of your product data, and the architecture that supports that taxonomy is the father. If your taxonomy only supports a generic level of data, the architecture supporting it cannot add more detail. If the architecture is limited, the most robust product taxonomy will still only support the most basic of data. Your product data quality is limited by the taxonomy you build and the systems you use to manage it. If both are well developed, beautiful product data is born. If one or both are limited, your product data will be an ugly mess.

Why is this important? Product data does more than validate that the image has the right color on a web site, or make sure an item will fit in your kitchen or TV room. Product data feeds faceting experiences so that customers on your web site can filter down to the perfect product. Without facets customers have to search manually through more products, and may get frustrated and leave your web site before finding the item they want.
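To make the faceting idea concrete, here is a minimal sketch. The products and their attributes are invented for the example; it simply shows how facet counts and filters fall directly out of well-attributed product data.

```python
# Invented example products with two facet attributes.
products = [
    {"name": "Bulb A", "watt": 60, "base": "E26"},
    {"name": "Bulb B", "watt": 40, "base": "E26"},
    {"name": "Bulb C", "watt": 60, "base": "E12"},
]

def facet_counts(items, attribute):
    """Count how many items fall under each value of a facet attribute."""
    counts = {}
    for item in items:
        counts[item[attribute]] = counts.get(item[attribute], 0) + 1
    return counts

def filter_by(items, **facets):
    """Keep only the items matching every selected facet value."""
    return [i for i in items if all(i.get(k) == v for k, v in facets.items())]

print(facet_counts(products, "watt"))            # -> {60: 2, 40: 1}
print(filter_by(products, watt=60, base="E26"))  # -> the "Bulb A" record
```

The point of the analogy holds here too: if the taxonomy never captured wattage or base type, no amount of architecture could offer those filters.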

Product data can also feed web site search, allowing customers to find your products using product descriptors instead of just product numbers and short descriptions. These search options also filter out unnecessary results, allowing a customer to find the perfect product faster.

Product data might also be used by the marketplaces that sell your products, by your catalogs, product data sheets, and even the shelf tags in your retail locations. Having one consistent source of data for those usages avoids customer confusion when customers approach your business from an omni-channel perspective. Having to find a product on a shelf when the mobile experience shows a different description is painful and leads to bad customer experiences.

Lastly, moving data between your business and others is problematic at the best of times. Poor product data leads to bad data dissemination, which leads to bad customer experiences across your syndication channels. If you cannot represent your data in a single logical message internally, your external message will be chaotic and confusing for your guests.

The Elements of a Product Data Program

Therefore, creating a good product taxonomy is not just about hiring a bunch of taxonomists and having them create a product taxonomy. It is about taxonomy best practices, data governance, and understanding your entire product data usage ecosystem, both internally and externally. It is understanding what role Product Information Management systems play in data management, and more importantly what role they do not.

Therefore, in the analogy of a mother product taxonomy and a father architecture creating data, there are siblings, aunts, uncles, and other relatives to understand as well. A lack of understanding in any one of these relationships can cause data quality issues to shine through. It is estimated that companies lose an average of $8 million a year (ROI on Data Quality, 2014) due to data quality issues. Can your business afford to keep ignoring your product data issues?

Dan O’Connor is a Product Taxonomy, Product Information Management (PIM), and Product Data Consultant and an avid blogger on taxonomy topics. He has developed taxonomies for major retailers as well as manufacturers and distributors, and assists with the development of product data models for large and small companies. See his LinkedIn bio for more information.

We Need Better Search

Often we have all the information we need. What we don’t have is the right means to search through and make sense of all that information.

It’s now been a little more than a year since the terrible terrorist attacks in Norway carried out by a right-wing extremist.

Since then an investigation has been conducted to find out whether the tragic incident could have been avoided. A report is due tomorrow, but bits and pieces are already appearing in the press.

Today the Norwegian newspaper Aftenposten has an article about the inadequate search features available to the Norwegian Police Intelligence. The article, in Norwegian, is here.

As I understand it, the Police Intelligence did have a few registrations of suspicious activities by the terrorist. Probably not enough to act upon before the tragedy. But even if they had had more information, they wouldn’t have been able to match it with the technology available and prevent the attacks.

It’s a shame.


Finding Me

Many people have many names and addresses. So do I.

A search for me within Danish reference sources in the iDQ tool gives the following result:

A green T means a positive hit in the Danish Telephone Books. A red C means a negative result in the Danish Citizen Hub. A green C means a positive hit in the Danish Citizen Hub.

Even though I have left Denmark I’m still registered with some phone subscriptions there. And my phone company hasn’t fully achieved a single customer view yet, as I’m registered there with two slightly different middle (sur)names.

Following me to the United Kingdom, I’m registered here under yet more different names.

It’s not that I’m attempting some kind of fraud, but as my surname contains the letter Ø, and that letter isn’t part of the English alphabet, my National Insurance Number (somewhat similar to the Social Security Number in the US) is registered under the name “Henrik Liliendahl Sorensen”.

But as the United Kingdom doesn’t have a single citizen view, I am separately registered at the National Health Service under the name “Henrik Sorensen”. This is due to a sloppy realtor, who omitted my middle (sur)name on a flat rental contract. That name was carried forward by British Gas onto my electricity bill. That document is (surprisingly to me) my most important identity paper in the UK, and it was used as proof of address when registering for health service.

How about you, do you also have several identities?


The Big Search Opportunity

The other day Bloomberg Businessweek had an article reporting that Facebook Delves Deeper Into Search.

I have always been advocating for having better search functionality in order to get more business value from your data. That certainly also applies to big data.

In a recent post called Big Reference Data Musings here on the blog, the challenge of utilizing large external data sources to achieve better master data quality was discussed. In a comment, Greg Leman pointed out that there often isn’t a single source of truth, even in what you might expect to be one, such as the huge Dun & Bradstreet WorldBase reference data source holding information about business entities from all over the world.

Indeed, our search capabilities must optimally span several sources. In the business directory search realm you may include several sources at a time, for example supplementing the D&B WorldBase with EuroContactPool if you do business in Europe, or the source called Wiki-Data (being renamed to AvoxData) if you are in financial services and want to utilize the new Legal Entity Identifier (LEI) for counterparty uniqueness in conjunction with other more complete sources.

As examined in Search and if you are lucky you will find, combining search on external reference data sources and internal master data sources is a big opportunity too. In doing that you must, as described in the follow-up piece named Wildcard Search versus Fuzzy Search, get the search technology right.

I see in the Bloomberg article that Facebook doesn’t intend to completely reinvent the wheel for searching big data, as they have hired a Google veteran, the Danish computer scientist Lars Rasmussen, for the job.


Wildcard Search versus Fuzzy Search

My last post about search functionality in Master Data Management (MDM) solutions was called Search and if you are lucky you will find.

In the comments, the use of wildcards versus fuzzy search was touched upon.

The problem with wildcards

I have a company called “Liliendahl Limited”, as this is the spelling of the name as it is registered with Companies House for England and Wales.

But say someone is searching using one of the following strings:

  • “Liliendahl Ltd”,
  • “Liliendal Limited” or
  • “Liljendahl Limited”

In these situations the search functionality should return the hit “Liliendahl Limited”.

Using wildcard characters could, depending on the specific syntax, produce a hit for all these spelling variations with a string like this: “lil?enda*l l*”.

The problem, however, is that most users don’t have the time, patience or skills to construct such search strings with wildcard characters. And maybe the registered name was something slightly different that the wildcard characters don’t cover.
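The wildcard behaviour described above can be sketched with Python’s standard fnmatch module. The candidate name list below is invented for the example, including one name that should not match.

```python
from fnmatch import fnmatch

# Invented candidate names, including one that should not match.
candidates = [
    "liliendahl limited",
    "liliendal limited",
    "liljendahl limited",
    "liliendahl ltd",
    "lindahl limited",
]

pattern = "lil?enda*l l*"   # ? = exactly one character, * = any run
hits = [name for name in candidates if fnmatch(name, pattern)]
print(hits)   # the first four names match; "lindahl limited" does not
```

Note how carefully the pattern had to be crafted to catch all four variants, which is exactly the burden a typical user will not take on.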

Matching algorithms

Tools for batch matching of name strings have been around for many years. When doing a batch match you can’t practically use wildcard characters. Instead, matching algorithms typically rely on one, or in the best case a combination, of these techniques:

The same techniques can be used for interactive search, thus reaching a hit in one fast search.
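As a rough sketch of one such technique, here is a plain dynamic-programming edit distance (Levenshtein) with a small interactive-style lookup built on top of it. The distance threshold and the names are illustrative only; production matching engines combine several techniques and tune them heavily.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def best_matches(query, names, max_distance=2):
    """Return the candidate names within the distance threshold, best first."""
    scored = [(levenshtein(query.lower(), n.lower()), n) for n in names]
    return [n for d, n in sorted(scored) if d <= max_distance]

print(best_matches("Liliendal Limited",
                   ["Liliendahl Limited", "Acme Limited"]))
# -> ['Liliendahl Limited']
```

The same function serves both batch matching (pairwise over two files) and interactive search (one query against an index), which is the point made above.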

Fuzzy search

I have worked with the Omikron FACT algorithm for batch matching. This algorithm has since been implemented as a fuzzy search algorithm as well.

One area of use for this is when webshop users are searching for a product or service within your online shop. This feature is, along with other eCommerce capabilities, branded as FACT-Finder.

The fuzzy search capabilities are also used in a tool I’m involved with called iDQ. Here external reference data sources, in combination with internal master data sources, are searched in an error-tolerant way, thus making data available to the user despite the heaps of possible spellings.


Search and if you are lucky you will find

This morning I was following the tweet stream from the ongoing Gartner Master Data Management (MDM) conference here in London, when another tweet caught my eye:

This reminded me that (error-tolerant) search is The Overlooked MDM Feature.

Good search functionality is essential for making the most out of your well managed master data.

Search functionality may be implemented in these main scenarios:

Inside Search

You should be able to quickly find what is inside your master data hub.

The business benefits of having fast error-tolerant search as a capability inside your master data management solution are plenty, including:

  • Better data quality through upstream prevention of duplicate entries, as explained in this post.
  • More efficiency by bringing down the time users spend searching for information about entities in the master data hub.
  • Higher employee satisfaction by eliminating the frustration that otherwise comes from not finding what you know must already be inside the hub.

MDM inside search capabilities apply to multiple domains: party, product and location master data.

Search the outside

You should be able to quickly find what you need to bring inside your master data hub.

Data entry may improve a lot with fast error-tolerant search that explores the cloud for data relevant to the entry being done. Doing that has two main purposes:

  • Data entry becomes more effective with less cumbersome investigation and fewer keystrokes.
  • Data quality is safeguarded by better real world alignment.

Preferably, the inside and the outside search should be part of the same mash-up.

Searching the outside applies especially to location and party master data.

Search from the outside

Website search applies especially to product master data and in some cases also to related location master data as described in the post Product Placement.

Your website users should be able to quickly find what you publish from your master data hub, be that descriptions of physical products, services or research documents, as in the case of Gartner, which is an analyst firm.

As said in the tweet at the top of this post, (good) search makes the lives of your current and coming customers much easier. Do I need to emphasize the importance of a good customer experience?


Reference Data at Work in the Cloud

One of the product development programs I’m involved in is about exploiting rich external reference data in order to get data quality right the first time and to maintain optimal data quality over time.

The product is called instant Data Quality (abbreviated as iDQ ™). I have briefly described the concept in an earlier post called instant Data Quality.

iDQ ™ combines two concepts:

  • Software as a Service
  • Data as a Service

While most similar solutions are bundled with one specific data provider, the iDQ ™ concept embraces a range of data sources. The current scope is around customer master data, where iDQ ™ may include Business-to-Business (B2B) directories, Business-to-Consumer (B2C) directories, real estate directories, postal address files and even social media network data from external sources, as well as internal master data, all at the same time and presented in a compact mash-up.
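A mash-up of that kind can be sketched as fanning a single query out to several sources in parallel and merging the hits into one result list. The source adapters below are entirely hypothetical stand-ins for real directory services and internal systems, not the actual iDQ ™ implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Entirely hypothetical source adapters standing in for real directory
# services; each takes a query string and returns a list of hits.
def search_b2b_directory(query):
    return [{"source": "B2B directory", "name": query.title() + " Ltd"}]

def search_postal_file(query):
    return [{"source": "Postal Address File", "name": query.title()}]

def search_internal_mdm(query):
    return []   # no internal hit in this toy example

SOURCES = [search_b2b_directory, search_postal_file, search_internal_mdm]

def mashup_search(query):
    """Fan one query out to all sources in parallel and merge the hits."""
    with ThreadPoolExecutor() as pool:
        per_source = pool.map(lambda source: source(query), SOURCES)
    return [hit for hits in per_source for hit in hits]
```

Calling mashup_search("liliendahl") would merge the hits from the two external directories with the (empty) internal result into a single list for the mash-up screen.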

The product has already gained substantial success in my home country Denmark, leading to the formation of a company working solely with the development and sales of iDQ ™.

The results iDQ ™ customers gain may seem simple, but they are the core advantages of better data quality most enterprises are looking for, as said by one of Denmark’s largest companies:

“For DONG Energy iDQ ™ is a simple and easy solution when searching for master data on individual customers. We have 1,000,000 individual customers. They typically relocate a few times during the time they are customers of ours. We use iDQ ™ to find these customers so we can send the final accounts to the new address. iDQ ™ also provides better master data because we have an opportunity to get names and addresses correctly spelled.

iDQ ™ saves time because we can search many databases at a time. Earlier we had to search several different databases before we found the right master data on the customer.”

Please find more testimonials here.

I hope to be able to link to testimonials in more languages in the future.


Matching Light Bulbs

This morning I noticed this lightbulb joke in a tweet from @mortensax:

Besides finding it amusing I also related to it since I have used an example with light bulbs in a webinar about data matching as seen here:

The use of synonyms in Search Engine Optimization (SEO) is very similar to the techniques we use in data matching.

Here the problem is that, for example, these two product descriptions may have a fairly high edit distance (very different character by character) but are the same product:

  • Light bulb, A 19, 130 Volt long life, 60 W
  • Incandescent lamp, 60 Watt, A19, 130V

while these two product descriptions have an edit distance of only one character substitution, but are not the same product (though of the same category):

  • Light bulb, 60 Watt, A 19, 130 Volt long life
  • Light bulb, 40 Watt, A 19, 130 Volt long life
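The examples above can be bridged with a synonym list plus token normalization, so that wording differences collapse while the one meaningful difference (60 vs 40 Watt) is preserved. Here is a hedged sketch; the synonym table is a toy, while real matching engines ship far larger curated lists.

```python
import re

# Hypothetical synonym/abbreviation table; real matching engines ship
# far larger curated lists.
CANONICAL = {"w": "watt", "v": "volt", "bulb": "lamp", "light": "incandescent"}
NOISE = {"long", "life"}   # filler words that should not block a match

def normalize(description: str) -> frozenset:
    # Split letter and digit runs so "A19"/"A 19" and "130V"/"130 Volt"
    # tokenize identically, then map abbreviations to canonical terms.
    tokens = re.findall(r"[a-z]+|\d+", description.lower())
    return frozenset(CANONICAL.get(t, t) for t in tokens) - NOISE

a = normalize("Light bulb, A 19, 130 Volt long life, 60 W")
b = normalize("Incandescent lamp, 60 Watt, A19, 130V")
c = normalize("Light bulb, 40 Watt, A 19, 130 Volt long life")

print(a == b)  # True  - same product despite very different strings
print(a == c)  # False - one token (60 vs 40) separates the products
```

Comparing normalized token sets rather than raw strings is exactly what lets the high-edit-distance pair match while the low-edit-distance pair stays apart.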

Working with product data matching is indeed very enlightening.


The Overlooked MDM Feature

When engaging with the social media community dealing with master data management, an often-seen subject is a list of important capabilities for the technical side of master data management. I have on several occasions commented on such posts by adding a feature I often see omitted from these lists, namely: error-tolerant search functionality. Examples from the DataFlux CoE blog here and the LinkedIn Master Data Management Interest Group here.

Error-tolerant search (also called fuzzy search) technology is closely related to data matching technology. But where data matching is basically non-interactive, error-tolerant search is highly interactive.

Most people know error-tolerant search from googling. You enter something with a typo and Google prompts you back with: Did you mean…? When looking for entities in master data management hubs you certainly need something similar. Spelling names, addresses, product descriptions and so on is not easy – not least in a globalized world.

As in data matching, error-tolerant search may use lists of synonyms as the basic technology. The use of algorithms is also common, ranging from an oldie like the Soundex phonetic algorithm to more sophisticated algorithms.
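As an illustration, here is a compact sketch of the classic American Soundex algorithm mentioned above, which codes a name as one letter plus three digits so that similar-sounding spellings collapse to the same code:

```python
def soundex(name: str) -> str:
    """American Soundex: one letter followed by three digits."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    digits, prev = [], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":      # h and w do not separate equal codes
            prev = code
    return (name[0].upper() + "".join(digits) + "000")[:4]

# Different spellings of the same surname collapse to the same code:
print(soundex("Sorensen"), soundex("Sorenson"), soundex("Sørensen"))
# -> S652 S652 S652
```

Soundex is crude – it was designed for American English surnames a century ago – which is why modern error-tolerant search leans on the more sophisticated algorithms mentioned above, but it shows the phonetic principle in a few lines.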

The business benefits of having error-tolerant search as a capability in your master data management solution are plenty, including:

  • Better data quality through upstream prevention of duplicate entries, as explained in this post.
  • More efficiency by bringing down the time users spend searching for information about entities in the master data hub.
  • Higher employee satisfaction by eliminating the frustration that otherwise comes from not finding what you know must already be inside the hub.

Error-tolerant search has been one of the core features in the master data management implementations I have been involved in. What about you?
