Product Information Management (PIM) have over the recent years emerged as an important technology enabled discipline for every company taking part in a supply chain. These companies are manufacturers, distributor, retailers and large end users of tangible products requiring a drastic increased variety of product data to be used in ecommerce and other self-service based ways of doing business.
At the same time we have seen the raise of big data. Now, if you look at every single company, product data handled by PIM platforms perhaps does not count as big data. Sure, the variety is a huge challenge and the reason of being for PIM solutions as they handle this variety better than traditional Master Data Management (MDM) solutions and ERP solutions.
The variety is about very different requirements in data quality dimensions based on where a given product sits in the product hierarchy. Measuring completeness has to be done for the concrete levels in the hierarchy, as a given attribute may be mandatory for one product but absolutely ridiculous for another product. An example is voltage for a power tool versus for a hammer. With consistency, there may be attributes with common standards (for example product name) but many attributes will have specific standards for a given branch in the hierarchy.
Product information also encompasses digital assets, being PDF files with product sheets, line drawings and lots of other stuff, product images and an increasing amount of videos with installation instructions and other content. The volume is then already quite big.
Volume and velocity really comes into the game when we look at eco-systems of manufacturers, distributors and retailers. The total flow of product data can then be described with the common characteristics of big data: Volume, velocity and variety. Even if you look at it for a given company and their first degree of separation with trading partners, we are talking about big data where there is an overwhelming throughput of new product links between trading partners and updates to product information that are – or not least should have been – exchanged.
Within big data we have the concept of a data lake. A key success factor of a data lake solution is minimizing the use of spreadsheets. In the same way, we can use a data lake, sitting in the exchange zone between trading partners, for product information as elaborated further in the post Gravitational Collapse in the PIM Space.