I can’t help making analogies between data quality and food and drink, even though I am not actually on any kind of diet these days.
Today’s subject is the similarities between metadata and meatballs.
Metadata is loosely defined as data about data: data describing what is meant to be in a dataset or a data element, what its purpose is and what standards are used.
The problem with metadata is whether everybody understands the same thing when you use a certain term in creating it. Despite the best intentions, there will probably always be someone, somewhere, getting something different from your wording.
That’s where meatballs come in.
If you read the article about meatballs on Wikipedia you’ll get the picture. Yes, meatballs have some common characteristics around the world: some minced meat or fish (unless made vegetarian style), mixed with some additional ingredients, exposed to heat in some way, and served with something different depending on where on earth you are.
Having a metadata repository is good for data and information quality.
The challenge in filling out a metadata repository is striking a balance between describing how meatballs should be (your mom’s recipe) and how meatballs could be.
I always enjoy your food analogies.
Regarding data, too many projects are “Metadata driven” – i.e. they proceed on the basis of what the metadata says “should be in the data”. This of course is a recipe for disaster.
As you know, Henrik, one must always profile the data first to find out what is “actually in the data”.
Thanks Ken. Good point. Data profiling is the way of discovering what could be in your data. Boiled ground meat served with spaghetti and so on…
Building on Ken’s point, if metadata efforts focus too much on what “should” be, versus what “is” true about the data, then there can be a big gap in knowledge about the current state of the data assets, which in turn makes progress harder.
Data profiling creates a different kind of “data about your data”: the current-state reality, which is so crucial to getting closer to the “should be” state…
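To make the “should be” versus “is” gap concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `declared` metadata entry, the `country_code` column, the sample values and the helper names `profile` and `gaps` are all assumptions, not taken from any particular tool or repository.

```python
from collections import Counter

# Hypothetical metadata entry: what "should be" in the column.
declared = {"name": "country_code", "allowed": {"DK", "SE", "NO"}, "nullable": False}

# Hypothetical sample of what is actually in the data.
values = ["DK", "SE", "dk", None, "NO", "Denmark", "SE"]


def profile(values):
    """Summarize what is actually in a column: row count, nulls, value frequencies."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "frequencies": Counter(non_null),
    }


def gaps(declared, prof):
    """List where the profiled data departs from the declared metadata."""
    found = []
    if not declared["nullable"] and prof["nulls"] > 0:
        found.append(f"{prof['nulls']} null value(s) in a non-nullable column")
    unexpected = sorted(set(prof["frequencies"]) - declared["allowed"])
    if unexpected:
        found.append(f"unexpected values: {unexpected}")
    return found


prof = profile(values)
report = gaps(declared, prof)
for line in report:
    print(line)
```

The profile is the “data about your data” that describes the current state; the list of gaps is the distance left to travel to the “should be” state. A real profiling tool would look at far more (patterns, lengths, uniqueness and so on), so treat this only as the smallest possible version of the idea.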
Jaime, thanks for commenting. I like the “data about your data” variant.