A quiz has been running on this blog with the question: What is the name of the current Pope of the Catholic Church? The current standing of the answers is shown in the figure to the right.
It’s good to see a lot of different answers. Indeed, a problem with the quiz is that all of the answers may be correct. While Francis is the English form of the papal name chosen by Jorge Mario Bergoglio, the pope goes by other forms in other languages, such as Frans in Danish and Norwegian, François in French, Franziskus in German and Francesco in Italian.
The quiz is actually flawed, as it leaves out other good answers such as Franciscus, the Latin name, Francisco, the Spanish name, and Franciszek, the Polish name. The question in the quiz is also too simple. What is meant by “the name” should be clarified: is it the birth name, the chosen name as Pope in a given language, or something else?
Such problems are in fact very common in what we often see as bad data quality. The quiz reflects two frequent issues which aren’t about the raw data:
- Data models are too simple. In this case the model should be able to reflect different types of names: the birth name and what (sorry, believers) resembles a screen name. It should also handle names in various languages.
- Metadata is too weak. In this case it could be stated more precisely which name we are collecting, for example the chosen name in English, if only one of the name types is needed. More about metadata on Wikipedia.
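
To make the two points above a bit more concrete, here is a minimal sketch in Python of a data model that keeps the name type and the language alongside each name. The class names (`NameType`, `Name`, `Person`), the `name` lookup method and the use of two-letter language codes are assumptions made purely for illustration; only the names themselves come from the quiz discussion above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class NameType(Enum):
    """Hypothetical name types: the kind of name is stored, not guessed."""
    BIRTH = "birth"
    CHOSEN = "chosen"


@dataclass(frozen=True)
class Name:
    value: str
    name_type: NameType
    # Assumed convention: two-letter language code, None if no language is recorded.
    language: Optional[str] = None


@dataclass
class Person:
    names: List[Name] = field(default_factory=list)

    def name(self, name_type: NameType, language: Optional[str] = None) -> Optional[str]:
        """Return the first name matching both type and language, or None."""
        for n in self.names:
            if n.name_type is name_type and n.language == language:
                return n.value
        return None


pope = Person(names=[
    Name("Jorge Mario Bergoglio", NameType.BIRTH),
    Name("Francis", NameType.CHOSEN, "en"),
    Name("Frans", NameType.CHOSEN, "da"),
    Name("Frans", NameType.CHOSEN, "no"),
    Name("François", NameType.CHOSEN, "fr"),
    Name("Franziskus", NameType.CHOSEN, "de"),
    Name("Francesco", NameType.CHOSEN, "it"),
    Name("Franciscus", NameType.CHOSEN, "la"),
    Name("Francisco", NameType.CHOSEN, "es"),
    Name("Franciszek", NameType.CHOSEN, "pl"),
])

# The question now has to say which name it wants.
print(pope.name(NameType.CHOSEN, "en"))
print(pope.name(NameType.BIRTH))
```

With a model like this, "the name" is no longer a single field, so a question (or a form, or a report) has to state the name type and language it is asking for. That is the metadata part of the problem.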
What other issues have you encountered that are seen as bad data quality, but which aren’t about bad raw data?