Some days ago Copenhagen was hit by the most powerful cloudburst ever measured here.
More powerful cloudbursts may be common in warmer regions of the earth, but this one was very unusual at 55 degrees north.
Fortunately there was only material damage, but that damage was very extensive. On closer inspection, the underground constructions fall into two categories.
The first category is facilities constructed with only the immediate purpose of use in mind. Many of these facilities are still out of operation.
The second category is facilities constructed with the immediate purpose of use in mind but also designed to withstand heavy rain. These facilities kept working during the cloudburst. One example is the metro. If the metro had been constructed only for its immediate purpose of use, which is keeping trains circulating below ground, it would have flooded within minutes, with the risk of lost lives and a standstill for months.
We have the same situation in data management. Things may seem just fine if data are fit for the immediate purpose of use. But when a sudden change in conditions hits, you find out what your data quality really is.
It appears that technology is now able to measure these localized weather events at a finer level of granularity than in the past. The analogy with data is that technology also lets us measure the quality of data, and as a result we expose many more anomalies than in the past. With weather, one of the primary goals is to provide warnings to avoid loss of life. Loss of property can only be addressed with better building codes, techniques and materials. The architecture and design must be developed to withstand the forces. If not, putting plywood over the windows is of little value. With data, the architecture and systems must be designed to ensure data quality, or else we too will be “putting up plywood”. Data quality must be addressed from an architectural and design perspective to be effective.
Another aspect of weather is long-term patterns, global warming being one of them. What events can we anticipate as a result of global warming? Understanding the patterns and anticipating the potential consequences is critical. Although weather-related catastrophic events can occur without warning, anticipating and planning can provide warning and reduce the impact.
The most significant data quality problems tend to be preceded by patterns. But our current data quality practices are oriented towards “find and fix”. We must begin to look at data quality as a systemic problem and develop techniques to provide warning and reduce impact. If the data architecture is robust and data quality is addressed systemically, we can reduce the impact and in some cases prevent data anomalies from catastrophically affecting the organization.
Thanks a lot, Richard, for all the good analogies here. I really like the saying: “Data quality must be addressed from an architectural and design perspective to be effective.”