When travelling on the London Underground I have several times noticed that the onboard passenger information system is set wrong, typically as if we were going in the opposite direction to what was announced at the station and where the train is actually heading.
The reaction among the passengers to this data quality flaw varies. Most people who seem to be frequent commuters don’t seem to bother; they keep calm and carry on. Tourists, on the other hand, get confused and immediately try to identify the culprit among them who apparently got them on the wrong train.
As the information system keeps announcing the next station as the one we just left, everyone who is not a new passenger keeps calm and carries on in the opposite direction of the data presented.
Big data quality issues
The problem of wrong journey settings in data collection within public transportation is actually a challenge I have worked with a lot.
Besides confusing the passengers when shown on the onboard passenger information display and read out in the voice announcements, wrong settings may also corrupt the data collection. That leads to data quality issues when the data is stored in a data warehouse, or by other techniques, in order to facilitate analysis of passenger travel patterns, how well the services adhere to schedules and other reporting based on the large volumes of transaction data collected every day.
Aligning with master data
The challenge is to correctly join the transaction data with the right master data entities. A vehicle stop, and in some cases the passenger boarding and alighting, must be associated with the right product, that is, a given journey on a given service according to a given time schedule.
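To make the join a bit more concrete, here is a minimal sketch in Python of one way to check which journey a run of recorded vehicle stops belongs to. It is not taken from any real system: the `Journey` record, the `sequence_score` and `match_journey` functions and the stop identifiers are all hypothetical names chosen for illustration. The assumption is that the master data holds the scheduled stop order per journey, and that the recorded stops are matched to the journey whose stop order they follow, instead of trusting the journey setting reported by the vehicle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journey:
    """Hypothetical master data record: one journey on a service."""
    journey_id: str
    service: str
    direction: str
    stops: tuple  # scheduled stop ids, in travel order


def sequence_score(observed, scheduled):
    """Count consecutive observed stop pairs that follow the scheduled order."""
    position = {stop: i for i, stop in enumerate(scheduled)}
    return sum(
        1
        for a, b in zip(observed, observed[1:])
        if a in position and b in position and position[a] < position[b]
    )


def match_journey(observed, candidates):
    """Return the single best-matching journey, or None if there is no clear match."""
    scores = [(sequence_score(observed, j.stops), j) for j in candidates]
    best_score = max((s for s, _ in scores), default=0)
    winners = [j for s, j in scores if s == best_score]
    if best_score == 0 or len(winners) != 1:
        return None  # too little evidence, or a tie: leave for manual review
    return winners[0]


# Illustrative master data: the same service in both directions.
eastbound = Journey("J1", "S1", "eastbound", ("A", "B", "C", "D"))
westbound = Journey("J2", "S1", "westbound", ("D", "C", "B", "A"))

# Stops as recorded by a vehicle whose onboard system was set to "eastbound".
observed = ["C", "B", "A"]
matched = match_journey(observed, [eastbound, westbound])
print(matched.direction)  # westbound
```

Returning `None` when the evidence is thin or tied is a deliberate choice in this sketch: a transaction joined to the wrong journey is worse for later analysis than one flagged as unmatched.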
Many other exploitations of big data share the same basic data quality challenge. If we don’t get the transaction data joined correctly with the master data entities involved, any analysis and reporting may be going in the wrong direction.