Valuable Inaccuracy

These days I’m involved in an activity where you might say that we are making better information quality by creating data of questionable quality.

The business case is within public transit. In this particular solution passengers use chip cards when boarding buses, but do not use the cards when alighting. This is a cheaper and smoother solution than the alternative in electronic ticketing, where you have both check-in and check-out. But a major drawback is the missing information about where passengers alighted, which is very useful in business intelligence.

So what we do is, where possible, assume where the passenger alighted. If the passenger (seen as a chip card) within a given timeframe boarded another bus at a stop point on or near a succeeding stop point on the previous route, then we assume alighting was at that stop point even though it was not recorded.

Two real-life examples are when the passenger makes an interchange, or when the passenger later in the day travels back from work, school or another regular activity.
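A minimal sketch of this inference, assuming hypothetical record types and a simplified rule set (the stop lists, field names and timeframe below are illustrative, not from any real ticketing system; a real implementation would also match nearby stops by distance):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Boarding:
    card_id: str
    route: str   # route of the boarded bus
    stop: str    # stop point where the card was checked in
    time: float  # minutes since midnight, for simplicity

# Succeeding stops per route, taken from master data (illustrative values).
ROUTE_STOPS = {
    "R1": ["A", "B", "C", "D"],
    "R2": ["C", "E", "F"],
}

MAX_GAP_MINUTES = 90  # assumed timeframe for linking two boardings

def infer_alighting(prev: Boarding, nxt: Boarding) -> Optional[str]:
    """Return the assumed alighting stop on the previous ride, or None."""
    if nxt.time - prev.time > MAX_GAP_MINUTES:
        return None
    # Stop points after the boarding stop on the previous route
    stops = ROUTE_STOPS[prev.route]
    succeeding = stops[stops.index(prev.stop) + 1:]
    # If the next boarding is at one of them, assume alighting happened there.
    if nxt.stop in succeeding:
        return nxt.stop
    return None

first = Boarding("card42", "R1", "A", time=480)   # boards route R1 at stop A
second = Boarding("card42", "R2", "C", time=500)  # later boards route R2 at stop C
print(infer_alighting(first, second))  # C - assumed alighting stop on R1
```

The interchange example above resolves because stop C appears both as a succeeding stop on route R1 and as a boarding stop on route R2.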

An important prerequisite, however, is that we have good data quality regarding stop point locations, route assignments and other master data and their relations.


Real World Alignment

I am currently involved in a data management program dealing with multi-entity (multi-domain) master data management described here.

Besides covering several different data domains such as business partners, products, locations and timetables, the data also serves multiple purposes of use. The client is within public transit, so the subject areas go by terms such as production planning (scheduling), operation monitoring, fare collection and use of service.

A key principle is that the same data should only be stored once, but in a way that makes it serve as high quality information in the different contexts. Doing that often means balancing between the two ways data may be of high quality:

  • Either they are fit for their intended uses
  • Or they correctly represent the real-world construct to which they refer

Some of the balancing has been:

Customer Identification

For some intended uses you don’t have to know the precise identity of a passenger. For some other intended uses you must know the identity. The latter cases at my client include giving discounts based on age and transport need, such as when attending educational activities. Knowing the identity also helps when fighting fraud. So the data governance policy (and a business rule) is that customers must, for most products, provide a national identification number.

Like it or not: having the ID makes a lot of things easier. Uniqueness isn’t a big challenge like it is in many other master data programs. It is also a straightforward process when you want to enrich your data. An example here is accurately geocoding where your customers live, which is rather essential when you provide transportation services.

What geocode?

You may use a range of different coordinate systems to express a position, as explained here on Wikipedia. Some systems refer to a round globe (and yes, the real world, the earth, is round), but it is a lot easier to use a system like the one called UTM, where you can easily calculate the distance between two points directly in meters, assuming the real world is as flat as your computer screen.
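The flat-screen convenience is just Pythagoras: with both points expressed as UTM eastings and northings in meters, the distance is a plain Euclidean calculation. A small sketch with made-up coordinates:

```python
import math

# Two positions as UTM (easting, northing) in meters; values are illustrative.
stop_a = (500000.0, 6100000.0)
stop_b = (500300.0, 6100400.0)

def flat_distance(p, q):
    """Euclidean distance in meters, treating the UTM plane as flat."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

print(round(flat_distance(stop_a, stop_b)))  # 500
```

Doing the same with latitude and longitude would require great-circle formulas, which is exactly the complexity UTM lets you skip for local distances.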


Multi-Entity Master Data Quality

Master data are the core entities that describe the ongoing activities in an organization:

  • Business partners (who)
  • Products (what)
  • Locations (where)
  • Timetables (when)

Many Master Data Management and Data Quality initiatives are at first focused only on a single entity type, but sooner or later you are faced with dealing with all the entity types and the data quality issues that arise from combining data from each of them.

In my experience business partner data quality issues are in many ways similar across all industry verticals, while product master data challenges may differ in many ways when comparing companies in various verticals. The importance of location data quality varies greatly, as do the questions about timetable data quality.

A journey in a multi-entity master data world

My latest experience in multi-entity master data quality comes from public transportation.

The most frequent business partner role here is of course the passenger. By the way (so to speak): a passenger may be a direct customer, but the payer may also be someone else. That doesn’t really change anything about the need for data quality, though; whether the passenger is defined as a customer or not, you will have to solve problems with uniqueness and real-world alignment.

The product sold to a passenger is in the first place a travel document, like a single ticket or an electronic card holding a season pass. But the service worth something to the passenger is a ride from point A to point B, which in many cases is delivered as a trip consisting of a series of rides from point A via point C (and D…) to point B. Having consistent hierarchies in reference data is a must when making data fit for multiple purposes of use in disciplines such as fare collection, scheduling and so on.
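One way to picture the ride/trip hierarchy, and what "consistent" means for it, is a small sketch with hypothetical types (the names `Ride`, `Trip` and the consistency rule are illustrative, not from any real fare collection model):

```python
from dataclasses import dataclass

@dataclass
class Ride:
    origin: str
    destination: str

@dataclass
class Trip:
    rides: list  # ordered series of rides making up the trip

    @property
    def origin(self):
        return self.rides[0].origin

    @property
    def destination(self):
        return self.rides[-1].destination

def is_consistent(trip):
    """Each ride must start where the previous ride ended."""
    return all(a.destination == b.origin
               for a, b in zip(trip.rides, trip.rides[1:]))

# A trip from A to B delivered as rides A->C, C->D, D->B
trip = Trip([Ride("A", "C"), Ride("C", "D"), Ride("D", "B")])
print(trip.origin, trip.destination, is_consistent(trip))  # A B True
```

A fare collection view may only care about the trip from A to B, while scheduling cares about every ride; both read the same hierarchy, which is the point of storing the data once.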

Locations are mainly stop points including those at the start and end of the rides. These are identified both by a name and by geocoding – either as latitude and longitude on a round globe or by coordinates in a flat representation suitable for a map (on a screen). The distance between stops is important for grouping stops in areas suitable for interchange, e.g. bus stops on each side of a road or bus stops and platforms at a rail station. Working with the precision dimension of data quality is a key to accuracy here.
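Grouping stops into interchange areas can be sketched as a simple distance-threshold clustering over the flat coordinates. This is a naive illustration with made-up stop names, coordinates and radius, not a production algorithm:

```python
import math

# Hypothetical stop points with flat (UTM-like) coordinates in meters.
stops = {
    "Main St (north side)": (1000.0, 2000.0),
    "Main St (south side)": (1025.0, 2010.0),
    "Rail station platform 1": (5000.0, 5000.0),
}

INTERCHANGE_RADIUS = 50.0  # assumed threshold in meters

def group_stops(stops, radius):
    """Naively group stops lying within the radius of a group's first stop."""
    groups = []
    for name, (x, y) in stops.items():
        for group in groups:
            gx, gy = stops[group[0]]
            if math.hypot(x - gx, y - gy) <= radius:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

for area in group_stops(stops, INTERCHANGE_RADIUS):
    print(area)  # the two Main St stops group together; the platform stands alone
```

Notice how the precision of the coordinates directly drives the result: a stop geocoded 50 meters off may fall in or out of an interchange area.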

Timetables change over time. It is essential to keep track of timetable validity in offline flyers, on websites with passenger information, in back office systems and on on-board bus computers. Timeliness is as ever vital here.
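Keeping track of validity typically boils down to attaching a validity interval to each timetable version and selecting the one covering a given day. A minimal sketch, with hypothetical version names and dates:

```python
from datetime import date

# Hypothetical timetable versions with validity intervals.
timetables = [
    {"version": "summer", "valid_from": date(2011, 6, 1), "valid_to": date(2011, 8, 31)},
    {"version": "autumn", "valid_from": date(2011, 9, 1), "valid_to": date(2011, 12, 31)},
]

def timetable_for(day):
    """Return the timetable version valid on the given day, or None."""
    for t in timetables:
        if t["valid_from"] <= day <= t["valid_to"]:
            return t["version"]
    return None

print(timetable_for(date(2011, 9, 15)))  # autumn
```

Every consumer, from the on-board computer to the website, should resolve the version the same way, otherwise a passenger may be shown a departure that no longer runs.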

Matching the transactions made by drivers and passengers on numerous on-board computers, by employees in back office systems, and those coming from external sources with the various master data entities that correctly describe each transaction is paramount to an effective daily operation, and it is the foundation for exploiting the data in order to make the right decisions about future services.
