A Place in Time

I remember when I had the first chemistry lesson in high school our teacher told us that we should forget all about the chemistry we had learned in primary school, because this was a too simply model not reflecting how the real world of chemistry actually work.

Since I have started working with data quality and master data management I have a pet peeve in data modeling, namely being the probably most common example of doing data modeling: The classic customer table. Example from a SQL tutorial here:

Compared to how the real world works this example has some diversity flaws, like:

  • state code as a key to a state table will only work with one country (the United States)
  • zipcode is a United States description only opposite to the more generic “Postal Code”
  • fname (First name) and lname (Last name) don’t work in cultures where given name and surname have the opposite sequence
  • The length of the state, zipcode and most other fields are obviously too small almost anywhere

More seriously we have:

  • fname and lname (First name and Last name) and probably also phone should belong to an own party entity acting as a contact related to the company
  • company name should belong to an own party entity acting in the role as customer
  • address1, address2, city, state, zipcode should belong to an own place entity probably as the current visiting place related to the company

Now I know this is just a simple example from a tutorial where you should not confuse by adding too much complexity. Agreed.

However many home grown solutions in business life and even many commercial ready-made applications use that kind of a data model to describe one of the most important business entities being our customers.

It may be that such a model does fit the purpose of use in some operations. Sometimes yes, sometimes no. But when reusing data from such a model on enterprise level and when adding business intelligence you are in big trouble. That is why we need master data hubs and why we need to transform data coming into the master data hub.

From such a customer record we don’t create just one golden record. We make or link several different related multi-domain entities as:

  • The contact as a person in our party domain – maybe we knew her before
  • The company in our party domain – maybe we knew the sister as a supplier before
  • The address in our place (location) domain – maybe we knew that address as a place in time before

Bookmark and Share

6 thoughts on “A Place in Time

  1. Karen Lopez 22nd January 2011 / 15:42

    Ah, the curse of bad examples. Your post fits well with my usual caveat that ALL examples are wrong. They are usually wrong for the reason you state: that the real world is too complex. I agree, but there is no valid reason for authors to make most of the mistakes you point out here.

    When I do manuscript reviews, this is a common criticism of mine.

    We need to stop teaching bad habits.

    • Henrik Liliendahl Sørensen 22nd January 2011 / 17:41

      Thanks Karen for dropping by this place. Not completely sure if I want the honor having you reviewing some of my tutorials 🙂

  2. vishagashe 23rd January 2011 / 02:25

    Henrik,

    You are spot on with your observation….even though example is simplistic it makes a real and practical point.
    Another dimension of what you are saying is one of my pet peeves (since we are talking about pet peeves, thought I would add it here :))….while defining data models for operational systems, operational system developers/designers should consider impact on Analytical usage of the data. Very little or no attention is paid to the analytical usage of operational data….and this is where many of the legacy (and some newly developed) data models come short in satisfying holistic data/analytical requirements of the business.
    I had made similar argument around data quality (in the context of building operational applications) in one of my posts Inject Data Quality before It Becomes Data Quality Issue

    Vish Agashe

    • Henrik Liliendahl Sørensen 23rd January 2011 / 09:19

      Thanks Vish. Indeed, better data models upstream is far better than introducing the right data model downstream.

  3. Jackie Roberts 24th January 2011 / 14:11

    I just worked on a data migration where in the zip code field if patient was from other than the US, it was noted as “foreign”.

    Any data migration projects, one of the first questions I ask is “What are your data work arounds?”

    • Henrik Liliendahl Sørensen 24th January 2011 / 16:57

      Good question Jackie: “What are your data work arounds?”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s