Going Upstream in the Circle

One of the big trends in data quality improvement is the move from downstream cleansing to upstream prevention. So let’s talk about the Amazon. No, not the online (book)store, but the river. Besides, I am a bit tired of almost every mention of innovative IT being about that eShop.

A map of the Amazon River drainage basin reveals what can be a huge challenge in going upstream and solving data quality issues at the source: there may be a lot of sources. Okay, the Amazon is the world’s largest river (because it carries more water to the sea than any other river), so this may be a picture of the data streams in a very large organization. But even more modest organizations have many sources of data, just as more modest rivers also have several sources.

By the way: The Amazon River also shares a source with the Orinoco River through the natural Casiquiare Canal, just as many organizations also share sources of data.

Some sources are not easy to reach, such as the most distant source of the Amazon: a glacial stream on a snowcapped 5,597 m (18,363 ft) peak called Nevado Mismi in the Peruvian Andes.

Now, as I promised that the trend on this blog should be about positivity and success in data quality improvement, I will not dwell on the amount of work involved in going upstream and preventing dirty data at every source.

I say: Go to the clouds. The clouds are the sources of the water in the river. I also think that cloud services will make improving data quality a lot easier, as explained in a recent post called Data Quality from the Cloud.

Finally, the clouds over the Amazon River sources are made from water evaporated from the Amazon and a lot of other waters as part of the water cycle. In the same way, data has a cycle: it is derived as information and then created in a new form as a result of the actions taken based on that information.

I think data quality work in the future will embrace the full data cycle: Downstream cleansing, upstream prevention and linking in the cloud.


6 thoughts on “Going Upstream in the Circle”

  1. John Owens 21st July 2010 / 08:58

    An excellent analogy, Henrik.

    In far too many enterprises the ways in which information can trickle, or indeed flood, into the mainstream of data are innumerable and mostly uncontrolled – like the innumerable streams and rivers flowing into the Amazon.

    It is tempting to think that the Cloud (again, another great analogy you used) could solve all such problems. Alas, it would actually make things worse as rain can fall uncontrolled anywhere in the catchment area.

    What is needed in order to control the flow of data is the equivalent of canals and waterways, where the points of entry and the rate of flow can be controlled and the water (data) can be fed in a controlled manner to where it is needed when it is needed.

    The canals and waterways of data are provided by properly implemented business functions, business rules and data structures.


    • Henrik Liliendahl Sørensen 21st July 2010 / 09:17

      Thanks John. No doubt: Properly implemented business functions, business rules and data structures are, and have always been, essential.

      And of course the analogy stops at how far we should go in straightening out nature as opposed to business processes.

  2. Dylan Jones 21st July 2010 / 09:16

    The biggest problem in most large businesses is information and application bloat.

    Too many overlapping systems, creating too much redundant information, creating too many information chains, creating too many opportunities for defective data.

    I think the cloud does have a role here. By eliminating systems and keeping integration and proliferation to a minimum, I agree with Henrik that data quality can improve.

  3. Steve Sarsfield 21st July 2010 / 19:15

    So true. We’ve been all about cleaning the lake at the end of the river, which has been accumulating pollution for years. We’ve got to move upstream.

  4. Nigel Thomas 22nd July 2010 / 11:04

    Hi Henrik,
    For sure ‘clouds’ and reference data play a role, however not all data comes from the clouds (there is plenty of unstructured data that is created). I think it is misleading to talk of clouds as a panacea for data quality. Clouds are a solution with certain benefits, however the more fundamental strategy is simply the use of reference data and trusted source data, irrespective of where it is implemented.

    Anyway, I like the analogy of the river and polluted lake, and canals as creating structure. It helps the layman / our business sponsors relate to our landscape.
    Best regards, Nigel

  5. Henrik Liliendahl Sørensen 22nd July 2010 / 11:49

    Thanks Dylan, Steve and Nigel.

    Certainly there is no panacea for solving the data quality conundrum. Streamlining business processes is a preferable way, but not always as easily done as said; downstream cleansing takes away the worst pain; upstream prevention helps a lot; and the cloud will provide easier and more affordable ways of getting external reference data into (better) business processes.
