Man versus Computer

Jim Harris 15th October 2009 / 15:58

Excellent post Henrik,

In data quality, I definitely vote for “Man” over “Computer.”

Risking (pardon the pun) the mixture of metaphors, I have blogged about how “There are no Magic Beans for Data Quality”:

http://www.ocdqblog.com/home/there-are-no-magic-beans-for-data-quality.html

And in Phil Simon’s recent post “Kranzberg’s Six Laws of Technology”:

http://philsimonblog.com/2009/09/30/kranzberg_six/

On Law # 6: “Technology is a very human activity,” I commented about how many talk about how “people, process, technology” are all important for successful initiatives, but without people, process and technology are useless.

Although incredible advancements continue, technology alone cannot provide the solution.

Best Regards…

Jim

Reply
Vish Agashe 15th October 2009 / 23:30

Henrik,

Excellent post as always. I would say that it is the human who makes the ultimate decision but as you say in the article …. we have to make wise decisions about using computers/technology where ever appropriate to reduce the cost and to increase scalability of humans. Use humans to make the final/critical decision but let the machine do most of the grunt work based on patterns and algorithms.

Regards

Vish

Reply
Francisco Correia 18th October 2009 / 23:08

In Portuguese, dices and data share the same word: dados. But dices are more trustful …

Reply
Henrik Liliendahl Sørensen 19th October 2009 / 06:25

Jim, thanks. Your post about Magic Beans – and its commendable comments – is really worth reading. Perhaps we should have a law that every data quality vendor offering must have this blog post text included. A bit – but not completely – like warnings on cigarettes.

Thanks Vish, your perception is so close to mine. My mission is trying to make and configure technology that accurately helps people with the hard and repeatable work in data quality and makes room for people advancing on more challenges.

Francisco – what a quote!! So lucky I now have that one on my blog. Made my day. Thanks.

Reply
kenoconnordataconsultant 19th October 2009 / 13:18

Henrik,

This is an important debate, thank you for starting it. Jim’s Magic Beans post, and the comments – are very informative.

Like you, I agree with Vish Agashe “Use humans to make the final/critical decision but let the machine do most of the grunt work based on patterns and algorithms.”

I have worked on the development of Anti Money Laundering (AML) systems. AML systems perform Financial Transaction Monitoring. They could not function without analytics. They monitor Transaction Activity on millions of accounts. The purpose of the analytics is to identify “Transaction Activity that is unusual when compared to an account holder’s peers”. The AML system alerts a human to study the unusual transaction activity. The human then seeks to “explain away” the unusual activity as ‘normal’, e.g. Once off sale of an asset. If the human cannot find a good reason for the unusual transaction activity, they report it to the authorities as “Suspicious”.

In my opinion, AML systems provide a good example of the pragmatic combining of analytics and humans – for the good of society.

Having said the above, the best AML system in the world cannot provide meaningful AML alerts if the quality of the underlying data does not permit the identification of peer groups (for example).

This brings us right back to your question “So, what about data quality? Is it man or computer who is best at solving the matter.”

I believe we face two distinct data quality challenges:
1. How to ‘stop the rot” to prevent more Garbage (poor quality data) entering systems.
2. How to clean up the existing Garbage.

Both solutions need to apply the same ‘business rules’. I believe Man is best at devising, and proving the solutions. Once tried and proven, Computer is best at allowing the solution to be applied at scale.

Rgds Ken

Reply
Henrik Liliendahl Sørensen 19th October 2009 / 14:30

Thanks Ken. When you talk about large volumes of transaction data that has to analyzed, identified with and measured based on master data I am working with exactly the same kind of challenges in a project within public transportation. Solving the data quality issues is needed before any meaningful decisions can be made upon these data. This couldn’t be done without computers doing the hard work.

My approach in such a project is not for man to settle the rules and controls and then apply the technology. It’s an iterative process where we start with the known main pain, put the computer at work, we evaluate the results, consider the options and probably apply some more knowledge into the computer and then goes around the circle once again.

Reply
kenoconnordataconsultant 20th October 2009 / 10:27

Henrik, if I understand you correctly, your iterative process is something like:

1. Use what we know about the business rules – e.g. we expect that datafield1 should contain values X,Y,Z

2. Perform Data Profiling to find out what datafield1 actually contains.

3. Update the business rules to incorporate the new knowledge about ‘valid exceptions’ etc.

Repeat

Rgds Ken

Reply
Henrik Liliendahl Sørensen 20th October 2009 / 15:34

Ken, yes – and

• The tasks (in further iterations) you will have the computer doing also includes standardization, correction, matching, linking, enriching
• You may use the computer for settling the obvious situations (based on business rules) and flagging the dubious for human intervention

The iteration goes for:

• Batch processing of initial data inventory where you discover the bulk of challenges
• Ongoing prevention where you may act on new challenges
• Often new rule inventions may require batch reprocessing on current data inventory
• Even new technology (or reference data) may be applied on issues not solvable before

Reply

	Henrik Gabs Lilienda… on Balancing the Business Partner…
	Jeppe Thing Sørensen on Balancing the Business Partner…
	peolsolutions on MDM, Cloud, SaaS, PaaS, IaaS a…
	Henrik Gabs Lilienda… on Is the Holiday Season called C…
	Michael D. on Is the Holiday Season called C…
	Jay Ram on The Disruptive MDM List is…
	Henrik Gabs Lilienda… on The Intersection of Data Obser…
	Shanker on The Intersection of Data Obser…
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on Data Matching Efficiency
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on From Platforms to Ecosyst…
	Michael Fieg on From Platforms to Ecosyst…
	From Platforms to Ec… on What is Collaborative Product…
	From Platforms to Ec… on MDM and Knowledge Graph

Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more

8 thoughts on “Man versus Computer”

Leave a comment Cancel reply

Related

8 thoughts on “Man versus Computer”

Leave a comment Cancel reply