Also for this year I have made this New Year resolution: I will try to avoid stupid mistakes that actually are easily avoidable.
Just before Christmas 2009 I made such a mistake in my professional work.
It’s not that I don’t have a lot of excuses. Sure I have.
The job was a very small assignment doing what my colleagues and I have done a lot of times before: An excel sheet with names, addresses, phone numbers and e-mails was to be cleansed for duplicates. The client had got a discount price. As usual it had to be finished very quickly.
I was very busy before Christmas – but accepted this minor trivial assignment.
When the excel sheet arrived it looked pretty straight forward. Some names of healthcare organizations and healthcare professionals working there. I processed the sheet in the Omikron Data Quality Center, scanned the result and found no false positives, made the export with suppressing merge/purge candidates and delivered back (what I thought was) a clean sheet.
But the client got back. She had found at least 3 duplicates in the not so clean sheet. Embarrassing. Because I didn’t ask her (as I use to do) a few obvious questions about what will constitute a duplicate. I have even recently blogged about the challenge that I call “the echo problem” I missed.
The problem is that many healthcare professionals have several job positions. Maybe they have a private clinic besides positions at one or several different hospitals. And for this particular purpose a given healthcare professional should only appear ones.
Now, this wasn’t a MDM project where you have to build complex hierarchy structures but one of those many downstream cleansing jobs. Yes, they exist and I predict they will continue to do in the decade beginning today. And sure, I could easily make a new process ending in a clean sheet fit for that particular purpose based on the data available.
Next time, this year, I will get the downstream data quality job done right the first time so I have more time for implementing upstream data quality prevention in state of the art MDM solutions.