When matching database records that hold data about a person, we traditionally use string attributes such as Citizen/Tax ID, Name, Address, Phone and Email.
Today I stumbled upon a company called Polar Rose that specializes in recognizing people's faces in pictures. The current use is tagging people in Facebook pictures, but this technology could really make Data Matching, Identity Resolution and Deduplication better.
We already know that fuzzy matching with names and addresses has plenty of challenges with false positives and false negatives. I imagine the same issues arise with facial recognition. But we also know from comparing strings that the more different kinds of information we can gather, the better we are at avoiding false matches. So combining fuzzy string matching with facial recognition (where a picture is available) could make matching technology behave more like a human and, with that, more reliably.
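To make the idea of combining evidence concrete, here is a minimal sketch in Python. It is not how Polar Rose or any matching product works. It assumes a face similarity score between 0 and 1 is delivered by some separate face recognition system, and the function names and weights are hypothetical, chosen only for illustration. The string comparison uses `difflib.SequenceMatcher` from the standard library as a stand-in for a real fuzzy matching algorithm.

```python
from difflib import SequenceMatcher
from typing import Optional


def string_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def combined_score(
    name_a: str,
    name_b: str,
    address_a: str,
    address_b: str,
    face_similarity: Optional[float] = None,  # assumed to come from an external system
    weights: tuple = (0.4, 0.3, 0.3),         # hypothetical weights: name, address, face
) -> float:
    """Weighted average of the available similarity scores.

    If no picture is available (face_similarity is None), the face weight
    is dropped and the remaining weights are renormalised.
    """
    scores = [
        string_similarity(name_a, name_b),
        string_similarity(address_a, address_b),
    ]
    used_weights = list(weights[:2])
    if face_similarity is not None:
        scores.append(face_similarity)
        used_weights.append(weights[2])
    total = sum(w * s for w, s in zip(used_weights, scores))
    return total / sum(used_weights)
```

With this sketch, two records with near-identical names and addresses but a very low face similarity end up with a lower combined score than the strings alone would give, which is the kind of false positive the extra information is meant to catch.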
Right now I am considering whether to add this feature to Data Quality 2.0 or leave it for Data Quality 3.0.
Indeed, this software exists, and it causes a real privacy and personal security issue. For this reason, face recognition, we also recommend that our clients stay away from Facebook etc.
Regards, Ronald
The more properties you have to match on, the better that match can be (in terms of false positives, but also in terms of false negatives). We, at WCC, refer to that concept as ‘multi modal’.
Multi modal does not necessarily imply including biometrics in the mix, although the term is commonly used in the industry to indicate a match on different biometrics.
We achieve very good results by applying multi modal fusion, both in quality improvement and in performance improvement. The latter is relevant in genuinely very large scale applications.
What Ronald writes is certainly true: it may be seen as an invasion of privacy. However, several companies are working very hard on solving that issue as we speak, including priv-ID, Genkey and Anonymous Recognition.
As a last note on multi modal fusion: in an ideal world all properties are captured and are of good enough quality. In practice this is typically not the case. Multi modal fusion allows us to take the available data, of whatever quality, and intelligently combine it to come up with the best possible matches.
Hope this is informative, Peter