I have written earlier, in the post Picture This, about how search terms are one way people get to my blog.

Another way is being referred from other sources. WordPress, which is my blog service, recently improved its statistics so that referring sources are consolidated, which gives you much more meaningful information about your referrers.

My current all-time statistics look like this:

At the time the total number of pageviews was 46,263.

LinkedIn seems to be my main supplier of readers. I am regularly sharing my posts as status updates and as news items in different LinkedIn groups.

But I do think the figures for Twitter are misleading, as they are counted based on where the tweets and re-tweets are read. “Twitter” is probably only the Twitter site itself. Hootsuite is another way of reading tweets and clicking on links to a blog. People who read and click via TweetDeck are, as I understand it, not counted under any referring source, as TweetDeck is a desktop application.

Though I write in English, I do from time to time post user blogs and comments with links on Danish-language sources such as the local Computerworld and another online IT news site called Version2.

When someone, who in my case is mainly Rich Murnane I think, StumblesUpon a blog post, you sometimes get a lot of pageviews within an hour or so.

Besides that, Jim Harris’s OCDQ Blog is a constant source of referrals, either due to Jim’s kind links to my blog posts or my self-promoting links in my comments on Jim’s blog posts.


Matching Down Under

As a data matching geek I always love reading about how others have made the great but fearful journey into the data matching world.

This week Wayne Colless of the Australian Attorney-General’s Department kindly made a document about data matching public on the DataQualityPro site. The full title is “Improving the Integrity of Identity Data – Data Matching Better Practice Guidelines, 2009”. Link here.

As Wayne explains in a discussion in the LinkedIn Data Matching group: Australia has no national unique identifier for individuals (such as the US SSN or the number recorded on national ID cards used in many other countries) that can be used, so the matching has to involve only non-unique values such as name, address and dates of birth.

The document gives very thorough step-by-step guidance on matching individuals’ names, addresses and dates of birth. As the document says, you may either build all the logic yourself or buy commercial software that does the same. Either way, you have to understand what the software does in order to tune the processes and set the thresholds that are meaningful to you.
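To make the point about thresholds concrete, here is a minimal sketch of matching on non-unique values. It is not taken from the guidelines: the weights, the two thresholds and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all my own assumptions, chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds; a real project would tune these on its own data.
WEIGHTS = {"name": 0.5, "address": 0.3, "date_of_birth": 0.2}
MATCH_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.65


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of the per-field similarities."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def classify(rec_a: dict, rec_b: dict) -> str:
    """Turn a score into a decision using the two thresholds."""
    score = match_score(rec_a, rec_b)
    if score >= MATCH_THRESHOLD:
        return "match"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "no match"
```

Moving the two thresholds changes how many pairs end up as automatic matches, manual reviews or non-matches, which is exactly the tuning you need to understand whether you build or buy.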

As Australia is a nation mainly born through immigration, the challenges of adapting the ruling Anglo-Saxon naming conventions to the reality of name formats coming from all over the world are very apparent. I like that the diversity issues are given good thought in the document.

I also like that the document addresses a subject not mentioned as often as it should be, namely the challenges with embracing historical values in settling a match as seen in this figure taken from the document:

Whether or not you think you already know the dos and don’ts of data matching (and I guess you never really do), I find the document well worth reading.


My Secret

Yesterday I followed a webinar on DataQualityPro with ECCMA ISO 8000 project leader Peter Benson.

Peter had a lot of good sayings, and fortunately Jim Harris, through his live tweeting, has documented a sample of good quotes here.

My favorite:

“Quality data does NOT guarantee quality information, but quality information is impossible without quality data.”

I have personally conducted an experiment that supports that hypothesis. It goes like this:

First, I found a data file on my computer. Lots of data in there being numbers and letters. And sure, what is interesting is the information I can derive for different purposes.

Then I deleted the data file and tried to see how much information was left behind.

Guess what? Not a bit.

I first published that experiment as a comment to one of Jim’s blog posts: Data Quality and the Cupertino Effect.

As documented in the comments on that blog post, the subject of data (quality) versus information (quality) is ever recurring and almost always guarantees a fierce discussion among data/information management professionals.

So, I’ll just tell you this secret: My work in achieving quality information is done by fixing data quality.

And guess what? I have disabled comments on this blog post.


What’s in a Blog Post Title?

I don’t know about you. But I am a slave to numbers and statistics and can’t help following my WordPress statistics telling me about pageviews, not least pageviews per post.

There are huge differences in the number of visitors who view the different posts. The post with the highest number of views on my blog has more than 2,500 views and the post with the lowest number has only 15 views.

To be honest, the ones with over 500 views are mainly visited due to some image search circumstances explained here, so views actually related to data quality vary between 15 and approximately 500. That’s still a huge difference.

I have yet to find out precisely what makes the difference.

It can’t be the content, can it? Basically people don’t know the content before opening.

No doubt the time of posting, not to mention the time of announcing the post on sites such as Twitter and LinkedIn, matters. On Twitter I have noticed that re-tweets are important. And of course re-tweets depend on timing and on the first readers finding the content worth a re-tweet.

There is surely also a relation between number of comments and numbers of views. I see that in my numbers.

Obviously the title of the blog post must be important. But from my numbers I can’t figure out how, except for the observation that technical titles seem to beat philosophical ones, as discussed here last year on DataQualityPro.

So, the title of this post is not a preface to explaining it all but a genuine question to you who for some reason came by: What’s in a Blog Post Title?

Bad word?: Data Owner

When reading a recent excellent blog post called “How to Assign a Data Owner” by Rayk Fenske I once again came to think about how I dislike the word owner in “Data Owner” and “Data Ownership”.

I am not alone. Recently Milan Kucera expressed the same feelings on DataQualityPro. I also remember that Paul Woodward from British Airways said at MDM Summit Europe 2009: Data is owned by the entire company – not any individuals.

My thoughts are:

  • Owner is a good word where we strive for fit for a single purpose of use in one silo
  • Owner may be a word of choice where we strive for fit for single purposes of use in several silos
  • Owner is a bad word where we strive for fit for multiple purposes of use in several silos

Well, I of course don’t expect all the issues raised by Rayk will disappear if we are able to find a better term than “Data Owner”.

Nevertheless, I will welcome better suggestions for a term that captures what is really meant by “Data Ownership”.


Master Data Survivorship

A Master Data initiative is often described as making a “golden view” of all Master Data records held by an organization in various databases used by different applications serving a range of business units.

In doing that (either in the initial consolidation or the ongoing insertion and update) you will time and again encounter situations where two versions of the same element must be merged into one version of the truth.

In some MDM hub styles the decision is taken at consolidation time; in other styles the decision is postponed until the data (links) is consumed in a given context.

In the following I will talk about Party Master Data being the most common entity in Master Data initiatives.

This spring Jim Harris wrote a brilliant series of articles on DataQualityPro on the subject of identifying duplicate customers, ending with part 5, which deals with survivorship. Here Jim describes all the basic considerations on how some data elements survive a merge/purge while others are forgotten, and gives good examples with US consumers/citizens.

Taking it from there Master Data projects may have the following additional challenges and opportunities:

  • Global Data adds diversity to the rule set for consolidating data on record level as well as field level. You will have to compromise between simple global rules and complex optimized rules (with supporting knowledge data) for each country/culture.
  • Multiple types of Party Master Data must be handled when Business Partners include business entities having departments and employees, and not least when they are present together with consumers/citizens.
  • External Reference Data is becoming more and more common as part of MDM solutions, adding valid, accurate and complete information about Business Partners. Here you have to set rules (on field level) for whether they override internal data, fill in the blanks or only supplement internal data.
  • Hierarchy building is closely related to survivorship. Rules may be set for whether two entities go into two hierarchies with surviving parts from both or merge into one with survivorship. Even an original entity may be split into two hierarchies with surviving parts.
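The three field-level choices for external reference data (override, fill in the blanks, supplement) can be sketched as below. This is only an illustration under my own assumptions: the `Rule` names, the `apply_reference_data` function and the `external_` prefix for supplemented values are hypothetical and not taken from any MDM product.

```python
from enum import Enum


class Rule(Enum):
    OVERRIDE = "override"        # external value replaces the internal one
    FILL_BLANKS = "fill_blanks"  # external value is used only when internal is empty
    SUPPLEMENT = "supplement"    # external value is stored alongside the internal one


def apply_reference_data(internal: dict, external: dict, rules: dict) -> dict:
    """Merge external reference data into an internal record, field by field."""
    result = dict(internal)
    for field, ext_value in external.items():
        if not ext_value:
            continue  # nothing to contribute for this field
        rule = rules.get(field, Rule.SUPPLEMENT)  # assumed default: never lose internal data
        if rule is Rule.OVERRIDE:
            result[field] = ext_value
        elif rule is Rule.FILL_BLANKS:
            if not result.get(field):
                result[field] = ext_value
        else:
            result["external_" + field] = ext_value
    return result


# Made-up example values, loosely based on the Local Charity record used in this post.
internal = {"name": "Local Charity", "phone": "", "address": "1 Main Str, Anytown"}
external = {"name": "Anytown Angels", "phone": "555-0100", "address": "1 Main Street, Anytown"}
rules = {"name": Rule.OVERRIDE, "phone": Rule.FILL_BLANKS, "address": Rule.SUPPLEMENT}
merged = apply_reference_data(internal, external, rules)
```

Defaulting to supplement is a cautious design choice: when no rule is set, the internal value survives and the external value is kept next to it rather than silently replacing it.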

What is essential in survivorship is not losing any valuable information while not creating information redundancy.

An example of complex survivorship processing may be this:

A membership database holds the following record (Name, Address, City):

  • Margaret & John Smith, 1 Main Street, Anytown

An eShop system has the following accounts (Name, Address, Place):

  • Mrs Margaret Smith, 1 Main Str, Anytown
  • Peggy Smith, 1 Main Street, Anytown
  • Local Charity c/o Margaret Smith, 1 Main Str, Anytown

A complex process of consolidation including survivorship may take place. As part of this example the company Local Charity is matched with an external source, which tells us that it has a new name: Anytown Angels. The result may be this “golden view”:

ADDRESS in Anytown on Main Street no 1 having
• HOUSEHOLD having
– CONSUMER Mrs. Margaret Smith aka Peggy
– CONSUMER Mr. John Smith
• BUSINESS Anytown Angels having
– EMPLOYEE Mrs. Margaret Smith aka Peggy

Observe that everything survives in a globally applicable structure, in a fitting hierarchy that reflects local rules, handles multiple types of party entities and uses external reference data.
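One way to hold such a golden view in code is a small hierarchy of record types in which the same person object is referenced from both the household and the business, so nothing is stored twice. The class and attribute names below are my own assumptions for illustration, not a reference model.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    title: str
    name: str
    aliases: list = field(default_factory=list)


@dataclass
class Household:
    consumers: list = field(default_factory=list)


@dataclass
class Business:
    name: str
    employees: list = field(default_factory=list)


@dataclass
class Address:
    street: str
    number: str
    city: str
    households: list = field(default_factory=list)
    businesses: list = field(default_factory=list)


# One surviving record per real-world person; "Peggy" survives as an alias.
margaret = Person("Mrs.", "Margaret Smith", aliases=["Peggy"])
john = Person("Mr.", "John Smith")

golden_view = Address(
    street="Main Street",
    number="1",
    city="Anytown",
    households=[Household(consumers=[margaret, john])],
    businesses=[Business("Anytown Angels", employees=[margaret])],
)
```

Because Margaret is a single object playing two roles, a later change to her record shows up in both the household and the business without creating redundancy.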

But OK, we didn’t have funny names, dirt, misplaced data…..


Guerrilla Data Quality

Oh yes, in my crazy berserkergang of presenting stupid buzzword suggestions it’s time for “Guerrilla Data Quality”. And this time there are no previous hits on Google to point at as the original source.

But I noticed that “Guerrilla Data Governance” is in use, and as Data Governance and Data Quality are closely related disciplines, I think there could be something called “Guerrilla Data Quality”.

Also, an article called “How to set data quality goals any business can achieve” was recently published by Dylan Jones on DataQualityPro. Here the need for setting realistic short-term goals is emphasised, in contrast to launching a massive full-size, enterprise-wide, all-domain initiative. This article focuses on the people and process side of what may be “Guerrilla Data Quality”.

Recently I wrote a blog post called “Driving Data Quality in 2 Lanes” focussing on the tool selection for what may be “Guerrilla Data Quality” and the enterprise wide follow up.

Actually I guess most Data Quality activity going on is in fact “Guerrilla Data Quality”. The problem then is that most literature and teaching on Data Quality is aimed at the massive enterprise wide implementations.

Any thoughts?

LinkedIn Group Statistics

I am currently a member of 40 LinkedIn groups, mostly targeted at Master Data Management, Data Quality and Data Matching.

As I have noticed that some groups cover the same topic, I wondered if they have the same members.

So I did a quick analysis.

With Master Data Management the largest groups seem to be:

Using the LinkedIn Profile Organizer I found that 907 people are members of both groups. This is not as many as I would have guessed.

With Data Quality the largest groups seem to be:

Using the LinkedIn Profile Organizer I found that 189 people are members of both groups. This is not as many as I would have guessed, despite the renaming of the last group.

As for Data Matching, I have founded the Data Matching group. The group has 235 members, of whom:

  • 77 are also members of the two large Master Data Management groups.
  • 80 are also members of the two large Data Quality groups.

This is also not as many as I would have guessed.
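The overlap counts above are set intersections. If you can get the member lists out as identifiers, a minimal sketch of the calculation looks like this; the function name and the member IDs are made up, and this is not how the LinkedIn Profile Organizer works internally.

```python
def overlap(group_a: set, group_b: set) -> dict:
    """Count shared members and express the overlap relative to each group."""
    shared = group_a & group_b
    union = group_a | group_b
    return {
        "shared": len(shared),
        "share_of_a": len(shared) / len(group_a) if group_a else 0.0,
        "share_of_b": len(shared) / len(group_b) if group_b else 0.0,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Made-up member identifiers for two hypothetical groups.
mdm_group = {"p1", "p2", "p3", "p4"}
dq_group = {"p3", "p4", "p5"}
stats = overlap(mdm_group, dq_group)
```

Looking at the shared members as a proportion of each group, and not only as a raw count, makes it easier to judge whether an overlap is really smaller than expected.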

You may find many other similar groups on my LinkedIn profile – among them:


Follow Friday Master Data Hub

Social Networking needs Master Data Management.

A recurring event every Friday on Twitter is #FollowFriday, abbreviated #FF, where people on Twitter tweet about who to follow.

I do it too, and like everyone else I sometimes forget someone, and then (s)he gets angry and doesn’t #FF me, and that’s bad. Bad Data Management. Bad #mdm.

So now I have started building a Master Data Hub fit for the purpose of doing consistent #FF. I do see other purposes for this as well, and as I recognize the advantages of combining data sources, I did a #datamatching with my LinkedIn connections to improve #dataquality through Identity Resolution.

This is as far as I am now (very convenient that WordPress lets me edit my blog posts):

@ReferenceData where is Staff Writer

@KenOConnorData is

@ocdqblog is a blog where is blogger-in-chief

@dataqualitypro is a community founded by

Dylan was a @Datanomic partner where @SteveTuck is

@InitiateSystems has a CTO = @wmmarty who is

@VishAgashe is

@KeithMesser is running @GlobalMktgPros

@fionamacd is at @TrilliumSW as seen here

So is @stevesarsfield being

Trillium is owned by Harte-Hanks where @MarkGoloboy also was

@biknowledgebase is operated by

@Dataexperts has a managing director who is

@IDResolution (Infoglide) has several Data Matching members in including

@rdrijsen is with possible duplicate

@grahamrhind is

@omathurin is

@zzubbuzz is probably

@CharlesBurleigh is

@wesharp is doing @dqchronicle

@decisionstats has an editor being

@jeric40 is my colleague at Omikron as shown here