Big Data Quality and Open Government Data

Yesterday I participated in an information meeting at the Danish Ministry for Business and Growth about an initiative on using open government data for business intelligence in the private sector.

Using open government data is already an essential part of the instant Data Quality concept I’m working with right now, and I have previously written about the state of open government data in Denmark in the posts Government Says So and Making Data Quality Gangnam Style.

At the meeting some well-known questions came up:

Is this big data?

The answer was that it isn’t exactly big data, mainly because the data are well structured and thereby look more like the traditional data sources we have been used to working with for many years.

Personally, if we have to use the big word, I like to see these data as big reference data, as told in the post Four Flavors of Big Reference Data.

What about data quality?

The answer here was a hope that making these data open to the private sector will create data quality feedback, so that the public sector will improve the quality of the data to the benefit of both public sector and private sector data consumers.

Please Retweet

Many moons ago I wondered how my social influence is measured as told in the post Klout Data Quality.

Since then my Klout has dropped a bit, from 59 to 57. It does not ruin my day, but I wonder why. One thing that strikes me is where my Klout comes from. Twitter seems to be the place, as it counts for 73 % of my Klout, while LinkedIn counts for only 8 %. Personally, I would give them the opposite importance.

Recently I noticed I was included in a list called Top 200 Thought Leaders in Bigdata Analytics. An honor, maybe. However, I am afraid it is merely a count of how many #Bigdata tags I have used on Twitter relative to others.

What matters to me in social influence, namely readers and comments on this blog, seems to be out of scope for Klout.

What about you? Do you have the right Klout? Is it measured the right way?

Multi-Domain MDM Uptake

Within Master Data Management (MDM), multi-domain MDM has been trending for a couple of years. Yesterday Gartner (the analyst firm) held a chat session on Twitter ahead of the upcoming Gartner MDM summits around the world.

Along the way @BillOKane of @Gartner_inc revealed some numbers about multi-domain MDM from the Gartner camp:

Multi-Domain 1

Multi-Domain 2

So, stating these numbers using the MoSCoW method, we have that among companies considering MDM:

  • 3 % see multi-domain MDM as a MUST have now
  • 10 % think they SHOULD have multi-domain MDM now
  • 17 % regard multi-domain MDM as something they COULD have now
  • 70 % WON’T have multi-domain MDM now

Sharing Big Location Reference Data

In the post Location Data Quality for MDM, the different ways many companies handle location master data were examined.

A typical “as is” picture could be this:

Location1

Location data are handled for different purposes using different kinds of systems. Customer data may be checked for data quality using address validation tools and services. This also serves as a prerequisite for better use of these data in a Geographical Information System (GIS) and for using internal customer master data in marketing research, for example by applying demographic classifications to current and prospective customers.

Often additional external location data are used downstream in these specialized systems for enriching and supplementing internal master data. It may very well be that the external location reference data used at different points do not agree in terms of precision, timeliness, conformity and other data quality dimensions.

A desired “to be” picture could be this:

Location2

In this set-up, everything that can be shared across different purposes is kept as common (big) reference data and/or is accessible within a data-as-a-service environment maintained by third-party data providers.

Unique Data = Big Money

In a recent tweet Ted Friedman of Gartner (the analyst firm) said:

ted on reference data

I think he is right.

Duplicates have always been the number one pain in most places when it comes to the cost of poor data quality.

Though I have been in the data matching business for many years and have fought duplicates with deduplication tools in numerous battles, the war doesn’t seem to be won by using deduplication tools alone, as told in the post Somehow Deduplication Won’t Stick.

Eventually deduplication always comes down to entity resolution: you have to decide which results are true positives and which are useless false positives, and you wonder how many false negatives you didn’t catch, which means how much money you didn’t get in return for your deduplication investment.
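The three outcomes above correspond to the usual precision and recall measures. The sketch below is a minimal illustration, assuming you have a manually reviewed sample where the true duplicate pairs are known; in everyday operation the false negatives are exactly what you cannot see. The function name `evaluate_matches` and the record IDs are hypothetical.

```python
def evaluate_matches(predicted, actual):
    """Compare duplicate pairs proposed by a matching tool with the true pairs.

    Pairs are turned into frozensets so that (a, b) and (b, a) count as the
    same pair. 'actual' is assumed to come from a reviewed sample.
    """
    predicted = {frozenset(p) for p in predicted}
    actual = {frozenset(p) for p in actual}
    true_positives = predicted & actual    # real duplicates we caught
    false_positives = predicted - actual   # proposed pairs that are not duplicates
    false_negatives = actual - predicted   # real duplicates we missed
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(actual) if actual else 0.0
    return {
        "tp": len(true_positives),
        "fp": len(false_positives),
        "fn": len(false_negatives),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical sample: customer record IDs paired as duplicates
predicted = [("c1", "c2"), ("c3", "c4"), ("c5", "c6")]
actual = [("c2", "c1"), ("c3", "c4"), ("c7", "c8"), ("c9", "c10")]
print(evaluate_matches(predicted, actual))
```

In this made-up sample, two of the three proposed pairs are real duplicates (precision of about 0.67), while two real duplicate pairs slipped through (recall of 0.5). Those missed pairs are the money not returned on the deduplication investment.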

Bringing in new, even obscure, reference sources is in my eyes a very good idea, as examined in the post The Good, Better and Best Way of Avoiding Duplicates.
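A minimal sketch of the reference-source idea, assuming a hypothetical in-memory lookup table that stands in for an external business directory or register. An incoming customer is first resolved to the reference ID, and that ID becomes the key, so a spelling variation of an existing customer is linked to it rather than created twice. The table contents, the ID format and `add_customer` are all invented for illustration.

```python
# Hypothetical reference source: (normalized name, postal code) -> registry ID.
# In practice this might be an external directory accessed as a service.
REFERENCE = {
    ("acme aps", "2100"): "DK-12345678",
    ("nordic trading a/s", "8000"): "DK-87654321",
}


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variations match."""
    return " ".join(name.lower().split())


def add_customer(master: dict, name: str, postal_code: str) -> str:
    """Store a customer under its reference ID when one is found.

    Returns the key the customer is stored under. Customers not found in the
    reference source get a hypothetical local key instead.
    """
    ref_id = REFERENCE.get((normalize(name), postal_code))
    key = ref_id if ref_id is not None else f"LOCAL-{len(master) + 1}"
    if key in master:
        # Already known via the reference source: no duplicate is created.
        return key
    master[key] = {"name": name, "postal_code": postal_code}
    return key


master = {}
print(add_customer(master, "ACME ApS", "2100"))     # DK-12345678
print(add_customer(master, "Acme  ApS", "2100"))    # DK-12345678, no new record
print(add_customer(master, "Unknown Ltd", "9999"))  # LOCAL-2
print(len(master))                                  # 2
```

Customers that cannot be resolved against the reference source still get a local key and can still end up as duplicates of each other, which is why reference data complements rather than replaces matching.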

Who needs a data governance tool?

Sunil Soares has recently released a research report called An In-Depth Review of Data Governance Software Tools. The complimentary report can be downloaded here.

The report examines what a data governance software tool should do and mentions a range of tools from vendors stretching from:

  • A pure play data governance tool vendor such as Collibra
  • A one-stop-shopping vendor within data management such as Informatica
  • A none-stop-shopping vendor within everything IT such as IBM

As touched on in the latest post on this blog, how far a tool should go in covering additional disciplines related to its core discipline is an ever-recurring question. Data governance should, for example, definitely be a part of a Master Data Management (MDM) programme, here using the British English spelling programme rather than program to emphasise what MDM should be. As data governance is very much about people and processes and not so much about technology, do you need a tool at all? If you do, do you need a separate best-of-breed tool for the data governance part, or is it preferable to have it as an integrated part of the MDM solution?

So, we have four and a half multi-domain MDM vendors

It has been discussed a few times on this blog whether Gartner should make a single (multi-domain) Master Data Management quadrant, most recently in the post MDM for Product Data Quadrant: No challengers. A half visionary.

Well, if Gartner will not, Forrester will, as Forrester has just released their MDM wave focusing on multi-platform MDM. Yep, there is nothing like analyst firms insisting on their own wording, such as multi-entity, multi-domain or multi-platform MDM, within their special visualizations such as quadrant, landscape, wave and other bulls…. eye stuff to blur things a bit.

If you are a Forrester client or otherwise like to pay money to Forrester, you can get the report here. If you would like to feed one of your eMail addresses into the Informatica marketing machine, you can get the report here.

MDM Brands
This is not the wave. Just some names.

There are only four and a half multi-platform MDM vendors in the universe according to Forrester. Three and a half of them are no surprise. Maybe Talend is. I guess one or two more of the Trois acteurs français dans le marché du MDM (three French players in the MDM market) would like to be there as well.

Where is the Asset?

The previous post on this blog was called A Master Data Mind Map. In it I tried to map some different examples of master data entities within some known master data domains such as party, product, location, financial and calendar.

In the comments and in a following twitter conversation, the placement of the asset entity was questioned.

Asset Master Data

Prash Chan (@MDMGeek) said: “There is another type of asset I came across once, mapped to location domain, largely because the assets were identified by the place where they were installed.”

Jamie Watters (@jamietwatters) said: “Don’t disagree, but think there are other areas of asset data that makes it unique instead of using product or location.  Areas such as engineering, design, construction, maintenance, commissioning make managing asset data pretty unique.”

Indeed, I am actually also on the fence here.

There are certainly often strong relations between an asset and a location. But that is true for parties as well, and for some products too, as told in the post Product Placement.

Assets can be many things. Gartner (the analyst firm) has the concept of the things domain encompassing products and assets.

Asset may in fact just be its own domain. Property too perhaps, as the Real Estate Domain. Services could someday claim independence from the other products. Customer is today often seen as its own domain, not belonging to party, as examined in the post MDM for Customer Data Quadrant…

Andrew White of Gartner did address these naming issues in a blog post back in 2009 here.

A Master Data Mind Map

A challenge within many disciplines is explaining easily what the discipline is about, and that is certainly true for Master Data Management (MDM) too, as we often get the question: What is master data?

A good short explanation is:

“The description of the who, what and where in transaction data”.

It could also, with help from Wikipedia, be:

“Information that is key to the operation of a business”.

From Gartner (the analyst firm) we have:

“The consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise”.

The latter one I would not try on friends and relatives though.

Examples are often a good way to go. Visualization is great too. Therefore I have played with a mind map of what master data entities may be:

Master Data

From B2B and B2C to H2H

I stumbled upon an article from yesterday by Bryan Kramer called There is no more B2B or B2C: It’s Human to Human, H2H.

H2H

The article is about the implications for marketing caused by the rise of social media, which now finally seems to eliminate what we have known as business-to-business (B2B) and more or less merge it with business-to-consumer (B2C).

As discussed here on the blog several times, starting way back in 2009 in the post Echoes in the Database, a problem with B2B indeed is that while business transactions take place between legal entities, a lot of business processes are carried out between employees of the selling and buying entities. You may call that employee-to-employee (E2E), people-to-people (P2P) or indeed human-to-human (H2H).

Related to databases, data quality and Master Data Management (MDM), this means we need real world alignment with two kinds of parties: the legal entities doing business with each other and the people acting on their behalf.
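One way to get that alignment is to model the legal entity and the human being as separate party entities linked by a role, so that an order can point both to the buying organization and to the person who actually placed it. The sketch below assumes plain Python dataclasses; the class names, fields and sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    """A legal entity: the formal counterpart in a B2B transaction."""
    org_id: str
    name: str


@dataclass
class Person:
    """A human being who may act on behalf of one or more organizations."""
    person_id: str
    name: str
    # (organization, role) pairs, e.g. the person's job at the buying company
    roles: list = field(default_factory=list)

    def acts_for(self, org: Organization, role: str) -> None:
        """Record that this person acts for an organization in a given role."""
        self.roles.append((org, role))


@dataclass
class Order:
    """A transaction: formally with the organization, in practice with a person."""
    customer: Organization
    contact: Person


buyer = Organization("ORG-1", "Nordic Trading A/S")
anna = Person("P-1", "Anna Jensen")
anna.acts_for(buyer, "purchasing manager")
order = Order(customer=buyer, contact=anna)
print(order.customer.name, "via", order.contact.name)
```

The same `Person` record without any organizational role could just as well represent a consumer, which is one way the H2H view shows up in master data.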

While B2B and B2C may melt together in the way we do messaging, the distinction between B2B and B2C will remain in many other aspects. Even in social media we see it: for example, two of the most used social networks, Facebook and LinkedIn, clearly belong mainly to B2C and B2B respectively for marketing and social selling purposes.

The different possibilities with B2B and B2C in the H2H world were touched on in an interview on DataQualityPro last year: What are the Benefits of Social MDM?