New Standards

This morning people in the United States will not wake up to the date being 04/01/2013. Instead, the date will be 01/04/2013, as it is in the rest of the world. The days of the mm/dd/yyyy date format are numbered.

In a related statement a US government representative writes: What can be standardized must be standardised.

This is only the first step in a plan for the US to adapt to other standards more commonly used world-wide. The Fahrenheit temperature scale will be changed to Celsius by 04/01/2014 for degrees below 0 Celsius (formerly 01/04/2014 and 32 degrees Fahrenheit). When spring comes along on 01/04/2014 (formerly 04/01/2014), the change will also be due for all warm degrees.

In another move, the United Kingdom has released plans for changing from driving on the wrong side of the road to driving on the right side of the road. There will be a phased implementation starting with lorries, then black London taxis and red double-decker buses, and finally all other vehicles.

A UK government spokesman explains the phased implementation: We don’t believe in a big bang implementation.

Why You shouldn’t go to the MDM Summit Europe 2013

The weather in London has been awful this March. The forecast for the first week of April doesn’t meet historical standards either. The MDM Summit Europe 2013 will be in London 15th to 17th April. You shouldn’t go there because of the weather based on the trend in the weather forecast:

London Forecast April 2013

On the other hand, it could heat up indoors.

There are quite a lot of exciting sessions, including the ones about:

And hey, it has happened before that the weather has suddenly improved.

Small Data with Big Impact

In an ongoing discussion on LinkedIn there are some good points on: How important is data quality for big data compared to data quality for small data?

A repeated sentiment in the comments is that data quality for small data is going to be more important with the rise of big data.

The small data we are talking about here is first and foremost master data.

Master Data Challenges with Big Data

As with traditional transaction data, master data also describes the who, what, where and when of big data.

If we have issues with completeness, timeliness and uniqueness in our master data, any prediction based on big data matched with master data is going to be as chaotic as weather forecasts.
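To make the three dimensions a bit more tangible, here is a minimal sketch of how they could be measured on a handful of records. The records, the field names and the 120-day freshness limit are all made up for illustration; real master data would need far better normalisation than stripping and lowercasing a name.

```python
from datetime import date

# Hypothetical customer master records; field names are illustrative only.
RECORDS = [
    {"name": "Acme Ltd", "city": "London", "updated": date(2013, 3, 1)},
    {"name": "acme ltd ", "city": "London", "updated": date(2010, 5, 20)},
    {"name": "Bolt Inc", "city": None, "updated": date(2013, 1, 15)},
    {"name": "Cobalt AB", "city": "Stockholm", "updated": date(2012, 11, 2)},
]


def completeness(records, required):
    """Share of records where every required field has a non-empty value."""
    filled = sum(
        all(r.get(field) not in (None, "") for field in required)
        for r in records
    )
    return filled / len(records)


def uniqueness(records, key):
    """Share of distinct key values after a trivial normalisation."""
    distinct = {r[key].strip().lower() for r in records}
    return len(distinct) / len(records)


def timeliness(records, today, max_age_days):
    """Share of records updated within max_age_days of the given day."""
    fresh = sum((today - r["updated"]).days <= max_age_days for r in records)
    return fresh / len(records)


if __name__ == "__main__":
    print("completeness:", completeness(RECORDS, ("name", "city")))
    print("uniqueness:  ", uniqueness(RECORDS, "name"))
    print("timeliness:  ", timeliness(RECORDS, date(2013, 4, 1), 120))
```

Any big data signal linked to the duplicated, stale or incomplete records in this tiny set would inherit the same flaws.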

We also need to expand the range of entities embraced by our master data management implementations, as exemplified in the post Social MDM and Future Competitive Intelligence.

Matching Big Data with Master Data

Some of the issues in matching big data with master data I have stumbled upon are:

  • Who: How do we link the real world entities reflected in our traditional systems of record with the real world entities behind who’s talking in systems of engagement? This question was touched on in the post Making Sense with Social MDM.
  • What: How do we manage our product hierarchies and product descriptions so they fulfill both (different) internal purposes and external usage? More on this in the post Social PIM.
  • Where: How do we identify a given place? If you think this is easy, why not read the post Where is the Spot?
  • When: Dates and times come in many formats, and relating events to the wrong schedule may have us Going in the Wrong Direction.

How: You may for example follow this blog. Subscription is in the upper right corner 🙂
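On the “when” question, a small sketch can show why the same string may point to two different days. The helper names below are hypothetical, and the sketch assumes only the two slash-separated formats mentioned on this blog; it is not a general date parser.

```python
from datetime import datetime

# The two readings considered here: US style (month first) and day first.
FORMATS = (("mm/dd/yyyy", "%m/%d/%Y"), ("dd/mm/yyyy", "%d/%m/%Y"))


def parse_both_ways(text):
    """Return every valid reading of a nn/nn/yyyy string, keyed by format."""
    readings = {}
    for label, fmt in FORMATS:
        try:
            readings[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            # The string is not a valid date under this reading.
            pass
    return readings


def is_ambiguous(text):
    """True when the two readings give different calendar days."""
    return len(set(parse_both_ways(text).values())) > 1


if __name__ == "__main__":
    for sample in ("04/01/2013", "13/04/2013", "04/04/2013"):
        print(sample, parse_both_ways(sample), is_ambiguous(sample))
```

A string like 13/04/2013 is only valid day first, so it gives itself away. It is the low day numbers that silently put an event on the wrong schedule, which is why the format should travel with the data.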

New LinkedIn Group: Big Data Quality

Do we need a LinkedIn group for this and that? It’s always a question. There are already a lot of LinkedIn groups for Big Data and a lot of LinkedIn groups for Data Quality.

However, I think we do see targeted discussions and engagement in the niche groups on LinkedIn, so yesterday I created a new group about the intersection of Big Data and Data Quality. The group is called Big Data Quality.

It’s good to see a stampede of people joining (well, 39 within the first 24 hours) and to see discussions and comments starting.

So, if you haven’t joined already, please do so here.

And why not take part in the fun, maybe just by voting on the question: How important is data quality for big data compared to data quality for small data?

Sharing is the Future of MDM

Over at the DataRoundtable blog, Dylan Jones recently posted an excellent piece called The Future of MDM?

Herein Dylan examines how a lot of people in different organizations spend a lot of time trying to get complete, timely and unique data about customers and other business partners.

A better future for MDM (Master Data Management) could certainly be one where every organization doesn’t have to do the same work over and over again. While self registration by customers is a way of lifting the burden from private enterprises and public sector bodies, we may do even better by not making the customer the data entry clerk who types in the same information over and over again.

Today there are several available options for customer and other business partner reference data:

  • Public sector registries, which are becoming more and more open, whether for the address part only or for deeper data, with due respect for privacy considerations that may differ between business entities and individuals.
  • Commercial directories, often built on top of public registries.
  • Personal data lockers like the Mydex service mentioned by Dylan.
  • Social network profiles.

My guess is that the future of MDM is going to be a mashup that exploits the options above.

Oh, and as representatives of such a mashup service, we at iDQ recently made sure we had accurate, complete and timely information filled in on our LinkedIn Company profile.

Social Data vs Sensor Data

The two predominant kinds of big data are:

  • Social data and
  • Sensor data

Social data are data born in the social media realm, such as Facebook likes, LinkedIn updates, tweets and whatever else we call the data entry we humans do in the social sphere.

Sensor data are data captured by devices of many kinds, such as radars, sonars, GPS units, CCTV cameras, card readers and many more.

There’s a good saying, “same same but different”, and in my experience it describes the two kinds of big data very well: the social data coming directly from a human hand and the sensor data born in a machine.

Of course there are humans involved with sensor data as well. It is humans who set up the devices and sometimes a human makes a mistake when doing so. Raw sensor data are often manipulated, filtered and censored by humans.

There are indeed data quality issues associated with both kinds of big data, but in slightly different ways. And you surely need to apply master data management (MDM) in order to make some sense of both social data and sensor data, as examined in the post Big Data and Multi-Domain Master Data Management.

What is your experience: Are social data and sensor data just big data regardless of source? Is it same same but different? Or are social data and sensor data two separate data worlds that just both happen to be big?

Coma, Wetsuit and Dedoop

The sehr geehrte Damen und Herren at Universität Leipzig (Leipzig University) are doing a lot of research in the data management realm and put some good effort into naming the stuff.

Here are some of the inventions:

COMA is a system for flexible Combination Of schema Matching Approaches. Let’s hope the thing is still alive.

WETSUIT (Web EnTity Search and fUsIon Tool) is a new powerful mashup tool – and what a nice seven letter abbreviation not sticking only to the first letters.

Dedoop (Deduplication with Hadoop) is a prototype for entity matching for big data. Big phonetic Dedupe will be around of course.

Well, you should expect fuzzy abbreviations from this city, as Leipzig means “settlement where the linden trees stand”.

Fuzzy Social Identities in the Data Quality Realm

In recent years social networks have emerged as a new source of external reference data for Master Data Management (MDM). But surely, there are challenges with the data quality related to this source.

Let’s look at a few examples from inside the data quality tool vendor space.

Who is head of Informatica in the social sphere?

There is a Twitter account owned by Sohaib Abbasi:

Sohaib Abbasi

Informatica is one of the leading data quality tool vendors and the CEO there is Sohaib Abbasi.

So, is the real world individual behind the Twitter handle @sabbasi the head of Informatica?

A social graph should indicate so: There’s a bunch of Informatica accounts and people following the handle (though that’s not worth the trouble as there are no tweets coming from there).

What about the one behind Data Ladder?

Data Ladder is another data quality tool provider, though with a fraction of the revenue compared to Informatica.

In a recent post I stumbled upon a strange situation around this company. In the social sphere the company for the last seven years has been represented by a guy called Simon as seen here on LinkedIn:

Simon aka Nathan

But I have reasons to believe that his real world identity is Nathan as explored in the comments to this post.

Hmmmm….

Data Quality tool vendors: It’s time to get real.

Making sense with Social MDM

A few days ago Jeff Jonas of IBM made a new blog post called Master Data Management (MDM) vs. Sensemaking.

Herein Jeff Jonas ponders the differences between the data matching algorithms we use in traditional MDM, predominantly name and address matching, and the kind of identity resolution we need when we, for example, try to listen to and make sense of the signals in the social media data streams.

Jeff Jonas says: “Different missions, different tools. Some organizations will use one or the other; most organizations will want both.”

I tend to disagree slightly with Jeff Jonas. As told in the post The New Year in Identity Resolution I think we will need a connection between the old systems of record and the new systems of engagement.

Indeed the algorithms will be used differently and indeed we need different thresholds of confidence for different tasks. But I think we will have to make the integration story a bit more complicated in order to make sensible decisions across the two missions.
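As a rough illustration of “different thresholds of confidence for different tasks”, here is a minimal sketch using the SequenceMatcher class from Python’s standard difflib module as the similarity measure. The task names and the threshold values are my own assumptions for the example; this is not the method of Jeff Jonas, IBM or any MDM tool.

```python
from difflib import SequenceMatcher

# Hypothetical confidence thresholds per task; the values are illustrative.
THRESHOLDS = {
    "merge_master_records": 0.90,  # system of record: be strict
    "link_social_profile": 0.70,   # system of engagement: accept more risk
}


def similarity(a, b):
    """Similarity between 0 and 1 for two names, ignoring case and padding."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def decide(a, b, task):
    """True when the name similarity reaches the threshold for the task."""
    return similarity(a, b) >= THRESHOLDS[task]


if __name__ == "__main__":
    pair = ("J. Smith", "John Smith")
    print("similarity:", round(similarity(*pair), 3))
    for task in THRESHOLDS:
        print(task, decide(*pair, task))
```

The same pair of names, scored by the same algorithm, is good enough for a tentative link to a social profile but not for merging two master records, which is the kind of connection between the two missions I have in mind.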

Data Management in the Cloud

We are seeing more and more data management services offered in the cloud.

As I have long experience with data matching services around the Dun & Bradstreet WorldBase, it was good to see a presentation yesterday in Stockholm featuring D&B Europe’s new cloud based data manager service.

Managing World-Wide B2B Master Data

The D&B WorldBase is a business directory with 225 million business entities from all over the world.

D&B’s Data Manager is a self-service application in the cloud around the WorldBase taking care of:

  • Data matching with comprehensive functionality for manual inspection, approval and master data survivorship
  • Data enrichment embracing a wide range of data attributes
  • Data maintenance subscription for keeping enriched data up to date

The data matching functionality is built on the good old D&B methodology with confidence codes and matchgrades.

Right for QlikTech

QlikTech is the Swedish firm (pretending to be American) behind the prominent business intelligence solution called QlikView.

At the Stockholm event QlikTech presented how and why they use the D&B Data Manager for ensuring the right data quality in their cloud based B2B CRM solution (Salesforce.com).

As QlikTech operates all over the world, having a consistent world-wide business directory as the reference for party master data is extremely important, and the self-service concept is a perfect match for gaining the insight and control needed to achieve the required level of data quality in CRM master data.

From there the QlikTech CRM team takes its own medicine using QlikView for self-service business intelligence.
