External reference data are going to play an increasing role in data quality improvement, and a recent worldwide trend helps a lot: governments are unlocking their data stores.
Examples of initiatives available in English are the US data.gov and the UK's “Show Us a Better Way”.
Today I attended a “Workshop on the use of public data in the private sector” arranged by the Danish National IT and Telecom Agency as part of a similar initiative in my home country.
The initiatives around the world differ somewhat in focus and in which data are released, depending on administrative traditions and local privacy policies.
As an organisation you may integrate with such public reference data either directly or through services from private vendors, who add value by reformatting, merging, enriching and bundling the data with other services. One add-on service on the international scene will be providing consistency, as far as possible, between the datasets from different countries.
One way or the other, public reference data will become part of the data architecture in most organisations. Cloud applications will probably be, and in fact already are, the first movers in this field.
Public reference data will bring operational databases and data warehouses closer to that “one version of the truth” that we talk so much about but have so much trouble achieving, or even defining. Now some of that trouble can be settled by a simple rule: the government says so.
Hi, I found my way here via Christian Lanng of ITST.dk, who pointed to http://www.version2.dk/artikel/11455-slip-de-offentlige-data-fri where you commented. I don’t speak Danish, but my educated guess is that you pointed out that a lot of information needs to be made available in a way that also satisfies interest from outside Denmark. I quite agree. There are many initiatives and examples of opening up government data for free re-use by the public, but many of them are invisible to each other because of language differences.
As a spin-off of my work with the Dutch government on open government data, I am planning to start building a list of pointers (to initiatives, to data sources, to examples of data re-use) for European countries, so that people at least know what is out there. Currently, finding things is the hard part. We will most likely host the list of pointers at http://ourdata.eu (there’s nothing online yet; I will try to do that over the weekend).
I just posted about demographic vs. firmagraphic data (http://bit.ly/VKxdJ), and much of that is based on public information already available from governments. The two clearest examples are address tables provided by government postal agencies and industry codes, but it goes beyond that depending on the country. The problem is getting the data in a usable format. I’ve found that the best firmagraphic data providers use government chamber of commerce information, e.g., Companies House for the UK or KvK for the Netherlands (agreeing with the previous comment), as their base and then build up from there.
For more, check out http://www.markgoloboy.com or find me on twitter http://www.twitter.com/markgoloboy.
Thanks Ton and Mark for your comments.
@Ton: Yes, I did suggest that the Danish initiative should also be documented in English (today's lingua franca), like the Dutch initiative and others where English isn’t an official language. Fortunately my suggestion has already been adopted as an option: http://digitaliser.dk/forum/374605 (in Danish).
@Mark: It’s very true that opportunities with public reference data already exist, and I have also worked a lot with them. I also agree with you that the quality is better when you directly or indirectly use the local government source.