Do we need a LinkedIn group for this and that? It’s always a question. There are already a lot of LinkedIn groups for Big Data and a lot of LinkedIn groups for Data Quality.
However, I think we do see targeted discussions and engagement in the niche groups on LinkedIn, so yesterday I created a new group about the intersection of Big Data and Data Quality. The group is called Big Data Quality.
It’s good to see a stampede of people joining (well, 39 within the first 24 hours) and to see discussions and comments starting.
So, if you haven’t joined already, please do so here.
And why not take part in the fun, maybe just by voting on the question: How important is data quality for big data compared to data quality for small data?
I think having a specific group to focus on the intersection of data quality and big data is a fine idea. Helping customers gain value from the unknown (big data) by building on the known (data quality) can, IMO, do nothing but good.
I’m not convinced that “more is better” in the world of data. Although there may be various data quality best practices that are specifically applicable to big data, I don’t think the industry has sufficiently adopted data quality best practices for “small” data, and now we will be confusing them with big data quality?
Of course this will lead to the debate of what constitutes big data, and we will be off discussing the esoterica of the term big data rather than trying to help organizations establish fundamental data quality best practices.
I believe that for any discipline there are 7 fundamental best practices that MUST be adopted before an organization is qualified or mature enough to progress from Level 1 to Level 2. In data quality (small or big) these are:
1. Does the organization measure quality of products, services, projects? Data quality dimensions and thresholds
2. Does the organization record non-conformance to these quality measures? Data error incident management
3. Does the organization reward people for preventing errors? Pay for prevention
4. Does the organization have standards? Naming, definitions, metadata, business glossary, repositories, business process notation
5. Does the organization define use cases for data? Data use cases embedded in SDLC processes
6. Does the organization train all the staff in the use of metadata principles? Semantics, ontology, taxonomy, ISO 11179
7. Does the organization apply the above practices enterprise-wide? Data Governance
Unless the organization has adopted and become proficient in the best practices above, calling it big data or small data is irrelevant. It’s the basics that count most.
In my opinion the term big data is for marketing and entertainment. If that is the purpose of creating yet another channel, I guess that’s business. But to me it’s like having hundreds of cable channels yet nothing to watch. A distraction from the real basics of data quality.
Thanks for commenting, Dave and Richard. Indeed, there are good, bad and ugly things around big data.