Where GDPR Still Becomes National

The upcoming application of the EU General Data Protection Regulation (GDPR) is an attempt to harmonize data protection and privacy regulations across the member states of the European Union.

However, there is room for national deviation in how the regulation will be applied and enforced. Article 87, concerning processing of the national identification number, and article 88, dealing with processing in the context of employment, are probably where we will see national peculiarities.

National identification numbers are today used in different ways across the member states. In the Nordics, an all-purpose identification number that identifies citizens from cradle to grave in public (tax, health, social security, election and even transit) as well as private (financial, employment, telco …) registrations has been in use for many years, whereas more or less unlinked single-purpose (tax, social security, health, election …) identification numbers are the norm in most other places.

How the workforce is treated, and consequently how employees are registered, is also a field of major differences within the Union, and we should therefore expect to be observant of national specialties when it comes to mastering the human resource part of the data domains affected by GDPR.

Do you see other fields where GDPR will become national within the Union?

GDPR Data Portability and Master Data Sharing

One of the controversial principles in the upcoming EU GDPR enforcement is the concept of data portability as required in article 20.

In legal lingo data portability means: “Where the data subject has provided the personal data and the processing is based on consent or on a contract, the data subject shall have the right to transmit those personal data and any other information provided by the data subject and retained by an automated processing system, into another one, in an electronic format which is commonly used, without hindrance from the controller from whom the personal data are withdrawn.”

In other words, if you are processing personal data provided by a (prospective) customer or other kind of end user of your products and services, you must be able to hand these data over to your competitor.
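As a minimal, hedged sketch of what that could look like technically, the snippet below exports the personal data a data subject has provided into a commonly used electronic format (JSON in this case). The record layout and field names are my own assumptions for illustration, not a format prescribed by the regulation.

```python
import json

# Illustrative example: personal data the data subject provided under consent or contract.
# The field names and structure are assumptions for this sketch, not a GDPR-mandated schema.
provided_data = {
    "subject_id": "C-10293",
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "delivery_addresses": [
        {"street": "Main Street 1", "city": "Copenhagen", "postal_code": "1050", "country": "DK"},
    ],
    "marketing_preferences": {"newsletter": True, "sms": False},
}

def export_for_portability(data: dict, path: str) -> None:
    """Write the data subject's provided personal data to a commonly used,
    machine-readable format (JSON) so it can be handed over without hindrance."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

export_for_portability(provided_data, "subject_C-10293_portability.json")
```

The point is simply that the data provided by the data subject must be extractable in a structured, machine-readable form that another controller can take in.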

I am sure this is a new way of handling party master data for almost every business. However, sharing master data with your competitors is not new when it comes to product master data, as examined in the post Toilet Seats and Data Quality.

Sharing party master data with your competitors will be yet another Sunny Side of GDPR.

The Sunny Side of GDPR

Don’t panic about GDPR. Don’t neglect it either. Be happy.

Recently Ditte Brix Andersen of Stibo Systems wrote a blog post called Preparing for GDPR – Burden or Opportunity?

As Ditte writes, the core implication of GDPR is: “Up until now, businesses have traditionally ‘owned’ the personal data of their customers, employees and other individuals. But from May 25th, 2018 individuals will be given several new personal data rights, putting the ownership right back in to the hands of each individual”.

I agree with Ditte that the GDPR coming into force can be seen as an opportunity for businesses instead of a burden. Adhering to GDPR will urge you to:

  • Have a clear picture of where you store personal data. This is not bad for business either.
  • Express a commonly understood idea of why you store personal data. Also very good for business.
  • Know who can access and update personal data. A basic need for risk handling in your business.
  • Document what kinds of personal data you handle. This equally makes sense for running your business.
  • Think through how you obtain consent to handle personal data. This makes your business look smart as well.
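As a minimal sketch of how these habits could be captured in a simple personal data inventory, here is one possible record structure. The fields and the example entry are assumptions made for illustration, not a prescribed GDPR register format.

```python
from dataclasses import dataclass

@dataclass
class PersonalDataInventoryEntry:
    """One entry in a simple record of processing activities (illustrative only)."""
    data_element: str          # what kind of personal data is handled
    storage_location: str      # where it is stored
    purpose: str               # why it is stored
    access_roles: list[str]    # who can access and update it
    legal_basis: str           # e.g. consent, contract, legitimate interest

inventory = [
    PersonalDataInventoryEntry(
        data_element="Customer email address",
        storage_location="CRM system, EU data centre",
        purpose="Order confirmations and service messages",
        access_roles=["Customer Service", "Marketing"],
        legal_basis="contract",
    ),
]
```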

In fact, after applying these good habits to personal data you should continue with other kinds of party master data and all other kinds of master data. The days of keeping your own little secret versions of what seems to be the truth, even partly to yourself, are over. Start working in the open, as exemplified in the concept of Master Data Share.

Country of Origin: An Increasingly Complex Data Element

When you buy stuff, one of the characteristics you may put emphasis on is where the stuff is made: the country of origin.

Buying domestic goods has always been both a political issue and something that, in people’s minds, may be an extra sign of quality. When I lived in the UK I noticed that meat was promoted as British (except perhaps Danish bacon). Now that I am back in Denmark, all meat seems to be best when made in Denmark (except perhaps an Argentinian beef). However, regulations have already affected the made-in marking for meat, so you have to state several countries of origin across the product lifecycle.

Luxury shoes of multi-cultural origin

For some goods a given country of origin seems to be a sign of quality. With luxury goods such as fine shoes you can still get away with stating Italy or France as the country of origin while most of the work has been done elsewhere, as told in this article from The Guardian: Revealed: the Romanian site where Louis Vuitton makes its Italian shoes.

Country of origin is a product data element that you need to handle for regulatory reasons, not least when moving goods across borders. Here it is connected with commodity codes telling what kind of product it is in the customs way of classifying products, as examined in the post Five Product Classification Standards.

When working with product data management for products that move across borders, you are increasingly asked to be more specific about the country of origin. For example, if you have a product consisting of several parts, you must specify the country of origin for each part.
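A minimal sketch of how such a requirement could be reflected in a product data structure is shown below. The part names, the commodity code and the ISO country codes are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Part:
    name: str
    country_of_origin: str   # ISO 3166-1 alpha-2 code, e.g. "RO"

@dataclass
class Product:
    sku: str
    commodity_code: str      # e.g. a Harmonized System commodity code (illustrative)
    parts: list[Part]

    def countries_of_origin(self) -> set[str]:
        """All countries of origin represented across the product's parts."""
        return {part.country_of_origin for part in self.parts}

shoe = Product(
    sku="LUX-SHOE-001",
    commodity_code="6403.59",   # illustrative footwear commodity code
    parts=[
        Part(name="Upper", country_of_origin="RO"),
        Part(name="Sole", country_of_origin="IT"),
        Part(name="Finishing and assembly", country_of_origin="IT"),
    ],
)

print(shoe.countries_of_origin())   # e.g. {'RO', 'IT'} (set order may vary)
```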

Obstacles to Product Information Sharing

In a recent poll on this blog we had this question about how to share product information with trading partners:

As a manufacturer: What is Your Toughest Product Information Sharing Issue?

The result turned out as seen below:

Survey result

Product information flow in supply chains will typically start with manufacturers sharing the detailed hard facts about products to the benefit of downstream partners, as examined in the post Using Internal and External Product Information to Win.

This survey points to the main reason why this does not take place: manufacturers need to mature in handling and consolidating product information internally before they are confident in sharing the detailed data elements (in an automated way) with their downstream partners. This subject was elaborated in the post Product Information Sharing Issue No 1: We Need to Mature Internally.

Another obstacle is the lack of a common standard for product information in the business ecosystem of which the manufacturer is a part, as further examined in the post Product Information Sharing Issue No 2: No Viable Standard.

Issue no 3 is the apparent absence of a good solution for sharing product information with trading partners that suits the whole business ecosystem. I guess it is needless to say to regular readers of this blog that, besides addressing issue no 1 and issue no 2, that solution is Product Data Lake.

Master Data or Shared Data or Critical Data or What?

What master data is and what Master Data Management (MDM) is are recurring subjects on this blog, as is the question of whether we need the term master data and the concept of MDM at all. Recently I read two interesting articles on this subject.

Andrew White of Gartner wrote the post Don’t You Need to Understand Your Business Information Architecture?

In it, Andrew mentions this segmentation of data:

  • Master data – widely referenced, widely shared across core business processes, defined initially and only from a business perspective
  • Shared application data – less widely but still shared data, between several business systems, that links to master data
  • Local application data – not shared at all outside the boundary of the application in mind, that links to shared application and master data
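As a small, illustrative sketch, this segmentation could be carried as metadata on the data elements in a business information architecture. The segment names follow the list above, while the example data elements are my own assumptions.

```python
from enum import Enum

class DataSegment(Enum):
    MASTER = "master data"                            # widely referenced and shared across core processes
    SHARED_APPLICATION = "shared application data"    # shared between several systems, links to master data
    LOCAL_APPLICATION = "local application data"      # stays inside one application, links to the two above

# Illustrative tagging of data elements (the element names are assumptions):
data_catalogue = {
    "customer_id": DataSegment.MASTER,
    "customer_credit_limit": DataSegment.SHARED_APPLICATION,
    "web_shop_session_id": DataSegment.LOCAL_APPLICATION,
}

for element, segment in data_catalogue.items():
    print(f"{element}: {segment.value}")
```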

Teemu Laakso of Kone Corporation has just changed his title from Head of Master Data Management to Head of Data Design and published an article called Master Data Management vs. Data Design?

In it, Teemu asks:

What’s wrong in the MDM angle? Well, it does not make any business process to work and therefore doesn’t create a direct business case. What if we removed the academic borderline between Master Data and other Business Critical data?

The shared sentiment, as I read it, between the two pieces is that you should design your “business information architecture” and the surrounding information governance so that “Data Design Equals Business Design”.

My take is that you must look from one level up to get the full picture. That means considering how your business information architecture fits into the business ecosystem of which your enterprise is a part, and thereby having the same master data, sharing the same critical data and then operating your own data that links to the shared critical data and the business-ecosystem-wide master data.


Product Information Sharing Issue No 2: No Viable Standard

A current poll on sharing product information with trading partners running on this blog has this question: As a manufacturer: What is Your Toughest Product Information Sharing Issue?

Some votes in the current standing have gone to this answer:

There is no viable industry standard for our kind of products

Indeed, having a standard that all your trading partners use too would be Utopia.

This is, however, not the situation for most participants in supply chains. There are many standards out there, but each is applicable to a certain group of products, geography or purpose, as explained in the post Five Product Classification Standards.

At Product Data Lake we embrace all these standards. If you use the same standard in the same version as your trading partner, linking and transformation is easy. If you do not, you can use Product Data Lake to link and transform from your way to the way your trading partners handle product information. Learn more at Product Data Lake Documentation and Data Governance.

Attribute Types: the tagging scheme used in Product Data Lake attributes (metadata)
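Conceptually, linking and transforming between standards comes down to mapping attribute identifiers and converting values from your scheme to the one your trading partner uses. The sketch below is a generic illustration of that idea and not the actual Product Data Lake implementation; the attribute names, codes and conversions are assumptions.

```python
# Generic illustration of attribute mapping between two product information standards.
# The attribute names and conversions are assumptions, not taken from any real standard.

attribute_mapping = {
    # (source attribute) : (target attribute, value transformation)
    "colour":    ("Color",           lambda v: v.upper()),
    "length_cm": ("LengthInMM",      lambda v: v * 10),
    "country":   ("CountryOfOrigin", lambda v: v),
}

def transform(source_record: dict) -> dict:
    """Transform a product record from 'our' scheme to the trading partner's scheme."""
    target = {}
    for source_attr, value in source_record.items():
        if source_attr in attribute_mapping:
            target_attr, convert = attribute_mapping[source_attr]
            target[target_attr] = convert(value)
    return target

print(transform({"colour": "red", "length_cm": 25, "country": "DK"}))
# {'Color': 'RED', 'LengthInMM': 250, 'CountryOfOrigin': 'DK'}
```

In practice the mapping itself is the hard part and needs to be agreed and governed between the trading partners; the transformation is then just applying it consistently.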

Product Information Sharing Issue No 1: We Need to Mature Internally

A current poll on sharing product information with trading partners running on this blog has this question: As a manufacturer: What is Your Toughest Product Information Sharing Issue?

The most votes in the current standing have gone to this answer:

We must first mature in handling our product information internally

Solving this issue is one of the things we do at Liliendahl.com. Besides being an advisory service in the Master Data Management (MDM) and Product Information Management (PIM) space, we have a developing collaboration with companies providing consultancy, cleansing and, when you come to that step, specialized technology for inhouse MDM and PIM. Take a look at Our Business Ecosystem.

If you are a manufacturer with a limited need for scaling the PIM technology part and already have many of your needs covered by an ERP and/or Product Lifecycle Management (PLM) solution, you may also fulfill your inhouse PIM capabilities and the external sharing needs in one go by joining Product Data Lake.

Using Internal and External Product Information to Win

When working with product information I usually put the data into this five-level model:

Five levels

The model is explained in the post Five Product Data Levels.

As a downstream participant in supply chains, being a distributor or retailer, your success depends on whether you can do better than other businesses of your kind (increasingly including marketplaces) fighting over the same customer prospects. One weapon in doing that is using product information.

Here you must consider where you should use industry-wide available data, typically coming from the manufacturer, and where you should create your own data.

I usually see that companies tend to use industry-wide available data in the blue section below:

Internal and external product information

The white area, the internally created data, is:

  • Level 1: Basic product data with your internal identifiers as well as supplier data that reflects your business model
  • Level 5: Competitive data with your better product stories, your unique up-sell and cross-sell opportunities and your choice of convincing advanced digital assets
  • Level 3 in part: Your product description (perhaps in multiple languages) that is consistent with other products you sell and a product image that could be the one provided by the manufacturer or one you shoot yourself.
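As a minimal sketch, the split between externally sourced and internally created product information could be modelled along these lines. The attribute names are illustrative assumptions; only the white/blue split follows the model above.

```python
from dataclasses import dataclass, field

@dataclass
class ProductRecord:
    # Externally sourced data (the blue area), typically onboarded from the manufacturer:
    manufacturer_item_number: str
    technical_attributes: dict = field(default_factory=dict)   # the detailed hard facts
    manufacturer_image_url: str = ""

    # Internally created data (the white area):
    internal_sku: str = ""                                      # level 1: your own identifiers
    product_description: str = ""                               # level 3 in part: your own wording
    up_sell_and_cross_sell_skus: list = field(default_factory=list)  # level 5: competitive data

record = ProductRecord(
    manufacturer_item_number="MFR-123",
    technical_attributes={"material": "leather", "weight_g": 850},
    internal_sku="SKU-0042",
    product_description="Classic leather shoe, handcrafted finish.",
)
```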

Obviously, creating internal product data that works better than your competitors’ is a way to win.

For the blue area, the externally created data, your way of winning is related to how good you are at onboarding this data from your upstream trading partners, being manufacturers and upstream distributors, or how good you are at exploiting available product data pools and industry-specific product data portals.

In doing that, connect is better than collect. You can connect by using Product Data Pull.

5 Data Management Mistakes to Avoid during Data Integration Projects


I am very pleased to welcome today’s guest blogger. Canada-based Maira Bay de Souza of Product Data Lake Technologies shares her view on data integration and the mistakes to avoid when doing it:

Throughout my 5 years of working with Data Integration, Data Migration and Data Architecture, I’ve noticed some common (but sometimes serious) mistakes related to Data Management and Software Quality Management. I hope that by reading about them you will be able to avoid them in your future Data Integration projects.

1 Ignoring Data Architecture

Defining the Data Architecture in a Data Integration project is the equivalent of defining the Requirements in a normal (non-data-oriented) software project. A normal software application is (most of the time) defined by its actions and interactions with the user. That’s why, in the first phase of software development (the Requirements Phase), one of the key steps is creating Use-Cases (or User Stories). On the other hand, a Data Integration application is defined by its operations on datasets. Interacting with data structures is at the core of its functionality. Therefore, we need to have a clear picture of what these data structures look like in order to define what operations we will do on them.

It is widely accepted in normal software development that having well-defined requirements is key to success. The common saying “If you don’t know where you’re going, any road will get you there” also applies to Data Integration applications. When ETL developers don’t have a clear definition of the Data Architecture they’re working with, they will inevitably make assumptions. Those assumptions might not always be the same as the ones you, or worse, your customer, made.

(see here and here for more examples of the consequences of not finding software bugs early in the process due to badly defined requirements)

 Simple but detailed questions like “can this field be null or not?” need to be answered. If the wrong decision is made, it can have serious consequences. Most senior Java programmers like me are well aware of the infamous “Null Pointer Exception“. If you feed a null value to a variable that doesn’t accept null (but you don’t know that that’s the case because you’ve never seen any architecture specification), you will get that error message. Because it is a vague message, it can be time-consuming to debug and find the root cause (especially for junior programmers): you have to open your ETL in the IDE, go to the code view, find the line of code that is causing the problem (sometimes you might even have to run the ETL yourself), then find where that variable is located in the design view of your IDE, add a fix there, test it to make sure it’s working and then deploy it in production again. That also means that normally, this error causes an ETL application to stop functioning altogether (unless there is some sort of error handling). Depending on your domain that can have serious, life-threatening consequences (for example, healthcare or aviation), or lead to major financial losses (for example, e-commerce).

 Knowing the format, boundaries, constraints, relationships and other information about your data is imperative to developing a high quality Data Integration application. Taking the time to define the Data Architecture will prevent a lot of problems down the road.
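As a minimal sketch of how part of a Data Architecture definition, such as nullability and simple format constraints per field, could be written down and checked before the data reaches the rest of the ETL, consider the following. The field names and rules are assumptions made for illustration.

```python
# Illustrative field specification: name -> (nullable?, validation function).
# The fields and rules are assumptions for this sketch, not a specific project's architecture.
field_spec = {
    "customer_id": (False, lambda v: isinstance(v, str) and len(v) > 0),
    "postal_code": (True,  lambda v: isinstance(v, str)),
    "order_total": (False, lambda v: isinstance(v, (int, float)) and v >= 0),
}

def validate_row(row: dict) -> list[str]:
    """Return a list of violations instead of failing late with a vague runtime error."""
    problems = []
    for name, (nullable, check) in field_spec.items():
        value = row.get(name)
        if value is None:
            if not nullable:
                problems.append(f"{name} must not be null")
        elif not check(value):
            problems.append(f"{name} has an invalid value: {value!r}")
    return problems

print(validate_row({"customer_id": None, "postal_code": "1050", "order_total": 99.5}))
# ['customer_id must not be null']
```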

2 Doing Shallow Data Profiling

Data profiling is another key element to developing good Data Integration applications.

 When doing data profiling, most ETL developers look at the current dataset in front of them, and develop the ETL to clean and process the data in that dataset. But unfortunately that is not enough. It is important to also think about how the dataset might change over time.

For example, let’s say we find a customer in our dataset with the postal code in the city field. We then add an instruction in the ETL so that, when we find that specific customer’s data, it extracts the postal code from the city field and puts it in the postal code field. That works well for the current dataset. But what if next time we run the ETL another customer has the same problem? (It could be because the postal code field only accepts numbers and now we are starting to have Canadian customers, who have numbers and letters in the postal code, so the user started putting the postal code in the city field.)

Not thinking about future datasets means your ETL will only work for the current dataset. However, we all know that data can change over time (as seen in the example above), and if it is inputted by the user, it can change unpredictably. If you don’t want to be making updates to your ETL every week or month, you need to make it flexible enough to handle changes in the dataset. You should use data profiling not only to analyse current data, but also to deduce how it might change over time.
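As a minimal sketch of the difference, the rule below detects the pattern found during profiling (a postal code sitting in the city field) instead of hard-coding a fix for the one customer where it was first observed, so future rows with the same problem are handled too. The field names and the Canadian postal code pattern are assumptions for illustration.

```python
import re

# Canadian postal codes look like "A1A 1A1" (letter-digit-letter digit-letter-digit).
CANADIAN_POSTAL_CODE = re.compile(r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$")

def fix_postal_code_in_city(row: dict) -> dict:
    """If the city field actually holds a postal code and the postal code field is empty,
    move the value over - for any row, not just the one customer found during profiling."""
    city = (row.get("city") or "").strip()
    if CANADIAN_POSTAL_CODE.match(city) and not row.get("postal_code"):
        row["postal_code"] = city.upper()
        row["city"] = ""
    return row

print(fix_postal_code_in_city({"city": "K1A 0B1", "postal_code": ""}))
# {'city': '', 'postal_code': 'K1A 0B1'}
```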

Doing deep data profiling in the beginning of your project means you will spend less time making updates to the Data Cleaning portion of your ETL in the future.

3 Ignoring Data Governance

 This point goes hand-in-hand with my last one.

 A good software quality professional will always think about the “what if” situations when designing their tests (as opposed to writing tests just to “make sure it works”). In my 9 years of software testing experience, I can’t tell you how many times I asked a requirements analyst “what if the user does/enters [insert strange combination of actions/inputs here]?” and the answer was almost always “the user will never do that“. But the reality is that users are unpredictable, and there have been several times when the user did what they “would never do” with the applications I’ve tested.

The same applies to data being inputted into an ETL. Thinking that “data will never come this way” is similar to saying “the user will never do that“. It’s better to be prepared for unexpected changes in the dataset instead of leaving it to be fixed later on, when the problem has already spread across several different systems and data stores. For example, it’s better to add validation steps to make sure that a postal code is in the right format, instead of making no validation and later finding provinces in the postal code field. Depending on your data structures, how dirty the data is and how widespread the problem is, the cost to clean it can be prohibitive.
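A minimal sketch of such a validation step could look like this: rows whose postal code does not match the expected format are set aside for review instead of being loaded and spreading into downstream systems. The expected format here is an assumption for illustration.

```python
import re

# Assumed target format: Canadian postal code, e.g. "K1A 0B1".
POSTAL_CODE_FORMAT = re.compile(r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$")

def split_valid_and_rejected(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate rows with a well-formed postal code from rows that need review,
    so bad values are caught at the gate instead of in downstream systems."""
    valid, rejected = [], []
    for row in rows:
        if POSTAL_CODE_FORMAT.match(row.get("postal_code", "") or ""):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected

valid, rejected = split_valid_and_rejected([
    {"customer": "A", "postal_code": "K1A 0B1"},
    {"customer": "B", "postal_code": "Ontario"},   # a province in the postal code field
])
print(len(valid), len(rejected))   # 1 1
```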

This also relates to my first point: a well-defined Data Architecture is the starting point to implementing Data Governance controls.

 When designing a high quality Data Integration application, it’s important to think of what might go wrong, and imagine how data (especially if it’s inputted by a human) might be completely different than you expect. As demonstrated in the example above, designing a robust ETL can save hours of expensive manual data cleaning in the future.

4 Confusing Agile with Code-And-Fix

A classic mistake in startups and small software companies (especially those run by people without a comprehensive education or background in Software Engineering) is rushing into coding and leaving design and documentation behind. That’s why the US Military and CMU created the CMMI: to measure how (dis)organized a software company is, and help them move from amateur to professional software development. However, the compliance requirements for a high maturity organization are impractical for small teams. So things like XP, Agile, Scrum, Lean, etc. have been used to make small software teams more organized without getting slowed down by compliance paperwork.

Those techniques, along with iterative development, proved to be great for startups and innovative projects due to their flexibility. However, they can also be a slippery slope, especially if managers don’t understand the importance of things like design and documentation. When the deadlines are hanging over a team’s head, the tendency is always to jump into coding and leave everything else behind. With time, managers start confusing agile and iterative development with code-and-fix.

 Throughout my 16 years of experience in the Software Industry, I have been in teams where Agile development worked very well. But I have also been in teams where it didn’t work well at all – because it was code-and-fix disguised as Agile. Doing things efficiently is not the same as skipping steps.

Unfortunately, in my experience this is no different in ETL development. Because it is such a new and unpopular discipline (as opposed to, for example, web development), there aren’t a lot of software engineering tools and techniques around it. ETL design patterns are still in their infancy, still being researched and perfected in the academic world. So the slippery slope from Agile to code-and-fix is even more tempting.

 What is the solution then? My recommendation is to use the proven, existing software engineering tools and techniques (like design patterns, UML, etc) and adapt them to ETL development. The key here is to do something. The fact that there is a gap in the industry’s body of knowledge is no excuse for skipping requirements, design, or testing, and jumping into “code-and-fix disguised as Agile“. Experiment, adapt and find out which tools, methodologies and techniques (normally used in other types of software development) will work for your ETL projects and teams.

5 Not Paying Down Your Technical Debt

The idea of postponing parts of your to-do list until later because you only have time to complete a portion of them now is not new. But unfortunately, with the popularization of agile methodologies and incremental development, Technical Debt has become an easy way out of running behind schedule or budget (and masking the root cause of the problem which was an unrealistic estimate).

As you might have guessed, I am not the world’s biggest fan of Technical Debt. But I understand that there are time and money constraints in every project. And even the best estimates can sometimes be very far from reality – especially when you’re dealing with a technology that is new for your team. So I am ok with Technical Debt, when it makes sense.

However, some managers seem to think that technical debt is a magic box where we can place all our complex bugs, and somehow they will get less complex with time. Unfortunately, in my experience, what happens is the exact opposite: the longer you owe technical debt (and the more you keep adding to it), the more complex and patchy the application becomes. If you keep developing on top of – or even around – an application that has a complex flaw, it is very likely that you will only increase the complexity of the problem. Even worse, if you keep adding other complex flaws on top of – or again, even around – it, the application becomes exponentially complex. Your developers will want to run away each time they need to maintain it. Pretty soon you end up with a piece of software that looks more like a Frankenstein monster than a clean, cohesive, elegant solution to a real-world problem. It is then only a matter of time (usually very short time) before it stops working altogether and you have no choice but to redesign it from scratch.

This (unfortunately) frequent scenario in software development is already a nightmare in regular (non-data-oriented) software applications. But when you are dealing with Data Integration applications, the impact of dirty data or ever-changing data (especially if it’s inputted by a human), combined with the other 4 Data Management mistakes I mentioned above, can quickly escalate this scenario into a catastrophe of epic proportions.

So how do you prevent that from happening? First of all, you need to have a plan for when you will pay your technical debt (especially if it is a complex bug). The more complex the required change or bug is, the sooner it should be dealt with. If it impacts a lot of other modules in your application or ecosystem, it is also important to pay it off sooner rather than later. Secondly, you need to understand why you had to go into technical debt, so that you can prevent it from happening again. For example, if you had to postpone features because you didn’t get to them, then you need to look at why that happened. Did you under-estimate another feature’s complexity? Did you fail to account for unknown unknowns in your estimate? Did sales or your superior impose an unrealistic estimate on your team? The key is to stop the problem in its tracks and make sure it doesn’t happen again. Technical Debt can be helpful at times, but you need to manage it wisely.

 I hope you learned something from this list, and will try to avoid these 5 Data Management and Software Quality Management mistakes on your next projects. If you need help with Data Management or Software Quality Management, please contact me for a free 15-min consultation.

Maira holds a BSc in Computer Science, 2 software quality certifications and over 16 years of experience in the Software Industry. Her open-mindedness and adaptability have allowed her to thrive in a multidisciplinary career that includes Software Development, Quality Assurance and Project Management. She has taken senior and consultant roles at Fortune 20 companies (IBM and HP), as well as medium and small businesses. She has spent the last 5 years helping clients manage and develop software for Data Migration, Data Integration, Data Quality and Data Consistency. She is a Product Data Lake Ambassador & Technology Integrator through her startup Product Data Lake Technologies.