Forget ‘Blue Sky Thinking’ – Where Data Quality is Concerned it’s All About ‘Life Cycle Thinking’
by Brian Rutherford
Operations Director, Eyecademy
With corporate data growing at over 40% each year, and with up to 25% of that data corrupted, there are teams sitting in meeting rooms up and down the country stating, “We know we have a data quality problem, but we just don’t know what to do about it”.
“Begin at the beginning,” the King said, very gravely, “and go on till you come to the end: then stop.” – Lewis Carroll, Alice in Wonderland
Where to start? It’s an obvious enough question, but the answer can be more elusive. You know where the issues are. For example, when examining a customer account you discover duplicated records, misassigned balances and misapplied credit ratings. The knock-on effects are dissatisfied customers and damage to your reputation. You know Root Cause Analysis is needed, but your data goes through so many stages before it reaches the end-users that getting to the root of the problem looks likely to demand a heavy investment of both time and budget.
One approach that helps bring clarity to the problem is ‘Life Cycle Thinking’. A Life Cycle approach refocuses your analysis from the data itself to the data’s journey through your organisation, and to the departments and functions that interact with that data. Life Cycle Thinking helps you analyse and segment activities so that you can identify what is happening and at what stage of the life cycle it is taking place. The data life cycle has six stages, although it’s very likely that only a few functions will encounter them all.
- Plan: Prepare for the data.
- Obtain: Acquire the data.
- Store & Share: Hold the data electronically or in hardcopy and share it through some kind of distribution method.
- Maintain: Ensure the data continues to work properly.
- Apply: Use the data to accomplish your goals.
- Delete: Discard the data that is no longer in use.
A Reusable Resource
Data quality is affected by activities in all the stages of the life cycle. Every stage of the Data Life Cycle has a cost, but it’s only when you Apply the data that you get value from it. This means that when you Apply bad data, its impact will be either reduced or negative. Not only that: since data is a reusable resource, that negative impact can be repeated again and again. Just as good information increases in value the more you use it, bad data does more damage the more you use it.
Beginning At The Beginning
The good news is that, by applying Data Life Cycle Thinking, you identify the activities that impact data quality and make the start that has so far eluded you.
For example, imagine you are responsible for the Customer Credit Data in a large financial institution. The head of Credit Decision Monitoring is concerned about the quality of the customer credit data that supports his department. If he were to describe the organisation to you, then you would need to consider the teams involved in each area.
Which teams have input into the planning process for the customer information? Which obtain the data? Who uses or applies the customer credit information? Who maintains the data and who can dispose of it?
At a high level, you might end up with the diagram to the right.
With a quick 10-minute conversation, you can identify that the Electronic Lending Platform (ELP) team and the IT team both Obtain information in various ways and both Maintain it. This makes sense, as the data is often obtained from customers directly by the ELP team through telephone conversations and occasional face-to-face meetings. Meanwhile, IT Obtain their data through a large customer information data warehouse. Therefore, to avoid duplicate customer records, there needs to be a process for identifying existing customers when adding new customers through the ELP.
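As a rough illustration of such a process, the Python sketch below checks a new ELP customer against records already held in the warehouse before a new record is created. The field names (`name`, `date_of_birth`, `postcode`), the sample records and the exact-match rule are all assumptions made for this example; a real matching process would be agreed with the teams involved and would probably need fuzzier rules.

```python
import re
import unicodedata


def normalise(text: str) -> str:
    """Lower-case, strip accents and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def match_key(record: dict) -> tuple:
    """Build a comparison key from hypothetical fields (an assumption)."""
    return (
        normalise(record["name"]),
        record["date_of_birth"],
        normalise(record["postcode"]).replace(" ", ""),
    )


def find_existing(new_customer: dict, warehouse: list) -> list:
    """Return warehouse records whose key equals the new customer's key."""
    key = match_key(new_customer)
    return [rec for rec in warehouse if match_key(rec) == key]


# Invented sample data, for illustration only.
warehouse = [
    {"id": "C001", "name": "Anne O'Neil", "date_of_birth": "1980-02-14", "postcode": "G1 1AA"},
    {"id": "C002", "name": "John Smith", "date_of_birth": "1975-07-01", "postcode": "EH1 2BB"},
]
new_customer = {"name": "ANNE O NEIL", "date_of_birth": "1980-02-14", "postcode": "g11aa"}

matches = find_existing(new_customer, warehouse)
if matches:
    print("Possible existing customer:", [rec["id"] for rec in matches])
else:
    print("No match found; safe to create a new record")
```

The point is less the matching rule itself than the fact that both teams run the same check, against the same key, before creating a record.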
Do all of the teams ‘obtaining’ data for your organisation receive the same data entry training? Do they work to the same set of data entry standards?
If the answer is ‘No’, then you have identified a data quality problem; you just don’t know how big it is or which pieces of data are most affected. To use our finance example again, you can see the ELP team Obtains, Maintains and Applies the data, but they are not involved in the Planning stage. This suggests that important requirements are being missed, which could be affecting data quality. The ELP and IT teams are both able to Maintain and Dispose of customer records. Unless there are clear guidelines that both teams follow strictly and consistently, you have likely identified another data quality issue.
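The same mapping can be written down and checked mechanically. The sketch below records which teams touch which stages and then lists the stages handled by more than one team, along with the teams that work with the data but are not involved in planning. The ELP and IT entries follow the example above; the "Data Governance" planning team, the Apply role for Credit Decision Monitoring and the Store & Share role for IT are assumptions added only to make the example complete.

```python
from enum import Enum


class Stage(Enum):
    PLAN = "Plan"
    OBTAIN = "Obtain"
    STORE_SHARE = "Store & Share"
    MAINTAIN = "Maintain"
    APPLY = "Apply"
    DELETE = "Delete"  # referred to as "Dispose" in the finance example


TEAM_STAGES = {
    # From the example: ELP Obtains, Maintains, Applies and can Dispose.
    "ELP": {Stage.OBTAIN, Stage.MAINTAIN, Stage.APPLY, Stage.DELETE},
    # From the example: IT Obtains, Maintains and can Dispose.
    # Store & Share is an assumption (IT runs the data warehouse).
    "IT": {Stage.OBTAIN, Stage.STORE_SHARE, Stage.MAINTAIN, Stage.DELETE},
    # Assumption: this department uses the credit data it is supported by.
    "Credit Decision Monitoring": {Stage.APPLY},
    # Hypothetical planning team, not named in the article.
    "Data Governance": {Stage.PLAN},
}


def shared_stages(team_stages):
    """Stages handled by more than one team: candidates for shared standards."""
    owners = {}
    for team, stages in team_stages.items():
        for stage in stages:
            owners.setdefault(stage, []).append(team)
    return {stage: sorted(teams) for stage, teams in owners.items() if len(teams) > 1}


def outside_planning(team_stages):
    """Teams that Obtain, Maintain or Apply data but take no part in Plan."""
    hands_on = {Stage.OBTAIN, Stage.MAINTAIN, Stage.APPLY}
    return sorted(
        team
        for team, stages in team_stages.items()
        if stages & hands_on and Stage.PLAN not in stages
    )


for stage, teams in shared_stages(TEAM_STAGES).items():
    print(f"{stage.value}: shared by {', '.join(teams)}")
print("Not involved in planning:", ", ".join(outside_planning(TEAM_STAGES)))
```

Each line of output is a question to take back to the teams rather than a confirmed defect: do they share standards for that stage, and were their requirements captured at the planning stage?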
Are all of the needs of those entering data being met? For example, during a data project for a large financial institution, I discovered one team were able to create customer numbers manually and would often add extra letters at the end of the Customer Number to indicate whether the customer was retail or non-retail. This led to a non-standard customer number and a mountain to climb when they needed to link this record to customer data elsewhere. In our example, the ELP team might be doing something very similar. If so, then we have discovered yet another data quality problem.
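To make the customer number problem concrete, here is a minimal Python sketch that separates the numeric customer number from a trailing letter code, so the segment can be stored in its own field and the number can be linked to records elsewhere. The format (digits followed by optional letters) and the codes "R" for retail and "N" for non-retail are hypothetical; the actual codes used in that project are not given here.

```python
import re

# Hypothetical suffix codes, assumed for illustration only.
SUFFIX_MEANING = {"R": "retail", "N": "non-retail"}

# Assumed format: digits, then an optional run of letters added by hand.
PATTERN = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")


def split_customer_number(raw: str):
    """Return (standard_number, segment) for a manually keyed customer number.

    segment is None when there is no suffix and "unknown" when the suffix
    is not one of the assumed codes, so that it can be reviewed by hand.
    """
    match = PATTERN.match(raw)
    if match is None:
        raise ValueError(f"Unrecognised customer number: {raw!r}")
    number, suffix = match.groups()
    if not suffix:
        return number, None
    return number, SUFFIX_MEANING.get(suffix.upper(), "unknown")


for raw in ["1004521R", "1004521", " 77n ", "1004521X"]:
    print(repr(raw), "->", split_customer_number(raw))
```

Cleaning up after the fact is only half the fix. The underlying need, a way to record whether a customer is retail or non-retail, still has to be met with a proper field at the point of data entry.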
Applying Data Life Cycle Thinking
The process of mapping life cycle thinking to the business can be as high level or as detailed as you want. Although we have only used a financial example here, the same process can be applied to many other areas of your business data. One example is mapping out where your company interacts with customers, which is especially useful for organisations that deal with customers both online and offline. Your business might have multiple communications or touchpoints with your customer, multiple ways to store customer data and various ways of applying it. Some of those communications might also involve re-contacting the customer. Applying the Data Life Cycle method by ‘tagging’ each interface with the customer with the stage it belongs to provides a clear, structured way to identify where your data quality issues might be.
There is no magic bullet, but with data quality management becoming an increasingly important and integral part of business success, Data Life Cycle Thinking provides a clear and consistent methodology to tackle the ever-increasing challenges that the data era poses.
“Begin at the beginning with Data Lifecycle Thinking”.
This article was originally published as a guest blog written by Eyecademy’s Operations Director Brian Rutherford on The Data Lab.