Data Quality – What It Is & Why It Matters For You
In a perfect world, your data would be well organised with no missing values, duplicates, errors, or out-of-date information. Metadata would always be complete, and you would never have trouble finding the information you need.
Unfortunately, we don’t live in a perfect world with ever-pristine data sets. Instead, data is collected and entered by real people using real systems, meaning that messy data sets are the norm for most businesses.
As a data professional, you know this is a serious problem. Good data is critical for your organisation’s success, and the cost of bad data is high. The vast majority of business leaders agree with you, but the problem of poor data quality persists.
Part of the problem is that ‘data quality’ isn’t well understood. Everyone knows bad data is generally bad for business, but they don’t have a deep understanding of the concept of data quality, the tangible impacts of poor data quality on your business, or how to effectively assess and manage data quality.
To help remedy the situation and give you the tools you need to improve the quality of your data, we’ve dedicated this page to this important topic.
What is data quality?
The first question to address is simple: what is data quality? Finding a single answer is surprisingly difficult, because the definition depends on your perspective. It’s different for data managers, data providers, data scientists, and data users, but there are some common guiding principles.
At its core, data quality is a comparison of the current state of your data vs. the desired state based on user expectations, usage requirements, and defined quality standards. When data quality is good, your data is fit for its intended purpose and conforms to the standards you’ve set. When the quality of your data is poor, it’s out of compliance with established standards and not fit to be used in operational and decision-making processes.
The basic idea of data quality has been around for a long time, but the discipline as it’s practised now was inspired by total quality management (TQM). TQM became popular in the 1970s and 80s as a way for manufacturers to improve product quality by applying a standardised and comprehensive approach to quality control. Companies that embraced TQM manufactured higher quality products and experienced huge increases in sales and market share.
Motivated by the success of TQM in improving product quality, many organisations sought to apply a similar systemic approach to data quality. The idea quickly became popular, and today maintaining data quality is widely recognised as essential for digital transformation and future growth.
Understanding the importance of data quality
Your data is more than a collection of useful numbers and letters. It’s a foundational enabler of both your current processes and the decisions that will lead you to future growth. As such, the importance of data quality cannot be overstated.
To get value from your data, you must be able to use it confidently as a basis for your critical processes and key decisions. Think of it like this: good results come from good decisions based on high-quality information. High-quality information is simply good data integrated and evaluated in the context of your business goals and processes.
In other words, good results for your business start with good data, which is an excellent reason to make data quality one of your highest priorities.
The cost of poor data quality
As we mentioned, good data quality supports smart decisions and leads to improved returns for your business, but what about the cost of poor data quality? On a large scale, IBM estimated in 2016 that bad data costs the US economy $3.1 trillion every year. The cost of poor data quality is high for individual companies, too, and it affects your business at every level.
First and most importantly, bad data leads to bad decisions. In this way, having a lot of poor-quality data is worse than having no data at all. Using incomplete, inaccurate, or misleading data for critical analyses leads to incorrect conclusions, which leads to wasting time and money pursuing projects doomed to fail.
Another cost of poor quality data is employee frustration and productivity loss. When the data your team relies on is plagued by quality issues, frustrated employees end up spending significant time on manual data verification. If the issues are bad enough, business teams may even opt to invest in collecting new data rather than risk making a bad decision based on the existing data set. If data quality doesn’t improve, this costly cycle of lost productivity and data re-collection repeats indefinitely.
Poor data quality affects your customers, too. Basing your marketing efforts on duplicate or inaccurate data leads to off-target outreach and a frustrating customer experience.
Last but not least, poor data quality has a huge opportunity cost. Without a foundation of high-quality data, you can’t succeed at digital transformation or implement AI or machine learning technology to automate processes. Bad data makes it nearly impossible for you to quickly identify and act on opportunities for improvement and growth as a company.
Common data quality management tools & techniques
The “what” and the “why” of data quality are relatively straightforward, but the “how” is more complex. Good data quality management strategies are difficult to implement and maintain. We’ll use this section to look at some common data quality management tools & techniques before briefly introducing the approach we use here at Anmut.
Measuring data quality
Data quality management starts with measuring data quality. There are many ways to measure data quality, some quantitative and others more qualitative. The right approach for you depends on your use case. Common dimensions used to assess and measure data quality include:
- Completeness to check for missing records
- Timeliness to verify data is current
- Uniqueness to avoid duplication
- Consistency across multiple sources or storage locations
- Relevance for the specific use case
- Precision of location or decimal values
- Accuracy to ensure data is correct
- Integrity of data relationships and attributes
- Validity of format, type, range, etc.
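As a rough illustration of how a few of these dimensions can be turned into numbers, here is a minimal Python sketch. It covers only completeness, uniqueness, validity, and timeliness. The sample records, the field names (`id`, `email`, `updated`), the simple email pattern, and the one-year freshness window are all assumptions made for the example, not a recommended standard.

```python
import re
from datetime import date

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "email": "ana@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "updated": date(2024, 4, 20)},
    {"id": 2, "email": "bo@example.com", "updated": date(2021, 1, 15)},
    {"id": 4, "email": "not-an-email", "updated": date(2024, 5, 3)},
]

# Deliberately simple pattern (an assumption); real email validation is stricter.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, field):
    """Number of distinct values divided by the number of records."""
    values = [r[field] for r in records]
    return len(set(values)) / len(values)


def validity(records, field, pattern):
    """Share of non-missing values that match the expected format."""
    present = [r[field] for r in records if r.get(field) not in (None, "")]
    if not present:
        return 0.0
    return sum(1 for v in present if pattern.match(v)) / len(present)


def timeliness(records, field, as_of, max_age_days):
    """Share of records updated within max_age_days of the as_of date."""
    fresh = sum(1 for r in records if (as_of - r[field]).days <= max_age_days)
    return fresh / len(records)


def quality_report(records, as_of):
    """Collect the four scores (each between 0 and 1) in one dictionary."""
    return {
        "completeness_email": completeness(records, "email"),
        "uniqueness_id": uniqueness(records, "id"),
        "validity_email": validity(records, "email", EMAIL_PATTERN),
        "timeliness_updated": timeliness(records, "updated", as_of, 365),
    }


if __name__ == "__main__":
    for name, score in quality_report(RECORDS, date(2024, 5, 10)).items():
        print(f"{name}: {score:.2f}")
```

Each score is a simple ratio between 0 and 1, so you can compare it against whatever threshold your own standards define. Dimensions such as relevance or accuracy usually need business context or a trusted reference source, so they are harder to reduce to a formula like this.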
Data quality frameworks
One approach to data quality management is adopting a data quality framework. The two most common are the total information quality management (TIQM) and total data quality management (TDQM) frameworks.
TDQM was developed based on the practical experiences of manufacturers who used TQM to improve product quality, while TIQM is a research-based approach to data and information quality.
While the specifics differ, both frameworks are intended to help your organisation define your data quality requirements, implement data quality metrics, discover the root cause of your quality issues, and take action to correct the issues and improve your data quality.
Data quality clean-up solutions
Another common data quality management strategy is to turn to technology for help. There are several tools available to help you understand your data assets, measure the quality against your defined standards, and clean up your data issues. The most popular data profiling and clean-up tools include:
- IBM InfoSphere QualityStage
- Informatica Data Quality & Master Data Management
- Talend Data Quality
- Data Ladder
Each tool has unique strengths and weaknesses, but they all include capabilities such as deep data profiling, AI-enabled quality problem detection, and rule-based data handling so you can cleanse your data set and correct your data quality issues.
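To make the idea of rule-based data handling more concrete, here is a minimal Python sketch. It is not how any of the tools above work internally. The `clean_records` function, the rule list, the field names, and the country mapping are all hypothetical and exist only for this example. Each rule is a small function applied to one field, and records are then de-duplicated on a chosen key.

```python
# Hypothetical lookup used by the country rule below (an assumption).
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB"}


def normalise_country(value):
    """Map known country spellings to a code; otherwise upper-case the value."""
    key = value.strip().lower()
    return COUNTRY_CODES.get(key, value.strip().upper())


# Each rule pairs a field name with a function that returns the cleaned value.
RULES = [
    ("email", str.strip),
    ("email", str.lower),
    ("country", normalise_country),
]


def clean_records(records, rules, key):
    """Apply every rule to every record, then keep the first record per key."""
    cleaned = []
    seen = set()
    for record in records:
        row = dict(record)  # copy so the input records are left untouched
        for field, rule in rules:
            if row.get(field) is not None:
                row[field] = rule(row[field])
        key_value = row.get(key)
        if key_value is not None:
            if key_value in seen:
                continue  # duplicate of a record we already kept
            seen.add(key_value)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    raw = [
        {"email": " Ana@Example.com ", "country": "UK"},
        {"email": "ana@example.com", "country": "United Kingdom"},
        {"email": "bo@example.com", "country": "fr"},
    ]
    for row in clean_records(raw, RULES, key="email"):
        print(row)
```

In this sketch the first two records collapse into one once the email address is trimmed and lower-cased. Records with no key value are kept rather than dropped, which is a design choice you may want to change for your own data.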
A more complete approach: Data quality through the lens of data asset management
Data profiling and cleansing tools are useful for correcting issues, and frameworks can help put data quality in perspective, but we don’t recommend either as the basis for your data quality management strategy.
Why?
Because cleaning up your data only fixes symptoms. It doesn’t address the root causes of your quality issues, so it can’t maintain data quality for the long term.
As for the frameworks, TDQM works well for manufacturing businesses, but it’s difficult to apply across industries and business models. The research-based TIQM is seldom successful when applied to real-life organisations.
So what approach do we recommend for effective data quality management?
At Anmut, we’ve seen the value of managing data quality through the lens of a broader data asset management strategy. Data asset management is a value-based approach to data focused on helping you understand and evaluate your data in the context of your core business goals and requirements.
Our results show that, rather than approaching data quality management in isolation, you get more value from your data by managing its quality alongside data governance and data management as part of a holistic data asset management strategy. When you do, your focus shifts from the technical quality of the data to the overall data condition.
This shift enables you to consider the fitness of your data for specific use cases, identify the root causes of your quality issues, and determine the value improving your data quality will bring to your business.
The impact of pivoting from technical data quality management to focusing on data condition as part of a broader data asset management strategy is tremendous. We’ve seen client after client adopt this approach, add real value, and build strong foundations for growth and success.
We never get tired of seeing it or of helping companies like yours get started. Contact us to find out more about what we do and how we can help you reach and exceed your data quality goals.