Improving data quality can provide measurable benefits to an organization’s bottom line. A 2011 report by Gartner, for example, noted that as much as 40% of the anticipated value of all business initiatives is never achieved, and that poor data quality is one of the chief culprits (personally, I suspect that this number is actually on the low end). In addition, according to data my company collected, organizations that are experiencing data quality issues can take a hit in operational efficiency as high as 33%.
Conducting a rigorous data quality assessment can represent a significant effort in terms of time and resources. Fortunately, starting small in profiling data quality is actually a smart way to go. I generally advise clients to start with a focus on a very narrow data set, such as a certain line of business or a subset of customer data. Another smart tactic: before you commit to any particular data profiling solution, ask vendors if you can use their tool for a few months while you test it on your small data set.
Once you get started with the data quality assessment process, you’ll need to take five essential steps before you see significant results. Here they are, in chronological order.
Choose a data profiling tool. Start out by establishing an enterprise data profiling solution, or selecting a data profiling tool. There are several choices on the market, including Informatica Data Quality, as well as products from IBM, Oracle, SAP and others. The price tags on any of these products can be steep, which is another good reason to take one or more for a ‘test drive’ before you buy.
Standardize your assessment approach. Establish a standard analytical method for conducting data quality assessments. Then, once you’ve profiled the data, your data analysts can take the profile outputs and look for patterns and anomalies in a common, normalized way.
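To make "a common, normalized way" a little more concrete, here is a minimal sketch in Python of what a standardized profile of a single column might look like. It is an illustration only, not the output of any particular vendor tool: the function names, the fields in the result, and the sample ZIP code values are all assumptions made for this example.

```python
import re
from collections import Counter


def value_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    text = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", text)


def profile_column(values):
    """Return the same small set of statistics for any column (hypothetical layout)."""
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    patterns = Counter(value_pattern(v) for v in non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        # Most frequent shapes first; rare shapes are candidate anomalies.
        "patterns": patterns.most_common(),
    }


# Made-up sample data: one missing value, one short code, one non-numeric code.
zip_codes = ["30301", "30302", "3030", None, "30301", "ABCDE"]
profile = profile_column(zip_codes)
for pattern, count in profile["patterns"]:
    print(pattern, count)
```

Because every analyst gets the same fields back for every column, the rare patterns stand out in the same place each time, which is the point of agreeing on one method.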
Create a findings repository. Establish a standard communication forum and mechanism, allowing your analysts to share their findings with non-analytical people (including people from your business units, technologists, architects, and other stakeholders across your data community). It’s very important to have a normalized mechanism that allows analysts to post their findings, and that then facilitates review by non-technical audiences.
(As a side note, one of the deliverables my company usually provides at this stage, while we’re building the client’s Office of Data, is what you might call internal marketing. Whatever templates you create to help your data analysts communicate their findings, we’ve found that it pays to have the templates reflect your internal branding. Raw findings can be intimidating to non-technical audiences, and we’ve found that improving the corporate style of these documents makes people more likely to spend time understanding and commenting on them.)
Set up a centralized place for feedback. It’s important to gather what your data community says about any data anomalies you find — and store that feedback in a central location. The benefit of doing so is that it’s highly likely that the same anomaly will appear again somewhere else, and it’s incredibly helpful to know what was said and done about it in previous instances. In addition to capturing the data requirements and insights that people give you, you also need to be able to record any quality rules or algorithms that were built as a result of each requirement. This allows you to track in one centralized location each anomaly that was found, the requirements and insights provided by business users as to why the anomaly existed, and the rules or outcomes that resulted.
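As a rough illustration of what such a central record could hold, here is a small Python sketch that keeps each anomaly together with the feedback it received and the rules that resulted. The class names, fields, and sample entries are hypothetical and chosen for this example; a real repository would more likely live in a database or a shared tool.

```python
from dataclasses import dataclass, field


@dataclass
class AnomalyRecord:
    """One anomaly, the feedback it drew, and the rules built in response."""
    anomaly_id: str
    description: str
    feedback: list = field(default_factory=list)  # (who, comment) pairs
    rules: list = field(default_factory=list)     # rule descriptions


class FeedbackLog:
    """In-memory stand-in for a centralized feedback store (assumption)."""

    def __init__(self):
        self._records = {}

    def record_anomaly(self, anomaly_id, description):
        self._records[anomaly_id] = AnomalyRecord(anomaly_id, description)

    def add_feedback(self, anomaly_id, who, comment):
        self._records[anomaly_id].feedback.append((who, comment))

    def add_rule(self, anomaly_id, rule):
        self._records[anomaly_id].rules.append(rule)

    def search(self, keyword):
        """Find earlier anomalies whose description mentions the keyword."""
        keyword = keyword.lower()
        return [r for r in self._records.values()
                if keyword in r.description.lower()]


log = FeedbackLog()
log.record_anomaly("A-001", "Customer ZIP code has four digits")
log.add_feedback("A-001", "billing team", "Leading zero dropped on import")
log.add_rule("A-001", "Left-pad ZIP codes to five digits")

# When a similar anomaly shows up later, look up what was said and done before.
for record in log.search("zip code"):
    print(record.anomaly_id, record.feedback, record.rules)
```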
Create a common measurement system. The final step is to establish a way to record all the rules, logic and algorithms in a normalized analytical model or framework, showing the interrelationships of the various elements, as well as how each feeds into your aggregation model. Having this shared measurement system allows you to gain a broad, empirical and quantitative view of the health of your data. More importantly, your data quality assessment team can drill down into these measurements and see exactly which components are driving the overall score (composed of the five critical quality metrics of completeness, conformity, validity, integrity and accuracy).
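To show how an overall score and a drill-down might fit together, here is a simple Python sketch. It assumes each of the five metrics has already been scored between 0 and 1 and combines them with a weighted average; the equal default weights, the function names, and the sample scores are assumptions for illustration, not a prescribed aggregation model.

```python
METRICS = ("completeness", "conformity", "validity", "integrity", "accuracy")


def overall_score(metric_scores, weights=None):
    """Weighted average of the five metric scores (equal weights by default)."""
    weights = weights or {m: 1.0 for m in METRICS}
    total_weight = sum(weights[m] for m in METRICS)
    return sum(metric_scores[m] * weights[m] for m in METRICS) / total_weight


def weakest_metrics(metric_scores, n=2):
    """Drill down: the n metrics pulling the overall score down the most."""
    return sorted(METRICS, key=lambda m: metric_scores[m])[:n]


# Made-up scores for one data set.
scores = {
    "completeness": 0.9,
    "conformity": 0.8,
    "validity": 0.7,
    "integrity": 1.0,
    "accuracy": 0.6,
}

print(round(overall_score(scores), 2))
print(weakest_metrics(scores))
```

In practice each metric score would itself be built from the individual rules recorded earlier, so the same drill-down can continue from a metric to the specific rules behind it.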
By the way, in too many instances, I’ve seen companies take only the first of the steps described above — acquiring the data profiling tool — and then stop the process there, for whatever reason. To truly get the results you’re seeking, you need to see the process through to the end.