This is the first blog of a two-part series. In this blog, IDinsight’s Data on Demand team discusses how it constructed a unified, composite data quality index that can be used to assess surveyor performance. Part 2 of this series will discuss how the team used the composite data quality index to create an incentive system for surveyors to encourage improved performance. We hope that the steps described in this blog are useful to other practitioners involved in data collection.
Photo credits: Markus Spiske on Unsplash
The Data on Demand (DoD) team has made significant innovations and investments in its data quality management systems to holistically address different sources of error that could arise during data collection.
The DoD team monitors data quality at each stage of the data collection process. Before a survey launches, we carefully code our survey forms to minimize illogical or unfeasible responses. Additionally, we train surveyors on protocols to ensure that questions are asked and recorded properly. During data collection, a dedicated team of monitors conducts back checks, spot checks, and audio audits while our team runs basic high-frequency checks on the data (see Figure 1). Finally, after data collection, we account for data inconsistencies (e.g., replacing values above the 95th percentile with the 95th percentile value).
More precisely, the daily outputs of our data quality system are flags for each question. These question-level checks are actionable because they show each surveyor specifically where to improve. However, it is difficult to interpret such a variety of data points and gauge a given surveyor's overall performance.
To that end, the DoD team constructed a unified data quality index to quantify surveyor performance during data collection.
The benefits of constructing an index are three-fold:
The aforementioned checks yield 10 data quality indicators. A more detailed description of each of these indicators can be found in the table below:
With 10 data quality indicators, the main challenge we foresaw was that not every indicator would be equally important to surveyor-level data quality. As a result, we created weights for each component using a mix of data-driven strategies and subjective preferences. We decided on a mixed approach because:
We discuss the five steps we took to create the data quality index below.
We first wanted to align internally on which indicators mattered most to each person on our team (all of whom had experience with data quality). We employed the budget allocation process, in which different "experts" independently distributed a total of 30 points across the indicators.[3][4] Then, we revealed our preferences to each other and held a team discussion to align on the importance of the different indicators.
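The budget allocation step can be sketched in a few lines. The indicator names and point values below are illustrative, not our actual allocations; the only constraint is that each expert's row sums to the 30-point budget.

```python
# Hypothetical budget allocation: each "expert" distributes 30 points
# across the data quality indicators; averaging the allocations and
# normalizing yields a set of subjective weights.
import pandas as pd

allocations = pd.DataFrame(
    {
        "audio_audit": [10, 12, 9],
        "spot_check": [9, 8, 11],
        "back_check": [7, 6, 6],
        "high_frequency": [4, 4, 4],
    },
    index=["expert_1", "expert_2", "expert_3"],
)

# Each expert's allocation should exhaust the 30-point budget.
assert (allocations.sum(axis=1) == 30).all()

# Average across experts, then normalize so the weights sum to 1.
subjective_weights = allocations.mean() / allocations.mean().sum()
print(subjective_weights.round(3))
```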
This discussion revealed that our preferences operated at the level of buckets; we did not have strong preferences among the indicators within each bucket. Our teammates largely agreed on a hierarchy in which audio audits and spot checks should be weighted most heavily, then back checks, and finally high-frequency checks. This was largely driven by the fact that spot checks and audio audits help us track surveyor-driven errors. In addition, audio audits give us a more objective mismatch calculation than back checks, because back checks invite the possibility of respondents changing their answers when resurveyed.[5] Finally, high-frequency checks were weighted the least because they reflect questionnaire framing more than surveyor performance. We kept this hierarchy in mind as we continued.
To generate data-driven weights, we used data for the checks defined above from a previous round of data collection involving 480 surveyors. We compiled data from spot checks, back checks, audio audits, and high-frequency checks for each surveyor, for each question flagged for checks. To calculate mismatches, we compared the data entered by monitors during back checks and audio audits against the main survey data entered by surveyors, matching on the unique survey unit identifier. We calculated protocol violations for each question from the spot check and audio audit data. For spot check scores, we calculated question-level averages at the surveyor level. Finally, for the high-frequency checks, we collapsed the checks at the surveyor level.
We then used these question-level checks to generate our data quality indicators at the surveyor level. For the proportion-based indicators, we summed the number of violations across questions and divided by the total number of questions to generate unified proportions. For the spot check score, we took an average of all the scores a surveyor received. The final dataset contained all indicators at the surveyor level.
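The roll-up from question-level checks to surveyor-level indicators can be sketched with a pandas groupby. The column names and toy data below are assumptions for illustration; the logic mirrors the two aggregation rules just described (violation proportions and score averages).

```python
# Minimal sketch: one row per surveyor-question pair that was flagged
# for a check, rolled up to surveyor-level indicators.
import pandas as pd

checks = pd.DataFrame(
    {
        "surveyor_id": ["S1", "S1", "S1", "S2", "S2"],
        "mismatch": [1, 0, 0, 1, 1],          # 1 = monitor's answer disagreed
        "spot_check_score": [4.0, 5.0, 4.5, 3.0, 3.5],
    }
)

surveyor_level = checks.groupby("surveyor_id").agg(
    # Proportion-based indicator: violations / total questions checked.
    mismatch_rate=("mismatch", "mean"),
    # Score-based indicator: average of all scores a surveyor received.
    avg_spot_check=("spot_check_score", "mean"),
)
print(surveyor_level)
```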
Next, we built correlation matrices at two levels – buckets and indicators (as defined in the table above).
The correlation matrix of data quality buckets was used to derive data-driven weights at a high level. We took an approach called inverse covariance weighting (ICW), in which we produced a correlation matrix of the buckets, inverted it, summed the row entries for each bucket, and finally scaled each sum by a common multiplier to arrive at the final bucket weights.[6] For example, if one row of the inverse correlation matrix added up to 1.34, we scaled it by a multiplier of 11.5 to arrive at a weight of 15.
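The ICW computation itself is compact. The sketch below uses simulated bucket scores (the bucket names and the choice to rescale weights to sum to 100 are assumptions); the steps are the ones just described: correlate, invert, sum rows, rescale.

```python
# Inverse covariance weighting (ICW) sketch on simulated bucket scores.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated per-surveyor scores for four buckets (480 surveyors).
buckets = pd.DataFrame(
    rng.normal(size=(480, 4)),
    columns=["audio_audit", "spot_check", "back_check", "high_frequency"],
)

corr = buckets.corr().to_numpy()
inv_corr = np.linalg.inv(corr)

# Row sums of the inverse correlation matrix give the raw ICW weights.
raw_weights = inv_corr.sum(axis=1)

# Rescale by a common multiplier so the bucket weights sum to 100.
weights = raw_weights * (100 / raw_weights.sum())
print(dict(zip(buckets.columns, weights.round(1))))
```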
The major finding from the second correlation matrix, at the indicator level, was that the spot check overall score was highly correlated with the spot check speed, probing, comfort, and protocol scores. As a result, we dropped the four granular scores from the final index and kept only the overall score, to avoid penalizing surveyors twice for the same behavior.
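A screen for this kind of redundancy can be written as a simple correlation filter. The data below are simulated so that two granular scores are near-duplicates of the overall score; the 0.8 threshold is an illustrative choice, not the cutoff we used.

```python
# Toy redundancy check: drop indicators whose correlation with the
# overall spot check score exceeds a chosen threshold.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
overall = rng.normal(size=200)
scores = pd.DataFrame(
    {
        "overall": overall,
        "speed": overall + rng.normal(scale=0.2, size=200),    # near-duplicate
        "probing": overall + rng.normal(scale=0.2, size=200),  # near-duplicate
        "mismatch_rate": rng.normal(size=200),                 # independent
    }
)

corr_with_overall = scores.corr()["overall"].drop("overall")
redundant = corr_with_overall[corr_with_overall.abs() > 0.8].index.tolist()
kept = scores.drop(columns=redundant)
print("dropped:", redundant)
```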
Now that we had both the subjective and the data-driven weights, the team got together to brainstorm different weighting options.
Ultimately, we used the inverse covariance weighting method described in Step 3 to derive the bucket weights. This method weighted high-frequency checks higher than in-person and phone back checks, but we decided to down-weight them because the team had unanimously agreed that they should be weighted the least. We up-weighted in-person back checks and phone back checks (both of which started with equal weights) because we agreed that both are important measures of surveyor performance, not far behind spot checks and audio audits. Within each bucket, we followed a similar approach to assign weights that add up to the overall bucket weight: the audio audit mismatch rate was weighted higher than the audio audit protocol violation, and the outlier and logic checks were weighted higher than the "don't know" and refusal checks. The graphic below summarizes the weights we assigned to each indicator (in yellow) and bucket (in light blue).
Before applying the index to the surveyor-level dataset we had compiled, there were two issues we needed to tackle.
After deriving the index, we calculated each surveyor's data quality index score and analyzed the distribution. Surveyors had a mean data quality score of 80.44% and a median of 80.81%; scores ranged from 59.74% to 92.12%.
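The final scoring step is a weighted sum of the surveyor-level indicators. The sketch below uses made-up indicator names, weights, and data; the one wrinkle worth showing is that proportion-based indicators (where lower is better) must be flipped before they can be combined with score-based indicators (where higher is better).

```python
# Illustrative final scoring: combine surveyor-level indicators into a
# single index out of 100 using assumed weights.
import pandas as pd

indicators = pd.DataFrame(
    {
        "audio_audit_mismatch": [0.10, 0.30],  # proportions, lower is better
        "back_check_mismatch": [0.05, 0.20],   # proportions, lower is better
        "spot_check_score": [0.90, 0.70],      # scaled 0-1, higher is better
    },
    index=["S1", "S2"],
)

weights = {
    "audio_audit_mismatch": 40,
    "back_check_mismatch": 30,
    "spot_check_score": 30,
}

# Flip "lower is better" proportions so every column is "higher is better".
scored = indicators.copy()
scored["audio_audit_mismatch"] = 1 - scored["audio_audit_mismatch"]
scored["back_check_mismatch"] = 1 - scored["back_check_mismatch"]

# Weighted sum across indicators gives the index score out of 100.
index_score = sum(scored[col] * w for col, w in weights.items())
print(index_score)
```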
We believe that our data quality index score will be a simple and helpful way to assess a surveyor's data quality. The index takes into account a suite of data quality checks conducted in each survey and weights some checks more heavily than others according to their importance. We plan to use this score to track surveyor data quality over time and to create bonus structures that incentivize better performance.