
Smarter quality at lower cost: A pragmatic framework for data quality optimization in data collection

Akashmegh Sharma 13 June 2025

IDinsight teammates conducting fieldwork in the Philippines. ©IDinsight/Jilson Tiu

Why data quality must be both rigorous and cost-efficient

High-quality data is essential for making good decisions. Whether it is governments designing social protection programs or donors evaluating education investments, representative and accurate survey data ensures that data-informed decisions lead to resources reaching the right people and that programs achieve their intended impact.

But survey budgets are shrinking and ensuring data quality in large-scale surveys can be expensive. With reductions in official development assistance, and philanthropic funding stretched thinner than ever, there is a growing imperative to innovate within constraints. Traditional quality assurance activities like spot checks1 and backchecks2 require significant staff time, field travel, and management oversight. At a time when funding across the development sector is under pressure, these costs can quickly become unsustainable.

In today’s context, organizations must strike a fine balance between maintaining high quality standards and optimizing resources. At the same time, advances in data systems and automation offer a real opportunity to rethink how we manage quality. We asked a simple question: can we reduce costs without compromising data integrity?

Our answer, based on detailed analysis and real project experience, is yes.

Our approach: Testing assumptions with data and field experience

The DataDelta team conducted a structured evaluation of data quality processes across four large-scale surveys. These projects represented a mix of sectors, geographies, and implementation styles.

We used a combination of:

  • Quantitative analysis to compare how different types of checks performed
  • Overlap analysis to identify when multiple checks flagged the same issues or surveyors
  • Cost analysis to assess how much each quality check added to the overall budget
  • Interviews with project teams, field managers, and associates to understand operational challenges of implementing data quality checks

We also drew on academic and sector best practices, including protocols from organizations like J-PAL and IPA, to benchmark our assumptions.

What we found: Redundancy, overreach, and untapped tools

1. High-frequency checks3 (HFCs) are powerful but underused

Automated HFCs flagged 65 percent of the surveyors who were later identified as low-performing (posing risks to data quality) by more expensive field-based checks, like spot checks and audio audits. In other words, HFCs alone captured the majority of poor performers early in the data collection process. Despite this, these insights were rarely used to adjust field-based quality efforts.

HFCs offer a high return on investment. They are cheap to run, update daily, and scale easily across survey teams. Yet in many projects, they were treated as just one of many checks, rather than the foundation for targeting other quality assurance activities.
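To make the idea concrete, here is a minimal sketch of what an automated HFC might look like in Python. The submission records, field names (`surveyor`, `minutes`, `dk_share`), and thresholds are all illustrative assumptions, not IDinsight's actual system; real HFCs typically run many more checks (skip patterns, GPS, outliers).

```python
from statistics import mean

# Hypothetical submission records: surveyor ID, interview length in
# minutes, and the share of "don't know" responses in the interview.
submissions = [
    {"surveyor": "S1", "minutes": 42, "dk_share": 0.05},
    {"surveyor": "S1", "minutes": 38, "dk_share": 0.04},
    {"surveyor": "S2", "minutes": 11, "dk_share": 0.30},  # suspiciously short
    {"surveyor": "S2", "minutes": 9,  "dk_share": 0.35},
    {"surveyor": "S3", "minutes": 35, "dk_share": 0.10},
]

def hfc_flags(rows, min_minutes=15, max_dk_share=0.25):
    """Flag surveyors whose average interview duration or 'don't know'
    rate crosses a threshold. Thresholds here are illustrative only."""
    flagged = set()
    for s in {r["surveyor"] for r in rows}:
        own = [r for r in rows if r["surveyor"] == s]
        if mean(r["minutes"] for r in own) < min_minutes:
            flagged.add(s)
        if mean(r["dk_share"] for r in own) > max_dk_share:
            flagged.add(s)
    return flagged

print(sorted(hfc_flags(submissions)))  # ['S2']: trips both thresholds
```

Because checks like this run on incoming data each day, they can surface a poor performer after a handful of interviews, long before a field visit would.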

2. Redundancy across checks leads to wasted effort

There was significant overlap between the different types of quality checks. In one project, for instance, 64 percent of poor-performing surveyors identified through audio audits had already been flagged by HFCs; 58 percent had also been flagged by spot checks and 56 percent by backchecks. This suggests that field-based checks often confirmed what automated systems had already detected.

Yet field teams were spending large amounts of time and budget conducting spot checks and backchecks uniformly, rather than focusing them where they were most needed.
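The overlap analysis itself is simple set arithmetic. The sketch below uses made-up surveyor IDs to show the calculation; the sets stand in for whichever checks are being compared.

```python
# Hypothetical sets of surveyor IDs flagged by each quality check.
hfc_flagged   = {"S2", "S5", "S7", "S9"}
audio_flagged = {"S2", "S5", "S8"}

def overlap_share(base, other):
    """Share of surveyors in `base` that `other` had already flagged."""
    if not base:
        return 0.0
    return len(base & other) / len(base)

share = overlap_share(audio_flagged, hfc_flagged)
print(f"{share:.0%} of audio-audit flags were already caught by HFCs")
```

A high overlap share is the signal that a costly check is mostly re-confirming what a cheaper one already found.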

3. Most checks cover too many survey questions

Many quality checks were designed to review a large number of questions per survey. But we found that cutting this number by 50 percent still retained 90 to 98 percent of the meaningful insights. Most of the benefit came from a small number of well-chosen, high-risk questions. Reviewing more questions added little value but significantly increased review time and costs.

4. Spot checks are the most expensive and least efficient

Spot checks required travel, coordination with field managers, and additional staffing. They were the most expensive type of data quality check and often added limited new insights, especially when used in combination with HFCs and audio audits. In contrast, audio audits could be conducted remotely, reviewed more quickly, and yielded detailed information about how surveys were conducted.

Our framework: Three steps to smarter data quality

Based on these findings, we developed a cost-effective, scalable framework for data quality:

Step 1: Use HFCs to target field-based checks

Run automated high-frequency checks from the start of data collection. Use early results to identify the bottom-performing 50 percent of surveyors. Focus all field-based checks on this group while reducing checks on consistently strong performers.
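The targeting rule in Step 1 can be sketched as a simple ranking. The per-surveyor flag rates below are invented for illustration, and the 50 percent cutoff is the parameter from the step above.

```python
# Hypothetical per-surveyor HFC flag rates: the share of each
# surveyor's submissions that triggered at least one automated check.
flag_rates = {
    "S1": 0.02, "S2": 0.40, "S3": 0.10,
    "S4": 0.25, "S5": 0.05, "S6": 0.18,
}

def target_group(rates, share=0.5):
    """Return the worst-performing `share` of surveyors; they receive
    the field-based checks, while the rest get lighter-touch review."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    cutoff = max(1, round(len(ranked) * share))
    return set(ranked[:cutoff])

print(sorted(target_group(flag_rates)))  # ['S2', 'S4', 'S6']
```

In practice the ranking would be refreshed as new HFC results arrive, so a surveyor who improves can drop out of the target group.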

Step 2: Make audio audits the primary field-based check

Rely on audio audits instead of more expensive and logistically complex spot checks or backchecks. Audio audits capture surveyor behavior, probing quality, and adherence to protocol, all without needing to be physically present in the field.

Step 3: Streamline the number of questions and checks

Reduce the number of questions included in each data quality check, including HFCs but especially field-based checks like spot checks, backchecks, and audio audits. This can be done by designing leaner spot check, backcheck, and audio audit forms. Focus only on the most important questions: those that are complex, subjective, or known to be error-prone.

Also, reduce the overall percentage of surveys being checked where possible, especially for high-performing surveyors. Additionally, instead of reviewing every instance of each selected question, check a smaller portion of the data to save time and effort without compromising data quality.
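Checking a smaller portion of the data amounts to drawing a random audit sample rather than reviewing every interview. A minimal sketch, assuming a hypothetical pool of interview IDs and a tunable audit rate (the fixed seed just makes the draw reproducible for re-review):

```python
import random

# Hypothetical pool of completed interview IDs for one surveyor.
survey_ids = [f"INT-{i:03d}" for i in range(1, 101)]

def audit_sample(ids, share=0.1, seed=42):
    """Draw a reproducible random subset for audio audit instead of
    reviewing every interview; `share` is the audit rate to tune."""
    rng = random.Random(seed)
    k = max(1, round(len(ids) * share))
    return rng.sample(ids, k)

sample = audit_sample(survey_ids)
print(len(sample))  # 10 interviews reviewed instead of 100
```

The audit rate can then be set per surveyor: higher for the HFC-flagged target group, lower for consistently strong performers.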

What this means for projects

Applying this approach in large-scale survey projects led to cost reductions of 15 to 25 percent in the data quality component of the budget, without any measurable decline in data integrity.

These savings come from:

  • Fewer field staff hours spent on reviews
  • Lower travel and logistics costs
  • Reduced complexity in tech systems
  • More timely detection and resolution of issues

At a time when development budgets are stretched, these savings can be reallocated to improve sample sizes, speed up reporting, expand geographic coverage, or just make high-value sample survey data accessible to more decision-makers.

Looking ahead

Development organizations need to adapt to a new reality of tighter budgets and rising expectations. Ensuring data quality remains non-negotiable, but we must be smarter about how we do it.

Our framework is already being adopted across multiple IDinsight projects, and we encourage others in the sector to consider how to strike the best possible balance between rigor and cost. To learn more, please reach out to Doug Johnson.

  1. Spot checks are when research staff watch surveyors while they are conducting interviews. This helps ensure that the survey is being done properly. Spot checks are usually carried out by senior members of the field/project team.
  2. A backcheck is when a respondent who has already been surveyed is interviewed again by a different surveyor, using a shorter version of the original survey. The answers from the backcheck are then compared to the original responses.
  3. High-frequency checks (HFCs) are regular checks on incoming data to catch errors early, ensure data quality, and track survey progress. They help identify issues such as broken skip patterns, missing responses, unusual answer trends, and outliers. HFCs also help monitor surveyor performance by looking at how many surveys each person completes, how long they take, and whether they are following the correct process. They can also detect signs of data fraud, such as very short surveys, incorrect GPS locations, or suspicious patterns like too many “no” or “don’t know” responses. These checks allow teams to quickly spot and fix problems, keeping the data accurate and trustworthy.