
Empowering districts to improve administrative data quality in India

Jasmeet Khanuja 27 June 2024

Female surveyors participating in a training session before heading out for data collection in India ©Simran Saini/IDinsight

Quality administrative data has always been critical to a well-functioning public sector, but over the years, the government of India has increased its reliance on administrative data to make a wider range of decisions, and more consequential decisions, than ever before. The problem is that the quality of administrative data remains uncertain as its collection, monitoring and analysis are plagued with multiple challenges.

In this post, we discuss a partnership between IDinsight and the Gates Foundation to develop a data quality improvement strategy to enhance the credibility of district-level administrative data in India.

The expanding role of administrative data

Quality administrative data is a crucial tool for serving the public effectively, informing daily decisions that affect human health, equity, and development. For example, in India, data on the number of pregnant women in a district determines how much supplementary nutrition and how many vaccines are supplied. Data on anaemia, weight, and blood pressure allows frontline workers to identify and intervene in high-risk pregnancies. Administrative data is also the most cost-effective and readily available data at the administration’s disposal for decision-making.

Increasingly, national programs have put administrative data at the core of their success. For example, under the Aspirational District Program (ADP), the government uses admin data to rank underperforming districts on 81 indicators. It provides financial rewards to the district that records the highest monthly improvement (observed through delta ranking). While the demand for and use of administrative data have expanded, parallel steps to validate, assess and improve data quality have lagged behind.

(Some) progress in administrative data quality

Factors adversely affecting administrative data quality include the absence of quality control, insufficient training of frontline staff, inadequate record-keeping systems and poor resourcing, resulting in a high workload. Pressure to achieve certain outcomes can also create incentives to over- or under-report indicators, leading to incorrect data reporting. A 2016 paper found “significant differences in reported and evaluated coverage of Maternal and Child Health services” and that “the variables mostly over-reported were the ones for which high levels of coverage were desired but not achieved.”

Various efforts to improve monitoring and data quality are currently underway in India. For example, the introduction of digital data collection tools and web-based monitoring systems is a step in this direction. In addition, there are guidelines to conduct supportive supervision, institute data validation committees, and facilitate social and independent audits. Some states have also made efforts to institutionalize data quality assessments.[1] Although guidelines and efforts exist, implementation has been fragmented. First, supervision visits are usually conducted on a convenience or need basis. Such “non-random” supervisory visits bias the selection of facilities for verification, and remote or seemingly well-performing ones may be left out. Second, findings from data quality visits are not systematically recorded and, therefore, not tracked or assessed over time. Finally, there is insufficient focus and discussion on data quality amongst officials and program staff at district or lower administrative levels.

The Project: Co-developing and introducing verifications

In 2018, IDinsight, the Gates Foundation and NITI Aayog partnered to develop a data quality improvement strategy to enhance the credibility of the administrative data that ADP relies on.[2] The project’s main goal was to develop a sustainable strategy to empower district administrations to produce credible and verified data. It also aimed to address the incentive and accountability concerns in data reporting in the health, nutrition, and education sectors. The project entailed co-developing and introducing a series of verifications to help districts measure and improve their data quality.

A vital outcome of the project was the creation of a digital toolkit that facilitates data verification by the administration (hereafter referred to as ‘admin-led verification toolkit’ or ’toolkit’) in the existing supervisory structure.

The toolkit: tech-enabled and embedded in existing systems

The toolkit consists of two components: digital data collection forms that allow data entry and random sampling at the time of data entry, and a back-end pipeline to analyze and triangulate the information entered into the forms. The digital forms are downloaded on mobile phones and used by supervisory cadres (program managers, monitoring officers, and data entry operators) to verify data. The tool assesses data quality by triangulating data recorded at key junctures of the data flow and reconciling the results into action-oriented data quality reports.

Administrative data flows from the unit level to districts, where it is aggregated. Figure 1 describes the data flow of maternal and child health indicators from the unit level to district totals reported in the Health Management Information Systems (HMIS).[3][4]

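Figure 1: Data flow of maternal and child health indicators from the unit level to district totals reported in HMIS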

The admin-led verification toolkit facilitates verification between three key junctures of the administrative data flow: 

  1. Service users receiving services at the facilities
  2. Facilities maintaining the record of service recipients
  3. Digital portal maintaining the digital record of services delivered by facilities

In addition, it facilitates two types of checks:

  1. Aggregation (whether counts add up correctly) – Supervisors manually aggregate the indicator value from facility registers. This value should match the indicator value reported by the facility in the digital portal (see Figure 1).
  2. Accuracy (whether the tabulated data matches the source evidence at the level below) – Supervisors record the data of sampled respondents from facility registers. This value should match the value reported by the sampled respondent.
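
The two checks can be sketched in a few lines of Python. This is only an illustrative sketch, not the toolkit's actual code: the `RegisterEntry` schema, the function names, and the choice to drop unreachable respondents from the denominator are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    """One row a supervisor copies from a facility register (hypothetical schema)."""
    respondent_id: str
    received_service: bool  # e.g. the register records an institutional delivery


def aggregation_check(register_entries, portal_value):
    """Compare the supervisor's manual count from the register with the portal value."""
    register_count = sum(1 for e in register_entries if e.received_service)
    return {
        "register_count": register_count,
        "portal_value": portal_value,
        # A positive difference means the portal shows more than the register.
        "difference": portal_value - register_count,
        "matches": register_count == portal_value,
    }


def accuracy_check(sampled_entries, respondent_answers):
    """Share of sampled register entries confirmed by the respondents themselves.

    respondent_answers maps respondent_id -> what the respondent reported.
    Assumption: respondents who could not be reached are left out of the
    denominator. Returns None if nobody in the sample was reached.
    """
    reached = [e for e in sampled_entries if e.respondent_id in respondent_answers]
    if not reached:
        return None
    confirmed = sum(
        1 for e in reached
        if respondent_answers[e.respondent_id] == e.received_service
    )
    return confirmed / len(reached)


# Example: the register shows 3 deliveries but the portal reports 5.
register = [
    RegisterEntry("r1", True),
    RegisterEntry("r2", True),
    RegisterEntry("r3", False),
    RegisterEntry("r4", True),
]
print(aggregation_check(register, portal_value=5))
# Two sampled respondents were reached; one confirms the register entry.
print(accuracy_check(register[:3], {"r1": True, "r2": False}))
```
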

Figure 2 describes the two checks in detail for an institutional delivery indicator:

The two levels and types of checks provide adequate information to understand the source of error in an indicator’s value. The key is that the sample is random and representative of all the data captured for that indicator.

Third-party-led verification 

A vital aspect of the verification toolkit is verification by a third party. The third party can be either local communities conducting social audits or village committees recommended under the program monitoring guidelines. The third party can verify a small subset of administration-verified data to incentivize correct reporting amongst officials. This subset can be either a random sample from the administration-verified data or a purposive sample chosen by observing trends in misreporting. If the supervisors know that the data they verify will be further substantiated, there is less potential for collusion or misreporting by them.
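
As a rough illustration of the two ways a third party might pick its subset, the sketch below draws either a simple random sample or a purposive sample. The record fields (`record_id`, `past_mismatch_rate`) and the rule of ranking by past mismatch rate are hypothetical; the post does not describe how a purposive sample would actually be chosen.

```python
import random


def random_subset(verified_records, k, seed=None):
    """Simple random sample of k administration-verified records."""
    rng = random.Random(seed)  # a seed makes the draw reproducible for audits
    return rng.sample(verified_records, min(k, len(verified_records)))


def purposive_subset(verified_records, k):
    """Pick the k records with the highest past mismatch rate.

    Assumption: each record carries a 'past_mismatch_rate' field summarising
    earlier misreporting linked to that record's supervisor or facility.
    """
    ranked = sorted(
        verified_records, key=lambda r: r["past_mismatch_rate"], reverse=True
    )
    return ranked[:k]


# Ten hypothetical verified records with increasing mismatch rates.
records = [{"record_id": i, "past_mismatch_rate": i / 10} for i in range(10)]
print(random_subset(records, 3, seed=1))
print(purposive_subset(records, 2))
```

A random subset supports an unbiased estimate of how often supervisors' verifications hold up, while a purposive subset concentrates limited third-party effort where misreporting seems most likely.
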

The key features of the admin-led verification toolkit include the following:

1. Leverage existing supervisory structures to conduct verification 

The toolkit utilizes existing supervisory cadres such as district program managers, block program managers, and monitoring officers responsible for improving data reporting and supervision, thereby not increasing the administrative burden. An ideal supervisor should provide supportive feedback based on findings and not feel compelled to collude with staff members to misreport data in the tool. Districts can also explore assigning supervisors from other sectors to verify data for a different sector to disincentivize potential collusion to misreport.

2. Tech-enabled random sampling to enable verifications 

The toolkit provides supervisors with facility and respondent information for data verification via tech-driven random sampling. This occurs at two levels: first, supervisors visit a randomly selected sample of facilities, and then they sample respondents from the registers of those facilities for in-person or remote verifications. Sampling at both levels is tech-enabled, automatic and randomised to provide an unbiased estimate of the data quality. Further, no technical skills are needed for supervisors to access the sample.
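
The two-level sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's implementation: the function name, the facility identifiers, and the fixed number of respondents per facility are all invented for the example.

```python
import random


def two_stage_sample(registers, n_facilities, n_respondents, seed=None):
    """Draw facilities at random, then respondents from each chosen register.

    registers maps a facility id to the list of respondent ids in its register.
    Returns a dict of facility id -> sampled respondent ids.
    """
    rng = random.Random(seed)
    facility_ids = sorted(registers)  # fixed order so a seed reproduces the draw
    chosen = rng.sample(facility_ids, min(n_facilities, len(facility_ids)))
    return {
        facility: rng.sample(
            registers[facility], min(n_respondents, len(registers[facility]))
        )
        for facility in chosen
    }


# Hypothetical registers for three facilities.
registers = {
    "PHC-A": ["a1", "a2", "a3"],
    "PHC-B": ["b1"],
    "PHC-C": ["c1", "c2"],
}
print(two_stage_sample(registers, n_facilities=2, n_respondents=2, seed=42))
```

One design point to keep in mind: drawing the same number of respondents from every facility gives respondents in small facilities a higher chance of selection, so district-level estimates would need weighting. The post does not say how the toolkit handles this.
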

3. Configured to work in a resource-constrained setting 

The toolkit is designed to work in resource-constrained environments with limited internet access and is configurable to multiple sectors and data flows. In addition, most of the tool-based verification is remote or via phone calls, making it convenient to use in remote and segmented areas.

4. Facilitate effective interventions through data quality reports  

An essential component of the toolkit is reconciling findings from the different levels of checks and presenting them in an accessible and understandable manner. Program staff can use the results to identify sources of error and plan appropriate interventions to address quality issues at the source.
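
To make the idea of reconciliation concrete, here is a hedged sketch of how results from an aggregation check and an accuracy check might be combined to point at a likely source of error. The function names, the 0.9 accuracy threshold, and the wording of the diagnoses are assumptions for illustration; the post does not specify how the reports are generated.

```python
def diagnose_error_source(aggregation_matches, accuracy_rate, accuracy_threshold=0.9):
    """Combine the two checks into a likely source of error.

    aggregation_matches: True if the register count equals the portal value.
    accuracy_rate: share of sampled register entries confirmed by respondents.
    accuracy_threshold: assumed cut-off below which accuracy is flagged.
    """
    accurate = accuracy_rate >= accuracy_threshold
    if aggregation_matches and accurate:
        return "no issue detected in the sampled data"
    if accurate:
        return "aggregation or data entry: register and portal disagree"
    if aggregation_matches:
        return "recording: register entries not confirmed by respondents"
    return "both recording and aggregation"


def build_report(findings, accuracy_threshold=0.9):
    """Map each facility's check results to a diagnosis."""
    return {
        facility: diagnose_error_source(
            result["aggregation_matches"], result["accuracy_rate"], accuracy_threshold
        )
        for facility, result in findings.items()
    }


# Hypothetical check results for two facilities.
findings = {
    "PHC-A": {"aggregation_matches": False, "accuracy_rate": 0.95},
    "PHC-B": {"aggregation_matches": True, "accuracy_rate": 0.60},
}
for facility, diagnosis in build_report(findings).items():
    print(facility, "->", diagnosis)
```
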

5. Correct distorted incentives to misreport and establish accountability 

The toolkit allows for elevating “data quality” to a critical performance metric for supervisory and data reporting cadres. Incorporating data quality as an evaluative parameter places it on par with sectoral achievements, motivating personnel to prioritize it. Senior officials can publicly recognize the work of supervisors who conduct regular verifications in their areas. 

Further, the third-party-led verifications enforce the responsibility for accurate reporting. The tool allows capturing location, photos and audio during the verification process. For example, supervisors can take photographs, record conversations, and trace GPS locations, enabling third parties such as local NGOs or village committees to conduct further verifications. Such verifications create an accountability mechanism for the supervisors, thereby improving supervision effectiveness.

Potential scope 

The admin-led verification toolkit provides an innovative supplement to the array of data improvement interventions that ministries, states and districts are already implementing, and has the potential to significantly contribute to advancing public sector data. It adds rigour and structure to the ongoing supervisory work by providing district administration with the resources required to sample and diagnose inconsistencies in the data flow correctly. 

The purpose of the verifications is not to lead to punitive actions. As established earlier, a range of factors contribute to inaccuracies in data entry, which the administration can address through capacity building and other viable interventions. Nonetheless, the toolkit holds the potential to anchor districts in taking an evidence-based approach to bring data quality to the forefront of public sector discourse. It does so by considering multiple district limitations and creating incentives to report data accurately. 

Note: The team piloted the admin-led verification toolkit in a district in August 2023 and will use the findings to build and improve it further. In addition, the DataDelta team is working towards creating an actionable Data Quality service tool with a custom Machine Learning algorithm for quick, efficient quality diagnosis, tailored to the partner’s priorities and capabilities.


I deeply appreciate the in-depth review of this blog from my colleagues at IDinsight. I express my heartfelt thanks to Laura Burke, Will Thompson, Neha Raykar, Vikas Dimble, Shreya More, Akashmegh Sharma and Leonard Tin.

  1.
  2. Under ADP, the government ranks 112 socio-economically underperforming districts on 81 indicators and rewards the one that records the highest improvement in a month (observed through delta ranking). It encourages convergence, collaboration and competition amongst districts to improve their socio-economic outcomes.
  3.
  4. In certain states, data from registers is aggregated and reported in the HMIS paper format, which is then digitised by the data entry operator.