DataDelta teammates during data collection in India. ©IDinsight
Administrative data – data gathered to track and inform the implementation of public-sector programs, as well as to measure key human development outcomes – is fundamental to effective governance. It enables public institutions to monitor program processes and activities and track performance. Government schemes and programs often collect such data themselves, enabling quick and easy access and reducing the need for additional monitoring by third parties. Yet local and national governments often struggle to ensure administrative data is of high quality.
This is where IDinsight’s DataDelta team recently played a supportive role.
In 2018, the Indian government launched a new program that aimed to localize development efforts and reduce poverty in its most disadvantaged districts. The Aspirational Districts Programme (ADP), launched by the National Institution for Transforming India (NITI) Aayog, a government policy think tank, creates incentives for districts to perform well by ranking them based on the progress of different socioeconomic indicators and extending financial rewards to the highest ranking districts.1
While the new program encourages progress through competition and cooperation, NITI Aayog has been concerned about the quality of district data, which stems from weak systems and capacity. Another concern has been the potential incentive for districts to inflate their numbers to earn a higher rank. These concerns are compounded by the fact that the data is not independently collected: districts report indicators based on their own administrative data, without a strong system to ensure data quality and accurate reporting.
To help address these concerns, IDinsight’s DataDelta team worked with NITI Aayog to build a strategy to verify the accuracy and representativeness of district-level data. The strategy rests on the principle that administrative data should be high-frequency and high-coverage, and that accurate administrative data can give policymakers useful, decision-relevant information at a much lower cost than alternatives like household surveys.2
This post provides an overview of the data quality challenges the districts were facing and the data verification strategy that the DataDelta team developed.
In practice, frontline workers record data in registers from beneficiaries as part of their jobs. That data is then digitized, aggregated at the block and district levels, and reported to NITI Aayog through its data entry portal.
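As a rough illustration of this roll-up, the sketch below aggregates hypothetical digitized register entries into block totals and then into the district figure a portal submission would carry. The record shape and field names are assumptions for illustration, not the actual portal schema.

```python
# A rough sketch of the roll-up described above, using hypothetical
# digitized register entries. The record shape and field names are
# assumptions for illustration, not the actual portal schema.
from collections import Counter

# One row per digitized register entry from a frontline worker.
entries = [
    {"block": "Block A", "indicator": "anc_registrations", "value": 1},
    {"block": "Block A", "indicator": "anc_registrations", "value": 1},
    {"block": "Block B", "indicator": "anc_registrations", "value": 1},
]

# Aggregate to block totals.
block_totals = Counter()
for e in entries:
    block_totals[e["block"]] += e["value"]

# Aggregate block totals to the district figure reported on the portal.
district_total = sum(block_totals.values())
print(dict(block_totals), district_total)  # {'Block A': 2, 'Block B': 1} 3
```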
NITI Aayog performs a few manual quality checks on aggregated district data before compiling district rankings. However, the lack of a credible validation exercise to ground-truth the districts’ self-reported data hinders NITI Aayog’s ability to distinguish accurate from inaccurate submissions. In addition, the reliance on manual checks limits the scope and effectiveness of detecting potential issues. Resolving these data quality challenges is of paramount importance for the ADP: it is a necessary condition for reliable evaluations of district performance and for enabling data-driven decision-making for maximum impact.
Although local governments in India have a wealth of data, several challenges hamper the collection, maintenance, and quality of that data. These challenges include:
This section details the first stage of our work, which culminated in an admin-led data verification toolkit.
Through our engagement on the Aspirational Districts Programme, NITI Aayog supported us in building relationships with district officials (e.g., deputy commissioners/district magistrates). These officials are the main users of district-level data and the custodians of the data-flow reporting process, and they have the strongest incentive to create robust administrative data systems. We presented NITI Aayog’s vision, and our plan for assessing the level and extent of data quality errors, to the district magistrate, the ADP nodal officer, and the department heads of relevant sectors. We then got their go-ahead to speak to officials at all levels in the district, from district program managers to frontline workers.
NITI Aayog had asked IDinsight to support them in finalizing the list of ADP 2.0 indicators, with a special emphasis on identifying indicators that are vague or not relevant to ADP priorities.
Our team employed a five-stage indicator analysis framework (SMART-V) to review the full list of indicators and prioritize those to verify. We made the process more robust by scoring each indicator on the reliability and feasibility of its back-check, creating composite scores on a 0-3 scale.
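As a minimal sketch of that scoring step, the snippet below combines two hypothetical 0-3 ratings, back-check reliability and back-check feasibility, into an equal-weight composite. The indicator names, ratings, and weights are illustrative, not the actual SMART-V rubric.

```python
# A minimal sketch of the composite-scoring step. The indicator names,
# ratings, and equal weights are illustrative, not the actual SMART-V rubric.
import pandas as pd

indicators = pd.DataFrame({
    "indicator": ["ANC first-trimester registration",
                  "School textbook distribution",
                  "Child weight-for-age measurement"],
    "reliability": [3, 2, 1],   # how trustworthy a field back-check would be (0-3)
    "feasibility": [3, 1, 2],   # how practical a field back-check would be (0-3)
})

# An equal-weight average keeps the composite on the same 0-3 scale.
indicators["composite"] = (indicators["reliability"] + indicators["feasibility"]) / 2

# Rank so the most verifiable indicators are prioritized first.
shortlist = indicators.sort_values("composite", ascending=False)
print(shortlist[["indicator", "composite"]])
```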
After presenting our list of indicators to NITI Aayog, and at its directive, we further narrowed it to a smaller set of indicators across three sectors: health, education, and nutrition.
1. We identified local stakeholders involved in the data flow
Our team conducted extensive research into government schemes, scouring government documentation and speaking with internal and external sectoral experts, to understand the human resource and reporting structure for each sector and indicator.
2. We created interview guides and interviewed officials at all levels
To learn how on-the-ground realities differ from scheme documentation on the roles and responsibilities of the officials who operationalize the data flow, our team created bespoke interview guides for each cadre of officials at each level of government. The guides covered indicator definitions, triangulated sources for the data entered at each level for each indicator, and granular details on how a single data point for each indicator moves from one level to the next.
These guides were then used to interview officials at all levels – district, block and frontline workers. Additionally, we attempted to capture any discrepancy or movement in data flows by visiting facilities and officials across two blocks – one in the district center and one farther from it.
At the same time, we collected sampling data from these districts to test whether we could build our sampling strategy from it.
3. We visualized the data flows
After the mini qualitative exercise to map district data flows, we visualized them in the following format:
1. We identified the main data checkpoints
After visualizing the data flow, we identified the main data entry and aggregation checkpoints. In the example above, data is entered into a three-tiered local administration structure at the district, facility, and unit levels.
2. We thought through each type of check at every level of data flow
Verifying data from the unit level to the district level looks a lot like quality control on an assembly line. Quality control in such a setting involves randomly sampling units of a batch at critical data-flow points as they come down the line and checking that at least a threshold percentage meets a quality standard. Through such a process, we could quantify the errors and pinpoint where data flow issues persisted.
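A minimal sketch of that batch check follows, assuming a simple record shape in which each sampled entry carries the reported value and the value found on verification; the sample size and 90% threshold are illustrative, not the program’s actual parameters.

```python
# A minimal sketch of the assembly-line check: randomly sample records at a
# checkpoint and test whether the share matching verification clears a
# quality threshold. Sample size and threshold are illustrative.
import random

def passes_quality_check(records, sample_size=50, threshold=0.9, seed=42):
    """Return (pass/fail, match rate) for a random sample of records.

    Each record is a dict with 'reported' and 'verified' values; a record
    "matches" when the reported figure equals what verification found.
    """
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    matches = sum(1 for r in sample if r["reported"] == r["verified"])
    match_rate = matches / len(sample)
    return match_rate >= threshold, match_rate

# Example: 200 simulated records, roughly 8% of which were misreported.
records = [{"reported": 1, "verified": 0 if i % 12 == 0 else 1} for i in range(200)]
ok, rate = passes_quality_check(records)
print(f"match rate = {rate:.0%}; {'passes' if ok else 'fails'} the threshold")
```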
Levels in the data-quality assembly line (top-down approach)
The images below illustrate the overall data flow (a) and the verification process from a district-level match rate down to a beneficiary-level match rate (b) for one indicator: the percentage of women who registered for antenatal care (ANC) in the first trimester out of the total number of women who registered for ANC.
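The sketch below shows, under assumed data structures, how those two match rates could be computed for the ANC indicator: a district-level ratio comparing the reported percentage against a re-computed one, and a beneficiary-level rate counting how many individual register entries agree with verification. The record fields and example figures are hypothetical.

```python
# A sketch, under assumed data structures, of the two match rates for the
# ANC indicator. All record fields and example figures are hypothetical.
from dataclasses import dataclass

@dataclass
class AncRecord:
    in_register: bool   # first-trimester registration per the register
    on_backcheck: bool  # first-trimester registration per field verification

def district_level_match(reported_pct: float, records: list[AncRecord]) -> float:
    """Ratio of the district's reported percentage to the verified percentage."""
    verified_pct = 100 * sum(r.on_backcheck for r in records) / len(records)
    return reported_pct / verified_pct

def beneficiary_level_match(records: list[AncRecord]) -> float:
    """Share of beneficiaries whose register entry agrees with verification."""
    agree = sum(r.in_register == r.on_backcheck for r in records)
    return agree / len(records)

# 80 true positives, 10 over-reported, 10 true negatives (hypothetical).
records = ([AncRecord(True, True)] * 80 + [AncRecord(True, False)] * 10
           + [AncRecord(False, False)] * 10)
print(f"district-level match ratio: {district_level_match(85.0, records):.2f}")  # 1.06
print(f"beneficiary-level match rate: {beneficiary_level_match(records):.0%}")   # 90%
```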
3. We built a rigorous sampling strategy – one that is representative at the district level for all indicators and covers all blocks and different types of facilities in the district.
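A minimal sketch of the stratification idea follows, drawing facilities from every (block, facility type) stratum so the sample spans all blocks and facility types; the strata, facility list, and per-stratum sample size are illustrative, not the actual design.

```python
# A minimal sketch of stratified facility sampling. The strata, facility
# list, and per-stratum sample size are illustrative, not the actual design.
import random
from collections import defaultdict

# 10 hypothetical facilities per (block, facility type) stratum.
facilities = [
    {"id": f"F{i}", "block": b, "type": t}
    for i, (b, t) in enumerate(
        (b, t)
        for b in ["Block A", "Block B", "Block C"]
        for t in ["sub-centre", "PHC"]
        for _ in range(10)
    )
]

def stratified_sample(units, keys=("block", "type"), per_stratum=3, seed=7):
    """Randomly draw `per_stratum` units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[tuple(u[k] for k in keys)].append(u)
    return [u for group in strata.values()
            for u in rng.sample(group, min(per_stratum, len(group)))]

sample = stratified_sample(facilities)
covered = {(u["block"], u["type"]) for u in sample}
print(f"sampled {len(sample)} facilities across {len(covered)} strata")  # 18 across 6
```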
Our DataDelta teammates, Shreya and Isha, describe in this post the detailed processes and systems set up alongside the data collection pipeline to ensure data quality and to facilitate a complex data collection exercise at scale.
We kept learning from our processes and built an admin-led data verification toolkit for improving local data quality and governance. Our colleague Jasmeet talks about the process here.
In conclusion, IDinsight’s DataDelta team played a crucial role in gathering knowledge and building a solution to the challenges associated with administrative data quality. Our analysis found that data quality issues are distributed across sectors and across the data flow, and cannot be attributed to any single administrative level. This debunks the common on-the-ground notion that most errors creep in at the beginning of the data flow, i.e., at the beneficiary level.
Our learnings from this verification exercise have culminated in a tech-based, lighter-touch, admin-led verification toolkit that we’re piloting in Uttar Pradesh to improve data- and evidence-driven governance.
We deeply appreciate the comments and reviews on this blog from our colleagues at IDinsight. Thank you, Puneet Kaur, Laura Burke, Will Thompson and Vikas Dimble.
Questions or comments? Please share below!