
How DataDelta created a data verification strategy to support local development efforts in India

DataDelta teammates during data collection in India. ©IDinsight

Administrative data – data gathered to track and inform the implementation of public-sector programs, as well as to measure key human development outcomes – is fundamental to effective governance. It enables public institutions to monitor program processes and activities, and to track performance. Government schemes and programs often collect such data themselves, enabling easy and quick access to the data and reducing the need for additional monitoring by third parties. Yet local and national governments often struggle to ensure that administrative data is of high quality.

This is where IDinsight’s DataDelta team recently played a supportive role. 

In 2018, the Indian government launched a new program that aimed to localize development efforts and reduce poverty in its most disadvantaged districts. The Aspirational Districts Programme (ADP), launched by the National Institution for Transforming India (NITI) Aayog, a government policy think tank, creates incentives for districts to perform well by ranking them based on the progress of different socioeconomic indicators and extending financial rewards to the highest ranking districts.1

While the new program encourages progress through competition and cooperation, NITI Aayog has been concerned about the quality of district data stemming from weak systems and capacity. Another concern has been the potential incentive for districts to inflate their numbers to obtain a higher rank. These concerns are compounded by the fact that data is not independently collected: districts report indicators based on their own administrative data, without a strong system to ensure data quality and accurate reporting.

To help address this concern, IDinsight’s DataDelta team worked with NITI Aayog to build a strategy to verify the accuracy and representativeness of district-level data. The strategy is based on the principle that administrative data should be high-frequency and high-coverage, and that accurate administrative data can give policymakers useful, decision-relevant information at a much lower cost than alternatives like household surveys.2

This post provides an overview of the data quality challenges the districts were facing and the data verification strategy that the DataDelta team developed.

Local data quality challenges

In practice, frontline workers collect data from beneficiaries in registers as part of their jobs. That data is then digitized, aggregated at the block and district levels, and reported to NITI Aayog through its data entry portal.

NITI Aayog performs a few manual quality checks on aggregated district data before compiling district rankings. However, the lack of a credible validation exercise to ground-truth the districts’ self-reported data hinders NITI Aayog’s ability to distinguish between accurate and inaccurate submissions. In addition, the reliance on manual checks limits the scope and effectiveness of detecting potential issues. Resolving these data quality challenges is of paramount importance for the ADP, as it is a necessary condition for reliable evaluations of district performance and for enabling data-driven decision-making for maximum impact.

Although local governments in India have a wealth of data, several challenges hamper data collection, maintenance, and quality. These challenges include:

  1. Limited comprehension of indicators of interest across sectors:3
    Data issues may stem from a lack of training and a limited understanding of how to calculate indicators. There is a need for uniform indicator definitions and standardized guidance for recording, tabulating, and reporting data at ground-level facilities.
  2. Multiple and diverse reporting formats/platforms and lack of validation:4
    Data from one source often contradicts or is inconsistent with data for the same indicator from another. This may be due to different definitions or classifications, differences in reporting frequency, or simply an error in one source.
  3. Personnel responsible for maintaining data flows lack capacity due to inadequate training and overwhelming workloads:5 Delays in recruitment and the presence of inactive and overburdened staff with limited computer literacy and technical understanding result in errors when recording and summarizing data, affecting overall data quality.

Steps to building a practical data verification strategy

This section details the first stage of our work, which ultimately culminated in an admin-led data verification toolkit.

1. We worked to get buy-in and generate interest in our work from the main stakeholders involved in the process

Through our engagement on the Aspirational Districts Programme, NITI Aayog supported us in building relationships with district officials (e.g., deputy commissioners/district magistrates). These officials are the main users of district-level data and custodians of the data-flow reporting process. They have the strongest incentive to create robust administrative data systems.

We presented NITI Aayog’s vision and our plan for assessing the level and extent of data quality errors to the district magistrate, the ADP nodal officer, and the department heads of relevant sectors. We then got their go-ahead to speak to officials at all levels in the district, from district program managers to frontline workers.

2. We prioritized indicators and sectors for verification

NITI Aayog had asked IDinsight to support them in finalizing the list of ADP 2.0 indicators, with a special emphasis on identifying indicators that are vague or not relevant to ADP priorities. 

Our team employed a five-stage indicator analysis framework (SMART-V) to review the full list of indicators in depth and prioritize those to verify. We made the process more robust by scoring each indicator on the reliability and feasibility of the back-check, creating composite scores on a 0-3 scale.
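As a rough illustration of what such composite scoring could look like in practice (the actual SMART-V criteria, dimensions, and weights are not detailed in this post, so the fields and values below are hypothetical), here is a minimal sketch:

```python
# Illustrative sketch only: the actual SMART-V criteria, dimensions, and weights
# are not spelled out in this post, so the fields and scores below are hypothetical.
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str
    reliability: float  # 0-3: how trustworthy is the underlying data source?
    feasibility: float  # 0-3: how practical is an in-field back-check?

def composite_score(ind: Indicator, w_reliability: float = 0.5) -> float:
    """Weighted composite on a 0-3 scale, combining reliability and feasibility."""
    return w_reliability * ind.reliability + (1 - w_reliability) * ind.feasibility

indicators = [
    Indicator("ANC first-trimester registration", reliability=2.5, feasibility=3.0),
    Indicator("Hypothetical indicator B", reliability=1.5, feasibility=1.0),
]

# Rank indicators by composite score to shortlist candidates for verification.
for ind in sorted(indicators, key=composite_score, reverse=True):
    print(f"{ind.name}: {composite_score(ind):.1f}")
```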

After presenting our list of indicators to NITI Aayog, and based on its directive, we further narrowed the list to a smaller set of indicators across three sectors: health, education and nutrition.

3. We mapped district-level data flows

1. We identified local stakeholders involved in the data flow 

Our team conducted extensive research into government schemes, scouring government documentation and speaking to internal and external sectoral experts to understand the human-resource and reporting structure for each sector and indicator.

2. We created interview guides and interviewed officials at all levels

To learn how on-the-ground realities differ from the scheme documentation on the roles and responsibilities of the officials who operationalize the entire data flow, our team created bespoke interview guides for each cadre of officials at each level of government.

The guides covered questions related to indicator definitions, triangulated sources for the data entered at each level for each indicator, and granular details on the process for movement of a single data point for each indicator from one level to another. 

These guides were then used to interview officials at all levels – district, block and frontline workers. Additionally, we attempted to capture any discrepancy or movement in data flows by visiting facilities and officials across two blocks – one in the district center and one farther from it. 

At the same time, we collected sampling data from these districts to check whether we could build our sampling strategy from it.

3. We visualized the data flows

After this short qualitative exercise to map district data flows, we visualized them in the following format:

4. We built the verification strategy

1. We identified the main data checkpoints

We identified the main data entry and aggregation checkpoints after visualizing the data flow. In the example above, data is entered and aggregated within a three-tiered local administration structure at the district, facility, and unit levels, as outlined below.

  1. District level – The District M&E and District Programme Manager, NHM, check the data in the district Health Management Information System (HMIS) portal before it is reported to NITI Aayog’s Champions of Change data entry portal.
  2. Block/Facility level – Relevant indicator data for each facility in the block is entered into HMIS from the facility-level HMIS paper formats (District Hospital, Primary Health Center, Community Health Center, Sub-Center) by the block data entry officer (DEO).
  3. Unit level – Frontline workers, i.e., ANMs (Auxiliary Nurse Midwives), collect and maintain the data in registers or informal diaries. They take unit-level information about beneficiaries, calculate aggregate numbers, and enter them into the paper HMIS formats, which are then digitized at block health facilities.
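To make the flow concrete, here is a minimal listing of those checkpoints in code form; the stage descriptions paraphrase the levels above and are not an official schema.

```python
# Ordered checkpoints in the health data flow, paraphrased from the description
# above (not an official schema). Data moves from the first entry to the last.
DATA_FLOW = [
    ("unit",     "ANM registers / informal diaries (beneficiary-level records)"),
    ("unit",     "Paper HMIS formats (aggregates computed by frontline workers)"),
    ("facility", "Digital HMIS entry by the block data entry officer (DEO)"),
    ("district", "District HMIS portal, checked by District M&E and DPM (NHM)"),
    ("district", "NITI Aayog Champions of Change data entry portal"),
]

for level, stage in DATA_FLOW:
    print(f"[{level:>8}] {stage}")
```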

2. We thought through each type of check at every level of data flow

Verifying data from the unit level up to the district level looks a lot like quality control on an assembly line. Quality control in such a setting involves randomly sampling units of a batch at critical data-flow points as they come down the line and ensuring that some threshold percentage meets a quality standard. Through such a process, we could identify the quantum of errors and pinpoint where data flow issues persisted (a sketch of the match-rate computation follows the list below).

Levels in the data-quality assembly line (top-down approach)

  1. District-level match rate: Match district-level indicator value with data reported to NITI Aayog’s Champions of Change dashboard.
  2. Data-entry match rate: Match facility-level digital data from HMIS with data from facility-level HMIS paper formats.
  3. Facility register totals match rate: Match the totals from each register (for indicators of interest) used to report data from sampled facilities with the data entered into facility HMIS paper formats.
  4. Beneficiary-level match rate: Match the data reported in facility registers by randomly sampling beneficiaries and meeting them directly to confirm whether they received the reported service. This also involves checking their MCP (Mother and Child Protection)/immunization cards for confirmation.
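As a rough illustration of how a match rate at any of these levels could be computed once both the reported and the independently verified values are in hand, here is a minimal sketch; the unit IDs, values, and record structure are hypothetical, not the team's actual data format.

```python
# Minimal sketch of a match-rate computation at one level of the data flow.
# The unit IDs and values below are hypothetical, not the team's actual data.

def match_rate(reported: dict[str, int], verified: dict[str, int],
               tolerance: int = 0) -> float:
    """Share of units whose reported value matches the independently verified
    value within an absolute tolerance. Keys are unit IDs (e.g., facilities,
    registers, or beneficiaries); values are indicator counts or 0/1 flags."""
    common = reported.keys() & verified.keys()
    if not common:
        return float("nan")
    matches = sum(abs(reported[k] - verified[k]) <= tolerance for k in common)
    return matches / len(common)

# Example: facility register totals vs. entries in the facility HMIS paper formats.
register_totals = {"PHC-01": 34, "PHC-02": 12, "CHC-01": 58}
paper_format_entries = {"PHC-01": 34, "PHC-02": 15, "CHC-01": 58}
print(f"Register-to-paper-format match rate: "
      f"{match_rate(register_totals, paper_format_entries):.0%}")
```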

The images below illustrate the overall data flow (a) and the verification process, from the district-level match rate down to the beneficiary-level match rate (b), for one indicator: the percentage of women who registered for antenatal care (ANC) in the first trimester against the total number of women who registered for ANC.

a. Overall data flow and checks (1-4)
b. Unit level data flow

3. We built out a rigorous sampling strategy – one that is representative at the district level for all indicators and covers all blocks in the district and the different types of facilities.
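The exact sample sizes, strata, and facility frame are not reproduced in this post, so the following is only a sketch of what a design that covers every block and facility type might look like:

```python
# Illustrative stratified sample of facilities: every block and every facility
# type is represented. The frame, strata, and sample sizes here are made up.
import random

facility_frame = [
    {"block": "Block A", "type": "PHC", "id": "PHC-A1"},
    {"block": "Block A", "type": "Sub-Center", "id": "SC-A1"},
    {"block": "Block A", "type": "Sub-Center", "id": "SC-A2"},
    {"block": "Block B", "type": "PHC", "id": "PHC-B1"},
    {"block": "Block B", "type": "CHC", "id": "CHC-B1"},
    {"block": "Block B", "type": "Sub-Center", "id": "SC-B1"},
]

def stratified_sample(frame, n_per_stratum=1, seed=42):
    """Draw n facilities per (block, facility type) stratum so that every block
    and facility type appears in the sample."""
    rng = random.Random(seed)
    strata = {}
    for f in frame:
        strata.setdefault((f["block"], f["type"]), []).append(f)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, min(n_per_stratum, len(units))))
    return sample

for f in stratified_sample(facility_frame):
    print(f["block"], f["type"], f["id"])
```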

Key features of our final verification strategy 

  • Sector-agnostic: We have been able to employ different versions of this strategy across sectors and indicators, as it can plug into multiple contexts with minimal changes based on the data flow.
  • Covers all levels of the data flow: The strategy captures both the breadth and the depth (i.e., the quantum) of data quality issues through match rates at each level. It allows the administration to focus on areas of improvement – which levels require additional resources, such as more staff or additional capacity training.
  • In-field sampling in under-resourced settings: The process uses an in-app random number generator to automatically sample beneficiaries to visit, drawn directly from the registers used to report data to HMIS (a minimal sketch follows this list).
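The app and its interface are not described in detail here, so the sketch below only illustrates the idea: the surveyor supplies the range of serial numbers in the paper register, and a seeded random draw returns the rows to back-check.

```python
# Illustrative sketch of in-field random sampling from a paper register: the
# surveyor enters the first and last serial number recorded for the reporting
# period, and the tool returns which rows to verify. Not the actual app logic.
import random

def sample_register_rows(first_serial: int, last_serial: int,
                         n_beneficiaries: int, seed: int) -> list[int]:
    """Return a sorted random sample of register serial numbers to back-check."""
    serials = range(first_serial, last_serial + 1)
    k = min(n_beneficiaries, len(serials))
    return sorted(random.Random(seed).sample(serials, k))

# Example: the ANC register covers serials 113-178 for the month; verify 10 rows.
print(sample_register_rows(113, 178, n_beneficiaries=10, seed=2024))
```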

How we operationalized the verification strategy

Our DataDelta teammates, Shreya and Isha, describe in this post the detailed processes, systems, and data collection pipeline we set up to ensure data quality and to facilitate a complex data collection exercise at scale.

We kept learning from our processes and built an admin-led data verification toolkit for improving local data quality and governance. Our colleague Jasmeet talks about the process here.

In conclusion, IDinsight’s DataDelta team played a crucial role in gathering knowledge and building a solution to the challenges associated with the quality of administrative data. Through our analysis, we found that data quality issues are distributed across sectors and across the data flow, and cannot be attributed to only one administrative level. This debunks the colloquial, on-field notion that the majority of errors creep in at the beginning of the data flow, i.e., at the beneficiary level.

Our learnings from this verification exercise have culminated in a tech-based, lighter-touch, admin-led verification toolkit that we’re piloting in Uttar Pradesh to improve data- and evidence-driven governance.

Acknowledgements

We deeply appreciate the comments and reviews on this blog from our colleagues at IDinsight. Thank you, Puneet Kaur, Laura Burke, Will Thompson and Vikas Dimble.


  1. The first and second ranked districts in overall terms are awarded Rs. 10 crores and Rs. 5 crores respectively. The first ranked district from each of the five sectors covered by the programme is awarded Rs. 3 crores.
  2. In the past, household surveys were employed as a means to tackle the problem of ground truthing and data quality. However, conducting household surveys on the necessary scale and frequency for the ADP (i.e., monthly, encompassing over 70 indicators across 112 districts in 27 states) is financially impractical and time consuming. Moreover, it is not always trusted by administrators.
  3. Source: Strengthening the Health Management Information System: Pilot Assessment of Data Quality in Five Districts of India
  4. Source: Administrative Data: Issues, Concerns & Prospects: An Indian Perspective
  5. Source: Examining policy intentions and actual implementation practices: How organizational factors influence health management information systems in Uttar Pradesh, India