IDinsight Field Manager Lead Syed Maqbool (second from right) during data collection in Andhra Pradesh, India ©IDinsight
Anyone who has been part of a primary data collection exercise knows the many challenges practitioners face in collecting high-quality information. From hiring the best field team, to setting up the right systems, to managing incoming data and maintaining its quality, each of these activities must be accounted for along the way. And while primary data collection is challenging in itself, conducting a survey at a large scale presents unique challenges that require careful planning and execution.
In partnership with the Bill & Melinda Gates Foundation and NITI Aayog, as part of the Aspirational Districts Program (ADP), IDinsight recently conducted primary data collection in 38 districts across eight states in India. The aim of the exercise was to collect and verify administrative data in the health, education, and nutrition sectors, from where data originates (beneficiary-level reports in health facility and school registers), to where it is aggregated into paper proformas, to where it is digitised into the portal. For us on the project team, the scale and scope of the exercise presented unique challenges.
Given the different sectors and levels of data collection, our respondents ranged from bureaucrats in the local government (regional government officials), to frontline workers at health facilities and Anganwadi centres (government-sponsored child care centres in India), to headmasters in schools, to program beneficiaries (primarily pregnant and lactating women and parents of school-going children). This required us to conduct in-person as well as phone-based household surveys in addition to facility- and school-level surveys. In all, we collected data across 1,100 schools, 1,100 Anganwadi centres, 4,100 health facilities, 500 local government offices, and 8,300 households. The scale, the variety of sectors and respondents, the different levels of data collection, and the varied survey modes all warranted substantial planning to mitigate expected challenges.
In this blog post, we share the processes and systems we set up along the data collection pipeline to ensure quality and facilitate this complex exercise. Our experience offers valuable insights for others looking to conduct primary data collection at scale, and we hope this post serves as a useful resource for researchers facing similar challenges.
1. Timely inter-department collaboration and information exchange: The scope of our survey called for close collaboration with the local administration (district) departments for two main reasons: collating the administrative datasets to be verified, and securing official letters of support for field activities.
However, this collaborative requirement presented a major challenge – the possibility of delays if coordination failed with even one department in one region.
2. Building local teams of expert data collectors from the ground up: Since we were hiring survey teams (12 personnel per district) simultaneously across multiple districts, we had to ensure that processes and tools were standardised. The ad-hoc trackers and interview questions traditionally used for screening and shortlisting enumerator candidates would have led to inconsistent candidate quality, mismanagement of applicants' personal data, and additional resource investment. Secondly, we anticipated a limited pool of qualified candidates in some districts, and the complexity of the survey and its long timelines increased the possibility of attrition. This was a particular concern in the roughly 50% of districts where we were hiring a local survey team for the first time and therefore had no prior network.
3. Capacity building and management at scale: Training a large number of enumerators is a daunting task; without standardised delivery, coordination across locations, and sustained motivation, our enumerators would have lacked the skills needed to collect accurate data.
4. Creating management systems to ensure quality data collection: Given the complexity of the survey and the size of the survey team, we also expected challenges in field management and quality monitoring once data collection kicked off. We will discuss how building rigorous survey systems and processes helped us mitigate these challenges in our next blog post.
– Building rapport by leveraging communications: Coordinating with different administrative units can be challenging, so we standardised our communication process to ensure effective coordination. We assigned a Point of Contact (PoC) for every 4-5 districts, who used an email template for the first line of communication. The email introduced our verification process, upcoming activities, and the support required from the local governments. We anticipated the need for further follow-ups, which our field coordinators carried out (with support from the PoC) through phone calls or in-person visits to ensure compliance with the requirements. We were able to collate the required datasets and official letters of support from 37 of the 38 districts within a month, without requiring a major in-person presence in most of them. Each PoC used the same naming convention to encrypt and upload the data to a secure folder. The relevant core team members then performed basic checks on these datasets and followed up with the government department via the PoC wherever necessary.
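To make this concrete, here is a minimal sketch of the kind of naming-convention check such a workflow can automate. The file pattern, sector names, and extension below are hypothetical illustrations, not our actual convention:

```python
import re

# Hypothetical convention: <state>_<district>_<sector>_<YYYYMMDD>.csv.gpg
# (the actual convention and encryption tooling used may differ)
FILENAME_PATTERN = re.compile(
    r"^(?P<state>[a-z]+)_(?P<district>[a-z]+)_"
    r"(?P<sector>health|education|nutrition)_(?P<date>\d{8})\.csv\.gpg$"
)

def follows_convention(filename: str) -> bool:
    """Return True if an uploaded file matches the agreed naming pattern."""
    return FILENAME_PATTERN.match(filename) is not None

# Flag files a PoC would need to rename before uploading
uploads = ["odisha_koraput_health_20230115.csv.gpg", "Koraput-data-final.xlsx"]
for f in uploads:
    print(f, "->", "OK" if follows_convention(f) else "rename needed")
```

A simple check like this makes the basic dataset checks mentioned above cheap to run on every upload.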
Based on calculations of sample size, survey length, the nature of the survey, expected daily productivity, and the project budget, we estimated a requirement of about 12 personnel per district for data collection, comprising 10 enumerators and 2 supervisors. In all, we were looking at hiring 400+ candidates across all districts. To efficiently manage this high-volume hiring, which posed a diverse range of challenges, we put the following measures in place.
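Before turning to those measures, here is a back-of-the-envelope version of the staffing arithmetic. The totals come from the figures quoted earlier, but the productivity and field-day parameters are illustrative assumptions, not the project's actual values:

```python
import math

# Survey units from the totals quoted earlier:
# schools + Anganwadi centres + health facilities + offices + households
total_units = 1100 + 1100 + 4100 + 500 + 8300
districts = 38
units_per_district = total_units / districts          # ~397 units per district

units_per_enumerator_per_day = 1   # assumed: register-verification surveys are long
effective_field_days = 40          # assumed: net of travel, rest days, and buffers

enumerators = math.ceil(
    units_per_district / (units_per_enumerator_per_day * effective_field_days)
)
supervisors = math.ceil(enumerators / 5)   # assumed span of control: ~5 enumerators each

print(f"{enumerators} enumerators + {supervisors} supervisors per district")
# -> 10 enumerators + 2 supervisors per district
```

Under these assumptions, the calculation lands on the 12-person district team described above.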
– Decentralised execution of a standardised hiring process: For screening and selecting the survey team, we followed a three-stage approach across all districts: shortlisting applications, phone-based interviews, and a thorough assessment through in-person classroom and field training.
The materials required to run the above-mentioned steps (such as the application form, job description, and interview guide) and the tools (such as district hiring trackers and a dashboard) were developed by the central team with inputs and feedback from the field team. Taking into consideration the profile of the intended users, the application form and interview questions were translated into three local languages: Odia, Assamese, and Hindi.
However, the execution of the process was highly decentralised, with the respective district's field team, i.e. the Cluster Coordinator (CC) and Regional Coordinator (RC), leading most of the processes. For instance, while the scoring and shortlisting of applicants across the three stages were mostly automated (as explained below), the coordinators could manually override decisions whenever there was a clear rationale, and change the cut-off scores whenever required. Similarly, depending on the local hiring scenario, the coordinators also shortlisted a few buffer candidates at every stage. Such provisions proved invaluable for tackling district-specific challenges such as the limited availability of candidates, low success rates, and last-minute dropouts.
With this arrangement, we were able to give control back to the CCs, allowing them to make quick hiring decisions with oversight from the central team in the form of standardised processes, materials, and tools, as well as regular supervision.
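As a minimal sketch of what score-and-cutoff shortlisting with manual override can look like (the field names, cutoff, and buffer logic here are hypothetical, not the actual implementation in our trackers):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Applicant:
    name: str
    score: float                     # automated score from the application stage
    override: Optional[bool] = None  # coordinator may force-include or force-exclude

def shortlist(applicants: List[Applicant], cutoff: float = 60.0,
              buffer_count: int = 2) -> Tuple[List[Applicant], List[Applicant]]:
    """Auto-shortlist by score, respect manual overrides, keep buffer candidates."""
    selected, rejected = [], []
    for a in applicants:
        passed = a.score >= cutoff if a.override is None else a.override
        (selected if passed else rejected).append(a)
    # Best-scoring rejects are kept as buffers for last-minute dropouts
    buffers = sorted(rejected, key=lambda a: a.score, reverse=True)[:buffer_count]
    return selected, buffers

pool = [Applicant("A", 72), Applicant("B", 55, override=True), Applicant("C", 58)]
selected, buffers = shortlist(pool)
print([a.name for a in selected], [a.name for a in buffers])  # ['A', 'B'] ['C']
```

The `override` field captures the coordinator's discretion, while `cutoff` can be tuned per district, mirroring the flexibility described above.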
– Hiring a hyper-local survey team: From experience, we have learned that local enumerators are much more familiar with a district's local dialects, cultural context, locations, and road networks, and hiring them goes a long way in improving survey productivity and data quality. Moreover, for this project, we expected facilities and beneficiaries to be spread out across blocks within a district. Adding another layer of localisation, hiring at the block level, therefore seemed like a reasonable strategy to optimise budget and time. As one would expect, though, such a hyper-local recruitment strategy was not feasible in every district; in such cases, exceptions were made to allow the selection of candidates from neighbouring districts and/or blocks.
Through all these efforts, we received about 6,700 applications, which were assessed and screened through interviews and training, both conducted by the coordinators in the district's local language.
– Automating most repetitive and time-consuming tasks: For any high-volume recruitment, automating repetitive admin tasks not only saves the coordinators' time and allows them to focus on more complex tasks, but also helps maintain transparency and consistency across the hiring process. We achieved this automation through custom-built trackers developed by SurveyStream.
With the help of these automated trackers, we were able to hire a survey team of about 430 local candidates across 38 districts in the span of about a month. By automating tasks such as manual data entry, de-duplication, filtering, and publishing district-specific information, we saved a significant amount of time. This streamlined approach allowed us to efficiently manage various user groups, including district and state coordinators and central team members, while also reducing the risk of human error.
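As a generic illustration of the de-duplication and filtering steps (column names and data are hypothetical; the trackers' internal implementation differs):

```python
import pandas as pd

# Hypothetical application records; duplicates arise when candidates apply twice
applications = pd.DataFrame({
    "name":     ["Asha", "Asha", "Ravi"],
    "phone":    ["9000000001", "9000000001", "9000000002"],
    "district": ["Koraput", "Koraput", "Koraput"],
    "score":    [70, 65, 80],
})

# Keep one row per phone number, retaining the highest-scoring application
deduped = (applications
           .sort_values("score", ascending=False)
           .drop_duplicates(subset="phone", keep="first"))

# Publish only the rows a given district's coordinator should see
district_view = deduped[deduped["district"] == "Koraput"]
print(district_view)
```

Running such steps automatically on every new batch of applications removes a large share of manual tracker upkeep.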
For our survey, we needed to train 37 cluster coordinators and 8 regional coordinators, along with a survey team of 430 enumerators, across Hindi-, Odia-, and Assamese-speaking districts. To effectively train such a diverse field team, we followed a cascade model: we trained our RCs and CCs online, and the CCs (with support from the RCs) then trained the enumerator teams in person in their respective districts.
To maintain the quality of training across 37 districts, we carefully systematised the processes throughout the training pipeline.
Three or four districts had to extend their training by a day to conduct retraining sessions as necessary; however, we had already budgeted ample buffers throughout the training and data collection period. This was essential to maintaining the same training quality across multiple districts. Also, since this was a four-month-long data collection exercise, some CCs would inevitably go on leave; in those cases, RCs substituted for them, taking over training delivery or data collection management.
In our upcoming blog post, we’ll delve into how establishing effective systems and assembling a well-suited team as detailed above helped us overcome field management and quality monitoring challenges during a large-scale survey. Stay tuned for more details!
We deeply appreciate the comments and reviews on this blog from our colleagues at IDinsight. Thank you, Emily Coppel, Girish Tripathi, Leonard Tin, Lipika Biswal, Jahnavi Meher, Jasmeet Khanuja, Pramod Kumar, and Puneet Kaur.