Part two in our series of lessons learned, based on a 6,000-person survey in India and a 600-person survey in Kenya.
During COVID-19, rapid and accurate data collection on economic and physical health is vital to ensuring the best policy response. In our last post, we described the hiring and training processes we introduced to run a ~6,000-person phone survey in India and a ~600-person phone survey in Kenya. In this post, we share our experiences with the daily management of data collection: keeping the survey on pace, encouraging high levels of data quality, managing feedback loops effectively, and communicating with surveyors. Many of these practices are also useful for large-scale in-person surveys that are managed remotely.
Since our survey window was short, we needed to prepare productivity and data quality measures ahead of time so that any findings could be acted on immediately. We created a dashboard that received live updates on productivity, high-frequency checks, and audio audit scores at both the enumerator and district level.
We used a SurveyCTO dataset to connect to our trackers (see more in our post on reaching respondents), which told us when to call particular respondents based on previous attempts. In these trackers, we calculated surveyor-level statistics on the number of surveys that were completed, half-completed, not reached, and refused. We compiled these productivity numbers so that District Coordinators could compare district-wide performance and the performance of individual surveyors. If a district seemed to be underperforming, we could try to uncover why and take steps to boost productivity. For example, after seeing a specific district underperform, we learned that one of its surveyors had fallen ill; we were able to re-assign a high-performing surveyor from another district in the same language group to help complete that district's surveys.
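As a rough illustration of these tallies, the sketch below counts call outcomes per surveyor and rolls them up by district. It is a minimal example in Python/pandas; the column names and status codes are placeholders rather than our actual SurveyCTO field names, and in practice these counts were computed in our trackers rather than in a standalone script.

```python
import pandas as pd

# Illustrative outcome categories, mirroring the counts described above.
STATUSES = ["completed", "half_completed", "not_reached", "refused"]

def productivity_summary(attempts: pd.DataFrame):
    """Tally call outcomes per surveyor and roll them up to the district level.

    `attempts` is assumed to hold one row per call attempt with hypothetical
    columns: district, surveyor_id, call_status.
    """
    per_surveyor = (
        attempts.groupby(["district", "surveyor_id", "call_status"])
        .size()
        .unstack("call_status", fill_value=0)
        .reindex(columns=STATUSES, fill_value=0)
    )
    per_district = per_surveyor.groupby(level="district").sum()
    return per_surveyor, per_district

# Toy example
attempts = pd.DataFrame({
    "district":    ["A", "A", "A", "B", "B"],
    "surveyor_id": ["s01", "s01", "s02", "s03", "s03"],
    "call_status": ["completed", "refused", "not_reached", "completed", "half_completed"],
})
per_surveyor, per_district = productivity_summary(attempts)
print(per_surveyor)
print(per_district)
```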
High-frequency checks allowed us to look for suspicious patterns in the data and to ensure quality. After reviewing the questionnaire, we noted a few places where the data we collected could be inaccurate and flagged them. These included flags for outliers, where responses are certainly possible but rare (for example, a respondent stating their house has 12 rooms); logic inconsistencies (for example, a respondent claiming they did not have a bank account but later stating they had received direct benefit transfers through a bank); and counts of the number of times a surveyor had marked that a respondent answered “don’t know” or “refused to respond”. We created a server dataset in SurveyCTO that exported the variables of interest to a Google Sheet, where we tallied the number of times each surveyor had entered a flagged value. We presented this information to District Coordinators in a dashboard so that they could see which questions received the most flags overall and which surveyors had the most flags at the question level.
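To make the logic concrete, here is a hedged sketch of such checks in pandas. The variable names, the outlier threshold, and the special codes for “don’t know” and “refused” are illustrative assumptions rather than the codes from our actual instrument; our real pipeline ran through a SurveyCTO server dataset and a Google Sheet rather than a script.

```python
import pandas as pd

# Illustrative special codes; the instrument's actual codes may differ.
DONT_KNOW = -888
REFUSED = -999

def flag_submissions(df: pd.DataFrame) -> pd.DataFrame:
    """Build one row of flags per submission, using hypothetical column names."""
    flags = pd.DataFrame({"surveyor_id": df["surveyor_id"]})
    # Outlier flag: a value that is possible but rare (e.g. more than 10 rooms)
    flags["outlier_rooms"] = (df["num_rooms"] > 10).astype(int)
    # Logic inconsistency: no bank account, but a benefit transfer received via bank
    flags["bank_inconsistency"] = (
        (df["has_bank_account"] == 0) & (df["received_bank_transfer"] == 1)
    ).astype(int)
    # Counts of "don't know" / "refused" across the answer columns (here: q_*)
    answers = df.filter(like="q_")
    flags["n_dont_know"] = (answers == DONT_KNOW).sum(axis=1)
    flags["n_refused"] = (answers == REFUSED).sum(axis=1)
    return flags

def flags_by_surveyor(flags: pd.DataFrame) -> pd.DataFrame:
    """Tally flags per surveyor, mirroring the sums we kept in the Google Sheet."""
    return flags.groupby("surveyor_id").sum()
```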
We asked for consent to record phone calls (which we were able to do by linking SurveyCTO to the API of a call-recording software). After the first set of phone surveys was submitted, we set up a Google Sheet that was populated with links to the audio recordings of the surveys. Our monitors were tasked with listening to these recordings while filling out an Audio Audit form. In this form, we copied a random assortment of questions from the main form, and the monitor answered them based on the recording, as if they were the surveyor filling out the initial survey. This allowed us to calculate mismatches between what respondents said and what surveyors entered in the form. We also asked the monitor some qualitative questions to rate the surveyor’s speed, adherence to protocols, level of engagement, etc. Since we did not want to call respondents back and ask essentially the same survey again, this was the main method by which we monitored data quality. In the survey we recently completed, monitors listened to 40 per cent of the consented recordings concurrently with data collection.
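The core of the audio audit is a question-by-question comparison between the monitor’s entries and the surveyor’s original entries. The sketch below shows one way such a mismatch rate per surveyor could be computed; the merge key and column names are hypothetical, and the real comparison was done against the audit forms filled out in SurveyCTO.

```python
import pandas as pd

def audit_mismatch_rates(survey: pd.DataFrame, audit: pd.DataFrame,
                         audited_cols: list) -> pd.Series:
    """Share of back-checked questions where monitor and surveyor disagree.

    Assumes `survey` holds the original submissions (including surveyor_id) and
    `audit` holds the monitor's answers, joined on a shared submission_id.
    """
    merged = survey.merge(audit, on="submission_id",
                          suffixes=("_surveyor", "_monitor"))
    disagreements = pd.concat(
        [
            (merged[f"{col}_surveyor"] != merged[f"{col}_monitor"]).rename(col)
            for col in audited_cols
        ],
        axis=1,
    )
    merged["mismatch_share"] = disagreements.mean(axis=1)
    # Average mismatch share per surveyor, which can feed into quality reviews
    return merged.groupby("surveyor_id")["mismatch_share"].mean()
```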
In India, we collected all of the productivity and data quality indicators in a dashboard for our team. With this dashboard, we were able to monitor district performance and data quality at a high level. Our State and District Coordinators also had access to the dashboard and referenced it during debriefs. Having a regularly updated dashboard allowed us to take immediate action if productivity or quality fell below our standards, which was vital given the short timeframe of the survey.
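As a final, rough sketch, the snippet below shows how district-level dashboard tables could be assembled from the pieces illustrated above, assuming each input has already been aggregated by district. In practice we served these indicators through a live dashboard rather than a script, and the column names here are the illustrative ones used earlier.

```python
import pandas as pd

def district_dashboard(productivity: pd.DataFrame,
                       flag_counts: pd.DataFrame,
                       mismatch_share: pd.Series) -> pd.DataFrame:
    """Join productivity counts, flag counts, and audit mismatch shares by district."""
    table = (
        productivity
        .join(flag_counts, how="outer")
        .join(mismatch_share.rename("audit_mismatch_share"), how="outer")
    )
    # A simple completion rate to spot underperforming districts at a glance
    attempted = table[["completed", "half_completed", "not_reached", "refused"]].sum(axis=1)
    table["completion_rate"] = table["completed"] / attempted
    return table.sort_values("completion_rate")
```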