Avoiding bias while locating farmers in Zambia

This post explores how IDinsight addressed the tendency of program staff to want their most successful participants surveyed.

IDinsight surveyor Zewelanji Phiri conducts a post-harvest endline survey in Zambia’s Southern Province. ©IDinsight/ Kondwani-yobe Mumba


It can be a huge challenge identifying survey respondents in sparsely populated regions with limited information about where people live. This is especially the case when participants are recruited outside of the home: for example at a clinic, school, or as we’ll cover in this post, an agricultural extension training. For a process evaluation in Zambia, we needed to find hundreds of randomly sampled farmers — but all we had was a list of people who attended training. Many of them were impossible to track down without someone from the program we were evaluating supporting the effort to find where they lived. But once we started working with program staff, they were interested in taking us to their strongest participants. This post shares how we overcame the tendency of program staff to only want their most successful trainers or farmers to be surveyed — or to otherwise positively influence the findings of the survey.

Our two takeaways:

  1. Sometimes it is better to be slow when sampling if it means avoiding bias.
  2. It is always important to account for implementers’ conflicting incentives, especially when it may be in their best interest for an evaluation to show their program’s success.

Pilot: sampling and locating farmers in low-density areas with little information

We wanted to understand whether prominent volunteer farmers in a given community (lead farmers) could and would attend a centralized training and then, in turn, effectively deliver that training to farmers in their respective communities, sharing recommended post-harvest crop management practices. Would these newly trained farmers, then, be willing and able to change their practices? (The short answer is yes, yes, and no.) We interviewed 347 farmers and 53 lead farmers. We focused only on farmers who were recorded as having received the training because we sought to understand whether the program logic held. (Check out the addendum for more on the context, sampling strategy, and results.)

Once we selected farmers from the training attendance lists, we had to find them. The list of trainees contained, at most, the name, sex, age, village/district name, and phone number for each trained farmer — but this information came with challenges. Most village names are unmarked. The registers generally provided farmers’ government names, which may differ from the name they commonly use. We were not provided with most farmers’ phone numbers, and of those we received, few had both a working phone and network coverage.

The low population density of Zambia’s Southern and Central provinces made this work even more difficult: villages can be sprawling and widely dispersed, spread up to hundreds of kilometres away from district centres.

The combination of all these factors meant that a specific person’s household and farm were nearly impossible to find on our own, on an efficient timeline, without either a working mobile number or a local interlocutor.

Given this situation, once we had randomly chosen farmers from these lists, the implementer’s permanent field staff helped us physically locate the farms of the selected lead and trained farmers during the pilot. Implementing field staff were easy to contact, met regularly with the lead farmers, and knew where the villages were located. When we located a farm, we would approach the farmer, with the implementer in tow, and announce that we would be asking questions about the training.

However, the seemingly efficient pilot strategy presented challenges in getting an accurate picture of what happened following training.

Challenge: finding farmers who had genuinely attended the original trainings, when our interlocutors had different incentives

Since the implementing organization mandated each lead farmer to train 200 farmers, there was a risk that lead farmers had inflated the number of farmers they trained by adding names to their lists. This turned out to be true: at the end of surveying, we found that only 70% of the reportedly trained farmers we surveyed had actually been trained.¹

We worried about bias from two possible sources: lead farmers and implementing field staff. Both may have faced incentives to guide us to farmers who had been trained and could demonstrate their training — lead farmers may want to look good in front of implementing staff, while implementing staff may want the evaluation data to look good for their employers. This proved to be the case during our pilot and survey.

Some implementing staff and lead farmers asked why we had selected some farmers and lead farmers who were, by their measure, poor performers while leaving out known top performers.

Additionally, during piloting, lead farmers would bring us to only the subset of selected farmers who had been trained (telling us that selected farmers who had not been trained were unavailable or had moved), or would attempt to train other farmers on the spot.

Though we did not directly observe it, we also hypothesized that implementing field staff were calling lead farmers in advance of our visits, providing lead farmers time to do additional trainings. While any kind of information dissemination about good post-harvest practices could be beneficial to farmers, we wanted to talk to farmers who had really attended the planned trainings to see if the model was scalable without the attention (threat?) of an evaluator.

Overcoming challenges: revision to locating and questioning farmers

We realized that in our effort to locate farmers as efficiently as possible, and despite random selection, our sample was being biased in favour of ‘good’ farmers.

We realized the survey would have to serve an additional purpose as an auditing function: we wanted to see whether the farmers reported as trained had actually been trained.

During piloting, we quickly addressed this in two ways: changing the way we found farmers as well as the way we introduced our purpose and questions.

To locate farmers for our endline survey, we opted to bypass the implementing organization. Rather, we connected more directly with lead farmers and farmers, through one of three possible means, presented here in descending order of preference:

  • When we had farmer phone numbers and the calls went through, we directly asked farmers how to find their farms. This rarely happened.
  • If we did not have a working number for the farmer, then we called lead farmers to help us find selected farmers’ houses. Because lead farmers often serve as contacts for farmers through multiple organizations and agriculture programs, they did not know which specific program or training we might be asking about right away. Recall that during piloting, the implementer tagged along to interviews and we revealed up-front the training of interest, which identified untrained farmers quickly but led to other challenges. We wanted to be sure that, instead, we interviewed farmers alone. To avoid having the lead farmer present, we dropped off surveyors one-by-one at each house the lead farmer led us to, and then interviewed the lead farmer at his or her own house.
  • When we did not have a farmer phone number and when the lead farmer did not know the farmer (for example, some lead farmers trained whole farmer cooperatives and did not personally know all the farmers they had trained), we would go to the village and ask at least three people if they knew the name before resampling the farmer.
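The order of preference above can be read as a small decision rule. The following is only a sketch of that rule, not the tool the team used: the function name, argument names, and returned labels are all hypothetical, and the only number taken from the post is the minimum of three villagers asked before resampling.

```python
MIN_VILLAGERS_ASKED = 3  # from the post: ask at least three people before resampling


def locating_step(phone_reachable, lead_farmer_knows, villager_knows, villagers_asked):
    """Return the next step for one sampled farmer, in descending order of preference.

    All argument names and returned labels are illustrative assumptions.
    """
    if phone_reachable:
        # First choice: the farmer answers and gives directions directly.
        return "ask farmer for directions by phone"
    if lead_farmer_knows:
        # Second choice: the lead farmer guides us, but is not present at the interview.
        return "lead farmer guides us; interview farmer alone"
    if villager_knows:
        # Third choice: someone in the village recognises the name.
        return "follow villager's directions"
    if villagers_asked < MIN_VILLAGERS_ASKED:
        return "ask another villager"
    # Nobody recognised the name after enough attempts: draw a replacement.
    return "resample"


print(locating_step(False, False, False, villagers_asked=3))
```

Writing the rule down this way makes it easier to apply the same fallback order to every sampled farmer, so that the choice of route does not depend on who happens to be helpful that day.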

We made additional adjustments to how we introduced ourselves to farmers and to who was present during the interviews, to mitigate pro-program, courtesy, and social desirability biases. When speaking with farmers, we returned to a plan we had considered pre-pilot: we framed the purpose of our survey as being about agriculture in general and did not mention the training specifically until the end, when we asked questions about the training itself. This way, farmers were forthcoming about whether they had not received or did not remember the training.

This approach still presented some challenges, including:

  1. Adding time per farmer sought and found. This took longer than using the implementing organization, because in many cases we needed to ask random people to find the listed farmers’ houses, and in some cases even the selected villages. However, since we had budgeted for additional time and expenses before field work began (expecting the unexpected),² this was not an issue and we stayed within budget.
  2. Potential for mistaking identities. Without cross-verification from the lead farmer about farmers attending the training and the farmers’ names, it was possible that we would interview a farmer with a similar name who was not on the list (i.e., not reported by the lead farmer to have been trained). However, this concern was mitigated because we had been provided the name, village, sex, and age for most respondents. Villages are small enough that it was unlikely two people would match on all these characteristics.
  3. Continued challenges locating farmers. We could not locate roughly a quarter of farmers and needed to resample to reach our target sample size. This may have introduced a new source of bias: farmers we were unable to reach may be systematically different from those we did reach. However, we judged that this potential bias, if realized, would distort the results less than the bias from continuing to locate farmers through the implementers.
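
The share of farmers we could not locate also puts a rough floor under the audit result. The sketch below works through one reading of that arithmetic: assume every farmer we could not locate was untrained, and apply the 70% rate only to those we did find. The function name is hypothetical, and because the post gives the unlocated share only as "roughly a quarter", the two shares tried here are illustrative.

```python
def trained_lower_bound(observed_rate, unlocated_share):
    """Worst-case share trained, assuming no unlocated farmer was trained.

    observed_rate: share confirmed trained among farmers we found (0.70 in the post).
    unlocated_share: share of sampled farmers we could not find (assumed value).
    """
    if not (0 <= observed_rate <= 1 and 0 <= unlocated_share <= 1):
        raise ValueError("both arguments must be between 0 and 1")
    return observed_rate * (1 - unlocated_share)


for share in (0.20, 0.25):
    bound = trained_lower_bound(0.70, share)
    print(f"unlocated {share:.0%}: at least {bound:.1%} trained")
```

With an unlocated share between 20% and 25%, the floor falls between roughly 52% and 56%, which is in line with the "about 55%" lower bound given in the first note.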

Time efficiency losses are worth it when the potential costs of bias could ruin your survey. Independent evaluators need to take implementers’ conflicting incentives into account when locating survey respondents.

Addendum: Context and Results

This survey took place in rural Zambia, in the context of a program encouraging improved post-harvest crop management practices. Farmers in Sub-Saharan Africa lose between 4% and 18% of their grain crop (varying by country and crop) every year during post-harvest processes, such as drying and storing. This directly leads to reduced income and lower food security for farmers. Improved post-harvest management techniques can mitigate these agricultural losses. As an example, many farmers in Zambia use open-weave grain bags to store harvested crops. Open-weave grain bags leave grain exposed to insects and moisture, leading to crop loss and rot. Hermetically sealed grain bags and silos, by contrast, store grain more securely and protect against spoilage, thereby benefitting nutrition and income.

Most farmers in Zambia have not been trained in post-harvest management: prior to training, fewer than 5% of farmers and only 10% of lead farmers were aware of hermetic storage. This is partly because agricultural extension is challenging in low-density contexts. In-person agricultural training in Zambia is costly because of low population density; the long distances extension workers must travel from one farmer to another within their assigned region impose significant costs. Our process evaluation focused on an alternative, cascading approach to delivering agricultural information in Zambia’s Southern and Central Provinces.

This model sought to trim extension costs by cascading training on post-harvest crop management from lead farmers to other farmers. Our evaluation approach included observing trainings, administering tests before and after training, and conducting quantitative surveys and qualitative interviews with lead farmers and farmers.

We selected a stratified random sample of lead farmers and farmers to survey, considering district and sex, from a sampling frame based on lists of trained farmers. These lists were generated by each lead farmer to register who attended their training; lead farmers provided the lists to the implementer, who then shared them with us.
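
For readers setting up a similar draw, here is a minimal sketch of a stratified random sample by district and sex. It is not the procedure we actually ran: the field names, the equal sampling fraction per stratum, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(frame, strata_keys, fraction, seed=0):
    """Draw the same fraction from every stratum, with at least one row per stratum.

    frame: list of dicts, one per listed farmer (field names are hypothetical).
    strata_keys: fields that define a stratum, e.g. ("district", "sex").
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    strata = defaultdict(list)
    for row in frame:
        strata[tuple(row[k] for k in strata_keys)].append(row)

    sample = []
    for key in sorted(strata):  # sorted so the result does not depend on input order of strata
        rows = strata[key]
        k = min(len(rows), max(1, round(len(rows) * fraction)))
        sample.extend(rng.sample(rows, k))
    return sample


# Made-up sampling frame: 2 districts x 2 sexes x 10 farmers each.
frame = [
    {"name": f"farmer_{d}_{s}_{i}", "district": d, "sex": s}
    for d in ("district_a", "district_b")
    for s in ("F", "M")
    for i in range(10)
]
chosen = stratified_sample(frame, ("district", "sex"), fraction=0.3)
print(len(chosen))
```

Drawing within each district and sex group keeps every group represented in the sample. It does not protect against the list itself being inflated, which is why the survey also had to work as an audit.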

We conducted rapid pre- and post-training assessments of lead farmers’ and farmers’ knowledge of key post-harvest practices and followed up with an endline survey about these practices and other outcomes of interest six months after the trainings. We found the trainings were, overall, successful in transferring knowledge to both lead farmers and farmers, but this knowledge transfer for the most part did not translate into changes in practice within six months.

  1. Whether this is good or bad depends on your expectations: given that the lead farmers were not being paid to train farmers and had an incentive to overreport, most stakeholders thought this was a good number. In any case, if you are doing a study with a similar set-up, keep this inflation number in mind. The absolute lower bound on this number is about 55%, which would hold if none of the resampled farmers had been trained. However, it is highly unlikely that the true percentage of farmers trained was this low, even if the resampled farmers were systematically different from those we did find.
  2. IDinsight guidelines state: “Start by calculating the number of surveys that you expect to be completed per day, then work up from there to determine how many days will be needed to finish data collection. Be extremely conservative in your time estimates. Budget sufficient time for transport to remote places, for finding respondents, and for identifying replacements if your selected respondents are not available. Don’t forget leave days and holidays. Include buffer days to account for contingencies, such as extreme weather events and times when enumerators [surveyors] are more likely to fall sick. Look for creative ways to save time without compromising quality or asking too much of your survey team. Also, come up with a plan to keep abreast of local issues, like possible strikes, etc.” We followed these guidelines for budgeting conservatively on FtMA, estimating the maximum number of surveys we could do in a day as 4 surveys per enumerator per day (6 days a week), and then cutting that number in half for our maximum budget.
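
The budgeting rule in the second note can be turned into a quick calculation. This is a sketch under stated assumptions: the 4 surveys per enumerator per day, the halving, and the 6-day week come from the note, and 400 is the total of the 347 farmer and 53 lead farmer interviews. The team size of 10 enumerators and the function names are hypothetical.

```python
import math


def field_days(n_surveys, n_enumerators, max_per_day=4, safety_factor=0.5):
    """Working days needed when budgeting at a fraction of the best-case daily rate."""
    budget_rate = max_per_day * safety_factor  # 4 per day, cut in half, gives 2 per day
    return math.ceil(n_surveys / (n_enumerators * budget_rate))


def field_weeks(days, working_days_per_week=6):
    """Calendar weeks needed for a given number of working days."""
    return math.ceil(days / working_days_per_week)


# 347 farmers + 53 lead farmers = 400 interviews; 10 enumerators is an assumed team size.
days = field_days(n_surveys=400, n_enumerators=10)
print(days, field_weeks(days))
```

Under these assumptions the budget comes to 20 working days, or 4 six-day weeks. Leave days, holidays, and buffer days for contingencies would be added on top of that.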