DataDelta teammates during field work ©IDinsight/Jilson Tiu
Household surveys are an indispensable tool for understanding the conditions and experiences of families. One of the most challenging parts of a household survey is sampling households. Oversimplifying only a bit, there have traditionally been two main options for sampling in household surveys. The first option, which is expensive but rigorous, is what we refer to as “conventional two-stage sampling.” In this approach, the researcher first selects areas (typically census enumeration areas), then conducts a complete household listing in each area, and then samples households from these lists. This approach is used for most large-scale, rigorous surveys like the Demographic and Health Surveys and the Living Standards Measurement Study surveys. The second option is to use one of the many “random walk” approaches to sampling households, like the WHO’s “spin-the-pen” method or the “right-hand rule.” As with conventional two-stage sampling, the researcher first selects areas but then, within each area, selects households through some pseudo-random process rather than by first listing all the households in the area. These options are typically much cheaper than conventional two-stage sampling but, until recently, there was little research on whether they led to representative samples (and the few studies that did exist were not encouraging).[1, 2]
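To make the three steps of conventional two-stage sampling concrete, here is a minimal sketch in Python. It assumes simple random sampling at both stages; real surveys often select areas with probability proportional to size instead. The function name, the `list_households` callback (standing in for the field listing exercise), and the toy data are all hypothetical.

```python
import random

def two_stage_sample(area_ids, n_areas, n_households, list_households, seed=0):
    """Select areas, list every household in each, then sample from the lists.

    list_households is a hypothetical callback standing in for the complete
    household listing that field teams carry out in each selected area.
    """
    rng = random.Random(seed)
    # Stage 1: select areas (simple random sampling is an assumption here).
    selected_areas = rng.sample(sorted(area_ids), n_areas)
    sample = {}
    for area in selected_areas:
        listing = list_households(area)        # complete listing of the area
        k = min(n_households, len(listing))    # small areas may have fewer households
        sample[area] = rng.sample(listing, k)  # Stage 2: sample from the list
    return sample

# Toy data: three enumeration areas with made-up household IDs.
toy_areas = {"EA1": ["h1", "h2", "h3"], "EA2": ["h4"], "EA3": ["h5", "h6"]}
print(two_stage_sample(toy_areas.keys(), 2, 2, lambda area: toy_areas[area]))
```

The expensive part is the listing step hidden behind `list_households`, which is why the alternatives discussed below try to avoid or shrink it.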
As part of IDinsight’s DataDelta, we set out to answer the question: “Are there any alternative sampling approaches which are cheaper than conventional two-stage sampling but which also result in representative samples?” In addition, we set out to identify promising sampling approaches which could be used even in cases where conventional two-stage sampling is not feasible. In our quest to find cheap yet rigorous household sampling approaches, we explored a range of options including voter roll sampling, right-hand rule sampling, rooftop sampling, and grid-based sampling. With the exception of right-hand rule sampling, all of these approaches yielded promising results.
One of the first approaches we tested (even before the creation of the DataDelta initiative) was voter roll sampling. In India, voter rolls, with names and other personal information, are made publicly available and, in theory, should include all adults who reside in the polling area regardless of whether they intend to vote. With voter roll sampling, a researcher first randomly selects polling stations and then uses the publicly available voter rolls to sample individuals. To test whether voter roll sampling results in representative samples, we conducted a complete listing of all households in 13 randomly selected polling stations across 4 states (Uttar Pradesh, Madhya Pradesh, Bihar, and Rajasthan). Our results showed that voter rolls are an excellent sampling frame in rural India but not as good in peri-urban and urban areas. In rural areas, 96% of households have at least one registered voter, but in urban areas only 78% do. Further, excluded households were not systematically different from those included in the voter rolls, either overall or within urban or rural areas.
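The coverage figures above are the share of listed households with at least one member on the voter roll. A minimal sketch of that calculation follows, assuming a household listing in which each record carries an area type and a count of registered voters. The field names and the toy records are hypothetical and are not the study data.

```python
from collections import defaultdict

def coverage_by_area_type(listing):
    """Share of listed households with at least one registered voter, by area type."""
    totals = defaultdict(int)
    covered = defaultdict(int)
    for household in listing:
        area_type = household["area_type"]
        totals[area_type] += 1
        if household["registered_voters"] >= 1:
            covered[area_type] += 1
    return {area_type: covered[area_type] / totals[area_type] for area_type in totals}

# Toy listing with made-up values, for illustration only.
toy_listing = [
    {"area_type": "rural", "registered_voters": 2},
    {"area_type": "rural", "registered_voters": 1},
    {"area_type": "rural", "registered_voters": 0},
    {"area_type": "rural", "registered_voters": 3},
    {"area_type": "urban", "registered_voters": 0},
    {"area_type": "urban", "registered_voters": 1},
]
print(coverage_by_area_type(toy_listing))  # {'rural': 0.75, 'urban': 0.5}
```

A household with zero registered voters can never be reached through the roll, so this share is an upper bound on how much of the population a voter roll sample can represent.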
Bottom line: Voter roll sampling is a great option if you are doing a household survey in rural India. Voter roll sampling may also be a viable option in other countries with universal adult franchise and publicly available voter rolls, though the actual coverage of voter rolls in those places would need to be tested first.
We next turned to what is, most likely, the most commonly used alternative sampling approach: the right-hand rule. With the right-hand rule, surveyors select households by starting at randomly selected points on roads, following the road (turning right whenever possible), and surveying every fifth household on the right. Unfortunately, our testing found that the right-hand rule excludes many households, results in highly variable probability of selection, and is nearly impossible to replicate (i.e. two surveyors given the same starting point often end up surveying different households). In short, the right-hand rule leads to unrepresentative samples and biased results.
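The replication problem can be illustrated with a toy example. The sketch below assumes that "every fifth household" means the 5th, 10th, 15th, and so on in the order a surveyor encounters them; the household numbers and the two walks are made up. If a second surveyor overlooks a single household early in the walk, every later selection shifts.

```python
def every_kth(walk_order, k=5):
    """Return every k-th household in the order encountered on the walk."""
    return walk_order[k - 1::k]

# Surveyor A counts households 1..20 along the route.
walk_a = list(range(1, 21))
# Surveyor B follows the same route but overlooks household 3.
walk_b = [h for h in walk_a if h != 3]

print(every_kth(walk_a))  # [5, 10, 15, 20]
print(every_kth(walk_b))  # [6, 11, 16]
```

In this toy case the two samples share no households at all, even though the walks differ by only one missed dwelling.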
Bottom line: Don’t use right-hand rule sampling!
We next explored whether an excellent new dataset of building footprints created by Google and Microsoft could be used to reliably sample households via an approach we call “rooftop sampling.” The basic idea behind rooftop sampling is simple: within each of the areas you have sampled, you select a set of buildings from the building footprint dataset and survey the 4 or so households nearest each selected building (codebase available here). Using the same household listing dataset we generated to test voter roll sampling, we found that few households are excluded when using rooftop sampling. The probability of selection does vary somewhat from household to household, but this variation does not lead to substantial bias. Through on-the-ground field testing we also found that rooftop sampling is highly replicable (i.e., two surveyors given the same buildings end up surveying the same households). Because of the slight bias from the variation in selection probability, rooftop sampling is not ideal for large surveys where precision matters more than cost, but it is a great option for smaller surveys where balancing quality and cost is the main concern. A working paper with our results can be found here.
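The selection rule can be sketched as a simulation against a household listing, similar in spirit to the test described above. This is a sketch under stated assumptions, not the linked codebase: the function name is hypothetical, buildings are reduced to centroid points, and coordinates are assumed to be planar (for example, metres in a projected coordinate system) so that straight-line distance is meaningful. In the field, surveyors find the nearest households on the ground rather than from coordinates.

```python
import math
import random

def rooftop_sample(building_centroids, household_points, n_buildings, k=4, seed=0):
    """Pick buildings at random, then the k households nearest each one.

    Returns the chosen building points and the sorted indices of the
    selected households in household_points.
    """
    rng = random.Random(seed)
    chosen = rng.sample(building_centroids, n_buildings)
    selected = set()
    for building in chosen:
        # Rank all listed households by distance to this building.
        ranked = sorted(
            range(len(household_points)),
            key=lambda i: math.dist(building, household_points[i]),
        )
        selected.update(ranked[:k])
    return chosen, sorted(selected)

# Toy data: ten households along a street, one building at each end.
households = [(float(x), 0.0) for x in range(10)]
buildings = [(0.0, 0.0), (9.0, 0.0)]
_, picked = rooftop_sample(buildings, households, n_buildings=2)
print(picked)  # [0, 1, 2, 3, 6, 7, 8, 9]
```

Running a rule like this many times over a full listing and counting how often each household is picked is one way to estimate how much selection probabilities vary from household to household.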
Bottom line: Rooftop sampling is a great option for small to medium-sized surveys.
Finally, we explored and refined an approach to sampling variously called “grid-based sampling” (GBS) or “gridded population survey sampling.” GBS is very similar to conventional two-stage sampling except that, in the first step, the researcher selects grids (or clusters of grids) using a gridded population dataset rather than census enumeration areas. Just as with conventional two-stage sampling, within each grid (or cluster of grids), the researcher conducts a household listing and then randomly selects households from these lists. Our original motivation for testing GBS was that it is feasible even in areas like urban India and the Philippines where granular census data is not publicly available.
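The first stage of GBS can be sketched as follows. The text above does not say how grids are selected; this example assumes systematic selection with probability proportional to each grid cell's estimated population, which is a common choice for first-stage units. The function name, cell IDs, and population values are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_systematic(cells, n, seed=0):
    """Select n grid cells with probability proportional to estimated population.

    cells is a list of (cell_id, estimated_population) pairs. A cell whose
    population exceeds the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    cumulative = list(accumulate(pop for _, pop in cells))
    step = cumulative[-1] / n          # sampling interval
    start = rng.random() * step        # random start in [0, step)
    chosen = []
    for i in range(n):
        target = start + i * step
        # First cell whose cumulative population exceeds the target;
        # the clamp guards against floating-point edge cases.
        index = min(bisect_right(cumulative, target), len(cells) - 1)
        chosen.append(cells[index][0])
    return chosen

# Toy grid with made-up population estimates; empty cells are never selected.
toy_cells = [("g1", 120), ("g2", 0), ("g3", 300), ("g4", 80), ("g5", 500)]
print(pps_systematic(toy_cells, n=2))
```

Within each selected cell, the second stage (household listing followed by random selection) proceeds exactly as in conventional two-stage sampling.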
We implemented GBS on several surveys and, in the process, developed code and processes to make GBS easier to implement in practice. Along the way we discovered that, while very promising, GBS can be expensive to implement and is thus best reserved as a method of last resort when conventional two-stage sampling is not feasible. Note that, because GBS is more or less functionally equivalent to conventional two-stage sampling, we didn’t assess GBS for bias.
Bottom line: Grid-based sampling is a great option when granular census data is not available and it is infeasible to conduct the conventional two-stage sampling approach.
We set out to answer the question: “Are there any alternative sampling approaches which are cheaper than conventional two-stage sampling but which also result in representative samples?” Our answer is “yes!”, with the caveat that surveyors need to choose wisely. We found that one common cost-saving approach to sampling, the right-hand rule, requires too many sacrifices on quality, so we won’t use it. Voter roll and rooftop sampling provide reliable, cost-effective alternatives to conventional sampling (though voter roll sampling has only been tested in India and performed well only in rural areas). For surveyors who want to use conventional two-stage sampling but cannot access census data, grid-based sampling is a good alternative.
We are happy to share more about how we developed and use these methods or put you in touch with the IDinsight expert on each method. Please reach out to Doug.Johnson@IDinsight.org if you’d like to learn more.