Voter rolls can be cheaper and faster for sampling in India. But are they accurate?

New data can help researchers better weigh the cost savings and exclusion risks when choosing between voter rolls and door-to-door methods for creating sampling frames. Read the working paper here.

Sign reads, “Polling station number 68” in Madhya Pradesh, India ©IDinsight/Ruchika Joshi

Anyone who has tried to conduct a sample survey will attest to just how expensive and cumbersome it can be to construct a sampling frame through household listing. Surveyors go door-to-door, making a comprehensive list of all the households within the area, from which they can then draw a probability sample. Alternatives to household listing such as “spin-the-pen” may be cheaper and faster but are prone to bias.

IDinsight’s Data on Demand (DoD) team has been experimenting with methodological innovations so that we can provide government and non-profit decision-makers with high-quality data at a fraction of the cost, within a matter of days. Among these, one set of experiments focuses on examining alternative sampling methods for constructing household sampling frames.

Based on our internal estimates, constructing a sampling frame from voter rolls can cost less than one-sixth as much as constructing one through a traditional household listing. But evidence on the accuracy and completeness of voter rolls is limited. Given this tension, we assessed whether publicly available voter rolls in India are suitable household sampling frames.

If you were to guess how representative India’s voter rolls are of the ground-truth reality, what would you guess? Fifty per cent? Eighty per cent? Would you guess that they probably exclude marginalized groups and thus cannot be relied upon?

When we compared voter rolls against ground-truth household listings, what we found was surprising. The rolls include 91 per cent of the households found in the ground-truth household listing, although coverage is lower for urban areas. Further, marginalized groups do not appear to be systematically excluded from voter rolls. In sum, sampling from voter rolls can produce estimates of household-level economic variables with little bias, especially in rural areas.

Below we describe the context, method, and results from our experiment in more detail.

In India, researchers across disciplines have turned to voter rolls for sampling. Broadly, the idea is to select voters from the voter rolls of the sampled areas, instruct surveyors to find the household to which the sampled voter belongs, and include it in the household sample to survey.
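The voter-to-household procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field name `house_no`, the function names, and the use of a simple uniform draw are ours, not taken from any particular study.

```python
import random

def sample_voters(roll, n, seed=0):
    """Draw n distinct voters uniformly at random from one polling station's roll.

    `roll` is a list of dicts; the seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(roll, n)

def households_to_visit(sampled_voters):
    """Collapse sampled voters to unique house numbers, keeping first-seen order.

    Assumes each roll entry carries a hypothetical `house_no` field that
    surveyors can use to locate the voter's household.
    """
    seen = []
    for voter in sampled_voters:
        if voter["house_no"] not in seen:
            seen.append(voter["house_no"])
    return seen
```

Under a simple draw like this, a household with more listed voters has a higher chance of being selected. The sketch does not attempt to correct for that.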

Since the Indian Constitution guarantees every citizen above the age of 18 the right to vote, in principle, the list of voters must cover every voting-age individual in a given area, and should therefore serve as a complete sampling frame of adults. In practice, however, voter rolls may fail to comprehensively cover certain population groups like women, Muslims, the urban poor, and migrants, which can bias estimates obtained from voter roll samples.

To examine what such gaps in coverage imply for using voter rolls as household sampling frames, we conducted a household listing in 9 villages and 4 urban wards comprising 7,769 voting-age adults across four states: Bihar, Madhya Pradesh, Rajasthan, and Uttar Pradesh. We then matched eligible voters identified through the listing process with voters found on the voter rolls of the same area. This involved a combination of matching individual voter ID numbers (for those who provided a valid number) and “fuzzy matching” individuals (for those who did not) on names and other information like gender, age, marital status, and house number.
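For readers who want a concrete picture of this two-step matching, here is a minimal sketch. It is not the matching algorithm used in the working paper. The field names, the similarity measure (Python's `difflib`), the 0.8 name threshold, the 5-year age tolerance, and the choice to fall back to fuzzy matching when an ID is not found on the roll are all illustrative assumptions.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case and extra spaces."""
    def normalize(s):
        return " ".join(s.lower().split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_person(person, roll, name_threshold=0.8, max_age_gap=5):
    """Return the roll entry matched to `person`, or None if there is no match.

    `person` and each roll entry are dicts with hypothetical keys
    `voter_id`, `name`, `gender`, and `age`.
    """
    # Step 1: exact match on voter ID, when the respondent provided one.
    voter_id = person.get("voter_id")
    if voter_id:
        for entry in roll:
            if entry.get("voter_id") == voter_id:
                return entry

    # Step 2: fuzzy match on name, restricted to entries with the same
    # gender and a similar age (thresholds are assumptions, not the paper's).
    best, best_score = None, 0.0
    for entry in roll:
        if entry["gender"] != person["gender"]:
            continue
        if abs(entry["age"] - person["age"]) > max_age_gap:
            continue
        score = name_similarity(person["name"], entry["name"])
        if score >= name_threshold and score > best_score:
            best, best_score = entry, score
    return best
```

A fuller version would also compare marital status and house number, as the matching described above does.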

Using our customized matching algorithm, we calculated household exclusion based on the percentage of households in our listing that did not include even one individual who matched to an individual on the corresponding voter roll of their polling station.
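The household exclusion measure defined above can be written down compactly. The following is a sketch under assumptions: the input format (one `(household_id, matched)` pair per listed adult) and the function name are hypothetical.

```python
from collections import defaultdict

def household_exclusion_rate(members):
    """Share of listed households in which no member matched the voter roll.

    `members` is an iterable of (household_id, matched) pairs, one per
    listed voting-age adult, where `matched` is True if that person was
    matched to an entry on the roll of their polling station.
    """
    any_match = defaultdict(bool)
    for household_id, matched in members:
        # A household counts as included once any one member is matched.
        any_match[household_id] = any_match[household_id] or matched
    if not any_match:
        raise ValueError("no households in listing")
    excluded = sum(1 for has_match in any_match.values() if not has_match)
    return excluded / len(any_match)
```

Because a single matched member is enough for a household to count as included, household exclusion measured this way can be much lower than individual exclusion in the same area.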

Overall, voter rolls include 91 per cent of the households we found in the ground-truth census. Coverage is significantly higher in the 9 rural polling stations (96 per cent) compared to the 4 urban polling stations (78 per cent).

Figure 1: Household match rates by Polling Station

Inclusion in voter rolls does not appear to vary by a household’s religion or socioeconomic status, though there is some evidence that wealthier, higher-caste households in urban areas are slightly more likely to be excluded. In the paper, we speculate that this might be the case because the poor, and those belonging to more marginalized groups, are more likely to depend on voter registration cards for accessing a host of government benefits.

Figure 2: Household match rates by demographic group

While our focus was primarily on using voter rolls for sampling households, there may be circumstances where researchers want to use voter rolls as individual sampling frames. For such cases, we examined individual match rates by age-gender cohort, separately for urban and rural polling stations. Across subgroups, younger individuals have lower match rates, with rates rising with age and plateauing around 30 years old [1]. Young women are less likely to be listed in voter rolls than similarly-aged men, a problem that is exacerbated in rural areas and is possibly driven by migration for marriage.

Figure 3: Differences in individual match rates by demographic group

In our working paper, we also show that sampling from voter rolls can produce estimates of household-level economic variables with relatively little bias as compared to sampling from a rigorously generated household listing, especially in rural areas. Across a range of demographic and asset indicators, estimates derived from voter roll samples are less than 2 percentage points different from the ‘true’ values from our household census.
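To make the comparison concrete, the gap between a voter roll sample and the household census for a yes/no indicator can be expressed in percentage points. This is a minimal sketch. The function names and the dict-per-household format are assumptions, and it ignores sampling weights and standard errors.

```python
def share(households, indicator):
    """Percentage of households for which a yes/no indicator is true."""
    if not households:
        raise ValueError("empty household list")
    return 100.0 * sum(1 for h in households if h[indicator]) / len(households)

def bias_pp(census, voter_roll_sample, indicator):
    """Voter roll sample estimate minus census value, in percentage points."""
    return share(voter_roll_sample, indicator) - share(census, indicator)
```

A result close to zero for an indicator means the voter roll sample reproduces the census value well for that indicator.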

Although our results are not statistically representative of all polling stations in the selected states, we do find indicative patterns in our data that can help researchers better weigh the cost savings and exclusion risks when choosing between voter roll-based and household listing-based sampling methods.

Overall, we recommend using voter rolls as household sampling frames in rural areas. Not only is the exclusion rate as low as 4 per cent in rural areas, but household sampling frames based on voter rolls also appropriately represent marginalized groups. Given the tremendous time and cost savings over the traditional household listing, voter rolls as sampling frames hold promise to make routine data collection more feasible for policymakers, thus expanding the scope for more evidence-informed decisions.

However, in urban areas, where we found higher and variable exclusion rates, researchers should exercise caution. For such areas, we recommend researchers first do a pilot ground-truthing exercise to verify that exclusion error is appropriately low in their sample polling stations and also supplement voter rolls with other available data to ensure a complete sampling frame. Where polling station boundaries are ambiguous, researchers may also consider sampling at the level of the polling booth (which would include several adjacent polling stations in a densely populated area) instead of at the level of the polling station.

Finally, we do not recommend voter rolls for sampling individuals. Since up to 26 per cent of individuals may be listed under different names in voter rolls or outright excluded from them, sampling individuals from voter rolls will result in biased population estimates.

We discuss our methodology and results in more detail in our working paper. For researchers looking for practical suggestions on using voter rolls for sampling, we also provide an FAQ guide that can help you get started!

1. Match rates start to fluctuate substantially above 70 years old, reflecting the small sample size for these gender-age cohorts.