
Grid-based sampling in practice

Understanding the on-ground challenges of grid-based sampling

In an earlier blog post, we described grid-based sampling (GBS) and its theoretical advantages over conventional sampling. How does GBS work in practice? Over the past year, we have piloted GBS for representative surveys in India and the Philippines and for an impact evaluation in Zambia. While we remain cautiously optimistic that GBS could replace conventional sampling over the longer term, we found implementation much harder and costs much higher than anticipated. GBS is a great option when granular census data are unavailable, but in some cases other sampling methods may be a better choice, and we recommend exploring them. In this blog post, we discuss four challenges our teams encountered while implementing GBS.

Challenge 1: Maps were less accurate than we anticipated

With GBS, the accuracy of gridded population maps is vital for both operational efficiency and valid results. We weren't sure what to expect when it came to map accuracy. To our knowledge, only two researchers have attempted to estimate the accuracy of gridded population maps using actual ground-truth data, and their studies were in locations (Sweden and Equatorial Guinea) not very similar to ours.

In urban Telangana state in India, where we used population maps provided by MapSolve, the correlation between estimated and true population was 0.32 (though this likely significantly understates the true correlation because of the way we determined which buildings lay within grids).

In the Philippines, for the first round of data collection we used population estimates from Meta and found that the correlation between estimated and true population was only 0.21. This estimate also doesn't account for the fact that, before surveying, we removed several grid clusters that were obviously empty based on an inspection in Google Maps. See Box 1: Why Good Maps Matter.
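The accuracy figures above are correlations between each grid's map-estimated population and the population actually counted on the ground. For teams that want to run the same check on their own pilots, here is a minimal sketch of a Pearson correlation in plain Python. It assumes you already have one estimated count and one enumerated count per surveyed grid; the function name `pearson_r` and the example numbers are hypothetical.

```python
import math


def pearson_r(estimated, observed):
    """Pearson correlation between map-estimated and enumerated grid populations.

    estimated: population per grid according to the gridded map
    observed:  population per grid counted during the household listing
    """
    if len(estimated) != len(observed) or len(estimated) < 2:
        raise ValueError("need two equal-length sequences covering at least 2 grids")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    # covariance and variances (the 1/n factors cancel, so they are omitted)
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_e == 0 or var_o == 0:
        raise ValueError("correlation is undefined when either series is constant")
    return cov / math.sqrt(var_e * var_o)


# Hypothetical example: five surveyed grids
print(pearson_r([12, 0, 7, 3, 20], [5, 2, 9, 0, 11]))
```

Note that, as with the Philippines figure, any grids dropped before fieldwork (for example, ones that were obviously empty) never enter these lists, so the resulting correlation describes only the grids that were actually surveyed.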

In a forthcoming working paper, we show that the relatively low accuracy of publicly available population maps means that, on average, GBS is less efficient than conventional sampling using census data, even if the census data is old and out of date. 

Challenge 2: Grid borders may not fall on natural geographic features

A second challenge of GBS is that grid borders don't necessarily fall on natural geographic boundaries like streets or rivers. This makes it difficult, and sometimes ambiguous, to determine whether homes on or near the edges of a grid are inside or outside it. For example, in the image of a grid from the Philippines below, it is hard to tell whether the houses near the grid border are in or out of the grid. And if a grid is split by a major geographic feature, such as a highway, survey operations can become more difficult than expected.
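One way to make the in-or-out decision reproducible is to assign each building to the cell that contains its centroid, and to flag any building within a few meters of a cell edge for a field check or a pre-agreed tie-break rule. The sketch below is illustrative rather than taken from our pilots. It assumes building centroids are already projected into meters (for example, a UTM zone) and that cells are 30-meter squares aligned to the projection's origin. The 3-meter buffer and the function name `assign_to_cell` are hypothetical choices.

```python
import math

CELL_SIZE_M = 30.0   # grid cell size, matching the 30x30-meter grids used in Telangana
EDGE_BUFFER_M = 3.0  # hypothetical tolerance: closer than this to an edge counts as "too close to call"


def assign_to_cell(x, y, cell_size=CELL_SIZE_M, buffer=EDGE_BUFFER_M):
    """Return ((col, row), near_edge) for a building centroid given in projected meters."""
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    # distance from the centroid to the nearest vertical and nearest horizontal cell border
    dx = min(x - col * cell_size, (col + 1) * cell_size - x)
    dy = min(y - row * cell_size, (row + 1) * cell_size - y)
    near_edge = min(dx, dy) < buffer
    return (col, row), near_edge
```

For instance, `assign_to_cell(15, 15)` places a building squarely inside cell `(0, 0)`, while `assign_to_cell(29, 15)` also lands in `(0, 0)` but is flagged as near an edge. Flagging does not remove the ambiguity on the ground, but it tells enumerators in advance which houses need a decision.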

Challenge 3: In areas with low population density, grids must be clustered, which can be challenging

For our survey in urban Telangana, we randomly selected individual 30×30-meter grids. The estimated population for the sampled grids ranged from about 3 to 15, just about the perfect size for sampling since we planned on sampling four households from each grid.
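The two-stage design described here (randomly select grids, list the households in each selected grid, then sample four households per grid) can be sketched as follows. This is a simplified illustration, not our survey code. It assumes equal-probability selection of grids, which the description above does not specify. It also represents the field listing as a dictionary that, in practice, would only be filled in after the grids are chosen. All names are hypothetical.

```python
import random

HOUSEHOLDS_PER_GRID = 4  # as planned for the Telangana survey


def two_stage_sample(grid_ids, n_grids, listings, seed=0):
    """Stage 1: draw grids at random. Stage 2: draw households from each grid's listing.

    grid_ids: list of candidate grid identifiers
    n_grids:  number of grids to select in stage 1
    listings: dict mapping grid id -> list of household ids found during the listing
    """
    rng = random.Random(seed)  # seeded so the draw can be reproduced and audited
    chosen = rng.sample(grid_ids, n_grids)
    sample = {}
    for g in chosen:
        households = listings.get(g, [])
        # take every listed household if a grid has fewer than the target number
        take = min(HOUSEHOLDS_PER_GRID, len(households))
        sample[g] = rng.sample(households, take)
    return sample
```

With estimated grid populations of about 3 to 15, most selected grids would yield the full four households, while the smallest grids would contribute everyone listed.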

In the Philippines, with its lower population density, sampling individual grids would have resulted in many empty grids and a fortune in travel costs. Instead, we first grouped grids into clusters. This proved more challenging than we anticipated. Our first attempt, based on a "greedy" clustering algorithm, produced messy, awkward clusters that were often made up of non-contiguous grids. (See the first image below.) Our second attempt, which used the more sophisticated weighted k-means algorithm, was much more successful, but it was computationally intensive and required sophisticated data science and engineering to implement at scale. (For code to implement the second clustering algorithm at scale, see here.)

Example of a cluster created using our first “greedy” algorithm.

Example of a cluster created using our second (weighted k-means) algorithm.
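For readers curious about what weighted k-means does, the sketch below is a plain weighted version of Lloyd's algorithm. Each grid centroid is weighted by its estimated population, each grid joins its nearest cluster center, and each center then moves to the population-weighted mean of its grids. This is an illustrative stand-in, not the at-scale implementation linked above. It does not enforce equal cluster populations or strict contiguity, and the function name, parameters, and random initialization are our own assumptions.

```python
import math
import random


def weighted_kmeans(points, weights, k, iters=100, seed=0):
    """Cluster grid centroids, weighting each grid by its estimated population.

    points:  list of (x, y) grid centroids in projected meters
    weights: estimated population of each grid (non-negative)
    k:       number of clusters to form
    Returns (labels, centers): a cluster index per grid and the final centers.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of grids")
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # simple random initialization from the grids themselves
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each grid joins the nearest center
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # update step: move each center to the population-weighted mean of its grids
        new_centers = []
        for c in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, new_labels) if lab == c]
            total = sum(w for _, w in members)
            if total == 0:
                # empty or zero-population cluster: keep the previous center
                new_centers.append(centers[c])
                continue
            cx = sum(p[0] * w for p, w in members) / total
            cy = sum(p[1] * w for p, w in members) / total
            new_centers.append((cx, cy))
        if new_labels == labels and new_centers == centers:
            break  # converged: neither assignments nor centers changed
        labels, centers = new_labels, new_centers
    return labels, centers
```

Because every grid is assigned to its nearest center, each cluster occupies a compact region of the map, which tends to avoid the scattered, non-contiguous clusters the greedy approach produced. The population weights pull centers toward densely populated grids, so clusters in sparse areas end up spread over more ground.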

Challenge 4: Using maps offline is technically challenging

Google MyMaps is a simple, effective tool for displaying grid boundaries to enumerators in the field, but it is much harder to use without internet access: if an enumerator exits the app or the survey, they may not be able to reopen the maps until they are back online. In Zambia, where internet access was unreliable, we used a solution based on SurveyCTO and printed maps, which added to the cost and complexity of sampling.

GBS: an OK solution that could be great

Despite these challenges, GBS is still a good (and probably the best) option if surveyors want to do two-stage sampling with household listings when granular census data are unavailable. And with some smart investments (a nudge to any funders reading this), all of the problems listed above could be solved. In fact, the problem of clustering grids is largely solved already, though implementing the solution takes some coding skill. A small additional investment could make these tools usable by smaller organizations with no in-house data scientist. And the challenge of using maps offline will likely shrink as mobile wireless coverage expands.

Low map accuracy and awkward grid borders will be tougher to solve, but both are still very much tractable. Take the low accuracy of population maps first. Krishna Balakrishnan, Chief of Research at MapSolve AI and one of the leading experts in gridded population estimation, believes that better algorithms and processes could raise the correlation between estimated and true population at the 30-meter cell level to approximately 0.5 to 0.6. If map accuracy rose to roughly 0.6, back-of-the-envelope calculations suggest that we would need about 18% fewer clusters, translating into roughly an 18% reduction in survey field costs.

Similarly, to tackle the challenge of grid borders, we believe that with some clever coding, OpenStreetMap data could be used to create sampling units bounded by natural geographic features rather than by arbitrary grid lines.

If these challenges are overcome, GBS could be a fantastic solution. We estimate that it would be significantly cheaper than conventional sampling even before taking into account the cost of obtaining and wrangling census data.