
MOSAIKS: Making satellite imagery easy to use

Satellites abound in the space above the Earth. There are nearly 6000 satellites orbiting the Earth right now, and the exosphere is littered with the debris from defunct satellites. So many important aspects of everyday life for us humans on the ground rely on these metal contraptions in the sky: internet, television, navigation, science-fiction books, communication, weather forecasts, science-fiction movies.

These satellites have revolutionized many sectors, and the development sector is no exception. Satellite imagery has been used to track land use and deforestation, monitor forest fires, predict natural disasters, and map urban growth. It augments available on-the-ground information and provides new information about locations that are isolated or particularly hard for enumerators to reach. This has helped policymakers and other contributors in the development sector make informed decisions about how to allocate resources and improve efficiency and impact while pursuing the Sustainable Development Goals (SDGs).

Challenges with using satellite imagery

Satellite imagery, while hugely informative, is not easy to work with. It is high-volume data, which poses multiple challenges for storage and analysis. The images are also complex: they contain multiple bands, i.e. pictures taken using light from different parts of the visible and invisible spectrum, and they can be hazy due to cloud cover or other weather events. Extracting meaningful information from these images is a real challenge.

Recently, satellite images have become more accessible: Microsoft Planetary Computer, Google Earth Engine, Radiant MLHub, EarthCache and the National Environmental Satellite, Data, and Information Service all provide services to access and analyze satellite imagery data. Advances in machine learning have also made it easier to deal with the complexity of satellite images. Nevertheless, satellite data is still out of reach for many players in the development sector because of the costs associated with accessing the data, and the compute power and expertise required to analyze it.

Making it easy to analyze satellite data

Esther Rolf and colleagues have figured out how to make combining satellite imagery with machine learning easier. Their method, MOSAIKS, allows any user to extract information from satellite images simply by choosing a geographical area of interest. This information (the features) can then be used for downstream analysis, e.g. predicting forest cover or human-development indices, without any further processing. The method works just as well as training a machine learning model from scratch, and users do not need any domain knowledge or theoretical know-how: they simply input GPS coordinates and get features. For more information on how the method works and the tool we built, see our post on the tech blog.

The existing MOSAIKS service currently provides free access to pre-computed features for geographies across the globe, based on images from 2019. The features are calculated from images at a 5m resolution (i.e. each pixel in the image represents an area of 5m x 5m) and aggregated to a 1km resolution. Coarser resolutions are also available if you are interested in features for larger areas representing, for example, administrative blocks. This service, however, provides features from only one satellite and requires manual upload of a list of GPS coordinates for the 1km resolution features.
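To make the idea of coarser resolutions concrete, here is a minimal sketch of how feature vectors attached to GPS coordinates could be averaged into larger grid cells. It is an illustration only, not the MOSAIKS implementation: the function names, the grid size in degrees, and the tiny three-value feature vectors are all assumptions made for this example.

```python
import math
from collections import defaultdict


def cell_index(lat, lon, cell_deg):
    """Return the (row, col) index of the grid cell that contains a point.

    cell_deg is the cell size in degrees (an assumed, illustrative grid).
    """
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def aggregate_features(records, cell_deg):
    """Average the feature vectors of all records that fall in the same cell.

    records: iterable of (lat, lon, features), with equal-length feature lists.
    Returns a dict mapping cell index -> mean feature vector.
    """
    sums = {}
    counts = defaultdict(int)
    for lat, lon, feats in records:
        key = cell_index(lat, lon, cell_deg)
        if key not in sums:
            sums[key] = [0.0] * len(feats)
        for i, value in enumerate(feats):
            sums[key][i] += value
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


# Hypothetical fine-resolution features (made-up values, 3 per location).
fine = [
    (12.342, 77.512, [1.0, 2.0, 3.0]),
    (12.348, 77.518, [3.0, 4.0, 5.0]),
    (12.462, 77.512, [10.0, 10.0, 10.0]),
]

# Aggregate to a coarser grid of 0.1 degree cells.
coarse = aggregate_features(fine, cell_deg=0.1)
for cell, feats in sorted(coarse.items()):
    print(cell, feats)
```

The first two points fall in the same coarse cell and are averaged, while the third lands in a neighboring cell and is kept on its own. A real pipeline would likely aggregate to administrative boundaries rather than a regular grid, but the averaging step is the same in spirit.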

We extended the original MOSAIKS package in several ways. We built out the interface so that users can flexibly configure which satellite to pull images from, the resolution of those images, and the time period of interest. We added more flexible ways to input data and store the resulting features. We also generated features for IDinsight’s key geographies (India and Kenya), which we can reuse without any further computation or processing to support future project work and internal data science experimentation. Finally, we found ways to speed up the satellite image processing so that the entire pipeline runs 10x faster than the original package.

How well does our tool work?

We tested the package by generating features from satellite images of urban and rural India. We then used these features, together with poverty rate data from the 2011 census, to train a regression model that predicts the poverty rate at these locations. Our model provides reasonably good predictions of poverty rates when we compare against the 2011 ground-truth data. More importantly, it can provide estimates of poverty rates in 2021 using only features from 2021 satellite images. While this is just a proof of concept, we can imagine that this kind of predictive analysis would be useful for policymakers or NGOs looking to allocate resources or expand program implementation for poverty reduction.[1]

Poverty rate predictions using MOSAIKS features. We used on-the-ground data from 2011 (left) to train a model using MOSAIKS features to predict poverty rates (center; yellow indicates high poverty rate, purple indicates low poverty rate). We then reused the model to predict poverty rates for 2021 using features from 2021 satellite images (right).
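The train-then-reuse workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our actual pipeline: it uses ridge regression as one possible choice of regression model, and every array below is synthetic random data standing in for real MOSAIKS features and census poverty rates. The names (fit_ridge, features_2011, and so on) are invented for this example.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def predict(X, weights, intercept):
    """Apply a fitted linear model to a feature matrix."""
    return X @ weights + intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumptions): one feature vector per location.
rng = np.random.default_rng(0)
n_locations, n_features = 200, 16
true_weights = rng.normal(size=n_features)
features_2011 = rng.normal(size=(n_locations, n_features))
# Synthetic target playing the role of the 2011 census poverty rate.
poverty_2011 = features_2011 @ true_weights + 0.1 * rng.normal(size=n_locations)
# Features for the same locations from later imagery, slightly shifted.
features_2021 = features_2011 + 0.2 * rng.normal(size=(n_locations, n_features))

# Train on most 2011 locations and hold out the rest to check the fit.
train, test = slice(0, 150), slice(150, None)
weights, intercept = fit_ridge(features_2011[train], poverty_2011[train])
holdout_r2 = r_squared(
    poverty_2011[test], predict(features_2011[test], weights, intercept)
)

# Reuse the same fitted model on 2021 features to get 2021 estimates.
poverty_2021_pred = predict(features_2021, weights, intercept)
print(f"held-out R^2 on 2011 data: {holdout_r2:.3f}")
```

The key point is that the model is fit once against 2011 ground truth and then applied unchanged to features from newer images. Because no 2021 labels enter the model, the 2021 estimates cannot be scored here and would still need on-the-ground verification.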

More broadly, as satellite imagery becomes an increasingly important resource in the development sector, we believe that this package could be a key component of IDinsight’s work on conducting surveys, pinpointing at-risk populations, and many other projects. As a start, it can help drive down survey costs and staff time: for example, we can augment available census data with satellite image features instead of conducting expensive and time-consuming on-the-ground surveys. It can also help scale up projects efficiently, with potentially no additional operating costs: for example, we might be able to get population density estimates for a new region from satellite images and use them to plan field operations, in place of more resource-intensive scoping methods. In the long term, by open-sourcing this package, we believe we can help development sector players scale up their own operations and optimize how they use the resources available to them.

Altogether, we expect that our tool will help drastically increase the use of satellite imagery in international development, leading to greater impact and stronger data-driven policies, on the back of these orbiting metal contraptions.

1. In practice, we would have to collect on-the-ground data to verify the model predictions for 2021.