©SpaceX via Unsplash
Satellites abound in the space above the Earth. There are nearly 6000 satellites orbiting the Earth right now, and the exosphere is littered with the debris from defunct satellites. So many important aspects of everyday life for us humans on the ground rely on these metal contraptions in the sky: internet, television, navigation, science-fiction books, communication, weather forecasts, science-fiction movies.
These satellites have revolutionized many sectors, and the development sector is no exception. Satellite imagery has been used to track land use and deforestation, monitor forest fires, predict natural disasters, and map urban growth. It augments the information available on the ground and provides new information about locations that are isolated or particularly hard for enumerators to reach. This has helped policymakers and other contributors in the development sector make informed decisions about how to allocate resources and improve efficiency and impact while pursuing the Sustainable Development Goals (SDGs).
Satellite imagery, while hugely informative, is not easy to work with. It is high-volume data, which poses challenges for both storage and analysis. The images are also complex: they contain multiple bands, i.e. pictures taken using light from different parts of the visible and invisible spectrum, and they can be hazy due to cloud cover or other weather events. Extracting meaningful information from these images is a real challenge.
Recently, satellite images have become more accessible: Microsoft Planetary Computer, Google Earth Engine, Radiant MLHub, EarthCache, and the National Environmental Satellite, Data, and Information Service all provide services to access and analyze satellite imagery. Advances in machine learning have also made it easier to deal with the complexity of these images. Nevertheless, satellite data is still out of reach for many players in the development sector because of the cost of accessing the data and the compute power and expertise required to analyze it.
Esther Rolf and colleagues have figured out how to make combining satellite imagery with machine learning easier. Their method, MOSAIKS, allows any user to extract information from satellite images simply by choosing a geographical area of interest. This information (the "features") can then be used for downstream analysis, e.g. predicting forest cover or human-development indices, without any further image processing! For more information on how the method works and the tool we built, see our post on the tech blog. This method works just as well as training a machine learning model from scratch, and users do not need any domain knowledge or theoretical know-how: they simply input GPS coordinates and get features.
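To make the "input GPS coordinates, get features" idea concrete, here is a minimal sketch of looking up pre-computed features by location. Everything in it is an assumption made for illustration: the grid size `CELL_DEG`, the helper names `cell_index` and `lookup_features`, and the tiny made-up feature table. It is not the MOSAIKS API or the interface of our package.

```python
import math

# Hypothetical grid size in degrees (0.01 degrees is roughly 1 km at the equator).
CELL_DEG = 0.01


def cell_index(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to the integer (row, col) index of the grid cell containing it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def lookup_features(points, feature_table, cell_deg=CELL_DEG):
    """Return the stored feature vector for each point, or None if its cell is not stored."""
    return [feature_table.get(cell_index(lat, lon, cell_deg)) for lat, lon in points]


# Made-up feature table: two grid cells with three features each,
# kept tiny for readability.
feature_table = {
    cell_index(12.9716, 77.5946): [0.12, 0.80, 0.33],
    cell_index(-1.2921, 36.8219): [0.45, 0.10, 0.91],
}

# The first point falls in the same cell as the first table entry;
# the last point has no stored features.
points = [(12.9719, 77.5941), (-1.2921, 36.8219), (0.0, 0.0)]
features = lookup_features(points, feature_table)
```

Using integer cell indices as keys, rather than rounded floating-point coordinates, avoids lookups that miss because of tiny rounding differences.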
MOSAIKS.org currently provides free access to pre-computed features from geographies across the globe based on images from 2019. The features are calculated from images at a 5m resolution (i.e. each pixel in the image represents an area of size 5m x 5m) and aggregated to a 1km resolution, but coarser resolutions are available if you are interested in features for larger areas representing, for example, administrative blocks. This service, however, provides features from only one satellite and requires manual upload of a list of GPS coordinates for the 1km resolution features.
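As a rough illustration of what moving from a fine to a coarse resolution involves, the sketch below averages a grid of values over non-overlapping square blocks. Going from 5m pixels to 1km cells would correspond to blocks of 200 x 200 pixels (1000 / 5 = 200); the toy example uses a factor of 2 so the numbers are easy to check. Plain averaging is an assumption here, and the `aggregate` function is a hypothetical helper, not part of MOSAIKS.

```python
def aggregate(grid, factor):
    """Average a 2-D grid of values over non-overlapping factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            # Collect every fine-resolution value that falls inside this block.
            block = [
                grid[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Toy 4 x 4 grid of one feature at fine resolution.
fine = [
    [1.0, 3.0, 0.0, 2.0],
    [5.0, 7.0, 4.0, 6.0],
    [2.0, 2.0, 1.0, 1.0],
    [2.0, 2.0, 3.0, 3.0],
]
coarse = aggregate(fine, 2)  # 2 x 2 grid of block averages
```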
We extended this package in several ways. We built out the interface so that users can flexibly configure which satellite to pull images from, the resolution of these images, and the time period they are interested in. We included more flexible ways to input data and store the resulting features. We also generated features for IDinsight’s key geographies (India and Kenya) that we can reuse, without any further computation or processing, to support future project work and internal data science experimentation. Finally, we found ways to speed up the satellite image processing so that the entire pipeline runs 10x faster than the original package.
We tested the package by generating features from satellite images of urban and rural India. We then used these features to train a regression model to predict the poverty rate at these locations, with poverty rate data from the 2011 census as the training target. Our model provides reasonably good predictions of poverty rates when we compare against the 2011 ground-truth data. More importantly, it can provide estimates of poverty rates in 2021 using only features from 2021 satellite images. While this is just a proof of concept, we can imagine that this kind of predictive analysis would be useful for policymakers or NGOs looking to allocate resources or expand program implementation for poverty reduction.
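The train-on-one-year, predict-on-another workflow can be sketched in a few lines. This is a minimal illustration and not our actual pipeline: it assumes ridge regression (the post only says "a regression model"), uses entirely synthetic numbers in place of real features and census data, and relies on hypothetical helpers `solve`, `fit_ridge`, and `predict` written with the standard library only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(X, y, lam):
    """Return weights w minimising ||Xw - y||^2 + lam * ||w||^2."""
    d = len(X[0])
    xtx = [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(X, w):
    """Apply the fitted weights to each feature row."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


# Synthetic "earlier year" data: a constant 1 (intercept) plus two image features
# per location, with labels generated from known weights so the fit can be checked.
true_w = [0.1, 0.5, -0.2]
X_old = [[1, 0.2, 0.9], [1, 0.8, 0.1], [1, 0.5, 0.5], [1, 0.1, 0.3], [1, 0.9, 0.7]]
y_old = predict(X_old, true_w)

w = fit_ridge(X_old, y_old, lam=1e-9)

# Synthetic "later year" features for a location with no label.
X_new = [[1, 0.4, 0.6]]
estimate = predict(X_new, w)[0]
```

In practice a library such as scikit-learn would replace the hand-written solver, and the regularisation strength would be chosen by cross-validation rather than fixed in advance.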
More broadly, as satellite imagery becomes an increasingly important resource in the development sector, we believe that this package could be a key component of IDinsight’s work on conducting surveys, pinpointing at-risk populations, and many other projects. As a start, it can help drive down survey costs and staff time: for example, we can augment available census data with satellite image features instead of conducting expensive and time-consuming on-the-ground surveys. It can also help scale up projects efficiently, with potentially no additional operating costs: for example, we might be able to get population density estimates for a new region from satellite images and use them to plan field operations, in place of more resource-intensive scoping methods. In the long term, by open-sourcing this package, we believe we can help development sector players scale up their own operations and optimize how they use the resources available to them.
Altogether, we expect that our tool will help drastically increase the use of satellite imagery in international development, leading to greater impact and stronger data-driven policies, on the back of these orbiting metal contraptions.