Using Machine Learning to Identify Out-of-School Girls in Rural India

A classroom in the Bhilwara district of Rajasthan.

Client: Educate Girls
Location: India, USA
Sector: Education
Dates of service:
IDinsight service: Data Analytics, Machine Learning
IDinsight contacts: Ben Brockman, Jeff McManus
Status: Active

The Problem

Currently, there are more than three million out-of-school girls in India. To address this challenge, Educate Girls (EG) works to locate these out-of-school girls, get them back into school, and provide remedial education. EG knows from its existing program sites that out-of-school girls are concentrated in highly disadvantaged clusters, with 95% of the out-of-school girls living in only 50% of the villages. Historically, Educate Girls has conducted door-to-door censuses across thousands of villages in entire administrative districts to identify target beneficiaries. However, this process has proven both time-consuming and costly. IDinsight is partnering with Educate Girls to determine areas for programmatic expansion more efficiently, using predictive machine learning techniques to better target the problem.

IDinsight is working with Educate Girls to help inform where it should expand its programs within Rajasthan, Madhya Pradesh, Uttar Pradesh, and Bihar. IDinsight’s evidence will inform a more targeted, cost-effective program scale-up.

Evidence Needs

IDinsight Service

IDinsight built and tested various machine learning models to find an appropriate method for predicting which areas had concentrations of out-of-school girls. We scraped publicly available census and village-level education data and merged this with Educate Girls outcome data for 7,796 villages in two states. After designing and testing various predictive models, we chose a random forest model to estimate the number of out-of-school girls in each of 210,000 villages in four states. We then built a clustering algorithm that used linear programming techniques to create operationally feasible, geographically concentrated clusters of priority villages for Educate Girls’ 2019 expansion.
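As a rough illustration of the random forest step, the sketch below fits a scikit-learn `RandomForestRegressor` to synthetic village-level data and predicts a count of out-of-school girls for held-out villages. The feature names (`population`, `female_literacy`, `school_distance_km`), the data-generating formula, and the model settings are all assumptions made for this example; they are not IDinsight's actual features, data, or tuning.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for village-level features (all hypothetical).
n_villages = 500
population = rng.integers(200, 5000, size=n_villages)
female_literacy = rng.uniform(0.1, 0.9, size=n_villages)
school_distance_km = rng.uniform(0.0, 15.0, size=n_villages)
X = np.column_stack([population, female_literacy, school_distance_km])

# Synthetic outcome: counts rise with population and distance to school,
# and fall with female literacy. This formula is invented for the demo.
expected = population * 0.01 * (1.0 - female_literacy) * (1.0 + school_distance_km / 15.0)
y = rng.poisson(expected)

# Hold out a quarter of the villages to check out-of-sample fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Predicted number of out-of-school girls for each held-out village.
predicted = model.predict(X_test)
r2 = model.score(X_test, y_test)
print(f"Held-out villages: {len(predicted)}, R^2: {r2:.2f}")
```

In a setting like the one described, the labeled villages would play the role of the training set and the model would then be applied to the much larger set of unlabeled villages; the held-out split here only mimics that workflow on made-up data.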


IDinsight’s final prediction model allows Educate Girls to locate between 50% and 200% more out-of-school children (depending on geographic scope) for approximately the same operational cost per village.
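One way to read a figure like this is as a targeting gain: visit the same number of villages, but choose them by predicted count rather than without targeting, and compare how many children are found. The sketch below shows that arithmetic on made-up numbers. The function name `targeting_gain`, the sample data, and the assumption that cost scales with the number of villages visited are all illustrative, not IDinsight's evaluation method.

```python
def targeting_gain(predicted, actual, n_selected):
    """Relative gain in children found when visiting the n_selected villages
    with the highest predictions, versus the expected number found when
    visiting n_selected villages chosen uniformly at random."""
    # Rank village indices from highest to lowest predicted count.
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    found_targeted = sum(actual[i] for i in ranked[:n_selected])
    # A random draw finds, on average, a proportional share of the total.
    expected_random = sum(actual) * n_selected / len(actual)
    return found_targeted / expected_random - 1.0


# Hypothetical predictions and true counts for six villages.
predicted = [12.0, 1.0, 8.0, 0.5, 3.0, 9.5]
actual = [10, 0, 9, 1, 2, 8]

gain = targeting_gain(predicted, actual, n_selected=3)
print(f"Gain over untargeted selection: {gain:.0%}")
```

With these invented numbers, the three top-ranked villages contain 27 of the 30 children, against an expected 15 for an untargeted choice of three villages, so the gain is about 80%.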


Our evidence has helped Educate Girls increase the efficiency and cost-effectiveness of its process for targeting potential beneficiaries. The IDinsight team has made predictions for 220,000 villages across four states in North India that will inform the implementation of Educate Girls’ five-year expansion strategy. IDinsight and Educate Girls are now finalizing the batch of 1,800 villages for Educate Girls’ operations in 2019 using IDinsight’s targeting.