
How data science can improve lives

Emily Coppel, Sid Ravinutala, 10 April 2021

This Q+A with IDinsight’s Data Science co-lead, Sid Ravinutala, shares how his team is using data science to help decision-makers improve their policies and programs.

Photo by Chris Ried on Unsplash

Last week, I scheduled a Zoom call with Sid Ravinutala, IDinsight’s Director of Data Science, to help me understand how data science can be used to address social challenges in the Asian and African countries in which IDinsight works. He patiently walked me through some promising projects within IDinsight and beyond. I was left with a stronger sense of how data science can help leaders make better decisions, whether allocating resources or implementing a program within their financial, time, or other constraints. Below is an edited version of our interview.

Emily: Can you talk a little bit about some of the most exciting use-cases of data science in global development? Where do you see the most promise?

Sid: I tend to think about data science in three components.

The first is prediction work. If you do a data-science 101 course, that’s what it’s focused on. For example, you might wish to predict which villages have a lot of out-of-school girls; or predict when you might stock out of drugs; or even predict which programs a beneficiary might be eligible for — which is actually something one of our clients is looking at.

The second component is modeling (or inference). Essentially, we want to understand a complex system a little bit better using modeling tools. For example, you could use models to understand the spread of the pandemic in your state. Modeling is useful not just in generating predictions but also in understanding what’s going on, what state we are in, and how the model — the world — is changing over time.

Then the last category is optimization. We often have limited resources or operational constraints within which we have to maximize our objective — these may be even more binding in the development sector.

For example: if I have ten social workers and there is a population of 100 families that they serve, how do I optimally allocate them for maximum coverage?

The social workers have time constraints, i.e., the time it takes them to visit families and travel between them. Without machine learning, some social workers might be over-burdened while others are under-utilized.

Many machine learning methods require a lot of data for prediction models; data that isn’t always available, especially in the development context. The reason I’m excited about optimization is that it doesn’t require us to have a ton of data. We can simply use administrative data, which already exists, to develop these optimal systems. We only need to articulate our constraints and objective. In our simple example, we’d need to know where the families are, where the social workers are, travel times, and estimates of visit length. That’s it.
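To make the social-worker example concrete, here is a minimal sketch in Python. It is not IDinsight's method: it uses a simple greedy heuristic rather than an exact solver, and every name, coordinate, speed, and time budget in it is a made-up assumption. It only needs the inputs listed above: where the families are, where the social workers are, travel times (approximated here from straight-line distance), and an estimated visit length.

```python
import math

def assign_families(workers, families, visit_minutes, speed_kmh, budget_minutes):
    """Greedy heuristic (not guaranteed optimal): serve the cheapest
    worker-family pairs first while each worker still has time left.

    workers / families: dict of name -> (x_km, y_km), hypothetical coordinates.
    """
    remaining = {w: budget_minutes for w in workers}
    assignment = {}

    # Round-trip travel time in minutes for every worker-family pair,
    # sorted so the shortest trips are considered first.
    pairs = sorted(
        (2 * math.dist(workers[w], families[f]) / speed_kmh * 60, f, w)
        for f in families
        for w in workers
    )

    for travel, f, w in pairs:
        if f in assignment:
            continue  # family already has a worker
        cost = travel + visit_minutes
        if remaining[w] >= cost:
            assignment[f] = w
            remaining[w] -= cost

    unserved = [f for f in families if f not in assignment]
    return assignment, unserved, remaining

# Illustrative data only (assumed, not from any real program).
workers = {"w1": (0, 0), "w2": (10, 0)}
families = {"f1": (1, 0), "f2": (9, 0), "f3": (2, 0), "f4": (8, 0)}

assignment, unserved, remaining = assign_families(
    workers, families, visit_minutes=30, speed_kmh=30, budget_minutes=120
)
print(assignment)
print(unserved)
```

A real deployment would more likely hand the same constraints and objective to an exact solver, but the inputs would be the same.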

There are a lot of innovative applications of prediction models being explored, and I think that’s going to continue. But the other two, modeling and optimization, are often under-utilized. Data science can greatly increase the impact of programs with these methods.

Emily: Which projects are you working on now that you’re most excited to see unfold, or that will have implications for other actors out there?

Sid: I mentioned optimization — we have a few projects in this space that are exciting. For example, we will be working with a country government that wants to increase access to schools for their rural population. Currently, they allocate funds annually that can be used to either build dormitories next to schools or create new bus routes. There is some combination of bus routes and dorms that will maximize the number of students in rural areas who can access a school. This is an optimization question.

What’s more exciting is that this is a process that they’ve been undertaking for many years and we will just be using analytics to help them do it better.

Data science is most effective when it informs decisions already being made.

A technology solution that requires a ton of operational, personnel, and workflow changes can be very hard to scale. But being able to say, ‘Hey, you are already doing this, let’s just help you do that better,’ is very exciting.
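The dormitories-versus-bus-routes question described above can be sketched as a small budget-constrained coverage problem. The Python below is only an illustration under stated assumptions: the option names, costs, student IDs, and budget are all hypothetical, and the exhaustive search is practical only for a handful of options. It does not represent the government's or IDinsight's actual approach.

```python
from itertools import combinations

def best_plan(options, budget):
    """Try every combination of options that fits the budget and keep
    the one that reaches the most distinct students.

    options: dict of name -> {"cost": number, "students": set of student IDs}
    """
    names = list(options)
    best_names, best_reached = (), set()

    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            cost = sum(options[n]["cost"] for n in combo)
            if cost > budget:
                continue  # violates the budget constraint
            # A student reachable by two options is only counted once.
            reached = set().union(*(options[n]["students"] for n in combo))
            if len(reached) > len(best_reached):
                best_names, best_reached = combo, reached

    return best_names, best_reached

# Hypothetical options: two dormitories and two bus routes.
options = {
    "dorm_A": {"cost": 50, "students": {1, 2, 3, 4}},
    "bus_1": {"cost": 30, "students": {4, 5}},
    "bus_2": {"cost": 30, "students": {6, 7}},
    "dorm_B": {"cost": 60, "students": {5, 6, 7, 8}},
}

plan, reached = best_plan(options, budget=100)
print(plan, len(reached))
```

With many candidate sites the number of combinations grows too quickly for brute force, which is where integer programming or similar solvers would come in.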

We’re also working with Praekelt in South Africa who have an online tool called “HealthCheck.” It’s a mobile-optimized survey you take that tells you whether any symptoms you’re presenting require you to isolate or get a COVID-19 test. We are working with Praekelt to use this data to support government efforts to build better early warning systems for COVID. For example, an uptick in people reporting a sore throat or cough in a geographic region could be an initial indication of a COVID outbreak or hotspot. Such an early warning system may allow the government to respond faster and people to take greater precautions.
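One simple way to picture an "uptick" detector is to compare each region's latest daily count of symptom reports against its own recent baseline. The sketch below is an assumption-laden illustration, not the method used with Praekelt: the region names, counts, window length, and threshold are all invented, and a production system would need to handle reporting delays, seasonality, and changing usage of the tool.

```python
from statistics import mean, stdev

def flag_upticks(daily_counts, window=7, threshold=3.0):
    """Flag regions whose latest count is well above the trailing baseline.

    daily_counts: dict of region -> list of daily symptom-report counts,
    oldest first. A region is flagged when its latest count exceeds
    baseline mean + threshold * baseline standard deviation.
    """
    flagged = []
    for region, counts in daily_counts.items():
        if len(counts) < window + 1:
            continue  # not enough history to form a baseline
        baseline = counts[-(window + 1):-1]  # the `window` days before today
        latest = counts[-1]
        mu = mean(baseline)
        # Floor the spread at 1.0 so a perfectly flat baseline does not
        # trigger an alert on a one-report increase.
        sigma = max(stdev(baseline), 1.0)
        if latest > mu + threshold * sigma:
            flagged.append(region)
    return flagged

# Invented counts of "sore throat or cough" reports per day.
reports = {
    "region_a": [10, 12, 11, 9, 10, 11, 12, 30],
    "region_b": [10, 12, 11, 9, 10, 11, 12, 13],
}
print(flag_upticks(reports))
```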

Emily: Are there applications from outside the development space that you draw on for inspiration, or other things you’ve seen that could influence our work down the line?

Sid: There’s a part of me that studies methods for the sake of methods because they’re beautiful and enjoyable. When studying these, utility is not always in the back of my mind, though often they end up shaping how I think about a problem down the track. But there are a couple of things happening in the data science world that are very promising.

One is Natural Language Processing (NLP). You may have come across GPT-3, which received a lot of hype. It’s able to write poetry, write code for you, and have a conversation with you, and there is an API. So you can play with models that OpenAI spent millions of dollars to train. At the moment it’s not that useful apart from experimentation, but it has a lot of promise.

Aside from bleeding-edge artificial intelligence like GPT-3, NLP has some very practical use cases in global development. A lot of the organizations we work with have a huge beneficiary base and the only way they interact with them right now is through phone calls. Having a chatbot system might be a good use of Natural Language Processing. There are other barriers to adoption like internet penetration and computer literacy, so widespread usage might be a few years away.

Emily: You have a blog where you offer up some of your work open-source. Where do you think there are opportunities for collaboration and sharing?

Sid: My blog has been a commitment device for me. It keeps me honest: it makes me carve out time for learning and experimenting. The fact that it is read and is helpful to others is a nice side effect.

At IDinsight we depend so much on open-source software (OSS) that it’s our responsibility to contribute to it as well.

We are doing that with our COVID diagnostics tool, our SurveyCTO Python package, and hopefully our work with Educate Girls. We are also allowing time for data scientists to contribute to OSS.

But beyond code, we can “open-source” methods and learnings as well. How do we think about a commonly faced problem in international development? How do we structure it to make it amenable to study? What are the different methods and solution architectures we tried? What tooling or tech stack did we use, and why? In addition to organizations like Facebook and Uber, you’ll notice a lot of e-commerce companies like Stitch Fix and Wayfair building such technical blogs. I would love to grow a similar culture of sharing within the development sector.

Emily: Are there other areas of work at IDinsight that show promise?

Sid: One is our learning partnerships. These excite me the most because one of the hardest things is identifying the questions we need to answer. Being embedded within an organization makes us uniquely positioned to identify these. By getting to know an organization well, we are able to help them articulate their binding constraints. We are also better positioned to develop a solution that integrates well with their operations.

Second is our Data on Demand initiative. In machine learning, one of the toughest things is getting ground-truth data, especially for developing countries. Say you want to build a model that uses satellite imagery to predict the income level in a village. You’d need to actually have income data for some set of villages to train the model. IDinsight’s rapid data collection initiative, Data on Demand, allows you to collect this sort of data pretty quickly. Getting ground-truth labels for machine learning models may not have been what the system was originally designed for but, as a data scientist, it’s hard not to get excited about clean labelled data!

Emily: Thanks so much for talking with me. Really looking forward to all that’s ahead.

Sid Ravinutala co-leads the data science capability at IDinsight. He is passionate about building bespoke algorithms to inform policy and address operational challenges in the development sector. Read more about Sid here and visit his blog here.