
(Audio) Building data science and monitoring system solutions to social challenges

Ben Brockman, Eric Dodge, Sid Ravinutala, Emily Coppel
28 November 2022

In this podcast, IDinsight’s Director of Strategic Communications sits down with the leaders from the organization’s Data Science, Engineering, and Monitoring team to discuss their career trajectories, innovative projects, and what they’re learning from working with government and non-profit partners across Africa and Asia.

Full transcript of the interview:

Emily: Welcome everyone. Thank you so much for taking the time to speak with me today. My name is Emily Coppel. I’m the Director of Strategic Communications at IDinsight, and today I am joined by IDinsight’s Data Science, Engineering, and Monitoring Systems (DSEM) team. We will discuss a bunch of different things, including some of the work they’re currently doing and their career paths.

We’ll dive into all the challenges and opportunities of using data science and monitoring systems to address social problems. So, without further ado, it would be great to have you all introduce yourselves.

Sid, could you kick us off, introduce yourself, and share a little about what you do and how you got into this line of work?

Sid: Thanks, Emily. A pleasure to be here.

So, the narrative I tell about how I got into this work is that there have been two forces in my life. One has been the need to do very technical work: my undergraduate degree was in electrical engineering and computer science, and my graduate degree was in development economics. The second has been doing meaningful work, and finding the intersection of those two has defined my career. If you look back on my jobs, I’ve volunteered in Ghana and worked in Papua New Guinea, not necessarily doing the most technical work, and then switched to the private sector, where I got to exercise that technical muscle.

IDinsight feels like a combination of the two. I get to do deeply technical work, which excites me, and at the same time, when the week is over, I get to look back and be proud of what we’ve achieved. That’s generally how I got here.

In more practical terms, as I mentioned, I have a computer science and electrical engineering degree. In the early part of my career, I did a lot of technology consulting, and my experiences of volunteering and living in other countries led me to development economics. There I realized there was this thing called data science, and some of the work I was doing could easily be marketed as data science. So, that’s when I pivoted and worked at a few companies doing data science.

I had been toying with the idea of joining IDinsight for, what, ten years? Two years ago, I finally joined. So, I’m excited to be here.

Emily: Thank you so much, and we’re excited to have you. It sounds like a long cultivation process to get you to join our team.

Next, over to you, Eric. It would be wonderful to have you introduce yourself and share a little bit about what you do at IDinsight and your background.

Eric: Thanks, Emily. My name is Eric Dodge, I’m a Director on the DSEM team here at IDinsight, and I lead our engineering and monitoring systems work. The bulk of what we do is building monitoring systems, which are decision-focused data systems for our clients.

My origin story is similar to that of many folks at IDinsight: I came up through the development economics and academic research world. I worked at the Poverty Action Lab in India, and I was in the policy world for a while.

What I really saw in that space is that a lot of people were thinking hard about data use in the social sector and what good data use looked like, but less attention was paid to how you actually build robust software systems that enable good data use.

As I became increasingly interested in designing dashboards and interfaces, I realized that one of the big reasons there wasn’t more of this out there was that there weren’t enough people in the space thinking about how to build the software to enable it. So, at that point, I pivoted, did a lot more work on the data engineering and backend engineering side of things, and eventually made my way from academia to IDinsight to do more applied, practical work.

Emily: That’s really interesting, and it hits at the heart of what IDinsight does: the gap between aspirations around data use and where the rubber meets the road in terms of what’s possible.

It’s interesting to hear about your background, Eric. Ben, you’re our most tenured colleague on the call. Could you introduce yourself and share a little bit about what you do and how you got to where you are today?

Ben: Thanks, Emily. A pleasure to be here. My name is Ben Brockman, I am a Director alongside Eric and Sid on our Data Science, Engineering, and Monitoring Systems team, and I’m really excited to talk a bit about our work on the team today.

As Emily alluded to, I’m an IDinsight lifer. I joined in 2012 as the second employee, when there were six of us sitting on cardboard boxes as chairs and using makeshift desks for our first couple of projects, in true startup fashion. In the first few years of IDinsight, I worked on what we would now call the client-facing side of the shop: our core service of delivering decision-focused evaluations, primarily but not exclusively randomized controlled trials (RCTs), to nonprofit and policymaker partners across Asia and Africa.

I spent a few months running IDinsight’s first set of RCTs in Cambodia, focused on sanitation. I then spent two years in our nascent Zambia office in Lusaka, working with the Clinton Health Access Initiative and the Ministry of Health, and then a year in India working on educational technology and some vocational training, back when our office was in Patna, in Bihar. I also spent a little bit of time in Delhi. Then, like some of the other folks on the podcast today, I went to the Kennedy School for a two-year master’s program, where I got interested in this thing called data science, which was becoming all the rage in the analytics and policy space.

I viewed data science as an intersection of the methods IDinsight was already using and new methods that I thought IDinsight could benefit from. After finishing grad school in 2017, I came back to IDinsight to explore setting up a team focused exclusively on data science. We started predominantly with machine learning but, as I suspect you’ll hear later today, have expanded into more techniques like natural language processing, optimization, and more. I have spent the last five years building out that team with Eric and Sid, and I look forward to sharing more about it today.

Emily: So we’re all really here because of you, Ben, and your vision for this team’s growth. Thank you for that overview.

I think as I’ve had the opportunity to work with all of you, it continually strikes me how difficult this work is: how hard it is to take social sector problems that come across our desk from a wide variety of partners and right-fit solutions to them. That’s something we’re doing across IDinsight, but applying data science and figuring out the right monitoring systems for clients can be especially challenging.

To ground this discussion in examples for those who may be listening, I’m wondering if you can talk about a few projects we’ve done. In sharing those examples, perhaps you can structure them in terms of the challenge the partner was having and how we worked to identify a solution. I know these are very layered interactions and there’s a lot to share, but I think a problem, approach, and solution framing can be a helpful way to think about this work.

So, with that, Ben, maybe you can start talking a little bit about a project you’ve worked on, what the client’s problem was, and how we worked to address it.

Ben: Appropriately, I can start with the first project we did on the data science side of the team, with an organization called Educate Girls.

This project, even though it was data science-focused, grew out of our non-data science work: we had run an impact evaluation with Educate Girls, a non-profit that works to locate, enrol, and provide remedial education to out-of-school girls across India.

When we started working with them, they had been operating for around ten years, and we had done an impact evaluation as part of one of the first development impact bonds globally, a performance-based financing mechanism that ties funding to rigorous impact evaluation results. We found really positive results for Educate Girls, showing that they were successful not only in enrolling out-of-school girls but also in keeping them enrolled in school and improving their educational outcomes. After these evaluation results, they were starting to think about what came next. They were exploring scale-up funding to take this proven model and expand it to another 10,000 or 20,000 rural communities across India, looking to enrol millions of out-of-school children.

We happened to have a conversation about the key problem they were facing, which at that time was where they should expand next. They had historically worked in one corner of Rajasthan, in areas they knew through word of mouth had many out-of-school children, but they were trying to make their approach more rigorous. They were starting to use some administrative data, but they didn’t have a great sense of where to find out-of-school girls and where to expand their program. After discussing the problem for a little bit, we realized that it was pretty similar to a traditional supervised machine learning problem. What that really means is using algorithms to learn a pattern from cases where you observe the outcome you care about, and then extrapolating that pattern to new cases where you can’t directly observe it.

In this case, Educate Girls had something like 10,000 or 15,000 villages where they had already done a census to understand how many out-of-school children there were. And through administrative data, we could find a bunch of information about those villages (how many people lived in them, poverty levels, literacy rates, etc.) that might help predict how many out-of-school girls there were.

We were then able to train algorithms to learn those relationships. But the real power came when we could apply these algorithms to the administrative data for another 200,000 villages that Educate Girls hadn’t yet expanded to but was considering, and make a forecast of where they would get the most bang for their buck in terms of program expansion.
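To make the supervised-learning idea concrete, here is a minimal sketch in Python. It is not the model IDinsight built for Educate Girls: the `Village` fields, the two features, and all numbers are made-up assumptions, and k-nearest-neighbours regression stands in for whatever algorithm was actually used. It learns from villages where a census recorded the outcome, predicts the outcome for unsurveyed villages, and ranks them as expansion candidates.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Village:
    name: str
    # Hypothetical administrative features, e.g. (poverty_rate, female_literacy_rate),
    # assumed to be pre-scaled to comparable ranges.
    features: tuple[float, ...]
    # Known only for villages where a census has been done.
    out_of_school_girls: Optional[float] = None


def knn_predict(surveyed: list[Village], features: tuple[float, ...], k: int = 3) -> float:
    """Predict a village's outcome as the mean outcome of its k nearest surveyed villages."""
    distances = sorted(
        (math.dist(v.features, features), v.out_of_school_girls) for v in surveyed
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rank_candidates(surveyed: list[Village], candidates: list[Village], k: int = 3):
    """Score unsurveyed candidate villages and return (prediction, name), highest first."""
    scored = [(knn_predict(surveyed, c.features, k), c.name) for c in candidates]
    return sorted(scored, reverse=True)


# Invented example data: four surveyed villages, two candidates.
surveyed = [
    Village("A", (0.60, 0.30), 40),
    Village("B", (0.55, 0.35), 35),
    Village("C", (0.20, 0.80), 5),
    Village("D", (0.25, 0.75), 8),
]
candidates = [Village("X", (0.58, 0.32)), Village("Y", (0.22, 0.78))]
print(rank_candidates(surveyed, candidates, k=2))  # [(37.5, 'X'), (6.5, 'Y')]
```

In practice you would scale features carefully, hold out some surveyed villages to check accuracy before trusting the ranking, and likely use a library model; the sketch only shows the train-on-known, predict-on-unknown shape of the approach.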

And I’m happy to report that, when we did our 10-year impact report a few months ago, this emerged as one of the highest-impact projects IDinsight has done to date. Our back-of-the-envelope estimates are that the program will enrol somewhere between 400,000 and 500,000 additional out-of-school girls, compared with the counterfactual of how Educate Girls used to run its program, for roughly the same cost, which represents a tremendous impact.

So, really excited to be able to highlight that as one of our first data science use cases, and happy to talk about some of the others if we have time.

Emily: Thanks, Ben. It’s really helpful to hear how you addressed such a common problem, especially for NGOs: they’re familiar with the challenges in one district but don’t quite know where or how to expand, or how to optimize their program to reach as many people as they can with the resources they have. So, really helpful to hear how you approached that with Educate Girls.

I’m going to turn to Sid next. Could you share details about a project that you’ve recently worked on? What were some of the challenges that the client/partner was grappling with, and how did we work with them to address it?

Sid: It’s great that you started by mentioning “right fit” because this project is a great example of where “right fit” was very important.

We are currently working with an organization called Praekelt, and one of their flagship programs is called MomConnect. They run a hotline for new and expecting mothers to ask questions, seek information, ask about their appointments, etc. It’s a whole bunch of services, and they have a huge uptake. Ben can correct me if I’m wrong, but it’s about 8 million active users on their platform. So it reaches a huge audience.

One of the problems they were having was that they had a small help desk: about three people answering thousands of incoming questions. That caused two problems. First, urgent questions were not getting answered; they would get buried under this huge pile. Second, as you would imagine, these three help desk operators would probably get to a hundred or so questions and never get to the bottom of the pile. So, a large number of questions were simply not being answered, which erodes confidence in the platform, apart from having serious public health implications. If a mother asks a question about her child’s health, not answering it has real-life consequences.

Coming back to your question about “right fit”: Praekelt had previously tried to implement natural language processing, which is essentially using computers to understand human language, with engineers and volunteers from Silicon Valley. For various reasons, it never made it to scale, perhaps because the methods were too complex or the maintenance model wasn’t right.

Our value add with Praekelt was not necessarily the latest and greatest, cutting-edge natural language processing, but right-sizing the solution. We started with an analytically very simple algorithm but built an end-to-end solution: one that answered questions by matching incoming questions to a database of government-approved FAQs. That was the first thing we built. The algorithm in the middle was quite simple (we went with something off the shelf), and over time we’ve iterated on it.

The reason we’re still working with Praekelt is the way we work: not build-and-deploy, but iterating in cycles. We got an end-to-end solution working using something very simple that fit their purpose. Over the last few months, we have improved on it by working with them, making sure we’re building tools that help them support it and help their engineers manage the solution better. And in terms of pure performance, meaning how many questions are answered correctly, we’ve been building and improving the engine.

We can talk a little bit more about right fit, but this has been a great example of starting simple, building end to end, and then iterating and improving over time.

Emily: Helpful to hear, and I have one follow-up question for anyone listening who’s new to natural language processing. You’re essentially trying to take a question submitted through WhatsApp, like “Is it normal to feel cramping at six months pregnant?”, and match it to an answer from a database of answers provided by the Department of Health. Where does natural language processing come in? Is it the mechanism that lets you understand my question, even though I might phrase it differently? Is that how to think about it?

Sid: That’s exactly right. There are many ways you might phrase that question. You might use colloquial language or slang, especially in South Africa, where many different languages are spoken, and you might borrow words from a different language. So yes, you are speaking naturally, as you would to another human, and the computer has to make sense of it. “Understand” is a very generous way to describe what computers do, but at a minimum, it has to match your question to the database of approved FAQs that we have built.
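The matching step can be illustrated with a deliberately simple bag-of-words approach. This is a minimal sketch, not Praekelt’s or IDinsight’s actual engine: the FAQ texts are invented placeholders rather than Department of Health content, cosine similarity over word counts is a stand-in for the unnamed off-the-shelf algorithm, and the 0.3 threshold is an arbitrary assumption. A question that matches nothing well enough returns `None`, which in this sketch is where it would be routed to a human help desk operator.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Invented placeholder FAQs, standing in for a database of approved answers.
FAQS = [
    "Is cramping normal during pregnancy?",
    "When is my next clinic appointment?",
    "How do I breastfeed my baby?",
]


def tokenize(text: str) -> Counter:
    """Lower-case bag of words; a simple stand-in for real NLP preprocessing."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def match_faq(question: str, faqs: list[str], threshold: float = 0.3):
    """Return (best_faq, score), or (None, score) if no FAQ is similar enough."""
    q = tokenize(question)
    score, best = max((cosine(q, tokenize(f)), f) for f in faqs)
    return (best, score) if score >= threshold else (None, score)


print(match_faq("Is it normal to feel cramping at six months pregnant?", FAQS)[0])
# Is cramping normal during pregnancy?
print(match_faq("Where can I buy airtime?", FAQS)[0])
# None
```

Note that “pregnant” and “pregnancy” share no token here, and common words like “is” still count toward similarity; gaps like these are one reason a production system needs more than raw word overlap.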

Emily: Thanks so much, Sid, for walking us through that example. It’s helpful to hear how you are applying the technology in a different way.

Eric, I’m going to turn to you. I’m wondering if you can talk a little bit about a recent project that you’ve worked on, where you’ve been developing a monitoring system for a client and how you’ve thought about making sure you’re developing the right thing that fits their needs, and how you go about that.

Eric: Thanks, Emily. Earlier this year, we worked with a great organization in Kenya called Kidogo. They’re a social enterprise focused on providing high-quality daycare to the poorest of the poor in Kenya. They operate on a franchising model, and they’re relatively new, so when they came to us to work on a monitoring system, they were at this point in their scale-up where the traditional small-scale data systems were starting to not work for them. It was becoming really difficult to understand basic information about the organization. For example, how many franchisees do we currently have? How many did we have two years ago? How many people do we train every year? And then there are other things like, which of our franchisees haven’t paid their dues in the past three months? So, data and reliable information that’s needed to answer basic questions and run the organization was becoming more difficult to collate.

We worked with them to identify all the different source systems they used to collect and manage data, and to establish a single source of truth across those systems, for example, consistent identifiers for their facilities and their entrepreneurs. Then we built a system to extract data from all those source systems, warehouse it, and display it on a dashboard.
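The extract, warehouse, and report flow can be sketched with Python’s built-in `sqlite3` module standing in for a warehouse. Everything here is a hypothetical illustration, not Kidogo’s actual stack (which relied on low-code and software-as-a-service tools): the table names, the `franchise_id` key, and the sample rows are invented. The shared identifier is what makes the join across source systems possible, and the query answers one of the example questions: which franchisees have made no payment since a cutoff date.

```python
import sqlite3
from datetime import date

# Hypothetical extracts from two separate source systems, already mapped to a shared franchise_id.
crm_rows = [("F001", "Site A"), ("F002", "Site B"), ("F003", "Site C")]
payment_rows = [("F001", "2022-09-05"), ("F002", "2022-05-10"), ("F001", "2022-10-03")]


def build_warehouse(crm_rows, payment_rows):
    """Load both extracts into one in-memory database keyed on franchise_id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE franchisees (franchise_id TEXT PRIMARY KEY, location TEXT)")
    # Dates are stored as ISO-8601 strings, so string comparison matches date order.
    con.execute("CREATE TABLE payments (franchise_id TEXT, paid_on TEXT)")
    con.executemany("INSERT INTO franchisees VALUES (?, ?)", crm_rows)
    con.executemany("INSERT INTO payments VALUES (?, ?)", payment_rows)
    return con


def unpaid_since(con, cutoff):
    """Franchisees with no payment on or after the cutoff date."""
    rows = con.execute(
        """
        SELECT f.franchise_id
        FROM franchisees AS f
        LEFT JOIN payments AS p
          ON p.franchise_id = f.franchise_id AND p.paid_on >= ?
        GROUP BY f.franchise_id
        HAVING COUNT(p.paid_on) = 0
        ORDER BY f.franchise_id
        """,
        (cutoff.isoformat(),),
    ).fetchall()
    return [franchise_id for (franchise_id,) in rows]


con = build_warehouse(crm_rows, payment_rows)
print(unpaid_since(con, date(2022, 8, 1)))  # ['F002', 'F003']
```

The `LEFT JOIN` keeps franchisees with no matching payments at all, which is exactly the group a dues-tracking indicator needs to surface.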

The indicators on the dashboard came out of an involved process with the client to understand their key decision-making needs and make sure the dashboard addressed them. One thing we thought a lot about in this engagement was that Kidogo is a small to medium-sized NGO without an engineering team on staff. So how do you build something they’ll actually be able to use after you leave, especially as programs and decision-making and monitoring needs change? The system has to evolve over time, so we worked very closely with them, and we wanted them to hire a point person for the system: someone in an analyst role who is data-savvy but doesn’t have an engineering background. We then co-built things with that person using a variety of what we call low-code or no-code tools and software-as-a-service tools that don’t require managing servers and the like, to ensure a smooth handover so they can run things after we leave.

Emily: I’m really happy you mentioned that project, Eric, because it’s an especially interesting example of an organization in the midst of scale-up, feeling its existing systems strain and realizing it needs something more robust, and also of our team recognizing that, to sustain this long after our project ends, they needed additional capacity.

That hits on something I think about a lot. Seeing you all work, you talk a lot about data pipelines and where the data is coming from. I imagine that in a lot of instances, clients come to you with big aspirations about what type of dashboard they might create or what kind of algorithm could work for them, but when you get into the details with them, you realize there are some fundamental challenges with how they’re collecting data or what the data looks like. That’s just one example of a challenge I’ve seen by proxy of working with you. I imagine you also run into other cases where there are real limitations to what the data can do, which affects both “right fitting” a solution to clients’ problems and managing their expectations of what’s possible.

Ben, I’m going to turn to you. Could you talk a little bit about how we structure our approach, how we usually try to figure out early on where some of these pain points might be, and how we make sure we’re “right fitting” the solution for the client?

Ben: Thanks, Emily. This is something we’ve put extra emphasis on as our team has grown over the last 18 months. Beyond my team-wide responsibilities, I look after the product vertical, managing a team of product managers, and product thinking has informed a lot of how we approach the question you just posed about right-fitting solutions to clients’ needs.

Our approach starts with identifying who the users of a potential solution are and what problem they’re trying to solve: what pain those users are trying to make go away, or what goal they are trying to achieve. Maybe they have no situational awareness of how their program is operating and don’t know what progress is being made towards a goal, in which case a dashboard might work. Or maybe, as Sid described, there’s a currently very manual system where they could really use some automation. Or, going back to the Educate Girls example, they have a program that works and they want to expand, but they don’t know where. So, completely outside the realm of technology solutions, we really want to understand the fundamental problem or challenge and start from there, and it may or may not have a technology solution.

Sometimes someone says, “Hey, I have a great machine learning problem for you,” and we come in and we talk to them and say, “actually, I think what we really need is a spreadsheet and to go talk to a couple of people and have some basic summary statistics to start.”

So, I think the most important thing is to start with figuring out who will use a potential solution and understand what they’re trying to accomplish, where their pain points are, what their goals are, etc. The second principle, I’d say, is we want to work collaboratively and refine over time.

In the product world, over the last 5 to 10 years, there’s been a real shift from what’s called waterfall software development to more agile software development. Waterfall is when you plan every step. If you have to build a system with a hundred steps, you plan all hundred steps out in a row with a timeline and a Gantt chart, and you manage very tightly against that. At the end, you have your airplane, your software system, or whatever it is you’re trying to build. But what a lot of software companies have figured out, and why many operate with some flavor of agile today, is that it’s really impossible to know upfront all the constraints you’re going to face. So, you’re better off getting something built and getting feedback, particularly from users, to understand the limitations you face, and iterating over time.

For us, that means a focus on building prototypes so we can get feedback without having to build a full dashboard first. We now use a tool called Figma to mock up designs and get feedback, where we can ask clients, “Hey, what do you think of the buttons and the dropdowns over here, or this kind of graph versus that kind of graph?” It’s much quicker to build those prototypes than to build a full-fledged dashboard.

I’d say our approach really stems from identifying users, focusing on their problems, iterating quickly and building together with our clients. And then the final piece, which Eric alluded to, is thinking about handover and sustainability from day one and designing with that in mind.

Emily: That’s so interesting. Sid, anything to add there on top of what Ben shared?

Sid: I think Ben covered it. As you know, Emily, being an impact-driven organization, we think a lot about this. If a project does not have impact, that’s a big source of job dissatisfaction. The reason we are here is to have impact, so right-fitting solutions is something we have put a lot of energy into.

I recently wrote a blog post on AI for Good that covers some of these lessons as well, but Ben covered it very well. The one thing I’d add is from a technology perspective. We try very hard to build on top of the tech stack, or the technologies, clients are already using, which sometimes means we have to learn a new technology. But we have found that, in the long term, this leads to a more sustainable handover and easier integration with their current technology. That’s the only thing I’d add. Eric, I don’t know if you have any thoughts.

Eric: In terms of right fit, just echoing what Ben said, the most important thing for impact is: do we create something useful for the end user, something the client actually wants to use? If we do, everything around that will tend to fall into place.

One other thing I would add on the right-fit question is that we work with a huge range of client types. So we have to think a lot about how to work with a small to medium-sized NGO in Senegal versus the Indian Government, build technology solutions that fit into those organizations, and, as we design, work with them in a way that suits them to elicit all the information we need to build something useful. On the product side, doing user research across that range is a huge challenge that we’re getting better at, including thinking about how we categorize clients. We’re also thinking a lot more about how, within each client type, we can move toward a more productized approach to designing and building as we scale.

Emily: I’m really excited to see how that develops in the next couple of years. Especially as you bring on more clients, address more challenges and see what you can build on based on your learnings and approach from past projects.

This next question is definitely not unique to the DSEM team, but it is a challenge. This work is really difficult to do. It takes a lot of work to create something that people need and will use. But I think the biggest challenge is whether it’s something people will trust. I wonder if you can talk about any challenges you’ve had with partners who have been really skeptical about our recommendations, our approaches, or even the data they’re seeing in a monitoring system, and what you’re learning about supporting partners and clients to trust something they probably don’t deeply understand, because they’re not data scientists, engineers, or monitoring experts.

I’m wondering if any of you want to touch on that either through an example or through your own experience.

Sid: Emily, one thing I want to say is that it makes complete sense that they’re skeptical, and there is a healthy amount of skepticism that we should all hold when implementing machine learning or AI-based solutions.

There are many examples of this in existing writing. I reference a book called Weapons of Math Destruction, where Cathy O’Neil talks a lot about over-trusting machine learning algorithms and the downsides of doing so. So, I completely understand the skepticism, and to some extent I almost encourage it. From there, there are two things. The first is how we address it. Something I wrote about recently is having a human in the loop: not just allowing for a human in the loop but requiring one. This makes sure you’re not relinquishing all control to your AI overlords but always have a human who makes the final decision.

That’s one aspect of it, and it makes sense if you think about it. There’s a lot of nuance in the world. The algorithm will not capture everything, and many of the people who have been making these decisions take a lot of that nuance into account. So, thinking that a machine learning algorithm can replace all the nuance and experience that a person brings to a decision is naive. We encourage that human role: we try to identify the people who are making the decision and think about what we can build to help them make that decision a little bit better. We are not trying to replace people; we’re trying to augment them with tools for making that decision, and that approach goes a long way.
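Requiring, rather than merely allowing, a human decision can be enforced in software. The minimal sketch below assumes a hypothetical `Recommendation` record: the model’s score is only advisory, and nothing is finalized until a named reviewer has approved or rejected every item. It illustrates the human-in-the-loop pattern described above and is not code from any IDinsight project.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    """A model suggestion that stays inert until a named person reviews it."""
    item: str
    model_score: float  # advisory only; never used to decide on its own
    reviewer: Optional[str] = None
    approved: Optional[bool] = None

    def review(self, reviewer: str, approved: bool) -> None:
        self.reviewer = reviewer
        self.approved = approved


def finalize(recs: list[Recommendation]) -> list[str]:
    """Return the items a human approved; refuse to proceed while any are unreviewed."""
    pending = [r.item for r in recs if r.approved is None]
    if pending:
        raise ValueError(f"Unreviewed recommendations: {pending}")
    return [r.item for r in recs if r.approved]


recs = [Recommendation("Village 17", 0.91), Recommendation("Village 42", 0.88)]
recs[0].review("district officer", approved=True)
recs[1].review("district officer", approved=False)  # the human overrides the model here
print(finalize(recs))  # ['Village 17']
```

Raising an error on unreviewed items, instead of silently defaulting to the model’s suggestion, is what turns “allowed” human review into “required” human review.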

The other thing is, you know, Google has this list called Rules of Machine Learning, and the second rule is to build metrics. So, if you’re going to be building some sort of machine learning, figure out which metrics will tell you whether you’re going in the right direction and whether it’s actually delivering on its promise. We’ve done that in the past. In our work with Educate Girls, for example, which Ben mentioned earlier, we had a dashboard with metrics showing how many more out-of-school girls were being found using the algorithm. It’s feedback for us, but it also helps us build trust with our client that the solution is indeed having the impact we expected.
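A tracking metric of this kind can be as simple as comparing outcomes in villages chosen by the algorithm against a baseline. The sketch assumes hypothetical records of `(month, group, girls_found)` where `group` is `"model"` or `"baseline"`; the numbers are invented, and the ratio-of-means metric is an illustrative choice, not the one used on the Educate Girls dashboard.

```python
from collections import defaultdict
from statistics import mean


def monthly_lift(records):
    """Per month, mean girls found per model-targeted village divided by the baseline mean."""
    by_group = defaultdict(list)
    for month, group, girls_found in records:
        by_group[(month, group)].append(girls_found)

    summary = {}
    for month in sorted({m for m, _ in by_group}):
        model = by_group.get((month, "model"))
        baseline = by_group.get((month, "baseline"))
        # Skip months where either group is missing or the baseline mean is zero.
        if model and baseline and mean(baseline) > 0:
            summary[month] = mean(model) / mean(baseline)
    return summary


# Invented example records: (month, targeting group, girls found in one village).
records = [
    ("2022-01", "model", 30), ("2022-01", "model", 50),
    ("2022-01", "baseline", 20), ("2022-01", "baseline", 20),
    ("2022-02", "model", 45), ("2022-02", "baseline", 15),
]
print(monthly_lift(records))  # {'2022-01': 2.0, '2022-02': 3.0}
```

A ratio like this is descriptive rather than causal, since algorithm-targeted villages may differ from baseline villages in other ways, but tracking it from the start gives both the team and the client an early signal of whether the model is helping.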

So those are the two things. First, having a human in the loop is extremely important: identify the person who’s making the decision, and make the approach about augmenting their decision-making, not replacing it. Second, start from the beginning with metrics that track how well you’re doing.

Emily: I really like how you incorporate the human element into your planning, processes, and general approach. It’s kind of the opposite of the “set it and forget it” approach, which I think a lot of people expect to end up with when they come into projects with us. These tactics really help build trust and show people that it’s augmenting and supporting them: it’s another tool, not a replacement.

Eric: I think the monitoring systems work is a bit different. With machine learning, as Sid was talking about, you have a bit of a black box spitting out some result, so it’s a little opaque to the end user how that number was arrived at.

With monitoring systems, the actual calculations are very straightforward. We’re really just showing people totals and averages cut different ways, so that is transparent. But even so, we have to work a lot with clients to set expectations around what it means to be a savvy consumer of their own data. That means bringing a natural skepticism to what you’re looking at and also understanding the data-generating process that led to that number.

We always have this moment. We’re extracting data from a bunch of different systems they already have (a surveying mobile app, a frontline worker app, some data in Google Sheets), pulling all of that in, and putting up a dashboard. Then, the first time they see the real data, they say that it doesn’t make sense at all. We have to remind them that this is part of the process, and then we go investigate why that set of numbers looks so weird. Often a light bulb goes on at that point: everyone in the organization is responsible for data quality. So yes, it’s about setting expectations about what it means to be a data-driven organization.

Emily: That’s such a lovely example.

Ben, is there anything you want to add on this point? Then I have just two more questions for you all.

Ben: Sure. Just briefly, one of the things Eric alluded to is that we sometimes get pushback like, “Hey, do we really need to do all of this scoping work? All we really need is this quick dashboard.” There can be a tendency to think of these technology investments as projects rather than long-term systems development. I think this really comes from some of the incentives in our sector: how government funding works and, even more importantly, how philanthropic funding often works.

So, any funders listening in, please fund scoping and please fund maintenance. You’ll see much more impact in your projects. I also think that, over time, partner organizations have started seeing the medium-term value. If we can convince them to go through that initial scoping work, where we do an extra workshop or two to get everybody on the same page before we start building, we tend to run into many fewer problems down the road. More than once, we’ve heard, “I’m really glad we did those workshops, even if they were a bit of a slog to get through in the beginning.”

I think overcoming that skepticism actually involves quite a bit of helping various teammates on the client side get on the same page and figure out collectively what their goals and needs are. That is really important, and when we first come into a project, it is sometimes met with a bit of skepticism about how hard it could be.

Emily: I appreciate your nod to the larger system in which NGOs and others in the social sector are operating, where even doing this type of overhaul in the first place requires outside funding. That is how our projects work as well: we look for outside funding to support partners and to make an investment in their future. But it’s not always where donors’ minds go first when they think of a high-impact proposition. So, I think that’s a really helpful point.

I want to switch gears a little bit for my second-to-last question. You are often recruiting for DSEM roles, and you’re growing the team. We’re seeing increased demand for this type of work, and I’m wondering if you have any words of advice or wisdom for young professionals who are interested in data science, engineering, or creating monitoring systems and are passionate about social impact.

What advice would you give somebody new in their career who is trying to figure out how to use these skills to the best of their abilities? What type of experience should they be looking for? How can they develop early on?

Sid: Thanks, Emily. So first I want to say, come work with us! There are a lot of extremely interesting and very important problems to be worked on. Whatever your motivation, if you want to work on the hardest problems in the world or have positive social impact, this is a great place for you to do that. So, reach out to us, and talk to us about it.

In terms of what skills you need, because the kinds of problems that we work on are so varied, we hire people who have a very strong foundation in statistics and probability. Those two things often require university courses to build a strong intuition around them. Some of the computer science skills you can learn on the job. We’ve had people who came with minimal software engineering experience and picked it up pretty quickly, but there is no substitute for a strong foundation in statistics and probability. So, if you’re choosing your courses, don’t skimp on statistics and probability to take a deep learning course.

Ben: I really appreciate that, Sid. I think that foundation is really important and hard to make up for later, particularly for our data science colleagues. It doesn’t have to come from a university; there are great online resources for working through these topics, but it’s not something you’re going to pick up in a weekend. It’s something that you build over time, and the best way to really learn it is to do some hands-on work and get feedback from experts in the field, whether in an education setting, through side projects, or otherwise.

Further building on Sid’s point, one thing I would say is that we’re more often than not looking for people with a breadth of skills. The people who have the most impact are often really strong technology generalists. They’re generally not the world’s foremost experts in deep learning or natural language processing. They are people who may be strong in statistics, but who also know how to set up a database on the fly and can write a data pipeline to connect various data sources. They make not-half-bad visualizations, and maybe they know a little JavaScript to pull together a mockup of a front-end application.

In the sector we’re working in, it’s not Google, where you have a 50,000-person engineering team and can become an expert in one very specific widget. To really succeed here, you need to wear a lot of different hats, which I think is exciting, and you get exposed to a lot of different types of problems and a lot of different types of tools. And if you leave a place like IDinsight and go work somewhere else in the social sector, you might be the only engineer at a community health organization or an education organization. If you’re the only engineer, the ability to do a little bit of everything has so much value for that organization, compared with having to build a team of specialists that it might not be able to afford.

So, just re-emphasizing what Sid said, we’re at idinsight.org/careers. We have plenty of job openings in all of our verticals. Please check us out!

Eric, what do you look for on the data engineering side when you’re trying to find candidates? I know we particularly need some full-stack engineers, so what are we looking for there?

Eric: Maybe I can talk about IDinsight, and then I can talk about the sector more broadly.

On our team, data engineers form the core of our software engineering team, because our basic problem is that there’s data in a bunch of source systems, and we need to figure out how to get it all warehoused, harmonized, and put into a dashboard or other kind of reporting interface for decision makers.

We are building out a full-stack engineering practice now. We have one full-stack engineer, and we’ll be hiring a couple more next year; we are especially looking for more mid-level folks, people with five or more years of experience. For anyone who might be interested in joining us: on the data engineering side, we really look for entry-level folks who are very strong in Python and SQL, and who maybe have some experience with AWS and with different pipelining frameworks. On the full-stack side, our tech stack uses Flask on the back end and Postgres for the database, and AWS is primarily what we use for cloud.

At IDinsight, our technology team is getting big enough that we now have the support systems in place for early-career people to join. We are able to invest in their skills and train them up, and we have more mid-level folks who can supervise them.

I think the sector’s changing, and there are more organizations operating at that kind of scale. But for people who are interested in working in technology in the social sector more broadly, what the sector needs, like Ben said, is more generalists. I also really think the sector needs a lot of people who are bridge people and translators. It’s not a coincidence that Ben, Sid, and I all have a policy background. The development sector and its organizations are filled with policy-type people, and being able to bridge the technology world and the policy world, speak the policymakers’ language, and help them understand how technology can work in the sector and add value for our clients is a needed skill set.

I would say, for anyone who is interested in getting into technology in the social sector, it is invaluable to learn about the problem space you’re going to be working in, because nobody’s going to be handing you tickets to work on. You need to work with the practitioners, really understand the needs of the sector, identify novel problems that technology can offer solutions to, and then work with people to figure out how you actually build that.

Emily: Really, really interesting. I like the common theme of right-fitting the system that has run throughout our conversation.

Is there anything that you all would like to add before we sign off, about your experience over the last few years or anything we didn’t cover that you think is really worth sharing?

Ben: I can maybe just add one note, which is you learn so much about what the solution needs to be by just jumping in and starting to build something.

I mean, there’s a balance here: you want to make sure you’re working on the right problem. But once you’ve confirmed you’re working on an important problem, I think the best thing to do next is to just jump in and start learning, because when you go to automate things, connect databases, or visualize data, as Eric said earlier, there’s a lot of, “Whoa, what’s that? That doesn’t make any sense.” It’s not uncommon to find that a program someone thought had been delivering for the last six months actually hasn’t been running for the last month, because there was no system to flag when a crucial input ran out. Or you might find completely unequal performance: in one area the program is great, while another area is lagging behind. Because you previously saw only the average, it wasn’t obvious what was actually happening on the ground until you could split that out on a map.

So, I think my advice is to really jump into the work where you can. Make sure you’re working on the right problems and not be afraid to sort of pivot to fix some of the problems that you uncover along the way.

A lot of times when you go to build a dashboard, it’s like, “well, actually, there are some data quality problems; maybe it doesn’t make sense to build this dashboard.” And when that happens, we shouldn’t throw our hands up in the air and say, “well, I guess we’re not building a dashboard.” We should stop and fix those data quality problems and then build a dashboard because that’s really where the value is and when we’ll start driving social impact. I’ll end there. Eric, Sid, any final words of sage wisdom to share?

Sid: Thanks, Ben. Looking back on the last two and a half years, I think it’s fair to say that we are one of the few organizations trying to do data science and data engineering in development. So there are a lot of lessons we have learned from being at the frontier.

One of them, which is maybe a bit of an internal lesson for IDinsight as well, is to think of data science not from a methods perspective but from a problems perspective. By that, I mean that data science is actually a catch-all and can include pretty much any method you can think of. If there’s a computer involved, you could argue it’s data science. So, instead of trying to articulate the various methods that we can now offer, an approach we’ve found really useful is to talk about the kinds of problems we can now help with. That reframing, stepping away from methods, has really helped our client-facing teams, who work day in, day out on problems with our clients, see where data science would fit in.

So this reframing, I think, was a long time coming, and I would encourage other practitioners as well not to talk about data science in terms of methods but instead in terms of the kinds of problems we can solve.

Eric: I would just end by saying that, personally, I find it really exciting to be working on software at an organization with so many subject matter experts, people who really understand how to work with governments in different developing countries, and to be able to partner with them as thought partners on how to deliver a complete suite of data-for-decision-making solutions.

IDinsight was traditionally based around delivering impact evaluations, which got a client to one big decision about a program. What we’re doing on the DSEM team is trying to broaden the base of decision makers and inform lots of smaller decisions over time, for more people, that slowly accrete into impact. So, I think it’s a really exciting time to be working at IDinsight, with a really inspiring group of people.

Emily: That is a great note to end on. Thank you, Eric, Sid, and Ben, for your time today. It has been an absolute pleasure talking to you, and also a pleasure seeing your journey over the last few years, the remarkable growth of this team under your leadership, and what we’ve been able to accomplish. Hearing a little about what is ahead is really exciting, too.

Thank you so much, and as Ben said, if you’re interested in a career working with our DSEM team or with IDinsight, idinsight.org/careers is our last plug.

Thanks so much. Really appreciate it, everyone.