©Roman Biernacki, Pexels
We’ve been building AI solutions for social sector partners since 2018, pioneering applications with organizations like Educate Girls, Last Mile Health, and governments in low- and middle-income countries (LMICs). In 2022, we turned that lens inward. We believed generative AI could make our teams more effective and, by extension, make our partners’ social programs more impactful.
Generative AI is an opportunity for the social sector. At a moment when billions of dollars in funding have been cut and pressure on resources is further intensifying, AI could help organizations stretch resources further, reach more people, and deepen impact for each person they serve. But poorly implemented, AI could increase inequality, decrease trust, or propagate bias. Here’s what we’ve learned trying to get it right.
Early on, we learned that AI transformation will fail if it’s left to a few technical experts. Knowledge work is too diverse and complicated for a central team to feasibly identify and tackle every opportunity. But we also worried that a fully organic, bottom-up approach would be too slow, chaotic, or risky.
We used a distributed accountability model to balance these tradeoffs. We organized a small central team to define strategy and build the enabling environment. This team was charged with removing roadblocks and freeing up individuals across the organization to do the real work of learning, experimenting, and finding practical ways to apply AI to the problems they were experts in solving.
The central team tried to make it safe and easy for others to learn and experiment. We laid out an internal AI strategy that connected the tools to our mission, so teammates could see how AI fit their work and how it could accelerate our impact. We implemented clear data policies so teams knew how to use AI safely. We launched an enterprise LLM as a secure starting point and supported teams to explore other tools independently. We built internal training modules, including content to help staff spot and question bias, especially given that AI trained on US and European data often missed crucial context for our work. We held knowledge-sharing sessions to spread ideas more rapidly. Leadership worked to normalize experimentation, celebrate both successes and failures, and create the psychological safety necessary for rapid learning. We were trying to build a workforce capable of identifying the best use cases and becoming AI-augmented experts in their domains.
The biggest challenge for our teammates was and continues to be finding the time. Our teams were running hot, a common occurrence in strained social sector organizations. Even when people understood AI could save them time in the long run, they struggled to make space to experiment. They ran into the “too busy to adopt the wheel” problem. They couldn’t stop hauling their bricks on square wheels long enough to try the round ones.
A few things helped, though we continue to face this challenge. We built short, five-minute training videos that shared how to use AI at IDinsight and were lean enough to watch with a few spare minutes. We encouraged teammates to share quick wins on an internal message board. We ran workshops with functional teams, prioritizing the problems they faced, identifying where AI could help, and co-building quick solutions. We tried to make the first steps small enough that busy people could actually take them.
Taken together, these steps slowly increased individual innovation. A data engineer wrote technical documentation five times faster, hitting a client deadline that fully manual writing would have missed. A project team built a custom tool to evaluate Theories of Change, finding gaps in program logic that would have been slower to surface otherwise. A researcher saved days previously lost to tedious journal formatting, redirecting their time toward higher-value tasks. A communications teammate turned technical reports into social media posts faster, reducing coordination and time to publication.
We started our generative AI initiative excited about the possibilities, while also nervous about the hype and risks. We found that rigorous measurement helped us distinguish between excitement and actual value, as well as guard against some of the risks.
Early on, we fell into the low-ROI trap. Adoption clustered around obvious uses like meeting summaries and refining emails. While these applications were useful, outcome data made it clear that we weren’t substantially moving the needle. We were victims of the streetlight effect, looking where the light was best, not where the value was highest. Measurement gave us the push to shift toward the harder, messier work of finding where AI can truly reshape existing workflows or solve new problems that were previously too hard or bandwidth-intensive to tackle.
Hard metrics also showed us where to focus. We tracked adoption rates, productivity increases, and reductions in workflow duration. We noticed the gains were unevenly distributed. Some teams were generating big productivity improvements, while others were not seeing much value. This data helped us identify which bright spots to learn from and which teams needed more support.
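For readers who want to run a similar comparison, here is a minimal sketch of splitting teams into bright spots and teams needing support. It is an illustration, not our actual tooling: the `TeamMetrics` fields, the `split_teams` helper, and the choice of the median as the cutoff are all hypothetical assumptions, and any metric (adoption rate, reduction in workflow duration) can be plugged in.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median
from typing import Callable


@dataclass
class TeamMetrics:
    """Hypothetical per-team snapshot; field names are illustrative assumptions."""
    team: str
    weekly_users: int      # teammates who used generative AI this week
    headcount: int         # total teammates on the team
    hours_before: float    # typical workflow duration before AI, in hours
    hours_after: float     # typical workflow duration with AI, in hours

    @property
    def adoption_rate(self) -> float:
        # Share of the team using generative AI weekly.
        return self.weekly_users / self.headcount

    @property
    def duration_reduction(self) -> float:
        # Fractional reduction in workflow duration (0.25 means 25% faster).
        return 1 - self.hours_after / self.hours_before


def split_teams(
    teams: list[TeamMetrics], metric: Callable[[TeamMetrics], float]
) -> tuple[list[str], list[str]]:
    """Return (bright spots, teams needing support), split around the median.

    Teams exactly at the median land in neither list (an assumed convention).
    """
    cutoff = median(metric(t) for t in teams)
    bright = [t.team for t in teams if metric(t) > cutoff]
    support = [t.team for t in teams if metric(t) < cutoff]
    return bright, support
```

Passing the metric as a function keeps one comparison routine for every indicator, e.g. `split_teams(teams, lambda t: t.adoption_rate)` or `split_teams(teams, lambda t: t.duration_reduction)`.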
Measurement also helped us catch problems early. We built a custom tool to assist with performance reviews, and the efficiency metrics were impressive. We estimated the tool was generating a 20x return on investment. However, qualitative feedback from the pilot told a more complicated story. Teammates felt that AI-assisted feedback contained fewer concrete examples than in previous years, making it harder to know how to improve. Measurement helped us catch this quality slippage and pushed us to retrain the tool and keep measuring.
Like many organizations, we started our AI transformation by imagining we could automate parts of our work. We quickly learned that in judgment-heavy, context-dependent and relationship-driven work, full automation is uncommon. AI can’t build trust with a government partner or navigate the politics of a policy change. It also tends to propagate errors when asked to complete too many steps in a row. Instead, we focused on using it to help us augment our work.
Leaning into augmentation helped cement our vision for how to use AI. Each team member could operate more like a small team, with AI handling easy, single-step tasks, enabling them to concentrate more on judgment, human interactions, and driving multi-task projects forward.
An early concern teammates expressed was around quality. They appropriately feared that AI-assisted outputs could be poor quality, and if we started cutting and pasting outputs that hadn’t been properly reviewed, our clients’ trust in our work would evaporate and so would our impact. These fears are valid. Augmentation creates a deceptive risk that work can look good enough while lacking the substance and accuracy necessary to truly solve problems.
Doubling down on existing accountability standards helped. We reinforced the idea that AI does not change anyone’s professional responsibilities. We must always remain the final judge and decider, including maintaining responsibility for every word we write and figure we cite. We believe domain expertise augmented with AI can be powerful, but AI used without this sense of responsibility would be a liability. Once teammates were clear that we were still responsible and accountable for the content and quality of our outputs, concerns decreased, and experimentation increased.
We also learned to delegate tasks rather than full workflows. Handing AI a discrete task, like critiquing an outline, debugging a coding error, or brainstorming ideas, worked well. Handing it an entire workflow meant errors compounded across multiple steps, with no human judgment or expertise intervening to correct them along the way. Our best results have come from staying closely in the loop.
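A back-of-the-envelope model shows why compounding matters. The sketch below rests on strong simplifying assumptions: each step succeeds independently with the same accuracy, and a human reviewer catches a fixed share of errors at each step. The function names and the example numbers (95% per-step accuracy, 90% catch rate) are hypothetical, chosen only for illustration.

```python
def chain_success(step_accuracy: float, steps: int) -> float:
    """Chance that every step of an unreviewed chain is correct,
    assuming independent steps with identical accuracy."""
    return step_accuracy ** steps


def reviewed_chain_success(step_accuracy: float, steps: int, catch_rate: float) -> float:
    """Same chain, but a human reviewer catches and fixes an error at each
    step with probability catch_rate (an assumed, simplified review model)."""
    per_step = step_accuracy + (1 - step_accuracy) * catch_rate
    return per_step ** steps


# Hypothetical numbers: a 10-step workflow at 95% per-step accuracy.
print(round(chain_success(0.95, 10), 2))                # 0.6
print(round(reviewed_chain_success(0.95, 10, 0.9), 2))  # 0.95
```

Under these assumptions, a task that looks reliable on its own (95%) becomes a coin-flip-like 60% when chained ten times unreviewed, while light-touch review at each step keeps the whole chain near single-step reliability.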
This didn’t mean skipping workflow redesign. In fact, redesigning workflows led to our biggest gains. An engineering team collapsed a week-long mock-up process into two hours. A data collection team went from questionnaire design to field-test ready 16x faster. But humans remained heavily involved in these redesigned workflows, driving, managing, judging, and correcting at key points.
Today, 98% of our team uses generative AI weekly, 46% use it daily, and our teammates estimate they’ve gained 15% more productive capacity.
(sources: Real-Time Population Survey, Gallup)
We’re still learning. We’re continuing to optimize workflows, refine our approach to building bespoke solutions, and figure out how to disseminate and scale what works. We’ve also started advising a handful of organizations on their AI journeys, bringing what we’ve learned to teams facing similar challenges. If that kind of support would be useful, you can reach out to Rob at rob.sampson@idinsight.org.