
Evaluating AI for social impact: Reflections from UNGA 80

Lorreen Ajiambo · 30 September 2025

At this year’s UN General Assembly, IDinsight hosted a side event that brought together funders, researchers, implementers, and social sector leaders to address a pressing question: as artificial intelligence becomes more common in the sector, how can we evaluate it without slowing down innovation?

The conversation, held at Copinnette in New York City, unfolded against the backdrop of growing enthusiasm for AI’s potential to transform development programs and growing concerns about its risks. What emerged was a nuanced discussion that acknowledged the promise of AI while underscoring the sector’s responsibility to ensure that enthusiasm translates into measurable, equitable impact.

The event opened with remarks from Oliver Jones of the UK AI Security Institute and IDinsight CEO Rebecca Gong Sharp. Oliver situated the conversation within the global debates on AI governance, emphasizing that while policymakers are grappling with questions of safety and security, the development sector faces an equally urgent challenge: ensuring that AI adoption serves public good rather than private interest.

Becca extended this framing by highlighting IDinsight’s role in building bridges between innovation and evidence.

“We need to bring a dignity and inclusion lens to our AI evaluation work. Asking how people experience these tools is just as important as measuring reliability.” 

– Becca Gong Sharp

Insights from the panel

The heart of the event was a panel moderated by IDinsight’s Chief Data Scientist, Sid Ravinutala. The discussion featured perspectives from funders, practitioners, and evaluators who are directly engaged in shaping how AI is deployed in the social sector.

Han Sheng Chia, Director of the AI Initiative at the Center for Global Development (CGD), offered a perspective grounded in both policy and practice. He illustrated how evaluation methods must evolve alongside the technologies they assess, referencing the four-level AI evaluation framework that CGD co-developed with The Agency Fund and J-PAL. Just as programs iterate rapidly, evaluation frameworks must adapt flexibly to provide timely and actionable insights.

Temina Madon, co-founder of The Agency Fund, highlighted how many organizations default too quickly to familiar methods like RCTs before fully understanding how people experience their product. This is particularly limiting for AI, where tools evolve rapidly and a one-time impact evaluation may capture only a narrow snapshot. Instead, a product-centric mindset is needed, one that treats evaluation as an ongoing process of learning, iteration, and refinement. In this view, evaluation is not separate from development; it is the development process. She proposed the use of product cards to make AI evaluation more transparent and accessible, modeled on the "model cards" concept introduced in the tech sector. These would provide a short, standardized summary of what data a tool uses, how it has been tested, and what safeguards and intended uses are in place.
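To make the product-card idea concrete, one could picture it as a small structured record. The sketch below is purely illustrative: the field names, the `ProductCard` class, and the example tool are assumptions for this post, not a published schema.

```python
from dataclasses import dataclass

@dataclass
class ProductCard:
    """Illustrative 'product card' for an AI tool, loosely modeled on
    model cards. Field names here are assumptions, not a standard."""
    tool_name: str
    data_sources: list[str]   # what data the tool uses
    evaluations: list[str]    # how it has been tested
    safeguards: list[str]     # guardrails in place
    intended_uses: list[str]  # where the tool should (and should not) be used

# Hypothetical example card for a fictional tool
card = ProductCard(
    tool_name="Example health chatbot",
    data_sources=["De-identified FAQ logs"],
    evaluations=["Answer-accuracy review by clinicians", "Pilot A/B test"],
    safeguards=["Escalation to a human for medical emergencies"],
    intended_uses=["Triage of routine health questions"],
)
print(card.tool_name)
```

The value of such a record is less in the code than in the convention: a short, standardized summary that funders and implementers can read in minutes.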

Crystal Haijing Huang, Senior Economist at IDinsight, built on this point by sharing concrete examples from ongoing partnerships. Through IDinsight’s work with IDRC and FCDO’s AI for Development initiative, as well as collaborations with Google.org, she described how mixed-methods evaluations are helping to assess AI interventions in health and behavioral science. Importantly, she highlighted that evaluations must look beyond intended outcomes to capture unintended consequences. “AI can widen inequalities just as easily as it can reduce them,” she noted. “Evaluation gives us the visibility to course-correct.”

Finally, Karla Palmer, who manages AI for Social Good and Science at Google.org, spoke from the perspective of a funder shaping the ecosystem. She described how Google.org’s funding approach to generative AI has been deliberately coupled with investments in evaluation capacity. Through initiatives like the Google.org Accelerator and a collaboration with J-PAL, Karla emphasized that philanthropy has a responsibility to align incentives so that grantees are not only developing innovative AI tools but also testing their real-world effectiveness. In her words, “funders cannot simply reward shiny pilots. We must reward evidence of meaningful, lasting impact.”

Emerging themes

Several themes emerged over the course of the discussion. The first was the need to rethink traditional evaluation methods. While randomized controlled trials and large-scale evaluations remain important, the pace of AI innovation demands that more adaptive methods be deployed in parallel. Panelists pointed to rapid experimentation, proxy outcomes, and tools such as user funnel analysis, which evaluates retention at each stage of adoption, as ways to provide timely feedback without sacrificing rigor.

A second theme was the idea of shared responsibility. Evaluating AI cannot fall solely on implementers or evaluators. Funders, too, must recognize that their choices about what to finance can either encourage or discourage evidence-driven adoption. Similarly, implementers need to view evaluation not as an external imposition but as a tool to refine and strengthen their own work.

The third theme was the role of evidence in building trust. In a field where skepticism about AI runs high, and with good reason, transparent evaluation is the best way to separate hype from genuine impact. Evidence allows social sector leaders to move beyond fear and speculation, grounding decisions in data that reflect the realities of the communities they aim to serve.

Taken together, the panelists painted a picture of a sector at an inflection point. AI’s potential is undeniable, but so are the risks of unchecked adoption. The conversation underscored that evaluation is central to ensuring that AI applications contribute to equity and effectiveness in the social sector.

Why this conversation matters

The UNGA 80 side event underscored a critical lesson for the development community: AI can help optimize resources, target beneficiaries more effectively, and expand the reach of proven interventions, but only evidence can show when it actually does.

Governments, NGOs, and funders should adopt AI where it can add value, but must insist on evidence to guide its use. Stakeholders need to build evaluation into the DNA of every AI intervention, and share lessons openly so that successes and failures alike can inform the wider field. As a sector, we need to develop standardized approaches that can determine the impact of AI interventions at different stages of product development.

In this era of rapid AI-driven transformation, IDinsight is committed to making evidence the foundation for responsible, cost-effective, and lasting social impact. As AI becomes an increasingly prominent tool in the social sector, generating and sharing evidence will ensure that technology serves those who need it most.