
Generating meaningful insights when your RCT design is upended

The IDinsight team debriefing with the school principal of one of the Rising Academy Network-supported schools during the FasterReading endline data collection in Montserrado County, Liberia.

You’ve carefully assigned schools to treatment and control groups for an intervention, conducted baseline data collection, and are starting to lovingly clean (really just polish) your data set. But before you can share your informative impact estimates, you learn from field reports that the painstakingly planned research design hasn’t been followed. Are all your efforts (and the research funds) wasted? Or can you still learn something useful about the impact of the intervention? In this blog, we share how we dealt with just such a situation in our RCT of an accelerated reading program in Liberia.

What was the situation?

We partnered with Rising Academy Network (RAN) to assess the impact of an accelerated, phonics-based reading program, FasterReading, on foundational literacy skills in early childhood education (ECE) classrooms. We set out to conduct a randomised controlled trial (RCT) in 74 of 95 RAN-supported government primary schools across 10 Liberian counties. RAN rolled out FasterReading in 37 treatment schools from January to July 2022 while 37 other schools received regular ECE programming. You can read the full endline report here for more details on the evaluation. 

Over the course of program implementation, RAN staff discovered that some ECE classrooms were unclear about whether they were in the treatment or control group, and many thus ended up deviating from their assignment. In brief, RAN didn’t want to withhold the program from grades that weren’t being studied, so they implemented the FasterReading program in grades 3-6 in both treatment and control schools. In some control schools, ECE teachers and school administrators were consequently confused about whether the ECE classes were supposed to be implementing it or not. In others, teachers were aware of their treatment assignment, yet elected to use FasterReading materials in ECE classrooms anyway, since they felt those materials were higher quality than existing teaching materials.

Ultimately, this led to significant exposure of ECE students to the program or to program materials in certain control schools, and incomplete implementation in certain treatment schools, undermining a clean treatment versus control comparison. The table below summarises non-compliance across treatment and control groups in our evaluation.

How did we address non-compliance?

Despite more than a third of schools not following their treatment assignment, all was not lost. We huddled with the implementer to discuss how we could most appropriately model non-compliance, and how to factor that into our analysis. 

Our preferred approach to addressing non-compliance was to measure treatment as treatment intensity. We defined treatment intensity as the percentage of the program a school received. For instance, if a school implemented three of the five FasterReading cycles, we coded it as receiving 60 percent of the program. We then conducted an instrumental variables regression, where treatment intensity was instrumented with treatment assignment.1
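To make the mechanics concrete, below is a minimal sketch of this estimator in Python on simulated data. The school counts mirror our evaluation, but everything else (the cycle counts, the 0.25 standard deviation ‘true’ effect, the noise level) is invented for illustration, and the sketch computes two-stage least squares by hand rather than reproducing our actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 74                                  # schools, mirroring the evaluation
z = np.repeat([1, 0], n // 2)           # random assignment: 37 treatment, 37 control

# Hypothetical non-compliance: treatment schools complete 3-5 of the 5
# cycles, while some control schools run 0-2 cycles out of confusion.
cycles = np.where(z == 1, rng.integers(3, 6, size=n), rng.integers(0, 3, size=n))
intensity = cycles / 5                  # share of the program received

# Assumption (i): effects scale linearly with the dose received.
true_full_effect = 0.25                 # invented effect of the full program, in SDs
y = true_full_effect * intensity + rng.normal(0, 0.2, size=n)

# Two-stage least squares by hand, instrumenting intensity with assignment.
# First stage: predict intensity from assignment.
X1 = np.column_stack([np.ones(n), z])
b1, *_ = np.linalg.lstsq(X1, intensity, rcond=None)
intensity_hat = X1 @ b1

# Second stage: regress outcomes on predicted intensity. The slope estimates
# the effect of receiving 100 percent of the program versus none of it.
# (A real analysis would use proper 2SLS standard errors, not naive OLS ones.)
X2 = np.column_stack([np.ones(n), intensity_hat])
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)
print(f"IV estimate of the full-program effect: {b2[1]:.2f}")
```

Because assignment only shifts how much of the program a school receives, the second-stage slope scales the intention-to-treat effect up to a full-program effect, which is what makes the linearity assumption below so important.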

Our treatment intensity estimator relied on two assumptions: (i) treatment effects scale linearly with the number of weeks of implementation; (ii) the number of weeks of implementation in control schools was as good as random and not related to school or student characteristics. For this evaluation, we believed these to be reasonable assumptions. Students likely benefited from more weeks of instruction in the FasterReading program. Moreover, after speaking with RAN teachers and staff, it appeared that most control schools implemented cycles of the FasterReading program in ECE classrooms because of confusion surrounding treatment assignment, rather than anything related to school or student characteristics that affect reading scores. If this assumption did not hold, we might not be able to distinguish between the impacts of the program and these other factors.

To address this risk, we could estimate treatment-on-the-treated (ToT) bounds, which rely on fewer assumptions than the treatment intensity estimator but provide much less precise estimates of impact. Given this limitation of ToT bounds, and since we thought the assumptions underlying the treatment intensity estimator were reasonable, we included the ToT bounds as a robustness check in the appendix, and kept the treatment intensity estimates as our ‘best guess’ of the impact of the program for students who completed all five cycles.
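For readers curious what such bounds can look like, here is one minimal sketch, loosely following the Gerber & Green chapter cited in the footnote: re-code the continuous dose as a binary ‘treated’ indicator under two extreme conventions (any exposure counts as treated, versus only full implementation counts) and compute the standard ToT (Wald) estimator under each. The data are the same invented numbers as in the earlier sketch; the report’s appendix, not this sketch, describes the exact bounding procedure we used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 74
z = np.repeat([1, 0], n // 2)
cycles = np.where(z == 1, rng.integers(3, 6, size=n), rng.integers(0, 3, size=n))
y = 0.25 * (cycles / 5) + rng.normal(0, 0.2, size=n)   # same invented data as above

def tot(y, z, d):
    """ITT on outcomes divided by ITT on a binary 'treated' indicator (Wald)."""
    itt_y = y[z == 1].mean() - y[z == 0].mean()
    itt_d = d[z == 1].mean() - d[z == 0].mean()
    return itt_y / itt_d

# Coding A: any exposure to FasterReading counts as treated.
est_any = tot(y, z, (cycles > 0).astype(float))
# Coding B: only full implementation (all five cycles) counts as treated.
est_full = tot(y, z, (cycles == 5).astype(float))
print(f"ToT estimate, any-exposure coding:  {est_any:.2f}")
print(f"ToT estimate, full-exposure coding: {est_full:.2f}")
```

The gap between the two estimates conveys how sensitive a binary-compliance analysis is to where the ‘treated’ line is drawn, which is exactly the ambiguity the treatment intensity estimator avoids.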

Researchers frequently face situations like these, where carefully designed research plans are upended by the realities of implementing a new program in a complicated world. Rather than applying the more conventional, and more conservative, corrections, it is often useful to model the complexity to try to generate more meaningful insights.

  1. This approach follows Gerber & Green, Chapter 5.10 – Estimating Treatment Effects When Some Subjects Receive “Partial Treatment”.