
(Part 2) Did you wash your hands?

Jeff McManus 12 November 2019

Part 2: Measuring socially desirable behaviours when you can’t observe them

Two weeks ago I discussed the results from IDinsight’s RCT of a handwashing intervention in the Philippines (working paper), and showed how estimates of handwashing rates and program effects differ when using observational versus student-reported data. This week I describe other survey techniques we experimented with to measure handwashing when we could not directly observe handwashing opportunities.

Handwashing facility in Camarines Norte province, Philippines. Credit: Cedric Carrido

Two weeks ago I described how we designed and conducted a randomized controlled trial of the “HiFive for Hygiene and Sanitation program”, a six-week behaviour change campaign to promote handwashing among primary school children in the Philippines. The program sought to increase student handwashing rates at two especially critical times: after using the toilet and before eating. While we were able to discreetly observe whether students washed their hands after they exited the bathroom, we were not able to observe handwashing before eating, since lunch happened at different times and places, often at students’ homes. Instead, we relied on student surveys (4,295 students across 196 treatment and control schools) to measure handwashing before eating.

Recognizing that students may be prone to over-report handwashing — particularly in treatment schools that received the behaviour change campaign — we attempted to measure handwashing rates before eating using three different survey techniques. Our hope was that these techniques would help us to counter measurement bias and provide three estimates of treatment effects that we could use to triangulate the program’s impact.

First, we asked students to recount in detail everything that they did before lunch and coded whether they mentioned handwashing: “I would like you to tell me what you did during today’s lunch break. What time did your morning class end today? And when did you eat? What did you do in between leaving class and eating? Tell me even the small things, like if you ran into a friend, where you ate your lunch, if you ate with anyone, or if you used the CR [comfort room].” By not prompting students about handwashing specifically, this ‘script recall’ technique could, in theory, reduce social desirability bias.
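
As a rough illustration of what coding these open-ended responses looks like, here is a minimal Python sketch. The keyword pattern and example transcripts are made up for this post; they are not our actual coding scheme.

```python
import re

# Hypothetical pattern and transcripts, for illustration only; the study's
# actual coding scheme is not reproduced here.
HANDWASH = re.compile(r"wash\w*\W+(?:\w+\W+){0,3}hands?\b", re.IGNORECASE)

def mentions_handwashing(transcript: str) -> bool:
    """Flag open-ended recall transcripts that mention washing hands."""
    return bool(HANDWASH.search(transcript))

responses = [
    "I left class, played tag with a friend, and ate lunch at home.",
    "I used the CR, washed my hands, then ate with my sister.",
]
print([mentions_handwashing(r) for r in responses])  # [False, True]
```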

Second, we asked students to respond to a series of yes/no questions and randomly assigned students to receive a 5-question list that included handwashing before eating or an otherwise identical 4-question list that omitted the handwashing question. The key was that we asked students to report the total number of questions that they would respond ‘yes’ to, but not which ones. This ‘list randomization’ technique effectively anonymized student responses to the handwashing question, which in theory should eliminate social desirability bias (provided students understood the instructions, could aggregate responses to the questions in their heads and recognized that this technique anonymized their response to the sensitive question…).
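
For readers who haven’t seen a list experiment before, here is a sketch of the standard difference-in-means estimator with simulated data standing in for real responses (the item probabilities are made up): the prevalence of the sensitive behaviour is estimated as the difference in average ‘yes’ counts between the 5-question and 4-question groups.

```python
import numpy as np

def list_experiment_estimate(long_counts, short_counts):
    """Difference-in-means estimator for a list experiment.

    long_counts:  reported 'yes' totals from respondents who got the
                  5-item list (includes the sensitive handwashing item).
    short_counts: reported 'yes' totals from the 4-item list group.
    Returns the estimated prevalence of the sensitive behaviour and its
    standard error.
    """
    long_counts = np.asarray(long_counts, dtype=float)
    short_counts = np.asarray(short_counts, dtype=float)
    estimate = long_counts.mean() - short_counts.mean()
    se = np.sqrt(long_counts.var(ddof=1) / len(long_counts)
                 + short_counts.var(ddof=1) / len(short_counts))
    return estimate, se

# Simulated data: 4 control items answered 'yes' ~60% of the time, plus a
# sensitive item with true prevalence 70% for the long-list group.
rng = np.random.default_rng(0)
long_counts = rng.binomial(4, 0.6, 500) + rng.binomial(1, 0.7, 500)
short_counts = rng.binomial(4, 0.6, 500)
est, se = list_experiment_estimate(long_counts, short_counts)
print(f"estimated prevalence: {est:.2f} (SE {se:.2f})")
```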

Third, we directly asked students whether they washed their hands before eating. We asked the questions in this order (script recall, then list randomization, then the direct question) to minimize the effect that prompting students about handwashing could have on their answers to later questions. Here is how student responses to each question compared, depending on whether students were in a control school (“C”) or a treatment school (“T”):

[Chart: reported rates of handwashing before eating under each survey technique, control vs. treatment schools]

What do we make of these results? While we don’t have ‘true’ observation data for handwashing before eating, I’m highly sceptical of the direct response data for three reasons: it is very similar to self-reported rates of handwashing after using the toilet; there is a strong positive correlation between whether a student said that they washed their hands after using the toilet and before eating (corr = +0.27, p < 0.01); and the self-reported responses on handwashing after toilet use were of dubious quality (as described in the previous post). Script recall reduced the fraction of children who reported washing their hands, though it is unclear whether the result was still an overestimate (if students felt primed to mention handwashing to the enumerator) or an underestimate (if students simply forgot to mention that they washed their hands). The list randomization results were disappointing in two ways. First, children did not seem to realize that their responses were effectively anonymized by the technique, so they implied handwashing rates similar to the self-reported data. Second, many children seemed confused (despite a practice round) and gave wildly inconsistent counts, hence the giant standard errors.
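
To see why confusion blows up the standard errors, here is a quick simulation (all parameters are made up for illustration): as the share of students answering at random grows, the spread of the list estimator widens even though the sample size stays fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

def list_se(confused_share, n=500, true_rate=0.5, reps=2000):
    """Empirical SD of the list estimator when some respondents guess.

    Assumes (for illustration) 4 control items answered 'yes' 60% of the
    time and a sensitive item with prevalence true_rate.
    """
    ests = []
    for _ in range(reps):
        n_conf = int(n * confused_share)
        n_ok = n - n_conf
        # Attentive students report accurate totals.
        long_ok = rng.binomial(4, 0.6, n_ok) + rng.binomial(1, true_rate, n_ok)
        short_ok = rng.binomial(4, 0.6, n_ok)
        # Confused students pick a total uniformly at random.
        long_conf = rng.integers(0, 6, n_conf)   # totals 0..5
        short_conf = rng.integers(0, 5, n_conf)  # totals 0..4
        est = (np.concatenate([long_ok, long_conf]).mean()
               - np.concatenate([short_ok, short_conf]).mean())
        ests.append(est)
    return float(np.std(ests))

for share in (0.0, 0.2, 0.5):
    print(f"{share:.0%} confused -> SE of estimate ~ {list_se(share):.3f}")
```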

So did the HiFive program improve handwashing rates before eating? Based on these results, it’s really hard to say whether student behaviour truly changed or whether the program only influenced how students reported their behaviour. One could argue that the treatment-control difference in handwashing rates before eating according to the direct response data (+4.3pp, p = 0.03) was similar to the difference in handwashing rates after toilet use according to the direct response data (+4.5pp, p = 0.05), which in turn was similar to the difference in handwashing rates after toilet use according to observation (+3.7pp, p < 0.01), and that we might therefore expect a similar treatment effect on handwashing before eating if we could have observed it. But this argument strays pretty far from statistical analysis into the land of conjecture.
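
For concreteness, here is how a difference like +4.3pp could be tested with a simple two-proportion z-test. The counts below are hypothetical (the real cell sizes are in the working paper), and a naive student-level test like this ignores that randomization happened at the school level, so it would overstate precision relative to clustered estimates.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts, chosen only to make the arithmetic concrete.
washed = [1105, 1010]   # students reporting handwashing: [treatment, control]
totals = [2150, 2145]   # students surveyed in each arm

diff_pp = (washed[0] / totals[0] - washed[1] / totals[1]) * 100
stat, pvalue = proportions_ztest(washed, totals)
print(f"difference: {diff_pp:+.1f}pp, z = {stat:.2f}, p = {pvalue:.4f}")
# NB: clustering standard errors by school, as a school-randomized trial
# requires, would widen this confidence interval.
```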

In summary, here’s what we learned about measuring student handwashing rates from the HiFive evaluation:

1. Self-reported data for socially desirable behaviours may be wildly inconsistent with reality. Try to observe the behaviour, if possible.

2. Treatment effects estimated from self-reported data are also not reliable, especially if the intervention could have increased salience or social desirability effects without changing behaviour.

3. Other survey techniques — like script recall and list randomization — sound cool in theory but may not be appropriate for young respondents. Pilot these before using, and be ready with a backup plan.