Why we’ve launched a public data repository

Marc Shotland 16 February 2023

IDinsight's Ewoud Nijhof and Felipe Acero Garay at a school in Bomi County, Liberia ©IDinsight

IDinsight has launched a public repository to populate with data from our past projects.

But why?

A little over a year ago, Heather Lanthorn and I wrote a blog post about our new research ethics policy and internal ethics review committee. Research ethics has a long history, with full disciplines dedicated to its study. The tension we were facing was how IDinsight should approach research ethics, when most of the standards and norms did not clearly apply to organizations like us—organizations that produce demand-driven evidence. But slipping through the cracks was not a status quo we found acceptable.

We face similar dilemmas when approaching research transparency.

Research transparency in the social sciences, while growing in prominence (see BITSS and COS), is still relatively nascent. Yet its target audience is academic researchers, not organizations like IDinsight. As with research ethics, slipping through the cracks was not acceptable. One of the most important practices in research transparency is publishing research data along with the analysis used to produce estimates and draw conclusions (or in our case, make recommendations).

What are the benefits of publishing data?

Data publication serves as a quality-control mechanism. It helps us check that research conclusions can stand up to public scrutiny. With access to the data and analysis files, others can review the underlying assumptions and methods, potentially catching errors, or at least gaining a better understanding of the limitations of the research.

Beyond quality, data publication can add to the scientific body of knowledge. Researchers and practitioners around the world can use the data to answer other research questions, build upon it, or combine it with other data to generate new insights.

There are also equity and ethical reasons to publish data. Much of the data collected is sponsored by well-resourced organizations from the “global north.” Publishing data gives researchers who do not have access to the same resources valuable research inputs. This becomes an ethical issue when the data come from populations in the “global south”: if members of those populations are denied access, the whole exercise can be extractive.

Still some questions for (and answers from) IDinsight

There are still reasons why IDinsight might not put so much effort into publishing data:

Publishing data is not costless. The time and money we invest in it are taken away from producing new evidence to inform critical decisions that improve lives. But academics could make the same argument against publishing their own data. We, along with most in the academic community, believe that the benefits of accountability alone make the cost well worth it.

It is not IDinsight’s primary mission to add to the scientific body of knowledge. In most cases, we prioritize generating non-public information for policymakers to use in decision-making, which distinguishes us from academics and think tanks. However, if we can generate public goods in the process, why not?

One reason why not is that we could risk distorting the scientific body of knowledge by only publishing data from projects that make our clients look good, and keeping the less-flattering projects out of public view. This is similar to the phenomenon of “publication bias,” where only successful programs are deemed interesting enough to publish, while program failures are relegated to the file drawer, leading to a skewed sense that more things work than actually do. To mitigate this, we have a transparency policy that asks us and our clients to pre-commit to publishing the data (and results in general) regardless of what the results show. Along with publishing data, we also preregister impact evaluations with pre-analysis plans to avoid another source of distortion: “p-hacking” (the practice of tweaking the analysis and choice of outcomes to produce “statistically significant” results).

What next?

Yes, we have launched a public data repository, but we still have far to go in populating it. So far, we have data from four projects. We hope to post data from all projects where data publication is allowed (prioritizing those where we’ve published a paper in an academic journal). As we do, we encourage all of you to use this new data repository and help us to contribute to the advancement of knowledge (and solutions!) in alleviating poverty and fighting injustice.