Is it time to start evaluating social policy using randomised trials?
Guest blogger Tom Yates calls for more randomised trials to evaluate social policy.
In February, I went to see Rachel Glennerster speak in London at a meeting put on by organisations including the Alliance for Useful Evidence. I was impressed.
Glennerster is the director of the Abdul Latif Jameel Poverty Action Lab (J-PAL) at MIT – the outfit behind the hit book Poor Economics. J-PAL is at the forefront of a movement promoting the wider use of randomised evaluations, akin to those used in medicine, in the development of social policy. Esther Duflo’s TED talk gives a good idea of what they are about (I note the evidence regarding the educational benefits of deworming has recently been challenged).
As Duncan Green of Oxfam puts it, “[J-PAL] are unashamedly, indeed belligerently, micro, small-is-beautiful technocrats…they are incrementalist and disagree with the structural focus of the ‘political economy’ approach or sweeping calls for revolution – they argue that positive changes can be achieved with less pain, often on a massive scale, even in hostile political environments, simply by minor tweaks to policies and institutions…And they aren’t scared to follow the logic of their own arguments, even when it takes them in some surprising directions…it’s hard to pigeon-hole the work as particularly right or left wing.”
The strongest arguments for this approach are the often counter-intuitive conclusions reached and the savings Governments might make by not investing in useless interventions.
Glennerster described a study demonstrating the dramatic impact of ‘commitment devices’ on fertiliser use in rural Kenya. A small but time-limited discount on fertiliser offered at the time of harvest, when money was relatively plentiful, resulted in much higher uptake than a larger discount offered at other times in the year. “Present bias” (procrastination or prioritising today over tomorrow) could be overcome by forcing a decision at the point where it would least affect people’s immediate quality of life. Glennerster argued present bias was a phenomenon observed in other contexts. In Indian slums, she said, people have been observed to pay others to look after their money for them, aware they might fritter it away on cups of tea or other sources of immediate pleasure. Glennerster described how, in the Philippines, commitment devices have been demonstrated to increase savings and reduce cigarette consumption.
One of Glennerster’s main arguments was that flexibility in study design – particularly choosing the best and least disruptive method of randomisation – allowed randomised evaluation of interventions in a broad range of circumstances. She outlined a range of options:
“Randomise phase-in” (in epidemiology we call this a ‘stepped wedge’ design): if an intervention is being adopted but there is not capacity to roll it out everywhere at the same time, you can randomise the order in which it is introduced by clinic, school, region, etc. (a sketch of such an assignment follows this list). In medicine, this design was first used to evaluate the impact of Hepatitis B vaccination on chronic infection and the risk of liver cancer in later life. Glennerster has used it to demonstrate that microcredit wasn’t as good as observational data had suggested.
“Randomisation at the cut-off”: sometimes there are groups of people who definitely need an intervention (e.g. the poor) and groups who definitely don’t (e.g. the rich). Usually there is also a group of people in whom the intervention is of unknown benefit. It is possible to learn about the intervention by randomising the threshold at which it is offered. Whilst this method only yields information about borderline cases, that is useful information. An iterative process can then be followed to test whether benefit can be demonstrated at the next threshold.
“Randomise intensity of intervention”: For example, you could randomise areas to receive 5 vs 10 evening class teachers. This allows assessment of local equilibrium effects – for example, whilst education might benefit individuals, you may be concerned that a larger pool of educated workers could drive down wages for skilled labour.
“Encouragement design”: If something has already been widely implemented but uptake is patchy, you may be able to assess its effect by randomising people to receive encouragement to access the intervention or no encouragement. Given encouragement may not be particularly effective and given many people in the control arm may access the intervention anyway, these studies may need to be big to demonstrate an effect.
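To make the first of these designs concrete, here is a minimal sketch of how clusters might be randomly assigned to roll-out waves in a randomised phase-in (stepped wedge) evaluation. It is written in Python; the clinic names, the number of waves and the use of a simple shuffle are illustrative assumptions on my part, not details from the talk.

```python
import random

# Hypothetical example: assign clinics at random to roll-out waves for a
# randomised phase-in ("stepped wedge") evaluation. Clinic names and the
# number of waves are invented for illustration only.
clinics = ["Clinic A", "Clinic B", "Clinic C", "Clinic D",
           "Clinic E", "Clinic F", "Clinic G", "Clinic H"]
n_waves = 4  # the intervention is rolled out over four successive periods

random.seed(42)          # fix the seed so the assignment is reproducible
random.shuffle(clinics)  # the randomisation step: shuffle the roll-out order

# Split the shuffled list into equal-sized waves; every clinic eventually
# receives the intervention, only the start period differs.
wave_size = len(clinics) // n_waves
schedule = {wave + 1: clinics[wave * wave_size:(wave + 1) * wave_size]
            for wave in range(n_waves)}

for wave, assigned in schedule.items():
    print(f"Wave {wave}: {', '.join(assigned)}")

# Until the intervention reaches a given wave, its clinics serve as
# concurrent controls for clinics in earlier waves, which is what allows
# the design to estimate an effect despite universal roll-out.
```

The point the sketch illustrates is that what gets randomised is the order of roll-out, not whether anyone receives the intervention at all, so clinics still waiting for the intervention act as controls for those that already have it.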
An audience member wondered whether, in a time of cuts and austerity, information could be gained from examining the effect of the withdrawal of services.
The audience was mostly made up of British civil servants who were interested in the politics of randomised evaluation – how they could persuade their minister to adopt such an approach. There are obvious stumbling blocks, including the need in many fields for long term data to assess impact. Glennerster was realistic that politicians may not be willing to put their flagship policies to the test but thought politicians could often be persuaded to test details of their policies or to test the best means of implementing them. It works if they are allowed to set the questions!
Glennerster pointed out that the evaluation of PROGRESA, a conditional cash transfer programme in Mexico and perhaps the most famous example of randomised evaluation in social policy, was instigated by a Government who knew they were going to lose the next election. It was a way to safeguard their programme. They knew the next Government would find it hard to stop the trial once it was started and were confident the evaluation would show benefit, again making it hard for the next Government to drop the programme. Randomisation can be politically advantageous.
Many see evaluation as a global public good and feel funding for evaluation shouldn’t have to come out of programme budgets. A number of organisations, including 3ie, the World Bank and the French Government, have created central pots of money that those implementing programmes can apply to should they wish to evaluate what they are doing. These funds should, over time, prove cost-saving.
I disagreed with only one point made at the talk. An audience member argued that, in global health, the move from funding health systems to funding disease-specific programmes represented progress towards evidence-based policy making. There is a danger here that we systematically prioritise interventions whose benefits are easy to measure or whose harms are difficult to measure. We shouldn’t go about evaluations thoughtlessly, and we must compare like with like. However, this is not an argument against randomisation or rigorous evaluation, sensibly conducted.
Much of the most exciting work in this field is happening in low income countries – particularly at the interface between health and microeconomics. See, for example, this study looking at the impact on HIV transmission of cash transfers to young women in Malawi – a massively exciting result.
Over recent years, policy making in the UK has occurred in something of an evidence vacuum – recent health service reforms being a case in point. Audience members pointed out, however, that, even here, there remain islands of reason. These include the Education Endowment Foundation and the interesting work of the Behavioural Insights Team at the Cabinet Office.
So hope remains. Long live sensible policy making! Long live the randomisation revolution!
Tom Yates is a junior doctor with an interest in infectious disease epidemiology and TB control. He is currently training in epidemiology and divides his time between London and KwaZulu-Natal, South Africa. Tom blogs on epidemiology and population health at sickpopulations.wordpress.com and tweets @tomayates.