I analysed 600 of DFID’s Annual Reviews. Here’s what I found.


  • DFID annually reviews the performance of every programme they fund, and publishes this information online.
  • We read the annual reviews of 600 randomly chosen programmes, looking for patterns in the scores awarded by the reviewers.
  • We found relatively little variation in the data; 64% of programmes got an average score (A, “meeting expectations”), with less than 4% receiving the lowest or highest scores (2% received C, 2% received A++).
  • Programmes are scored both during and after implementation. During implementation, programmes are more likely to receive an average (A) grade and less likely to receive a high or low one. Only 2% of annual reviews award a ‘C’, but 8% of post completion reviews do. I suspect annual reviewers favour average grades in order to avoid the potential negative consequences of extreme grades. This represents a missed opportunity to highlight underperformance during implementation, when it is still possible to improve.
  • There is substantial grade inflation over time. This might be because programmes are getting better at setting expectations that they can meet. This casts doubt on the validity of the annual review process; if the current trend continues, by 2018 95% of programmes will receive an A or higher.
  • This blog is the second in a series examining DFID’s annual reviews. For the first blog, examining the weird grading system that DFID uses, click here. Future blogs will suggest ways in which grading can be improved.

Full blog:

DFID annually reviews the performance of every programme they fund, in order to challenge underperformance, suggest improvements and learn from both successful and unsuccessful programmes. The results are published online, demonstrating an admirable commitment to transparency and scrutiny, and supplying fantastic data on the effectiveness of DFID’s programmes. To my knowledge, however, annual reviews have not been externally researched. This is for a simple reason: annual reviews are lengthy Word documents, with no easy way to download and analyse them. I estimate that there are at least twenty million words of text available online, growing by around five million a year.

Fortunately, I have quite a lot of spare time and some extremely tolerant friends, so we decided to read annual reviews for a randomly selected 600 out of the 4,000 projects available online, and note down the scores given for each annual review and post completion review. With the help of the amazing IATI data, we compiled a spreadsheet listing the vital details of all projects such as spend, sector and country, alongside the grades awarded by DFID for the achievement of outputs and outcomes. This blog presents some of the findings.
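The sampling step itself is simple to reproduce. Here is a minimal sketch; the integer project IDs are stand-ins, since the real identifiers would come from the IATI data:

```python
# Hypothetical sketch of the sampling step: draw 600 of the ~4,000
# projects at random, without replacement. The integer IDs are
# stand-ins for the real IATI project identifiers.
import random

random.seed(42)                       # fix the seed so the draw is reproducible
all_projects = list(range(4000))      # stand-in for the 4,000 online projects
sample = random.sample(all_projects, 600)

print(len(sample), len(set(sample)))  # prints "600 600": 600 distinct projects
```

Because `random.sample` draws without replacement, no project is read twice.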

What are annual reviews and post completion reviews?

To understand this exercise – and the limitations – you need to understand how DFID’s reviews work. Each DFID programme has a results framework, which sets annual performance milestones against each indicator.[1] In an annual review, the programme is scored according to whether it met the annual output milestones or not. The programme is awarded A++ if it ‘substantially’ exceeded expectations, A+ if it ‘moderately’ exceeded expectations, A if it met expectations, and B or C for not meeting expectations. Receiving a B or a C is a big deal; a programme that gets two Bs in a row or a single C can be put on a performance improvement plan, and might be shut down. Annual reviews happen (as the name suggests) every year. Post completion reviews are conducted at the end of the programme, and award a score, using the same weirdly inflated grading system, for achievement of both outputs and outcomes.

The remainder of the blog presents some of the findings, alongside my attempts to explain them. Please note that I analysed just the grades – leaving the other twenty million words of the reviews untouched. More in-depth research might be able to validate some of my theories and suggestions, and would be a promising future research project.

There is little variation in the scores awarded for annual reviews

Fully 64% of the annual reviews in my dataset received an A grade, indicating that the project is meeting expectations. Only 2% received an A++, and only 2% received a C. (See the table below).

AR Score   # Projects   % of projects[2]
A++            10             2%
A+            104            17%
A             397            64%
B              92            15%
C              15             2%

This makes it harder to conduct analysis into the factors affecting the scores given, and so was a bit of a disappointment for me. It is not, however, an issue for DFID. There is no objective, correct percentage of programmes that should be exceeding or failing to meet expectations. DFID could reasonably argue that, since a ‘C’ is effectively a mark of failure, you wouldn’t want more than 2% of programmes in the portfolio to receive it. Programmes which get a ‘C’ may anyway be shut down, so there’s a natural bias against having many Cs in the portfolio, given the effort that goes into launching a programme in the first place.
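The percentage column follows directly from the raw counts. (The counts sum to 618 rather than 600, presumably because some programmes have had more than one annual review; footnote [2] notes the missing data.) A quick check:

```python
# Recompute the percentage column of the table from the raw counts.
counts = {"A++": 10, "A+": 104, "A": 397, "B": 92, "C": 15}
total = sum(counts.values())                               # 618 scored reviews
shares = {grade: round(100 * n / total) for grade, n in counts.items()}
print(shares)   # {'A++': 2, 'A+': 17, 'A': 64, 'B': 15, 'C': 2}
```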

Post completion reviews show more variation in grades than annual reviews, and a lot more negative scores

Post completion reviews and annual reviews both rate the programme on the outputs achieved. It turns out that post completion reviews have a lot more variation. They are less likely than annual reviews to give an A, but more likely to give every single other grade. (See figure 1 below).

[Figure 1]

In particular, post completion reviews are much more likely than annual reviews to award a ‘C’ grade. Overall, 15 annual reviews (2% of the total) award a C grade, of which 13 are in the first year of project implementation. By contrast, 8% of post-completion reviews award a ‘C’ for the outputs achieved.[3] (See figure 2 below).

[Figure 2]

There are a number of possible reasons for this. One potential reason is that programmes really do worse in their final year of implementation, perhaps because problems become harder to hide, or staff leave for new jobs and it becomes difficult to recruit new ones. This seems unlikely, however, as the annual review data suggests that programmes actually get better at hitting output targets as programme implementation continues. (See next section).

Consequently, this seems to reflect a flaw in the review process: DFID’s ongoing monitoring is more positive than the review at the end of the programme. It may be that more end-of-programme reviews are done by external reviewers, who perhaps take a more negative view of programmes’ achievements. Unfortunately, I don’t have data on who conducted these reviews.

I suspect the lack of variation in the scoring is also due to risk aversion on the part of DFID’s annual reviewers. In particular, a ‘C’ rating has serious consequences, and can lead to a programme being shut down. This can make DFID staff look bad, creates extra work for everyone involved, and leads to difficulty in spending allocated budgets. A post completion review does not have these consequences, as the programme has already finished and can’t be shut down. By contrast, an ‘A’ is an extremely safe grade to award: it can acknowledge reservations without any serious impact on the project. This could lead to reviewers giving more ‘A’ grades than they should.

There is grade inflation over time

Since the first set of annual reviews in 2012, the percentage of ‘A’ grades has steadily increased (from 56% in 2012 to 68% in 2015) and the percentage of ‘B’s has decreased (from 21% in 2012 to 10% in 2015). This is shown in figure 3 below. The same trend is apparent in post completion reviews, where the percentage of ‘B’s awarded has plummeted from 30% in 2012 to 7% in 2015.

[Figure 3]

The same trend is apparent if you look at the scores awarded to a single programme, separated out by the first, second and third annual review. Between the first and the third annual reviews the percentage of A+s and A++s increases, while the share of Bs and Cs falls. (Shown in figure 4, below). The share of As also increases, from 57% in the first year to 64% in the third, but I haven’t included it in this graph as it distorts the scale.

[Figure 4]


An optimist would conclude that programmes are getting better – but I don’t know of any other evidence which suggests that there has been a dramatic change, either positive or negative, in programme quality between 2012 and 2016. It could also be that the worst performing programmes are getting shut down, which would lead to an overall positive trend in the data.

Having experienced several annual reviews first-hand, I suspect that programmes are getting better at setting targets below the minimum achievable level. Of course, there is no incentive to set an ambitious target and not meet it; while there are plenty of incentives to set a modest target and exceed it.

While this is partially a good thing – there’s no point in programmes failing because they set unrealistic targets – it threatens to make the whole process meaningless. For example, if the current trend continues, by 2018 95% of programmes will receive an A or higher in the annual review. DFID needs to strengthen incentives to set ambitious targets which really allow programmes to be held to account.
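The “95% by 2018” figure is a straight-line extrapolation. A rough sketch of the logic, using only the A and B shares quoted above and assuming the ‘C’ share stays around 2% (everything else here is illustrative):

```python
# Linear extrapolation of the grade-inflation trend. Data points are
# the shares quoted in the blog: A 56% (2012) -> 68% (2015),
# B 21% (2012) -> 10% (2015). The steady ~2% 'C' share is an assumption.

def extrapolate(y0, y1, year0, year1, target):
    """Extend the straight line through two data points to a target year."""
    slope = (y1 - y0) / (year1 - year0)
    return y0 + slope * (target - year0)

a_2018 = extrapolate(56, 68, 2012, 2015, 2018)            # A share alone
b_2018 = max(0.0, extrapolate(21, 10, 2012, 2015, 2018))  # B share, floored at 0

# If B (and C) keep shrinking, the 'A or higher' share approaches 100 - B - C.
a_or_higher_2018 = 100 - b_2018 - 2
print(round(a_2018), round(b_2018), round(a_or_higher_2018))  # 80 0 98
```

On these assumptions the ‘A or higher’ share passes 95% before 2018, which is the spirit of the claim; the exact figure depends on how the A+ and A++ shares are treated.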


The arguments presented above do not suggest that DFID’s annual reviews are uniformly useless. The score is just one facet of an annual review, and in many ways the hardest one to get right. DFID deserves credit for doing and publishing annual reviews at all, and those who have experienced them will know that they often include hard questions and rigorous scrutiny.

Overall, however, this analysis suggests problems with the review process. Firstly, programmes are more likely to receive an average ‘A’ rating during implementation than on closure, and much more likely to receive a ‘C’ rating once implementation has finished. I suspect the cause is risk aversion on the part of reviewers, which represents a missed opportunity to highlight underperformance when improvements are still possible. Secondly, grades are improving over time. While this probably represents an improvement in the ability of programmes to set realistic targets, it also risks devaluing the annual review process, if expectations are set so low that everyone meets them.

This analysis would have benefited from a larger sample; I only sampled 15% of the total number of programmes. It also hasn’t criticised the underlying logic of annual reviews, although it could be argued that programmes should be annually assessed on their likelihood of achieving outcomes, not just the outputs achieved. Additional insights would have been gained from a qualitative analysis of the annual reviews, as well as the quantitative analysis. Any keen students or researchers want to take on the task?[4]

[1] This is almost always in the form of a logical framework.

[2] Rounded to the nearest percentage. In some of these tables, not all of the scores add up to 100%. This is normally because of missing data in the ARs; not all reviews awarded grades.

[3] A bit of care needs to be taken in interpretation. The strength of this evidence is limited by the size of the sample; very few programmes get Cs, and so comparisons are naturally tricky. The ‘year 1’, ‘year 2’ and ‘year 3’ annual reviews are defined in relation to how many annual reviews the programme has had; so they actually might happen in different years. More programmes have had a year 1 annual review than a year 2 or 3 review, for example. Likewise, the post completion reviews aggregate things that have happened in different years. Finally, not all programmes which have had an Annual Review have had a post completion review, and vice versa.

[4] If so, please email us at aidleap@gmail.com. We’re happy to share the data we’ve received so far.

The Limits of Transparency: DFID’s Annual Reviews

This is the first blog in a series which will examine DFID’s Annual Reviews, exploring what they say, what they mean, and how they could be improved. 

The aid world is full of contradictions. Think about the last time you worked overnight to produce a report that nobody will ever read. Think about facipulation. Think about the Sachs vs Easterly soap opera. But for sheer, brazen ridiculousness, few things beat DFID’s Annual Review scoring.

DFID should be applauded for scoring Annual Reviews, and publishing all Annual Reviews online. It’s transparent, honest, and allows others to hold both DFID and implementing agencies to account. Quite refreshingly unexpected for an aid bureaucracy otherwise devoted to self-preservation. But at some point in DFID’s internal decision-making, the aid bureaucracy pushed back. You can imagine the conversation within DFID:

Person A: We want to objectively review all our programmes, score them, and publish the scores online!

Person B: But…then people will find out that our programmes aren’t working!

Person A: Good point, I didn’t think of that. *Long pause* I know. How about we only award positive scores?

And that’s what DFID did. Programmes are ranked on a five point scale from ‘A++’, through ‘A+’, ‘A’, ‘B’ and to ‘C’. Programmes which are meeting expectations – just about doing enough to get by – will score an ‘A’ grade. Call me slow, but I thought an ‘A’ was a mark of success, not a recognition of mediocrity.

Programmes which underperform will be scored as ‘B’, and must be put on an improvement plan if they score two ‘B’s in a row. Again, possibly I under-performed at school, but I was always quite happy to get consecutive B’s for my homework. A programme which is truly diabolical, and in severe danger of being shut down, would receive a ‘C’. Programmes cannot receive a ‘D’, ‘E’, ‘F’, ‘G’, or ‘U’, unlike the normal English exam system.

[DFID’s scoring system] Just to prove I’m not making things up.

DFID thus suffers a kind of technocratic schizophrenia. It possesses the most transparent and open assessment mechanism in the world – and a scoring system designed to prevent any appearance of failure.

World Humanitarian Summit Report: What is it? What does it say? What happens next?

Following 2.5 years of consultation and discussion, the UN Secretary General’s report on how the humanitarian sector needs to improve has been published today. ‘One Humanity: Shared Responsibility’ outlines 5 areas of core responsibility that Ban Ki-moon (UN Secretary General) believes should be the focus of the World Humanitarian Summit, due to take place in Istanbul on 23-24 May.

The 5 core responsibilities are:

1. Political leadership to prevent and end conflicts

Humanitarianism can’t resolve many manmade problems without political input. The report calls for coordinated, compassionate and courageous decisions by political leaders to analyse and monitor risk; act on early warning signals; work together to find solutions in a timely manner; accept that with sovereignty comes responsibility to protect citizens. There is a clear emphasis on human rights violations. Political unity is required for prevention, not just management of crises. The UN Security Council needs to put its divisions aside and actively engage in conflict prevention. More evidence and visibility is needed of successful conflict prevention to help mobilise resources (funds and people) for it in the future. There needs to be more sustained investment in promoting peaceful and inclusive societies.

2. Uphold the norms that safeguard humanity

Re-affirm the humanitarian principles. Despite all the legal frameworks and agreements in place, the world is still ridden with ‘the brazen and brutal erosion of respect for international human rights and humanitarian law’. The type of wars that we now see have left civilians and aid workers in severe danger of kidnapping, injury or death. We need to reassert the demand for respect for agreed shared norms, enforce laws and support monitoring mechanisms to tackle the erosion of the rule of law. The Secretary General asks all member states to recommit to the rules and calls for a global campaign to affirm the norms that safeguard humanity. Start by ensuring full access and protection for humanitarian missions. Those not already signed up to core statutes and conventions of international humanitarian and human rights law are invited to accede at the Summit. Those already signed up are asked to actively promote and monitor compliance.

3. Leave no one behind

The humanitarian imperative includes the idea that aid shall be given based on need, and that everyone’s needs should be met regardless of race, religion, nationality etc. The 2030 Agenda has reiterated the need to focus on those at the very bottom and those in the worst situations, not allowing issues of access to be an excuse for not helping those in need. The stateless, displaced and excluded are highlighted, particularly children, though all those deprived or disadvantaged are noted. A new target for the reduction of new and protracted internal displacement by 2030 is called for. Specific changes are listed for the national, international and regional levels (see page 24). Finally it calls for a shared responsibility in addressing large movements of refugees; a commitment to ‘end statelessness in the next decade’; and the empowerment of women and girls.

4. Change people’s lives – from delivering aid to ending need

We need to invest in local systems and stop being obsessed with the humanitarian-development divide. Despite the SDGs and the new era of cooperation for development that they represent, ‘conflict and fragility remain the biggest threats to human development’. The focus needs to be on reducing vulnerability rather than just providing short term relief. To do this we need to set aside the ‘humanitarian-development division’, and focus on the assets and capabilities available at all levels and across all sectors. This section calls for collaboration based on complementarity, drawing on our collective advantage. The World Vision statement from the Global Consultation has been used to summarise the new paradigm approach: ‘as local as possible, as international as necessary’. People’s dignity and desire to be resilient should be harnessed to reduce the dependency on foreign assistance.

5. Invest in humanity

The real humanitarians are those who live in countries vulnerable to disasters and so we should be helping them to be better prepared for emergencies.

Local capacity needs to be strengthened in order for funds to be shared directly with national authorities and local NGOs. National and local NGOs currently receive a tiny share of funding (0.2% in 2014), as does disaster preparedness and prevention (0.4% of ODA in 2014). Areas of greatest risk do not receive the necessary funds. The current financing structure is inadequate, inflexible and ineffective. As local actors are best placed to understand need and develop relevant solutions, their capacities should be increased so that they can receive more resources and accept more responsibility for both preparedness and response. New platforms and mechanisms, as listed in the High Level Panel’s report on Humanitarian Financing, will be introduced by the UN, and others are encouraged to set similar targets.

Three key points:

Much of the report includes issues traditionally considered non-humanitarian such as access to justice or economic empowerment programmes for women. From a humanitarian system perspective there were three interesting acknowledgements:

  • The ‘inequity in the aid system’ and the ‘out-dated’, ‘fragmented’ international aid architecture are not directly admitted, but the report notes that many have expressed their ‘outrage and frustration’.
  • The ‘pride’ of local actors is then acknowledged, highlighting where hope does exist, for instance when women and the young are ‘empowered’ to act. Isn’t it shocking how creative people can be with their solutions when given the space!
  • The good news is that the determination to keep going and fight for change is growing at the local level. Increasingly the formal international aid system is being left behind. Individuals and groups from the ‘global south’ are increasingly organising themselves and their communities – or perhaps they’ve been doing it for decades and we are only now recognising it because of processes like the WHS?

And now?

The report calls for us to stop taking the easiest route and acknowledges that humanitarian assistance and/or peacekeepers alone are not sufficient. Global leaders are called upon to attend the Summit ready to ‘assume their responsibilities for a new era’. Section V outlines what the Secretary General expects from each of the key actors. However, the ‘unified vision’ that he calls for is still a long way off, as rifts have emerged among civil society and governments are cleverly keeping their heads in the sand till the final moment. The Annex to the report, an Agenda for Humanity, clearly lists the suggested commitments – could this provide the concrete ideas for groups to galvanise around?

The Secretary General recognises the UN’s role in the failure to date. Can he and his team deliver real change through the Summit? When we’ve seen so much achieved at the global level recently – Global Goals/SDGs and COP – is it realistic to expect the magnitude of change needed to be delivered in just 3 months’ time? To date, the political will to see humanitarianism as more than a front-page winner for Presidents and Prime Ministers has been lacking. And the refugee crisis in the EU at the moment has demonstrated just how out of touch and short-sighted many of our current leaders are. But then again, the UN got Beyoncé to sing on World Humanitarian Day!

Dear Summit organisers, please prove us sceptics wrong and pull off something of significance… something that at least leaves us with a roadmap or consensus for a real global compact for change at a subsequent Summit in less than 3 years’ time. But this needs to be more than a good sing-song, more than a global campaign to do better; this needs to be a CONCRETE AGREEMENT FOR CHANGE.

An ambivalence towards Female Genital Mutilation/Cutting

For a number of years now I’ve worked, researched, and advocated against Female Genital Mutilation/Cutting (FGM/C). And, yet, I still find myself with a sense of ambivalence towards the practice.

FGM/C, which is also known as Female Genital Mutilation or Female Circumcision, refers to the partial or total removal of the external female genitalia, or other injury to the female genital organs for non-medical reasons. Over 125 million girls and women have undergone FGM/C worldwide.

The health consequences of FGM/C are devastating. According to the World Health Organization, FGM/C causes immediate and long-term health consequences including pain, shock, bacterial infections, infertility, risk of childbirth complications, and sometimes even death. Consequently, some have argued that female “circumcision” is a form of “female genital mutilation” and should be eradicated (see the late Efua Dorkenoo’s “Cutting the Rose”).

Furthermore, FGM/C can be seen as a form of male control over and subjugation of women. Frequent justifications for the practice include ensuring virginity and purity before marriage and preventing infidelity during marriage. And let’s not forget that in most cases FGM/C is practised on minors, not of ‘consenting age’ and lacking informed choice.

So far, so unambiguous. FGM/C is a harmful practice which causes so much suffering. So why do I find myself ambivalent towards the work I’ve been doing over the last few years to advocate against it?

It is partly because I’ve also come to realise that anti-FGM/C campaigns can harm, as well as help, women. FGM/C is a deeply culturally embedded practice. Consequently, it is not generally perceived as a form of “mutilation” by those who undergo the practice (see Ahmadu 2000). Women who do not undergo FGM/C could become ostracised from their own community or might be unable to marry. Also, is it really my place to determine what others can and can’t do to their own bodies?

Moreover, some people have claimed to detect racist connotations underlying the notion of “female genital mutilation”. Why is it that genital surgeries in the West – more snappily known as “designer vaginas” – are condoned, yet, as Fuambai Sia Ahmadu argues, the same procedure on “African or non-white girls and women” is considered “Female Genital Mutilation”, even when it is conducted by health professionals? I personally think there is something very wrong with society if girls and women think they need to surgically alter their genitals… However, Ahmadu’s points certainly raise some important questions.

In essence, my ambivalence is between a universalist, zero tolerance approach and an open, culturally relative one. Is there a way to combine the two positions? Marie-Bénédicte Dembour suggests embracing the ambivalence and adopting a mid-way position – to “err uncomfortably between the two poles represented by universalism and relativism” (Dembour 2001:59). Using the metaphor of a pendulum, Dembour argues that for FGM/C neither view can exist without the other because as soon as one stance is taken, you have to adjust to the other. In one context, FGM/C may be accepted; it may be practised out of love to ensure a daughter can be married. But in another context, e.g. in the UK, it may be seen as a form of child abuse. Dembour illustrates this point in reference to changing trends on legal decisions on FGM/C in France; moving to severe sentences in the 1980s and early 1990s and back to acquittals in the mid-1990s. She explains that having moved too far in one direction, the judiciary felt uncomfortable with this position and moved back to a more lenient one.

Arguably, Dembour’s approach is a cop-out. It doesn’t provide any clear, forward direction. And, yet that is likely the point. FGM/C is a highly complex practice – there’s no definitive way forward, otherwise wouldn’t we already have figured that out?

Perhaps my way forward is to fit in-between the two poles and move towards either one depending on the context. For example, in a situation where I am clearly an outsider as a white woman from a non-practising community living in the UK, I find it difficult to condemn those who are not living in my own country for practising FGM/C. Our world views, knowledge, contexts are so very different and I, therefore, have no legitimacy condemning what they can and cannot do to their own bodies. In this situation, I find myself moving towards cultural relativism. However, I do feel that for those women and girls likely at risk or affected by FGM/C in the UK, I lean towards universalism and will advocate against FGM/C.

I’d like to open this discussion for others to contribute to. FGM/C is a highly sensitive and controversial issue, with multiple viewpoints on it. Do you agree with my reasoning here? And do you have other situations you’ve experienced that I and others also in this dilemma might be able to learn from and consider?

Why you’re disillusioned with aid work

Chris Blattman posted an interesting blog recently reflecting on why people get disillusioned with aid work. He argued that the “median development job is insulated from the world.” Cynicism isn’t an irrational reaction to aid work, but “completely sane and correct”.

This is a paradox; in a profession dedicated to reducing poverty, almost no jobs in aid actually have any connection with the poor. Why is this? I think the reason is that many aid jobs aren’t actually about delivering a good programme, or helping poor people, or managing finances, or any of the purposes stated in the job description. Huge numbers of jobs exist to manufacture confidence that the aid programme is going well, in order to reassure those one step up in the aid hierarchy.

The problem is that donors have minimal contact with the actual implementation of aid programmes. Reasonably enough, they worry about what their money is spent on. Many have constituents and/or Parliaments to report back to. In order to report back, they pass the worry to their implementing partners, who pass it down their hierarchy till it reaches those at the field level. With a recent increase in aid budgets and scrutiny, demands for information about how and why money is spent have become increasingly salient. This has reached the point where the main job of many people working in aid is to convince other people that the aid money is getting spent in the right way.

This happens in all sorts of ways. Monitoring and evaluation staff produce data-sets and reports, which nobody ever looks at, but all sound reassuringly scientific. Technical specialists in gender, health, agriculture, etc all massage proposals and reports in order to reflect the latest fashionable jargon. Finance staff produce information about how money was spent, while procurement staff extensively document the processes that they went through to spend it. HR staff have an even more vital role – they hire the people needed to produce all the other reports. A vast array of interns and other administrative staff format, calculate, and mediate. And so, at each level in the aid chain, confidence is slowly manufactured.

While many aid workers manufacture confidence, I suspect it is particularly important for expats. Expats are less useful for any other role; they don’t speak the local language, don’t understand the culture, and don’t have the connections and contacts which would allow them to get things done. Moreover, expats are much better placed to build confidence. They speak English (or whatever language the donor works in), and look good on TV at home. They’re generally outside local power structures and hierarchies, and are considered – at least by those with no expertise in the area – to be less corrupt. Consequently, my guess is that those who blog and complain about their disillusionment with aid work – primarily young, idealistic expats – are those most likely to have a job centred around manufacturing confidence.

Many discussions on this topic focus on the problems that manufacturing confidence causes. To an extent, of course, they’re right. The desire for confidence creates really stupid processes which drive perverse incentives, from badly implemented payment by results schemes, to output-based monitoring. It looks like a vast waste of time and money – hence most involved quickly become disillusioned.

However, manufacturing confidence is not pointless. It’s a crucial aspect of any system where money is spent on behalf of others, and all the more so in aid programmes, where the distance between donors and ultimate users of aid is so large. Disillusioned graduates who spend their time in offices grumpily writing reports may be a long way from the dusty village that they dreamed of, but they are still playing a valuable part in the aid machine.

Consequently, it would be better if aid staff accepted that manufacturing confidence is really a significant part of their job, rather than a secondary, unwanted side-note. They could then spend more time thinking about how to build confidence as quickly, easily, and cheaply as possible, rather than complaining about how much time they need to spend on it. They might still end up disillusioned, but at least they’ll have made a contribution in the meantime.

Contribution vs Attribution – A Pointless Debate

There are three certainties in life. The first two are death and taxes. The third – known only to monitoring and evaluation (M&E) practitioners – is that during any workshop on M&E, someone will smugly point out that we should be examining contribution, not attribution.

Those of you without experience in M&E jargon (lucky you!) will need a bit of help at this point. “Attribution” is the idea that a change is solely due to your intervention. If I run a humanitarian programme dedicated to handing out buckets, then the gift of a bucket is ‘attributable’ to my programme. I caused it, and nobody can say otherwise. “Contribution” is the idea that your influence is just one of many factors which contribute to a change. Imagine that I was lobbying for a new anti-corruption law. If that law passes, it would be absurd to say that I caused it directly. Lots of factors caused the change, including other lobbyists, the incentives of politicians, public opinion, etc. The change is not ‘attributable’ to me, but I ‘contributed’ to it.

There are two reasons why it is a mistake to emphasise contribution. Firstly, it’s far too often used as a get-out clause. The phrase “we need to assess contribution, not attribution” is typically used to mean “Something good has happened. We want to imply that it was thanks to us, without trying to work out exactly how.” Even if you’re assessing contribution, you still need to understand the extent to which you contributed, and the process through which this happened. Of course, the contribution gurus understand this. All too often, however, their disciples just use contribution as an excuse to avoid doing any actual thinking.

This reflects a fundamental misconception that, if you look at contribution, you no longer need to examine the counterfactual. (The counterfactual, for the by-now-bewildered non-M&E folk, is the question of what would have happened if your programme had not existed.) Of course, it is not always possible to quantitatively assess the counterfactual, in the way that randomised control trials (RCTs) do. But it is always a valid thought-experiment, and an essential part of good M&E. If you’re not thinking about the counterfactual, then you’re simply not asking whether your programme really needed to be there.

Secondly, the term ‘contribution’ itself is used to mean multiple, often inconsistent things. The common-sense meaning of contribution, as defined above, is that there were many factors that caused the observed change. This is completely unhelpful. Any change (except for the most simple) is caused by many things. An RCT – often seen as the gold-standard way to assess attribution – recognises that many factors caused the measured outcomes. That’s why you use an RCT in the first place. Consequently, this meaning of ‘contribution’ does not justify the methodological differences that the approach implies.

A more useful definition of ‘contribution’ is that the change is not possible to measure quantitatively, or is a binary change (like a law being passed). In this case, it is not possible to ‘attribute’ a certain percentage of the change to your intervention. To illustrate the point, consider three different outcomes: (1) increased yield, (2) the passing of a law, and (3) a change in societal values. The first is quantitative and divisible – it makes sense to talk of a percentage of a change in yield. Consequently, we can speak of an ‘attributable’ change in yield. The second and third, by contrast, don’t lend themselves to quantitative breakdown. It does not make sense to pass a percent of a law – it either passes or it doesn’t. Consequently, trying to work out what percentage of a new law is ‘attributable’ to your organisation is simply nonsensical. Similarly, you could never assign a percentage to the extent to which societal values change. In the latter two cases, consequently, it makes sense to talk of contribution rather than attribution.

In every case, however – whether ‘attribution’ or ‘contribution’ – the basic question is the same; what difference did the intervention make? How did it make this difference, and what other factors were relevant? Whether you can quantify the difference or not is a methodological detail, but doesn’t affect the basic question that you are asking. Looking at contribution, consequently, is a red herring.

Ebola: Lessons not learned

Guest Author: Marc DuBois, Consultant for Overseas Development Institute

Today marks 42 days since the last new case of Ebola in Sierra Leone, meaning the country will join Liberia in being declared Ebola-free. That brings the world one step closer to a victory over Ebola the killer.

But Ebola has another identity – messenger. Have we listened? It told us that many aspects of the international aid system are not fit for purpose. Many – too many – of the problems the outbreak revealed are depressingly familiar to us.

Pre-Ebola health systems in Sierra Leone, Guinea and Liberia were quickly overwhelmed and lacked even basic capacity to cope with the outbreak. The World Health Organisation (WHO) failed to recognise the epidemic and lead the response, and international action was late. Early messaging around the disease was ineffective and counterproductive. There was a profound lack of community engagement, particularly early on. Trained personnel were scarce, humanitarian logistics capacity was insufficient and UN coordination and leadership were poor.

The lessons learned should also come as no surprise: rebuild health systems and invest in a ‘Marshall Plan’ for development; make the WHO a truly robust transnational health agency and improve early warning systems; release funds earlier and make contracts more flexible; highlight what communities can do, and engage with them earlier. Except these lessons learned haven’t really been learned at all: they are lessons identified repeatedly over the past decades, but not learned. 

Why is the system almost perfectly impervious to certain lessons despite everyone’s good intentions? The short answer: these lessons are too simplistic. They pretend that the problem is an oversight, a mistake to be corrected, when in fact the system is working as it is ‘designed’ to work.  The long answer: what is it about the politics, architecture and culture driving the aid system that stops these lessons from becoming reality?

Take a simple idea, like reconstituting the WHO as an intergovernmental agency with a robust mandate to safeguard global public health, and the power to stop an outbreak like Ebola. Sounds great, but not new. So it also sounds like wishful thinking. It does not address the inherent tension between sovereignty and transnational institutions.

Think of it this way: the more robust an institution, the more of a threat it poses to the individual states that are its members, and hence the greater incentive for those states to set limits to its power. WHO was ‘designed’ not to ruffle feathers.

A robust WHO? Can you imagine the WHO ordering the US or UK governments to end counterproductive measures such as quarantining returned Ebola health workers or banning airline flights to stricken countries? It will never happen.

Here is the true lesson to be learned: at a time of public fear and insecurity, it would be political suicide for any government to allow such external interference. The problem isn’t the institution, though it may look like it is; the problem is the governments that comprise it. That is not to say that the WHO cannot and should not be improved. It is to say that the solution proposed cannot address the fundamental problem.

Or take a complex idea, such as community engagement. Our Ebola research found that the ‘early stages of the surge did not prioritise such engagement or capitalise on affected communities as a resource’, a serious omission that ultimately contributed to the spread of the disease, and hence a key lesson learned (see e.g., this Oxfam article).

Disturbingly, this is a lesson with a long history. Here, for example, is what the Inter-Agency Standing Committee (IASC) found in evaluating the international response to the 2010 earthquake in Haiti. The relevance, virtually word for word, to the situation in West Africa speaks for itself:

The international humanitarian community – with the exception of the organisations already established in Haiti for some time – did not adequately engage with national organizations, civil society, and local authorities. These critically-important partners were therefore not included in strategizing on the response operation, and international actors could not benefit from their extensive capacities, local knowledge, and cultural understanding … This is not a new observation. Exclusion of parts of the population in one way or another from relief activities is mentioned in numerous reports and evaluations.

Why is this lesson so often repeated and so often not learned? Does the answer lie in an aid culture where ‘taking the time to stop and think – to comprehend via dialogue, engagement and sociological research – runs counter to the humanitarian impulse to act’? Our report discusses a greater concern: the degree to which people in West Africa were treated ‘as a problem – a security risk, culture-bound, unscientific – to be overcome’.

The ‘oversight’ is hardly an oversight: people in stricken communities ‘were stereotyped as irrational, fearful, violent and primitive; too ignorant to change; victims of their own culture, in need of saving by outsiders’. Perhaps that clash of cultures highlights why we should not expect community engagement to spontaneously break out simply because the problem has been recognised.

Powerful forces work against aid actors engaging with the community during an emergency, leaving us with a lesson that has not been learned even after years of anguished ‘never again’ promises to do better.

Lessons learned are where our analysis of the power dynamics and culture of the international aid system should begin, not where it ends.

For the full report, read ‘The Ebola response in West Africa: Exposing the politics and culture of international aid’.

Why aid shouldn’t be outsourced.

Over the past few decades, an increasing amount of public money has been outsourced. Medical, security, and social services have been put out to tender, and contracts worth over £100 billion a year in the UK are won by businesses ranging from major accountancy firms to tiny consultancies. Sometimes the competitive pressures have cut costs, increased quality, and enabled innovation. At other times, it’s led to huge profits for contractors and shoddy service for everyone else.

By outsourcing, I mean any kind of arrangement whereby an external party, whether NGO or private sector, is contracted to deliver a pre-designed aid programme. The services to be delivered are defined in advance, and then the donor (or contracting body) invites bids to deliver these services. This is increasingly prevalent in the aid sector. A recent review of DFID’s work revealed that the percentage of aid channelled through for-profit partners in fragile states increased from 3.7% (2009-10) to 19.4% (2012-13). Despite the occasional grumbling from politicians and the British tabloids, I’ve seen remarkably little serious critical scrutiny of this trend.

Like anything else, outsourcing is sometimes good, sometimes bad. Outsourcing cleaning services seems to work pretty well. Outsourcing healthcare, by contrast, raises much more significant concerns. This blog discusses three factors that would justify successful outsourcing, and then explores the extent to which the aid sector meets these criteria.

The first factor is a competitive market. Outsourcing is often justified on the grounds that competition between suppliers drives up quality and cuts costs. If there are not enough suppliers to form a competitive market, then this logic falls through. Outsourcing in an uncompetitive market just leads to a state-sponsored monopoly, with no pressure to cut costs or perform well. Even in a market with multiple possible contractors, it needs to be easy to switch from one supplier to another. If that’s not the case – for example, because the donor is locked into a five or ten year contract – then this reduces the pressure on the supplier to perform.

The second factor is predictable and measurable results. Outsourcing aims to harness the innovation and cost-cutting ability of the private and third sector, by aligning private and organizational incentives (to make money) with public incentives (to deliver some kind of service). In order to align incentives, contracts are drawn up specifying what results the business has to achieve to receive the money. If these results can’t be specified upfront, then it’s unclear how such a contract could be created. How could the donor agree to pay money, if it doesn’t know what it’s paying money for? If results can’t be measured throughout the programme, then the donor will never know whether the contractor has performed well or not.

The final factor is intrinsic motivation. Some theoretical research has suggested that mechanisms that reward high performance with high pay can have a negative effect if they undermine intrinsic motivation. In an international development context, organisations also often need to work together to achieve the results they need. Sharing knowledge and developing partnerships is key to success.

Is the market of aid suppliers competitive? To be honest, I’m not sure. I’ve heard plenty of accusations that aid is delivered by a small cartel of large organisations. Large aid programmes, especially in fragile states, are complex to manage and deliver. ICAI found that “As a result, the security and justice portfolio is increasingly reliant on a small pool of large contractors.” Their review of fragile states highlighted a potential “over-concentration in a few big global players”. On the other hand, there are plenty of aid contractors out there, although each has a different speciality and focus. I’d be interested to hear more evidence on the competitiveness of the aid market.

My problem with outsourcing for aid projects comes from the second and third factors. Results in most aid programmes are unpredictable. Attempts to specify them in advance can damage programmes rather than aid them, setting perverse incentives to meet inappropriate targets. If results can’t be specified in advance, however, it is not possible to hold contractors accountable for performance. Any kind of result can be seen as a success – because you have no prior target by which achievements can be deemed successes or failures. Contracting organisations to deliver unspecified services seems to open the door to poor performance backed up by excuses.

This is compounded by the fact that the results that really matter aren’t easily measurable. Some are, of course; number of mosquito nets distributed can be counted, the number of children in school can be assessed, and they can even be split up by gender. Higher level outcomes, however, are often impossible to measure and attribute to one project. Incidence of malaria depends on the strength of the health system, weather, and public sanitation. Children’s education is dependent on the wealth and interest of the parents, the way in which teachers are trained and treated, and so forth. Private organisations would (quite rightly) feel uncomfortable being held accountable for such goals.

Outsourcing aid delivery starts by trying to align incentives through complex contracts. This is the wrong place to start. Instead, donors should concentrate on finding organisations that share values and interests, whether governments, NGOs, or local community organisations. They should build the capacity of these organisations through long-term support, and hold them accountable for long-term success, rather than for hitting targets or running projects.

One Mountain, Two Tigers

Author: Hannah Ryder, Deputy Country Director, UNDP China.

I’ve recently celebrated my first anniversary of working in China, and I can wholeheartedly say it has been fantastic so far. In particular I have been lucky enough to be surrounded by supportive and enthusiastic colleagues. To be very honest, I had worried about a challenge summed up well by a Chinese saying that “two tigers cannot live on one mountain”. When I joined UNDP China I knew I would be working alongside deputy country director Patrick Haverman and there was potential for confusion about our roles. We needed to make sure we complemented and added value to each other. Of course, it took a while to work out, but it seems to be going well so far.

This same saying about one mountain and two tigers has been on my mind recently when thinking about the new financial institutions that China and other emerging economies have recently created – the New Development Bank (NDB) and the Asian Infrastructure Investment Bank (AIIB). Let me explain.

In every official discussion or media report about the NDB and AIIB, they are always set out as intended to be complementary to existing institutions such as the World Bank, the IMF, the Asian and African development banks, and so on.

But, as in my own case, it’s all well and good saying this – is it really happening in practice? The fact is, when I joined UNDP a year ago, I had a fairly defined portfolio, and through trial and error Patrick and I have worked out where working together on problems really makes sense and where it doesn’t. Is there such a strong foundation – a territory – for the NDB and AIIB to roam slightly separately?

Here’s an example, brought to my attention when Donald Kaberuka, former president of the African Development Bank, visited the UNDP China office recently.

Dr Kaberuka said he thinks a special role for the AIIB and NDB should be financing large regional projects. Indeed, such projects are badly needed. In Africa, for example, the Grand INGA dam project – which would be larger than China’s huge Three Gorges Project, providing clean energy for hundreds of millions of people – has been little more than a dream since the 1970s. But why? Regional projects like INGA suffer from many complex problems, but one major problem is getting the finance to work. When an organisation like the African Development Bank or World Bank is trying to finance a project, they don’t just have to look at the financial credibility of the project itself; they also have to look at the income classification of the country or countries the project will be in. But the latter kind of classification is uneven.

As an example, Ghana, Togo, Benin and Nigeria are next to each other, and could arguably benefit from a regional project. However, Togo and Benin are classified as Low Income Economies, while Nigeria and Ghana are Lower Middle Income Countries. As is explained in this blog post, this means that the two groups of countries are eligible for different types of finance at different interest rates from existing banks like the World Bank. This makes regional projects difficult, as it is more complex to blend all the different types of finance.

But why are these classifications used in the existing banks anyway? Well, existing banks get a lot of their money from developed countries, which have a historic target of giving 0.7% of their GNI to a particular list of the poorest (i.e. lowest income) countries in the world. This list is based on these classifications and reviewed every three years. The existing banks use the list – and therefore the income classifications – to ensure the finance they get from developed countries meets this target.

The great news is that the NDB and AIIB are brand new, and the money they have to lend is from countries like China. Countries like China don’t have to meet the 0.7% GNI target, and therefore don’t have to use the “list”. This means they – and the new banks – can avoid the complications created by income classification altogether. This may make it easier for the new banks to make regional projects a reality and an area of specialization. It could help them become complementary in practice and not just in rhetoric.

This is just one idea that has come up for how – to go back to the Chinese phrase – several financial “tigers” could roam on one mountain to deliver development. But there are many other ideas that have been shared and will continue to arise over the coming months. The key will be for the new banks to continue consulting far and wide, and test out different approaches – just like Patrick and I did at the UNDP China office. Indeed, they can come to UNDP for such ideas, as we have a very wide network of country offices who are eager to help.

I’m hoping that in one more year’s time, as I approach my second anniversary at UNDP China, more and more tigers will continue to roam on the development mountain.

Why we need to buy in to the World Humanitarian Summit NOW

Last weekend the Global Goals were announced. These follow on from the Millennium Development Goals and have been agreed following several years of consultation and negotiation. Celebrities have been brought in to persuade us that it is important for us, as individuals and citizens of nation states, to hold those in power to account for keeping them.

Over the past two years, a separate global consultation – reaching over 22,000 people – has been looking at how to improve the humanitarian sector (the Executive Summary of the synthesis has just been published). The UN Secretary General’s World Humanitarian Summit (WHS) will take place in Istanbul, Turkey, in May 2016. The objective of the Summit is to agree recommendations and garner commitments to improve the humanitarian sector.

The humanitarian sector has been criticised both internally and externally for being increasingly ‘unfit for purpose’. The current system is based on principles which were developed in Europe over half a century ago. Its main mode of operating is to fly costly international experts into a country when a disaster strikes, normally funded by Western governments. However, both the donor landscape and the variety of groups providing aid are changing. Some perceive this change as a danger to the purity of humanitarianism, others see their vested interests being threatened, whilst still others are frustrated by the slow pace of the change.


Though the Global Goals and the WHS are inherently linked, the two processes have been managed separately. Many criticised the post-2015/post-MDGs/SDGs process, claiming that it was being run by politicians and raising concerns that ambitious goals would be meaningless without real buy-in from global leaders. What can and should we learn from this experience when preparing for Istanbul in May 2016?

First, all such global processes include a painful change curve. By this I mean that there will always be an initial burst of energy, followed by growing criticism until there is an explicit denouncing of the process and/or its potential to achieve results. Finally the troops are rallied, and a compromise is reached.

At this end stage, there will be people still saying that it was all done wrong. There will be people shouting about how great the result is. And there will be those, the majority, who are simply relieved that an end has been reached.

In September 2013, at the WHS kick-off meeting in New York, many round the table expressed doubt that the Summit would enable any significant change. The same was heard when the first debate about the SDGs was held – how could a paper agreement really end poverty? The WHS global consultation received mixed reactions in the different regions, and the consultations have been criticised for not reaching remote or vulnerable people.

Most recently, at a conference hosted by Manchester University, Alex Betts, Professor of Refugee Studies at Oxford University, made a public statement against the planned process for the next three months. Betts believed Dr Jemilah Mahmoud’s recent resignation from her post as Chief of the WHS Secretariat was because of internal politics at UNOCHA. He named three white middle-aged men who were now in control of the process and would be drafting the UN Secretary General’s report. This meant, he stated, that the voices of those heard during the consultation could easily be lost. In response, some of the conference’s participants proposed that a separate summit be held at the same time next year: one where the voices of affected populations would be heard and acted upon.

Secondly, despite all the grumblings, disappointments, and concerns about watering down to the lowest common denominator or about being too ambitious and unrealistic, the process will reach a conclusion. Many doubted that the debates over the successors to the MDGs would produce one set of goals. Yet 17 Global Goals have now been agreed and announced.

As part of the humanitarian sector, I believe that despite the imperfect process, we must grasp this opportunity to instigate change. We must focus energies on discussions about substance and content over process and politics. We have 8 months to buy in and direct the final result towards a useable solution.

Thirdly, the process has just begun. The Summit itself and the recommendations agreed at it, are just the first step. A clear roadmap, with commitment, ownership and funding arrangements will be essential before any change is witnessed. The MDGs have already demonstrated this for us.

Currently, there is no roadmap for after the Summit. Key member states from around the world – from both the Security Council and the G77 – need to see a process for implementation. And there need to be mechanisms to hold everyone to account.

In conclusion, we need to get over the imperfections of the process and focus on getting the most out of this rare global opportunity. Much time, effort and resources have already been committed to the Summit. It would be a real shame to see the Summit become a talking shop full of junior officials without the authority to take decisions.

Civil society (read you, I or we) can play a key role in the following ways:

  • putting pressure on our governments to properly engage with the Summit, helping to shape the agenda and ultimately committing to undertake its recommendations
  • engaging during the formation of the recommendations to ensure that they are relevant, implementable and ambitious
  • supporting and calling for the development of a roadmap with clear responsibilities for implementing the recommendations, specifying how they will be financed and what monitoring/accountability system will be put in place

It’s rare that the humanitarian sector has an opportunity like this to bring about change for the better; let’s ensure we make the most of it.