What’s wrong with DFID’s monitoring – and how to fix it

This is the third in our series on DFID’s monitoring systems. Click here to read our previous blog, which discussed our analysis of over 600 Annual Reviews from DFID.

I’ve previously mocked DFID’s Annual Reviews on this blog. In the spirit of constructive criticism that (on a good day) pervades Aid Leap, it’s now time to say something more detailed about why they don’t work, and how they might work better.

[Image: cartoon on grading]

Annual Reviews are DFID’s primary way of monitoring a programme. They generate huge amounts of paperwork – with an estimated twenty million words available online – alongside a score ranging from ‘C’ to ‘A++’, with a median of ‘A’. If a programme receives two Bs or a single C, it will be put under special measures. If no improvement is found, it can be shut down.

This score is based on the programme’s progress against the logical framework, which defines outputs for the programme to deliver. Each of these outputs is assessed through pre-defined indicators and targets. If the programme exceeds its targets, it is given an A+ or an A++. If it meets them, it gets an A, and if it falls short, it gets a B or C.
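To make the mechanics concrete, here is a minimal sketch, in Python, of the scoring rules as described above. The numeric thresholds and function names are my own assumptions for illustration; the only rules taken from the post are that exceeding targets earns an A+ or A++, meeting them earns an A, falling short earns a B or C, and two Bs or a single C triggers special measures.

```python
# Illustrative sketch only: a rough encoding of the scoring rules described
# in the post. The numeric thresholds and names are assumptions, not DFID's
# actual methodology.

def score_output(achieved: float, target: float) -> str:
    """Map achievement against a single output target to a review score."""
    ratio = achieved / target
    if ratio >= 1.25:   # far exceeds the target (assumed threshold)
        return "A++"
    if ratio >= 1.10:   # exceeds the target (assumed threshold)
        return "A+"
    if ratio >= 1.00:   # meets the target
        return "A"
    if ratio >= 0.85:   # falls moderately short (assumed threshold)
        return "B"
    return "C"          # falls substantially short


def needs_special_measures(annual_scores: list[str]) -> bool:
    """Two Bs or a single C puts a programme under special measures."""
    return annual_scores.count("B") >= 2 or "C" in annual_scores


if __name__ == "__main__":
    print(score_output(achieved=120, target=100))       # A+
    print(needs_special_measures(["A", "B", "B"]))      # True
```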

It’s a nice idea. The problem is that output level targets are typically set by the implementer during the course of the programme. This means that target-setting quickly becomes a game. Unwary implementers who set ambitious targets will soon find themselves punished at the Annual Review. The canny implementer will try to set targets at the lowest possible level that DFID will accept. Over-cynical, perhaps; but this single score can make or break a career (and in some cases, trigger payment to an implementer), so there is every incentive to be careful about it.

A low Annual Review score, consequently, is ambiguous. It could mean that the implementer was bad at setting targets, or insufficiently aware of the game they are playing. Maybe a consultant during the inception phase set unrealistic targets, confident in the knowledge that they would not be staying on to meet them. Maybe external circumstances changed and rendered the initially plausible targets unrealistic. Or maybe the programme design changed, and so the initial targets were irrelevant. Of course, the programme might also have been badly implemented.

Moreover, the score reflects only outputs, not outcomes. A typical review has just a single page dedicated to outcomes, and fifteen to twenty pages describing progress against outputs. It makes no sense to incentivise the implementer to focus on outputs at the expense of outcomes by including only the former in the scope of the Annual Review. The best logframes I’ve seen implicitly recognise this problem by putting outcomes at the output level, but this then gives implementers even more incentive to set those targets at the lowest possible level.

I don’t want to throw any babies out with the bathwater. I think the basic idea of a (reasonably) independent annual review is great, and scoring is a necessary evil to ensure that reviews get taken seriously by implementers. As I’ve previously argued, DFID deserve recognition for the transparency of the process. I suggest the following improvements to make them a more useful tool:

  • All targets should be set by an independent entity, and revised on an annual basis. It simply doesn’t make sense to have implementers set targets that they are then held accountable for. They should be set by a specific department within DFID, and revised as appropriate in collaboration with the implementer.
  • Scoring should incorporate outcome level targets, where appropriate. It’s not always appropriate. But in many programmes, you can look at outcome level changes on an ongoing basis. For example, water and sanitation programmes shouldn’t just be scored on whether enough information has been delivered, but on whether anyone is using that information and changing their behaviour.
  • For complex programmes, look at process rather than outputs. There’s a lot of talk about ‘complex programmes’, where it’s challenging to predict in advance what the outputs should be. This problem is partially addressed by allowing targets to be revised on an annual basis. In some cases, moreover, there is an argument for more process targets. These look not just at what the organisation is achieving, but at how it is doing it. A governance programme, for example, might be rated on the quality of its research, or the strength of its relationships with key government partners.
  • Group programmes together when setting targets and assessing progress. Setting targets and assessing progress for a single programme is really difficult. It’s always possible to come up with a bundle of excuses for any given failure to meet targets – and tough for an external reviewer to know how seriously to take these excuses. The only solution here is to group programmes together, and assess similar types of programmes on similar targets. Of course, there are always contextual differences. But if you are looking at two similar health programmes, even if they are in different countries, at least you have some basis for comparison.
  • Clearly show the change in targets over time. At the moment, logframes are re-uploaded on an annual basis, making it difficult to see how targets have changed. If there were a clear record of changes in logframes and targets, it would be much easier to judge the progress of programmes. I’m not sure whether this should be public – it might not pass the Daily Mail test – but DFID should certainly be keeping a clear internal log (a rough sketch of what such a log could look like follows this list).
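As a purely illustrative follow-up to that last point, here is a minimal sketch, in Python, of what such an internal change log could look like. The file format, field names, and helper function are hypothetical (nothing here describes DFID’s actual systems); the point is simply that every revision gets appended with a date and a reason, so reviewers can see how targets have moved.

```python
# Illustrative sketch only: one way a running internal log of target changes
# could be kept. File format, field names, and helper are hypothetical.
import csv
import os
from datetime import date

LOG_FIELDS = ["programme_id", "output", "indicator",
              "old_target", "new_target", "changed_on", "reason"]


def log_target_change(path, programme_id, output, indicator,
                      old_target, new_target, reason):
    """Append one target revision to a CSV log, writing a header if the log is new."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "programme_id": programme_id,
            "output": output,
            "indicator": indicator,
            "old_target": old_target,
            "new_target": new_target,
            "changed_on": date.today().isoformat(),
            "reason": reason,
        })


if __name__ == "__main__":
    # Example: recording a mid-year revision to a hypothetical output target
    log_target_change("target_log.csv", "PRG-001", "Output 2",
                      "households reached with sanitation information",
                      5000, 3500, "programme redesign after change in context")
```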

6 thoughts on “What’s wrong with DFID’s monitoring – and how to fix it”

  1. There’s lots to agree with in here: the need to focus more on outcomes than on outputs; grouping projects together to understand collective contribution to change… But I’d query two of the recommendations:
    (1) The idea of the donor or a 3rd party setting results targets is scary. I’d suggest instead a negotiated process between funder and implementer. Those not close to the intervention generally don’t have a good track record of knowing what the best indicators and targets are, and you can cause awful problems with the wrong targets in place. But obviously implementing staff are neither all-knowing nor immune from the temptation to game targets. Resource constraints on the part of donors are probably the biggest barrier to this, though.
    (2) For complex programmes, I’d say look at process in addition to outcomes. You still need to keep your eye on the prize while incentivising learning and adaptation rather than rigid adherence to plans.

    And in fairness to DFID, recently (post Smart Rules) I’ve found them much more flexible around altering logframes frequently (not just once a year) – though I also wouldn’t be surprised if that’s not everyone’s experience.

    Michael O’Donnell, Bond

  2. Thanks again for your work on thinking about how we can improve our programme monitoring. Again, a useful set of comments. A few thoughts:

    Firstly, I’d challenge some of your views on how we monitor programmes:

    1. We monitor programmes throughout the year – a good Annual Review should be the product of high-quality monitoring throughout, so while the Annual Review is the formal document, it is only one part of a 12-month review process.

    2. Any programme can be shut down at any time. So I’d dispute the idea that a programme which doesn’t improve after ‘special measures’ is any more likely to be shut down than any other. Of course, we should ask questions (and seek to improve programme delivery), but as I have said before, I’d be as likely to close a programme that repeatedly gets an A++, as it may not be ambitious enough. I have been struck by partners in Kenya who say they have good programmes because they have a track record of A+ scores. Not necessarily.

    3. If a partner has intentionally lowered targets to get a better score, they are as likely to be shut down as rewarded. (In any case partners don’t set their own targets…)

    These comments go back to my point on an earlier blog that scoring is an art as much as a science – it needs to take into account context and circumstances. A poorly performing project in a tough context might be the right project, whereas one delivering in an easy context might not.

    On the recommendations, I’d agree we need a better view of outcomes in Annual Reviews (and we are onto it), just as we need to challenge theories of change to make sure that delivering outputs equals outcomes.

    I agree that process indicators sometimes make more sense than outputs – this is a key feature of the Smart Rules and already happens widely (I signed one off last week).

    On a log of internal targets over time, we have the beginnings of this internally and will soon be able to track this through our new Aid Management Platform.

    Welcome thoughts and disagreement. I have changed jobs, so these are my personal views rather than a formal reply.

    Pete
    In a car in Nairobi traffic on a Smart Phone so apologies for any spelling/ grammar.

  3. This has been an interesting series. Good also to hear from Pete that tracking changes in targets (and indicators?) is underway.

    I agree with Michael about the donor or 3rd party setting targets. A more constructive approach would be to have the donor, implementers and other key stakeholders sit together to discuss the TOC that sits behind the log frame (if there is one) and establish targets that are reasonable but also a stretch. However, managing to achieve the stretch requires trust between the donor and implementers.

    Also, it is useful to build the evaluative capacity of implementers. Perhaps this might be one way to mitigate gaming. Given all the challenges with log frames, if a (complex) programme is only monitoring against its log frame I am never very confident about its level of effectiveness. Something else is needed.

    Cheers
    Donna

  4. Thanks for the comments everyone.

    Yes, I thought the suggestion of external teams setting targets would be controversial 🙂 When a programme is going well, there is trust between the implementer and the organisation, and there is transparency on costs, a negotiated settlement works great. However, when trust breaks down, in my experience DFID tries to use the logframe to exert control over the project. But because DFID doesn’t have the resources to set or properly judge targets, in practice a poorly performing implementer has far too much ability to set the targets that they want. I think this is exacerbated when you get private organisations implementing aid projects, when DFID can also (not unreasonably) wonder whether the organisation is making excessive profits on the project.

    Perhaps the solution is a negotiated target in most situations – but with a team of specialist target setters who can be called in if any disagreements arise. At the moment, DFID just doesn’t have the resources to handle target changes properly.

  5. It would be interesting to see WHO undertakes the reviews/evaluations. If they are independent, how independent are they when they are likely to be the same organisations (companies, think tanks, NGOs) that also bid for DFID funds to deliver projects?

  6. Sorry for late posting – maybe this stream is dead – thanks for the analysis of DFID review scores. As someone who has many times been both the victim (team leader of DFID projects) and the perpetrator (leader of evaluation teams) I could say much, but the basic reality is that the process, including the scoring, is always largely political and hides behind a myth of objectivity and independence. This is not to diminish its value or importance. I am not contradicting myself here – the thing has great value, just NOT as a truly independent and objective assessment of project performance – no such thing exists. The motivational impact of reviews – especially the scores – on implementation teams and counterpart government agencies is, I think, especially significant and needs to be carefully considered.

    An interesting bit of comparative info – several years ago I saw a meta-analysis of EU project scoring. I am not sure if such an analysis is available now, but given the standardisation and brevity of the EU evaluation method and the huge number of projects in many countries, it was very interesting. I suspect the EU may be much less transparent than DFID, so maybe such meta-data is not generally available – a referendum-relevant issue, perhaps!

    Basically, of the 5 standard EU criteria at that time, I drew the following conclusions – of course there were many exceptions, but in general:
    (1) Relevance – this scored low, perhaps because the projects had taken so long to come to implementation that they had ceased to be relevant to dynamic local conditions.
    (2) Efficiency – this scored even lower than relevance; my guess is because of the awful EU administrative procedures and systems that acted as an effective barrier to flexible, timely implementation.
    (3) Effectiveness – perhaps surprisingly, this scored the highest. My optimistic conclusion was that project teams on the ground, faced with an outdated, badly designed – or chronically over-designed – project and hampered by all sorts of externally imposed barriers and constraints, just rolled up their sleeves and said “well, let’s get on with it and do some good anyway!” – or to put it another way, “we won’t let the bastards defeat us”. Perhaps it’s an inaccurate representation of management theory, but I put this down to what I understand to be “the Hawthorne effect” – or at least the idea that human performance is not a simple function of objective criteria like working conditions, but depends on such process issues as team dynamics, leadership, expectations – i.e. choice!
    (4) Impact and sustainability – these both grouped very closely around a median; it is very difficult for reviewers to assess either while projects are underway, so I guess the strong tendency was towards a safe scoring option.

    Not sure how much relevance the above has to DFID scoring — but I agree with the recommendation to focus more on process across the board.

    Colin Risner
