Evaluations should enable development practitioners to learn from current programmes and improve future ones. Sadly, they currently fall far short of that ideal. Most evaluations are short, shallow, and completely ignored. There are several reasons for this:

[Cartoon from the brilliant http://freshspectrum.com]
- Bad knowledge management. Large NGOs and consulting firms may commission hundreds of evaluations a year. However, these are generally not made publicly available, and often not stored internally either. Consequently, the information they contain cannot be used by anyone.
- Easy to suppress negative evaluations. While some evaluations do get read, they will generally be the positive ones. It is extremely easy to suppress any negative evaluations.
- Poor design. The terms of reference for evaluations are often impossible to implement. They may have vague or inappropriate evaluation questions, or pack a huge number of questions into a very limited number of days.
- Poor execution. Evaluation consultants are often underqualified and overworked. Especially for more technical jobs, they may just not have the skills or knowledge required.
This is ultimately a problem of incentives. There is no real incentive for evaluations to be transparent or high-quality. Neither donors nor NGOs have any desire to showcase their failures, and overwork and staff turnover prevent them from learning from experience. Donors sometimes hold organisations accountable for the results of evaluations – but they seldom consider the quality or appropriateness of the work.
So what could the solution be? I believe that we need a peer-review system for evaluations, similar to that for academic articles. This could establish and utilise criteria for a quality evaluation, and the peer-review itself could be performed by academics, consultants, or monitoring and evaluation specialists at different organisations.
This could be linked to an evaluation database. Each planned evaluation would be registered before being conducted, and the peer-reviewed report would then be placed in the database, alongside the peer reviews and any other relevant material. The Millennium Challenge Corporation offers a great example of this.
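To make the registration idea concrete, a single registry entry might look something like the sketch below. This is purely illustrative – no such system exists yet, and every field name here is an assumption rather than a specification.

```python
# A minimal sketch of one registry entry, assuming a simple
# register-first, report-later workflow. All field names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EvaluationRecord:
    evaluation_id: str                  # assigned when the evaluation is registered
    commissioning_org: str              # NGO, donor, or consulting firm
    terms_of_reference: str             # link to or text of the ToR, lodged up front
    methodology: str                    # e.g. "mixed methods", "RCT", "process evaluation"
    registered_on: str                  # date of registration, before fieldwork begins
    final_report: Optional[str] = None  # attached once the evaluation is complete
    peer_reviews: List[str] = field(default_factory=list)         # reviewer comments
    supporting_material: List[str] = field(default_factory=list)  # datasets, tools, annexes
```

The point of the structure is the workflow rather than the fields themselves: an entry is created at registration, before fieldwork, and the report and peer reviews are attached later, so quietly dropping an unflattering evaluation would leave a visible gap in the database.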
This would need to be enforced and supported by donors, at least initially. Donors should signal that participation in this quality assurance process will be rewarded. Perhaps bids that commit to this approach could be favoured over those that don't.
If successfully managed, this could address a number of the problems that began this article. The design and execution may still be poor – but with the knowledge that the evaluation will be peer-reviewed and openly available, there are greater incentives to improve. It would no longer be possible to cherry-pick the best evaluations for publication. Finally, and perhaps most importantly, all these evaluations would be readily available, so the knowledge wouldn't be lost.
There are of course challenges facing this approach. The database would need to cover a wide range of potential evaluation types, and not be trapped into only considering a certain approach (such as randomised controlled trials). Finding peer reviewers would be the hardest part – this may need to be a paid rather than voluntary role, especially given the time pressure to produce and use evaluations.
Such a database could be managed by a small, independent secretariat. The best approach, however, would be for it to be taken on by an organisation specialising in evaluations; perhaps ODI or IDS, 3ie (if they can get over their RCT obsession), Learn MandE, or Better Evaluation. Any takers?
p.s. If you enjoyed the post, see our discussion in the rough notes page, and please leave your comments below!
p.p.s. I noted after writing this that the Big Push Forwards conference report proposed a different but related idea:
"Promoting a 'kite mark' for evaluators and purchasers of evaluations who ascribe to certain ethical, moral and developmental standards of monitoring, evaluation and impact assessment. This would seek to create a 'guild' who would promote these standards and in so doing publically confront poor evaluation processes, evaluators and inappropriate ToRs."
Another issue is that without sound and ongoing monitoring, sufficient and meaningful data is often not available come time for evaluations. And negative biases in evaluation can be just as problematic as positive ones, both damaging to partnerships: http://www.how-matters.org/2011/04/17/got-em-an-evaluation-story/ Bad evaluations can end up being the expression of top-down, one-sided mechanisms for accountability, especially when the appropriate cost and complexity needed for an evaluation has not been considered. (Does the $50K project need a $30K evaluation?) Evaluation implemented solely for the purpose of accountability can undermine the effectiveness of the very interventions it is trying to measure, and makes evaluation a tool for policing, not learning.
That's definitely all true. Building an evaluation in from the beginning (or at least earlier) is now DFID's preferred approach, and allows the evaluation to help the programme develop – at some potential cost to independence.
I agree that evaluations can be a mechanism for controlling programmes – but if anything, that applies even more to monitoring. I think both have very similar problems with generating information that is considered credible, and with getting people to use it.
Thanks for kind words on my cartoons 🙂
Would a league table of the results be published in the press? E.g. NGO Premier League Champions are Oxfam; UNICEF to be relegated to the second division. It would be interesting to put it in the London Times every year. It would help people know where their money will be better spent. Some smaller NGOs doing good work but not well known could gain more prominence if there were some sort of measure of worthwhile spend published in the public domain.