A manager in a field programme that I evaluated recently showed me the glowing findings from his latest monitoring trip – based on a total sample size of two farmers. When I queried the small sample size, he looked shocked that I was asking. “It’s OK”, he explained, “We’re not aiming for scientific rigour in our monitoring.”
I regularly hear variants of this phrase, ranging from the whiny (“We’re not trying to prove anything”) to the pseudo-scientific (“We don’t need to achieve a 95% confidence level!”). It’s typically used as an excuse for poor monitoring practices, justifying anything from minuscule samples, to biased questions, to only interviewing male community leaders.
I think managers use this excuse because they believe there is a difference between what we do (monitoring, finding things out, investigating) and what other people do (science, evidence, proof, academia). This is, however, a false dichotomy. There is no magic bar over which you jump and suddenly find yourself conducting scientific or academic research.
Of course, some techniques require high levels of time or expertise, and so are more appropriate for specialists in that area. These might include randomised controlled trials or ethnographic analysis. Such techniques are not, however, somehow more ‘scientific’ or ‘rigorous’ than others; they are simply more suitable for certain questions.
For example, if you want to know the effect of a new medicine, then a controlled trial is the best approach. If you want to understand society from the point of view of the subjects of the study (and have plenty of time on your hands), then ethnography is more suitable. If you want to understand how a community-based intervention affected a community, then focus group discussions combined with a wider survey might be appropriate. Academics typically ask different research questions, and so use different techniques to those used in monitoring and evaluation.
This doesn’t mean that monitoring and evaluation departments are somehow justified in conducting terrible research. If your focus group discussions are dominated by men, your sampling method is poorly constructed, or your analysis relies on cherry-picking data, it’s not suddenly OK just because you’re doing monitoring rather than academic research. Bad research is bad research, no matter who does it.
Monitoring and evaluation would get a lot better if donors, programme staff, and even some M&E professionals stopped comparing monitoring practices to some kind of imagined scientific ideal. It would be more profitable if they thought more clearly about what the monitoring is designed for and how it will be used. Do you need to estimate the attributable impact of an intervention? Do you want to understand how a change has happened? Or do you simply want to understand the context in which you work?
All are reasonable purposes, and all have different implications for the type of research that you might want to do. Pretending that basic principles of rigour and good research practice don’t apply, however, perpetuates the idea that monitoring and evaluation is a kind of glorified feedback form, a box-ticking exercise that gathers ‘results’ but doesn’t worry about what they mean.
Hard to disagree with the ideas in this post, but I think there are other things afoot that make M&E sloppy, perpetually late, or nonexistent. As the author says, it all starts with the data that field staff generate. However, sometimes donors are not overly concerned with the quality of M&E, since they have other, primarily political, reasons for giving money. This is a very real issue where the program is in an area barred to donor staff. Generalized, ground-level observations from NGO staff often take precedence over quarterly/annual M&E requirements. I think we have all seen projects that continue to trudge on, despite what the M&E says.

The author refers to “whiny” and “pseudo-scientific” field staff. This is a very telling comment for me, since I have seen all too often the “cowboys” vs “bean counters” dichotomy when doing M&E. Program field staff might have constructed something over months or years at personal risk, and often do not take kindly to a quick and critical M&E visit. This is particularly a problem when the M&E personnel do not have strong program implementation experience. I have never known a bad M&E staff member who had substantial implementation knowledge. I would argue that matching “quality data” with the “reality” of a field-level project is an art and not a technical exercise: quality data is only one factor, and sometimes not the most important.

Finally, how many of us have had to give the “bad guys” medicine, food, transport, etc. in order to service target groups? I recall one donor telling me that a “25% loss level” was acceptable. How do you get that into an M&E report? Well, you don’t. Thanks. Duke
http://www.amazon.com/Handbook-Hopeless-How-Zone-Hallucinations-ebook/dp/B00T6EI3C8/ref=pd_sim_351_1?ie=UTF8&refRID=0RJ8MB7CZ135V5NKY80Y
http://www.amazon.com/Living-Dying-Dogs-Duke-Miller-ebook/dp/B00HS9O8PO
Thanks for your comment. I agree that there are many reasons why monitoring is bad – we look at some more here https://aidleap.org/2015/02/02/why-dont-we-ever-learn/
Your point about good M&E requiring field experience is a fair one. I’ve met plenty of M&E staff who have an unrealistic idea of what is possible, and who annoy managers with clearly impractical ideas. Maybe a future blog post should examine the relationship between the two.
At the risk of banging my own trumpet, this is *exactly* why we developed the Bond Evidence Principles. They try to set out a minimum floor – bad research is bad research, full stop – and then help people think about the criteria for improving the quality of their M&E and evidence, with the idea that it can be good enough rather than great, but that at least you should know the difference.
http://www.bond.org.uk/effectiveness/principles
The Dilbert cartoon does suggest a different reason why programme monitoring is so bad though, which is nothing to do with capabilities, approaches or mental models among implementers, managers or monitors / evaluators – and everything to do with incentives. There’s a great discussion of some of this in the earlier post, but an awful lot of it gets bogged down in structures and technical / disciplinary boundaries. These are important, but more fundamentally, most organisations don’t incentivise learning, they don’t make significant resourcing decisions based on robust data, and they see “M&E” as closing the loop on a process of upward accountability to funders that starts with the proposal – rather than as a feedback loop to implementers and communities. To paraphrase Dogbert, if you produce data that’s not for anything (except keeping up appearances), it’s hard to pretend that accuracy matters.
And the worst of it is that now I’m primarily managing implementation rather than evaluations, I’m just the same!
I’ll also bang the ‘drum’ (or blow your trumpet?!) for the BOND Evidence Principles, but also for the recent DFID report by Leslie Groves on beneficiary feedback in evaluation: https://beneficiaryfeedbackinevaluationandresearch.wordpress.com/author/lesliecgroves/. This speaks very well to the concern you raise on M&E being too often seen as ‘closing the loop on a process of upward accountability’. Irene Guijt and Leslie Groves are hosting a webinar on this very subject next week – https://beneficiaryfeedbackinevaluationandresearch.wordpress.com/author/lesliecgroves/.
I have been working on a research project at the University of Bath to try to tackle the ‘attribution’ problem in impact assessment of rural development projects, and can honestly say that the field staff we have been working with in Malawi and Ethiopia were as concerned about academic rigour as we were, if not more so! You can find out more about the Qualitative Impact Protocol (QUIP) at http://go.bath.ac.uk/art