Spurious Precision – Leading to Evaluations that Misrepresent and Mislead by Burt Perrin

Hello, AEA365 community! Liz DiLuzio here, Lead Curator of the blog. This week is Individuals Week, which means we take a break from our themed weeks and spotlight the Hot Tips, Cool Tricks, Rad Resources and Lessons Learned from any evaluator interested in sharing. Would you like to contribute to future individuals weeks? Email me at AEA365@eval.org with an idea or a draft and we will make it happen.


Sometimes it is helpful to be very precise. On the other hand, false precision – giving the air of precision without the data to back it up – can be irrelevant and/or misleading at best and, at worst, can destroy, rather than enhance, the credibility of your evaluation – and of you. Hi, I’m Burt Perrin, and I’d like to discuss what considerations such as these mean for evaluation practice.

If one is undergoing brain surgery, one would hope that this would be done with precision based upon established knowledge about the procedure. That said, a surgeon can be no more precise than the underlying data about their procedure permits. Similarly, an evaluator can be no more precise with their conclusions than their underlying data permit, and yet attempting this is where too many evaluations go wrong.

A recent study drawing sweeping conclusions about the practices of evaluation managers is a case in point. The study was based upon a response rate of less than 2%, with the researchers saying that the small sample size did not matter as this was “an exploratory study.” There were numerous other flaws, such as questionable interpretations of responses to many of the questions. Despite these shortcomings, findings were reported to multiple decimal places, implying a level of precision that the underlying data could not possibly justify. This, unfortunately, is not an isolated example.

Too often, analyses are presented to two decimal places simply because this is the SPSS default – which, unless the underlying data set permits this degree of specificity, is meaningless and misleading. With, for example, a typical margin of error of at least 3%, the absurdity of such spurious precision should be evident.
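As a small illustration (the survey values and the `margin_of_error` helper below are invented for the example, not drawn from any study discussed here), the margin of error dictates how many digits a reported percentage can honestly carry:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p_hat, n = 0.5327, 400            # hypothetical survey estimate and sample size
moe = margin_of_error(p_hat, n)   # roughly 0.049, i.e. about +/- 5 points

# Reporting "53.27%" implies four digits of precision; with a margin of
# error near five percentage points, only the leading two digits mean anything.
print(f"Spurious: {p_hat:.2%} +/- {moe:.2%}")
print(f"Honest:   {round(p_hat * 100)}% +/- {round(moe * 100)} points")
```

The point is not the particular formula, but that the rounding in the final line is driven by the uncertainty, not by a software default.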

Dozens of other factors similarly limit how precise one can legitimately be: sampling error; measurement and instrument error; aspects of questionnaire design and administration that have been shown to affect responses, including question wording and ordering; how the questionnaire is implemented in practice; coding errors; variable (mis)understanding of the questions and response options; demand characteristics of the survey context; timing; and many more. And with anything less than a 100% response rate, there is a very strong potential for non-response bias to distort findings. (I view this consideration as particularly important, and will return to it in a separate blog posting. Watch out for it.)

Meaningful analysis needs to consider possible influencing factors such as the above, bearing in mind that even the most accurate measurement can only be specified within a range. When factors such as these are not taken into consideration, the meaningfulness of the resulting data, analyses, and conclusions can well be called into question, irrespective of the number of decimal places used and significance levels reported.

Rad Resource

Effects of misplaced precision can distort overall approaches to evaluation, and to program design and implementation, leading to a focus on what is easy to measure rather than what is most important.

A key learning: Generally, the more complex an issue, the more difficult it is to specify with precision, with the danger of falling into what Scott Chaplowe refers to as the accountability trap. <https://socialinnovationsjournal.com/index.php/sij/article/view/704>

I’ve spoken and written about this topic extensively, including in the AJE (https://journals.sagepub.com/doi/abs/10.1177/109821409801900308). As John Maynard Keynes observed: “Better to be approximately right than exactly wrong.”

Burt Perrin (Burt@BurtPerrin.com), formerly from Canada and currently living in France, has been engaged in evaluation internationally for many years, for example as a leader and founding member of the CES, EES, and AEA. He has authored other AEA365 blog posts, including one about the backfire effect (Facts and evidence make no difference – or worse) and another about bureaucracy and evaluation (Can evaluation help make bureaucracy more responsive – or is it part of the problem?).


Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.

1 thought on “Spurious Precision – Leading to Evaluations that Misrepresent and Mislead by Burt Perrin”

  1. Hi Burt. Thanks for prompting the following thoughts of my own, expressed also via an email list:
    “Hi all
    This posting has been prompted by Burt Perrin’s AEA Tip-of-the-Day posting, which I saw today, on the subject of spurious precision. See it here: https://aea365.org/blog/spurious-precision-leading-to-evaluations-that-misrepresent-and-mislead-by-burt-perrin/

    I’ve just been analysing some data for other members of an evaluation team that I currently belong to. It is the kind of dataset that people use when doing QCA (qualitative comparative analysis). But I was analysing it using my own app, known as EvalC3. (BTW, I don’t think this choice makes a difference to the argument below.)

    I was able to find a configuration of case attributes that was sufficient for the outcome (of interest) to be present. Such a judgement of sufficiency involves a form of precision, in that there should be no cases of the outcome not occurring when the configuration of attributes is present.

    The interesting thing I found, not for the first time, was that when I stripped down this configuration of attributes, one particular attribute was an even better predictor of the outcome, in the sense of identifying more cases with the outcome. But this came at the cost of a number of false positives: cases it had falsely identified as having the outcome when in fact they did not. Here I was in effect dealing with a probabilistic model, with a degree of built-in error. The saving grace was that it had a wider range of applicability.

    If this model were all to do with what works and doesn’t work within the field of brain surgery, then these false positive errors would have been totally unacceptable. But in a lot of development work the cost of false positive errors may not be so dramatic, and may be worth the trade-off of achieving a desired outcome in a wider number of cases. Deciding on the merits of the trade-off would probably require a context-by-context discussion.

    Nevertheless, the possibility does throw into doubt whether methods like QCA – or, perhaps more fairly, the users of those methods – place too much emphasis on the identification of configurations which are necessary and/or sufficient, at the cost of ignoring those which are more probabilistic.

    I would be interested in your thoughts, readers.

    Regards, Rick Davies”
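The trade-off described in this comment, a “sufficient” configuration with perfect consistency but narrow coverage versus a single looser attribute with wider coverage but some false positives, can be sketched with toy data. (The cases, attributes, and `score` function below are hypothetical illustrations, not Rick’s actual dataset or EvalC3 output.)

```python
# Each case records two attributes and whether the outcome was present.
cases = [
    (1, 1, 1), (1, 1, 1), (1, 1, 1),   # A AND B present -> outcome present
    (1, 0, 1), (1, 0, 1),              # A alone also catches these cases...
    (1, 0, 0),                         # ...but brings in one false positive
    (0, 0, 0), (0, 1, 0),
]

def score(predict):
    """Return (consistency, coverage) for a predictor over the cases."""
    tp = sum(1 for a, b, y in cases if predict(a, b) and y)
    fp = sum(1 for a, b, y in cases if predict(a, b) and not y)
    outcomes = sum(y for _, _, y in cases)
    consistency = tp / (tp + fp)   # 1.0 means no false positives ("sufficient")
    coverage = tp / outcomes       # share of outcome cases the predictor explains
    return consistency, coverage

strict = score(lambda a, b: a == 1 and b == 1)  # sufficient but narrow: (1.0, 0.6)
loose  = score(lambda a, b: a == 1)             # probabilistic but wide: (~0.83, 1.0)
```

The strict configuration has perfect consistency but explains only three of the five outcome cases; the single attribute explains all five at the cost of one false positive, which is exactly the probabilistic trade-off the comment raises.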
