I am David Bernstein, owner of DJB Evaluation Consulting and Director of Evaluation for the General Commission on Religion and Race of the United Methodist Church.
There is a long-running debate among Springsteen fans over whether Mary's dress "waves" or "sways" in the song Thunder Road (1975, Born to Run). For many years, I insisted the dress waved, not swayed. Then, at a Rock and Roll Hall of Fame exhibit featuring Springsteen memorabilia, I saw his handwritten lyrics. Sure enough: "Mary's dress swayed."
In a similar manner, for years I was convinced that Likert-like scales needed a mid-point to represent “neither positive nor negative.” I was apparently not alone.
Rad Resources:
Some feel that a scale is not a true Likert scale unless it has a midpoint, probes multiple items/issues, and has parallel structure between the positive and negative response options on either side of the midpoint. See, for example, http://www.john-uebersax.com/stat/likert.htm.
The design of Likert Scales has previously been discussed on AEA365, for example Nyame-Mensah on February 14, 2016, Losby and Wetmore on December 27, 2012, and Aguirre on May 6, 2010 (http://aea365.org/blog/?s=Likert&submit=Go) as well as numerous times on EvalTalk.
Why Likert-like, not Likert? Likert scales should be empirically tested to demonstrate that the "distance" between each number on the scale is interpreted identically by respondents and used reliably. Often there is neither time nor resources to do cognitive testing of surveys, although pre-testing is recommended. So Likert-like is one term that could be used; I call them rating scales to placate the perfectionists. These scales are ordinal, representing categories of perception, not interval, representing numerical differences, so the mean is only valuable as a relative sense-making symbol, not as a real numeric value.
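To make the ordinal point concrete, here is a minimal sketch in Python (the responses, codes, and labels are made up for illustration, not from any actual survey) that summarizes a four-point rating scale with category frequencies and a median category rather than presenting a mean as a real number:

```python
from collections import Counter
from statistics import median_low

# Hypothetical four-point rating-scale responses, coded
# 1 = Strongly Disagree ... 4 = Strongly Agree (illustrative only).
responses = [4, 3, 3, 4, 2, 3, 4, 1, 3, 4]
labels = {1: "Strongly Disagree", 2: "Disagree", 3: "Agree", 4: "Strongly Agree"}

# The scale is ordinal, so report category frequencies and the median
# category rather than treating the codes as real numeric values.
counts = Counter(responses)
for code in sorted(labels):
    pct = 100 * counts.get(code, 0) / len(responses)
    print(f"{labels[code]:>17}: {pct:4.1f}%")
print("Median category:", labels[median_low(responses)])
```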
As an evaluator who uses mixed-methods, I know the world is gray, not black and white. I know people can genuinely have a “neither positive nor negative” viewpoint. However, I have become convinced that many people receiving human and social services don’t really like to be negative towards the people providing their essential services, and use the midpoint to express themselves as being “less than positive.” The midpoint does nothing for those needing feedback about their services.
This is how I became a "forced-choice" proponent. I will use a midpoint when appropriate, but more often than not, my scales are four-point, not five-point. I am now a forced-choice cheerleader, praising the virtues of even-numbered scales. Unfortunately, midpoint scales wave; they don't sway.
Hot Tip:
As Losby and Wetmore pointed out in their 2012 aea365 post, some respondents may use the mid-point to express that they have "no opinion" or that the question is "not applicable" (N/A) to them, so the choice of the midpoint actually creates an invalid result (measurement bias). Make sure to include a "not applicable" category if appropriate.
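One way to act on this tip at analysis time, sketched below with hypothetical data, is to treat "N/A" as its own category and drop it from the denominator before computing rating percentages, so it can't masquerade as a midpoint rating:

```python
# Hypothetical responses where "NA" marks "not applicable".
raw = [4, "NA", 3, 2, "NA", 4, 1, 3, 3, 4]

# Report the N/A rate separately, then compute rating percentages
# over the substantive responses only.
ratings = [r for r in raw if r != "NA"]
na_pct = 100 * (len(raw) - len(ratings)) / len(raw)
print(f"N/A: {na_pct:.0f}% of respondents")
for code in sorted(set(ratings)):
    pct = 100 * ratings.count(code) / len(ratings)
    print(f"Rating {code}: {pct:.0f}%")
```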
Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.
Well put! I especially like the tip on having a not-applicable option, as there are times when people are simply unsure; it will likely prevent the results from being skewed. When taking surveys, I catch myself using the midpoint when I am unsure about the statement or question. On surveys that don't have a midpoint, I also find that I veer toward the more positive side, depending on the subject of the question. Overall, there will always be people who pick the midpoint to avoid conveying a positive or negative attitude, but I agree that using an even-numbered scale may help combat this.
Thanks David,
Have you heard of the Net Promoter Score? I like that scale because of the interpretive framework. It is a 0 to 10 scale, so 5 is a true mid-point. However, the interpretive framework is that anyone selecting 0 to 6 is considered a detractor. That made a lot of sense to me. A rating of 6 or “it was okay” is not a good recommendation/rating. It is above the mid-point but does indicate that improvement is needed. So, maybe you don’t need to get rid of mid-points, but rather have an interpretive framework that considers mid-point ratings as indication that improvement is needed.
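For anyone unfamiliar with the calculation, the standard Net Promoter framework counts 0–6 as detractors, 7–8 as passives, and 9–10 as promoters, and reports % promoters minus % detractors. A quick sketch, with made-up ratings:

```python
def net_promoter_score(ratings):
    """Standard NPS: % promoters (9-10) minus % detractors (0-6);
    7s and 8s are passives and drop out of the calculation."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Illustrative ratings only.
print(net_promoter_score([10, 9, 8, 7, 6, 5, 9, 10, 3, 8]))  # -> 10.0
```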
Hi Paul, and thanks for your comment. I have not heard of the Net Promoter Score. I do like 0 to 10 scales, particularly for getting a reading of a group during a focus group. For larger, short-turnaround data collections, though, I don't think most respondents "think" in 0–10 scales. That is, while it's not a bad way to go, I think quick-response, short surveys are easier with a shorter scale. If you provide a statement and ask for a rating from Strongly Disagree —> Disagree —> Agree —> Strongly Agree (with or without numbers), people can fairly quickly decide where they fall on the scale. On a 10-point scale, answering becomes a more complex, burdensome process for the respondent, IMHO. My thinking would go like this: "OK, I agree, but not really strongly. If 5 is a midpoint, that leaves 6 and 7, maybe 8, as possible responses, since I am definitely not a 9 or 10 on this issue. Well then, what's the difference between a 6 and a 7, or a 7 and an 8? Who knows, I'll just pick one." I think the 10-point scale may call for a level of precision that is not really necessary, and it probably calls for some cognitive and/or reliability testing. That might be a bit much for smaller nonprofits like the ones for whom I work.
Hi David, yes, there is research backing up that a 7-point scale can perform better, but the length of the scale isn't really my point. The point I was trying to highlight is that maybe what is needed is an "alternative" interpretive framework, rather than abandoning the mid-point. For example, you could agree that a rating of simply "Agree" means the program is doing a fine job, anything below that means improvement is needed, and a rating of "Strongly Agree" means the program is doing great. Instead of reporting averages, you could then report the % of "Strongly Agree" responses (i.e., you're doing great in this area) minus the % of Neutral, Disagree, and Strongly Disagree responses (i.e., you need to improve in this area). I think that stat would help program managers more immediately and accurately understand where participants think their programs are doing well and where they think improvement is needed.
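A minimal sketch of that stat, assuming a five-point scale and hypothetical response counts:

```python
def improvement_stat(counts):
    """Paul's suggested framework: % "Strongly Agree" (doing great)
    minus the combined % of "Neutral", "Disagree", and "Strongly
    Disagree" (needs improvement); plain "Agree" means the program
    is doing a fine job and contributes to neither side."""
    total = sum(counts.values())
    doing_great = counts.get("Strongly Agree", 0)
    needs_work = sum(counts.get(k, 0)
                     for k in ("Neutral", "Disagree", "Strongly Disagree"))
    return 100 * (doing_great - needs_work) / total

# Hypothetical response counts for a single survey item.
print(improvement_stat({"Strongly Agree": 18, "Agree": 50, "Neutral": 20,
                        "Disagree": 8, "Strongly Disagree": 4}))  # -> -14.0
```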
Thoughts?
Hi David: Thanks! This framing of the scale issue is really helpful!
Great post! I swing back and forth on the midpoint issue; usually the topic or item at hand makes the decision.
My biggest concern is that, knowing people may choose the midpoint if they do not want to select a "less than positive" option, removing it means they will instead select the positive side of where the midpoint would have been.
I suppose this just speaks to the importance of including open-ended questions and qualitative studies to dig into the "why" behind these scales. Does anyone have a particular way of getting that important "why" feedback that is working great for them?
Hi Julie, and thanks for the compliment. At the General Commission on Religion and Race of the United Methodist Church (GCORR), we noticed a consistent pattern of "negative" ratings disguised as midpoint "neither positive nor negative" responses. With a lot of our programs on religion and race, we are both literally and figuratively "preaching to the choir." That is, the people who find their way to our webinars and trainings come from churches and broader areas (annual conferences) that already support our work. Therefore, they are sometimes less inclined to be directly critical, even when we want them to be…it's a cultural thing with well-meaning people of faith. I worked with our management to shift to the four-point scale, and we added an open-ended question to encourage feedback: "If you 'Strongly Disagree' or 'Disagree' with any of the statements above, please provide the statement # and indicate what GCORR can do to improve." The other thing we are doing, now that we have a system of survey templates to collect routine feedback, is trying to conduct more in-depth data collections early in the development process, such as focus groups. We don't have the time or resources to redo our work, so getting more in-depth feedback early in the process provides guidance on how we can best serve our stakeholders. I will be presenting on the GCORR evaluation framework at the Eastern Evaluation Research Society Conference in Galloway, NJ (May 5–7) if you are interested.