My name is David Roberts and I’m an independent consultant in evaluation and market research working out of Canberra, Australia. We recently had a discussion on AEA’s LinkedIn group focusing on using scales in surveys. While I am not an expert in analyzing scales, here are some things I have found useful:
Rad Resource – Research from Jon Krosnick: Scroll down on the bio page from this Stanford professor to review summaries of his research on scales and to access study reports.
Hot Tip – Scaling Approach 1: One option for scaling responses is to analyze each individual’s responses and then score each response against the range of that individual’s responses. The simplest way to do so is to treat each individual’s normal responses as varying around 0 and score accordingly. So if one person consistently rates between 4 and 5, a 4 is re-scored as -1 and a 5 is re-scored as +1. Other responses are re-scored in terms of their distance from that individual’s median score. You can then analyze the scores for each question rather than the raw responses. It works better if you use at least a 7-point scale (Krosnick’s work suggests you should do that anyway). You can also use more sophisticated scoring methods based on the range and standard deviation of the individual’s responses, but the utility of such an analysis is marginal for most applications.
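The re-scoring described above can be sketched in a few lines. This is a minimal illustration with made-up data, not the author’s actual procedure: each response is simply re-scored as its distance from that respondent’s own median rating, so a consistently high or low rater is centred on 0. (The -1/+1 example in the tip treats the midpoint of a narrow 4–5 band as the centre; subtracting the median is the simplest version of the same idea.)

```python
from statistics import median

# Hypothetical data: rows are respondents, columns are questions
# answered on a 7-point scale.
responses = {
    "resp_1": [4, 5, 4, 5, 4],   # rates consistently between 4 and 5
    "resp_2": [1, 2, 7, 2, 1],   # rates low, with one outlier
}

def individual_scores(ratings):
    """Re-score each rating as its distance from the respondent's median."""
    m = median(ratings)
    return [r - m for r in ratings]

scored = {who: individual_scores(r) for who, r in responses.items()}
print(scored["resp_1"])  # resp_1's median is 4, so: [0, 1, 0, 1, 0]
```

You would then analyze `scored` question by question, rather than the raw 1–7 responses.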
Lessons Learned: The usefulness of this approach depends on the time you have for analysis and whether you are asking questions where one can expect an individual’s response bias to be reasonably consistent across questions. It allows you to explore nuances within the overall bias and to find areas that are more positive or negative than others. If you have key questions, you can develop baseline response data by asking questions in which one might expect a similar bias to the key questions and then seeing how the responses to the key questions differ from those for baseline questions.
Hot Tip – Scaling Approach 2: I’ve also tried a simpler approach, which is to treat the median response to a single question as 0 and then report the variations around that. (Please note the mean is not useful here – scale responses are ordinal values so the emotional ‘distance’ between 1 & 2 can be very different from the ‘distance’ between 2 & 3.) This approach can make the presentation of data more understandable.
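A minimal sketch of this second approach, again with hypothetical data: the median response to a single question is treated as 0 and each answer is reported as a deviation from it. The mean is deliberately avoided, since the scale points are ordinal.

```python
from statistics import median

# Hypothetical data: all respondents' answers to one question
# on a 7-point scale.
answers = [3, 4, 4, 5, 6, 4, 2]

m = median(answers)                   # 4
deviations = [a - m for a in answers]
print(m, deviations)                  # 4 [-1, 0, 0, 1, 2, 0, -2]
```

Reporting the deviations rather than the raw values can make the spread of responses easier to present, at the cost of obscuring the median itself.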
Lessons Learned: However, the approach is problematic because it assumes that the bias is consistent across the sample, or that the individual biases in a large and diverse group cancel each other out. You may not be able to make either assumption.
Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.
Hi Krystina
I should start by saying that I am dubious about the value of absolute rating scores. I used to work in an agency that consistently got 85-88% satisfaction ratings, but every customer I talked to had strong criticisms of the agency. I simply could not find any way to reconcile the satisfaction ratings with the personal experiences people shared with me. So I have been looking for ways to get meaningful data from ratings.
The reason I talk about the individualised score first is that I prefer it. I probably shouldn’t have treated the second method as a hot tip at all. I have grown wary of using the second method, for the reasons I gave. Bottom line: the second method has most of the disadvantages of scales. The only real advantage is that you can more easily show the distribution around the median score. It has the additional disadvantage that the median score is obscured (the median becomes 0).
In my view the individualised relative score is a better tool for understanding how people rate something. I would use it instead of trying to give an overall satisfaction rating. It gives you a rating for each question asked relative to the other questions asked in the same survey. I came across it in an article by a market researcher but I have lost the reference, sorry.
Absolute scores are often meaningless, particularly satisfaction ratings, because such ratings depend in part on expectations and because each individual has a bias. Young people are generally biased towards the extremes of rating scales, old people towards the middle; optimists and those who want to say positive things are biased towards the positive, and others are biased towards the negative. It also varies with mood. I completed a survey with rating scales the other day. I was in a foul mood and all my responses were negative. If I did it today, I would have been much more positive. Looking at an individual’s scores allows one to see how each response compares to that individual’s bias on the day. (It also gives you an indication of when respondent fatigue sets in – very important when thinking about the value of individual data items.) This approach allows you to say that x% had a more positive (or y% a more negative, and z% a similar) response to this question than to the other questions in the survey. It allows you to find the areas where people have different responses. You can then look at the demographics to analyse which people share those ‘relative ratings’, or use factor analysis to identify segments with particular patterns of responses.
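The reporting step just described (x% more positive, y% more negative, z% similar) can be sketched as follows. The data and the choice of question are hypothetical; for one key question, we count what share of respondents rated it above, at, or below their own median across the survey.

```python
from statistics import median

# Hypothetical data: each respondent's answers to a four-question survey.
responses = {
    "resp_1": [4, 5, 4, 5],
    "resp_2": [2, 2, 3, 1],
    "resp_3": [6, 6, 6, 6],
}
question_index = 1  # the key question of interest

more_pos = similar = more_neg = 0
for ratings in responses.values():
    diff = ratings[question_index] - median(ratings)
    if diff > 0:
        more_pos += 1
    elif diff < 0:
        more_neg += 1
    else:
        similar += 1

n = len(responses)
print(f"{more_pos/n:.0%} more positive, {similar/n:.0%} similar, "
      f"{more_neg/n:.0%} more negative")
```

From there you could cross-tabulate those relative ratings against demographics, or feed the per-respondent score patterns into a factor analysis to identify segments.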
As an aside, there are also issues with using scales to get a handle on behaviours, e.g. frequency of behaviours, number of hours watching TV, that sort of thing. Schwarz and Hippler (1986) showed that people take information from the scales about what is normal, and then frame their response on the basis of where they think they fit in the norm. If your scale goes from 1 hour per week to 10 hours a week, and my scale goes from 3 hours a week to 30 hours a week, the median score for my group will be 3 times the median score for your group. The distribution of responses across the scale points stays the same regardless of the quantum indicated by the scale.
I found this posting to be very interesting and would like to know more about why and when to use such methods. Is there a way to contact the author?