Survey says . . . let the participant decide

Describe your current level of pain using the scale below.

Visual analog pain scale with 10 options, illustrated in different colors. Image by brgfx on Freepik www.freepik.com

Use the scale below to indicate your current level of happiness.

A 3-point happiness scale: three emoji faces, one smiling, one neutral, one unhappy. Image by juicy_fish on Freepik www.freepik.com

Rate your degree of agreement with the following statement using a 4-, 6-, or 11-point scale of your choosing, from strongly disagree to strongly agree.

“I lead a purposeful and meaningful life.” From the Flourishing scale by Diener et al. (2010)

Surveys are the foundation upon which social and behavioral research assesses everything from favorite brands of toothpaste to life satisfaction. Consumer researchers, education professionals, and psychologists from all sub-fields depend on the outcomes of these surveys to identify trends, evaluate knowledge, or diagnose disorders. Because so much of our world relies on what and how people think, developing surveys that are reliable, valid, easy to deploy, fast to complete, and robust to individual quirks of personality, comprehension, and circumstance has become a psychometric discipline in its own right.

Many organizations around the world are dedicated to researching and establishing best practices for survey development and for the quantitative and qualitative assessment of survey responses. Some examples include the American Psychological Association’s testing and assessment section, the Institute of Education Sciences, and International Consumer Research and Testing. Not only do these organizations provide guidelines for best practices, but they also sponsor meetings and workshops where developers of surveys and instruments can share their findings and recommendations.

Although the Psychonomic Society is not an organization dedicated to survey development and assessment, the cognitive sciences are instrumental in developing quality surveys and assessments. Attention, short-term memory, cognitive load, clarity of language, decision-making, and cognitive biases are some of the constructs that influence a respondent’s behavior when interacting with a survey. I first became aware of the intricacies of survey development at a previous Psychonomic meeting, listening to a talk on the magical number of points on a Likert scale that facilitated the most reliable outcomes: 3, 5, 7, or 11. Although I don’t remember the final recommendation specifically (too many years have gone by), I do recall that somewhere between 4 and 6 points for a rating scale produced the most reliable results. This basic knowledge influenced my own survey-based research on student and general-population responses to human-animal interaction programs and on emotional responses shaped by background music and pre-existing biases.

Thus, when I came across the study designed by Tanja Kutscher and Michael Eid (pictured below) and published in the Psychonomic Society’s Behavior Research Methods, I was curious about the current state of scale development.

Authors of the featured article, Tanja Kutscher (left) and Michael Eid (right).

The first insight I gained from their study was that scale development research had advanced significantly since I was first exposed to it. Not only does the number of response options matter, but so do the sample surveyed, the education level, the response format, the construct, previous experience, and whether the scale is assigned to respondents or chosen by them! Additionally, the theory and quantitative methods applied to assessing the optimal format of a scale had advanced to a level that surprised me, although I suppose it shouldn’t have.

The primary aim of their paper was to explore one of the most important drawbacks of surveys: inappropriate category use. Survey respondents are notorious for deviating from the intended use of the response categories, and these deviations can be grouped into response styles. Ordinary response style folks use the response categories roughly as intended. Extreme response style folks prefer the extreme categories. Semi-extreme response style folks favor the two categories at each end of the scale. Finally, non-extreme response style folks stay away from the extremes.
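To make these styles concrete, here is a toy simulation, my own sketch rather than the authors’ latent-class model, in which each style is defined by made-up category-use probabilities on a hypothetical 6-point scale. The probabilities are purely illustrative assumptions:

```python
import random

random.seed(0)
K = 6  # a hypothetical 6-point scale

# Illustrative category-use probabilities for each style (invented, not from the paper)
styles = {
    "ordinary":     [0.05, 0.10, 0.20, 0.30, 0.25, 0.10],  # categories used roughly as intended
    "extreme":      [0.40, 0.05, 0.02, 0.03, 0.10, 0.40],  # endpoints strongly preferred
    "semi-extreme": [0.25, 0.20, 0.05, 0.05, 0.20, 0.25],  # two categories at each end favored
    "non-extreme":  [0.00, 0.20, 0.30, 0.30, 0.20, 0.00],  # endpoints avoided entirely
}

shares = {}  # proportion of simulated ratings landing on an endpoint, per style
for name, weights in styles.items():
    ratings = random.choices(range(1, K + 1), weights=weights, k=1000)
    shares[name] = sum(r in (1, K) for r in ratings) / 1000
    print(f"{name:>12}: endpoint share = {shares[name]:.2f}")
```

Even this crude sketch shows how starkly the same latent opinion can map onto different observed distributions, which is exactly why these styles threaten comparability across respondents.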

This variation in response styles is what biases the validity and reliability of a survey. Response styles can influence construct validity (whether the scale actually measures the intended construct, such as happiness, and does so reliably) and criterion validity (whether the scale correlates with other scales that should be related to the construct of interest). Check out this website for a graphic summarizing the difference between these types of validity.

These response styles are one of the primary topics Kutscher and Eid investigated in their study, in which they examined response-style patterns for three different response scales that were either “given” (administered by the researchers) to respondents or “chosen” (self-selected) by respondents from the general population.

Using the established Flourishing scale by Diener et al. (2010), which typically utilizes a 7-pt scale, Kutscher and Eid created three versions of the response scale: a 4-pt scale, a 6-pt scale, and an 11-pt scale. More than 7,000 respondents residing in the US were recruited from Amazon’s Mechanical Turk (MTurk) platform, paid US$0.50 for their participation, and randomly assigned to one of the three response scales. Following this section of the survey, the respondents completed demographic questions and instruments used to assess criterion validity, including personality traits, self-esteem, and general self-efficacy. The final section of the survey was an opportunity to take the Flourishing scale again, but this time respondents could choose whether they wanted to use the 4-pt, 6-pt, or 11-pt scale. Interestingly, the researchers presented one item at a time, and respondents could select whichever scale they wanted for each item.

The researchers focused on select items to test their primary hypothesis: that more respondents would fall into the ordinary response style in the self-chosen condition than in the given condition. Taking two items from the Flourishing scale, #2, “My social relationships are supportive and rewarding,” and #5, “I am competent and capable in the activities that are important to me,” the researchers examined the percentage of response styles for each scale: 4-pt, 6-pt, & 11-pt.

As seen in the figure below for the responses for “given” scales, three classes were observed for the 4-pt scale and four classes were observed for the 6-pt & 11-pt scales. The resulting classes corresponded to ordinary response styles (Class 1, 40-46% of the sample), extreme response style (Class 2, 14-19% of the sample), range response style (Class 3, 11-35% of the sample), and a style that discriminated between extreme responses (Class 4, 23 & 34% of the sample).

Figure 1 of the featured article. Class-specific category characteristics curves for the flourishing items #2 and #5 in the “given” categories. Class 1 corresponds to ordinary response styles. Class 2 corresponds to extreme response style. Class 3 corresponds to range response style. Class 4 discriminated between extreme responses. Percentages indicate the proportion of the sample exhibiting that style. Numbers next to curves indicate the rating score associated with the shown distribution.

A similar, but tighter, outcome was observed for the self-chosen scale responses. As seen in the figure below for the responses for “self-chosen” scales, three classes were observed for all scales tested (4-pt, 6-pt, and 11-pt). The ordinary response style (Class 1) accounted for 55-58% of the sample, the extreme response style (Class 2) for 23-30%, and the range response style (Class 3) for 13-20%.

Figure 2 of the featured article. Class-specific category characteristics curves for the flourishing items #2 and #5 in the “self-chosen” categories. Class 1 corresponds to ordinary response styles. Class 2 corresponds to extreme response style. Class 3 corresponds to range response style. Percentages indicate the proportion of the sample exhibiting that style. Numbers next to curves indicate the rating score associated with the shown distribution.

The researchers’ second hypothesis concerned criterion validity, specifically correlations with two aspects of personality (openness to experience and neuroticism), self-esteem, and self-efficacy. They found correlations between the flourishing items of interest and each of the individual trait characteristics tested, and these correlations were more consistent across ratings collected in the self-chosen condition than in the given condition. This finding suggests that when researchers give a specific scale to respondents, the number of response options can affect the outcome of the study due to inappropriate category use.
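Why would inappropriate category use weaken criterion correlations? A minimal sketch with entirely synthetic data can illustrate the attenuation. Here I assume, purely for illustration, that an assigned (“given”) scale adds more response-style noise to the recorded rating than a self-chosen scale; the noise levels are made-up parameters, not estimates from the paper:

```python
import random
import statistics

random.seed(1)
n = 500

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# A latent criterion trait (e.g., self-esteem) that both ratings try to track
trait = [random.gauss(0, 1) for _ in range(n)]

# Toy assumption: the "given" condition adds more response-style noise than
# the "self-chosen" condition before a rating is recorded
given_rating = [t + random.gauss(0, 1.5) for t in trait]
chosen_rating = [t + random.gauss(0, 0.5) for t in trait]

r_given = pearson(trait, given_rating)
r_chosen = pearson(trait, chosen_rating)
print(f"criterion correlation, given scale:       {r_given:.2f}")
print(f"criterion correlation, self-chosen scale: {r_chosen:.2f}")
```

Under these assumptions the noisier condition yields a visibly weaker criterion correlation, which mirrors in miniature why more consistent category use would translate into more consistent validity coefficients.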

The researchers ultimately stated that their “study showed that allowing individuals to choose their preferred rating scales significantly improved the psychometric quality of survey data. Compared to predetermined rating scales, self-chosen rating scales lead to more accurate responses, improved construct validity and increased reliability. This suggests that incorporating this flexibility into survey design can effectively address issues related to inconsistencies in respondent behavior due to predetermined response formats, ultimately leading to higher quality data in psychological assessments.”

So, we should all personalize our survey experiences and decide if we like a 4-pt, 5-pt, 6-pt, 8-pt, or 11-pt scale for each item to be evaluated. Or rather, perhaps we should simply ask our participants to pick their own scales.

Featured Psychonomic Society article

Kutscher, T., & Eid, M. (2024). Psychometric benefits of self-chosen rating scales over given rating scales. Behavior Research Methods, 1-25. https://doi.org/10.3758/s13428-024-02429-w

Author

  • Heather Hill is a Professor at St. Mary’s University. She has conducted research on the mother-calf relationship and social development of bottlenose dolphins in human care. She also studied mirror self-recognition and mirror use in dolphins and sea lions. Most recently, she has been studying the social behavior and cognitive abilities of belugas, killer whales, Pacific white-sided dolphins, and bottlenose dolphins in human care. She has also been known to dabble in various aspects of human cognition and development, often at the intersection of those two fields.


The Psychonomic Society (Society) is providing information in the Featured Content section of its website as a benefit and service in furtherance of the Society’s nonprofit and tax-exempt status. The Society does not exert editorial control over such materials, and any opinions expressed in the Featured Content articles are solely those of the individual authors and do not necessarily reflect the opinions or policies of the Society. The Society does not guarantee the accuracy of the content contained in the Featured Content portion of the website and specifically disclaims any and all liability for any claims or damages that result from reliance on such content by third parties.
