Finding the Waldolance among sedans: Verbal cues can guide search for societally important vehicles

Finding an image among other images is a basic task, known as visual search, that has been a staple of the vision scientist’s toolkit for decades. Even if one has no interest in visual search itself, the basic paradigm of searching for a target among an array of images is wonderfully useful for investigating memory, attention, and individual differences.

In a typical visual search experiment, participants are informed of their target by being shown an image of it—like searching for Waldo in a scene. A remarkable (although often overlooked) fact is that we can also simply tell one another what to search for using words.

One can be told to find “the man in a striped shirt” — or, provided the searcher has the right kind of knowledge — to “Find Waldo.” Let’s call these verbal cues.

But why and how does this work? After all, a perceptual cue, even a relatively poor one like the one in the figure below, still shares many visual features with the target and so can directly inform the visual search process.

Credit: Martin Handford

In contrast, there is nothing visual about spoken language (or so it would seem). The word “Waldo” is neither person-shaped, nor striped. “Waldo” doesn’t even have a knit cap. So how is it then that we can ask someone to search for something merely by using a word?

One answer may be that on being verbally told what to search for, the searcher proceeds to compare each object to a memory activated by the word or phrase. For example, asked to search for Waldo, the searcher would compare each character with their memory of what Waldo looks like. Woman in yellow dress… not Waldo. Crusader clutching his head… not Waldo. Red-and-white striped man of ambiguous age… Waldo!

Something like this likely happens on occasion. But as many studies have shown (for example, here, here, and here), verbal cues can also guide search. Just as a visual cue like a picture of Waldo can create perceptual expectations of what to search for, so a verbal cue (a description, or simply the name of a familiar object or person) can create visual expectations that affect where a person first looks when presented with a search display.

One factor determining the extent to which verbal cues provide useful guidance is how perceptually uniform members of a given category are (e.g., see here and here). All else being equal, a cue like “Waldo” should be more effective in guiding visual search than a cue like “man.”

In a recent article published in the Psychonomic Society’s journal Attention, Perception, & Psychophysics, Michael Hout and colleagues studied verbal cueing of a variety of vehicles.

The authors compared people’s search times for “societally important vehicles”—such as ambulances, school buses, and police cars—to their search performance for more conventional vehicles, such as sedans, SUVs, and delivery trucks. Hout and colleagues reasoned that to the extent that societally important vehicles have been designed to be more perceptually uniform, verbal cues—“search for the ambulance”—will guide search more effectively than cues for less perceptually uniform categories such as “sedan”.

On each trial, participants saw a verbal cue (e.g., “an ambulance shown from the front”) followed by a display of 20 images, one of which (the target) matched the description. The non-matching images (distractors) mismatched in a variety of ways. Two of the twenty were street signs, and so fell completely outside the target category. Fifteen were what the authors called “irrelevant distractors”: vehicles that did not match the cued category, e.g., a sedan when the target category was an ambulance. The remaining two were “relevant distractors”: ambulances shown from the side and back instead of from the front.
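To make the display composition concrete, here is a minimal sketch of how one such 20-image trial could be assembled. The category names, viewpoints, and file names are hypothetical stand-ins for illustration, not the authors’ actual stimuli or code:

```python
import random

# Hypothetical stand-ins for the stimulus set; the real experiment used
# photographs of vehicles and street signs.
VIEWS = ["front", "side", "back"]
VEHICLES = {
    cat: {view: f"{cat}_{view}.png" for view in VIEWS}
    for cat in ["ambulance", "school_bus", "police_car",
                "sedan", "suv", "delivery_truck"]
}
SIGNS = [f"street_sign_{i}.png" for i in range(10)]

def build_display(target_cat, target_view, rng):
    """Assemble one 20-image display: 1 target, 2 relevant distractors,
    15 irrelevant distractors, and 2 street signs."""
    target = VEHICLES[target_cat][target_view]
    # Relevant distractors: same category, the two other viewpoints.
    relevant = [VEHICLES[target_cat][v] for v in VIEWS if v != target_view]
    # Irrelevant distractors: the five other categories x 3 views = 15 images.
    others = [img for cat, views in VEHICLES.items() if cat != target_cat
              for img in views.values()]
    irrelevant = rng.sample(others, 15)
    signs = rng.sample(SIGNS, 2)
    display = [target] + relevant + irrelevant + signs
    rng.shuffle(display)
    return target, display

rng = random.Random(1)
target, display = build_display("ambulance", "front", rng)
print(f"Cue: 'an ambulance shown from the front' -> target image {target}")
print(f"{len(display)} images on screen")
```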

Participants searched for the target object and, when they thought they had found it, pressed the spacebar. At this point the images were replaced by numbers, and 2 seconds later participants had to indicate which of two number choices corresponded to the target’s location. This verified that participants had indeed located the object in the earlier display. The figure below illustrates the procedure.

Not surprisingly, societally important vehicles were found faster (by as much as a full second!) and with lower error rates than civilian vehicles. But the question the authors were interested in was not just how well participants could find an ambulance compared to an SUV or sedan (after all, ambulances are designed to be especially noticeable) but whether search for an ambulance was guided more effectively than search for more conventional vehicles. To find out, the authors measured eye movements as participants were searching.

Tracking eye movements allowed the authors to quantify guidance in several ways. The first was simply the amount of time it took participants to first fixate the target. As the first graph below shows, participants were faster to do so when searching for important vehicles. A second measure was whether participants visited different numbers of distractors when searching for important vs. civilian vehicles. Regardless of which kind of vehicle they searched for, participants were about equally likely to fixate a partial match, e.g., an ambulance from the side instead of the front. However, search for important vehicles drew fewer eye movements to irrelevant distractors (civilian vehicles) than search for civilian vehicles did. The last graph shows a similar result using the proportion of distractors fixated.
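For readers who like to see measures spelled out, here is a rough sketch of how such guidance measures could be computed from a fixation sequence. The data format, field names, and toy values are assumptions for illustration, not the authors’ analysis code:

```python
def guidance_measures(fixations, target, relevant, irrelevant):
    """fixations: time-ordered list of (time_ms, item) pairs, one per fixation.
    Returns the time to first target fixation and which distractors drew
    the eyes before the target was found."""
    # Time to first target fixation: the earlier, the better guided the search.
    time_to_target = next((t for t, item in fixations if item == target), None)
    # Items fixated before the target was first fixated.
    pre_target = []
    for t, item in fixations:
        if item == target:
            break
        pre_target.append(item)
    n_relevant = sum(item in relevant for item in pre_target)
    n_irrelevant = sum(item in irrelevant for item in pre_target)
    # Proportion of the available distractors fixated at least once.
    distractors = set(relevant) | set(irrelevant)
    prop_fixated = len(set(pre_target) & distractors) / len(distractors)
    return time_to_target, n_relevant, n_irrelevant, prop_fixated

# Toy sequence: two irrelevant distractors fixated, then the target at 690 ms.
fix = [(180, "sedan_side.png"), (420, "suv_front.png"), (690, "ambulance_front.png")]
print(guidance_measures(fix, "ambulance_front.png",
                        relevant=["ambulance_side.png", "ambulance_back.png"],
                        irrelevant=["sedan_side.png", "suv_front.png"]))
```

On this toy sequence, better guidance would show up as a smaller first value (faster first fixation on the target) and fewer irrelevant-distractor fixations along the way.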

What allows for the greater guidance when searching for important vehicles? One answer may be diagnostic color information. Not all yellow vehicles are school buses, but a yellow vehicle is more likely to be a school bus than a non-yellow vehicle is. In a follow-up experiment, new participants performed the same search task, but with all color information removed. The difference in guidance between important and civilian vehicles now shrank considerably. This result is consistent with the idea that a verbal cue activates visual representations of what Yu and colleagues have called “category-consistent features,” for example, yellowness when searching for a school bus.
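One simple way to strip color from stimuli (not necessarily the authors’ exact procedure) is a luminance-preserving grayscale conversion, sketched here with the Pillow imaging library; the file names are hypothetical:

```python
from PIL import Image  # pip install Pillow

def remove_color(in_path, out_path):
    """Convert a stimulus to grayscale. Pillow's "L" mode uses the ITU-R 601-2
    luma transform (L = 0.299 R + 0.587 G + 0.114 B), so shape and luminance
    contrast survive while diagnostic hues (school-bus yellow, ambulance red)
    are stripped."""
    Image.open(in_path).convert("L").save(out_path)

remove_color("school_bus_front.png", "school_bus_front_gray.png")
```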

That verbal cues can guide search in this way raises a number of interesting questions concerning how language and perception interact. One such question is whether there is a difference between visual and verbal cues. Does it matter if one is cued to find Waldo with a picture of Waldo or just the word? Considering that visual cues actually contain visual content, it is hardly surprising that they tend to be more effective in guiding search. But as shown in an earlier paper by Walenchok, Hout, and Goldinger, the advantage of visual over verbal cueing is surprisingly small, especially considering that spoken words do not actually have any visual content. Moreover, while visual cues that match the target exactly are very helpful, searchers pay a substantial cost when cued imprecisely, for example, with a brown cow when the target is a spotted cow, even when the task calls for searching for any cow. In contrast, verbal cues may allow searchers to “transcend the tyranny of the specific,” enabling efficient search of entire categories.

Article focused on in this blog post:

Hout, M. C., Robbins, A., Godwin, H. J., Fitzsimmons, G., & Scarince, C. (2017). Categorical templates are more useful when features are consistent: Evidence from eye movements during search for societally important vehicles. Attention, Perception, & Psychophysics, 79, 1578-1592. DOI: 10.3758/s13414-017-1354-1.


Author

  • Gary Lupyan’s primary research interest is understanding how language shapes the human mind. To what extent is human cognition actually language-augmented cognition? To answer this question, Gary uses a wide variety of techniques that attempt to manipulate linguistic variables and observe the consequences of these manipulations on putatively nonlinguistic behavior. Gary’s methods include lab-based and crowdsourced behavioral studies, neural network modeling, statistical corpus analyses, and neurostimulation techniques such as transcranial direct current stimulation (tDCS) and transcranial magnetic stimulation (TMS). In addition to understanding effects of language on cognition and perception, Gary is deeply interested in what environmental circumstances led to the emergence of language. Language marked a major transition in the history of life by providing a secondary information transmission medium—culture—and its evolution was, as far as we know, a singular event in the history of life on earth. Gary attended Carnegie Mellon for graduate school, working with Jay McClelland, followed by postdoctoral work in cognitive neuroscience at Cornell University and the University of Pennsylvania. Since 2010 he has been an assistant professor of psychology at the University of Wisconsin-Madison.


