Eating dinner or grandma? Patterns of intonation are crucial to comprehension

Your tone of voice can tell others a lot about what you mean. The intonation you use in a sentence matters: it can help listeners figure out the critical difference between “Let’s eat, Grandma!” and “Let’s eat Grandma!”

This is only one of a number of examples showing just how the same words can have very different meanings depending on intonation. Furthermore, intonation is as fundamental an aspect of language as any other (such as syntax and semantics), so it is important to understand how we learn to use intonation and to understand its use by others.

To understand tone better, we return to the concept of contrasting sounds that we have discussed before here, here, and here. Such contrasts are ubiquitous in all languages, signed as well as spoken. Simple sounds like “b” and “p” mostly differ on a single dimension: when speakers produce the sound “b”, their vocal folds are vibrating, but when they produce the sound “p” they are not. As with any other contrast in language, we can treat intonation generally as existing on a continuum, although it can often be more multidimensional than the b-p contrast.

Intonation, in contrast to individual sounds like “p” and “b”, is something that applies to an entire word or sentence. While intonation is an acoustic property of language that affects the sound of a word, the patterns of intonation that distinguish between the two earlier “grandma” sentences unfold over a longer period of time. In languages like English, intonation is not used to distinguish between words. In other languages like Mandarin Chinese, tones do distinguish between words, as we illustrated in an earlier post with a video.

Intonation is typically measured by looking at the frequencies present in the speech signal. Higher pitch and rising intonation are both signaled by higher frequencies. The most important of these is called F0, or the “fundamental frequency.” Fundamental frequency is a good predictor of whether a word has just been said for the first time in a conversation, whether the speaker is a man or a woman, or an adult or a child. Similarly, patterns of intonation can change from high pitch to low pitch and vice versa or remain flat.
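To make the idea of a fundamental frequency a little more concrete, here is a minimal sketch of one common way to estimate it: find the time lag at which a signal best matches a shifted copy of itself (autocorrelation). Everything in it is an assumption made for illustration, including the function names, the 8000 Hz sample rate, the 75 to 400 Hz search range, and the use of a pure sine tone as a stand-in for speech. It is not the method used in the paper discussed below, and real speech needs a far more robust pitch tracker.

```python
import math


def synth_tone(freq_hz, sample_rate=8000, n_samples=400):
    """Return a pure sine tone as a list of floats (a toy stand-in for voiced speech)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


def estimate_f0(samples, sample_rate=8000, fmin=75.0, fmax=400.0):
    """Estimate F0 in Hz as sample_rate / lag, where lag maximizes the autocorrelation."""
    min_lag = int(sample_rate / fmax)  # shortest period we consider
    max_lag = int(sample_rate / fmin)  # longest period we consider
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: how well the signal matches itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


tone = synth_tone(200.0)   # hypothetical 200 Hz "voice"
f0 = estimate_f0(tone)     # estimated fundamental frequency in Hz
```

For this clean 200 Hz tone the estimate comes out at 200 Hz, because the signal repeats every 40 samples. Running the same estimate on short successive windows of a recording would give the rising and falling F0 track that intonation researchers look at.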

Just like with categorization of pairs of sounds, identifying the intonational pattern we hear and assigning a label to it is difficult for listeners. Even experts may not agree on whether they heard one pattern or another in a sentence. In addition, identifying intonations is incredibly context-dependent, especially because how a speaker says “grandma” depends on everything they said earlier in the sentence. And even once you’ve got all that information, the exact intonation you hear may not be a perfect indicator of what the speaker means.

Relative to other aspects of speech comprehension, intonation has been difficult to study in English. Thankfully, researchers Chigusa Kurumada, Meredith Brown, and Michael Tanenhaus tackled the question of which factors affect whether we think grandma is dinner or whether she is just hungry in a paper just out in the Psychonomic Bulletin and Review.

Source: San Diego Zoo.

The researchers started with the premise that intonation exists on a continuum. This property means that it is possible to gradually turn sentences said with one particular intonation into another intonation. Kurumada and colleagues started with a sentence like “It looks like a zebra”, and mixed the audio properties with a recording containing the same words, but with a completely different meaning such as “It LOOKS like a zebra” (but it isn’t, it’s an okapi, which looks surprisingly similar to a zebra at first glance).

If these two end points seem hard to distinguish in your head, you can try saying both of them out loud, and also listen to contrasts between the most extreme examples here and here. More specifically, the value of the fundamental frequency (F0) discussed earlier moves up and down over time in different ways depending on what kind of meaning the speaker might be trying to convey. In fact, the two most extreme examples (line 1 and line 12 in the figure below) almost have an opposite pattern to each other.

By mixing these two intonation patterns together, the researchers created a twelve-step continuum between the neutral form of the sentence and a form with a very strong intonation that carries a specific meaning. This mixing pattern is reproduced below. The difference between the dark blue line (step 1, highly similar) and the dark red line (step 12, mixing maximally dissimilar intonations) is obvious: the values on one line are almost the opposite of those on the other.
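One simple way to picture such a continuum is as a weighted blend of two F0 contours. The sketch below assumes plain linear interpolation between two contours sampled at the same time points. That is an assumption for illustration only, since this post does not describe how the audio was actually resynthesized, and the contour values (in Hz) and the function name are hypothetical.

```python
def make_continuum(contour_a, contour_b, n_steps=12):
    """Blend two equal-length F0 contours into n_steps contours, from all-A to all-B."""
    if len(contour_a) != len(contour_b):
        raise ValueError("contours must be sampled at the same time points")
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    steps = []
    for k in range(n_steps):
        w = k / (n_steps - 1)  # weight on contour_b: 0.0 at step 1, 1.0 at the last step
        steps.append([(1 - w) * a + w * b for a, b in zip(contour_a, contour_b)])
    return steps


# Hypothetical F0 values in Hz, invented for this example (not taken from the paper).
neutral = [200.0, 210.0, 190.0, 180.0, 220.0]      # "It looks like a zebra"
contrastive = [200.0, 260.0, 200.0, 170.0, 150.0]  # "It LOOKS like a zebra"

continuum = make_continuum(neutral, contrastive)
```

Here `continuum[0]` is the neutral contour, `continuum[11]` is the contrastive one, and the ten contours in between move gradually from one to the other.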

For the first experiment, Kurumada and colleagues asked whether there was a relationship between where an utterance fell on the continuum above and whether participants thought a speaker was talking about something familiar (the zebra) or something novel (the okapi) in a two-alternative forced choice task. Participants listened to different recordings of “It looks like a zebra” with different intonations while looking at a picture of a zebra and a picture of an okapi. When they heard the recording, participants had to click on the picture that they thought the speaker was talking about. The more a sentence sounded like “It LOOKS like a zebra”, the more they responded that the speaker was talking about the okapi (and not the zebra).

In the second experiment, the team then asked whether context matters. As we have seen, it can be difficult to make judgments about what tone a sentence has, but maybe more context, experience, and especially feedback can help. In this experiment, there was an initial training phase where Kurumada and colleagues provided participants with the same sentences as before, but immediately after making a decision to click on either the zebra or the okapi, participants heard a continuation of the sentence that gave them feedback as to which meaning the speaker had intended. These two meanings could indicate the zebra (“because it has black and white stripes all over its body”) or the okapi (“because it only has stripes on its legs”).

The feedback was structured to change what was a good-enough match to the okapi meaning versus the zebra meaning. There were two conditions: If participants were in the no-shift condition, they heard continuations that were consistent with the initial sentence such as “It looks like a zebra and it is, because…” or “It LOOKS like a zebra and it isn’t, because…” In the negative-shift condition, by contrast, participants heard meanings that were inconsistent with the initial information: “It LOOKS like a zebra and it is” or “It looks like a zebra and it isn’t.” Exposure to this incongruent information should make listeners more likely to entertain the less typical interpretations.

To see whether feedback on what the speaker probably meant changed how listeners interpreted intonation, the authors looked at how exposure to consistent versus inconsistent feedback affected judgments of ambiguous items (i.e., steps 6, 7, and 8 in the continuum) in the test phase of the experiment. Unlike in the exposure phase, on test trials listeners only had to make a judgment and never heard the continuation.
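As a rough illustration of what such a comparison involves, the sketch below tallies how often the okapi was chosen at each continuum step and then averages over the ambiguous steps. The trial format, the function names, and the toy responses are all invented for this example. They are not the authors' data or analysis, and they say nothing about the direction of the actual results.

```python
from collections import defaultdict


def okapi_proportion_by_step(trials):
    """Map each continuum step to the proportion of trials on which the okapi was clicked."""
    counts = defaultdict(lambda: [0, 0])  # step -> [okapi clicks, total trials]
    for step, choice in trials:
        counts[step][1] += 1
        if choice == "okapi":
            counts[step][0] += 1
    return {step: okapi / total for step, (okapi, total) in sorted(counts.items())}


def mean_over_steps(proportions, steps=(6, 7, 8)):
    """Average the okapi proportion over the given (by default, ambiguous) steps."""
    present = [proportions[s] for s in steps if s in proportions]
    if not present:
        raise ValueError("no trials found for the requested steps")
    return sum(present) / len(present)


# Toy (step, clicked picture) pairs, made up purely to show the bookkeeping.
toy_trials = [
    (1, "zebra"), (6, "zebra"), (6, "okapi"), (7, "okapi"),
    (7, "okapi"), (8, "okapi"), (8, "zebra"), (12, "okapi"),
]

proportions = okapi_proportion_by_step(toy_trials)
ambiguous_mean = mean_over_steps(proportions)
```

Computing `ambiguous_mean` separately for listeners in each feedback condition would give one number per condition, and the difference between those numbers is the kind of shift the experiment was designed to detect.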

Kurumada and colleagues found that 30 trials of incongruent feedback did indeed change listeners’ behavior. Hearing an atypical combination of intonations led to changes in when people thought the speaker was talking about the okapi or the zebra. Thus, something that looked like a zebra was more likely to be understood as an okapi when the continuations clarified the meaning for several trials. This demonstrates how flexible the categories are that we use to process different patterns of intonation.

This work is among the very first to look at how brief but consistent exposure to a particular pattern of intonation in English can change perception of that pattern, or at least how people use the acoustic form of what they hear to categorize those signals.

Since intonation plays such an important role in communication (both spoken and written, if you use capital letters or exclamation points like I do), it will be exciting to see how this line of research develops. It is particularly important to discover how we manage to learn patterns of intonation by generalizing across speakers and contexts at the same time as we adapt when we encounter new ones.

And in the meantime, you can hone your intonation skills using this training video:

Featured article in this post:

Kurumada, C., Brown, M., & Tanenhaus, M. K. (2017). Effects of distributional information on categorization of prosodic contours. Psychonomic Bulletin & Review. DOI: 10.3758/s13423-017-1332-6.

Author

  • Cassandra Jacobs is a graduate student in Psychology at the University of Illinois. Before this, she was a student of linguistics, psychology, and French at the University of Texas, where she worked under Zenzi Griffin and Colin Bannard. Currently she is applying machine learning methods from computer science to understand human language processing under the direction of Gary Dell.


