How the voices around us shape our own speech

Have you ever visited another country or region and found your own speech drifting toward the local accent? Or perhaps you live far from home, and your speech shifts back whenever you return? This happens to me every time I visit my family in Baltimore, Maryland: I suddenly sound like Tracy Turnblad’s parents in Hairspray.

Two people engaged in conversation, most likely aligning their accents. Source: pexels.com.

Our speech often converges toward that of the people around us, and this process begins at a very young age. Some children show strong speech patterns that reflect the adults in their community. Take, for example, 8-year-old Jackson from Indiana, who talks about his love of tractors in this video. Jackson has picked up a particular regional cadence and lexicon because that’s what surrounds him.

Vocal convergence can happen at higher levels, like word choice and grammar, as well as at lower levels, like the acoustic-phonetic properties of speech. One theory suggests that we converge our speech to conform with our conversation partner, building social attunement through talk. But what if this convergence instead results from internal processes? That is, perhaps speech perception directly influences speech production in the moment of speaking, because perception and production share mechanisms that shape the structure of our speech in real time. This is the question that Bradshaw and colleagues (pictured below) investigated in a recent paper in Psychonomic Bulletin & Review.

Authors of the featured article “Sensorimotor learning during synchronous speech is modulated by the acoustics of the other voice.” Abigail R. Bradshaw (first), Emma D. Wheeler (second), Carolyn McGettigan (third), Daniel R. Lametti (last).

The researchers examined vocal convergence using a synchronous speech task. Saying the same thing at the same time is common in certain settings, like stadiums and places of worship, where people may adapt their speech rate, loudness, pitch, and so on to the speech they hear around them.

A stadium crowd cheering in synchrony. Source: pexels.com.

For the current study, the authors looked at perturbations in formants, the resonant frequencies of the vocal tract that give each vowel its identity. As vowel formants shift, the sound morphs into a different vowel category: for example, changing the F1 and F2 values in the word “head” can make it sound like “had” (a minimal synthesis sketch after the list below illustrates this). In this study, the researchers used 50 sentences spoken by a female speaker and manipulated the formants into two conditions:

  • Condition 1: F1 increased, F2 decreased
  • Condition 2: F1 decreased, F2 increased
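
To make the “head”/“had” example concrete, here is a minimal sketch assuming a crude source-filter model in Python with NumPy and SciPy. The formant values are rough textbook approximations, not the study’s stimuli (the study manipulated recorded sentences, not synthetic vowels).

```python
# A minimal sketch, assuming a crude source-filter model; formant values
# below are rough textbook approximations, not the study's stimuli.
import numpy as np
from scipy.signal import lfilter
from scipy.io import wavfile

FS = 16000  # sample rate (Hz)

def resonator(signal, freq, bandwidth, fs=FS):
    """Pass a signal through a two-pole resonance centered at `freq` Hz."""
    r = np.exp(-np.pi * bandwidth / fs)        # pole radius from bandwidth
    theta = 2 * np.pi * freq / fs              # pole angle from center frequency
    a = [1.0, -2 * r * np.cos(theta), r ** 2]  # all-pole filter coefficients
    return lfilter([1.0], a, signal)

def vowel(f1, f2, duration=0.5, f0=120):
    """Crude vowel: a glottal pulse train shaped by resonances at F1 and F2."""
    n = int(duration * FS)
    source = np.zeros(n)
    source[:: FS // f0] = 1.0                  # impulse train at pitch f0
    out = resonator(resonator(source, f1, 80), f2, 120)
    return (out / np.max(np.abs(out))).astype(np.float32)

# Approximate formants for "head" (/ɛ/) vs. "had" (/æ/):
wavfile.write("head.wav", FS, vowel(f1=600, f2=1900))
wavfile.write("had.wav",  FS, vowel(f1=850, f2=1700))
```

Raising F1 and lowering F2 moves the /ɛ/-like vowel toward /æ/, the same direction as the Condition 1 manipulation.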

Participants read the sentences aloud in time with one of these manipulated voices. After a baseline period, they began to hear their own speech through altered real-time feedback, with F1 increased and F2 decreased. In response to such a perturbation, speakers often adapt their speech to correct for the difference; here, adaptation would mean lowering F1 and raising F2 to offset the shift in the feedback.
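
As a toy illustration of that logic, here is a short Python sketch. The formant values and shift sizes are hypothetical, chosen for readability; the study’s actual perturbation magnitudes are reported in the paper.

```python
# Hypothetical formant values and shift sizes, for illustration only.
baseline = {"F1": 550.0, "F2": 1800.0}   # a speaker's produced formants (Hz)
perturb  = {"F1": 1.25,  "F2": 0.80}     # feedback shift: F1 up, F2 down

# What the participant hears through headphones during perturbed trials:
feedback = {f: hz * perturb[f] for f, hz in baseline.items()}

# Full adaptation would shift production the opposite way, so that the
# altered feedback lands back on the intended auditory target:
adapted = {f: hz / perturb[f] for f, hz in baseline.items()}

print(feedback)  # {'F1': 687.5, 'F2': 1440.0} -> sounds shifted off-target
print(adapted)   # {'F1': 440.0, 'F2': 2250.0} -> lower F1, higher F2
```

In practice, speakers compensate only partially, which is why the size of the adaptation effect, rather than its mere presence, is the measure of interest.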

Participants who heard the Condition 2 speaker (the “congruent” group) showed significantly more adaptation than those who heard the Condition 1 speaker (the “incongruent” group). That is, the speaker with lower F1 and higher F2 matched the direction of adaptation, so those participants adapted more to the perturbation in feedback. Conversely, when participants heard the speaker with higher F1 and lower F2, they showed a weaker adaptation effect.

Illustration of experimental manipulations in the study.

This difference in adaptation effects suggests that vocal convergence and speech motor adaptation both operate on acoustic targets—and those targets are flexible based on experience with other voices.

Bradshaw commented,

“Our findings underscore the importance of studying the phenomenon of speech motor learning ‘outside the vacuum’ of repetitive solo speech, in more naturalistic speaking contexts involving interactions with other voices (the central reason for which we speak).”

This research is an exciting step toward fine-tuning theories of speech production, and one step closer to understanding synchronous speech… something that Garth and Kat could work on.

Featured Psychonomic Society article

Bradshaw, A.R., Wheeler, E.D., McGettigan, C., & Lametti, D.R. (2024). Sensorimotor learning during synchronous speech is modulated by the acoustics of the other voice. Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-024-02536-x

Author

  • Brett Myers

    Brett Myers is an Assistant Professor in the Department of Communication Sciences and Disorders at the University of Utah. He received his doctorate from Vanderbilt University, where he studied with Duane Watson and Reyna Gordon. His research investigates planning processes during speech production, including parameters related to prosody, and their role in neural models of motor speech control.


