We all take speech for granted. We are able to say things to others without thinking about how we do that. We may struggle to know what to say when we are left speechless, but once we gather our thoughts, we can utter them without difficulty.
Once you consider speech production more carefully, however, it reveals its full complexity. In addition to the obvious need for sophisticated motor control processes, speech also involves various psycholinguistic components. After all, we need to compose the sentences that our vocal apparatus then turns into speech.
To date, research on speech production has often focused on either the motor aspects or the psycholinguistic aspects, with little integration between them. This is beginning to change, and researchers have now developed integrated theories of speech production. One such theory was recently presented by researchers Grant Walker and Gregory Hickok in the Psychonomic Bulletin & Review.
Their model is known as SLAM, which stands for “semantic – lexical – auditory – motor model”. As the name implies, the model combines a number of different strands of research to explain speech production.
Walker and Hickok’s model is too complex to be fully described in a blog post; however, we can explore some of the inner workings of the model to give a flavor of how it explains speech production. One intriguing aspect of the model is that it aims to capture speech errors in aphasic patients in order to explain normal speech production. People with aphasia have difficulties with speech production, which may range from an occasional difficulty finding the right words to a complete loss of the ability to speak. There is a long tradition in cognitive science of inferring the workings of a functioning system from the way in which it fails. For example, the study of amnesia has enabled researchers to infer how unimpaired memory might be structured.
Walker and Hickok likewise used deficits in aphasia to infer how normal speech is produced. They first presumed that speech representations are encoded (in part) in two parts of the brain, namely in sensory-auditory cortex and in motor cortex. There is plenty of evidence, for example from brain imaging, that unimpaired speech involves a link from lexical knowledge to an auditory representation, and there is also plenty of evidence that auditory and motor representations interact rather directly. The need for a second pathway from lexical to motor systems, in addition to the auditory involvement, arises from the fact that even patients who have grave difficulty linking new auditory representations to speech are nonetheless capable of generating familiar material. Those patients are often unable to repeat a nonword out loud (such as “turping” or “novolt”), thereby revealing their difficulty with linking new auditory representations to speech. But at the same time, those patients can engage in fluent speech based on their existing lexical knowledge, thereby revealing a further link from lexical knowledge directly to the motor system.
Based on a variety of additional facts, such as that comprehension deficits in aphasia tend to recover more than production deficits, Walker and Hickok were furthermore able to stipulate that the lexical–auditory links are always stronger than the lexical–motor mappings.
How well do those assumptions work?
Walker and Hickok embodied their assumptions in a computer simulation of a picture-naming task. In the picture-naming task, people are shown a picture of an everyday object and are asked to name the object out loud. In the simulation, a naming trial begins not with a picture but with a boost of activation delivered to semantic representations that capture the meaning of the object being presented. Those activations then cycle between semantic and lexical representations, until one of the lexical representations emerges as the most activated unit. From there, after some more feedback cycles, the most active phonological representations are selected to commence output of the word. Production errors can occur due to the influence of noise as activation levels decay over time.
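To give a feel for what such a simulation involves, here is a minimal sketch of the semantic-to-lexical part of a naming trial. It is not Walker and Hickok's implementation: the function name `name_picture`, the toy lexicon, and all parameter values (`weight`, `decay`, `noise_sd`, `steps`) are assumptions made up for illustration, and the phonological selection stage is left out entirely.

```python
import random

def name_picture(target, lexicon, weight=0.1, decay=0.5,
                 noise_sd=0.01, steps=8, seed=0):
    """Toy spreading-activation trial (hypothetical parameters, not SLAM's)."""
    rng = random.Random(seed)
    features = sorted({f for feats in lexicon.values() for f in feats})
    sem = {f: 0.0 for f in features}   # semantic unit activations
    lex = {w: 0.0 for w in lexicon}    # lexical unit activations

    # A trial starts with a boost to the target's semantic features.
    for f in lexicon[target]:
        sem[f] = 1.0

    for _ in range(steps):
        # Each unit keeps a decayed share of its activation, receives
        # weighted input from the units it is connected to, plus noise.
        new_lex = {
            w: (1 - decay) * lex[w]
               + weight * sum(sem[f] for f in feats)
               + rng.gauss(0, noise_sd)
            for w, feats in lexicon.items()
        }
        new_sem = {
            f: (1 - decay) * sem[f]
               + weight * sum(lex[w] for w, feats in lexicon.items() if f in feats)
               + rng.gauss(0, noise_sd)
            for f in features
        }
        sem, lex = new_sem, new_lex

    # The most active lexical unit is selected as the response.
    return max(lex, key=lex.get)

# Hypothetical three-word lexicon; feature sets are invented for the example.
toy_lexicon = {
    "cat": {"animal", "pet", "feline"},
    "dog": {"animal", "pet", "canine"},
    "mat": {"flat", "floor"},
}
response = name_picture("cat", toy_lexicon, noise_sd=0.0)
```

With the noise switched off, the target always wins the competition. Raising `noise_sd` relative to the connection weight lets competitors such as "dog" occasionally overtake the target, which is the sense in which noise and decay give rise to naming errors in a model of this kind.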
Using this simulated production process, the model was applied to picture-naming data from 255 aphasic patients who exhibited 5 different types of aphasia (e.g., Broca’s vs. Wernicke’s and so on). The question of interest was whether the model would manage to capture the various patterns of production errors of the patient population.
The results showed that the model performed very well, and better in fact than a precursor model (the SP model developed by Foygel and Dell). To illustrate, consider the data in the figure below:
Of greatest interest are the green and burgundy bars which represent the data from a particular patient (green) and the SLAM predictions (burgundy). It is clear that the model captures the pattern in the data: About 60% of the time the model produces the correct word in response to a picture (e.g., “cat” in response to a picture of a cat). In a further 20% of trials the model—like the patient—produces a neologism; that is, a newly created utterance that is not a word (e.g., “cak”). The other errors are also captured by the model. Overall, SLAM outperformed previously proposed computational models.
The SLAM model is thus a viable contender to explain speech production not only in aphasic patients but also, by implication, in unimpaired people. The SLAM model falls into the broad class of “dual-route” models, which are models that explain behavior by positing separate but interacting processing streams. In SLAM, there is the lexical-auditory stream and there is the lexical-motor stream, with the former being stronger than the latter.
We may take speech for granted, but whenever we say anything, we engage a complex system that involves multiple neural structures. The redundancy of having both an auditory and a motor route is necessary for speech to be learned in the first place (because without an auditory representation that can provide a target for the motor system to mimic during speech development, we could not learn to talk), and it also provides us with some protection against complete failure even if we acquire a form of aphasia.