Filling-in words but in-filling music: Differentiating domains in short-term memory

Language and music unfold in time in similar ways. Just as we do not produce arbitrary sequences of words, music follows a set of principles, and notes that fail to follow them can sound “ungrammatical.” Both language and music also often show hierarchical structure: speech and music tend to fall into what feel like phrases, or “chunks” that stick together. But how do we keep track of it all? How can we remember chunks and sequences?

Chunking a sequence of sounds, pictures, or events requires remembering what has happened. To keep track of these events, we use some kind of short-term memory (STM) that allows us to store not just events but also the order between them. Some researchers have proposed that we use the same cognitive architectures to remember sequences of visual information as well as verbal information. Evidence that the same short-term memory abilities are used across tasks comes from errors in immediate serial recall.

Errors in immediate (verbal) serial recall, in which participants recall a list of words in the order they heard them, show a couple of interesting patterns. First, people tend to recall items from the beginning and end of a list better than items presented in the middle (primacy and recency effects, respectively).

Second, people often make ordering errors of two types. Consider the following list of items: cat, orange, star, balloon, clock, tree. Both types begin when a participant recalls a word too early, for example “star” in the “orange” position. In a “fill-in transposition,” the participant then recalls the skipped word immediately afterwards, so they might say: cat, star, orange, balloon, clock, tree. In an “infill transposition,” the participant instead carries on with the word that followed the early one and returns to the skipped word only later, such as: cat, star, balloon, clock, orange, tree. Fill-in transpositions are about twice as likely as infill transpositions. Below is a schematic showing fill-in and infill transpositions in verbal serial recall.
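For readers who think in code, the two error types can also be written as a small Python sketch. This is only an illustration built on the cat/orange/star example, not the scoring procedure used in any particular study. The function name `classify_transpositions` is hypothetical, and the rule that an early recall only counts when the previous response was in its correct position is a simplifying assumption.

```python
def classify_transpositions(presented, recalled):
    """Label what follows each early recall as 'fill-in' or 'infill'.

    Assumption: an early recall is an item recalled exactly one position
    too soon, directly after a correct response (or at the list start).
    """
    labels = []
    for pos in range(len(recalled) - 1):
        previous_ok = pos == 0 or recalled[pos - 1] == presented[pos - 1]
        recalled_early = (
            pos + 1 < len(presented) and recalled[pos] == presented[pos + 1]
        )
        if not (previous_ok and recalled_early):
            continue
        next_response = recalled[pos + 1]
        if next_response == presented[pos]:
            # The skipped item is recalled straight away.
            labels.append("fill-in")
        elif pos + 2 < len(presented) and next_response == presented[pos + 2]:
            # Recall carries on with the item after the early one.
            labels.append("infill")
    return labels


study_list = ["cat", "orange", "star", "balloon", "clock", "tree"]
fill_in = ["cat", "star", "orange", "balloon", "clock", "tree"]
infill = ["cat", "star", "balloon", "clock", "orange", "tree"]

print(classify_transpositions(study_list, fill_in))
print(classify_transpositions(study_list, infill))
```

Counting these labels over many trials would give a fill-in to infill ratio of the kind discussed in this post.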

Some evidence suggests that these effects are the same with visual materials such as pictures. Serial position effects (primacy and recency) look very similar to those in verbal recall, and the swapping errors seen above occur at similar rates for pictures as for words. This suggests that short-term memory is at least somewhat domain-general (task-agnostic).

Despite the parallels between visual and verbal information, relatively little is known about other domains of serial processing, and it is unclear whether there are task-specific components. For this reason, we can again turn to the similarities and differences between music and language. Music unfolds over time and contains chunks just like language does, but it uses different sources of knowledge. Because everyone speaks a language, but not all people play instruments, musicians may show different (better) short-term memory abilities for musical sequences than non-musicians. If this is the case, then some aspects of short-term memory may, after all, be domain specific.

The broader question becomes: do we rely on similar cognitive abilities to chunk both music and language?

Researchers Simon Gorin, Pierre Mengal, and Steve Majerus have provided us with new insights about musical and verbal short-term memory in a recent study published in the Psychonomic Society journal Memory & Cognition. Importantly, the authors put music and verbal production on a level playing field by developing a novel musical task that is more closely related to verbal short-term memory tasks than those used in previous studies.

The researchers used a verbal (number repetition) and a tone production task. In the verbal task, participants heard lists of increasing length, from six to nine numbers, that they then reproduced by placing cards on a desk into the order of presentation. In the tone production task, participants instead used a computer screen and clicked on tones, displayed from lowest to highest, in the order of presentation (see figure below). The order in which participants clicked on the tones is conceptually similar to the order the cards were sorted into in the verbal task.

To assess the importance of serial position (primacy and recency effects), the authors counted up the proportion of items that were correct in each position. For errors, the authors looked at both how far away a response was from its actual position (that is, the distance of displacement) and how many fill-in and infill errors they saw in the verbal and music tasks.
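The first two measures can be illustrated with a minimal Python sketch. It assumes each trial is a pair of equal-length lists (presented order, reconstructed order) with no repeated items, as in an order-reconstruction task. The function names `position_accuracy` and `displacements` and the toy data are made up for illustration and are not taken from the study.

```python
def position_accuracy(trials):
    """Proportion of trials with the correct item at each serial position."""
    list_length = len(trials[0][0])
    correct = [0] * list_length
    for presented, recalled in trials:
        for pos in range(list_length):
            if recalled[pos] == presented[pos]:
                correct[pos] += 1
    return [count / len(trials) for count in correct]


def displacements(presented, recalled):
    """Signed distance between where each item was placed and where it belonged.

    0 means correct; negative means placed too early; positive, too late.
    """
    original_position = {item: pos for pos, item in enumerate(presented)}
    return [pos - original_position[item] for pos, item in enumerate(recalled)]


# Two toy trials on a four-item list: one perfect, one with a swap.
toy_trials = [
    ([1, 2, 3, 4], [1, 2, 3, 4]),
    ([1, 2, 3, 4], [1, 3, 2, 4]),
]

print(position_accuracy(toy_trials))
print(displacements([1, 2, 3, 4], [1, 3, 2, 4]))
```

Plotting the output of `position_accuracy` against position gives a serial position curve, and tallying the values from `displacements` across trials gives a distribution of how far misplaced items travel.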

Gorin and colleagues found that musicians typically outperformed non-musicians. The drop in performance as a function of list length seemed to differ between verbal and musical sequences, with musicians showing a slightly different forgetting pattern than non-musicians (see the two figures below; the Y-axis shows proportion correct).

In terms of accuracy within a list, the serial position curves differed considerably between music and verbal recall. The canonical U-shaped curve with primacy and recency was found for verbal information, but the effect was much smaller for sequences of musical notes and only held for the longest lists (5 and 6 tones), whereas the verbal serial position effect was present even for the shortest lists (6 items).

The patterns of swaps and other ordering errors were also very different between modalities. In the verbal recall task, people produced more than twice as many fill-in errors compared to in-fill errors. The same was true for non-musicians in the music production task. By contrast, musicians did not produce more infill than fill-in errors for the tone production task, suggesting that learning to produce music changes the strategies available to those participants for short-term memory for music.

Overall, the results obtained by Gorin and colleagues show that verbal and musical short-term memory rely on similar processes, at least in non-musicians. At the same time, musicians show an overall processing advantage and seem less prone than non-musicians to the same kinds of errors when reconstructing musical sequences, which suggests that short-term memory can be somewhat domain specific. These differences might arise because musicians process musical sequences differently, perhaps as sequences of relations between sounds, which could change how the sounds are represented in memory relative to non-musicians.

This study is one of the first to really highlight how seemingly very similar input (music and language) may rely on different types of processing, even though they may also share cognitive resources.

Psychonomic Society article featured in this post:

Gorin, S., Mengal, P., & Majerus, S. (2018). A comparison of serial order short-term memory effects across verbal and musical domains. Memory & Cognition, 46, 464-481. DOI: 10.3758/s13421-017-0778-0.

Author

Cassandra Jacobs

Cassandra Jacobs is a graduate student in Psychology at the University of Illinois. Before this, she was a student of linguistics, psychology, and French at the University of Texas, where she worked under Zenzi Griffin and Colin Bannard. Currently she is applying machine learning methods from computer science to understand human language processing under the direction of Gary Dell.

1 Comment

Matthew Hearne says:

October 31, 2018 at 11:05 am

Thank you for this post, as I am a musician as well as an undergraduate student in Psychology.
I am confused about a passage in which you report the fill-in: infill ratios for both participants: “The patterns of swaps and other ordering errors were also very different between modalities. In the verbal recall task, people produced more than twice as many fill-in errors compared to in-fill errors. The same was true for non-musicians in the music production task. By contrast, musicians did not produce more infill than fill-in errors for the tone production task,” In the report of non-musicians you give the ratios, but, in the report of musician errors it seems to me that the two are in agreement. This perception I have seems to be what was reported in the abstract of this study: “Serial order errors in both tasks were characterized by similar transposition gradients and ratios of fill-in:infill errors. These effects were observed for both participant groups, although the transposition gradients and ratios of fill-in:infill errors showed additional specificities for musician participants in the musical task.” Would you explain this to me in a way that I could remedy my conflicting apprehension of the two reports?