Can a “fish” be friends with a “dish”? A new open-access resource for phonological network research

A common trope in high school movies involves the protagonist being given a tour of the various student cliques, where distinctive groups like the rowdy football players, the fashionable mean girls, and the school bullies are all seated at different cafeteria tables and looking unfriendly to anyone not in their group.

While reality is more nuanced than that, members of social groups do tend to resemble one another. In fact, social network analysis can identify clusters of people with similar characteristics, influential individuals with numerous connections within and across groups, and individuals with fewer and less diverse connections.

Perhaps more surprisingly, language itself can be analyzed in a similar way, by treating the words in a lexicon as if they were “friends” in a social network. John Alderete, Sarbjot Manni, and Paul Tupper (pictured below) developed a series of phonological similarity networks. In their recent paper published in Behavior Research Methods, a journal of the Psychonomic Society, they describe how they built these networks and how the networks can help advance psycholinguistic research.

*John Alderete (left), Sarbjot Manni (middle), and Paul Tupper (right), authors of the featured study.*

Friendly words to share openly

Beyond studying relationships between people, network analysis can be applied to any set of interconnected nodes. The same approach can be used to analyze, for example, psychometric data or geographically distributed power grids.

Although psycholinguistic research has used this approach, few open-access network datasets exist, and this gap was one of the authors' main motivations for their study. Specifically, they set out to build phonological similarity networks and to characterize their network properties.

But how does one treat psycholinguistic data points as if they were “friends” on a social network? What would it mean for a word to have many or few “friends”? What would a clique mean for a group of words?

To address these questions, the authors focused on the sounds of words rather than their written forms. They started with a well-established corpus of English words based on film and television subtitles (SUBTLEX-US, freely available through the Psychonomic Society), transformed the written words of the corpus into phonological representations (i.e., the “sounds” of the words), and then created a list of neighbors for each word. A neighbor was a similar word that differed only by the addition, deletion, or substitution of a single phoneme.

To make this more concrete, take, for example, the written word “fish.” After converting it to a phonological representation (using the International Phonetic Alphabet), the written “f” corresponds to the sound /f/, the written “i” corresponds to the sound /ɪ/, and the written “sh” corresponds to the sound /ʃ/. Thus, the sound of the full word can be written as /fɪʃ/, and one of its neighboring nodes would be /dɪʃ/ (which represents the sound of the written word “dish”).
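The neighbor rule can be written as a one-phoneme edit check. The Python below is a minimal sketch, not the authors' released code: it assumes each word has already been transcribed into a tuple of IPA symbols (one symbol per phoneme), and the function name is_phonological_neighbor is hypothetical.

```python
def is_phonological_neighbor(a: tuple[str, ...], b: tuple[str, ...]) -> bool:
    """Return True if b differs from a by exactly one phoneme
    addition, deletion, or substitution (hypothetical helper)."""
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position holds a different phoneme.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Addition/deletion: dropping one phoneme from the longer
    # sequence must yield the shorter one.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


# Assumed transcriptions, one IPA symbol per phoneme.
fish = ("f", "ɪ", "ʃ")
dish = ("d", "ɪ", "ʃ")
fishy = ("f", "ɪ", "ʃ", "i")
fist = ("f", "ɪ", "s", "t")

print(is_phonological_neighbor(fish, dish))   # True: /f/ substituted by /d/
print(is_phonological_neighbor(fish, fishy))  # True: /i/ added
print(is_phonological_neighbor(fish, fist))   # False: /ʃ/ -> /s/ plus an added /t/
```

Storing phonemes as tuples rather than plain strings matters for multi-character symbols such as the diphthong /oʊ/, which should count as a single phoneme.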

Returning to our social network analogy, /fɪʃ/ and /dɪʃ/ would be “friends.” They would likely have other friends too, such as /wɪʃ/ (“wish”), and all of them would belong to a larger clique of words ending in the sounds /ɪʃ/. By the same logic, a long word like “neurotransmitter” (/nʊroʊtrænsmɪtər/) would likely have far fewer friends than “fish” (/fɪʃ/).
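Counting friends amounts to building an adjacency list and reading off each word's degree. The sketch below assumes a tiny hypothetical lexicon with hand-written transcriptions (not drawn from SUBTLEX-US) and compares every pair of words with the same one-phoneme rule.

```python
from itertools import combinations


def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by one addition,
    deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


# Hypothetical mini-lexicon; each word maps to an assumed tuple of IPA phonemes.
lexicon = {
    "fish": ("f", "ɪ", "ʃ"),
    "dish": ("d", "ɪ", "ʃ"),
    "wish": ("w", "ɪ", "ʃ"),
    "fit": ("f", "ɪ", "t"),
    "neurotransmitter": ("n", "ʊ", "r", "oʊ", "t", "r", "æ", "n", "s",
                         "m", "ɪ", "t", "ə", "r"),
}

# Adjacency list: each word's set of "friends" (phonological neighbors).
network = {word: set() for word in lexicon}
for w1, w2 in combinations(lexicon, 2):
    if is_neighbor(lexicon[w1], lexicon[w2]):
        network[w1].add(w2)
        network[w2].add(w1)

# Degree = number of friends, most connected first.
for word, friends in sorted(network.items(), key=lambda kv: -len(kv[1])):
    print(f"{word}: {len(friends)} friend(s) {sorted(friends)}")
```

In this toy lexicon “fish” ends up with three friends (dish, wish, fit) and “neurotransmitter” with none. Comparing every pair is quadratic in lexicon size, so for a full corpus a faster lookup (for example, indexing words by patterns with one phoneme position blanked out) would likely be preferable.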

Network characteristics

Using this procedure, the authors created a series of phonological similarity networks, analyzed their properties, and compared them to those of other corpora.

Some of their main findings included:

Networks include a “giant component” (i.e., the largest sub-network of connected words), a series of lexical islands (smaller groups of connected words), and some “hermit” nodes (isolated words not connected to any other word).
The size of the giant component is comparable to that observed for other languages.
Words in the networks tend to be connected to other words through a relatively small number of intermediary nodes; in other words, the networks display a “small-world” property.
The networks are highly robust: they can tolerate the deletion of a large number of highly connected nodes without a dramatic drop in the proportion of words that belong to the giant component (see Fig 2).
The network properties are generally comparable to those of other psycholinguistic networks, which are not fully available as open-access resources.
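The giant-component and small-world findings rest on two standard graph measures: connected components and shortest-path lengths. The sketch below computes both with breadth-first search on a hypothetical toy network (the word list and its edges are illustrative assumptions, not the authors' data). It labels the largest component as the giant component, other multi-word components as islands, and single-node components as hermits. It only covers the path-length side of the small-world idea; formal small-world comparisons usually also consider clustering relative to a random graph.

```python
from collections import deque


def connected_components(graph):
    """Return all components (sets of nodes), largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def average_shortest_path_length(graph, nodes):
    """Mean distance over all ordered pairs in `nodes`, which must form
    one connected component with at least two nodes."""
    total = pairs = 0
    for source in nodes:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical toy network of one-phoneme neighbors.
words = ["fish", "dish", "wish", "fit", "bit", "bat",
         "lizard", "wizard", "neurotransmitter"]
edges = [("fish", "dish"), ("fish", "wish"), ("dish", "wish"), ("fish", "fit"),
         ("fit", "bit"), ("bit", "bat"), ("lizard", "wizard")]
graph = {w: set() for w in words}
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

components = connected_components(graph)
giant = components[0]
islands = [c for c in components[1:] if len(c) > 1]
hermits = [next(iter(c)) for c in components if len(c) == 1]

print("giant component size:", len(giant))
print("islands:", [sorted(c) for c in islands])
print("hermits:", hermits)
print("mean path length in giant component:",
      round(average_shortest_path_length(graph, giant), 3))
```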

Effects of removing random nodes (failures) compared with removing nodes in descending order of their number of connections to other nodes (attacks). The two removal processes have similar effects until about 39% of nodes have been removed, at which point the proportion of nodes in the largest component drops sharply for attacks only.
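The failure-versus-attack comparison can be simulated by removing nodes in two different orders and tracking the largest component after each removal. The Python below is a sketch under stated assumptions: node degrees are computed once on the intact network (the post does not say whether the authors recalculated them after each removal), the proportion is taken relative to the original number of nodes, and the toy network is hypothetical.

```python
import random
from collections import deque


def largest_component_size(graph, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def removal_curve(graph, order):
    """Proportion of original nodes in the largest component after
    removing the first k nodes of `order`, for k = 0..len(order)."""
    n = len(graph)
    return [largest_component_size(graph, set(order[:k])) / n
            for k in range(len(order) + 1)]


def failure_order(graph, seed=0):
    """Failures: remove nodes in random order (seeded for repeatability)."""
    nodes = list(graph)
    random.Random(seed).shuffle(nodes)
    return nodes


def attack_order(graph):
    """Attacks: remove the most-connected nodes first (static degrees)."""
    return sorted(graph, key=lambda node: len(graph[node]), reverse=True)


# Hypothetical toy network of one-phoneme neighbors.
edges = [("fish", "dish"), ("fish", "wish"), ("dish", "wish"),
         ("fish", "fit"), ("fit", "bit"), ("bit", "bat")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print("failures:", [round(p, 2) for p in removal_curve(graph, failure_order(graph))])
print("attacks: ", [round(p, 2) for p in removal_curve(graph, attack_order(graph))])
```

Recomputing the largest component from scratch at every step keeps the sketch simple but is slow; for a network the size of a full lexicon one would more likely evaluate only selected removal fractions.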

The authors made all the digital resources available on the GitHub project page to support researchers in advancing and exploring language network science. A very friendly objective, I would say!

Featured Psychonomic Society article

Alderete, J., Mann, S. & Tupper, P. Open-access network science: Investigating phonological similarity networks based on the SUBTLEX-US lexicon. Behavior Research Methods 57, 96 (2025). https://doi.org/10.3758/s13428-025-02610-9

Author

Jonathan Caballero

Jonathan Caballero is a cognitive and behavioral scientist specializing in social perception and its role in decision-making. Currently, he is a postdoctoral researcher at McGill University, in Canada, where he conducts studies addressing the role that verbal and non-verbal cues play in the perception of social situations, personal traits, and affective inferences, and how this information influences social interaction and, ultimately, health and well-being in healthy and clinical populations. His research combines perceptual, behavioral, acoustic, and electrophysiological methodologies. The long-term goal is to generate knowledge of how ambiguous social information guides decision-making and to use this knowledge to inform interventions for improving the quality of social outcomes in clinical populations and in healthy individuals who are nevertheless exposed to negative social treatment, such as speakers with nonstandard accents.