A common trope in high school movies involves the protagonist being given a tour of the various student cliques, where distinctive groups like the rowdy football players, the fashionable mean girls, and the school bullies are all seated at different cafeteria tables and looking unfriendly to anyone not in their group.
While reality is more nuanced than that, members of social groups do tend to resemble one another. In fact, social network analyses allow the identification of clusters of people with similar characteristics, influential individuals who have numerous connections within and across groups, and individuals with fewer and less diverse connections with others.
Perhaps more surprisingly, language itself can be analyzed using a similar approach, treating lexicon entries as if they were “friends” in a social network. John Alderete, Sarbjot Mann, and Paul Tupper developed a series of phonological similarity networks. In their recent paper published in Behavior Research Methods, a journal of the Psychonomic Society, they describe how the networks were built and how they can help advance psycholinguistics research.

Friendly words to share openly
Beyond its use for researching relationships between people, network analysis can be used to analyze the characteristics of any form of interconnected nodes that form a network. This allows for a similar approach to be used to analyze, for example, psychometric data or geographically distributed power grids.
Although psycholinguistic research has already used this approach, few open-access datasets exist, and this gap was one of the authors' main motivations for conducting their study. Specifically, they set out to develop phonological similarity networks and to characterize them in terms of their network properties.
But how does one treat psycholinguistic data points as if they were “friends” on a social network? What would it mean for a word to have many or few “friends”? What would a clique mean for a group of words?
To address those questions, the authors focused on the sounds of words rather than their written representations. Specifically, they started from a well-established corpus of English words based on film and television subtitles (SUBTLEX-US, freely available through the Psychonomic Society). They then converted the written words of the corpus to their phonological representations (i.e., the “sounds” of the words) and created a list of neighboring nodes for each word, where a neighboring node is a word that differs only by the addition, deletion, or substitution of a single phoneme.
To make this more concrete, take, for example, the written word “fish.” After converting it to a phonological representation (using the International Phonetic Alphabet), the written “f” corresponds to the sound /f/, the written “i” corresponds to the sound /ɪ/, and the written “sh” corresponds to the sound /ʃ/. Thus, the sound of the full word can be written as /fɪʃ/, and one of its neighboring nodes would be /dɪʃ/ (which represents the sound of the written word “dish”).
Returning to our social network analogy, /fɪʃ/ and /dɪʃ/ would be “friends”. They would likely have more friends, such as /wɪʃ/ (the sound of the written word “wish”), and all of them would belong to a larger clique of words ending with the sounds /ɪʃ/. At the same time, we can easily see how a word like “neurotransmitter” (/nʊroʊtrænsmɪtər/) would likely have far fewer friends than “fish” (/fɪʃ/).
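For readers who like to see the idea in code, here is a minimal Python sketch of the "one phoneme apart" rule described above. It is not the authors' implementation: the tiny lexicon, the hand-written transcriptions, and the function names one_phoneme_apart and build_network are all assumptions made for illustration, and the pairwise comparison would be far too slow for a full corpus.

```python
from itertools import combinations


def one_phoneme_apart(a, b):
    """Return True if phoneme tuples a and b differ by exactly one
    substitution, addition, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Lengths differ by one: dropping a single phoneme from the longer
    # word must yield the shorter word (addition or deletion).
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


# Hypothetical toy lexicon; transcriptions are illustrative only.
LEXICON = {
    "fish": ("f", "ɪ", "ʃ"),
    "dish": ("d", "ɪ", "ʃ"),
    "wish": ("w", "ɪ", "ʃ"),
    "fit": ("f", "ɪ", "t"),
    "it": ("ɪ", "t"),
    # Rough character-by-character split, good enough for this sketch.
    "neurotransmitter": tuple("nʊroʊtrænsmɪtər"),
}


def build_network(lexicon):
    """Map each word to the set of its phonological neighbors ("friends")."""
    neighbors = {word: set() for word in lexicon}
    for w1, w2 in combinations(lexicon, 2):
        if one_phoneme_apart(lexicon[w1], lexicon[w2]):
            neighbors[w1].add(w2)
            neighbors[w2].add(w1)
    return neighbors


network = build_network(LEXICON)
for word, friends in network.items():
    print(word, sorted(friends))
```

In this toy lexicon, "fish" ends up with three friends ("dish", "wish", and "fit"), while "neurotransmitter" has none.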
Network characteristics
Using this procedure, the authors created a series of phonological similarity networks, analyzed their properties, and compared them to those of other corpora.
Some of their main findings included:
- Networks include a “giant component” (i.e., the sub-network with the largest number of connected words), a series of lexical islands (smaller groups of connected words), and some “hermit” nodes (isolated words not connected with any other word).
- The size of the giant component is comparable to that observed for other languages.
- The words in the networks tend to be connected to other words through a relatively small number of intermediary nodes, an observation known as the network displaying a “small-world” property.
- The networks are highly robust, meaning they can tolerate the deletion of a large number of highly connected nodes without a dramatic effect on the proportion of words that belong to the giant component. See Fig 2.
- The network properties are generally comparable to the properties of other psycholinguistic networks, which are not fully available in an open-access model.
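
To make the vocabulary in the list above more tangible, the following Python sketch splits a small network into its giant component, lexical islands, and hermits, and then runs a crude robustness check by deleting the best-connected word. The TOY network and the helper names connected_components, giant_fraction, and remove_top_hubs are hypothetical and are not taken from the paper or its GitHub resources.

```python
from collections import deque


def connected_components(graph):
    """Return the components of an undirected graph, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def giant_fraction(graph):
    """Proportion of all words that sit in the largest component."""
    if not graph:
        return 0.0
    return len(connected_components(graph)[0]) / len(graph)


def remove_top_hubs(graph, k):
    """Copy of the graph with its k most connected words deleted."""
    hubs = set(sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:k])
    return {n: nbs - hubs for n, nbs in graph.items() if n not in hubs}


# Hypothetical toy network (word -> set of neighboring words).
TOY = {
    "fish": {"dish", "wish", "fit"},
    "dish": {"fish", "wish"},
    "wish": {"fish", "dish"},
    "fit": {"fish", "it"},
    "it": {"fit"},
    "cat": {"cap"},
    "cap": {"cat"},
    "neurotransmitter": set(),
}

components = connected_components(TOY)
giant = components[0]
islands = [c for c in components[1:] if len(c) > 1]
hermits = [c for c in components if len(c) == 1]

print("giant component:", sorted(giant))
print("islands:", [sorted(c) for c in islands])
print("hermits:", [sorted(c) for c in hermits])
print("giant fraction before:", giant_fraction(TOY))
print("giant fraction after removing 1 hub:",
      giant_fraction(remove_top_hubs(TOY, 1)))
```

Note that this eight-word toy falls apart as soon as "fish" is removed, which is the opposite of what the authors report for their full networks. The sketch only shows how such a check could be set up, not the result one should expect at scale.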

The authors made all the digital resources available on the GitHub project page to support researchers in advancing and exploring language network science. A very friendly objective, I would say!
Featured Psychonomic Society article
Alderete, J., Mann, S. & Tupper, P. Open-access network science: Investigating phonological similarity networks based on the SUBTLEX-US lexicon. Behavior Research Methods 57, 96 (2025). https://doi.org/10.3758/s13428-025-02610-9