You may have heard that the United States had a presidential election last year. You may have also heard that the winner of that election was an outsider, a “straight-talker,” and an anti-establishment candidate. Enough material to fill the Library of Congress has been written on the content of what (first candidate, and now) President Donald Trump says (or tweets), and how it is different from the politicians around him. Journalists have covered his bizarre speech patterns and his strange pronunciation relative to other politicians, but what about how he relates his ideas to each other – his “semantic space”?
In psycholinguistics, a semantic space is the set of connections between concepts. Semantic spaces place more closely related ideas next to each other. Crucially, individuals vary widely in some aspects of their semantic spaces. For example, I might associate “soccer” with “exciting” and “fun”, whereas you might associate “soccer” with “boring” and “dull.” We might agree, though, that “sport” is fairly close to “soccer.” Researchers have begun to measure semantic spaces by using the frequency with which words co-occur as a proxy for how concepts are related to each other.
In the current landscape of political polarization in the United States, one common refrain is that the two major political parties and their supporters are in their own media bubbles. Republicans read, watch, and listen to other Republicans, while Democrats only interact with Democrats. If this were the case, it stands to reason that the two parties’ semantic spaces would also be demonstrably different. For example, when Republicans talk about “progress,” they may be more likely to mention “business,” whereas Democrats may be more likely to mention “welfare.”
If Trump is truly the outsider he claims to be, maybe his semantic space will be drastically different as well. Moreover, the semantic spaces of one political party should be reflected in the language use of that party’s supporters, and may be a guide to voting behavior.
To address this set of questions, researchers Li, Schloss, and Follmer analyzed the speech content of presidential primary candidates from three recent elections (2000, 2008, and 2016 – years in which no incumbent president was running). After cleaning the data, the researchers built semantic spaces using word2vec, a neural network approach similar to latent semantic analysis (LSA). To simplify, word2vec analyzes the co-occurrences of words to determine which words are likely to be related. The analysis creates a matrix in which each row holds one word’s similarity to every other word in the set. This matrix of similarities (of every word with every other word) is the semantic space.
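To make the idea of a similarity matrix concrete, here is a minimal sketch in Python. It is not word2vec and not the researchers' pipeline: it swaps in plain co-occurrence counts with cosine similarity, and the three toy sentences, the window size, and all function names are invented for illustration.

```python
import math
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each word appears within `window` tokens of another."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u, v, vocab):
    """Cosine similarity between two sparse count vectors over `vocab`."""
    dot = sum(u.get(w, 0) * v.get(w, 0) for w in vocab)
    norm_u = math.sqrt(sum(u.get(w, 0) ** 2 for w in vocab))
    norm_v = math.sqrt(sum(v.get(w, 0) ** 2 for w in vocab))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_space(sentences, window=2):
    """Return the vocabulary and a word-by-word similarity matrix."""
    counts = cooccurrence_counts(sentences, window)
    vocab = sorted({w for tokens in sentences for w in tokens})
    matrix = [
        [cosine(counts.get(a, {}), counts.get(b, {}), vocab) for b in vocab]
        for a in vocab
    ]
    return vocab, matrix


# Toy corpus, made up for this example (not data from the study).
sentences = [
    "progress means business growth".split(),
    "business growth means jobs".split(),
    "progress means welfare reform".split(),
]
vocab, space = semantic_space(sentences)

# Each row is one word's similarity to every other word in the set.
for word, row in zip(vocab, space):
    print(word, [round(value, 2) for value in row])
```

Words that keep similar company end up with similar rows, which is the intuition behind the matrix described above. A real analysis would use far more text and a trained model rather than raw counts.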
The semantic spaces were made up of commonly used, politically relevant words, which were spoken at least 5 times per speaker and were not disproportionately used by any particular speaker. The semantic spaces were compared by correlating one semantic space with another. The researchers created separate semantic spaces for each politician, and combined semantic spaces across party and year (for example, Democratic primary candidates in 2008). The right side of Figure 1 (below) depicts how similar the two major parties are to each other and over time. Note that the Republicans’ semantic spaces are more similar to each other than to the Democrats’ semantic spaces.
Figure 1. Semantic space similarity by party and year. For (a), (b), and (c), the upper right of each matrix is Democratic similarity, whereas the lower left is the Republican similarity. The multidimensional scaling plot (d) depicts the Euclidean distance of the semantic spaces between each party and year.
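As a rough illustration of what "correlating one semantic space with another" can look like, the sketch below computes a Pearson correlation over the off-diagonal entries of two similarity matrices. Using only the upper triangle is an assumption made here, not a detail reported in the post, and the three small matrices and the function names are invented.

```python
import math


def upper_triangle(matrix):
    """Flatten the entries above the diagonal (each word pair once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def space_similarity(space_a, space_b):
    """Correlate two similarity matrices built over the same word list."""
    return pearson(upper_triangle(space_a), upper_triangle(space_b))


# Invented 4-word similarity matrices, for illustration only.
space_a = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
space_b = [
    [1.0, 0.7, 0.3, 0.1],
    [0.7, 1.0, 0.2, 0.3],
    [0.3, 0.2, 1.0, 0.6],
    [0.1, 0.3, 0.6, 1.0],
]
space_c = [
    [1.0, 0.1, 0.6, 0.7],
    [0.1, 1.0, 0.5, 0.6],
    [0.6, 0.5, 1.0, 0.2],
    [0.7, 0.6, 0.2, 1.0],
]

print("a vs b:", round(space_similarity(space_a, space_b), 2))
print("a vs c:", round(space_similarity(space_a, space_c), 2))
```

Two speakers who relate the same words in similar ways produce a correlation near 1, while speakers who organize those words differently produce a lower or even negative value. The comparison only makes sense when both matrices cover the same words in the same order.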
So what about individual politicians? In Figure 2 (below), you can see the same multidimensional scaling approach, but this time Clinton and Trump are separately plotted. Two things stick out. First, Clinton clusters with all politicians from both parties from the most recent election, as well as 2000 and 2008. Second, as an author of the study, Dr. Ping Li, put it in email correspondence to me: “Trump’s semantic space is much closer to the everyday ‘mundane’ semantic space (closer to the Fyshe vectors) whereas Clinton’s space is much closer to the politically averaged spaces.” Fyshe vectors, named after computational linguist Alona Fyshe, represent words based on their co-occurrences in a corpus of over 16 billion words. They can be thought of as expressing the semantic space of the common English lexicon.
Figure 2. Multidimensional scaling plot (a) and similarity histogram (b) of the semantic spaces of political parties compared to individual candidates – Trump and Clinton – and compared to everyday language use (Fyshe).
Politicians, at least at the highest levels of US government, seem to have distinct semantic spaces. What about average voters? In a separate study, Li and colleagues used Amazon’s Mechanical Turk to collect data from people who self-identified as either a Democrat or a Republican. The researchers asked participants to arrange into groups the same set of 50 words that they used for the semantic space analysis of presidential candidates. The researchers then used a decision-tree algorithm to generate rules for each word, based on its proximity to other words, with the goal of identifying that individual’s political party.
For example, if Democrats were more likely to put “police” closer to “women”, one rule of the decision-tree might be “if the distance between police and women < 18 words, call this person a Democrat.” The number of rules allowed for each decision tree was restricted to 10, and the model was tested using leave-one-out cross-validation. That means that the model was created using data from all subjects but one, and the held-out subject was then classified using the resulting rules. This procedure was repeated for every subject, and the percentage of subjects classified correctly is the accuracy for that individual word.
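The leave-one-out procedure is easier to follow in code. The sketch below is a deliberate simplification: instead of a decision tree with up to 10 rules, it fits a single threshold rule (a "decision stump") on one hypothetical feature, the distance between two words in a participant's sort. The data, labels, and function names are all invented for illustration.

```python
def fit_stump(samples):
    """Pick the one threshold rule that classifies the training data best.

    `samples` is a list of (distance, party) pairs.
    Returns (threshold, party_if_below, party_if_at_or_above).
    """
    values = sorted({distance for distance, _ in samples})
    # Candidate thresholds are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    parties = sorted({party for _, party in samples})
    directions = ((parties[0], parties[-1]), (parties[-1], parties[0]))
    best = None
    for threshold in candidates:
        for below, above in directions:
            correct = sum(
                (below if distance < threshold else above) == party
                for distance, party in samples
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above


def predict(stump, distance):
    """Apply a fitted threshold rule to one participant's distance."""
    threshold, below, above = stump
    return below if distance < threshold else above


def leave_one_out_accuracy(samples):
    """Hold out each participant once, train on the rest, and score."""
    hits = 0
    for i, (distance, party) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        stump = fit_stump(training)
        hits += predict(stump, distance) == party
    return hits / len(samples)


# Invented data: distance between two words in each participant's sort.
samples = [
    (10, "Democrat"), (12, "Democrat"), (15, "Democrat"), (17, "Democrat"),
    (19, "Democrat"), (14, "Republican"), (20, "Republican"),
    (22, "Republican"), (25, "Republican"), (27, "Republican"),
]
print("leave-one-out accuracy:", leave_one_out_accuracy(samples))
```

Because every participant is classified by a rule that never saw that participant's data, the resulting accuracy estimates how well the rule generalizes rather than how well it memorizes.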
The researchers found that the majority of the words led to accurate classification of political party, merely based on their proximity to other words. In other words, political party could be predicted based on an individual’s semantic space of political concepts. The table below shows the accuracy associated with each of the words.
One potential weakness of this study is that it is not clear whether the Mechanical Turk semantic spaces constructed from average citizens are similar to those of the politicians they vote for. The sorting task did not elicit enough data to permit the researchers to compare this directly. “However,” Dr. Li wrote in email correspondence, “based on existing literature of semantic space (LSA and the like), we think that voters build their own semantic spaces using the basic mechanisms of word association and spreading activation as captured by the computational algorithm.”
A remarkable aspect of this work is the ambition of its scope. Using psycholinguistic approaches, political party affiliations can be redrawn in the context of semantic similarities and differences, and voting behavior can be predicted. If we truly are ensconced in political media bubbles, realizing that we see, discuss, and conceptualize the world differently may be the key to forming stronger lines of communication. For politicians, (re)learning how to speak to ordinary Americans may change political rhetoric. More broadly, this study heralds the promise of linguistic analysis for tapping into semantic space in real-world domains. The approach may help uncover why some students have trouble in educational settings, how lawyers represent clients in front of judges and juries, and more.
Reference for the article discussed in this post:
Li, P., Schloss, B., & Follmer, D.J. (2017). Speaking two “Languages” in America: A semantic space analysis of how presidential candidates and their supporters represent abstract political concepts differently. Behavior Research Methods. DOI: 10.3758/s13428-017-0931-5.