Identifying threatening language

In October 2008, ‘Year2183’ posted a message on the anti-Muslim website ‘Gates of Vienna’, arguing that Muslims should be forcibly deported from Norway. Three years later, on 22 July 2011, the same individual posted and e-mailed a 1,500-page document describing his extreme-right ideology and the extensive preparations he had made before killing 77 people in his attacks later that day. Unfortunately, this sequence of events is not unique. Researchers and security professionals are currently working hard to understand, identify and mitigate online extremism that may be a precursor to real-life violence.

One way to study extremism and threat assessment is through psycholinguistic dictionaries. These lists of categorized words allow researchers to detect the prevalence of different types of language in large collections of text. To aid those who are interested in understanding or identifying threatening and violent language, researchers Isabelle van der Vegt, Maximilian Mozes, Bennett Kleinberg, and Paul Gill (pictured below) created the Grievance Dictionary – a new publicly available psycholinguistic dictionary.

The dictionary and its creation are described in an article recently published in the Psychonomic Society journal Behavior Research Methods. In addition, the dictionary is freely available at https://osf.io/3grd6, along with instructions for use and additional details about its development.

*Figure: Authors of the featured article*

To create the new linguistic dictionary, the authors went through multiple phases of development. First, they contacted 21 researchers in the fields of threat assessment and terrorism research. Each expert was asked to imagine that they were assessing whether a piece of text signals a threat to commit violence, and to answer the question, “What do you look for in the text to assess its threat level?”

The authors then distilled the expert responses into 79 categories related to the content of the message (e.g., direct threat, violence), emotional processes (e.g., anger, frustration), mental health (e.g., psychosis, paranoia), communication style (e.g., unusual grammar, incoherence), and metalinguistic factors (e.g., font, graphics). This list was then narrowed down to 22 categories that could be represented in a dictionary format.

Next, 13 PhD students were given the list of 22 categories and asked to write down all the words that came to mind for each category. This produced 1,951 seed words, which were then expanded using two computational linguistics tools. One provided “cognitive synonyms” for each word (e.g., knife = dagger, machete, shiv), and the other retrieved the 10 nearest neighbors of each word in a vector space in which semantically similar words lie close together. In this way, the 1,951 seed words grew into 24,322 candidate words.
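To make the expansion step concrete, here is a minimal sketch in Python. It assumes a tiny hand-made synonym table standing in for the "cognitive synonym" tool and a toy three-dimensional word-vector table standing in for the vector space; `SYNONYMS`, `VECTORS`, `nearest_neighbors` and `expand_seeds` are hypothetical names, not the authors' code or any specific library's API. Nearest neighbors are ranked by cosine similarity, a common choice for word vectors.

```python
import math

# Hypothetical stand-ins: a synonym lookup playing the role of the
# "cognitive synonym" tool, and a toy word-vector table playing the role
# of the vector space. Real resources would be vastly larger.
SYNONYMS = {
    "knife": ["dagger", "machete", "shiv"],
    "angry": ["irate", "furious"],
}

VECTORS = {
    "knife":   [0.9, 0.1, 0.0],
    "blade":   [0.85, 0.15, 0.05],
    "sword":   [0.8, 0.2, 0.1],
    "angry":   [0.0, 0.9, 0.2],
    "furious": [0.05, 0.85, 0.3],
    "table":   [0.1, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbors(word, k=10):
    """Return the k words most similar to `word`, excluding the word itself."""
    if word not in VECTORS:
        return []
    target = VECTORS[word]
    others = [w for w in VECTORS if w != word]
    others.sort(key=lambda w: cosine(target, VECTORS[w]), reverse=True)
    return others[:k]


def expand_seeds(seeds, k=10):
    """Union of the seed words, their synonyms, and their k nearest neighbors."""
    expanded = set(seeds)
    for seed in seeds:
        expanded.update(SYNONYMS.get(seed, []))
        expanded.update(nearest_neighbors(seed, k))
    return expanded


# With such a small toy vocabulary, k=2 is used instead of the article's 10.
print(sorted(expand_seeds(["knife", "angry"], k=2)))
```

Note that embedding neighbors can be noisy (in this toy table, "sword" is among the closest neighbors of "angry"), which is one reason a human rating step follows.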

*Figure: Example words from each of the 22 categories*

Next, 2,318 participants from the online research platform Prolific rated how well each word fit its assigned category. Each participant rated 100 word-category pairs drawn from the expanded word list. Words were removed if more than 50% of the respondents did not know them, and words sharing the same stem (e.g., friendship, friendly, friend) were combined, leaving 20,502 words in the dictionary.
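The two cleaning rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: `RESPONSES` is hypothetical data giving, for each word, how many respondents said they did not know it and how many rated it, and `toy_stem` is a deliberately crude, hypothetical suffix stripper (a real pipeline would use a proper stemmer). Words are dropped when the "don't know" share exceeds 50%, and the remaining words are grouped by stem.

```python
from collections import defaultdict

# Hypothetical rating summaries:
# word -> (respondents who did not know the word, total respondents).
RESPONSES = {
    "friend": (0, 10),
    "friendly": (1, 10),
    "friendship": (2, 10),
    "shiv": (4, 10),
    "defenestrate": (7, 10),
}

# Toy suffix list for illustration only, not a real stemming algorithm.
SUFFIXES = ("ship", "ly")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least a short root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def filter_and_merge(responses, max_unknown=0.5):
    """Drop words unknown to more than `max_unknown` of respondents, then group by stem."""
    known = [
        word
        for word, (unknown, total) in responses.items()
        if unknown / total <= max_unknown
    ]
    groups = defaultdict(list)
    for word in known:
        groups[toy_stem(word)].append(word)
    return dict(groups)


print(filter_and_merge(RESPONSES))
```

Here "defenestrate" (70% unknown) is removed, while friend, friendly and friendship collapse into a single "friend" entry.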

The resulting dictionary can be used in multiple ways. One is to include only the 3,643 words that are good matches for their respective category (an average rating of at least 7 out of 10). Researchers can then take a text and calculate the proportion of its words that belong to each category.

*Figure: Example of how the dictionary can be used to determine the proportion of a text that belongs to each category*
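The thresholding and proportion calculation might look like the following Python sketch. It assumes a hypothetical list of (word, category, mean rating) entries; the actual files distributed at https://osf.io/3grd6 may be laid out differently, and the category names and ratings below are illustrative, not values from the dictionary. For simplicity the sketch matches exact lowercase words only and ignores stems.

```python
import re
from collections import Counter

# Hypothetical entries: (word, category, mean fit rating out of 10).
ENTRIES = [
    ("kill", "violence", 9.1),
    ("attack", "violence", 8.4),
    ("knife", "weaponry", 8.8),
    ("enemy", "violence", 6.2),  # below the threshold, so it is dropped
    ("watching", "paranoia", 7.3),
]


def build_dictionary(entries, min_rating=7.0):
    """Keep only word-category pairs whose mean rating is at least `min_rating`."""
    lookup = {}
    for word, category, rating in entries:
        if rating >= min_rating:
            lookup.setdefault(word, set()).add(category)
    return lookup


def category_proportions(text, lookup):
    """Proportion of the text's tokens that fall into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        for category in lookup.get(token, ()):
            counts[category] += 1
    return {category: n / len(tokens) for category, n in counts.items()}


lookup = build_dictionary(ENTRIES)
print(category_proportions("They are watching me, so I will attack and kill", lookup))
```

The example sentence has 10 tokens, two of which are violence words and one a paranoia word, so the scores are 0.2 for violence and 0.1 for paranoia.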

When the authors used this approach, the dictionary was able to distinguish between violent and non-violent texts. Chunks of text from lone-actor terrorist manifestos contained a higher proportion of words from the dictionary than did neutral control texts from blogs and forums. The manifestos also scored higher than posts from a right-wing extremist forum.

One promising use of the resulting dictionary is to help quantify differences between large collections of text. For example, the authors propose asking, “Are right-wing extremists more paranoid than left-wing extremists?” and “Do jihadists discuss weaponry more than right-wing extremists?” By comparing texts across specific categories, researchers can better understand violent extremists and their language use.
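Once every document has a category score (for instance, the proportion of paranoia words), two collections can be compared on that category. The Python sketch below is one simple way to do this, not the authors' analysis: the per-document scores are hypothetical, and Cohen's d (the mean difference divided by the pooled sample standard deviation) is an effect-size choice made here for illustration.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical paranoia proportions for documents from two collections.
collection_a = [0.04, 0.06, 0.05, 0.07]
collection_b = [0.02, 0.03, 0.02, 0.03]

print(f"mean A = {statistics.mean(collection_a):.3f}")
print(f"mean B = {statistics.mean(collection_b):.3f}")
print(f"Cohen's d = {cohens_d(collection_a, collection_b):.2f}")
```

A positive d means the first collection uses more words from that category on average; with real corpora one would also want a significance test or confidence interval rather than a point estimate alone.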

Featured Psychonomic Society article

van der Vegt, I., Mozes, M., Kleinberg, B., & Gill, P. (2021). The Grievance Dictionary: Understanding threatening language use. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01536-2
