Looking at Toto or Kansas: The Tyranny of Film versus Top-Down Cognition

What are your favorite, best-ever movie quotes? Is it “I’ll have what she’s having”? Or “Toto, I’ve a feeling we’re not in Kansas anymore”? What about “This is the beginning of a beautiful friendship”? If you are unsure, here is a list of the best 100 movie quotes of all time according to Hollywood.

But movies are not just about the lines and the scripts, even though we have considered their importance in two previous posts here and here. After all, the movies are called movies because they contain lots of moving things. To a cognitive scientist, the most interesting moving things may not be the pictures themselves but the eyes of the audience.

Where do we look while watching a movie? And how does where we look affect our comprehension of a movie and vice versa? If we don’t look at Humphrey Bogart, do we still reckon his friendship with Claude Rains will be beautiful? If we already know that Toto is no longer in Kansas, do we still look at the furry little dog on the screen?

A recent article in the Psychonomic Society’s journal Cognitive Research: Principles and Implications explored those types of questions. The answers are not just of psychological interest; they also address a long-standing controversy among movie directors and cinematic theorists.

Some directors believe that they have the power to make their audience look exactly where they want them to look. Steven Spielberg once noted that “… if we were watching a tennis match, you’d see that perfect synchronicity of heads going left-right, left-right. The same thing in a movie theatre, when the … audience is galvanised, … all watching the same things, all knowing where to look at the exact same time.” This is often called the “Tyranny of Film.”

By contrast, other filmmakers believe that each viewer sees and understands a different film. For example, Quentin Tarantino once said that “If a million people see my movie, I hope they see a million different movies.”

Researchers John Hutson, Tim Smith, Joseph Magliano and Lester Loschky were particularly interested in how viewers’ seeing—that is, their eye-movements—is affected by “top-down” processes such as the viewers’ task, and the viewers’ mental model of the scene.

Examining the role of top-down processes requires a film clip that is not highly structured and rapidly edited, but that leaves some room for visual exploration, both in the length of time between cuts and in the amount of content in each shot. Hutson and colleagues used one of the most famous long takes in film history, namely the opening of Orson Welles’ Touch of Evil. This shot is more than 3 minutes long and depicts events at a Mexico-USA border crossing in the 1950s. To make sense of the remainder of this post, you will probably want to watch the clip first.

This famous shot has been cited by film theorists as forcing the viewer to “… exercise at least a minimum personal choice. It is from his attention and his will that the meaning of the image in part derives.” If you watch this clip, you will notice how the car, which the viewer knows to contain a ticking bomb, occasionally leaves the frame, moving ahead of the walking couple, before it has to stop for traffic or animals on the road and thus reappears in our field of view. Because we expect the bomb to go off at any moment, this teasing disappearance and reappearance of the car not only creates considerable suspense but also means that we would likely look at the car whenever we get a chance. But what if we didn’t know about the bomb because the first few seconds of the clip were missing? Where would we look?

This is precisely the manipulation used by Hutson and colleagues. In the “context” condition, viewers watched the entire clip, whereas in the “no-context” condition participants watched the same opening scene of Touch of Evil but with the first 18 seconds (the time it takes to plant the bomb) removed.

The results of interest are shown in the figure below, using a measure of gaze similarity across the part of the clip that was seen by participants in both conditions.

The orange and blue lines in the top graph indicate the amount of synchrony of eye movements between people. A value of zero refers to average synchrony, whereas positive (negative) values indicate greater (lesser) synchrony. It is clear that in the final scene of the couple kissing, there was considerable synchrony between participants and between conditions: Everyone was focusing on the kiss. This is further illustrated by the heat maps in the thumbnails at the bottom (panel c), which show the distribution of gazes for each condition separately. The heat maps also show the image when there is maximal divergence between the two conditions (panel b).

Statistically, there was no overall difference between the two conditions. That is, panel b notwithstanding, people tended to look at the clip in synchrony (or not) irrespective of whether or not they knew about the bomb in the car. Moreover, when each condition was compared to a baseline distribution consisting of randomly shuffled eye movements, both conditions were found to exhibit more synchrony than this random baseline. In other words, each group exhibited a significant, but time-varying, extent of synchrony of eye movements across participants. No matter whether we know about the bomb or not, we all tend to look in the same place at certain times, whereas at other times we tend to explore the screen in an idiosyncratic manner.
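For readers who like to see the logic spelled out, here is a minimal Python sketch of how a synchrony score and a shuffled baseline of this general kind could be computed. It is an illustration only, not the analysis used by Hutson and colleagues: the dispersion measure (mean pairwise distance between viewers' gaze points), the function names, and the toy data are all assumptions made for this example.

```python
import math
import random
import statistics
from itertools import combinations

def dispersion_per_frame(gaze):
    """gaze[v][t] is the (x, y) gaze point of viewer v at frame t.
    Returns the mean pairwise distance between viewers for each frame."""
    result = []
    for t in range(len(gaze[0])):
        pairs = list(combinations([viewer[t] for viewer in gaze], 2))
        result.append(sum(math.dist(p, q) for p, q in pairs) / len(pairs))
    return result

def synchrony_z(gaze):
    """Z-scored synchrony per frame: 0 is the clip's average synchrony,
    positive values mean gaze is more tightly clustered than average."""
    disp = dispersion_per_frame(gaze)
    mean, sd = statistics.fmean(disp), statistics.pstdev(disp)
    if sd == 0:
        return [0.0] * len(disp)
    # Sign is flipped because low dispersion means high synchrony.
    return [(mean - d) / sd for d in disp]

def shuffled_baseline(gaze, n_shuffles=200, seed=0):
    """Average dispersion after each viewer's frames are shuffled in time
    independently, which destroys any shared timing between viewers."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_shuffles):
        shuffled = [rng.sample(viewer, len(viewer)) for viewer in gaze]
        means.append(statistics.fmean(dispersion_per_frame(shuffled)))
    return statistics.fmean(means)

def make_toy_gaze(n_viewers=8, n_frames=50, noise=20.0, seed=1):
    """Hypothetical data: every viewer tracks one moving target, plus noise."""
    rng = random.Random(seed)
    target = [(1000.0 * t / (n_frames - 1), 300.0) for t in range(n_frames)]
    return [[(x + rng.gauss(0, noise), y + rng.gauss(0, noise)) for x, y in target]
            for _ in range(n_viewers)]

gaze = make_toy_gaze()
observed = statistics.fmean(dispersion_per_frame(gaze))
print("observed dispersion:", round(observed, 1))
print("shuffled baseline:  ", round(shuffled_baseline(gaze), 1))
```

In this toy example every simulated viewer follows the same moving target, so the observed dispersion comes out well below the shuffled baseline. That is the same qualitative pattern described above: viewers are more in step with one another than chance alone would predict.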

The same pattern arises when the fixations on the car are analysed. The next figure shows that the two conditions do not differ on this measure (an impression confirmed by statistical analysis).

These results are quite intriguing because they suggest that even when a crucial piece of knowledge is absent, people watch the movie in the same way as do people who have that critical knowledge and who therefore can form a more accurate mental model of events.

The results suggest that Steven Spielberg got it right: the director can make us look where he or she wants us to, irrespective of what our mental model of the event is, in the same way that we cannot help but be in synchrony while watching a tennis match.

Does this mean that the “Tyranny of Film” is inescapable?

In two further experiments, Hutson and colleagues provided some boundary conditions on this apparent resistance of eye movements to contextual manipulations.

In one follow-up experiment, a condition was introduced in which even more context was removed from the clip. That is, participants never saw the couple entering the car, thus rendering it tangential to the unfolding scene. In that condition, participants did not take much notice of the car when it first appeared, unlike the conditions in which the couple had been seen getting into the car.

In a second follow-up experiment, participants were told that they would have to draw a detailed map of the locations in the scene from memory after the clip was finished. This manipulation also introduced some divergence in car-fixations compared to the context condition in the main study.

Notwithstanding those boundary conditions, Hutson and colleagues conclude that their data largely support the Tyranny of Film hypothesis. Irrespective of our understanding of a scene, and irrespective of what we expect to get out of a clip, we tend to focus on the same elements of a scene. Our movie viewing is largely driven by the bottom-up elements of the movie itself, not our cognitive top-down processes.

Psychonomics article focused on in this post:

Hutson, J. P., Smith, T. J., Magliano, J. P., & Loschky, L. C. (2017). What is the role of the film viewer? The effects of narrative comprehension and viewing task on gaze control in film. Cognitive Research: Principles and Implications. DOI: 10.1186/s41235-017-0080-5

