Someone is talking, but where are they? Visual search and Zoom

At this point, we’ve probably all spent more time than we want to think about on Zoom calls. Whether that’s meeting with students or collaborators when we’re spread across countries (or the planet), or committee meetings, or even social events, if you’re reading this, you’re probably more familiar than you want to be with a screen full of people’s faces in boxes. Sometimes, it’s the right tool, but it’s unlikely that you look forward to, say, a departmental meeting over Zoom when everyone is in their own little box.

That all being said, you’ve probably spent enough time on Zoom (or other videoconferencing software, depending on whether you’re made to use something like WebEx or Teams) to have some intuitions. Maybe you’ve had a hard time finding who is speaking on a call during a discussion, or maybe you’ve been the one leading the call, and you kept missing someone’s hand when they wanted to speak. Now, this is annoying if you’re on the call, but from a vision science perspective, why are both of these tricky?

Well, Yelda Semizer and Ruth Rosenholtz took this irritant and turned it into a pair of experiments in order to understand what was going on here, and to maybe provide some insights into both how search works in the world and how we might improve our experience in our next Zoom meeting.

Authors: Yelda Semizer (left); Ruth Rosenholtz (right; in a cluttered environment that makes her harder to find).

So, what makes this hard?

Let’s say you’re running a discussion over Zoom, and you’ve got 20 people on the call, and let’s assume they’ve all got their cameras on for a change, so you’re not just looking at a sea of black boxes with names. When you set up your Zoom, maybe you decided to play with the backgrounds – so it doesn’t look like your office or your bedroom or the slightly quiet corner of your basement where you try to work from home. The thing is, when everyone else on the call does that, you wind up with some pretty variable backgrounds – and, as the authors talk about in the paper, a lot of clutter in the window on your screen.

So, what’s clutter?

Clutter, or as we often talk about it in vision science, crowding, is when you’ve got lots of objects near each other – and thinking about the displays we all see on Zoom, those displays, where you have maybe 9 or 16 or 25 faces in one window, with their backgrounds, are certainly cluttered! What that means from a perceptual standpoint is that it’s hard to notice small changes, much less identify something specific. Or, if you’ve been able to get away from Zoom (or Teams) lately, a good example of clutter might be your colleague’s desk that’s covered in stuff, where they can’t find anything.

Clutter exists outside your computer screen (from Cottonbro Studio, Pexels).

Turning Zoom struggles into an experiment

Since their focus was on the problem of clutter and how different backgrounds impact our ability to find who is speaking on Zoom (or who has put their hand up to talk next), Semizer and Rosenholtz chose to simplify what we do on Zoom so they could study it in the lab. Rather than having a screen full of talking, moving (or very bored) faces, they built their own stimuli that look like a screen full of faces on Zoom, ranging from a call with four people all the way to a call with 25 people. In one experiment, they asked participants to find the speaker – highlighted with a green box – and tracked their eyes while they did it.

So, what does clutter actually do?

Critically, what they found is that clutter makes it harder to find what you’re looking for in these environments, which feels a lot like how it can be hard to find that one post-it on your desk (or maybe your colleague’s desk). This is particularly problematic when it comes to that big Zoom call, rather than the call with three other collaborators, and that leads to their second experiment, where participants were looking for a raised hand.

Plot from paper (Figure 5) showing how increased clutter is related to longer search times.

That turns out to be a particularly nasty problem when we think about cluttered interfaces – finding that little raised hand is challenging to begin with, and clutter doesn’t make things any better.

So, where do we go from here?

Since Zoom and videoconferencing aren’t going away (even if some of us want them to), what can we learn from studying the vision science of these interface questions? Well, if we’re thinking about how to make this better, the authors suggest moving beyond simple visual cues (like the green outline for a speaker, or the little yellow hand for someone raising their hand) and incorporating signals that are harder for users to miss. So, maybe we’ll all have a better time if that hand dances to get our attention rather than just floating there in a little box on Zoom.

Featured Psychonomic Society article

Semizer, Y., & Rosenholtz, R. (2025). The effect of background clutter on visual search in video conferencing. Cognitive Research: Principles & Implications, 10, 40. https://doi.org/10.1186/s41235-025-00643-4

Author

    Benjamin Wolfe is an Assistant Professor in the Department of Psychology at the University of Toronto, Mississauga. His research sits at the intersection of applied and basic vision science, including questions of visual perception in driving, improving readability and extending our understanding of visual perception in real-world settings.


