This is the first in a series of posts on online data collection.
The popularity of collecting behavioral data online continues to rise. The reasons are many: ease of recruiting large numbers of participants, relatively low cost, and access to a more diverse population. Early concerns that online data collection is inherently unreliable are gradually evaporating, bolstered by numerous studies showing that basic experimental results found in the lab, including those using reaction times as a dependent variable, are easily replicated with online methods and populations. In a recent survey by Gureckis and colleagues, 66% of the 135 respondents who reported reviewing papers containing online samples said they treat such data like any other (though this may vary by discipline).
Online data collection has a longer history than many realize: in 1996, the Society for Computers in Psychology (SCiP) hosted a symposium on Internet-based research. In a new paper in the Psychonomic Society’s journal Behavior Research Methods summarizing the 2016 SCiP symposium on Internet research, Christopher Wolfe provides a fascinating look back at the 1996 symposium, describing some of the advances made in the two decades since and the extent to which modern online data collection has lived up to the promises of SCiP ’96.
One way in which online data collection has lived up to its promise is popularity. In 2016 alone, Google Scholar reports 8420 hits for “Mechanical Turk”—the most popular resource for obtaining participants. And those are just articles mentioning Amazon Mechanical Turk by name.
Another fulfilled promise is cost: Participants in the 1996 symposium listed low cost as one advantage of online data collection, and this continues to be one of its major perceived advantages.
Wolfe recounts that in 1996 “some scholars talked about going to conferences and exchanging ideas with colleagues from other institutions in the morning, collecting data online in the afternoon, and then bringing those data to conversations that same evening.” He concludes that despite the popularity of online data collection, “this has not been the experience of most researchers.” But the fact that it has been the experience of some researchers (myself among them) can be viewed as a testament to the power of online data collection.
But as researchers take to the Web in greater numbers and with more confidence, there are persisting concerns about how online research is conducted. Wolfe, along with Krantz and Reips in an accompanying paper in the same journal, describes some of these pitfalls. They include “studies with titles that are full of demand characteristics” and “dysfunctional or biasing form elements, such as selection menus with pre-selected content options.” Krantz and Reips attribute the problem to a lack of education, pointing out that in their review of textbooks for undergraduate research methods classes, none were found to cover online methodologies. An additional, perhaps more severe, problem is a continued lack of technical knowledge in designing and programming experiments for the Web, and a failure to take user design seriously.
A major perceived advantage of online data collection identified by early adopters was the ability to collect much more data than is typical of lab studies and to target difficult-to-access populations by bringing the experiment to the participant rather than the other way around. Although many online experiments have sample sizes similar to lab studies, the long tail of sample sizes has gotten much longer (e.g., see studies by Stafford, Mitroff, Salganik, and Germine).
Wolfe, Krantz, and Reips point out, however, that few studies take full advantage of the possibility of targeting specific populations. Part of the problem is that some hard-to-reach populations can be hard to reach online as well. But it is certainly easier than hanging flyers on campus and having people physically come into a lab. In an email, Wolfe writes, “I can’t help but think that we are missing out on something important by being too dependent on Psychology Subject Pools.”
Beyond the increased potential for generalization of our theories, the greater diversity of participants and larger sample sizes greatly facilitate the study of individual differences (e.g., see here and here).
Curiously, a continued source of resistance by some researchers to collecting data online has been the concern that people who do studies for money on sites like Mechanical Turk are not representative. This is true—see here for MTurk demographics—but these samples are nevertheless much more representative in just about every respect than psychology participant pools.
Below are some general tips that may be helpful for those who are starting out in online data collection, as well as for those who are already collecting data online but are unsatisfied with the quality of their data or feel constrained by the methods available to them. I am drawing primarily on my own experience collecting data online from many thousands of participants using both Amazon Mechanical Turk and traditional participant pools.
- Celebrate the differences. A common mistake made by those starting out in online data collection is importing experimental designs made for the lab into web browsers. But what works in the lab does not necessarily work on the web. For example, it makes little sense to bring participants into a lab for a 5-minute study, and so we commonly increase power by engaging fewer participants in more extended experiments (200-500 trials or more are the norm in cognitive psychology studies). But such lengthy studies are a poor fit for the online environment because they demand far more of a casual user’s undivided attention than such users are willing to give (see also “Understand the user’s perspective” below). This need not be a shortcoming! Collecting data from many more participants using much shorter experiments opens the door to investigations impractical in the lab. For example, we have collected data from hundreds of participants tasked with making simple drawings—a 20-second task—producing highly meaningful data in the aggregate. Short (but validated) tasks can unmask individual differences that are inadvertently hidden in the lab.
- More control is not always better. Psychologists are trained to value a high degree of control in data collection. Collecting data online requires giving up much of this control. On the one hand, this introduces additional sources of noise. Perhaps some of your subjects have the TV blaring in the background. Or perhaps they got distracted by Facebook in the middle of your task. Some of these noise effects can be overcome with a larger sample size. But as Wolfe points out, this is only true “if the errors are random with respect to experimental condition”. If one is conducting a between-subjects design and one of the conditions is longer or more difficult, participants in that condition may be more likely to drop out or be distracted. An easy solution is to stick with within-subjects designs whenever possible. Alternatively, Wolfe points out that there exist “techniques to prevent unequal dropout rates among conditions … including high-hurdle, seriousness checks, [and] items included to test for lazy responding” (a minimal sketch of one such check appears after this list). On the other hand, uncorrelated sources of noise can be a benefit: showing the effectiveness of a manipulation in a noisier environment speaks to its robustness. Wolfe writes that “a more elaborate lab experiment with core findings replicated on-line with a more diverse sample makes for a strong empirical manuscript.”
- Understand the user’s perspective. Resources like Amazon Mechanical Turk are not just pools of “workers.” They are communities in which participants communicate, organize, and discuss. It comes as a surprise to many researchers to discover that their lab’s tasks are discussed on Reddit and rated on Turkopticon. It is very valuable for researchers to spend some time being participants. Participating in other people’s online experiments will give you a sense of what works and what doesn’t. If you’re using MTurk, take some time to work through a variety of other tasks (so-called HITs) that are being posted. The vast majority are not experiments, but it is important to look at them to get a sense of what kinds of HITs people on MTurk are most familiar with. This will help you write more informative task descriptions and more helpful instructions.
- Consider the incentives. Participants in psychology subject pools and workers in communities like Amazon Mechanical Turk do not have the same incentives. Understanding the incentives of online participants is key to ensuring high data quality. If you make it easy for participants to respond randomly or to skip critical trials, some of them will. The incentive to “cheat”—to get paid for completing a task without complying with the instructions—increases as the task gets longer and better paid. Cheating on a $2 task earns a participant $2; getting caught costs them that $2 and is a mark against their online reputation. The same calculus applied to a $0.25 task means risking the same loss of reputation for only a $0.25 gain. Longer and higher-paying studies therefore need more precautions in place to incentivize high-quality responding.
- Take design seriously. Just as a well-designed and responsive website is likely to engage us more than a poorly designed and laggy one, a well-designed study will likely lead to better compliance and lower dropout rates. For example, if you are asking a series of questions or showing a series of images, having to scroll to reach the ‘Submit’ button is likely to irritate people (as it would probably irritate you). Test the design on a variety of browsers, and remember that some participants may be completing the study on phones or tablets. If this is undesirable, require the use of a conventional desktop browser (see the device-check sketch after this list).
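To make that last point concrete, here is a minimal sketch of how one might screen out phones and tablets before the task begins. The heuristics (a coarse pointer, a small screen) and the 600-pixel cutoff are assumptions to be tuned for your own study, not a definitive test.

```typescript
// A rough heuristic for detecting phones and tablets: a coarse (touch) pointer
// or a small screen. These criteria are assumptions; adjust them for your study.
function isLikelyMobileDevice(): boolean {
  const coarsePointer = window.matchMedia("(pointer: coarse)").matches;
  const smallScreen = Math.min(window.screen.width, window.screen.height) < 600;
  return coarsePointer || smallScreen;
}

// Run after the page has loaded, before the task starts.
if (isLikelyMobileDevice()) {
  // Politely ask the participant to return on a desktop or laptop computer.
  document.body.textContent =
    "This study requires a desktop or laptop browser. Please return on a computer.";
}
```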
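And returning to the lazy-responding checks Wolfe mentions, here is a rough sketch of what a post-hoc data-quality flag might look like. It assumes a hypothetical trial record containing a reaction time and a response, and the thresholds (a 250 ms median reaction time, 95% identical responses) are illustrative placeholders rather than validated cutoffs.

```typescript
// Hypothetical trial record: a reaction time in milliseconds and the response given.
interface Trial {
  rt: number;
  response: string;
}

// Flag a participant whose data suggest lazy responding: an implausibly fast
// median reaction time, or (nearly) the same response on every trial.
// The default thresholds are illustrative, not validated cutoffs.
function flagLazyResponder(
  trials: Trial[],
  minMedianRt = 250,
  maxRepeatShare = 0.95
): boolean {
  // Median reaction time across all trials.
  const rts = trials.map(t => t.rt).sort((a, b) => a - b);
  const medianRt = rts[Math.floor(rts.length / 2)];

  // Share of trials on which the most common response was given ("straight-lining").
  const counts = new Map<string, number>();
  for (const t of trials) {
    counts.set(t.response, (counts.get(t.response) ?? 0) + 1);
  }
  const mostCommonShare = Math.max(...Array.from(counts.values())) / trials.length;

  return medianRt < minMedianRt || mostCommonShare >= maxRepeatShare;
}
```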
In the next post, I will delve deeper into some of the possibilities that online data collection affords. Did you know it is possible to monitor whether participant moved the focus off the window or switched to a different program? To record audio from the web browser? To know the exact size of the participant’s screen? That it is possible to track and record the user’s mouse? (commercial websites have been doing it for years!) All this and much more coming soon!
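As a preview, here is a hedged sketch of what some of these measurements might look like using standard browser APIs. The logEvent helper is a hypothetical stand-in for however you would send records back to your own server.

```typescript
// logEvent is a stand-in: here it only appends to an in-memory array, but in a
// real study each record would be sent to your own server.
const eventLog: Array<{ name: string; data: object }> = [];
function logEvent(name: string, data: object): void {
  eventLog.push({ name, data });
}

// Did the participant switch tabs or minimize the window?
document.addEventListener("visibilitychange", () => {
  logEvent("visibility", { hidden: document.hidden, time: Date.now() });
});

// Did the window lose focus (e.g., the participant clicked into another program)?
window.addEventListener("blur", () => {
  logEvent("blur", { time: Date.now() });
});

// The exact size of the participant's screen.
logEvent("screen", { width: window.screen.width, height: window.screen.height });

// Track the mouse (in practice you would throttle this to avoid flooding the log).
window.addEventListener("mousemove", (e: MouseEvent) => {
  logEvent("mouse", { x: e.clientX, y: e.clientY, time: Date.now() });
});

// Audio can be recorded with navigator.mediaDevices.getUserMedia and the
// MediaRecorder API, which require the participant's explicit permission.
```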
If you would like to learn more about specific kinds of online data collection, please comment below, or tweet at us.
Articles featured in this post:
Wolfe, C. R. (2017). Twenty years of Internet-based research at SCiP: A discussion of surviving concepts and new methodologies. Behavior Research Methods. DOI: 10.3758/s13428-017-0858-x.
Krantz, J. H., & Reips, U.-D. (2017). The state of web-based research: A survey and call for inclusion in curricula. Behavior Research Methods. DOI: 10.3758/s13428-017-0882-x.
1 Comment
Hi Gary Lupyan:
thank you for featuring our 2016 SCiP symposium. As an early (1994) pioneer of running experiments over the Web, I am most amazed and excited by what has come of it! 🙂 And a bit concerned, as you noticed from the publication you discussed, that many of the methodological and technological basics often do not (yet) enter the curricula for young experimenters at our universities.
Mechanical Turk is a good case: Did you know that the thousands of psychological experiments run over the last few years using this service were all done with the same pool of about 7,200 MTurkers? This means a large chunk of psychology relies on a much smaller and very well-trained sample of participants, compared to the hundreds of thousands of participants in campus subject pools who were involved in psychology studies before.
Further, our own 2011 study and many others since have shown that MTurkers display some very unusual behaviors that often mean lower-quality data.
I am very much looking forward to your blog’s next edition. Keep up the good work!
Best wishes from Europe,
Ulf
P.S. Fascinating paper on the universal triangle! Way to go!