Shrinking boundaries during sequences of sequential sampling

It’s hot, very hot. And you are traipsing through the jungles of Sumatra in pursuit of that final bilingual participant who is as conversant in Minangkabau as she is in English. You need to test her in your lexical decision task to fulfil the sample size requirements of your OSF preregistration.

You nervously scan the green tapestry all around you and, squinting with the sweat in your eyes, you see this:

Source: Getty Images

What is the correct course of action?

Two options come to mind: You can decide very quickly that this might be a lurking danger and turn around (and forget about complying with your preregistration). Alternatively, you can wipe your eyes and keep scanning, and a little while later arrive at a far less ambiguous conclusion (although it is unclear how that will affect your compliance with the preregistration):

Source: Getty Images

This dilemma represents the essence of the speed-accuracy tradeoff: act quickly on the basis of a hunch or plod along carefully until you can be certain of the answer.

Another way to express that dilemma is as a “stopping problem”: how long should I continue to sample information from the environment before I can make a decision? The first attempt to resolve the stopping problem dates back to 1945, when Abraham Wald proposed the “sequential probability ratio test” (SPRT).

The idea behind the SPRT is that as each new piece of evidence becomes available, a cumulative sum of log-likelihood ratios is computed; once that sum exceeds a predetermined upper threshold (or falls below a lower one), a decision is made. A log-likelihood ratio is the logarithm of the ratio of two likelihoods, namely the probability of the latest piece of evidence under each of two alternative hypotheses.
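
To make that concrete, here is a minimal Python sketch of the SPRT for a stream of binary observations (say, "orange patch detected on this glance" vs. not). The probabilities p1 and p0 under the two hypotheses, and the error rates alpha and beta, are purely illustrative choices, not values taken from the literature:

```python
import math

def sprt(observations, p0=0.3, p1=0.7, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a stream of 0/1 observations."""
    upper = math.log((1 - beta) / alpha)    # cross this: accept H1 (e.g., "tiger")
    lower = math.log(beta / (1 - alpha))    # cross this: accept H0 (e.g., "just leaves")
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        # log-likelihood ratio of this single observation under H1 vs. H0
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "undecided: keep sampling", len(observations)

print(sprt([1, 1, 0, 1, 1, 1]))   # -> ('accept H1', 6) with these illustrative settings
```

The two thresholds, log((1 − β)/α) and log(β/(1 − α)), are Wald's approximate bounds for keeping the two error rates near α and β.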

The SPRT has wide applicability, including in forensic settings such as detecting excess deaths in the medical system due to lax hospital procedures or due to a physician who was a prolific serial killer.

The basic idea of sequential sampling of evidence in the lead-up to a decision is also embodied in a family of psychological models known as sequential sampling models, which we have discussed previously here and here.

Just as in the SPRT, people are presumed to accumulate evidence over time until the sum total of evidence crosses a decision boundary.

So where and how does one place the decision threshold? Clearly, the further away the boundary is from the point at which evidence accumulation starts (usually 0), the longer it will take to reach it and the more accurate the decision is likely to be. Conversely, the closer the boundary is to the origin, the more rushed the decision will be and the less likely it is to be accurate.

Until recently, it was widely assumed that the response boundaries remain constant during a decision. At first glance, this makes considerable sense: if you are scanning medical records to detect a serial killer among physicians by comparing the annual rate of deaths to a baseline, a change in the boundaries over time appears entirely unmotivated.

In human decision making, however, it turns out that there are good reasons why the response boundaries might move closer together over time during a single decision.

A recent article in the Psychonomic Bulletin & Review tackled this issue and examined when it is optimal for decision makers to adjust their decision boundaries during a trial. Researchers Gaurav Malhotra, David Leslie, Casimir Ludwig, and Rafal Bogacz focused on the role of rewards in the overall decision-making context.

Suppose a decision maker has to make multiple decisions, as is typically the case in the context of a laboratory experiment (although it may be less typical of the encounter with animals in the Sumatran jungle). Every time a correct decision is made, the decision maker collects a reward. Every time an incorrect decision is made, no such reward is available (and in the case of tigers in the jungle, there may be further adverse consequences that are difficult to mimic in the laboratory).

How can a decision-maker maximize the rate of rewards across the entire set of decisions? The rate of rewards per unit time is tied to the placement of the response boundary, all other factors being equal. If the boundaries are too close to the origin, there will be many errors and hence many missed opportunities for rewards, even though many very quick decisions will be made in a given time. Conversely, if the boundaries are set too far apart, accuracy will be increased but decisions will take longer, and thus there may also be fewer rewards altogether in a given time.
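
To see how this trade-off plays out, here is a toy simulation (not the authors' model) of a ±1 random walk that stops at a symmetric boundary; the reward rate is the number of correct responses divided by the total time spent, including a fixed dead time between decisions. The step probability, dead time, and boundary values are all made-up illustrative numbers:

```python
import random

random.seed(1)  # reproducible toy simulation

def reward_rate(boundary, p_step=0.7, dead_time=40, n_trials=20_000):
    """Mean reward per time step for a +/-1 random walk that stops at +/-boundary.
    p_step is the probability that a sample moves toward the correct answer;
    dead_time is the interval between one response and the next stimulus."""
    total_reward, total_time = 0, 0
    for _ in range(n_trials):
        x, t = 0, 0
        while abs(x) < boundary:
            x += 1 if random.random() < p_step else -1
            t += 1
        total_reward += 1 if x >= boundary else 0   # upper boundary = correct answer
        total_time += t + dead_time
    return total_reward / total_time

for b in (1, 2, 3, 6, 12):
    print(f"boundary {b:2d}: reward rate {reward_rate(b):.4f}")
```

With these particular settings the rate peaks at an intermediate boundary; shrinking the dead time or making the samples less informative pushes the optimum toward narrower boundaries.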

Malhotra and colleagues worked out the optimal placement and behavior of decision boundaries under a variety of circumstances so as to maximize a decision maker’s rate of reward. Their results, based on mathematical and computational analysis, are illustrated in the figure below.

Each panel in the figure shows the optimal actions for a single trial based on the difficulty of the task. The black squares in each panel correspond to states that are converted into a decision. One decision, such as “run away from the tiger”, is represented by the top set of squares, and the other decision, such as “don’t worry, it is just autumn leaves”, is represented by the bottom set. The gray squares in the middle represent states in which it is optimal to wait. The boundaries between the gray and black squares represent the decision bounds.

On each trial, the decision-making process commences at the point (0, 0): there is no evidence, and nothing has been sampled. The process then proceeds from left to right as information is sampled once a stimulus becomes available (e.g., a bit more of an orange patch among the green leaves). Depending on the difficulty of the task, each new sample will nudge the evidence either up or down, where up corresponds to evidence for one decision and down to evidence for the other.

Consider the center panel first. This represents a difficult task in which samples are effectively non-informative. That is, the evidence that you sample at each step in time is equally compatible with either response alternative. It can be seen that under those circumstances, the optimal policy is to make a decision straight away: there are no gray squares in the panel. Intuitively, this makes sense: if you jump to a conclusion instantly, your decision cannot be better than chance and you will earn a reward only half the time (assuming equal base rates for the two alternative stimuli). However, if the stimulus is non-informative, then no amount of waiting will improve your performance, and hence your reward will be maximized by making as many (effectively random) decisions as possible in the time available.
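
The arithmetic behind that intuition is simple enough to spell out. In a toy version, accuracy is pinned at 50% however long you sample, so the only thing extra sampling can do is dilute the reward rate (the 40-step dead time between decisions below is an arbitrary illustrative value):

```python
# With a non-informative stimulus, accuracy is stuck at 50% no matter how long
# you wait, so the expected reward per decision is fixed at 0.5 and the reward
# rate only falls as sampling time grows.
dead_time = 40   # illustrative dead time between one response and the next stimulus
for sampling_time in (0, 5, 20, 60):
    rate = 0.5 / (sampling_time + dead_time)
    print(f"sample for {sampling_time:2d} steps -> reward rate {rate:.4f}")
```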

Now consider the panel on the left. The stimulus is informative, and therefore the longer you wait, the more likely you are to accumulate evidence in one direction or another. Hence, during the first four rounds of sampling, it is always optimal to wait for more evidence: even though waiting delays the next opportunity for a reward, that delay is compensated for by the increase in accuracy. After that, if the next sample nudges the total evidence above the top (or below the bottom) boundary, it is optimal to respond because the likelihood of earning a reward has become sufficiently large.

Finally, consider the panel on the right. This represents a situation in which the difficulty of the trial—that is, the informativeness of the stimulus—randomly varies from trial to trial. On some trials, the stimulus is informative. On other trials, it is not. Under those circumstances, the optimal decision boundaries vary over time within a trial, and ultimately converge to zero. To maximize one’s rewards, one should start out trying to be accurate and withhold a decision until sufficient evidence has accumulated to cross a boundary. Should that fail, and should one’s state still be in the gray area after 10 or more samples, then it becomes increasingly likely that the stimulus is non-informative on that trial, in which case one should ultimately respond at random. This point is reached if one is still uncertain after about 40 samples.
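
For readers who want to see where collapsing boundaries come from, here is a minimal backward-induction sketch in the spirit of, but much simpler than, the authors' analysis. It assumes just two difficulty levels mixed 50:50, a ±1 evidence walk, and it approximates reward-rate maximization by charging a fixed cost for every additional sample; all parameter values are illustrative rather than taken from the article:

```python
import numpy as np

U_EASY, U_HARD = 0.7, 0.5   # P(step toward "up") when "up" is correct, per difficulty
COST = 0.01                 # cost per extra sample (crude proxy for reward-rate pressure)
T = 60                      # horizon: a response must be made by this many samples

def posteriors(t, x):
    """P(correct answer is 'up') and P(next step is up), given t samples
    with net evidence x (up-steps minus down-steps)."""
    k = (t + x) / 2                              # number of up-steps observed
    like = 0.25 * np.array([                     # prior 1/4 on each hypothesis
        U_EASY**k * (1 - U_EASY)**(t - k),       # up correct, easy trial
        U_HARD**k * (1 - U_HARD)**(t - k),       # up correct, hard trial
        (1 - U_EASY)**k * U_EASY**(t - k),       # down correct, easy trial
        (1 - U_HARD)**k * U_HARD**(t - k),       # down correct, hard trial
    ])
    post = like / like.sum()
    p_up_correct = post[0] + post[1]
    p_up_step = post @ np.array([U_EASY, U_HARD, 1 - U_EASY, U_HARD])
    return p_up_correct, p_up_step

# Terminal states: at the horizon the only option is to respond.
V = {}
for x in range(-T, T + 1, 2):
    p_up, _ = posteriors(T, x)
    V[(T, x)] = max(p_up, 1 - p_up)

# Backward induction over (time, evidence) states.
upper_bound = {}
for t in range(T - 1, -1, -1):
    for x in range(-t, t + 1, 2):
        p_up, p_step = posteriors(t, x)
        respond = max(p_up, 1 - p_up)                              # answer now
        wait = -COST + p_step * V[(t + 1, x + 1)] + (1 - p_step) * V[(t + 1, x - 1)]
        V[(t, x)] = max(respond, wait)
        if respond >= wait and x >= 0:
            upper_bound[t] = min(upper_bound.get(t, T), x)         # lowest x where responding wins

# With these illustrative settings the boundary shrinks toward 0 as the trial wears on;
# None means no evidence level reachable by that time makes responding optimal yet.
print([upper_bound.get(t) for t in range(0, T, 5)])
```

The logic mirrors the intuition above: the longer the walk hovers near zero, the more the posterior favours the "hard" interpretation, so the value of waiting falls and the evidence level at which responding becomes optimal shrinks toward zero.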

Malhotra and colleagues explored numerous different aspects of their model and of different task environments. The upshot of their extensive analysis was that the critical determinant of whether or not boundaries should move over time during a single trial was the mixture of difficulty across trials. Whenever difficulties were inter-mixed, the optimal boundaries became time-dependent under at least some conditions. Intriguingly, they did not always converge, as in the figure above, but in some circumstances diverged over time instead.

Lest one think that this analysis is entirely theoretical, and therefore simply an exercise in describing optimality, Malhotra and colleagues showed in a companion paper that people are surprisingly (though imperfectly) sensitive to the possibilities for maximizing reward that are afforded by letting decision boundaries be time-dependent within a trial.

Psychonomics article focused on in this post:

Malhotra, G., Leslie, D. S., Ludwig, C. J. H., & Bogacz, R. (2017). Time-varying decision boundaries: Insights from optimality analysis. Psychonomic Bulletin & Review. DOI: 10.3758/s13423-017-1340-6.