1 Introduction

Realizing an effective human-agent team (HAT) is a challenging problem for several reasons [1]. A common approach to bridging the wide gap in characteristics and capabilities between human and nonhuman agents is to force the computer agents to emulate human communication patterns, effectively limiting the data stream such that a human can easily parse it. In many contexts, operating solely at the speed of traditional human input and output is sufficient; however, some HATs could benefit greatly from a faster interface. Treating the human as a sensor is one way to speed up communication from human to agent.

Whether reading real-time physiological signals or viewing human output with varying levels of confidence, modeling the human as a sensor allows computer agents to take action beyond responding to discrete commands [2]. By evaluating the past, present, and future state of this human sensor, many decisions and actions can be performed at the agent’s pace, rather than the human’s. One instance of this would be the application of the rapid serial visual presentation (RSVP) technique to the context of target detection.

Finding target images (e.g., images of threats) in a large collection of video and imagery data is a difficult problem. While computer vision algorithms can perform this task very rapidly and well in many environments, humans are currently far faster at adapting to significant contextual changes in the task (e.g., the landscape shifts from rural to urban, targets of interest change from fighter aircraft to insurgents carrying concealed weapons, etc.). A HAT can leverage the strengths of the human and the agent for target detection [3], and RSVP has been shown to enable humans to find target images much faster than self-paced viewing [4, 5].

In applying RSVP to a HAT’s target detection task, the goal is to identify targets in the database of images the agent has labeled with significant uncertainty. Some of that uncertainty can be reduced if the factors that affect human performance in RSVP paradigms are known. Task performance can be characterized in experiments with known image labels. Unfortunately, it can be challenging to quantify such performance when response time variability exceeds the inter-stimulus interval [4, 6]. RSVP target detection performance is commonly characterized by hit rate (HR) and false alarm rate (FAR), but determining whether a given response is a hit or false alarm requires attributing the response to a particular stimulus, and response time variability makes such attribution difficult.

The need to determine whether a response is a quick reaction to the most recent stimulus or a slow reaction to the prior stimulus has given rise to a couple of common attribution methods. The windowing method establishes a window of time after each target stimulus and determines any response in that window to be a hit. Alternatively, the distribution approach estimates a response time probability density function (RT-PDF) to assign responses to stimuli. Previously, we have described a regression method that builds on the distribution approach to provide better estimates of HR and FAR, particularly when the stimulus presentation rate is high and/or the FAR is non-zero [7]. Unfortunately, this regression method uses a heuristic based on the windowing method to estimate the RT-PDF, so it tends to lose performance under the same conditions that cause the windowing method to perform poorly.

Here, we introduce a maximum likelihood estimation (MLE) method that generally outperforms both the windowing method and the regression method, supplying more accurate estimates of the HR, FAR, and RT-PDF. These more accurate estimates of target detection performance can also improve the detection of target images in applications where it’s not known in advance which images are targets or nontargets.

2 Methods

2.1 Estimation Methods

Windowing and distribution are two classes of methods commonly used to estimate HR and FAR in RSVP target detection tasks. Previously, we described a regression method to improve such estimates [7]. Now we describe an MLE method to further improve on HR and FAR estimation.

Established Methods

Windowing and Distribution. The windowing approach labels as a hit any response that falls within a certain window of time (typically from 0 to 1000 ms post-target) after a target stimulus. Any response that does not meet that criterion is labeled as a false alarm. The number of targets hit is then divided by the total number of targets to compute the HR, while the FAR is determined by dividing the number of false alarms by the number of nontarget stimuli. Different implementations of this method vary in how responses are scored when more than one response falls within a response window and when a response falls within more than one response window.
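As a minimal sketch of the windowing rule described above, the following Python function scores a list of response times against target onsets. The tie-breaking rule is an assumption made for illustration (each response is credited to the earliest not-yet-hit target whose window contains it, and any unmatched response counts as a false alarm); as noted, implementations differ on exactly this point. The function and argument names are hypothetical.

```python
from bisect import bisect_left


def windowing_estimate(target_onsets, n_nontargets, responses, window=1.0):
    """Estimate (HR, FAR) with a simple windowing rule; times in seconds.

    Assumed tie-breaking (implementations vary): a response is matched to
    the earliest target that has not been hit yet and whose window
    [onset, onset + window] contains the response. Responses that match
    no target are counted as false alarms.
    """
    targets = sorted(target_onsets)
    hit = [False] * len(targets)
    false_alarms = 0
    for r in sorted(responses):
        # First target whose window could still contain this response.
        start = bisect_left(targets, r - window)
        matched = False
        for k in range(start, len(targets)):
            if targets[k] > r:  # target appears after the response
                break
            if not hit[k]:
                hit[k] = True
                matched = True
                break
        if not matched:
            false_alarms += 1
    hr = sum(hit) / len(targets)
    far = false_alarms / n_nontargets
    return hr, far
```

With targets at 1.0 s and 5.0 s, ten nontargets, and responses at 1.4 s, 3.0 s, and 5.5 s, this rule credits both targets as hit and scores the response at 3.0 s as a false alarm.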

The distribution approach makes use of an estimate of the response-time distribution to assign responses to specific stimuli [8]. The estimated value of a given stimulus’s RT-PDF at the time of a response determines the likelihood that the response was caused by that stimulus. The likelihood for each potentially causal stimulus is divided by the sum of the likelihoods for all such stimuli for normalization [9]. Figure 1 depicts these two methods.

Fig. 1. Timelines illustrating existing response assignment methods. Blue, red, and green hash marks represent onset times of non-target stimuli, target stimuli, and responses, respectively. (Color figure online)
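The normalization step of the distribution approach can be sketched as follows. This is an illustrative Python sketch, not the implementation from [8] or [9]: the function names are hypothetical, the RT-PDF is passed in as a callable, and the Gaussian placeholder density (mean 0.5 s, standard deviation 0.15 s) is an assumption chosen only to make the example runnable. Stimuli that appear at or after the response are assumed to have zero likelihood.

```python
import math


def apportion_response(response_time, stimulus_onsets, rt_pdf):
    """Split one response across candidate stimuli.

    Each stimulus receives the RT-PDF value at its latency, divided by the
    sum over all candidate stimuli so that the weights add up to 1.
    """
    likelihoods = []
    for onset in stimulus_onsets:
        latency = response_time - onset
        # Assumption: a stimulus cannot cause a response that precedes it.
        likelihoods.append(rt_pdf(latency) if latency > 0 else 0.0)
    total = sum(likelihoods)
    if total == 0.0:
        return [0.0] * len(likelihoods)
    return [like / total for like in likelihoods]


def placeholder_rt_pdf(latency, mean=0.5, sd=0.15):
    """Hypothetical Gaussian RT-PDF used only for illustration."""
    z = (latency - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Stimuli at 4 Hz and a single response at t = 1.0 s.
weights = apportion_response(1.0, [0.0, 0.25, 0.5, 0.75, 1.25], placeholder_rt_pdf)
```

Here the stimulus shown 0.5 s before the response receives the largest share, and the stimulus shown after the response receives none.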

Regression Method.

Our previously described regression method is based on the distribution method using an apportionment function. The distribution method produces biased estimates of HR and FAR, and the size and direction of those biases depend on the true HR and FAR. By regressing out those errors, we get better estimates. We have shown previously that the regression method outperforms the windowing and distribution methods, but it has two drawbacks. First, the regression method assumes that the error of the apportionment function is normally distributed, which is not tenable because the apportionment function is bounded between 0 and 1. Second, it uses a heuristic derived from the logic of the windowing method to estimate the response-time distribution, which can introduce inaccuracies from that method.

Proposed Method.

Here we detail using an MLE to improve HR and FAR estimation beyond the regression method. In order to carry out MLE, a model is needed that provides the probability of getting a particular result given some parameter values. The results of an RSVP experiment are the set of times (B) at which a button was pressed. The probability that a button press occurs at a given time depends on the probability that a preceding stimulus elicits a response and the probability that a response to a particular stimulus would land at that time.

The probability that a stimulus (\( s_{i} \)) elicits a response is modeled as the hit rate (h) when the stimulus is a target (i.e., \( s_{i} \in T \)) and the false alarm rate (f) when the stimulus is a nontarget (i.e., \( s_{i} \in NT \)). The response time probability density function is modeled as an ex-Gaussian [10], which describes the probability density function of the sum of a Gaussian random variable and an exponential random variable. It therefore has three parameters: μ and σ describing the Gaussian variable and τ describing the exponential variable.
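For readers who want to experiment with the ex-Gaussian, the sketch below evaluates its density with the standard closed form and draws samples as the sum of a Gaussian and an exponential variate. It uses only the Python standard library; the function names are illustrative, and the parameter values in the example (μ = 0.35, σ = 0.10, τ = 0.25, in seconds) are the ones used for the simulations in Sect. 2.2.

```python
import math
import random


def exgauss_pdf(x, mu, sigma, tau):
    """Ex-Gaussian density: Gaussian(mu, sigma) plus Exponential(mean tau)."""
    lam = 1.0 / tau
    z = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    scale = math.exp(0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x))
    return 0.5 * lam * scale * math.erfc(z)


def exgauss_sample(mu, sigma, tau, rng=random):
    """Draw one response time as a Gaussian draw plus an exponential draw."""
    return rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)


# Example: density at 0.5 s and one random draw.
density_at_half_second = exgauss_pdf(0.5, 0.35, 0.10, 0.25)
one_draw = exgauss_sample(0.35, 0.10, 0.25, random.Random(0))
```

The mean of this distribution is μ + τ, which gives a quick sanity check on both the density and the sampler.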

To summarize, the model is parameterized by h, f, μ, σ, and τ. Let θ = {h, f, μ, σ, τ}. The probability that a given stimulus \( s_{i} \) elicits a response at time t parameterized by θ is:

$$ P\left( s_{i}, t; \theta \right) = \begin{cases} 0, & \delta t \le 0 \\ h \times f\left( \delta t; \mu, \sigma, \tau \right), & \delta t > 0,\ s_{i} \in T \\ f \times f\left( \delta t; \mu, \sigma, \tau \right), & \delta t > 0,\ s_{i} \in NT \end{cases} $$
(1)

where f(δt; μ, σ, τ) is the ex-Gaussian density function, and δt is the difference between the onset of \( s_{i} \) and t. To compute the probability that one (or more) responses occurred at a given time, we compute the complement of the probability that no response occurs at that time:

$$ P_{r} \left( t; \theta \right) = 1 - \prod\nolimits_{i} \left( 1 - P\left( s_{i}, t; \theta \right) \right). $$
(2)

With this function, we can compute the probability of an overall result B as:

$$ P\left( B; \theta \right) = \prod\nolimits_{t \in B} P_{r} \left( t; \theta \right) \times \prod\nolimits_{t \notin B} \left( 1 - P_{r} \left( t; \theta \right) \right). $$
(3)

To compute the second term in the preceding equation, we discretize time by dividing the entire time-course of the experiment into windows of time (time bins) spanning 10 ms each. We have found that increasing the time resolution beyond this value has limited practical effect and increases analysis time dramatically.
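The following Python sketch shows one way Eqs. (1) to (3) could be evaluated on 10-ms time bins to obtain a negative log likelihood. It is not the released MATLAB implementation. Several choices are assumptions made for illustration: the density is multiplied by the bin width to approximate a per-bin probability, each stimulus is only allowed to contribute for `max_lag` seconds after its onset (a speed shortcut), and the function and argument names are hypothetical.

```python
import math


def exgauss_pdf(x, mu, sigma, tau):
    """Ex-Gaussian density (Gaussian plus exponential with mean tau)."""
    lam = 1.0 / tau
    z = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    scale = math.exp(0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x))
    return 0.5 * lam * scale * math.erfc(z)


def neg_log_likelihood(theta, onsets, is_target, responses, duration,
                       dt=0.01, max_lag=5.0):
    """Return -ln P(B; theta) on a grid of bins of width dt (seconds)."""
    h, f, mu, sigma, tau = theta
    n_bins = int(round(duration / dt))
    lag_bins = int(round(max_lag / dt))
    # log_none[b] accumulates ln prod_i (1 - P(s_i, t_b; theta)).
    log_none = [0.0] * n_bins
    for onset, target in zip(onsets, is_target):
        rate = h if target else f
        first = int(onset / dt)
        for b in range(first, min(n_bins, first + lag_bins)):
            delta = (b + 0.5) * dt - onset  # bin center minus stimulus onset
            if delta <= 0.0:
                continue  # Eq. (1): zero probability when delta t <= 0
            # Assumption: density times bin width approximates a probability.
            p = min(rate * exgauss_pdf(delta, mu, sigma, tau) * dt, 1.0 - 1e-12)
            log_none[b] += math.log1p(-p)
    pressed = {min(n_bins - 1, int(r / dt)) for r in responses}
    nll = 0.0
    for b in range(n_bins):
        if b in pressed:
            # Eq. (2): P_r = 1 - prod(1 - P), floored to avoid log(0).
            nll -= math.log(max(-math.expm1(log_none[b]), 1e-300))
        else:
            nll -= log_none[b]  # second product in Eq. (3)
    return nll
```

Accumulating the logarithm of the no-response probability per bin avoids forming the long products in Eqs. (2) and (3) directly, which would underflow for an experiment with thousands of bins.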

The hit rate and false alarm rate are related via the total number of responses:

$$ h \times N_{T} + f \times N_{NT} = N_{B} . $$
(4)

The number of targets (\( N_{T} \)) and the number of non-targets (\( N_{NT} \)) are known. The number of button press responses in the result, \( N_{B} \), can be considered fixed. As a consequence, only h need be estimated rather than both h and f. This simplifying assumption reduces the number of parameters in the model, reducing variance of the estimate.
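As a small worked illustration of Eq. (4), the sketch below solves the constraint for f given h and also reports the range of h for which the implied f stays between 0 and 1. The function names are hypothetical, and the counts in the example are illustrative values rather than results from the paper.

```python
def far_from_hr(h, n_targets, n_nontargets, n_responses):
    """Solve h * N_T + f * N_NT = N_B for the false alarm rate f."""
    return (n_responses - h * n_targets) / n_nontargets


def feasible_hr_range(n_targets, n_nontargets, n_responses):
    """Range of h for which the implied f lies in [0, 1]."""
    low = max(0.0, (n_responses - n_nontargets) / n_targets)
    high = min(1.0, n_responses / n_targets)
    return low, high


# Illustrative counts: 240 targets, 2160 nontargets, 250 button presses.
f_implied = far_from_hr(0.95, 240, 2160, 250)  # 22 / 2160, roughly 0.0102
```

The upper bound matters in practice: if fewer responses than targets were recorded, h cannot exceed the ratio of responses to targets without implying a negative f.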

Given this model of the probability of a result parameterized by \( h,f,\mu ,\sigma ,\tau \) and with a fixed \( N_{B} \), a minimizer is used to find the values of the parameters \( \theta \) that minimize \( - \ln P\left( {B;\theta } \right). \) We used MATLAB’s fminsearch, using a logistic link function for h and a log link function for \( \mu ,\sigma , \) and \( \tau . \) Initial values for these parameters were drawn randomly from a beta distribution with parameters 2 and 0.65 for h and from lognormal distributions with mean and standard deviation of -1.5, 0.4; -1.5, 0.3; and -2.2, 1 for parameters \( \mu ,\sigma , \) and \( \tau \), respectively. Because fminsearch is not guaranteed to arrive at a global minimum, it should be run several times with different initial values to ensure convergence. In our simulations, we compared the log likelihood of the solution obtained from fminsearch to the log likelihood of the simulated parameters, and if the likelihood of the estimate was more than 1 log unit worse than the likelihood of the true parameters, the minimizer was re-run. A MATLAB implementation of this MLE method will be available at https://github.com/btfiles/RPE.
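The link functions and random starting values can be sketched in Python as follows. This is an illustration rather than a port of the MATLAB code. Two points are assumptions: the lognormal "mean and standard deviation" are interpreted as parameters of the underlying normal distribution (a lognormal variable cannot have a negative mean), and h is clipped slightly away from 0 and 1 before the logit is taken. All function names are hypothetical.

```python
import math
import random


def logistic(x):
    """Numerically stable logistic function mapping the reals to (0, 1)."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def to_unconstrained(h, mu, sigma, tau, eps=1e-9):
    """Map (h, mu, sigma, tau) to the real line for an unconstrained minimizer."""
    h = min(max(h, eps), 1.0 - eps)  # assumption: clip so the logit is finite
    return [math.log(h / (1.0 - h)), math.log(mu), math.log(sigma), math.log(tau)]


def from_unconstrained(x):
    """Inverse map: logistic for h, exponential for mu, sigma, and tau."""
    return logistic(x[0]), math.exp(x[1]), math.exp(x[2]), math.exp(x[3])


def draw_initial(rng):
    """Random starting point following the distributions given in the text."""
    h = rng.betavariate(2, 0.65)
    mu = rng.lognormvariate(-1.5, 0.4)     # log-scale parameters (assumed)
    sigma = rng.lognormvariate(-1.5, 0.3)
    tau = rng.lognormvariate(-2.2, 1.0)
    return h, mu, sigma, tau


start = to_unconstrained(*draw_initial(random.Random(0)))
```

A minimizer would then search over the unconstrained vector and apply `from_unconstrained` inside the objective, so every candidate it proposes corresponds to a valid h in (0, 1) and positive μ, σ, and τ.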

2.2 Evaluating the Methods

To evaluate the MLE method, we ran simulations to compare its performance against the traditional windowing method and our previous regression method. This consisted of simulating responses based on known HRs and FARs and then analyzing the resulting data with all three methods to determine the accuracies with which they recover the HR and FAR for various realistic true values. The 25 simulated conditions comprised five values of HR (0.50, 0.75, 0.90, 0.95, and 1.00) combined with five values of FAR (0.1, 0.05, 0.01, 0.005, and 0.001), and each condition was simulated 500 times to collect performance statistics. All simulations and analyses were performed using custom scripts in MATLAB versions 2015a and 2017a (MathWorks, Natick, MA).

Simulating the Responses.

For each simulation, we generated a randomized RSVP experiment with a stimulation rate of 4 Hz, a target proportion of 10%, and a total stimulus count of 2400. Then, a random subset of all targets and non-targets was selected to generate responses such that the simulated rate of each block was as close as possible to the HR and FAR for the condition, while still having a whole number of responses. When a response was generated, a random draw was taken from an ex-Gaussian distribution (where μ = 0.35, σ = 0.10, and τ = 0.25), and a response event was added at that time after the generating stimulus.
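A simplified Python sketch of this simulation procedure is given below. It is not the authors' MATLAB script. It treats the whole experiment as a single block (an assumption, since the block structure is not detailed here), rounds the number of responding targets and nontargets to the nearest whole number, and uses hypothetical function and argument names.

```python
import random


def simulate_block(hr, far, n_stim=2400, rate_hz=4.0, target_prop=0.10,
                   mu=0.35, sigma=0.10, tau=0.25, seed=0):
    """Simulate stimulus onsets, target labels, and response times (seconds)."""
    rng = random.Random(seed)
    n_targets = round(n_stim * target_prop)
    is_target = [True] * n_targets + [False] * (n_stim - n_targets)
    rng.shuffle(is_target)  # randomize where the targets fall
    onsets = [i / rate_hz for i in range(n_stim)]

    target_idx = [i for i, t in enumerate(is_target) if t]
    nontarget_idx = [i for i, t in enumerate(is_target) if not t]
    # Whole numbers of responses, as close as possible to the requested rates.
    n_hits = round(hr * len(target_idx))
    n_false_alarms = round(far * len(nontarget_idx))
    responders = (rng.sample(target_idx, n_hits)
                  + rng.sample(nontarget_idx, n_false_alarms))

    # Ex-Gaussian latency: Gaussian draw plus exponential draw with mean tau.
    responses = sorted(
        onsets[i] + rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)
        for i in responders
    )
    return onsets, is_target, responses


onsets, is_target, responses = simulate_block(hr=0.90, far=0.01, seed=1)
```

With the default settings and HR = 0.90, FAR = 0.01, this yields 240 targets and 238 responses (216 elicited by targets and 22 by nontargets).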

Analyzing the Simulations.

After simulating all of the responses necessary to generate the target HRs and FARs, the resulting stimulus and response timelines were analyzed using the windowing, regression, and MLE methods described above. Each simulation generated an estimate of HR and FAR from each method, and estimates of the response-time distribution parameters were generated by the regression and MLE methods. Each of these estimates was compared against the known simulated value to assess estimation error. Estimation error is summarized as root mean squared error (RMSE) and mean estimation error. The former provides an estimate of how far the estimator is likely to be from the true value, and the latter indicates whether the estimator is biased toward positive or negative errors.
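The two summary statistics can be computed as in the short Python sketch below. The function name and the example values are illustrative only and are not taken from the simulations reported here.

```python
import math


def summarize_errors(estimates, true_value):
    """Return (mean estimation error, RMSE) for a list of estimates."""
    errors = [estimate - true_value for estimate in estimates]
    mean_error = sum(errors) / len(errors)  # sign indicates direction of bias
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse


# Illustrative HR estimates from three hypothetical runs with true HR = 0.90.
mean_error, rmse = summarize_errors([0.90, 1.00, 0.80], 0.90)
```

In this example the positive and negative errors cancel, so the mean error is close to zero while the RMSE is not, which is why both statistics are reported.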

3 Results

3.1 Performance Per Condition

For each simulation, the estimation errors for each method were calculated as the estimated rate minus the “true” simulated rate. The resulting 500 error values per rate type per condition per estimation method are depicted in Fig. 2 as box plots. Similarly, the error values for the regression and MLE methods’ estimations of the three response-time distribution parameters are shown here.

Fig. 2. Estimation errors of windowing, regression, and MLE methods for each condition.

For many conditions, the MLE method provides superior estimates of the HR and FAR, but it tends to have a wider range of variation in cases where the HR is 90% or higher and the FAR is 5% or higher.

3.2 Aggregate Performance

Averaging across all conditions, we obtain the aggregate performance shown in Fig. 3 and Table 1. While the windowing and regression methods tend to overestimate the HR (respective errors \( 8.771 \times 10^{-3} \) and \( 1.6457 \times 10^{-2} \)) and slightly underestimate the FAR (respective errors \( -9.0 \times 10^{-4} \) and \( -5.8 \times 10^{-4} \)), the MLE method estimates the HR and FAR with almost no bias either way (respective errors \( -2.52 \times 10^{-3} \) and \( 3.58 \times 10^{-4} \)).

Fig. 3. Estimation error collapsed across conditions. The appearance of so many outliers is reasonable, given that each method’s box plot comprises 12,500 estimations.

Table 1. Average estimation error collapsed across conditions. The response-time distribution parameters are not estimated by the window method.

Table 2 gives RMS errors for the various estimates, showing that the MLE method’s HR and FAR estimates are the most accurate, with respective RMS errors of 0.021508 and 0.002393, when compared to the windowing and regression methods’ RMS errors for HR (0.032351 and 0.027795, respectively) and FAR (0.003542 and 0.00243, respectively). Also, the MLE method provides more accurate estimates of all three response-time distribution parameters than the regression method. While both methods provide similar estimates of μ, the regression method exhibits noticeably worse performance when estimating σ or τ.

Table 2. RMSE of estimates collapsed across conditions. The response-time distribution parameters are not estimated by the window method.

4 Discussion

The purpose of these simulations was to measure the MLE method’s performance at recovering the true simulated HR and FAR, and the results above show that the MLE outperforms both the classic windowing method and the regression method. While all three methods can exhibit estimation error in excess of 10%, these inaccuracies are not equivalent.

The windowing method tends to overestimate the HR when the true HR is relatively low and/or the true FAR is relatively high. This is to be expected, thanks to its simplistic benefit-of-the-doubt approach. This approach is vulnerable to incorrectly classifying false alarms as hits when they fall too close to a target. On the other hand, this method performs very well when the human makes very few mistakes, such that the true FAR is very low.

While the regression method generally makes smaller estimation errors than the windowing method, it is still somewhat susceptible to overestimating the HR in the same situations that the windowing method does. This is because the regression method estimates the RT-PDF using a windowing-based heuristic, which partially counteracts the gains from using linear regression to correct such systematic errors.

The MLE method provides even more accurate HR and FAR estimates than the regression method, in part because it can simultaneously estimate the HR, FAR, and RT-PDF. Furthermore, the regression method is hampered by its treatment of non-Gaussian error as Gaussian. Overall, this results in the MLE method providing the most accurate performance estimates, which more strongly resist the effects of varying true values of HR and FAR.

4.1 Assumptions and Tradeoffs

The MLE method makes some assumptions based on the data model, including a constant hit rate, a fixed number of responses, and response times which are identically and independently distributed as an ex-Gaussian. However, we know that responses aren’t strictly independent because simultaneous responses are treated as a single response and because the attentional blink phenomenon typically prevents the human from perceiving images that appear shortly after a target image [11, 12]. Furthermore, HR and FAR probably aren’t strictly constant over the course of an experiment, since some targets are simply easier to detect than others.

In cases of very high HR and very low FAR, the regression method can appear to outperform the MLE method. This is likely an effect of the MLE representing proportions as the logistic transformation of real numbers, which maps negative and positive infinity to zero and one, respectively. This means that a minimizer can produce estimates that are arbitrarily close to zero or one, but never exactly equal to those values. In contrast, the regression approach simply truncates estimates to keep them between 0 and 1. This truncation can make the regression method appear more accurate.

When the true HR is very high, the true FAR is very low, and responses are mostly faster than 1 s, the human’s performance upholds the heuristic of the windowing method very well. This enables the windowing method to provide highly accurate estimates of HR and FAR that outperform the MLE method. However, in many contexts, such high human performance is unlikely, and it would indicate that the human’s throughput was not even approaching capacity. Thus, based on its overall better estimation of HR and FAR, the MLE method proposed here would seem the best choice for estimating HR and FAR.

4.2 Application

In real-world applications, such as improving a HAT tasked with target detection, the goal of the RSVP target detection paradigm is likely to identify target images from a large set of imagery. Not knowing the target status of the images precludes direct application of the MLE method. However, previous solutions to this problem, such as a Bayesian formulation to estimate target probability [8], would benefit from the more accurate HR and FAR estimates provided by the proposed MLE method. Furthermore, the more accurate estimate of the response time distribution derived from the MLE method should improve the performance of image classification based on button presses using methods derived from the distribution method.