The effect of reward on listening effort as reflected by the pupil dilation response

ABSTRACT Listening to speech in noise can be effortful, but when motivated, people seem to be more persevering. Previous research showed effects of monetary reward on autonomic responses like cardiovascular reactivity and pupil dilation while participants processed auditory information. The current study examined the effects of monetary reward on the processing of speech in noise and related listening effort as reflected by the pupil dilation response. Twenty‐four participants (median age 21 years) performed two speech reception threshold (SRT) tasks, one tracking 50% correct (hard) and one tracking 85% correct (easy), in both of which they listened to and repeated sentences uttered by a female talker. The sentences were presented with a single male talker or, in a control condition, in quiet. Participants were told that they could earn a high (5 euros) or low (0.20 euro) reward when repeating 70% or more of the sentences correctly. Conditions were presented in a blocked fashion and during each trial, pupil diameter was recorded. At the end of each block, participants rated the effort they had experienced, their performance, and their tendency to quit listening. Additionally, participants performed a working memory capacity task and filled in a need‐for‐recovery questionnaire as these tap into factors that influence the pupil dilation response. The results showed no effect of reward on speech perception performance as reflected by the SRT. The peak pupil dilation showed a significantly larger response for high than for low reward, for the easy and hard conditions, but not the control condition. Higher need for recovery was associated with a higher subjective tendency to quit listening. Consistent with the Framework for Understanding Effortful Listening, we conclude that listening effort as reflected by the peak pupil dilation is sensitive to the amount of monetary reward.
Highlights
The effect of monetary reward on listening to speech in noise was investigated.
Participants were told that they could earn a high or low monetary reward.
No effect of monetary reward on speech perception performance was observed.
Peak pupil dilation was larger for high than for low monetary reward.
Results suggest listening effort to be sensitive to the amount of monetary reward.


Introduction
Talking to a friend is often considered to be rewarding, and this is what motivates people to initiate a conversation and what usually keeps them motivated to stay engaged while continuing talking, even when listening is effortful. According to the Framework for Understanding Effortful Listening (FUEL, Pichora-Fuller et al., 2016), 'when and how much effort we expend during listening in everyday life depends on our motivation to achieve goals and attain rewards of personal and/or social value'. When listening becomes too demanding or when we do not recover from high levels of effort while listening we may lose motivation (Brehm and Self, 1989; Pichora-Fuller et al., 2016).
The motivational intensity theory (Brehm and Self, 1989) states that motivational arousal occurs when a task is sufficiently difficult, within one's capacity, and is justified by the magnitude of reward. When a task becomes too difficult (exceeds capacity), there will be little or no mobilization of energy (no effort will be put in). The greater the reward, the greater the amount of energy (effort) a person is willing to mobilize. A high or low reward assigned before the start of a block (Richter and Gendolla, 2009) can affect the level of motivation. Richter (2016) examined the effect of monetary reward on effort-related cardiovascular reactivity, indexing sympathetic nervous system activity, while participants performed an auditory discrimination task. The results showed an effect of reward on pre-ejection period (PEP) reactivity, an indicator of sympathetic activity, in the difficult task condition. The pupil dilation response, also reflecting autonomic nervous system activation, is similarly sensitive to reward (Bijleveld et al., 2009; Knapen et al., 2016). In a study by Bijleveld et al. (2009) participants had to listen to, memorize, and report back 2 or 5 digits, while each trial was preceded by high or low monetary reward. Their pupil response was significantly larger for the high reward, but only for the difficult 5-digit condition. These outcomes indicate that the effect of reward can be measured objectively by the assessment of autonomic cardiac responses and pupil dilation.
Listening effort is defined by FUEL as the deliberate allocation of resources, as reflected by pupil dilation, to overcome obstacles in goal pursuit when carrying out a listening task (Pichora-Fuller et al., 2016). The allocation of more task-related resources results in a larger pupil dilation response (Kahneman and Beatty, 1966). The pupil response is an autonomic response related to the activity balance of the sympathetic and parasympathetic nervous systems (Kahneman, 1973; Loewenfeld and Lowenstein, 1999). The pupil response to speech-in-noise processing is widely used as an objective measure of speech processing load (listening effort) (e.g., Kuchinsky et al., 2012; Piquado et al., 2010; Winn et al., 2015). Mean pupil dilation (MPD) reflects the average processing load in a specified time window while peak pupil dilation (PPD) reflects the maximum processing load (Zekveld et al., 2011). Hence, both MPD and PPD reflect changes in listening effort, but theoretically the MPD has higher sensitivity to changes in duration of effortful listening. PPD latency has been found to be related to the speed of cognitive processing (e.g., Hyönä et al., 1995) and the baseline pupil size prior to the pupil response is considered to reflect an autonomic response that provides information about an individual's state of arousal in anticipation of the amount of cognitive resources needed for the task at hand (e.g., Aston-Jones and Cohen, 2005).
Research shows that high levels of fatigue (McGarrigle et al., 2016; Wang et al., 2018) are associated with a smaller pupil dilation response. Wang et al. (2018) showed a negative correlation between need for recovery (NFR) and the PPD as measured during processing of speech in noise, indicating that lower NFR was associated with a larger pupil response. As explained by Wang et al. (2018), NFR can be regarded as an intermediate state between exposure to stressful situations at work and daily life fatigue. According to FUEL, fatigue can affect how we evaluate our task demands, which could affect the available capacity of cognitive resources. In line with Wang et al. (2018), FUEL predicts that a high level of fatigue results in a decreased available capacity of resources in order to preserve energy (Pichora-Fuller et al., 2016). Additionally, FUEL predicts that high levels of fatigue lower the motivation to achieve goals. Hence, the NFR questionnaire as introduced by van Veldhoven and Broersen (2003), shown in Table 1 of their study, was included in this study.
Finally, according to the motivational intensity theory (Brehm and Self, 1989), the level of motivational arousal (effort) has to occur within one's capacity. Working memory capacity (WMC) and the ability to inhibit irrelevant linguistic information, associated with speech performance (capacity), can be measured with the size-comparison span (SICspan) task (Sörqvist et al., 2010; Sörqvist and Rönnberg, 2012). Interestingly, individual differences in SICspan performance have also been shown to affect the pupil response (Koelewijn et al., 2012b; Wendt et al., 2016). Participants with a larger WMC and better ability to inhibit irrelevant information showed a larger PPD when processing speech masked by speech. Hence, we were interested in whether WMC is related to the effect of reward. To investigate this, the SICspan task was included in this study.
The main purpose of this study was to investigate whether motivation has a mediating effect on listening effort as reflected by the pupil dilation response. Therefore, we tested the effect of reward on the perception of speech masked by a single talker at relatively easy and difficult intelligibility levels, and in a control condition using speech in quiet. We also examined the effect of reward on the simultaneously recorded pupil dilation response. We hypothesized improved performance and a larger PPD (more effort) in the high reward than in the low reward condition. In addition, based on the results of Richter (2016), we hypothesized that the effect of reward would be strongest in the hard listening condition. Note that based on the Motivational Intensity Theory (Brehm and Self, 1989), a 'hard' condition, although resulting in performance below a required minimum score, should not be so difficult that participants give up on a task, while an 'easy' condition resulting in higher performance levels should also not be too easy, as people tend to become easily distracted. Ohlenforst et al. (2017) showed an inverted U-shape curve with the largest PPD around 50% intelligibility and an intermediate PPD at approximately 85% intelligibility for speech masked by a single talker. Speech processing in quiet should result in a small PPD and should show no effect of reward. Additionally, in line with FUEL (Pichora-Fuller et al., 2016) we expected participants with a relatively high NFR and/or a smaller WMC to show a smaller PPD (Wang et al., 2018) and a smaller effect of reward on the PPD. This is expected because low capacity and high fatigue can lower our motivation to put effort into a task when demands are high.

Participants
Twenty-four normal hearing adults (8 males, 16 females), recruited at the VU University and VU Medical Center, participated in the study. Ages of the participants ranged from 18 to 52 years with a median age of 21 years. The sample size was based on a moderate effect size of (intentional) attention related processes on the PPD, as observed in a previous study. Normal hearing was defined as pure-tone thresholds less than or equal to 20 dB HL at the octave frequencies 0.25–4 kHz. Participants' pure-tone hearing thresholds, averaged over both ears and over the octave frequencies 1–4 kHz (three-frequency pure-tone average), ranged from −2.5 to 10 dB hearing level (HL) (mean = 3.2 dB HL, standard deviation (SD) = 3.6 dB). Participants had no history of neurological diseases and reported normal or corrected-to-normal vision. They were native Dutch speakers and provided written informed consent in accordance with the Ethics Committee of the VU University Medical Center, Amsterdam.

SRT task and reward
Speech perception was measured using the adaptive speech reception threshold (SRT) task (Plomp and Mimpen, 1979). Target sentences were everyday Dutch sentences (Versfeld et al., 2000), uttered by a female talker. An example of an everyday sentence is 'Hij maakte de brief snel open', which directly translates to 'He quickly opened the letter'. The sentences were masked by a single male talker, at two difficulty levels (further referred to as 'easy' and 'hard'), or were presented in quiet (control condition). Participants were asked to repeat each sentence. For the 'easy' and 'hard' conditions, the target intelligibility was 85% and 50% correct sentence recognition, respectively. These conditions were presented in a blocked fashion. Importantly, while participants were unaware of the intelligibility levels at the start of each block, they were informed about the difficulty of each condition (i.e., hard, easy, or control), and were told that they could earn a high (5 euros) or low (0.20 euro) reward when repeating 70% or more of the sentences correctly. A within-subject design with six blocks was applied: intelligibility condition (3 levels) × reward (2 levels). Each block contained 25 trials (sentences) and during each trial, the pupil diameter was recorded.
The single-talker masker contained concatenated sentences from another set uttered by a male talker. The masker had a long-term average frequency spectrum identical to that of the target speech signal. The value of the SRT (dB SNR) was estimated separately for each reward level, for speech presented at 50% and at 85% target intelligibility levels (sentences correct), using a weighted up-down method (Kaernbach, 1991). The sentence was scored as correct only if each word was repeated correctly and in the right order. For each condition, the target speech level was fixed at 55 dB SPL. The onset of the masker was 3 s prior to the onset of the target sentence and continued for 3 s after the offset of the target sentence. The length of each trial co-varied with the length of the presented sentence, which had a mean duration of 1.84 s (range 1.35–2.70 s). At the end of the trial a 1000-Hz prompt tone was presented for 0.5 s after which participants were instructed to respond. Manipulation of intelligibility level and reward level resulted in a total of six conditions that were presented in a blocked fashion. Each block contained 25 trials and the order of the blocks was counterbalanced (Latin square) over participants. Prior to the experiment, participants were familiarized with the easy and hard listening conditions by listening and responding to 10 practice sentences each.
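The weighting rule behind such an adaptive track can be sketched as follows. In a weighted up-down procedure, the ratio of the upward to the downward step size is set to p/(1 − p), so that the expected change in SNR is zero at the target proportion correct p. The 1-dB default step and the function names are illustrative assumptions; the step sizes actually used in this study are not reported here.

```python
def step_sizes(target_correct, step_down_db=1.0):
    """Return (step_up_db, step_down_db) for a weighted up-down track.

    step_down_db is a hypothetical value; only the ratio between the
    two steps follows from the target proportion correct.
    """
    if not 0.0 < target_correct < 1.0:
        raise ValueError("target_correct must lie strictly between 0 and 1")
    step_up_db = step_down_db * target_correct / (1.0 - target_correct)
    return step_up_db, step_down_db


def next_snr(current_snr_db, sentence_correct, target_correct, step_down_db=1.0):
    """SNR for the next trial: harder after a correct sentence, easier after an error."""
    step_up_db, step_down_db = step_sizes(target_correct, step_down_db)
    if sentence_correct:
        return current_snr_db - step_down_db
    return current_snr_db + step_up_db
```

With a 50% target the two steps are equal, whereas an 85% target makes each upward step about 5.7 times the downward step, which is why the easy track settles at a higher SNR.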
During and after performing the SRT tasks, listeners did not receive any feedback. After each block, participants were asked to rate their effort, performance, and tendency to quit performing the task (Koelewijn et al., 2012a;Zekveld et al., 2010). For the effort rating, participants indicated how much effort it took on average to perceive the speech during the last block. This was rated on a visual analogue scale from 0 ('no effort') to 10 ('very effortful'). For the performance rating, they were asked to estimate the percentage of sentences they had perceived correctly. This was rated from 0 ('none of the sentences were intelligible') to 10 ('all sentences were intelligible'). Finally, participants were requested to indicate how often during the last block they had abandoned the listening task because the task was too difficult. This was rated from 0 ('this happened for none of the sentences') to 10 ('this happened for all of the sentences').

SIC-span task
The SICspan (Sörqvist et al., 2010; Sörqvist and Rönnberg, 2012) is a visual task that measures WMC and the ability to inhibit irrelevant linguistic information. During this task participants were asked to make relative size judgments between items (e.g., Is LAKE bigger than SEA?) by pressing the 'J' key for yes and 'N' for no on a QWERTY keyboard. Each question was followed by a single word they had to remember, which was semantically related to the object items in the sentence (e.g., RIVER). Sentences and words were presented on screen in black (target words, upper case Verdana, vertical visual angle of 0.88°; non-target words, lower case Verdana, vertical visual angle of 0.71°) on a light grey background. Ten sets containing two to six size comparison questions were presented in ascending order. After completion of a set, participants were asked to verbally recall the to-be-remembered words in order of presentation. Because the size comparison items and to-be-remembered words (within each set) were from the same semantic category, the size-judgment items from the questions had to be inhibited while recalling the to-be-remembered words. Between sets, the semantic categories differed. The SICspan score used in this study was the total number of correctly remembered items independent of order, which leads to a maximum score of 40. The higher the score the better the performance.
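As a minimal sketch of this order-independent scoring, the function below counts, per set, how many recalled words match a to-be-remembered word, with each target counted at most once. The data layout and the case-insensitive matching are assumptions made for illustration. The maximum of 40 is consistent with two sets at each size from two to six, which is also an inference rather than something stated above.

```python
def sicspan_score(sets):
    """Total number of correctly recalled words, ignoring recall order.

    sets: iterable of (targets, recalled) pairs, each a list of words.
    This layout is hypothetical; it is not taken from the task software.
    """
    total = 0
    for targets, recalled in sets:
        remaining = [word.upper() for word in targets]
        for word in recalled:
            candidate = word.upper()
            if candidate in remaining:
                remaining.remove(candidate)  # each target counts only once
                total += 1
    return total


# Assumed set structure: two sets each with 2, 3, 4, 5 and 6 items.
MAX_SCORE = sum(2 * size for size in range(2, 7))
```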

Need for recovery scale
The NFR scale (see Table 1, van Veldhoven and Broersen, 2003) was used to assess NFR after work. Participants had to respond 'yes' or 'no' to 11 statements related to how they feel at the end of a working day. For example, "I find it difficult to relax at the end of a working day" or "When I get home from work, I need to be left in peace for a while". The total NFR score (as a percentage) was calculated by dividing the number of 'yes' responses by the total number, after which the outcome was multiplied by one hundred. A higher score represents a higher NFR.
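The percentage score described above amounts to the following calculation, shown here as a small sketch. The function name and the boolean encoding of the 'yes'/'no' answers are assumptions for illustration.

```python
def nfr_score(responses):
    """Percentage of 'yes' answers on the need-for-recovery statements.

    responses: list of booleans (True for 'yes'), one per statement.
    """
    if not responses:
        raise ValueError("at least one response is required")
    yes_count = sum(1 for answer in responses if answer)
    return 100.0 * yes_count / len(responses)
```

With 11 statements, a single 'yes' gives a score of about 9.1%, and the scale moves in steps of that size.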

Apparatus and procedure
All testing was performed in a sound treated room. After recording the participant's audiogram and testing near vision acuity, participants filled in the NFR questionnaire, and they performed the SIC-span and SRT tasks. During the SIC-span and the SRT tasks, participants were seated in front of a computer screen (Dell, 17 inch) at 65 cm viewing distance. During the SRT test, the pupil diameter of both eyes was measured at a 60 Hz sampling rate using an infrared eye tracker (SMI RED250mobile System). The light intensity of the LEDs attached to the ceiling of the room was adjusted by a dimmer switch such that, for each participant, the pupil diameter was around the middle of its dynamic range as measured by examination of the pupil size at 0 lx and 750 lx. For the SRT task, audio in the form of wave files (44.1 kHz, 16 bit) was presented diotically by an external soundcard (Asus Xonar Essence One) through headphones (Sennheiser, HD 280, 64 Ω). All tests were presented by a laptop computer running Windows 10 (HP ZBook, SMI RED250mobile System). The whole procedure, including measurement of pure-tone hearing thresholds, near vision acuity, performing the SIC-span task, calibrating the eye tracker, practicing and performing the SRT tasks, and a 10-min break halfway through the SRT task, took 2 h. At the end of the session participants were informed about their performance on the SRT task. They received 10.40 euros reward in addition to the 7.50 euros hourly rate.

Pupil data analysis
The first five trials of each block were excluded from the analyses. For the pupil diameter traces for the remaining 20 trials per condition, zero values within the time window of 1 s before and 4.3 s after sentence onset were coded as blinks. Traces in which more than 20% of their duration consisted of blinks were excluded from further analysis (2.8% of all trials). For the remaining traces, blinks were removed by linear interpolation between the fifth sample before and eighth sample after the blinks. The x- and y-coordinate traces of the pupil center (reflecting eye movements) were "deblinked" by application of the same procedure. Trials for which these coordinate traces contained eye movements within the time window of 1 s before and 4.3 s after sentence onset and deviating more than 10° from fixation on the x- or y-axis were removed from analysis (3.8% of the remaining trials). A five-point moving average smoothing filter was passed over the de-blinked pupil traces to remove any high-frequency artifacts. All remaining traces were baseline corrected by subtracting the trial's baseline value from the value for each time point within that trace. This baseline value was the mean pupil size within the 1-s period prior to the onset of the sentence, when either listening to the speech masker alone (hard and easy conditions) or no sound (quiet conditions). The baseline period is shown by the left and middle dotted vertical lines in both plots in Fig. 2. Average traces in each condition were calculated separately for each participant. Within the average trace, MPD (mm) was defined as the average pupil dilation relative to baseline within a time window ranging from the start of the sentence to the start of the response prompt, shown by the middle and right dotted vertical lines in both plots in Fig. 2. Within this same time window, the PPD (mm) was defined as the largest value relative to the baseline. The latency of the PPD (ms) was defined relative to the sentence onset. Finally, for each participant and each condition the average pupil diameter at baseline was calculated.
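The preprocessing steps above can be sketched in plain Python as follows. The blink coding (zeros), the interpolation anchors (fifth sample before, eighth sample after), the five-point filter, the 1-s baseline and the 60 Hz rate come from the description. The function names, the handling of trace edges and the assumption that the anchor samples are themselves valid are illustrative choices, not the authors' implementation.

```python
def blink_fraction(trace):
    """Fraction of samples coded as blinks (zero pupil diameter)."""
    return sum(1 for value in trace if value == 0) / len(trace)


def interpolate_blinks(trace, pre=5, post=8):
    """Replace each run of zeros by a straight line between the sample
    `pre` before the run and the sample `post` after it.
    Assumes those anchor samples are valid (non-zero) measurements."""
    out = list(trace)
    n = len(out)
    i = 0
    while i < n:
        if trace[i] != 0:
            i += 1
            continue
        j = i
        while j < n and trace[j] == 0:
            j += 1  # j ends at the first sample after the blink run
        left = max(i - pre, 0)
        right = min(j - 1 + post, n - 1)
        span = right - left
        for k in range(left + 1, right):
            out[k] = trace[left] + (trace[right] - trace[left]) * (k - left) / span
        i = j
    return out


def moving_average(trace, width=5):
    """Centered moving average; the window shrinks at the trace edges."""
    half = width // 2
    smoothed = []
    for i in range(len(trace)):
        window = trace[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def pupil_measures(trace, onset, prompt, rate_hz=60):
    """Baseline, MPD, PPD and PPD latency for one cleaned trace.

    onset and prompt are sample indices of sentence onset and response
    prompt; the baseline is the 1-s period before sentence onset.
    """
    n_base = rate_hz
    baseline = sum(trace[onset - n_base:onset]) / n_base
    window = [value - baseline for value in trace[onset:prompt]]
    ppd = max(window)
    return {
        "baseline": baseline,
        "mpd": sum(window) / len(window),
        "ppd": ppd,
        "ppd_latency_ms": window.index(ppd) * 1000.0 / rate_hz,
    }
```

In the study the MPD, PPD and latency were taken from the average trace per participant and condition, so `pupil_measures` would be applied to that average rather than to single trials.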

Statistics
For all dependent behavioral (SRT), pupil (MPD, PPD, PPD latency, baseline), and self-rated (effort, performance, and quitting) variables we performed 2 × 2 analyses of variance (ANOVA) with condition (easy and hard) and reward (high and low) as the repeated measures within-subject variables (Table 3). Since no SRTs were measured for the control conditions, the dependent variables that were measured in quiet were analyzed separately by means of two-sided paired-samples t-tests. For the correlation analysis between the SRTs, PPDs, and the self-rated variables, these were first averaged over all masked conditions. Additionally, for the PPD, a difference score was calculated for the effect of reward by subtracting the average score for the low reward conditions from the average score for the high reward conditions. Control conditions were excluded from these calculations due to ceiling effects on the rating scores. Pearson correlation coefficients were calculated to assess the relationships between the resulting average rating scores, PPD difference scores, and the SICspan scores. Finally, a nonparametric Spearman's r was calculated to examine each relationship between the resulting average values and difference scores and the NFR scores, as the distribution of the NFR scores was skewed.
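The reward difference score and a paired comparison of the kind used for the control conditions can be sketched as below. The function names and data layout are hypothetical. As a general property of fully within-subject designs, the main effect of a two-level factor in a repeated-measures ANOVA gives F(1, n − 1) equal to the square of this paired t computed on the per-participant averages.

```python
from math import sqrt
from statistics import mean, stdev


def reward_difference(high, low):
    """Per-participant PPD difference score (high minus low reward).

    high, low: one tuple per participant holding that participant's PPD
    in the masked conditions (easy, hard) at the given reward level.
    """
    return [mean(h) - mean(l) for h, l in zip(high, low)]


def paired_t(diffs):
    """Paired-samples t statistic and degrees of freedom for difference scores."""
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two participants are required")
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_value, n - 1
```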

Results
For the SICspan task, the average score was 30.0 (SD = 3.3). The average NFR score was 17.8% (Median = 9.1%, SD = 18.5%). Average SRTs and subjective ratings as a function of reward for all conditions (easy, hard and control) are presented in Table 1 and average pupil measures as a function of reward for all conditions are presented in Table 2. Average SRTs are plotted in Fig. 1, and average pupil traces over participants for each condition are plotted in Fig. 2.

Control conditions
No effect of reward was observed for any of the parameters (all t < 1) for the control (speech in quiet) conditions. Mean performance for speech reception in quiet was 99.8% whole sentences correct for both reward conditions.

SRTs
Analysis of the SRTs (see Table 3) revealed a significant main effect of task difficulty (F[1,23] = 168.35, p < .001), as indicated by the lower SRTs for the hard (50%) than for the easy (85%) condition. No significant main effect of reward (F < 1) or interaction effect between reward and task difficulty (F < 1) was found.

Pupil measures
Analysis of the MPDs (see Table 3) revealed a significant main effect of task difficulty (F[1,23] = 4.67, p = .041). No significant main effect of reward (F[1,23] = 3.71, p = .067) or interaction (F < 1) was found. Analysis of the PPDs revealed a significant main effect of task difficulty (F[1,23] = 5.51, p = .028) and a main effect of reward (F[1,23] = 4.30, p = .049). No interaction (F < 1) was found. A larger PPD for the high than for the low reward condition was observed. Apart from the trend for an effect of task difficulty on the pupil baseline (F[1,23] = 3.43, p = .077), there were no significant main effects or interactions for the PPD latency and pupil baseline (see Table 3).

Self-rated scores
Self-rated effort, performance, and quitting all showed a significant main effect of task difficulty (see Table 3), indicating that the hard conditions were rated as more effortful and resulted in lower performance and a higher quitting rate than the easy conditions. No significant main effect of reward or interaction effect was found.

Correlation analyses
The SICspan and NFR scores were not correlated with one another or with the average SRT and PPD (see Table 4). For the self-rated scores, there was a positive correlation (Spearman's r = 0.46, p < .05) between NFR and quitting rate, such that participants with a higher NFR reported a higher quitting rate. The PPD reward difference score (i.e., the difference between the low and high reward conditions) was not correlated with the SICspan (p = .945) or NFR (p = .887) scores.

Discussion
The results showed a significantly higher PPD for the high reward than for the low reward when participants processed speech masked by a single talker. This effect occurred in the absence of an effect of reward on the SRT. This means that reward led to an increase in effort without any measured behavioral change. The effect of task difficulty was reflected by larger MPD and PPD values for the hard than for the easy condition (e.g., Zekveld et al., 2010). In contradiction to the Motivational Intensity Theory (Brehm and Self, 1989), the results showed no significant interaction, i.e., the effect of reward was not stronger in the 'hard' than in the 'easy' listening condition. Self-rated effort, performance, and quitting were all affected by task difficulty but not by reward. Interestingly, the correlation analysis revealed that participants with a lower NFR reported a lower quitting rate. However, no relation was found between NFR and the PPD, while this relationship was observed by Wang et al. (2018).
Table 1 Average speech reception threshold (SRT) and self-rated effort, performance, and quitting scores, for each reward level for the easy, hard and control (quiet) conditions. Shade shows the speech in quiet conditions.
Table 2 Average mean pupil dilation (MPD), peak pupil dilation (PPD), PPD latency, and pupil baseline for each reward level for the easy, hard and control (quiet) conditions. Shade shows the speech in quiet conditions.
The current results demonstrate that monetary reward influences the pupil response. Monetary reward is known to affect motivation (Brehm and Self, 1989), and according to FUEL (Pichora-Fuller et al., 2016), listening effort can be modulated by changes in motivation. Despite the effect of reward on the PPD, no behavioral effect of reward was observed. Speech perception is largely automatic and highly efficient, so trying even harder will not result in improved performance. Still, the control system responsible for the allocation of resources could be increasingly activated. However, Carver (2006) made a distinction between the fulfillment of goals, which is driven by motivation and may apply to finishing the task at a sufficient performance level, and feelings related to sensing one's rate of progress. Based on this, an alternative explanation of the observed effect is that the effect of monetary reward on the PPD reflects arousal partly related to positive feelings rather than just motivation (see Chiew, 2011). However, for positive feelings to occur during the task, one needs trial-by-trial feedback in order to monitor performance and perceive that the task is done better than required (Carver, 2006). Since in this study the level of reward was only mentioned at the start of a block (Richter, 2016), and no feedback on performance was provided, participants received no information about their progress. Additionally, there was no effect of reward on self-rated performance or quitting. Hence, we do not consider the effect of reward on the PPD as resulting from positive feelings, but rather from motivation. Still, positive emotions instead of motivation cannot be ruled out as an explanation for the current results and this is something to take into account in future research.
Note that the lack of feedback, in contrast to the study of Richter (2016) that provided feedback on a trial-by-trial basis, might also explain the absence of reward-related behavioral change in the current results.
There was an effect of reward on the PPD for speech in a background talker but not for speech in quiet. Less expected, and not consistent with Richter (2016), there was an effect of reward on the PPD for the 85% intelligibility condition, which was clearly above the 70% required to obtain the reward. The fact that participants underestimated their performance, as shown by their average performance rating of 7 on a 10-point scale, suggests that the easy condition was perceived as more difficult than it actually was. This may have warranted more motivational arousal (Brehm and Self, 1989) and therefore no interaction between reward and task difficulty for the masked conditions. Note that early pupillometry research by Kahneman et al. (1968) did show an effect of reward on the pupil response during performance of an easy task. However, in the study of Kahneman et al. (1968) participants were rewarded on a trial-by-trial basis and therefore the observed response might have reflected positive feelings rather than motivation (Carver, 2006). As anticipated, reward was not reflected by the pupil baseline, as measured before sentence onset. This suggests that reward does not necessarily affect an individual's state of arousal (e.g., Aston-Jones and Cohen, 2005). Still, there was a trend for the effect of difficulty on the pupil baseline, which is in line with previous studies showing an increased baseline for difficult listening conditions (e.g., Koelewijn et al., 2014).
Although we did not observe a behavioral benefit for high compared to low reward on the SRT, other aspects of performance not captured by the SRT could have been affected (e.g., recall, Ng et al., 2013). This is an issue that deserves exploration in future research. Importantly, we now know that in sufficiently difficult listening conditions (50%–85% speech intelligibility at roughly −12 to −6 dB SNR) the PPD during speech processing is affected by the participants' level of motivation. We also know that when conditions become too difficult participants tend to give up (Ohlenforst et al., 2017). This should be considered both when designing an experiment and when interpreting the results. For instance, the level of listening effort is unlikely to be modulated by motivation when a task is either too easy (e.g., speech in quiet) or too difficult, and differences in the pupil response between participants could be partly explained by differences in motivation.
There was a positive correlation between NFR and quitting rate. This suggests that people with a higher NFR are more likely to quit the task they are performing. According to FUEL, when demands get too high, one might no longer put effort into a task. The evaluation of demands can be affected by the level of fatigue. However, the expected decrease in PPD, as hypothesized and shown by Wang et al. (2018), was not observed in the current results. The absence of this effect can be explained by the fact that the NFR scale was validated for people who were occupationally active, as was the case for the participants in the study of Wang et al. (2018), and the scale might be less valid for the students included in the current study.
To conclude, consistent with the motivational intensity theory (Brehm and Self, 1989), we showed an effect of reward on listening effort (PPD) when the tasks were sufficiently difficult (masked vs. control), and NFR scores were correlated with quitting rate. SICspan scores were not correlated with any of the other outcome measures, suggesting that cognitive capacity for this homogeneous sample of participants did not influence the impact of reward. Importantly, one consequence of the current outcome, as pointed out by Richter (2016), is that in order to explain changes in the pupil response in terms of changes in listening effort or resource allocation (Pichora-Fuller et al., 2016), we need to be aware of and acknowledge the mediating effects of motivation on resource allocation, which itself can be affected by manipulations of the independent variable under investigation. In other words, changes in motivation can account for changes in the pupil response and also for part of the observed variance in pupil size between people. Future research should investigate whether motivation, when affected by factors other than monetary reward (e.g., intrinsic factors), also has an impact on listening effort.
Table 3 Outcomes of SRT, pupil measures, and subjective ratings using 2 × 2 ANOVAs with repeated measures 'task difficulty' (hard, easy) and 'reward' (high, low).