Reading times and temporo-parietal BOLD activity encode the semantic hierarchy of language prediction

When poor acoustics challenge speech comprehension, listeners are thought to increasingly draw on semantic context to predict upcoming speech. However, previous research focused mostly on speech material with short timescales of context (e.g., isolated sentences). In an fMRI experiment, 30 participants listened to a one-hour narrative incorporating a multitude of timescales while confronted with competing resynthesized natural sounds. We modeled semantic predictability at five timescales of increasing context length by computing the similarity between word embeddings. An encoding model revealed that short informative timescales are coupled to increased activity in the posterior portion of superior temporal gyrus, whereas long informative timescales are coupled to increased activity in parietal regions like the angular gyrus. In a second experiment, we probed the behavioral relevance of semantic timescales in language prediction: 11 participants performed a self-paced reading task on a text version of the narrative. Reading times sped up for the shortest informative timescale, but also tended to speed up for the longest informative timescales. Our results suggest that short-term dependencies as well as the gist of a story drive behavioral processing fluency and engage a temporo-parietal processing hierarchy.


Introduction
In natural speech processing, rich semantic context guides a listener's expectations about upcoming speech. At the same time, not all context is meaningful (e.g., after a slip of the tongue) or relevant (e.g., after a change of subject). Here, we ask how the human brain orchestrates the multitude of semantic timescales underlying natural speech to build up predictions about upcoming speech. Hasson, Chen & Honey (2015) found that long timescales of speech, such as paragraphs, are processed in higher cortical areas, whereas short timescales, such as words, are processed in lower cortical areas. As the predictive coding model of Friston (2009) proposes that predictions are fed back from higher to lower cortical areas, we hypothesized that listeners exploit the semantic timescales of speech to inform speech prediction. On the neural level, we expected the timescales of semantic prediction to be organized along an auditory dorsal processing hierarchy. Additionally, we expected predictive timescales to increase the behavioral ease of language processing. The present study investigated the neural underpinnings of semantic prediction in an fMRI listening task and the behavioral relevance of different semantic timescales in a self-paced reading task.

Participants
Sixty-three participants (18-78 years) took part in the fMRI listening study. Here, we analyzed a subset of 30 younger participants (16 female, 18-31 years). Another 11 participants (9 female, 20-27 years) took part in the self-paced reading study. All participants were right-handed healthy German native speakers.

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0

Materials and Design
Our stimulus set consisted of a spoken and a written version of the same narrative: We split a one-hour audio recording of a natural narrative told by a female German speaker into eight blocks and converted the spoken narrative to text. The narrative contained 9,446 words (4,451 content words).
In the fMRI study, participants listened to all eight blocks of the narrative embedded in a competing stream (SNR 0 dB) of resynthesized natural sounds. In the self-paced reading study, participants read the narrative word-by-word on a noncumulative, left-aligned display and pressed a button to reveal the next word. Each participant in the reading study was presented with four randomly chosen blocks. In both experiments, each block was followed by three multiple-choice questions on the plot of the story.

Data Acquisition and Analysis
Operationalization of semantic predictability. We used an established stimulus set manipulating the semantic predictability of sentence-final keywords (Erb et al., 2012) to validate that the similarity of word embeddings (i.e., vector representations of meaning; Mikolov et al., 2013) is a measure of semantic predictability. Keywords of predictable sentences correlated more strongly with the average vector of all preceding words than keywords of unpredictable sentences did (p < 0.0001; Figure 1).

Figure 1: Correlations between average context and keyword embeddings differentiate predictable (green) from unpredictable sentences (orange); *** p < 0.0001.
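The validation measure can be illustrated with a minimal sketch: the Pearson correlation between a keyword embedding and the average embedding of the preceding words. The function name and the toy random vectors are assumptions made for illustration only; the study itself used trained word embeddings (Mikolov et al., 2013) and the sentences of Erb et al. (2012).

```python
import numpy as np

def context_keyword_correlation(context_vectors, keyword_vector):
    """Pearson correlation between the mean context embedding and the keyword embedding."""
    context_mean = np.mean(context_vectors, axis=0)
    return float(np.corrcoef(context_mean, keyword_vector)[0, 1])

# Toy data (assumption): random vectors stand in for real word embeddings.
rng = np.random.default_rng(0)
dim = 50
topic = rng.normal(size=dim)

# "Predictable" sentence: context words and keyword share a topic direction.
context = topic + 0.5 * rng.normal(size=(6, dim))
predictable_keyword = topic + 0.5 * rng.normal(size=dim)
# "Unpredictable" keyword: unrelated to the context.
unpredictable_keyword = rng.normal(size=dim)

r_pred = context_keyword_correlation(context, predictable_keyword)
r_unpred = context_keyword_correlation(context, unpredictable_keyword)
print(f"predictable: {r_pred:.2f}, unpredictable: {r_unpred:.2f}")
```

In this toy setting the predictable keyword yields the higher correlation, which is the pattern the validation on real sentences tests for.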
We modeled semantic predictability at five timescales corresponding to a logarithmic increase in context length (i.e., 1-24 words) by computing the similarity between the embedding of each content word in the story and each timescale's average word embedding.

fMRI listening study. During the listening task, we acquired continuous whole-brain 3-T fMRI data (2.5 mm isotropic voxels, TR = 947 ms, TE = 28 ms). Initially, we calculated the intersubject correlation of the blood-oxygen-level-dependent (BOLD) signal (Nastase et al., 2019) across the whole cortex to determine brain regions consistently engaged in speech processing. All further analyses were limited to those parcels (Glasser et al., 2016) along the temporo-parietal pathway with at least 60 % of vertices in the top 30 % of correlation coefficients. Next, we used vertex-wise ridge regression within a fourfold cross-validation scheme (6 training blocks, 2 testing blocks) to project the BOLD signal onto the semantic timescales of speech prediction. Individual best-timescale maps were derived from the encoding model, smoothed and submitted to a group-level cluster-permutation test.
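The five timescale predictors described above could be computed along the following lines. This is a sketch under stated assumptions: the window edges are hypothetical (the text only specifies five timescales spanning 1-24 words with logarithmically increasing context length), the windows are treated as non-overlapping, similarity is taken to be the Pearson correlation, and every word is given an embedding rather than content words only.

```python
import numpy as np

# Hypothetical window edges, counted in words before the target word.
# They yield the windows 1, 2-3, 4-6, 7-12 and 13-24 words back.
WINDOW_EDGES = [0, 1, 3, 6, 12, 24]

def timescale_predictability(embeddings, edges=WINDOW_EDGES):
    """Correlate each word's embedding with the mean embedding of each context window.

    Returns an (n_words, n_timescales) array; entries are NaN where a window
    would reach back past the start of the text.
    """
    n_words = embeddings.shape[0]
    n_scales = len(edges) - 1
    out = np.full((n_words, n_scales), np.nan)
    for i in range(n_words):
        for s in range(n_scales):
            start, stop = i - edges[s + 1], i - edges[s]
            if start < 0:
                continue  # not enough preceding context for this timescale
            context_mean = embeddings[start:stop].mean(axis=0)
            out[i, s] = np.corrcoef(embeddings[i], context_mean)[0, 1]
    return out

# Toy data (assumption): 40 words with 20-dimensional random embeddings.
rng = np.random.default_rng(1)
toy_embeddings = rng.normal(size=(40, 20))
predictors = timescale_predictability(toy_embeddings)
print(predictors.shape)  # (40, 5)
```

Each column of `predictors` is one timescale, ordered from the shortest to the longest context window.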
Behavioral self-paced reading study. As in the fMRI data analysis, we used ridge regression to project reading times onto the semantic timescales of speech prediction.
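Both analyses rest on ridge regression over the timescale predictors. The sketch below shows one way to set this up with blockwise cross-validation, using the eight-block layout and the 6/2 split reported for the fMRI data. The closed-form solver, the regularization strength `alpha`, the pairing of blocks into folds, and the simulated data are all assumptions; the text does not specify these details, nor the cross-validation scheme used for reading times.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge weights: (X'X + alpha * I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def blockwise_cv(X, y, blocks, test_sets, alpha=1.0):
    """Fit on training blocks and score on held-out blocks, once per fold."""
    weights, scores = [], []
    for test_blocks in test_sets:
        test = np.isin(blocks, test_blocks)
        # Standardize predictors with training-set statistics only.
        mu, sd = X[~test].mean(axis=0), X[~test].std(axis=0)
        X_train, X_test = (X[~test] - mu) / sd, (X[test] - mu) / sd
        y_mean = y[~test].mean()
        w = ridge_fit(X_train, y[~test] - y_mean, alpha)
        prediction = X_test @ w + y_mean
        weights.append(w)
        scores.append(np.corrcoef(prediction, y[test])[0, 1])
    return np.array(weights), np.array(scores)

# Simulated data (assumption): 8 blocks of 100 samples, 5 timescale predictors.
rng = np.random.default_rng(2)
blocks = np.repeat(np.arange(8), 100)
X = rng.normal(size=(800, 5))
true_w = np.array([-1.0, 0.0, 0.0, 0.0, -0.5])  # toy effect of shortest and longest timescale
y = X @ true_w + 0.5 * rng.normal(size=800)

# Four folds, each holding out two of the eight blocks (pairing is hypothetical).
folds = [(0, 1), (2, 3), (4, 5), (6, 7)]
weights, scores = blockwise_cv(X, y, blocks, folds)
print(weights.mean(axis=0).round(2), scores.round(2))
```

Here `y` stands for the response being modeled, for example the BOLD signal of a single vertex or the reading time per word. The fold-wise weights indicate which timescale contributes most, and the held-out correlations indicate how well the model generalizes.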

Results
On the behavioral level, participants read faster when the shortest and, less consistently, the longest timescales were more predictive of a presented word. In contrast, the predictive power of medium timescales did not affect reading times (Figure 2). Across the whole cortex, the intersubject correlation showed consistent modulation across participants in those brain areas implicated in the language network (Figure 3A). Along the auditory dorsal pathway, we found two distinct clusters (p < 0.0001): increased activity in the posterior portion of superior temporal gyrus was coupled to short timescales informative of the next word, whereas parietal regions like the temporoparietal junction and angular gyrus were most responsive to predictive long timescales (Figure 3B).

Conclusions
In this study, we showed that the timescales of semantic prediction are organized along an auditory dorsal processing hierarchy; posterior temporal regions code for concrete and temporally close semantics predictive of speech, whereas parietal convergence zones code for abstract and temporally more distant semantics. Reading times suggest that short-term dependencies (~ 0.5 s) as well as the gist of a story (~ 25 s) drive behavioral processing fluency. Next, we will use a measure of semantic predictability fine-tuned to the unique hierarchical structure underlying the context of each word by incorporating a deep neural network trained to determine the probability of an upcoming word given the semantics at a timescale.