Children's processing of written irony: An eye-tracking study

Ironic language is challenging for many people to understand, and particularly for children. Comprehending irony is considered a major milestone in children's development, as it requires inferring the intentions of the person who is being ironic. However, the theories of irony comprehension generally do not address developmental changes, and there are limited data on children's processing of verbal irony. In the present pre-registered study, we examined, for the first time, how children process and comprehend written irony in comparison to adults. Seventy participants took part in the study (35 10-year-old children and 35 adults). In the experiment, participants read ironic and literal sentences embedded in story contexts while their eye movements were recorded. They also responded to a text memory question and an inference question after each story, and children's levels of reading skills were measured. Results showed that for both children and adults comprehending written irony was more difficult than for literal texts (the "irony effect") and was more challenging for children than for adults. Moreover, although children showed longer overall reading times than adults, processing of ironic stories was largely similar between children and adults. One group difference was that for children, more accurate irony comprehension was qualified by faster reading times whereas for adults more accurate irony comprehension involved slower reading times. Interestingly, both age groups were able to adapt to task context and improve their irony processing across trials. These results provide new insights about the costs of irony and development of the ability to overcome them.


Introduction
In the novel Harry Potter and the Half-Blood Prince (Rowling, 2005, pp. 327-328), Harry is leaving the Weasley house and Mrs. Weasley says to him: "Promise me you will look after yourself… stay out of trouble…" Harry responds: "I always do Mrs. Weasley. I like a quiet life, you know me." Mrs. Weasley chuckles at his answer, and anyone familiar with Harry Potter knows that his life is far from quiet and that he does not really mean what he is saying. In fact, Harry is being ironic.
Children, like adults, encounter ironic language in their daily lives. Ironic language is present in many contexts, from family conversations (Pexman, Zdrazilova, McConnachie, Deater-Deckard, & Petrill, 2009; Recchia, Howe, Ross, & Alexander, 2010) to literature, as in the example above from the Harry Potter book series. However, previous research has demonstrated that irony is challenging for children to understand (e.g., Capelli, Nakagawa, & Madden, 1990; Dews et al., 1996; Hancock, Dunham, & Purdy, 2000; Harris & Pexman, 2003). Although challenging, comprehension of irony is an important skill for children. For example, comprehending irony is considered a major milestone in the development of children's social cognition, as it goes hand in hand with the development of understanding others' beliefs, intentions, and attitudes (Peterson, Wellman, & Slaughter, 2012; see also Bosco, Tirasa, & Gabbatore, 2018; Del Sette, Bambini, Bischetti, & Lecce, 2020). Thus, it is not surprising that deficits in irony comprehension have been shown to be related, for example, to feelings of social exclusion (Kim & Lantolf, 2018).
Most of the previous studies on children's irony comprehension have presented ironic and literal language examples using illustrated stories (e.g., Banasik-Jemielniak & Bokus, 2019; Filippova & Astington, 2008; Köder & Falkum, 2021; Winner & Leekam, 1991) and puppet shows (e.g., Climie & Pexman, 2008; Harris & Pexman, 2003). These studies have helped map the development of children's irony comprehension, but the issue of how children process and comprehend written irony remains underexplored. Children's processing has been examined in a few studies using variations on the visual world paradigm, where children's eye gaze and/or reaching to response objects is tracked (e.g., Climie & Pexman, 2008; Köder & Falkum, 2021; Nicholson, Whalen, & Pexman, 2013; Whalen, Doyle, & Pexman, 2020). This paradigm, however, does not capture the very earliest moments of children's irony processing, and does not afford the fine-grained processing information that has been acquired in reading time studies with adults (see e.g., Olkoniemi & Kaakinen, 2021 for review). As such, the issue addressed by Noveck, Bianco, and Castry (2001), of the costs (and benefits) of figurative language processing compared to nonfigurative language, has been underexplored in the case of children's irony appreciation. The current paper represents an important step toward addressing that issue, investigating children's ironic vs literal language processing with greater precision than has previously been achieved.
☆ This paper is a part of special issue "20 Years of XPrag".
It has been stated that, since children are not proficient readers, it would not be feasible to use reading time methodologies to study children's written irony comprehension (Nicholson et al., 2013; Pexman, 2008). In the current study, however, we challenged this assumption, using eye-tracking methodology to investigate comprehension and processing of written irony by 10-year-old children and comparing their performance to that of adults. With eye-tracking methodology it is possible to tap into the detailed time-course of processing written language during reading (Rayner, 1998). We further capitalized on the facts that Finnish children have high reading literacy (Mullis, Martin, Foy, & Hooper, 2017), that the reading performance of 10-year-olds is expected to be similar to that of adults for the reading of literal language (see Blythe & Joseph, 2011, for a review on reading development), and that 10-year-olds are still developing irony comprehension skills (Glenwright & Pexman, 2010; Nicholson et al., 2013; Pexman, Glenwright, Krol, & James, 2005).
Children's increasingly accurate irony comprehension across middle childhood has been linked to a number of cognitive and linguistic skills. Filippova and Astington (2008) characterized irony comprehension as a form of advanced theory of mind reasoning. While several studies have found that second-order theory of mind is important to irony appreciation (Filippova & Astington, 2008;Happé, 1993;Hayashi & Ban, 2021;Massaro, Valle, & Marchetti, 2013;Nilsen, Glenwright, & Huyder, 2011;Sullivan, Winner, & Hopfield, 1995;Winner & Leekam, 1991), other studies have failed to find evidence for this relationship suggesting that second-order theory of mind is not a necessary condition for irony comprehension (Angeleri & Airenti, 2014;Bosco & Gabbatore, 2017;Massaro, Valle, & Marchetti, 2014;Panzeri, Giustolisi, & Zampini, 2020). Children's irony comprehension has also been linked to executive function skills including cognitive flexibility (Zajączkowska & Abbot-Smith, 2020), inhibitory control (Caillies, Bertot, Motte, Raynaud, & Abely, 2014), and working memory (Godbee & Porter, 2013). Recently, Mazzarella and Pouscoulous (2020) suggested that children's irony comprehension depends on their emerging vigilance toward deception, or epistemic vigilance. Epistemic vigilance affords children the capacity to evaluate a speaker's honesty and distinguish lies from ironies. Certainly, these various cognitive skills are related to each other and to children's developing language abilities and their relationships (and directionality of those relationships) are yet to be determined.
In the present study we investigated irony comprehension and processing of 10-year-old children, and based on previous studies we expected children of this age to be proficient in their irony comprehension, but still less accurate in comparison to adults. However, the previous studies have not presented children with written irony, but rather irony in other modalities, and manner of presentation could influence the results. One could argue that narrated stories, which are used often in the experiments with children, would be easier for children to understand than written stories as the latter lack ironic tone of voice to aid the comprehension process. Indeed, some studies have shown that intonation can facilitate children's comprehension of ironic phrases (Ackerman, 1983;Capelli et al., 1990). However, the findings related to ironic tone of voice are contradictory, as some studies have found no benefit of intonation for children's irony comprehension (Köder & Falkum, 2021;Winner & Leekam, 1991) or for that of adults (Bryant & Fox Tree, 2005; see also Rivière, Klein, & Champagne-Lavau, 2018). Furthermore, children have shown high levels of comprehension of ironic materials presented only with a neutral tone of voice (Banasik-Jemielniak & Bokus, 2019). On balance, these results suggest that irony comprehension accuracies for written irony may be comparable with those found in previous studies.

Development of children's reading skill
Children's processing of written irony will necessarily be influenced by their reading skills. Reading skills can be thought to develop through stages, in which children first process words via smaller units (e.g., Ehri, 1995, 2005). This is referred to as decoding. With practice, children then start using larger units, which gives way to more comprehensive reading skills (e.g., Perfetti, 2007). It has been shown that Finnish children develop their decoding skills early on (Seymour, Aro, & Erskine, 2003) and that their literacy skills are high (e.g., Mullis et al., 2017) compared to children in many other linguistic communities. Generally, the reading performance of 10-year-olds is expected to be similar to that of adults for the reading of literal, non-ironic language, although children are expected to be generally slower due to making more and longer fixations, shorter saccades, and more regressions (see Blythe & Joseph, 2011, for a review of eye movement research in developing readers).
With regard to reading comprehension, Zargar, Adams, and Connor (2020) showed that while 3rd grade children (approx. 8-9 years of age) did not show differences in reading times of plausible and implausible words in sentence context, 4th graders (approx. 9-10 years of age) spent more time reading implausible words. This finding is indicative of active comprehension monitoring which emerges around 9 years of age. Indeed, several studies have shown that around 10 years of age, readers can detect inconsistencies in a text, although this varies between the tasks and readers (e.g., Connor et al., 2015;Oakhill, Hartt, & Samols, 2005;van der Schoot, Reijntjes, & van Lieshout, 2012). This is also seen in eye movement patterns, as Vorstius, Radach, Mayer, and Lonigan (2013) reported that 5th graders (on average 11 years of age) had larger rereading times when a sentence was more difficult. Despite this active comprehension monitoring, however, children may still not correctly answer explicit comprehension questions, such as whether the sentence is plausible.
As for strategic reading, Häikiö, Heikkilä, and Kaakinen (2018) showed that 7- to 8-year-old Finnish children did not tend to reread past sentences when reading for comprehension. In contrast, Kaakinen, Lehtola, and Paattilammi (2015) observed that 7- to 8-year-old children's first-pass reading was slower than that of 9- to 10-year-old children, but both age groups showed similar reading times for later rereading when reading for comprehension. When compared to adults, 9- to 10-year-olds spent more time during first-pass reading but there were no differences in probabilities or durations of later look-backs. The reading patterns of younger and older readers were modulated by the task. When reading to answer a "why" question (as opposed to reading for comprehension), younger children focused more on the first-pass reading while older children and adults made more look-backs. The task effect indicates that children were able to modify their reading behavior on the basis of task demands. Furthermore, the difference between younger and older children with regard to the task effect suggests a shift toward more strategic reading. Häikiö et al. (2018) hypothesized that this is due to younger children still being in the process of perfecting their decoding skills while the older children already use higher order comprehension strategies (see also Perfetti, 2007). Taken together, while reading development does continue after 10 years of age (as witnessed, e.g., in reading speed; Häikiö, Bertram, Hyönä, & Niemi, 2009; Kaakinen et al., 2015), the previous findings imply that around 10 years of age, many readers have developed the comprehension and strategic reading skills required for resolving the irony in short stories.

Processing of written irony
Currently we do not know how children resolve written irony. In general, it is assumed that there may be costs associated with processing of figurative expressions, as the contrast between expectations and literal meaning needs to be resolved (e.g., Grice, 1975; Noveck et al., 2001). For irony, theories assume that when the ironic phrase is not more familiar than its literal counterpart (e.g., the familiar irony "Yeah right"; see Giora, 2003) and is not supported by preceding context (i.e., there are no cues about forthcoming irony) the ironic statement should be harder to comprehend and take more time to process (Gibbs, 1994). Eye-tracking studies of processing of verbal irony with adult participants have shown this to be the case (see e.g., Olkoniemi & Kaakinen, 2021 for review). That is, previous eye-tracking studies measuring comprehension accuracies of written ironies have systematically shown ironic statements being harder to comprehend than literal statements (e.g., Au-Yeung, Kaakinen, Liversedge, & Benson, 2015; Kaakinen, Olkoniemi, Kinnari, & Hyönä, 2014; Olkoniemi, Ranta, & Kaakinen, 2016). The irony comprehension accuracies in these studies range between 75% (Kaakinen et al., 2014, Exp 1) and 90% (Olkoniemi et al., 2016). Moreover, processing of irony in text has been shown to take longer than for its literal counterpart (e.g., Filik & Moxey, 2010). This slowdown is typically seen as more looking back to the ironic than to the literal target phrase after the phrase has already been read once, and as more returns (i.e., look-froms) to the context from the ironic than from the literal target phrase (e.g., Olkoniemi & Kaakinen, 2021). According to theories of irony comprehension, the slowdown reflects an interpretation process in which the ironic meaning is integrated with the context (e.g., Grice, 1975; Spotorno & Noveck, 2019).
This process is suggested to be dependent on a reader recognizing the discrepancy between literal meaning and the context and/or realizing that a protagonist criticizes a previous expectation or belief (e.g., Spotorno & Noveck, 2019; Wilson, 2013). However, not all of the rereading is necessarily related to comprehension (Olkoniemi, Johander, & Kaakinen, 2019). Olkoniemi, Johander, and Kaakinen (2019) showed that a higher probability of look-froms from the ironic target phrase to the context was associated with poorer irony comprehension. However, these results should be interpreted with caution, as Olkoniemi et al. used a masking paradigm in their study that disrupted normal reading, which might affect the generalizability of the results to normal reading.
Moreover, previous studies have shown trial effects: evidence that the processing of ironic phrases changes across an experimental session (Olkoniemi et al., 2016;Olkoniemi, Johander, & Kaakinen, 2019;Olkoniemi, Strömberg, & Kaakinen, 2019;Spotorno & Noveck, 2014). For example, in the study by Spotorno and Noveck (2014) participants read short stories containing ironic and literal sentences at their own pace, one sentence at a time. They found that participants showed higher reading times for ironic than literal target sentences in the beginning of the experimental session, but the effect wore off toward the end of the experiment (they call this Early-Late effect). They also found the same effect for the sentence following an ironic target phrase (this is typically called the spillover region in eye tracking studies). These results suggest that when the reader repeatedly encounters ironic statements, there will be an expectation of forthcoming irony, making the processing of ironic statements easier and hence more similar to that of literal statements. Thus, readers adjust to the experimental context and the costs of irony are attenuated.
Currently, we do not know how the costs of irony that have been observed for adult processing times (overall and across trials) might translate to children. Most theories of irony comprehension do not address developmental changes. However, theories that take individual differences into account, such as the parallel constraint-satisfaction framework (Pexman, 2008) and the predictive coding theory of irony (Fabry, 2021), could be extended to make the prediction that as irony comprehension ability is still developing in children, they should show more difficulty in processing and comprehending irony than adults. The increased processing effort could be seen as extra effort that children would need to invest in integrating ironic meaning to text context, in comparison to adults. This prediction is admittedly not very precise, but it is consistent with previous findings from studies on children's irony comprehension.

Present study
We recruited 4th grade Finnish elementary school children (10-year-olds) and compared their comprehension and processing of written irony to that of adults. Previously, it has been suggested that processing of written irony cannot be studied with children, because their reading skill is not sufficiently developed (Nicholson et al., 2013). However, Finnish children develop decoding skills early (Seymour et al., 2003) and have high reading literacy (Mullis et al., 2017). In addition, as the reading performance of 10-year-olds is similar to, albeit slower than, that of adults for the reading of literal, non-ironic language, it is likely that effects of written irony could be observed in a sample of Finnish 10-year-old children.
We expected that children would show similar reading patterns, and similar comprehension of the literal target statements, to that of adults. At the same time, we expected that children would struggle with their irony comprehension, as indexed by lower comprehension scores for ironic than literal statements. We expected that children would show above guessing-level irony comprehension accuracy, whereas adults were expected to show near-ceiling performance (e.g., Capelli et al., 1990;Olkoniemi & Kaakinen, 2021). Successful comprehension should be seen as increased rereading of ironic statements in comparison to their literal counterparts. For children, the increase in rereading time for ironic statements should be more pronounced than for adults. Alternatively, it could also be that children do not recognize the ironic intent at all; if so, there should be no difference in processing ironic statements and their literal counterparts.
Finally, it should be noted that the materials used in the present experiment were designed for children. Thus, the materials might be too simple for the adults (see e.g., Schroeder et al., 2022), which might affect the results for adults. In the context of the present experiment this might mean that the irony effects in comprehension and processing times are not particularly large and can only be seen in the beginning of the experiment (i.e., trial effects, or Early-Late effect).

Preregistration
This study's sample size, materials, hypotheses, and planned analyses were preregistered on Open Science Framework (https://osf.io/fzbjn) prior to data collection. In addition, analysis scripts and data are available via OSF: https://osf.io/e2ysx/

Participants
A total of 70 children and young adults participated in the study. The children were 35 4th grade children (15 female, M = 10:4 years, SD = 0:3, range 9:8-10:8) recruited from two classrooms in two different Finnish elementary schools in Varsinais-Suomi region. At the time of testing, they had received approximately three years and three months of formal reading instruction. All the children were native Finnish speakers, had normal or corrected-to-normal vision, and no known reading difficulties.
We assessed children's abilities that have been shown to affect children's irony comprehension: level of language skill (i.e., reading comprehension and technical reading skill), working memory capacity, and empathy skill (e.g., Filippova & Astington, 2008; Godbee & Porter, 2013; Nicholson et al., 2013). We measured these to establish that the children participating in the study were within the typical range for these skills, so that the results would reflect normative development, and to facilitate comparison with samples in future developmental studies. Technical reading skill was measured using the word fluency subtest of Lukilasse II (Häyrinen, Serenius-Sirve, & Korkman, 2013), in which the children had to read correctly as many words as possible from a list of 105 words within a 120-s time limit. The average score was 81.80 (SD = 12.74, range 54-104). Reading comprehension was measured using the Maze task (Ronimus, Tolvanen, & Hautala, 2022). The task comprised 16 texts with 4 words missing from each text. Based on the textual cues, participants had to choose the correct word for each missing word from four options. The average score was 41.53 (SD = 7.76, range 27-56). Children's working memory capacity was assessed with the Digit Span subtest of WISC-IV (Wechsler, 2010). The average raw score was 13.03 (SD = 2.04, range 10-21). Last, children's empathy skill was assessed using the Index of Empathy for Children and Adolescents (Bryant, 1982), which was translated into Finnish for this experiment (Cronbach's α = 0.64, 95% CI [0.48, 0.80]). The average score was 12.94 (SD = 2.79, range 5-20). Children's parents signed a written informed consent form prior to the experiment, and verbal assent was requested from each child upon arrival at the experiment. The children received candy or stickers for their participation.
In addition, 35 University of Oulu students (27 women, M = 24, SD = 4.52, range 19-38) participated and received a cafe voucher for their participation. Adults only completed the written irony reading task. All the adult participants were also native Finnish speakers and had normal or corrected-to-normal vision. Each participant gave written consent.
The study was conducted in accordance with the Declaration of Helsinki. The Ethics Committee for Human Sciences at the University of Turku approved the study.

Apparatus
Eye movements were recorded using EyeLink Portable Duo and EyeLink 1000 Plus eye-trackers (SR Research Ltd., Ontario, Canada) at a 500 Hz sampling frequency. The EyeLink Portable Duo was used with children, and the EyeLink 1000 Plus with adults. With the Portable Duo, the stimuli were presented on a 17.3" Asus ROG G752V laptop monitor, and participants were seated 60 cm from the screen. With the EyeLink 1000 Plus, the stimuli were presented on a 24" Asus VG248QE monitor, and due to the larger monitor size, participants were seated 90 cm from the screen. With both monitors, a refresh rate of 120 Hz and a resolution of 1920 × 1080 pixels were used. A chin-and-forehead rest was used to stabilize the participant's head.
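The rationale for seating adults farther from the larger monitor can be checked with a small calculation of the horizontal visual angle each screen subtends. The following is a minimal sketch and not part of the reported procedure: it assumes a flat 16:9 panel (as implied by the 1920 × 1080 resolution with square pixels) viewed head-on from the screen center, and the function name is hypothetical.

```python
import math


def horizontal_visual_angle(diagonal_inches, distance_cm, aspect=(16, 9)):
    """Horizontal visual angle (degrees) of a flat screen viewed head-on.

    Assumption: the viewer is centered in front of the screen and the panel
    has the given aspect ratio (16:9 here, inferred from 1920 x 1080).
    """
    w, h = aspect
    width_cm = diagonal_inches * 2.54 * w / math.hypot(w, h)
    return math.degrees(2 * math.atan((width_cm / 2) / distance_cm))


# Set-ups described in the Apparatus section.
child_angle = horizontal_visual_angle(17.3, 60)   # Portable Duo, laptop screen
adult_angle = horizontal_visual_angle(24.0, 90)   # EyeLink 1000 Plus, desktop monitor

print(f"children: {child_angle:.1f} deg, adults: {adult_angle:.1f} deg")
```

Under these assumptions the two set-ups subtend roughly 35° and 33° horizontally, so the longer viewing distance approximately compensates for the larger monitor.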

Materials
In the study, each participant read a total of 26 experimental stories on a computer screen (font: Courier New, font size: 27, line height: 3) while their eye movements were recorded. Half of the stories contained literal, and the other half ironic, target phrases. There were literal and ironic versions of each story (26 stories × 2 text types, resulting in 52 experimental stories). Each story consisted of 4-5 sentences, and their length varied between 20 and 41 words (M Words = 29.33, SD Words = 4.53) (see Table 1 for an example). The stories started with context sentences: First, there were 1-2 context sentences that gave a general description of the setting in which the events of the story happened, and these were the same in both literal and ironic versions (M Words = 9.04, SD Words = 3.63). Second, there was a context sentence that contained critical information (i.e., critical context) that made the following phrase either ironic or literal (for ironic versions, M Words = 9.31, SD Words = 3.08; for literal versions, M Words = 9.15, SD Words = 2.88). Context sentences were followed by a target phrase that was the same in both literal and ironic versions of the stories (M Words = 4.24, SD Words = 0.86), but could be interpreted as literal or ironic depending on critical context. The target phrase was followed by a spillover region (see e.g., Rayner & Duffy, 1986) describing who had uttered the target phrase (M Words = 2.27, SD Words = 0.60). Last, every story ended with a neutral sentence describing how the story events ended (M Words = 5.46, SD Words = 1.29). The spillover region and the end of the story sentences were the same between literal and ironic versions.

Table 1
Literal
End: The boys need to hurry to the next lesson.
Ironic
Beginning: Onni and Aleksi are in the school canteen.
Critical Context: Onni gathers a little food on the plate but eats only a small portion of it.
Target Phrase: "Well, you were hungry,"
Spillover Region: Aleksi says to him.
End: The boys need to hurry to the next lesson.
Questions
Inference Question: Did Aleksi think that Onni was hungry?
Text Memory Question: Were the boys at home?

Another function for the first and last sentences of the story was to prevent the target phrase and spillover region being the first or last sentence of the story, as these are typically read differently. Only one version of each story was shown to the participants. The presentation order of the stories was randomized. Participants' memory for the story context and comprehension of the intended meaning of the target statement were also assessed after reading of each story (examples of the questions are presented in Table 1). For both types of questions, correct answers were counted, and the proportion of correct answers was computed.
In preparation for the main study, two separate rating studies were conducted to test: (1) how familiar the target statements were as ironic in comparison with literal meaning and how natural the experimental stories were, and (2) how accurately ironic intent was comprehended in the experimental items by children. An Internet survey tool was used to collect the data (Webropol, www.webropol.com).
A group of 18 children aged 7-14 (M Age = 10.28, SD Age = 1.81, 5 males, 12 females, and 1 who chose not to answer) participated in the second rating study tapping into the comprehensibility of the experimental items. All the children were native Finnish speakers, and none of them participated in the actual experiment. Proportion of correct answers to comprehension questions for literal target phrases was higher (M = 0.92, SD = 0.27) than for ironic target phrases (M = 0.63, SD = 0.48). However, the accuracy scores were in expected range based on previous studies with children (e.g., Nicholson et al., 2013).

Procedure
Participants were tested individually. Upon arrival, participants were informed that the experiment assessed reading. The specific nature of the experiment was explained to participants when the experiment was over. Before the reading task, the eye-tracking system was introduced to each participant, and the experimental procedure was explained. The eye-tracker then was set up and calibrated using a 9-point calibration screen. Participants were instructed to read each story for comprehension at their own pace. Each story was presented on one screen. Participants were told to press the space bar on the keyboard when they finished reading the paragraph. After each story, two questions were presented one at a time. Participants answered the questions by pressing designated "Yes" and "No" buttons on the keyboard. After the participant answered the second question, the next story was presented. For children, the reading task was followed by the Digit Span, word fluency test, and Index of Empathy for Children and Adolescents. The experimental sessions lasted about 30-50 min. The reading comprehension task (i.e., the Maze task) was completed by children in their class group after the eye movement registrations.

Preprocessing of data
Fixations shorter than 50 ms were either merged with a nearby fixation (if the distance between the fixations was <1°) or removed from the data. Sentence-level measures were calculated for the target phrase, the critical context, and the spillover region from the eye movement data (Hyönä, Lorch, & Rinck, 2003). First-pass reading time is the summed duration of fixations made within the sentence during first reading of the sentence. First-pass reading time was further divided into forward-fixation time (i.e., summed duration of fixations landing on unread parts of the sentence during first-pass reading) and number of first-pass rereading fixations (i.e., summed number of fixations made reinspecting a sentence before moving on in the text). Number of look-back fixations is the summed number of fixations returning to the sentence from other parts of text made after the first-pass reading, and number of look-from fixations is the summed number of look-back fixations that were initiated from the sentence. All the measures were analyzed for the target phrase, and first-pass reading time was analyzed for the spillover region. As for the critical context, the content of the critical context sentences was not tightly matched between Story Types (literal vs. ironic), so reading times could not be reliably compared; consequently, the probability of looking back (a binomial measure) was analyzed.
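The short-fixation rule described above can be sketched as follows. This is an illustrative sketch only, not the preprocessing code used in the study: the `Fixation` class, the function names, and the choice of neighbor (the preceding retained fixation is tried first, then the immediately following long fixation) are assumptions, and fixation positions are assumed to be already expressed in degrees of visual angle.

```python
import math
from dataclasses import dataclass


@dataclass
class Fixation:
    x_deg: float        # horizontal position in degrees (assumed unit)
    y_deg: float        # vertical position in degrees (assumed unit)
    duration_ms: float


def _distance(a, b):
    """Euclidean distance between two fixations, in degrees."""
    return math.hypot(a.x_deg - b.x_deg, a.y_deg - b.y_deg)


def clean_short_fixations(fixations, min_ms=50.0, max_dist_deg=1.0):
    """Merge fixations shorter than min_ms into a neighbor closer than
    max_dist_deg; drop them when no such neighbor exists."""
    cleaned = []
    carry_ms = 0.0  # duration waiting to be added to the next long fixation
    for i, fix in enumerate(fixations):
        if fix.duration_ms >= min_ms:
            cleaned.append(
                Fixation(fix.x_deg, fix.y_deg, fix.duration_ms + carry_ms)
            )
            carry_ms = 0.0
            continue
        prev_fix = cleaned[-1] if cleaned else None
        next_fix = fixations[i + 1] if i + 1 < len(fixations) else None
        if prev_fix is not None and _distance(fix, prev_fix) < max_dist_deg:
            prev_fix.duration_ms += fix.duration_ms      # merge backward
        elif (next_fix is not None
              and next_fix.duration_ms >= min_ms
              and _distance(fix, next_fix) < max_dist_deg):
            carry_ms += fix.duration_ms                  # merge forward
        # otherwise: no nearby neighbor, so the short fixation is removed
    return cleaned


# Made-up example: a 30 ms fixation near its predecessor is merged,
# an isolated 40 ms fixation is removed.
raw = [
    Fixation(1.0, 0.0, 200),
    Fixation(1.5, 0.0, 30),
    Fixation(5.0, 0.0, 40),
    Fixation(9.0, 0.0, 180),
]
result = clean_short_fixations(raw)
durations = [f.duration_ms for f in result]
```

The input list is left unmodified, so the raw and cleaned fixation sequences can be compared when checking how many fixations were merged or removed.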
The exclusion criterion for participants' comprehension accuracy was that they should have more than one correctly comprehended ironic item. This criterion was chosen instead of using data-driven cut-off point (e.g., ±2SD), as this would have left participants with only a single correctly comprehended item in the data. Seven participants (2 children and 5 adults) had below acceptable level irony comprehension accuracy, and their data were excluded from the analyses, resulting in a total of 63 participants (33 children and 30 adults). Observed means and standard deviations of the measures are presented in Table 2.
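The exclusion rule (more than one correctly comprehended ironic item) can be expressed as a simple filter. The sketch below is illustrative: the data layout, participant labels, and function name are hypothetical, and the responses are invented rather than taken from the study.

```python
def passes_irony_criterion(responses, min_correct_exclusive=1):
    """Return True if a participant has MORE than `min_correct_exclusive`
    correctly answered inference questions for ironic stories.

    `responses` is an iterable of (story_type, is_correct) pairs
    (hypothetical layout).
    """
    n_correct_ironic = sum(
        1 for story_type, is_correct in responses
        if story_type == "ironic" and is_correct
    )
    return n_correct_ironic > min_correct_exclusive


# Invented responses for two hypothetical participants.
data = {
    "p01": [("ironic", True), ("ironic", True), ("literal", True)],
    "p02": [("ironic", True), ("ironic", False), ("literal", True)],
}
retained = [pid for pid, resp in data.items() if passes_irony_criterion(resp)]

# Sample sizes reported in the text: 70 recruited, 7 excluded (2 children, 5 adults).
n_children, n_adults = 35 - 2, 35 - 5
```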

Analyses
The data were analyzed with linear or generalized linear mixed-effects models (Baayen, Davidson, & Bates, 2008) using R statistical software (Version 4.1.2; R Core Team, 2021), except for the dependent variables with the problem of zero-inflation (i.e., number of first-pass rereading fixations, number of look-back fixations, number of look-from fixations, and probability to look-back to critical context), which were analyzed using the glmmTMB package (Brooks et al., 2017).
Separate models were built for each eye movement measure for the different text regions (target sentence, critical context, and spillover region), and for comprehension and text memory question accuracies. Story Type (Literal vs. Irony) and Age Group (Adults vs. Children) were fitted in the models as successive difference contrast coded fixed effects. Trial Order (i.e., order in which the items were presented) was fitted in models as a centered continuous variable. Last, the effect of Item-Comprehension (i.e., whether the intended meaning of the target phrase was comprehended Correctly vs. Incorrectly) was added to the models as a treatment coded fixed effect variable, in which the correct answer was set as the baseline. Thus, the intercepts of the models in which Item-Comprehension is fitted assess reading in which the intended meaning of the target phrase was correctly comprehended (see Schad, Vasishth, Hohenstein, & Kliegl, 2020 for a review of contrasts), and not reading time across correctly and incorrectly comprehended items (see Ferreira & Yang, 2019 for discussion about comprehension). The Item-Comprehension measure was fitted only in models concerning reading. The text memory question accuracy values were close to ceiling across the conditions; consequently, Trial Order was not fitted into the respective model, to keep it as simple as possible. In the rest of the models, we explored three-way interactions that included Story Type. Participants and items were fitted into the models as random intercepts, and Story Type and Age Group were fitted as random slopes. If a model failed to converge, it was trimmed top-down, starting with removing the covariance between random effects (Barr, Levy, Scheepers, & Tily, 2013; Brauer & Curtin, 2018). The reading time measures were skewed and consequently logarithmically transformed (the best fitting transformation was selected for the data using the Box-Cox power transform).
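The predictor coding described above can be illustrated with a short sketch. It is written in Python for illustration only (the analyses themselves were run in R), and it rests on stated assumptions: for a two-level factor, successive difference contrasts reduce to codes of -0.5 and +0.5; the direction of the codes (Literal and Adults as -0.5) follows the order in which the levels are named in the text but is not confirmed by it; and all field names and trial values are hypothetical.

```python
import math

# Successive difference contrasts for two-level factors (assumed direction).
SDIF_CODES = {
    "story_type": {"literal": -0.5, "irony": 0.5},
    "age_group": {"adult": -0.5, "child": 0.5},
}
# Treatment coding with the correct answer as the baseline (0).
TREATMENT_CODES = {"correct": 0.0, "incorrect": 1.0}


def code_trials(trials):
    """Return model-ready predictors for a list of trial dicts."""
    mean_order = sum(t["trial_order"] for t in trials) / len(trials)
    coded = []
    for t in trials:
        coded.append({
            "story_type_c": SDIF_CODES["story_type"][t["story_type"]],
            "age_group_c": SDIF_CODES["age_group"][t["age_group"]],
            # Centering: a value of 0 corresponds to the average trial position.
            "trial_order_c": t["trial_order"] - mean_order,
            "comprehension_t": TREATMENT_CODES[t["comprehension"]],
            # Log transform of the skewed reading time measure.
            "log_rt": math.log(t["reading_time_ms"]),
        })
    return coded


# Invented trials for illustration.
trials = [
    {"story_type": "literal", "age_group": "child", "trial_order": 1,
     "comprehension": "correct", "reading_time_ms": 1200.0},
    {"story_type": "irony", "age_group": "child", "trial_order": 2,
     "comprehension": "incorrect", "reading_time_ms": 1850.0},
    {"story_type": "irony", "age_group": "adult", "trial_order": 3,
     "comprehension": "correct", "reading_time_ms": 900.0},
]
coded = code_trials(trials)
```

Because the treatment-coded comprehension variable is 0 for correct answers, the intercept and the other fixed effects in such a model are evaluated at correctly comprehended items, whereas the ±0.5 codes place the reference point midway between the two story types and the two age groups.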
As for the zero-inflated models, the predictors to be fitted in the zero-inflation part of the models were determined by comparing models with each other. We started with a model with only an intercept, followed by models in which we added Age Group (as it was the main cause of zero inflation), Age Group ✕ Story Type, and finally all the predictors and their three-way interactions. The model providing the best fit to the data (lowest AIC value) was selected.
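The selection step can be illustrated with a minimal Python sketch. The candidate labels mirror the sequence described above, but the log-likelihoods and parameter counts are hypothetical placeholders, not values from the study; the only fixed ingredient is the standard definition AIC = 2k − 2 ln L.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# (zero-inflation formula, log-likelihood, number of estimated parameters)
# All numbers below are hypothetical and serve only to show the comparison.
candidates = [
    ("~ 1", -1500.0, 10),
    ("~ AgeGroup", -1480.0, 11),
    ("~ AgeGroup * StoryType", -1479.5, 13),
    ("~ all predictors and three-way interactions", -1478.0, 18),
]


def select_by_aic(models):
    """Return the candidate with the lowest AIC."""
    return min(models, key=lambda m: aic(m[1], m[2]))


best = select_by_aic(candidates)
```

In this invented example the richer zero-inflation formulas raise the log-likelihood only slightly, so the penalty for extra parameters makes the Age Group model the preferred one.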
The exact degrees of freedom were difficult to determine for the t- and z-statistics estimated by mixed-effects models, leading to a problem determining exact p-values (Baayen et al., 2008). Consequently, degrees of freedom or p-values are not reported; statistical significance at the 0.05 level is indicated by values of t and z > |1.96|. For the sake of brevity, only significant main effects of Story Type or Age Group, and interactions involving Story Type are reported. Significant main effects are reported in text and for interactions the model estimates, and their 95% CIs, are illustrated in figures. All the model summaries are reported in Appendix A Tables A1-A9.
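As a rough consistency check on this criterion, the following Python sketch recovers a standard error from a reported 95% CI and recomputes the test statistic. It assumes Wald-type intervals (estimate ± 1.96 × SE), which the text does not state explicitly; the function names are illustrative. The example values are those reported for the Story Type effect on comprehension question accuracy.

```python
Z_CRIT = 1.96  # two-sided criterion for the 0.05 level


def se_from_ci(lower, upper, z_crit=Z_CRIT):
    """Standard error implied by a symmetric Wald interval (an assumption)."""
    return (upper - lower) / (2 * z_crit)


def wald_statistic(estimate, lower, upper):
    """Estimate divided by the standard error implied by its CI."""
    return estimate / se_from_ci(lower, upper)


def is_significant(statistic, z_crit=Z_CRIT):
    """True when the absolute statistic exceeds the criterion."""
    return abs(statistic) > z_crit


# Story Type effect on comprehension accuracy, as reported in the results.
z = wald_statistic(-2.16, -2.34, -1.97)
```

The recomputed value lands close to the reported z = −23.09; the small discrepancy is expected because the estimate and interval bounds are rounded to two decimals.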

Changes to the statistical analyses in comparison to pre-registration
There were some changes that we needed to make to the preregistered analyses. First, in the rereading measures (i.e., first-pass rereading time, look-back time, look-from time, and probability to look-back to critical context) there were more zero-values than expected. This made it problematic to analyze the data using linear or traditional generalized linear mixed-effect models (see e.g., Baayen et al., 2008). To take this into account, we analyzed the number of fixations instead of rereading times, using the glmmTMB package (Brooks et al., 2017), which can handle count data (e.g., number of fixations) that have a high number of zero values. In sentence-level processing, fixation times and number of fixations are tightly related (in our sample, correlations between reading times and number of fixations ranged between r = 0.96-0.98). Second, while preregistering the experiment we failed to mention the analysis of trial effects, although it was likely that they would occur with adults (see Olkoniemi & Kaakinen, 2021 for review). Consequently, in order to reliably compare reading of irony between adults and children, trial effects were taken into account. Last, as comprehension accuracies were lower than expected (see Table 2), we decided to take Item-Comprehension into account in the models. This solution has two advantages: First, we are able to explore how successful vs. non-successful comprehension of the intended meaning of the target phrase is reflected in the reading data. This has not been previously possible, as all the previous experiments on the processing of written irony have been conducted with adults. Second, this allowed us to use all the items, thus strengthening the models. Finally, as adults showed variance in their irony comprehension accuracies (see Table 2), we were also able to compare adults to children in the models concerning relationships between irony comprehension accuracy and reading.

Comprehension and text memory questions
The descriptive statistics for proportions of correct answers on text memory and inference questions are presented in Table 2. Overall, we observed a ceiling effect in text memory question performance, suggesting that all participants were attentive to the task. Consequently, the model for accuracy in answering text memory questions did not show effects of Story Type (literal vs. irony), Age Group (adults vs. children), or their interaction (see Table A1).
The model for comprehension question accuracy showed two main effects (see Table A2). First, there was an effect of Story Type (literal vs. irony), indicating that the ironic phrases were harder to comprehend than their literal counterparts, β = −2.16, 95% CI [−2.34, −1.97], z = −23.09. Second, there was an effect of Age Group, indicating that children showed lower comprehension accuracy than adults, β = −2.30, 95% CI [−3.14, −1.46], z = −5.38. These effects were qualified by a three-way interaction between Story Type, Age Group, and Trial Order, β = −0.67, 95% CI [−1.03, −0.32], z = −3.70. This interaction indicates that adults had a lower proportion of correct answers to comprehension questions for ironic than literal target statements in the beginning of the experiment, but that the effect wore off toward the end of the experiment (see Fig. 1). Similarly, children showed lower accuracy in ironic than literal target phrases in the beginning of the experiment but showed less improvement toward the end than adults. Overall, children showed lower comprehension accuracy scores than adults, and the difference was larger for ironic than literal phrases.
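To give a sense of the size of these coefficients, the sketch below converts a coefficient to an odds ratio and shows how it would shift a reference probability. It assumes the accuracy model is a binomial mixed model with a logit link, which is the usual choice for binary accuracy data but is not stated explicitly in the text. The reference accuracy of 0.90 is a hypothetical value, and the conversion ignores random effects and the other predictors.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a coefficient on the log-odds scale."""
    return math.exp(beta)


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


def shift_probability(p_reference, beta):
    """Probability after adding beta to the log-odds of a reference value."""
    return inv_logit(logit(p_reference) + beta)


STORY_TYPE_BETA = -2.16  # reported Story Type effect (irony minus literal)

story_or = odds_ratio(STORY_TYPE_BETA)
# Hypothetical reference accuracy of 0.90 for literal items.
irony_accuracy = shift_probability(0.90, STORY_TYPE_BETA)
```

Under these assumptions, the Story Type coefficient corresponds to an odds ratio of about 0.12, that is, the odds of a correct answer for ironic items are roughly one eighth of those for literal items.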

Reading of the ironic and literal stories
The model for first-pass reading time on the target phrase revealed an effect of Age Group (see Table A3). This effect indicates that children showed longer first-pass reading times than adults, when the intended meaning was correctly comprehended, β = 0.81, 95% CI [0.66, 0.95], t = 11.14. The model did not show effects of Story Type (literal vs. irony). Similarly, the model for forward-fixation time on the target phrase revealed an effect of the Age Group (see Table A4), indicating that children showed longer forward-fixation times than adults, β = 0.69, 95% CI [0.56, 0.83], t = 10.31. This model did not show effects related to Story Type either.
The model for number of first-pass rereading fixations on the target phrase showed an effect of Age Group (see Table A5). This indicates that children did more first-pass rereading than adults, when the […] fixations between literal and ironic target phrases (see Fig. 2). However, when the comprehension question was answered incorrectly, adults did more first-pass rereading for ironic than literal target phrases, whereas children showed no difference between Story Types. In addition, the zero-inflation model for first-pass rereading fixations showed an effect of Age Group, indicating that children had a lower probability of making exactly zero first-pass rereading fixations than adults, β = −1.03, 95% CI [−1.67, −0.39], z = −3.17. The model for number of look-back fixations on the target phrase revealed a three-way interaction between Story Type, Age Group, and Trial Order, β = 0.50, 95% CI [0.24, 0.76], z = 3.30 (see Table A6). This interaction indicates that, in the beginning of the experiment, adults made higher numbers of look-backs to ironic than literal target phrases, but the effect turned the other way around toward the end of the experiment (see Fig. 3). Children showed an opposite trend, making higher numbers of look-backs to literal than ironic target phrases in the beginning of the experiment, but the effect turned the other way around toward the end of the experiment. In addition, the zero-inflation model for look-back fixations on the target phrase showed a main effect of Age Group, indicating that children had a higher probability of making exactly zero look-back fixations than adults, β = 2.09, 95% CI [1.80, 2.39], z = 13.82. The model for number of look-from fixations made from the target phrase showed an interaction between Story Type and Trial Order, β = −0.27, 95% CI [−0.45, −0.08], z = −2.81. This indicates that in the beginning of the experimental session readers made higher numbers of look-froms from the ironic than literal target phrase, but the effect turned the other way around toward the end of the experiment (see Fig. 4A). In addition, the zero-inflation model for look-from fixations from the target phrase showed a main effect of Age Group, indicating that children had a higher probability of making exactly zero look-from fixations than adults, β = 1.74, 95% CI [1.43, 2.05], z = 11.08.
The model for first-pass reading time on the spillover region showed an effect of Age Group (see Table A8). This effect indicates that children showed longer reading times in comparison to adults, when the intended meaning of the target phrase was correctly comprehended, β = 0.68, 95% CI [0.54, 0.83], z = 9.32. The model showed no main effect of Story Type, nor interactions related to Story Type. Last, the model for the probability of looking back to critical context revealed two main effects (see Table A9). First, there was an effect of Story Type, indicating that readers had a lower probability of look-backs to critical context of ironic than literal stories when the intended meaning of the target phrase was correctly comprehended, β = −0.34, 95% CI [−0.68, −0.003], z = −1.98. Second, there was an effect of Age Group, indicating that children were less likely than adults to look back to critical context from subsequent text parts (when the target phrase was comprehended correctly), β = −2.58, 95% CI [−3.37, −1.79], z = −6.39. Moreover, the model revealed an interaction between Story Type and Trial Order, β = −0.68, 95% CI [−1.03, −0.32], z = −3.76. This indicates that in the beginning of the experiment readers showed a slightly higher probability of looking back to critical context of ironic than literal stories, but this effect turned the other way around toward the end (see Fig. 4B).

Relationships between reading of ironic texts and irony comprehension
The results showed that there was large variance in irony comprehension accuracies between participants, especially in children. In addition, successful vs. non-successful comprehension of ironic and literal phrases was related to processing of irony. However, we do not know how individuals' irony comprehension ability is related to processing of ironic meaning. Thus, we explored the question of whether better irony comprehenders exhibit different processing patterns for ironic stories. Relationships between reading of ironic stories and irony comprehension were analyzed by building separate models in which only reading data of correctly interpreted ironic items were considered. As a new variable, Comprehension Accuracy of the participants was fitted into the models as a continuous variable. Before fitting Comprehension Accuracy, the measure was scaled per Age Group, as good vs. poor comprehension accuracy was different for children and adults (see Table 2). For the sake of brevity, only significant effects involving Irony Comprehension Accuracy and interactions between Age Group and Irony Comprehension Accuracy are reported. All the final models are reported in the Appendix B Tables B1-B7.
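Scaling a measure within each age group can be sketched as follows. This is an illustration only: it assumes that "scaled" means z-scoring (subtracting the group mean and dividing by the group standard deviation), and the participant records, field names, and the helper `scale_within_group` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical per-participant irony comprehension accuracies.
participants = [
    {"id": "a1", "group": "adults",   "accuracy": 0.7},
    {"id": "a2", "group": "adults",   "accuracy": 0.8},
    {"id": "a3", "group": "adults",   "accuracy": 0.9},
    {"id": "c1", "group": "children", "accuracy": 0.2},
    {"id": "c2", "group": "children", "accuracy": 0.5},
    {"id": "c3", "group": "children", "accuracy": 0.8},
]


def scale_within_group(records):
    """Add a z-score of accuracy computed separately for each group."""
    by_group = {}
    for r in records:
        by_group.setdefault(r["group"], []).append(r["accuracy"])
    # Group-wise mean and sample standard deviation.
    stats = {g: (mean(v), stdev(v)) for g, v in by_group.items()}
    scaled = []
    for r in records:
        m, s = stats[r["group"]]
        scaled.append({**r, "accuracy_z": (r["accuracy"] - m) / s})
    return scaled


scaled = scale_within_group(participants)
```

In this invented example, a raw accuracy of 0.8 is average for the adults but one standard deviation above the mean for the children, which is why a common scale across groups would blur what counts as good comprehension within each group.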
The model for first-pass reading time on ironic target phrases did not show an effect of Irony Comprehension Accuracy (see Table B1). However, the model revealed an interaction between Irony Comprehension Accuracy and Age Group, β = −0.24, 95% CI [−0.38, −0.10], z = −3.41. This interaction indicates that when adults' irony comprehension accuracy increased their first-pass reading time also increased, but for children the opposite was true (see Fig. 5A). Also, the model for forward-fixation time on the ironic target phrases showed no effect of Irony Comprehension Accuracy (see Table B2), but revealed an interaction between Irony Comprehension Accuracy and Age Group, β = −0.22 (see Fig. 5B). Similar to previous models, the model for number of first-pass rereading fixations on ironic target phrase did not show an effect of Irony Comprehension Accuracy (see Table B3), but revealed an interaction between Comprehension Accuracy and Age Group, β = −0.42, 95% CI [−0.79, −0.04], z = −2.17 (see Fig. 5C).
The model for number of look-back fixations to ironic target phrase did not show an effect of Irony Comprehension Accuracy or an interaction between Comprehension Accuracy and Age Group (see Table B4). The model for number of look-from fixations made from the ironic target phrase did not show an effect of Irony Comprehension Accuracy or an interaction between Comprehension Accuracy and Age Group (see Table B5). The model for first-pass reading time on the spillover region following ironic target phrase did not show a main effect of Irony Comprehension Accuracy (see Table B6). However, the model revealed an interaction between Comprehension Accuracy and Age Group, β = −0.29, 95% CI [−0.44, −0.14], z = −3.85. This indicates that when adults' irony comprehension accuracy increased their first-pass reading time on spillover region also increased, but for children the opposite was true (see Fig. 5D). Last, the model for the probability of looking back to critical context of the ironic stories did not show a main effect of Irony Comprehension Accuracy nor an interaction between Comprehension Accuracy and Age Group (see Table B7).

Discussion
For the first time, by employing eye-tracking, we explored how children process and comprehend written irony in comparison to adults. We expected 10-year-old children to show lower comprehension of intended meaning of ironic but not literal statements when compared to adults. Moreover, we expected adults to show more re-reading of ironic than literal phrases, and that this irony effect would be most apparent in the beginning of the experiment and would wear off toward the end of the experiment. We expected children to struggle in their irony comprehension more than adults and assumed that this struggle would be associated with more pronounced rereading of the ironic stories compared to adults.

Processing of irony
As expected, adults showed increased later rereading of ironic phrases in comparison to literal phrases, and they also made more returns to already read text parts (i.e., look-froms) from the ironic than literal phrase. It seems that these returns were predominantly made to critical context of ironic stories, as the probability of looking back to the critical context was higher for ironic than for literal stories. Moreover, these effects were mediated by trial effects. In other words, increased rereading of ironic stories was seen in the beginning of the experiment, but this effect turned the other way around toward the end. These results for adults replicate previous findings of eye-tracking studies on processing written irony (see Olkoniemi & Kaakinen, 2021 for review).
Like adults, children also made higher numbers of returns from the ironic than literal phrase and showed a higher probability of looking back to the critical context of the ironic stories. Surprisingly, these effects did not differ between adults and children, and children also showed trial effects for irony. This suggests that both children and adults needed extra processing in order to integrate ironic target phrases with context, consistent with traditional theories on irony comprehension (e.g., Grice, 1975). However, repeated exposure to irony seemed to reduce this need. This trial effect result is similar to that of previous studies with adults (Olkoniemi & Kaakinen, 2021; Spotorno & Noveck, 2014), and suggests that children also become sensitized by the repeated exposure to irony and are able to some extent to improve their performance. However, this sensitization was not similar in all the measures, as the trial effect for irony was only observed for adults in look-back fixations to the ironic phrase. It should also be noted that the improvement in reading times was not a clear indication of better comprehension, as children showed a relatively minor increase in comprehension accuracy during the course of the experiment. This might be due to the fact that, overall, children's probability of later rereading was much smaller than that of adults.
For adult readers, but not for children, there was a relationship between comprehending the intended meaning of the target phrase and early and later rereading of that phrase. This suggests that adults may have higher sensitivity for difficulties in categorizing the target phrase as being ironic or not. This could be interpreted, as we had hypothesized, as reflecting children's insensitivity for intended ironic meaning. However, it is also possible that reading for comprehension is overall more effortful for children (e.g., Kaakinen et al., 2015). Because of this, it might be that the irony effect is difficult to detect in children's reading, as it may be masked by the higher general effort that children need to invest for the reading task. This possibility is at least partly supported by our data. First of all, children who had higher irony comprehension accuracy showed faster first-pass reading times than those with poorer comprehension accuracy. In other words, those children who are better at categorizing phrases as being ironic need not invest as much processing effort as others. The opposite was true for adults who showed higher comprehension accuracy, suggesting that for adults reading in general is very efficient and irony-related processing is clearly evident in additional processing. This is in line with previous studies on adults' written irony processing (see Olkoniemi & Kaakinen, 2021, for a review).
For both children and adults, better irony comprehension was not associated with the number of look-froms made from the ironic target phrase. This contrasts with previous findings by Olkoniemi, Johander, and Kaakinen (2019), who found in their eye-tracking study on processing written irony that the probability of making a look-from from the target phrase had a negative correlation with comprehension accuracy. However, differences in the tasks used (a moving window paradigm was used by Olkoniemi, Johander, & Kaakinen, 2019, which prevented natural reading) and the age groups (Olkoniemi, Johander, & Kaakinen, 2019 studied adults only) make comparing the results of these studies difficult. It is evident that more research is needed to explore the role of later rereading in irony comprehension.

Fig. 5.
Model estimates for the interactions between irony comprehension accuracy and age group. Note. Panel A: The interaction between Irony Comprehension Accuracy and Age Group on the first-pass reading time on the correctly interpreted ironic target phrase. Panel B: The interaction between Irony Comprehension Accuracy and Age Group on the forward-fixation time on the correctly interpreted ironic target phrase. Panel C: The interaction between Irony Comprehension Accuracy and Age Group on the number of first-pass rereading fixations on the correctly interpreted ironic target phrase. Panel D: The interaction between Irony Comprehension Accuracy and Age Group on the first-pass reading time on the spillover region following the correctly interpreted ironic target phrase. In all the panels y-axis values are model estimates that are back-transformed from log-values, and shaded areas represent 95% Confidence Intervals.
The answer to this question is not that clear: First, it is important to note that this issue was not directly explored in the present study, so strong conclusions cannot be drawn from the results of this experiment. Second, some of the previous studies on children of the same age have shown lower, or about the same, levels of irony comprehension than what was observed in the present study, despite tone of voice information and illustration (e.g., Capelli et al., 1990; Zajączkowska & Abbot-Smith, 2020). For example, in their study, Zajączkowska and Abbot-Smith (2020, Exp. 2) presented short video-recorded dialogues containing irony to 10-12-year-old children. Their results showed that comprehension accuracy for ironic materials that needed social reasoning (they called this complex irony) was on average 55%, with accuracies ranging between 0 and 100%. This suggests that task demands in previous studies might have varied, so it is hard to tell how much of the differences is due to, for example, demands of spoken vs. written modalities and how much to other task demands. Moreover, in our study, like in the study by Zajączkowska and Abbot-Smith (2020), there was large variance in children's irony comprehension accuracies. This shows that it should not be expected that children's irony comprehension ability would develop in an invariant manner. Rather, these results suggest that there are large individual differences (see also Loukusa & Leinonen, 2008) and that the ability to comprehend irony develops for each child at a somewhat individual pace. As we know that there are also individual differences among adults in processing and comprehending irony (see e.g., Kaakinen et al., 2014), this is not surprising.
As mentioned, current theories of irony comprehension do not explicitly take developmental changes into account. It is possible, however, to accommodate developmental differences in theories that take individual differences into account, such as the parallel constraint-satisfaction framework (Pexman, 2008) and the predictive coding theory of irony (Fabry, 2021). These theories could be extended to make the prediction that as irony comprehension ability is still developing in children, they should show more difficulty in processing and comprehending irony than adults. In future studies, those theories could be further refined by examining which developing social, cognitive, and linguistic skills are related to children's irony processing. The costs of irony processing may be higher for children, but there is still a great deal to learn about the factors that mitigate that cost.

Conclusion
The present study explored, for the first time, children's processing and comprehension of written irony and compared children's performance to that of adults. The results showed that for both children and adults comprehending irony poses a greater cost than processing of its literal counterpart. Moreover, it seems that for 10-year-old children reading for comprehension is more demanding than it is for adults. Nonetheless, children are able to adapt to task context and improve their irony processing based on previous exposure to irony. The results also show that development of irony comprehension does not happen on a similar timeline for all children; 10-year-olds show large variance between individuals in their ability to comprehend irony. In addition, the results suggest that children who are better at comprehending irony are also faster to resolve the ironic meaning while reading. As the present study is the first to explore children's processing and comprehension of written irony, more studies are needed to confirm and extend the results.

Data availability
The data and analysis code are made available via Open Science Framework, and the link is provided in the manuscript.