The contribution of metamemory beliefs to the font size effect on judgments of learning: Is word frequency a moderating factor?

Tian Fan; Jun Zheng; Xiao Hu; Ningxin Su; Yue Yin; Chunliang Yang; Liang Luo

doi:10.1371/journal.pone.0257547

Abstract

Previous studies found that metamemory beliefs dominate the font size effect on judgments of learning (JOLs). However, few studies have investigated whether beliefs about font size contribute to the font size effect in circumstances of multiple cues. The current study aims to fill this gap. Experiment 1 adopted a 2 (font size: 70 pt vs. 9 pt) * 2 (word frequency (WF): high vs. low) within-subjects design. The results showed that beliefs about font size did not mediate the font size effect on JOLs when multiple cues (font size and WF) were simultaneously provided. Experiment 2 further explored whether WF moderates the contribution of beliefs about font size to the font size effect, in which a 2 (font size: 70 pt vs. 9 pt, as a within-subjects factor) * 2 (WF: high vs. low, as a between-subjects factor) mixed design was used. The results showed that the contribution of beliefs about font size to the font size effect was present in a pure list of low-frequency words, but absent in a pure list of high-frequency words. Lastly, a meta-analysis showed evidence supporting the proposal that the contribution of beliefs about font size to the font size effect on JOLs is moderated by WF. Even though numerous studies suggested beliefs about font size play a dominant role in the font size effect on JOLs, the current study provides new evidence suggesting that such contribution is conditional. Theoretical implications are discussed.

Citation: Fan T, Zheng J, Hu X, Su N, Yin Y, Yang C, et al. (2021) The contribution of metamemory beliefs to the font size effect on judgments of learning: Is word frequency a moderating factor? PLoS ONE 16(9): e0257547. https://doi.org/10.1371/journal.pone.0257547

Editor: Alessandra S. Souza, University of Zurich, SWITZERLAND

Received: January 15, 2021; Accepted: September 6, 2021; Published: September 20, 2021

Copyright: © 2021 Fan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data and analysis code for the current study are available online in OSF (https://osf.io/d894v/).

Funding: This study was supported by the National Natural Science Foundation of China [grant number 31671130, 32171045], http://www.nsfc.gov.cn/, and L. L. is the funding recipient for this study. This study was also supported by the National Natural Science Foundation of China [grant number 32000742], http://www.nsfc.gov.cn/, and CL. Y. is the funding recipient for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Judgments of learning (JOLs) refer to people’s metacognitive predictions regarding the likelihood of remembering studied items on a later memory test, which is an important form of metamemory monitoring [1]. For decades, researchers have focused on how JOLs are constructed [2], and the documented findings suggested that JOLs are inferential in nature and are based on a variety of available cues [3]. An important cue is font size [4–7]. The font size effect on JOLs refers to a well-established phenomenon that items presented in a large font size receive higher JOLs than those presented in a small font size, although font size has little influence on recall performance, reflecting a dissociation between JOLs and memory [8].

Dozens of studies have been conducted to unravel the mechanisms underlying the font size effect on JOLs. According to the dual-basis model of metacognitive judgments [3,9], JOLs are based on both nonanalytic, experience-based cues (e.g., processing fluency) and analytic, theory-based cues (e.g., metamemory beliefs). In the former process, participants apply a nonanalytic heuristic that yields an immediate “feeling of knowing”. In the latter process, participants engage in analytic, deliberate, and conscious deduction. In terms of the font size effect on JOLs, large font size might increase JOLs through experience-based processing fluency (i.e., large items are perceived more fluently than small ones). In addition, large font size might increase JOLs through theory-based processes (i.e., metamemory beliefs that large items are easier to remember than small ones).

Recent studies suggested that theory-based processes may dominate the font size effect on JOLs [10]. For example, Mueller et al. [6] asked a group of participants to make item-by-item JOLs for words presented in either 48 pt (large) or 18 pt (small) font size. Another group of participants was instructed to make global predictions for large and small words in a belief questionnaire. The results showed that item-by-item JOLs were higher for large words than for small ones. Along the same lines, the results from the belief questionnaire showed that participants held the a priori belief that large words are easier to remember than small ones, suggesting that such beliefs may contribute to the font size effect on JOLs. In addition, Mueller et al. directly evaluated the contribution of fluency to the font size effect through two measures: the response times (RTs) in a lexical decision task and the self-regulated study time. Specifically, the lexical decision task measures the time it takes for a participant to decide whether the on-screen letter string is a word or non-word. Half of the words and non-words were presented in small font and the other half were presented in large font. The results showed that RTs for large and small items did not differ significantly. In another experiment, participants were given unlimited time to study large and small words, and self-regulated study time was measured. The results showed that study time did not differ significantly between large and small items. Neither measure revealed a difference in processing fluency between large and small words, suggesting that processing fluency is unlikely to be responsible for the font size effect on JOLs. Based on these findings, Mueller et al. [6] claimed that font size affects JOLs mainly through metamemory beliefs. Subsequent studies employed different methods to further examine whether beliefs about font size contribute to the font size effect on JOLs [4,11,12].

To directly delineate the relationship between JOLs and beliefs, Hu et al. [4] conducted a participant-level regression analysis, in which the difference in JOLs between large and small words (difference in JOLs) was regressed on the difference in belief-based predictions between these categories of stimuli (difference in beliefs). The results showed that the difference in beliefs successfully predicted the difference in JOLs across participants, which supports the belief hypothesis underlying the font size effect. Several recent studies implemented multilevel linear regression models to examine whether beliefs about font size contribute to the font size effect on JOLs [11,12]. For example, Su et al. [11] applied a multilevel moderation analysis and demonstrated that the effect of font size on JOLs was significantly moderated by beliefs about font size. In summary, many studies have provided convergent evidence supporting the important role of metamemory beliefs in the font size effect.

Compatible with the dual-process model, Mueller and colleagues proposed the analytic-processing theory, which provides a unique and more detailed account of how beliefs may influence JOLs, but does not reject the possible contribution of fluency [13,14]. According to the analytic-processing theory, when people are instructed to make judgments about future memory performance, they engage in an analytic problem-solving mode in which they attempt to reduce uncertainty by searching for cues that may be related to memory. If they can develop a plausible explanation for why a cue may influence memory or retrieve a previously developed explanation from long-term memory, then they will use beliefs about the cue when constructing JOLs. If not, then other factors will drive JOLs (most notably the difference in processing fluency that may arise as people study each item) [14].

The analytic-processing theory emphasizes the contribution of beliefs to JOL formation. In studies that manipulated only a single factor, such as font size [6] or pair type [14], the variability across items was easy to detect. Take the font size effect as an example: people presumably notice that some words are presented in a large size and others in a small size, which in turn stimulates them to retrieve their pre-existing beliefs regarding how font size relates to memory performance (e.g., large words are easier to remember than small ones). People then apply such beliefs to make JOLs. However, in natural learning situations, learners frequently encounter multiple cues rather than a single cue. When dealing with multiple cues in making JOLs, people might give different weight to different cues [15]. When multiple cues are available, will participants still engage in an analytic process in which they retrieve a specific belief about how one particular cue influences memory in order to make JOLs? The current study aims to investigate this question. Below, we briefly summarize previous evidence about whether and how multiple cues affect metacognitive judgments.

Over recent years, considerable evidence regarding the impact of various cues on metacognitive judgments has accumulated. By comparison, the question of whether and how multiple cues combine to affect metacognitive judgments has received less attention [16]. Undorf and colleagues [15] were among the first to explore whether multiple cues jointly affect JOLs. They systematically investigated whether participants integrate multiple extrinsic and intrinsic cues in JOLs. They varied two extrinsic cues (font size and number of study presentations) in Experiment 1, and found that participants integrated both cues in their JOLs. In Experiment 2, they demonstrated that participants could integrate two intrinsic cues (concreteness and emotionality) in their JOLs. When manipulating all four factors simultaneously in Experiment 3, Undorf and colleagues observed that participants could integrate all four cues in their JOLs. Finally, Experiment 4 manipulated font size, concreteness, and emotionality on a continuum rather than at two easily distinguishable levels, and the results showed successful integration of these three cues in JOLs. In conclusion, Undorf and colleagues provided important findings suggesting that participants have a remarkable capacity to integrate multiple cues to construct JOLs.

Later, Undorf and Bröder demonstrated that cue integration was more likely due to strategic integration of multiple cues, rather than reliance on a single unified feeling of ease [17]. In their study, concreteness and emotionality simultaneously acted on both pre-study JOLs (i.e., JOLs that are made before encoding each item, generally reflecting one’s metamemory beliefs) and immediate item-by-item JOLs (i.e., JOLs that are made immediately following encoding each item). These findings imply that metamemory beliefs may contribute to JOL formation in situations of multiple cues (see [17], p. 640).

Another important study, by Price and Harrison, simultaneously manipulated multiple cues (font size & item relatedness) and explored the bases of JOLs [18]. They collected pre-study JOLs, immediate JOLs, and the combination of both types of JOLs to explore the bases of JOL formation (beliefs or fluency). Their results revealed that, compared to pre-study JOLs, immediate JOLs demonstrated a larger effect of relatedness and a smaller effect of font size. In addition, immediate JOLs showed greater alignment with recall performance, as reflected by higher relative accuracy for immediate JOLs than for pre-study JOLs in all three experiments. As pre-study JOLs are mainly based on metamemory beliefs [2,6], the difference between pre-study JOLs and immediate JOLs implies that immediate JOLs are likely based on other factors besides metamemory beliefs.

The above-discussed studies [17,18] tend to suggest that beliefs may contribute to JOL formation in circumstances of multiple cues. However, these studies suffer from limitations in research methods. Based on the comparison between pre-study JOLs and immediate JOLs, researchers can only indirectly conjecture whether or not beliefs contribute to JOLs in circumstances of multiple cues. There is no direct evidence regarding whether beliefs actually contribute to JOL formation in such situations. In other words, concurrently observing that pre-study and standard JOLs vary in the same direction between different levels of a given factor (e.g., large and small font size) is insufficient to conclude that beliefs contribute to JOL construction. Recent studies advocate directly testing whether beliefs (i.e., pre-study JOLs) statistically mediate that factor’s effect on standard JOLs [2,19]. Hence, the current study aims to employ mediation analysis to measure a certain belief’s contribution to JOL formation under circumstances of multiple cues.

The current study focused on the contribution of a priori beliefs about font size to the font size effect on JOLs. Apart from font size, the current study chose word frequency (WF) as another manipulated factor. WF is an inherent characteristic of words and is easy to manipulate. Moreover, WF exerts robust effects on memory and metamemory judgments. For instance, WF strongly relates to the feeling of familiarity [20]. High-frequency words are perceived as more familiar and are easier to retain in working memory [21–23]. Previous studies also demonstrated that JOLs are sensitive to WF. Fiacconi and Dollois conducted a meta-analysis to investigate the effect of WF on JOLs [24]. Integrating results across 17 experiments, they found a reliable effect of WF on JOLs, with high-frequency words receiving higher JOLs than low-frequency ones. As WF is a reliable factor affecting both memory and metamemory judgments, it is reasonable to manipulate it as another factor in the current study.

As both the dual-process model and the analytic-processing theory assume that beliefs about font size contribute importantly to the font size effect when font size is the only manipulated factor [4,6,11], the current study hypothesized that beliefs about font size may contribute to the font size effect even in circumstances of multiple cues. However, it is also reasonable to expect a minimal role of beliefs about font size in the font size effect. For instance, when dealing with multiple cues (e.g., font size and WF) to make JOLs, participants might assign higher weight to WF than to font size, perhaps because WF is considered more diagnostic of future recall performance than font size. Participants might then pay less attention to font size and consequently might not make the effort to retrieve a previously developed explanation about how font size influences memory performance. Therefore, it is possible that beliefs about font size do not mediate the font size effect on JOLs in circumstances of multiple cues.

These two alternative hypotheses were tested in Experiment 1, in which a 2 (font size: large vs. small) * 2 (WF: high vs. low) within-subjects design was employed. WF was manipulated within-subjects so that it would serve as an additional cue for JOLs. Under these multiple-cue conditions, we investigated whether participants’ pre-existing beliefs about font size contribute to the font size effect. In Experiment 2, WF was changed to a between-subjects variable, so that it would no longer serve as an apparent cue for JOLs. Specifically, one group of participants studied a pure list of high-frequency words, and another group of participants studied a pure list of low-frequency words. According to the analytic-processing theory, with font size as the single within-subjects manipulated factor, participants could easily detect font size as an available cue for JOLs and their pre-existing beliefs about font size would be activated, which would in turn drive JOLs. If this were true, beliefs about font size would contribute to the font size effect on JOLs in both high-frequency and low-frequency conditions. However, the results might differ between the high-frequency and low-frequency conditions, as high-frequency and low-frequency words naturally differ in ease of encoding and association to one another [22,23]. The current study therefore further asks whether WF per se moderates the contribution of beliefs about font size to the font size effect on JOLs.

Experiment 1

The purpose of Experiment 1 was to investigate whether beliefs about font size contribute to the font size effect on JOLs in circumstances of multiple cues. In Experiment 1, the procedure was adapted from that of Hu et al.’s Experiment 2 [4]. On the first day, participants made belief-based predictions regarding the relationship between font size and memory in a questionnaire. Twenty-four hours later, they studied large (70 pt) and small (9 pt) words and provided item-by-item JOLs. Participants were not informed about the manipulation of WF.

Method

Participants.

A power analysis was conducted using G*Power to determine the required sample size [25]. According to the effect sizes of the font-size effect documented in previous studies (η_p² ranging from 0.13 to 0.50 [7]), 6–24 participants are required to obtain a significant (α = 0.05) font-size effect at 95% power. Statisticians suggest that when multilevel analysis is performed, the data should be collected from at least 30 participants with more than 30 trials for each participant [26]. Accordingly, 32 students (25 women; mean age = 21.19, SD = 1.89) were recruited from Beijing Normal University (BNU). Each participant was tested individually in a sound-proofed cubicle, provided written consent, and received 25 RMB as compensation. Experiments 1 and 2 were approved by the Ethics Committee at the BNU Collaborative Innovation Center of Assessment for Basic Education Quality.

Design & materials.

A 2 (font size: large vs. small) * 2 (WF: high vs. low) within-subjects design was used. The principal stimuli consisted of 42 two-character concrete Chinese words extracted from the Chinese word database developed by Cai and Brysbaert [27]. Two words were used for practice and were excluded from data analyses. The remaining 40 words were used in the formal experiment, of which 20 were high-frequency words (with WF ranging from 35.29 to 363.23 per million) and the other 20 were low-frequency words (with WF ranging from 0.06 to 1.28 per million). High-frequency (mean WF = 130.14, SD = 101.28) and low-frequency (mean WF = 0.67, SD = 0.38) words differed significantly in WF, t(19.00) = 5.72, p < .001, Cohen’s d = 1.81.
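The reported WF comparison can be approximately reproduced from the summary statistics alone. The sketch below is illustrative and rests on two assumptions that the text does not state explicitly: that the test was a Welch (unequal-variance) t-test, which the fractional degrees of freedom suggest, and that Cohen’s d used the root mean square of the two SDs. The function names are hypothetical.

```python
import math

def welch_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def cohens_d_from_summary(m1, sd1, m2, sd2):
    """Cohen's d with the two SDs pooled by root mean square (assumes equal n)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# Summary statistics reported for the high- and low-frequency word sets.
t, df = welch_t_from_summary(130.14, 101.28, 20, 0.67, 0.38, 20)
d = cohens_d_from_summary(130.14, 101.28, 0.67, 0.38)
print(f"t({df:.2f}) = {t:.2f}, d = {d:.2f}")
```

Under these assumptions the values agree with the reported t(19.00) = 5.72 and d = 1.81 to two decimal places.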

High-frequency words were randomly divided into two sets, with 10 words in each set. One set was presented in 9 pt (small size), with the other set presented in 70 pt (large size). Set assignment to the font size conditions was counterbalanced across participants. The two sets of high-frequency words did not differ significantly in WF or number of strokes (ps > .20). Low-frequency words were also randomly divided into two sets, with one set presented in large font and the other in small font, and set assignment was counterbalanced across participants. The two sets of low-frequency words did not differ significantly in WF or number of strokes (ps > .80). In summary, for each participant, there were 10 high-frequency words in 70 pt, 10 high-frequency words in 9 pt, 10 low-frequency words in 70 pt, and 10 low-frequency words in 9 pt.

Procedure.

The experiment took place over two consecutive days and consisted of two tasks: (a) a belief judgment task and (b) a study-test task. On the first day, participants undertook a belief judgment task similar to that in Hu et al.’s Experiment 2 [4]. They read the descriptions of a study-test task and saw the rectangles that represented 70 pt and 9 pt font. They were instructed to imagine that they were taking part in that task and were asked to estimate the numbers of large and small words (out of 20) that they would be able to remember on a later memory test. The order of estimates for large and small words was counterbalanced across participants. This belief questionnaire was administered one day before the learning task in order to obtain a pure measure of participants’ a priori beliefs about how font size influences memory performance.

Twenty-four hours later, participants returned to the same laboratory room and took the study-test task. During the study phase, participants studied high-frequency and low-frequency words one by one, and these words were presented in either 70 pt or 9 pt. The 40 words were presented in a pseudorandom order with no more than 3 words in the same font size or WF presented consecutively. Each trial began with a blank white screen presented for 500 ms, followed by a word presented at the center of a white screen for 4000 ms. Immediately following the presentation of each word, participants were asked to make a JOL on a scale ranging from 0 (Sure I will not remember it) to 100 (Sure I will remember it). They typed their JOLs into the computer. There was no time limit for making JOLs. After typing their JOLs, participants pressed ENTER and the next trial began.
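One simple way to build an order that satisfies the run-length constraint described above is rejection sampling: shuffle the trial list and keep the first shuffle in which no font size or WF level repeats more than three times in a row. The sketch below is only an illustration of that idea. The text does not say how the order was actually generated, and the word labels, dictionary keys, and function names are hypothetical.

```python
import random

def max_run(values):
    """Length of the longest run of identical consecutive values."""
    longest = run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def constrained_order(trials, max_repeat=3, seed=None, max_tries=100_000):
    """Shuffle until neither font size nor WF repeats more than max_repeat times."""
    rng = random.Random(seed)
    order = list(trials)
    for _ in range(max_tries):
        rng.shuffle(order)
        if (max_run([t["font"] for t in order]) <= max_repeat
                and max_run([t["wf"] for t in order]) <= max_repeat):
            return order
    raise RuntimeError("no valid order found; raise max_tries or relax the limit")

# 40 placeholder trials: 10 words in each font size x WF cell.
trials = [
    {"word": f"{wf}_{font}_{i}", "font": font, "wf": wf}
    for font in ("70pt", "9pt")
    for wf in ("high", "low")
    for i in range(10)
]
order = constrained_order(trials, seed=1)
```

Rejection sampling is wasteful when the constraint is tight, but for 40 trials and a limit of three repeats a valid order is typically found within a few hundred to a few thousand shuffles.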

Following the study phase, a distractor phase was initiated, during which participants were instructed to solve as many arithmetic problems as they could in 3 min. Finally, participants undertook a free recall test, wherein they recalled as many words as they could in 5 min and typed their answers into the computer. No feedback was offered during the test. The experimental procedure is shown in Fig 1.

Fig 1. Experimental procedure for Experiment 1.

https://doi.org/10.1371/journal.pone.0257547.g001

Results and discussion

Effects of font size (and WF) on belief-based predictions, JOLs, and recall performance.

Table 1 summarizes the means and SDs of JOLs and test performance as functions of font size and WF. Predictions in the belief judgment task and test performance were transformed into percentages. For JOLs, there were several erroneous inputs (e.g., greater than 100) in the current and following experiments, which were treated as missing data. In Experiments 1 and 2, the missing rates (< 0.54%) of JOL data were minimal.

Table 1. Means (SDs) for JOLs and recall performance in Experiment 1.

https://doi.org/10.1371/journal.pone.0257547.t001

In the questionnaire, participants predicted that they would remember more large words (M = 53.28, SD = 13.83) than small ones (M = 38.75, SD = 13.68), t(31) = 10.89, p < .001, Cohen’s d = 1.92. In the study-test task, a repeated-measures analysis of variance (ANOVA), with font size (large vs. small) and WF (high vs. low) as the within-subjects independent variables and JOLs as the dependent variable, showed that participants gave higher JOLs to large words (M = 56.63, SD = 15.91) than to small ones (M = 51.08, SD = 15.73), F(1,31) = 12.92, p = .001, η_p² = 0.29. In addition, JOLs were higher for high-frequency words (M = 65.47, SD = 14.68) than for low-frequency words (M = 42.18, SD = 17.89), F(1,31) = 116.22, p < .001, η_p² = 0.79. There was no significant interaction between font size and WF, F(1,31) = 0.12, p = .74, η_p² = 0.004.

For recall performance, a 2 (font size: large vs. small) * 2 (WF: high vs. low) repeated-measures ANOVA was conducted. The results showed that participants recalled more high-frequency words (M = 55.00, SD = 15.66) than low-frequency ones (M = 31.72, SD = 12.61), F(1,31) = 130.17, p < .001, η_p² = 0.81. Recall performance did not differ between large (M = 44.53, SD = 16.13) and small words (M = 42.19, SD = 14.08), F(1,31) = 0.73, p = .40, η_p² = 0.02. There was no significant interaction between font size and WF, F(1,31) = 1.77, p = .19, η_p² = 0.05.
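For readers who want to check the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the F values reported above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values reported for Experiment 1, all with df = (1, 31).
reported = {
    "JOL: font size": 12.92,
    "JOL: WF": 116.22,
    "JOL: font size x WF": 0.12,
    "Recall: WF": 130.17,
    "Recall: font size": 0.73,
    "Recall: font size x WF": 1.77,
}
for label, f_value in reported.items():
    print(f"{label}: eta_p^2 = {partial_eta_squared(f_value, 1, 31):.3f}")
```

The recovered values round to the effect sizes given in the text (0.29, 0.79, 0.004, 0.81, 0.02, and 0.05).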

Individual-level analysis of the effects of font size and WF on JOLs.

An individual-level analysis focusing on cue utilization was conducted [15]. Participants were coded as reliably basing JOLs on font size if their JOLs were higher for large words than for small ones. Likewise, participants were coded as reliably basing their JOLs on WF if their JOLs were higher for high-frequency words than for low-frequency ones. In addition, a cue effect in the expected direction was counted as reliable only if it reached Cohen’s d ≥ 0.20, the conventional threshold for a small effect [28].

Sixteen participants (50%) revealed ds ≥ 0.20 for both cues, which is indicative of reliably basing JOLs on both font size and WF. Fourteen participants (43.75%) mainly focused on WF, as indicated by d ≥ 0.20 for the WF effect and d < 0.20 for the font size effect. One participant (3.13%) mainly focused on font size, as indicated by d ≥ 0.20 for the font size effect and d < 0.20 for the WF effect. One participant (3.13%) revealed ds < 0.20 for both cues.
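The coding rule above can be sketched as a small classifier. This is an illustration only: the text does not specify how each participant’s d was computed, so the pooled-SD formula and the function names below are assumptions.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d for two sets of JOLs using a pooled SD (assumed formula)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def classify_cue_use(d_font, d_wf, threshold=0.20):
    """Code a participant by which cue effects reach the small-effect threshold."""
    uses_font = d_font >= threshold  # large > small by at least d = 0.20
    uses_wf = d_wf >= threshold      # high > low frequency by at least d = 0.20
    if uses_font and uses_wf:
        return "both cues"
    if uses_wf:
        return "WF only"
    if uses_font:
        return "font size only"
    return "neither cue"

# Reported counts out of 32 participants, expressed as percentages.
counts = {"both cues": 16, "WF only": 14, "font size only": 1, "neither cue": 1}
percentages = {label: 100 * n / 32 for label, n in counts.items()}
```

Because d is signed, an effect in the unexpected direction (for example, higher JOLs for small words) yields a negative d and is never counted as reliable cue use.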

Contribution of beliefs about font size to the font size effect on JOLs.

The main research interest of the current study was to explore whether beliefs about font size contribute to the font size effect on JOLs in circumstances of multiple cues. To address this question, a multilevel mediation analysis was conducted to test whether and to what extent the font size effect on JOLs was mediated by beliefs about font size. In addition, the Bayes factor (BF) was computed using the statistical software program JASP [29] to estimate the strength of the evidence for non-significant effects.

Although the belief-based predictions remained the same within the same level of font size, there were still variations for beliefs across cue levels for each participant. Thus, Belief was treated as a variable at the item level in the current multilevel models [19]. In addition, WF was added as an item-level moderator and a moderated mediation analysis was conducted.

As a single observation (e.g., an outlier) can have a substantial influence on the results of a regression analysis [30], it is important to detect influential observations before conducting the regression analysis. In the current research, difference in fits (DFFITS) was used to identify influential data points. DFFITS measures how much an observation affects its fitted value from the regression model. It is a widely used quantitative measure for detecting outliers and is better than scatterplots for assessing data points [31]. The R olsrr package was used to compute DFFITS and identify the influential observations (see analysis code online: detect outliers.R). In Experiment 1, three participants were detected as outliers and were hence excluded from further multilevel mediation analysis. For the remaining 29 participants, Fig 2 shows the relationship between font-size effects on beliefs and JOLs [4].
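To make the DFFITS idea concrete, the sketch below computes it for a simple one-predictor regression. It is not the authors’ analysis, which used the R olsrr package (see detect outliers.R in the online materials). The regression model, the cutoff of 2√(p/n) (a conventional rule of thumb), the toy data, and the function names are all assumptions made for illustration.

```python
import math

def dffits_simple_regression(x, y):
    """DFFITS for each observation in a one-predictor OLS regression."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    values = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - mean_x) ** 2 / sxx  # leverage of this observation
        # Residual variance of the model refitted without this observation
        # (must be positive, so the remaining points may not fit perfectly).
        s2_deleted = (sse - e ** 2 / (1 - h)) / (n - p - 1)
        t_external = e / math.sqrt(s2_deleted * (1 - h))  # studentized residual
        values.append(t_external * math.sqrt(h / (1 - h)))
    return values

def flag_influential(dffits_values, n_params=2):
    """Indices whose |DFFITS| exceeds the conventional 2 * sqrt(p / n) cutoff."""
    cutoff = 2 * math.sqrt(n_params / len(dffits_values))
    return [i for i, v in enumerate(dffits_values) if abs(v) > cutoff]

# Toy data: nine points close to a line plus one aberrant final point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8, 9.1, 30.0]
flagged = flag_influential(dffits_simple_regression(x, y))
```

In this toy example only the last observation is flagged: it combines high leverage with a large residual, so removing it changes its own fitted value substantially.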

Fig 2. The relationship between font-size effects on beliefs and JOLs in Experiment 1.

The font-size effect on beliefs is measured as the difference in belief-based predictions between large and small words. The font-size effect on JOLs is calculated as the difference in mean JOLs between large and small words. Each point represents one individual participant.

https://doi.org/10.1371/journal.pone.0257547.g002

Experiment 1 used the R lme4, lmerTest, and mediation packages to conduct multilevel regression and mediation analyses (Level 1: items; Level 2: participants) [32–34]. Missing data were excluded listwise from multilevel mediation analyses in the current and following experiments. The font size of 70 pt was coded as 1 and 9 pt was coded as 0. Meanwhile, high-frequency words were coded as 1, and low-frequency words were coded as 0. Font Size and Belief were group-mean-centered. Belief was regressed on Font Size in the first model. The random slope of Font Size was excluded because adding this random slope made the model fail to converge [19,35]. In the second model, JOL was regressed on Font Size, Belief, WF, and the interaction between WF and Belief. A random intercept together with the random slopes of Font Size and Belief were included. The coefficients are shown in Table 2, wherein a represents the effect of Font Size on Belief; b represents the effect of Belief on JOL; c’ represents the direct effect of Font Size on JOL when other variables were controlled.
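Two steps in this specification can be illustrated without fitting the multilevel models themselves: group-mean centering of item-level predictors within participants, and combining the path coefficients into an indirect effect. The sketch below is a simplification under stated assumptions. It treats the indirect effect as the product a × b, which holds for linear models, and lets the b path vary with WF through the WF × Belief interaction. The authors’ estimates came from the R mediation package, and the coefficient values used here are placeholders, not results from Table 2.

```python
from collections import defaultdict

def group_mean_center(values, groups):
    """Subtract each group's (participant's) mean from that group's values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def conditional_indirect_effect(a, b, b_by_wf, wf):
    """Indirect effect a * (b + b_by_wf * wf) at a given WF code (0 = low, 1 = high)."""
    return a * (b + b_by_wf * wf)

# Font Size coded 1 (70 pt) / 0 (9 pt) for two hypothetical participants.
font_size = [1, 0, 1, 0]
participant = ["p1", "p1", "p2", "p2"]
centered = group_mean_center(font_size, participant)  # [0.5, -0.5, 0.5, -0.5]

# Placeholder coefficients for illustration only.
low_wf = conditional_indirect_effect(a=2.0, b=0.5, b_by_wf=0.25, wf=0)   # 1.0
high_wf = conditional_indirect_effect(a=2.0, b=0.5, b_by_wf=0.25, wf=1)  # 1.5
```

With equal numbers of large and small words per participant, centering the 0/1 Font Size code simply yields values of +0.5 and −0.5, so the coefficient still reflects the large-minus-small difference.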

Table 2. Results from the multilevel mediation models in Experiments 1 and 2.

https://doi.org/10.1371/journal.pone.0257547.t002

The R mediation package was used to conduct mediation analyses [34]. As shown in Table 2, the indirect effect of font size on JOLs through beliefs about font size was not statistically significant when WF was controlled, and the interaction between WF and beliefs was also non-significant, implying that WF did not moderate the contribution of beliefs about font size to the font size effect on JOLs in this experiment (see data and analysis code online).

To assess the reliability of the non-significant result regarding the contribution of beliefs about font size to the font size effect, a Bayesian linear regression was conducted. The difference in belief-based predictions between large and small words (difference in beliefs) was used as the predictor variable. The difference in mean JOLs between large and small words (difference in JOLs) was used as the outcome variable. The default prior settings in JASP were used [29]. BF₁₀ is a measure of the fit of the data under the alternative model relative to the fit under the null model. Larger BF₁₀ values reflect more support for the alternative model versus the null model. Its inverse, BF₀₁ = 1/BF₁₀, indicates the strength of the evidence for the null model versus the alternative model. In Experiment 1, the data are 2.86 times more likely under the null model compared to a regression model including difference in beliefs (BF₀₁ = 2.86).
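The relationship between the two Bayes factors, and one way of reading their size, can be sketched as follows. The verbal labels follow a commonly cited heuristic classification (1–3 anecdotal, 3–10 moderate, 10–30 strong, above 30 very strong); they are an assumption added here for illustration, not categories used by the authors, and the function names are hypothetical.

```python
def invert_bayes_factor(bf):
    """Convert BF10 to BF01 (or BF01 to BF10): each is the reciprocal of the other."""
    return 1.0 / bf

def evidence_label(bf):
    """Heuristic label for a Bayes factor of at least 1 in favor of a model."""
    if bf < 1:
        raise ValueError("invert the Bayes factor first so that it is >= 1")
    if bf < 3:
        return "anecdotal"
    if bf < 10:
        return "moderate"
    if bf < 30:
        return "strong"
    return "very strong"

bf01 = 2.86                       # reported for Experiment 1
bf10 = invert_bayes_factor(bf01)  # about 0.35
label = evidence_label(bf01)      # "anecdotal" under this heuristic
```

Under this heuristic, a BF₀₁ of 2.86 favors the null model but falls just short of the conventional threshold of 3 for moderate evidence.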

In summary, when WF was manipulated within-subjects in order to serve as an additional cue, the results revealed that participants’ beliefs about font size did not mediate the font size effect on JOLs. This result was inconsistent with findings from previous studies [4,11]. We re-analysed data from two previous studies: Hu et al.’s Experiment 2 and Su et al.’s Experiment 2b. Similar to our study, they measured participants’ pre-existing beliefs about font size on the first day. On the second day, participants took a study-test task in which JOLs were measured. Su et al.’s study focused on the simultaneous contributions of ease of learning judgments (EORs) and pre-existing beliefs about font size to JOLs in their Experiments 2a and 2b. In Su et al.’s Experiment 2b, EORs were given after participants had studied and made JOLs, which eliminated the possible influence of EORs on JOLs. Meanwhile, the study time was fixed at 5 s per item rather than self-paced. As Su et al.’s Experiment 2b was more similar to our study, we only re-analysed their Experiment 2b data.

Our re-analyses of both studies showed that the indirect effect of font size on JOLs through beliefs about font size was statistically significant (as shown in S1 Table). As the experimental procedure in the current study is similar to those used in the two previous studies, we examined possible differences in the study materials in detail. The median WF in Hu et al.’s study was 20.24 per million words, and was 2.97 per million words in Su et al.’s study. In our Experiment 1, the median WF was 89.00 per million words for high-frequency words and was 0.71 per million words for low-frequency words. Compared to high-frequency words used in the current study, study materials used in both Hu et al.’s and Su et al.’s studies could be categorized as low-frequency words. More importantly, WF was controlled within a small range in their studies and WFs were highly similar for words presented in small or large sizes. In short, both Hu et al.’s and Su et al.’s studies support the idea that beliefs about font size contribute to the font size effect in a relatively pure list of low-frequency words. However, in the current experiment, WF was manipulated within-subjects. In this circumstance of multiple cues (font size & WF), participants’ pre-existing beliefs about font size contributed minimally to the font size effect on JOLs.

As far as we know, the current study is the first to provide evidence suggesting that people’s pre-existing beliefs about font size contribute little to the font size effect in circumstances of multiple cues. Compared with results from previous studies, it is possible that the manipulation of WF provided participants with another robust cue for inferring JOLs, which could have been used as the more dominant cue for JOLs. Participants might then pay less attention to font size and consequently might not make the effort to retrieve their beliefs about font size to make JOLs.

In Experiment 2, WF was changed to a between-subjects variable, so that it would no longer serve as an apparent cue to influence JOLs. Specifically, one group of participants studied a pure list of high-frequency words, and another group of participants studied a pure list of low-frequency words. In each group, WF was controlled within a relatively small range. With font size as a single within-subjects factor, Experiment 2 aimed to determine whether beliefs about font size contribute to the font size effect in both high-frequency and low-frequency conditions. As high-frequency words and low-frequency words naturally differ in semantic characteristics [22,23], Experiment 2 further investigated whether WF per se is a moderating factor of the contribution of beliefs about font size to the font size effect.

Experiment 2

In Experiment 2, the key question was whether WF moderates the contribution of beliefs about font size to the font size effect when it is manipulated between-subjects.