An important insight in the domain of learning and memory is the concept of desirable difficulties (Bjork, 1994; McDaniel & Butler, 2011). Desirable difficulties are select manipulations or learning conditions that make the initial learning process more difficult, but have the added effect of improving long-term retention. Notable examples of desirable difficulties include having participants generate information from word fragments instead of passively reading intact words (e.g., Slamecka & Graf, 1978), spacing out study sessions instead of massing them (e.g., Carpenter, 2017), and having participants engage in retrieval practice after studying instead of simply restudying the information (Kornell & Vaughn, 2016).

Desirable difficulties can also arise from the perceptual characteristics of the stimulus. It has been shown that providing individuals with words or text in a way that makes the text harder to read can lead to better long-term memory for that material (e.g., Diemand-Yauman, Oppenheimer, & Vaughan, 2011). The benefit associated with processing difficulty brought forth by changing perceptual characteristics of stimuli has been coined the disfluency effect. The disfluency effect is quite surprising as many traditional models of memory posit that encoding operates within a limited capacity channel (e.g., Atkinson & Shiffrin, 1968; Baddeley, 1981). In these models, increased demands on working memory would increase cognitive load, and therefore should be more likely to hurt rather than to help memory, especially if working memory capacity is low.

Nevertheless, the disfluency effect has been shown across a variety of perceptual disfluency manipulations. In a highly publicized paper with more than 200 citations to date, Diemand-Yauman et al. (2011) examined the effects of perceptual disfluency in both the laboratory and classroom using atypical fonts. In Experiment 1, participants studied a list of characteristics associated with space aliens in either a disfluent font (Comic Sans MS and Bodoni MT) or a more common, fluent typescript (Arial). At test, participants showed better memory on a cued-recall task for characteristics presented in a disfluent typescript compared with a fluent typescript (see French et al., 2013; Weltman & Eakin, 2014, for replications of this effect). In Experiment 2, Diemand-Yauman et al. (2011) manipulated the typescript (Monotype Corsiva, Comic Sans Italicized, and Haettenschweiler) of PowerPoint slides and handouts for a single unit in advanced placement (AP) English, history, chemistry, or physics classes. Similar to Experiment 1, the disfluent typescripts resulted in better memory than the fluent typescripts on a unit exam. Several other perceptual disfluency manipulations have shown a similar benefit: masking (i.e., presenting stimuli briefly, approximately 100 ms, and backward masking them, typically with a row of hash marks; e.g., Hirshman & Mulligan, 1991; Hirshman, Trembath, & Mulligan, 1994; Mulligan, 1996; Nairne, 1988), word inversion (Sungkhasettee, Friedman, & Castel, 2011), and high-level blurring (Rosner, Milliken, & Davis, 2015).

Theoretical mechanisms of the disfluency effect

Two explanations have been put forth to account for the memory benefit that arises from perceptual disfluency. According to a postlexical account, after the word has been recognized, the experience of disfluency functions as a subjective metacognitive cue to engage in deeper or more effortful processing of the word, and it is this processing that leads to enhanced memory (e.g., Alter, 2013; Alter, Oppenheimer, Epley, & Eyre, 2007; Diemand-Yauman et al., 2011). This postlexical interpretation of the disfluency effect is related to the traditional levels-of-processing framework (Craik & Lockhart, 1972) or the more contemporary System 1 and System 2 dual-route model (Evans, 2016; Kahneman, 2011). Within these frameworks, words that are hard to read elicit deeper processing, or involve more System 2 processing, than easy-to-read words. It is this deeper processing that occurs postlexically, after a word has been identified, that drives the disfluency effect (Alter et al., 2007).

Alternatively, the compensatory processing account (Hirshman et al., 1994; Mulligan, 1996), which is based on the highly influential interactive-activation framework of word recognition (McClelland & Rumelhart, 1981), ties the disfluency effect directly to processes involved in word recognition itself (see Yap & Balota, 2015, for a review of visual word recognition). Within this account, attempting to identify a perceptually disfluent word evokes increased higher-level (i.e., lexical/semantic) processing to aid in word recognition. It is the increased interaction between lower and higher levels during word processing that enhances memory for disfluent stimuli. Although the accounts differ in the point at which the additional processing occurs, in both cases some type of additional processing is thought to be responsible for engendering better memory for perceptually disfluent items.

Failed replications of the disfluency effect

Although the aforementioned findings suggest that the disfluency effect is a veridical phenomenon, it is important to note that there have been several failed demonstrations of the disfluency effect. For instance, in simple memory paradigms, perceptual disfluency manipulations such as small font size (Rhodes & Castel, 2008), low color contrast (Hirshman et al., 1994), low-level blurring (Yue, Castel, & Bjork, 2013), atypical font style (Magreehan, Serra, Schwartz, & Narciss, 2016), and hard-to-hear auditory information (Rhodes & Castel, 2009) have all failed to enhance memory retention. These failures have also been observed with more complex study materials, such as learning about the mechanisms of a toilet flush (e.g., Eitel, Kuhl, Scheiter, & Gerjets, 2014) or how a plane achieves lift off (Strukelj, Scheiter, Nyström, & Holmqvist, 2015).

When a finding fails to replicate, it is important to examine why. The fact that perceptual disfluency does not always promote positive learning outcomes raises the question as to whether disfluency is a desirable difficulty. Given the mixed evidence, it is important to systematically examine the conditions producing and not producing the disfluency effect, as this may have important theoretical implications (Oppenheimer & Alter, 2014).

The discrepant findings may reflect the precise way in which disfluency manipulations affect word processing. For instance, during word recognition, manipulations can affect lower (e.g., featural and orthographic) and higher (e.g., lexical and semantic) levels of processing. Two perceptual manipulations that have shown the disfluency effect, masking (Hirshman et al., 1994; Mulligan, 1996) and word inversion (Sungkhasettee et al., 2011), seem to evoke increased top-down, higher-level (lexical) processing during word recognition. By comparison, two lower-level perceptual manipulations failed to demonstrate the disfluency effect. Specifically, Hirshman et al. (1994, Experiment 5) showed that a low luminance manipulation (i.e., words presented in dark-gray vs. white font on a black background) resulted in processing costs, denoted by longer naming latencies, but did not confer a memory benefit. Similarly, Yue et al. (2013) used a low-level blurring manipulation across several studies and did not find a disfluency effect. Critically, luminance (Hirshman et al., 1994) and low-level blurring (Yue et al., 2013) have been shown to exert an additive influence on word recognition at lower levels of processing, but not necessarily higher levels (e.g., Gomez & Perea, 2014; Reingold & Rayner, 2006; Sanchez & Jaeger, 2015; Yap & Balota, 2007). It is possible, then, that the perceptual disfluency effect occurs only when the perceptual manipulation leads to increased higher-level processing.

The suggestion is that, given the putative theoretical mechanism(s) of the disfluency effect, one potential moderating factor of the disfluency effect is the nature of the disruption produced by the disfluency manipulation. It may be that a manipulation affecting higher levels of processing (i.e., lexical or semantic) is required to elicit the disfluency effect. For word stimuli, the perceptual disfluency manipulation must elicit increased feedback from lexical or semantic levels either during (e.g., Mulligan, 1996) or after (e.g., Alter et al., 2007) word recognition. If the disfluency effect reflects additional higher-level processing, it might explain why some designs (particularly those that rely on lower-level manipulations of disfluency) consistently fail to find the effect. This possibility is examined in the present studies by manipulating the format in which words are presented—clear type print, easy-to-read cursive, and hard-to-read cursive.

Educational realism of cursive text

It has been argued (see Sierra, 2016) that fluency manipulations used in the laboratory rarely occur in a naturalistic setting such as the classroom, making them educationally unrealistic (but see Toftness et al., 2017). One solution to this problem is to use perceptual manipulations that are more likely to occur in an educational setting. One novel perceptual disfluency manipulation that is more educationally realistic is handwritten cursive. In a classroom setting it is quite common for instructors to present students with cursive written information, either on the chalkboard/whiteboard or on the projector. Furthermore, it has been shown that students benefit more from writing and studying handwritten notes (Mueller & Oppenheimer, 2014).

So what makes cursive disfluent? Handwritten cursive differs in two key ways from type-print: it is more ambiguous and it is nonsegmented. In cursive writing, the same nominal characters change their physical forms across contexts; very similar forms may even signal different intended characters in different contexts. When individuals write in cursive, their letters connect, creating potential problems in letter segmentation. These perceptual features create considerable difficulty in recognizing handwritten words. This is evinced by the fact that there is additional recruitment of frontal lobe areas responsible for attention when attending to handwritten stimuli (Qiao et al., 2010), as well as increased response latencies to handwritten stimuli compared with type-printed stimuli (Barnhart & Goldinger, 2010, 2015; De Zuniga, Humphreys, & Evett, 1991; Perea, Gil-López, Beléndez, & Carreiras, 2016). Given that cursive is both educationally realistic and naturally disfluent, it seems like an ideal ecologically valid manipulation to examine the disfluency effect. The question from a theoretical perspective then becomes, at what level does the disfluency produced by cursive writing occur?

The comparative difficulty associated with recognizing a word in cursive relative to type-print is thought to reflect increased higher-level processing (Barnhart & Goldinger, 2010). Indeed, De Zuniga et al. (1991, Experiments 3 and 4) and Barnhart and Goldinger (2010, 2015) provided evidence for increased higher-level contributions by showing that certain lexical and semantic effects (i.e., word frequency, imageability, regularity, bidirectional consistency, and orthographic neighborhood size) are magnified during the recognition of handwritten cursive words compared with recognition of type-printed words. This suggests a strong top-down component to recognizing cursive words.

Perea, Gil-López, et al. (2016) along with Perea, Marcet, Uixera, and Vergara-Martínez (2016) further suggested that the recognition of handwritten words is moderated by legibility, with hard-to-read cursive recruiting more top-down processing during word recognition than easy-to-read cursive. Examining RTs using quantile analyses, Perea, Gil-López, et al. observed not only shifts in the RT distribution, but also skewing (i.e., higher RTs and more errors) in the cursive conditions. However, hard-to-read cursive affected higher-level processing more—the word frequency effect was larger for hard-to-read cursive words compared with easy-to-read cursive words. These legibility differences have also been observed during normal reading. Using eye tracking, Perea, Marcet et al. (2016b) found a substantial reading cost (i.e., more fixations, longer fixations, and longer reading times) for sentences written in cursive compared with type-print. However, at the local (word) level, hard-to-read cursive words (compared with easy-to-read cursive) more strongly affected late measures of eye movements—such as total reading time and go past times—that are presumed to reflect lexical and postlexical processes.

The purpose of the current study was to examine how the processing of perceptually disfluent stimuli affects later memory using an educationally realistic fluency manipulation: cursive handwriting. In the disfluency literature, it is quite common for only one level of disfluency to be used. It is possible, then, that examining multiple levels of disfluency might present a more nuanced view of the disfluency effect. Indeed, Diemand-Yauman et al. (2011) posited that differing levels of disfluency might follow a U-shaped curve insofar that illegible or hard-to-read stimuli might lead to a memory disadvantage, while more moderate disfluency would be the best for learning. Toward this aim, Experiment 1 examined whether a disfluency effect can be obtained for cursive writing, and whether memory outcomes differ between different levels of cursive legibility (i.e., easy-to-read vs. hard-to-read). Experiments 2 and 3 replicated and extended the findings from Experiment 1.

Experiment 1

In Experiment 1, three script types (easy-to-read cursive, hard-to-read cursive, and type-print) were used to examine whether cursive produces a disfluency effect. At study, we included two objective measures of disfluency (accuracy and naming latency) to serve as a manipulation check to ensure that the cursive manipulation was disfluent. We also included a metacognitive measure of disfluency (aggregate judgments of learning [JOLs]). Aggregate JOLs required participants to give a subjective estimate of the strength of their learning associated with each script type immediately after the study phase. The inclusion of aggregate JOLs also allowed us to examine the interplay between subjective disfluency and control and regulatory processes leading to better memory (Pieger, Mengelkamp, & Bannert, 2016). Moreover, aggregate JOLs do not interfere with processing that occurs during encoding (Besken & Mulligan, 2013, 2014).

At test, an old-new recognition task assessed memory differences across the three script types.

An old-new recognition test was used as the explicit memory task because it appears to be more sensitive to the effects of disfluency (e.g., Hirshman & Mulligan, 1991; Mulligan, 1996, 1998; Rosner et al., 2015), thus increasing the odds of detecting a disfluency effect, if it exists.

While extant accounts of the disfluency effect would predict an overall disfluency effect for cursive words, they would predict a different pattern of results for easy-to-read and hard-to-read cursive words. Within the compensatory processing account (Mulligan, 1996), the disfluency effect reflects increased lexical and/or semantic processing. The memory benefit for processing hard-to-read cursive words would be larger than for easy-to-read cursive words, because hard-to-read cursive not only slows down reading, it also elicits increased interactivity between lower and higher levels during word recognition (Perea, Gil-López, et al., 2016; Perea, Marcet, et al., 2016). In contrast, it is less clear what postlexical accounts would predict. On the surface, it seems that subjectively, hard-to-read cursive should be more disfluent and therefore result in better memory. But as hypothesized by Diemand-Yauman et al. (2011) “it seems possible that the influence of disfluency on retention follows a U-shaped curve, and the exact parameters of this function remain to be determined. With this in mind, the most effective disfluency manipulations would likely be those that are within the bounds of the normal variation of fonts and materials that could reasonably appear in a classroom” (p. 4). From this perspective, it could be predicted that easy-to-read cursive would produce better memory than hard-to-read cursive.

Method

Participants

Thirty undergraduate students from Iowa State University participated for course credit. All were native speakers of English, with self-reported normal or corrected-to-normal vision.

Stimuli and apparatus

Stimuli were 198 nouns drawn from the English Lexicon Project database (Balota et al., 2007). Both frequency (all words were high frequency; mean log HAL frequency = 9.2) and length (all words were four letters in length) were controlled (see https://osf.io/td5h7/ for the stimulus list).

To create the cursive stimuli, we asked seven individuals to write five sentences in cursive to be judged for penmanship. Forty students rated the level of penmanship of the seven individuals on a 1–7 Likert scale (1 = legible; 7 = illegible). The individuals with the best and worst penmanship (average score of 2.2 vs. 4.0, respectively) were chosen to write the stimuli for the experiment (see Perea, Gil-López, et al., 2016a, for a similar procedure).

The hard-to-read and easy-to-read cursive stimuli were created using a Livescribe digital pen. The writing instrument resembles a normal pen, with the addition of a small camera protruding from the tip. The camera reads a fine dot pattern on special paper, generating a digital trace of each pen stroke. The stimulus images were matched in size as closely as possible to the computer-generated words (44-point Courier New font). Figure 1 depicts an example of each of the three script types. All stimuli used for this experiment can be found on the Open Science Framework (OSF; https://osf.io/td5h7/).

Fig. 1. Examples of type-print (left), easy-to-read cursive (center), and hard-to-read cursive (right) used across experiments.

Stimuli were presented via E-Prime experimental software (Version 1.1; Schneider, Eschman, & Zuccolotto, 2002) on a Dell computer with an LCD monitor. A microphone attached to the E-Prime SR response box recorded participants’ responses and naming latencies. Participants used a keyboard to record their JOLs and the SR response box to record their old/new recognition responses.

Design and procedure

Type of script (type-print, easy-to-read cursive, and hard-to-read cursive) was manipulated within-subjects and aggregate JOLs were collected under incidental learning conditions. Participants were tested individually in a small, well-lit room, seated approximately 65 cm from the computer screen. Before the experiment, all participants were informed that they would be naming words and that some of the stimuli would appear as hard-to-read (perceptually disfluent) words and some as easy-to-read (perceptually fluent) words. Instructions made no mention of a final recognition memory test. Participants engaged in four practice trials to familiarize themselves with the procedure before the experiment proper.

A total of 198 words were presented, 99 at study (33 in each script condition) and 198 at test (99 old and 99 new). This resulted in six counterbalanced lists. Lists were assigned to participants so that across participants each word occurred equally often in the six possible conditions: type-print old, easy-to-read cursive old, hard-to-read cursive old, type-print new, easy-to-read cursive new, and hard-to-read cursive new.
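The counterbalancing described above can be sketched in Python. This is only an illustration: it assumes the 198 words are split into six fixed sets of 33 and that each list rotates those sets through the six conditions. The names `CONDITIONS` and `build_lists` and the placeholder word labels are hypothetical, and the authors' actual list construction may have differed.

```python
# Six conditions: three scripts crossed with old/new status.
CONDITIONS = [
    ("type-print", "old"),
    ("easy-to-read cursive", "old"),
    ("hard-to-read cursive", "old"),
    ("type-print", "new"),
    ("easy-to-read cursive", "new"),
    ("hard-to-read cursive", "new"),
]


def build_lists(words, n_lists=6):
    """Rotate equal-sized word sets through the conditions (Latin-square style)."""
    if len(words) % n_lists != 0:
        raise ValueError("word count must divide evenly into the lists")
    set_size = len(words) // n_lists
    word_sets = [words[i * set_size:(i + 1) * set_size] for i in range(n_lists)]
    lists = []
    for shift in range(n_lists):
        assignment = {}
        for set_index, word_set in enumerate(word_sets):
            # Each list shifts every word set to the next condition.
            condition = CONDITIONS[(set_index + shift) % n_lists]
            for word in word_set:
                assignment[word] = condition
        lists.append(assignment)
    return lists


# Placeholder labels standing in for the 198 stimulus words.
words = [f"word{i:03d}" for i in range(198)]
lists = build_lists(words)
```

Under this rotation, every list contains 33 words per condition (99 old, 99 new), and across the six lists each word appears once in every condition.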

Each trial began with a fixation cross appearing at the center of the screen for 1,000 ms. The fixation cross was then replaced by a word in the same location for 2,000 ms. To ensure that all items were identified and to assess whether perceptually disfluent items were indeed disfluent, participants were instructed to name each word as quickly and accurately as possible. Naming latency and actual response were recorded on each trial. After the naming response, there was a 1,000 ms blank interstimulus interval before the next trial.

After all 99 words were named, participants completed aggregate JOLs wherein they were told that 33 easy-to-read cursive, 33 hard-to-read cursive, and 33 type-print words had appeared in the list, and were asked to estimate how many words from each condition they expected to recognize on a later memory test. The order of the memory judgments (easy-to-read cursive vs. hard-to-read cursive vs. type-print) was counterbalanced across participants.

After the study phase, a short 3-minute distractor task was administered in which participants wrote down as many United States capitals as they could. Afterward, participants took an old-new recognition test. At test, a fixation cross appeared in the center of the screen for 1,000 ms and was followed by a word that either had been presented during study (“old”) or had not been presented during study (“new”). Old words occurred in their original script, and following the counterbalancing procedure, each new word was presented in one of the three scripts. For each word presented, participants used the SR button box to input their responses, using a button labeled “old” to indicate that they had named the word during study, and a button labeled “new” to indicate they did not remember naming the word. Words stayed on the screen until participants gave an “old” or “new” response. All words were individually randomized for each participant during both the study and test phases. After the experiment, participants were debriefed. The entire experiment took about 30 minutes to complete.

Analysis strategy

The data for all experiments reported here are located on OSF (https://osf.io/bstfd/). For all experiments, an alpha level of .05 was used for significance testing. Cohen’s d, eta-squared (η2), and partial eta-squared (ηp2) are reported as effect-size measures. In addition, alongside traditional analyses that use null hypothesis significance testing, Bayes factors for null findings (noted as BF01), calculated with the free software program JASP (Version 0.8.6; https://jasp-stats.org), are reported (see Jarosz & Wiley, 2014, for a review of Bayes factors). A Bayes factor of 3 or greater is indicative of strong or positive evidence in favor of the null (Jeffreys, 1961).

Results and discussion

In each experiment, items with more than 40% error rates across participants were not included in any analyses. Any participant who identified fewer than 75% of the items during study was replaced. Trials associated with microphone malfunctions (trials where no audio was detected) were excluded from the latency analysis. Trials associated with naming latencies faster than 150 ms or slower than 2.5 times the standard deviation for each participant were excluded from the latency analysis.
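The latency exclusions can be illustrated with a short Python sketch. It rests on assumptions the text does not spell out: the upper cutoff is taken to be each participant's mean plus 2.5 standard deviations, computed over all trials with a recorded latency, and microphone failures are represented as `None`. The function name `trim_latencies` and the example values are hypothetical.

```python
from statistics import mean, stdev


def trim_latencies(latencies, lower_ms=150.0, sd_multiplier=2.5):
    """Return one participant's naming latencies after the exclusions.

    Assumption: the slow cutoff is mean + sd_multiplier * SD of that
    participant's recorded latencies.
    """
    # Drop trials where the microphone recorded nothing.
    valid = [rt for rt in latencies if rt is not None]
    if len(valid) < 2:
        return valid  # too few trials to estimate a standard deviation
    upper_ms = mean(valid) + sd_multiplier * stdev(valid)
    # Keep trials that are neither too fast nor too slow.
    return [rt for rt in valid if lower_ms <= rt <= upper_ms]


# Made-up latencies (ms) for one participant, for illustration only.
example = [600, 620, 640, 610, 630, 590, 605, 615, 625, 635, 100, None, 3000]
trimmed = trim_latencies(example)
```

In this made-up example the 100 ms trial, the missing trial, and the 3,000 ms trial are dropped. With very few trials an extreme value can inflate the standard deviation enough to escape its own cutoff, so a rule of this kind works best with many trials per participant.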

In Experiment 1, no participants were replaced. Two items (vise and liar) were removed from all analyses (4% of the data) due to high error rates. Microphone malfunctions occurred on 8% of trials. For all analyses, the main effect of script type was further divided into two orthogonal contrasts: One assessed whether cursive produces a disfluency effect and the other whether the memory benefit was larger for easy-to-read or hard-to-read cursive.
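The two orthogonal contrasts can be sketched in Python. The weights below are an assumption, since the text does not list them: (-2, 1, 1) compares type-print with the average of the two cursive scripts, and (0, -1, 1) compares hard-to-read with easy-to-read cursive. Each contrast is tested against its own error term, giving F(1, n - 1), which is equivalent to a squared one-sample t on per-participant contrast scores. The scores in the example are made up.

```python
from math import sqrt
from statistics import mean, stdev

# Assumed weights, in the order (type-print, easy-to-read, hard-to-read).
CURSIVE_VS_PRINT = (-2, 1, 1)
HARD_VS_EASY = (0, -1, 1)


def contrast_f(scores, weights):
    """Return (F, error df) for a within-subject contrast.

    scores: one (type_print, easy, hard) tuple per participant.
    """
    values = [sum(w * x for w, x in zip(weights, row)) for row in scores]
    n = len(values)
    t = mean(values) / (stdev(values) / sqrt(n))
    return t ** 2, n - 1


# The contrasts are orthogonal: their weight products sum to zero.
dot_product = sum(a * b for a, b in zip(CURSIVE_VS_PRINT, HARD_VS_EASY))

# Made-up scores for four hypothetical participants.
scores = [(1.0, 1.4, 1.3), (0.8, 1.0, 1.2), (1.2, 1.3, 1.6), (0.9, 1.3, 1.1)]
f_cursive, df_cursive = contrast_f(scores, CURSIVE_VS_PRINT)
f_legibility, df_legibility = contrast_f(scores, HARD_VS_EASY)
```

With 30 participants this approach yields 29 error degrees of freedom per contrast, matching the F(1, 29) values reported for Experiment 1.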

Study phase

Naming accuracy and latency

Mean naming accuracy and latency are shown in Table 1. Examining accuracy, a one-way repeated-measures ANOVA indicated an effect of script, F(2, 52) = 56.35, p < .001, η2 = .66. Participants were less accurate when naming cursive words compared with type-print words, F(1, 29) = 74.76, p < .001, η2 = .72. Further, participants were less accurate naming hard-to-read cursive compared with easy-to-read cursive words, F(1, 29) = 42.52, p < .001, η2 = .60.

Table 1. Mean naming accuracy (in proportions), naming latency (in milliseconds), and JOLs (in proportions) as a function of script type in Experiment 1

A similar analysis of naming latency indicated an effect of script, F(2, 52) = 72.26, p < .001, η2 = .71. Participants named cursive words more slowly than type-print words, F(1, 29) = 107.92, p < .001, η2 = .79. Further, hard-to-read cursive words were named more slowly than easy-to-read cursive words were, F(1, 29) = 22.41, p < .001, η2 = .44. Together, the naming accuracy and latency results confirm that the cursive stimuli were perceptually disfluent, with hard-to-read cursive being most disfluent.

JOLs

Two participants were excluded from the JOL analyses because they did not provide JOLs for each script type. Results indicated a significant effect of script, F(2, 54) = 16.07, p < .001, η2 = .37. Participants predicted that they would recognize fewer cursive words than type-print words, F(1, 27) = 7.45, p = .011, η2 = .22. Further, participants predicted that they would recognize fewer hard-to-read cursive words than easy-to-read cursive words, F(1, 27) = 39.42, p < .001, η2 = .59.

Test phase

Conditionalized analysis

Recognition memory was calculated only for words that were correctly named during encoding. Full ANOVA results for the unconditionalized d' analysis across each experiment are presented in Appendix A. It is important to note that the unconditionalized analysis closely matched our conditionalized analysis. The mean proportions of “old” responses across item type and script type are presented in Fig. 2. Memory sensitivity (d') values are displayed in Fig. 3 as a function of script type. A one-way repeated-measures ANOVA on d' values indicated a significant effect of script type, F(2, 58) = 5.36, p = .007, η2 = .16. Performance was better for cursive compared to type-print words, F(1, 29) = 7.86, p = .009, η2 = .21. There was no significant difference between easy-to-read cursive and hard-to-read cursive words, F(1, 29) = 2.15, p = .153, η2 = .07, BF01 = 1.96; however, the Bayes factor indicates only weak support for the null hypothesis.
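For readers unfamiliar with the sensitivity measure, d' is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The Python sketch below shows the computation for one participant and one script type. The log-linear correction for rates of 0 or 1 is an assumption, as the text does not say how extreme rates were handled, and the function name `d_prime` and the example counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, n_old, false_alarms, n_new):
    """Compute d' = z(hit rate) - z(false-alarm rate).

    For the conditionalized analysis, n_old would count only the studied
    words that were named correctly. Assumption: a log-linear correction
    (add 0.5 to each count and 1 to each total) keeps rates off 0 and 1.
    """
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Made-up counts: 25 hits on 30 correctly named old words,
# 6 false alarms on 33 new words in the same script.
sensitivity = d_prime(25, 30, 6, 33)
```

Equal hit and false-alarm rates give a d' of zero, and larger positive values indicate better discrimination of old from new words.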

Fig. 2. Mean proportion of “old” responses for Experiment 1. Error bars reflect the standard error of the mean, corrected for between-subject variability (Morey, 2008).

Fig. 3. Memory sensitivity (d') as a function of script. Error bars reflect the within-subject standard error of the mean (Morey, 2008).

Experiment 1 examined whether cursive script, a type of educationally relevant perceptual disfluency, would produce a disfluency effect. The results were clear: cursive words were better remembered than type-print words, indicating that cursive script serves as a desirable difficulty. There was less clarity when considering theoretical explanations of the disfluency effect because there was no recognition memory difference between easy-to-read and hard-to-read cursive. This is problematic for both the postlexical and compensatory processing accounts of disfluency. While hard-to-read cursive was objectively and subjectively more disfluent, the additional disfluency did not produce better recognition memory. The compensatory processing account would predict that hard-to-read cursive words should produce better memory, while the postlexical account would predict better memory for easy-to-read cursive. Experiment 2 examined whether the pattern of results in Experiment 1 can be replicated, and if so, whether factors other than disfluency may be contributing.

Experiment 2

As noted earlier, the disfluency effect is thought to arise as a result of deeper processing given to perceptually disfluent items. However, it has previously been noted that disfluency is confounded with distinctiveness (e.g., Diemand-Yauman et al., 2011; Rummer et al., 2016). That is, not only are cursive words disfluent compared with type-printed words, but they are also unusual. Thus, it could be that the memory benefit for cursive words is due to perceptual distinctiveness and not additional higher-level processing driven by disfluency.

Most studies examining the disfluency effect have used mixed designs in which a single list contains two levels of memoranda—a fluent level and disfluent level (e.g., Hirshman et al., 1994; Mulligan, 1996; Nairne, 1988; Sungkhasettee et al., 2011). In a mixed list, individuals may use a naïve (subjective) theory about what they believe requires more processing, and employ a strategy that involves preferential processing of the more disfluent stimuli over the more fluent stimuli, resulting in better memory for the former. It may be that only when disfluent words occur among fluent words, and thus stand out as distinctive, do they receive the additional processing that produces better memory.

One way to test this is by manipulating the design of the lists. If primary distinctiveness is removed as a cue by presenting fluent versus disfluent items in blocked (as opposed to mixed) fashion, distinctiveness cannot be used to support differential processing, and the disfluency effect should be attenuated. By contrast, if a disfluency effect is still observed when the possibility of using distinctiveness as a cue is minimized by blocking the presentation of items (rather than mixing them, as in Experiment 1), this would support an account based on deeper processing.

In Experiment 2, two groups were compared: a mixed-list group and a blocked-list group. The mixed-list group is an exact procedural replication of Experiment 1. We expected to replicate the disfluency effect for cursive words but were unsure as to whether a memory difference between hard-to-read and easy-to-read cursive would be found. Additionally, if list composition is indeed important, then an interaction should be observed between script (easy-to-read, hard-to-read, and type-print) and list-type (mixed vs. blocked). The presence of an interaction would indicate that visual distinctiveness may contribute to the disfluency effect.

Method

Participants

A one-tailed power analysis was conducted (G*Power; Faul, Erdfelder, Lang, & Buchner, 2007) using the effect size obtained from the comparison between cursive and type-print in Experiment 1 (dz = 0.46), and indicated that 31 participants were needed to detect an effect with 80% power. Given the six counterbalanced lists, 36 participants were assigned to each group and received course credit for participating.
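As a rough check on the sample-size figure, the Python sketch below uses a normal approximation for a one-tailed paired comparison. This is not the G*Power computation, which is based on the noncentral t distribution; the approximation typically lands slightly below the exact answer. The function name `approx_n_paired` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_n_paired(dz, alpha=0.05, power=0.80, one_tailed=True):
    """Approximate participants needed to detect a standardized paired effect.

    Normal approximation: n = ((z_alpha + z_power) / dz) ** 2, rounded up.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_tailed else z(1 - alpha / 2)
    z_power = z(power)
    return ceil(((z_alpha + z_power) / dz) ** 2)


# Effect size taken from the cursive vs. type-print comparison in Experiment 1.
n_estimate = approx_n_paired(0.46)
```

The estimate falls just under the 31 participants reported, as would be expected when the t distribution is replaced by the normal. Rounding up to a multiple of the six counterbalanced lists gives the 36 participants per group that were tested.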

Materials, procedure, and design

The materials were identical to those of Experiment 1, and the procedure for the mixed-list group was identical. In the blocked-list group, the presentation order of items during study was randomized by participant, with the constraint that all type-print, easy-to-read cursive, and hard-to-read cursive words were shown in separate blocks. Block order was counterbalanced across participants. Blocks were separated by a short break, in which the participants were told to rest before proceeding to the next block. Thus, list type (blocked vs. mixed) was manipulated as a between-subjects variable and script (type-print vs. easy-to-read cursive vs. hard-to-read cursive) as a within-subjects variable. During the recognition test for both groups, all items were intermixed randomly for each individual.

Results and discussion

No participants were replaced due to low accuracy. Three items (4% of the data; 2% in the blocked group and 2% in the mixed group) had to be discarded due to low accuracy (noon, liar, and flux). Microphone errors occurred on 11% of the trials.

Study phase

Naming accuracy and naming latency

Mean naming accuracy and latency are shown in Table 2. These were submitted to 2 × 3 mixed-factor ANOVAs, with list type (blocked vs. mixed) as a between-subjects variable and script (type-print vs. easy-to-read cursive vs. hard-to-read cursive) as a within-subjects variable.

Table 2. Mean naming accuracy (in proportions), naming latency (in milliseconds), and JOLs (in proportions) as a function of script type and list type in Experiment 2

The accuracy analysis (Greenhouse–Geisser corrected) revealed a main effect for script, F(2, 140) = 179.97, p < .001, ηp2 = .72. As in Experiment 1, participants were less accurate naming cursive words compared with type-print words, F(1, 70) = 230.00, p < .001, ηp2 = .77, and also less accurate naming hard-to-read cursive words compared with easy-to-read cursive words, F(1, 70) = 147.09, p < .001, ηp2 = .68. The main effect for list type and the interaction between script and list type were not significant, both Fs < 1.50, both ps > .10, and BF01 > 3.
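As a consistency check, partial eta squared can be recovered from a reported F ratio and its degrees of freedom. The short sketch below applies the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error) to the three accuracy tests reported above. The function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for naming accuracy in Experiment 2.
omnibus_script = partial_eta_squared(179.97, 2, 140)
cursive_vs_print = partial_eta_squared(230.00, 1, 70)
hard_vs_easy = partial_eta_squared(147.09, 1, 70)

for label, value in [
    ("script (omnibus)", omnibus_script),
    ("cursive vs. type-print", cursive_vs_print),
    ("hard vs. easy cursive", hard_vs_easy),
]:
    print(f"{label}: {value:.2f}")
```

Rounded to two decimals, the recovered values agree with the effect sizes reported in the text.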

The latency analysis (Greenhouse–Geisser corrected) revealed a significant effect for script type, F(2, 140) = 215.16, p < .001, ηp2 = .76. Cursive words were named more slowly than type-print words, F(1, 70) = 312.18, p < .001, ηp2 = .82, and hard-to-read cursive words were named more slowly than easy-to-read cursive words, F(1, 70) = 77.29, p < .001, ηp2 = .53. There was no main effect of list type, F(1, 70) = 0.541, p = .464, ηp2 = .01, BF01 = 4.18. There was a marginal interaction between list type and script, F(2, 140) = 2.58, p = .079, ηp2 = .04, with somewhat faster naming of type-print in the blocked list. Together, the naming accuracy and latency confirm that the cursive stimuli were perceptually disfluent, with hard-to-read cursive being most perceptually disfluent.

JOLs

Mean aggregate JOLs are shown in Table 2. One participant was removed in this analysis for not providing JOLs for each script type. The JOLs analysis revealed a main effect of script, F(2, 138) = 66.53, p < .001, ηp2 = .49. As in Experiment 1, participants predicted that they would recognize fewer cursive words compared to type-print words, F(1, 69) = 68.04, p < .001, η2 = .50, and that they would recognize fewer hard-to-read cursive compared to easy-to-read cursive words, F(1, 69) = 65.00, p < .001, η2 = .49. The main effect of list type and the interaction between list type and script were not significant, both Fs < 0.913, ps > .34, and BFs01 > 2.03. The model with script type was weakly preferred over a model with list type (BF = 2.03), but strongly preferred over a model with the interaction between script and list type (BF = 21.34).

Test phase

Conditionalized analysis

The mean proportions of “old” responses across item type and script type are presented in Fig. 4. Memory sensitivity (d') values are displayed in Fig. 5. An analysis of d' was conducted using the same 2 × 3 mixed-factor ANOVA described earlier. The analysis revealed a main effect of script, F(2, 140) = 4.92, p = .009, ηp2 = .07. Recognition memory for cursive words was better than for type-print, F(1, 70) = 7.52, p = .008, η2 = .10. As in Experiment 1, memory for easy-to-read cursive was numerically higher than for hard-to-read cursive, but the difference was not significant, F(1, 70) = 1.95, p = .167, ηp2 = .03, BF01 = 3.01. The main effect of list type and the interaction between list type and script were not significant, both Fs < 0.13, ps > .72, BF01 > 3. The model with script type was preferred over a model with list type (BF = 3.10) and over a model with the interaction between script and list type (BF = 35.34).
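For readers unfamiliar with the sensitivity measure, d' is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below shows one way it might be computed from raw response counts. The log-linear correction (adding 0.5 to each count) is an assumption made here to keep rates of 0 or 1 finite; the text does not say how extreme rates were handled. The counts in the example are made up for illustration.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate).

    Uses a log-linear correction (assumption) so that perfect or zero
    rates do not produce infinite z scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 30 of 40 old items called "old",
# 10 of 40 new items called "old".
example = d_prime(hits=30, misses=10, false_alarms=10, correct_rejections=30)
print(round(example, 2))
```

A participant who cannot tell old from new items (equal hit and false-alarm rates) gets a d' of zero, and larger values indicate better discrimination.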

Fig. 4

Mean proportion of “old” responses for Experiment 2. Error bars reflect the within-subject standard error of the mean (Morey, 2008)

Fig. 5

Memory sensitivity (d') as a function of list type and script in Experiment 2. Error bars reflect the within-subject standard error of the mean (Morey, 2008)
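The within-subject error bars cited in the figure captions remove between-participant variability before computing the standard error. A minimal sketch of that normalization with Morey's (2008) correction factor follows, assuming a fully within-subject layout with one score per participant per condition. The function name and the toy data are illustrative, and in a mixed design the normalization would presumably be applied separately within each between-subjects group.

```python
import math
from statistics import mean, stdev


def within_subject_se(scores: list[list[float]]) -> list[float]:
    """Within-subject standard errors, one per condition.

    scores[i][j] is participant i's score in condition j.
    Step 1: centre each participant on the grand mean (removes
            between-participant differences).
    Step 2: scale by sqrt(M / (M - 1)), M = number of conditions
            (the correction proposed by Morey, 2008).
    """
    n_participants = len(scores)
    n_conditions = len(scores[0])
    grand_mean = mean(v for row in scores for v in row)
    normalized = [[v - mean(row) + grand_mean for v in row] for row in scores]
    correction = math.sqrt(n_conditions / (n_conditions - 1))
    return [
        correction * stdev(column) / math.sqrt(n_participants)
        for column in zip(*normalized)
    ]


# Toy data: participants differ only by a constant offset, so all
# variability is between participants and the error bars collapse to zero.
offsets_only = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [21.0, 22.0, 23.0]]
print(within_subject_se(offsets_only))
```

The toy example illustrates the design choice: differences in overall performance between participants do not widen the error bars, so the bars reflect only the variability relevant to the within-subject comparison of scripts.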

The results of Experiment 2 showed the same pattern found in Experiment 1 for all measures, and there was no effect of mixed versus blocked lists. Cursive words were more disfluent than type-print words, with hard-to-read cursive being the most disfluent. Cursive stimuli were also better recognized than type-print stimuli. There was no significant difference between memory for easy-to-read and hard-to-read cursive, although we note that, once again, the easy-to-read cursive stimuli showed the numerically highest memory.

The statistically indistinguishable disfluency effect for blocked and mixed lists suggests that perceptual distinctiveness is not driving the cursive disfluency effect. Ruling out this alternative explanation provides indirect support for an explanation in which depth of processing induced by the difficulty of processing at encoding produces the memory benefit. The results are in accord with other studies demonstrating that distinctiveness plays only a small role in the disfluency effect (Hirshman et al., 1994; Sungkhasettee et al., 2011). However, it remains possible that the mnemonic benefit of cursive words is due to encoding surface-level features that aid later recognition memory. In this study, context was reinstated at test (i.e., cursive items named at study were presented in cursive at test). This makes it difficult to completely rule out a memory benefit based on visuospatial characteristics. To test this in future research, one could present words aurally at test and follow this with a source monitoring test that asks participants to identify the format in which each word was studied (see Yue et al., 2013, Experiment 2b). This would control for visuospatial characteristics serving as superior memory cues.

Experiments 1 and 2 showed that cursive words can act as a desirable difficulty for an immediate recognition memory test. In order to be a true desirable difficulty, however, an effect must persist across longer retention intervals. Indeed, some desirable difficulties (e.g., spacing) become more robust after longer delays (Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006). Experiment 3 assessed whether the disfluency effect persists across a longer retention interval of 24 hours.

Experiment 3

A major limitation in the research on perceptual disfluency is the use of short retention intervals (RIs). In simple list-learning paradigms, the disfluency effect is commonly examined without a significant time delay between study and test (~3 min; e.g., Hirshman et al., 1994; Sungkhasettee et al., 2011; Yue et al., 2013). Immediate performance on a test, however, is not an accurate indicator of long-term retention (Bjork, 1994). Thus, it is important to examine the disfluency effect across more educationally realistic RIs.

In an educational context, the time lapse between study and test could be hours, days, or weeks. Other desirable difficulties that promote learning (e.g., testing and spacing) persist and become more robust across longer RIs, spanning weeks (Carpenter, Pashler, Wixted, & Vul, 2008) or months (Carpenter, Pashler, & Cepeda, 2009; Larsen, Butler, & Roediger, 2009). Using more complex materials, a few studies have examined the disfluency effect across more ecologically valid RIs (e.g., Diemand-Yauman et al., 2011; French et al., 2013; Weltman & Eakin, 2014). The longest retention interval was used by Diemand-Yauman et al. (2011) when they tested the effects of disfluency by putting PowerPoint presentations in a hard-to-read font. In their Experiment 2, the final test occurred weeks or months after participants were exposed to the material. However, subsequent research has not been able to replicate Diemand-Yauman et al.’s initial findings (Rummer et al., 2016), leaving open the question of whether the disfluency effect persists over time, or is a short-lived phenomenon. In Experiment 3, we examined whether the cursive disfluency effect can extend to a more educationally realistic 24-hour RI.

Method

Participants

Thirty-six undergraduate students from Iowa State University participated for course credit. All participants were native speakers of English and self-reported normal or corrected-to-normal vision.

Materials, procedure, and design

The materials and procedure were identical to Experiment 1, except that the length of the RI was extended to 24 hours. After the study phase, individuals were told to come back 24 hours later for the second part of the experiment. Participants were not debriefed until the end of the second session.

Results and discussion

Two participants were replaced due to accuracy less than 75%. One word (noon) was discarded due to error rates greater than 40%, resulting in the exclusion of 4% of the data. Microphone errors occurred on 7% of the trials.

Study phase

Naming accuracy and naming latency

Naming accuracy and latency are shown in Table 3. Examining accuracy, the ANOVA (Greenhouse–Geisser corrected) indicated a main effect of script, F(2, 70) = 79.19, p < .001, η2 = .69. Participants were less accurate naming cursive words compared with type-print words, F(1, 35) = 78.46, p < .001, η2 = .69, and less accurate naming hard-to-read cursive words than easy-to-read cursive words, F(1, 35) = 80.13, p < .001, η2 = .70.

Table 3. Mean naming accuracy (in proportions), naming latency (in milliseconds), and JOLs (in proportions) as a function of script type in Experiment 3

Examining naming latency, the analysis (Greenhouse–Geisser corrected) revealed a significant effect of script, F(2, 68) = 128.09, p < .001, η2 = .79. Participants took longer to name cursive words compared with type-print words, F(1, 34) = 171.18, p < .001, η2 = .83, and took longer to name hard-to-read cursive words than easy-to-read cursive words, F(1, 34) = 46.50, p < .001, η2 = .58.Footnote 3

JOLs

Mean JOLs are shown in Table 3. The ANOVA (Greenhouse–Geisser corrected) indicated a significant effect of script, F(2, 70) = 27.59, p < .001, η2 = .44. Participants predicted that they would recognize fewer cursive words than type-printed words, F(1, 35) = 18.79, p < .001, η2 = .35, and predicted that they would recognize fewer hard-to-read cursive words compared with easy-to-read cursive words, F(1, 35) = 37.20, p < .001, η2 = .52.

Overall, consistent with the results of the previous two experiments, the cursive stimuli used in Experiment 3 were objectively and subjectively disfluent, with hard-to-read cursive being most disfluent.

Test phase

Conditionalized analysis

The mean proportions of “old” responses across item type and script type are presented in Fig. 6. Memory sensitivity (d') for each script type is displayed in Fig. 7. As would be expected with a 24-hour delay, accuracy was lower than in previous experiments. The ANOVA on d' values indicated a significant effect of script type, F(2, 70) = 9.21, p < .001, η2 = .21. Participants had better recognition memory for cursive than type-print words, F(1, 35) = 14.60, p < .001, η2 = .29. There was no reliable difference between easy-to-read cursive words and hard-to-read cursive words, F(1, 35) = 2.51, p = .122, η2 = .07, BF01 = 1.79.

Fig. 6

Mean proportion of “old” responses for Experiment 3. Error bars reflect the standard error of the mean, corrected for between-subject variability (Morey, 2008)

Fig. 7

Memory sensitivity (d') as a function of script after 24 hours. Error bars reflect the within-subject standard error of the mean (Morey, 2008)

The results from Experiment 3 are straightforward. Even after 24 hours, studying cursive words had a desirable effect on memory. There was not a reliable memory difference between easy-to-read cursive and hard-to-read cursive words. Similar to Experiments 1 and 2, there was a trend for easy-to-read cursive words to be better remembered than hard-to-read cursive words. Overall, it appears the cursive disfluency effect persists over longer delays.

Meta-analysis of Experiments 1–3

In all three experiments, presenting words in cursive for 2 s served as a desirable difficulty. That is, cursive words were better remembered than type-print words were. Furthermore, in all reported experiments, there was a tendency for easy-to-read cursive words to be better remembered than hard-to-read cursive words. In Experiments 1, 2, and 3, while the difference was not significant, the Bayes analyses did not support a strong conclusion that there was no difference. What is interesting is that this pattern is predicted by a postlexical account, but not by the compensatory processing account.

Given the theoretical importance of this comparison, a meta-analysis was done to examine the true nature of this effect.Footnote 4 Because meta-analyses pool evidence from multiple studies, they are thought to be a more powerful indicator of true effects (McShane & Böckenholt, 2017). In light of this and recent recommendations (e.g., Braver, Thoemmes, & Rosenthal, 2014; Cumming, 2014; Lakens & Etz, 2017; McShane & Böckenholt, 2017), a small-scale meta-analysis in R was conducted using the metafor package (Viechtbauer, 2010).Footnote 5 In addition to examining the effect size between easy-to-read and hard-to-read cursive, an overall meta-analytic effect size was computed for the difference between easy-to-read cursive and type-print and for the difference between hard-to-read cursive and type-print. For each comparison, a random effects model integrated effect sizes across Experiments 1, 2, and 3 (k = 3, N = 144).

The average effect sizes and the confidence intervals (CIs) for each comparison of interest are reported in Fig. 8. The meta-analysis revealed that the disfluency effect was present with easy-to-read cursive, Estimate = 0.47, SE = 0.10, z = 4.48, p < .001, and hard-to-read cursive, Estimate = 0.28, SE = 0.08, z = 3.29, p < .001. Further, easy-to-read cursive had a larger effect on memory than did hard-to-read cursive, Estimate = 0.20, SE = 0.08, z = 2.43, p < .05. The magnitude of the effects fluctuated across studies. However, given that they were within the CIs, the fluctuations are likely due to sampling error (Cumming, 2014). Indeed, the heterogeneity of each effect of interest was not statistically significant (see Fig. 8). Overall, the meta-analysis showed a disfluency effect for both easy-to-read and hard-to-read cursive words and further that the difference between easy-to-read and hard-to-read cursive words was small, but nonzero, with easy-to-read cursive words producing better memory.
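To make the pooling step concrete, the sketch below implements a small random-effects meta-analysis in plain Python. It is not the analysis reported here: it uses the DerSimonian–Laird estimator of between-study variance rather than metafor's default (restricted maximum likelihood), and the per-experiment effect sizes and sampling variances are hypothetical placeholders, since the individual values are not given in the text.

```python
import math


def random_effects_dl(effects: list[float], variances: list[float]) -> dict[str, float]:
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero

    # Re-weight each study by its total (within + between) variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {"estimate": estimate, "se": se, "z": estimate / se, "tau2": tau2, "Q": q}


# Hypothetical standardized mean changes and sampling variances for k = 3 studies.
hypothetical_effects = [0.50, 0.35, 0.55]
hypothetical_variances = [0.030, 0.015, 0.032]

pooled = random_effects_dl(hypothetical_effects, hypothetical_variances)
print({name: round(value, 3) for name, value in pooled.items()})
```

When the observed effects vary no more than sampling error would predict (Q at or below its degrees of freedom), the between-study variance estimate is zero and the random-effects result coincides with a fixed-effect average, which is the situation described above for the non-significant heterogeneity tests.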

Fig. 8

Meta-analysis of Experiments 1, 2, and 3. a Standardized mean change between easy-to-read cursive and type-print. b Standardized mean change between hard-to-read cursive and type-print. c Standardized mean change between easy-to-read cursive and hard-to-read cursive

General discussion

In the memory literature, an unexpected way to improve retention of material is to make the material more disfluent by changing the perceptual characteristics to make it hard to read. The afforded mnemonic benefit is called the disfluency effect (Diemand-Yauman et al., 2011). Research on perceptual disfluency has yielded equivocal results, so it is unclear whether perceptual disfluency could function as an educationally relevant desirable difficulty. A search for potential boundary conditions of the disfluency effect could clarify the situation (Oppenheimer & Alter, 2014).

The current experiments examined the influence of perceptual disfluency on recognition memory using cursive handwriting, an educationally relevant disfluency manipulation. We examined under what conditions cursive does and does not constitute a desirable difficulty. To this end, level of disfluency (i.e., legibility) was manipulated along with list composition and retention interval. All three experiments examined (1) whether there was an overall mnemonic benefit of cursive and (2) whether the memory benefit varied as a function of legibility of the cursive. Two types of cursive were chosen (easy-to-read and hard-to-read) because while both slow down processing and affect lexical processing, they do so to varying degrees. Hard-to-read cursive words take longer to recognize and are more error prone than easy-to-read cursive words, suggesting that hard-to-read cursive words exert a stronger influence on lexical and semantic processing (Perea, Gil-López, et al., 2016a; Perea, Marcet, et al., 2016b). In all three experiments, presenting words in cursive produced a disfluency effect. This did not differ as a function of list type (Experiment 2) or retention interval (Experiment 3). Corroborating this pattern of results, a small-scale meta-analysis showed a significant disfluency effect for easy-to-read (d = .46) and hard-to-read cursive (d = .24) words. This finding supports the hypothesis that perceptual disfluency can produce a disfluency effect. In the context of research showing that low-level perceptual disfluency like blurring (Yue et al., 2013) and luminance (Hirshman et al., 1994) does not produce a disfluency effect, we conclude that the perceptual disfluency manipulation must elicit an increase in higher-level processing to engender a disfluency effect. Further, the meta-analysis showed better memory for easy-to-read compared to hard-to-read cursive words. It appears that not all perceptually disfluent stimuli are created equal. The discovery that both easy-to-read and hard-to-read cursive can produce better memory, but that the effect for easy-to-read cursive is larger in magnitude, suggests that how disfluency is manipulated matters. These results provide important insights into the mechanisms behind the disfluency effect.

Mechanisms of the disfluency effect

Extant accounts of the disfluency effect place the phenomenon at one of two processing loci. According to postlexical accounts (e.g., Alter et al., 2007), the disfluency effect occurs after word recognition has taken place as a result of metacognitive control processes that enhance memory by way of deeper or more effortful processing. Alternatively, the compensatory processing account (e.g., Mulligan, 1996) posits that the disfluency effect arises during word recognition as a result of increased interaction between the letter level and word/semantic level. Both accounts predict that, as was found, naming words in cursive should produce better memory than naming words in type-print. However, they predict different patterns of results for easy-to-read and hard-to-read cursive words. The compensatory processing account would predict a larger memory benefit for hard-to-read cursive than for easy-to-read cursive because hard-to-read cursive engenders greater top-down processing than easy-to-read cursive (Perea, Gil-López, et al., 2016a; Perea, Marcet, et al., 2016b). Contrary to this, in all three experiments and the meta-analysis, there was a memory advantage for easy-to-read cursive over hard-to-read cursive. Postlexical accounts, on the other hand, would posit that the results should follow a U-shaped curve (Diemand-Yauman et al., 2011), which was observed.

While the current results do provide some evidence for the postlexical account of disfluency, there are some issues within the current instantiation of the theory that do not align well with the present findings. First, within a postlexical framework, misattribution as a result of experiential processing is thought to engender the mnemonic benefit of disfluency (Alter & Oppenheimer, 2009). That is, disfluency experienced during processing is used as a cue to allocate extra resources to the task. In the current experiments, participants were warned ahead of time about the varying levels of disfluency. Given this, they should have discounted disfluency as a valid cue (e.g., Oppenheimer, 2008; Oppenheimer & Frank, 2008). Despite this, a disfluency effect still held. Second, the postlexical account of disfluency is thought to be dependent on subjective evaluations of disfluency, which guide monitoring and control processes (Pieger et al., 2016). In the current experiments, hard-to-read cursive words were judged as subjectively more disfluent, but were not better remembered, while easy-to-read cursive words were judged as less disfluent, but were better remembered. Thus, rather than metamemory judgements guiding performance, it is possible that processing disfluency and not perceived disfluency mediates better recognition memory performance. Supporting this possibility, Sanchez and Jaeger (2015) showed that reading times of a passage (an on-line measure of difficulty) presented in an atypical font mediated performance, whereas ratings of difficulty (i.e., perceived difficulty) did not. A simple metacognitive theory of disfluency therefore cannot explain the current pattern of results.

Both theoretical accounts of disfluency effects face challenges in explaining the current pattern of findings. Cognitive load theory (Sweller, van Merriënboer, & Paas, 1998), a different theoretical framework not developed to explain disfluency, could be useful. Within this framework, there are three types of cognitive load: intrinsic, extraneous, and germane. Intrinsic cognitive load is influenced by the inherent difficulty associated with the material that is to be learned. For example, low-frequency (infrequent) words are harder to read than high-frequency (frequent) words. Intrinsic cognitive load cannot be altered by the researcher. By contrast, extraneous cognitive load is influenced by how the information is presented and can be manipulated by the researcher. Cursive would influence extraneous cognitive load, as the only difference between words is the script in which they are presented. Germane cognitive load is associated with the processes undertaken to process material at a deeper level. This type of cognitive load is thought to be beneficial for learning. Learning is thought to occur when processing capacity outweighs cognitive load. In our experiments, it is possible that easy-to-read cursive imposed a moderate level of extraneous load compared to hard-to-read cursive words, thereby allowing more beneficial types of cognitive load (germane) to facilitate learning.

Another way to consider the enhancement of memory due to perceptual disfluency is to place it within the conflict monitoring framework of cognitive control (Rosner et al., 2015). In the conflict monitoring framework (Botvinick, Braver, Barch, Carter, & Cohen, 2001), the up-regulation and down-regulation of attentional resources are modulated by the amount of conflict experienced during processing. Attentional control is carried out in the brain via activation of the anterior cingulate cortex (ACC), which detects conflict during processing. The activation of the ACC engenders increased activation of the dorsolateral prefrontal cortex, an area important for sustained or selective attention. The conflict monitoring model has been primarily used within the context of response inhibition (e.g., Stroop task) and the detection of errors.

Applied to the current results, it is possible that during the course of word processing, perceptual disfluency increases response ambiguity (i.e., not knowing what the perceptual input is), which in turn activates the ACC and increases cognitive control mechanisms leading to a memory benefit. Thus, the disfluency effect arises from an interaction between the quality of the sensory input that affects word recognition and the utilization of postlexical resources. With handwritten cursive, there appears to be an encoding cost that arises during word recognition itself. This disfluency experienced during word recognition serves as a metacognitive cue to engage in increased postlexical processing, leading to the overall memory benefit found. However, because cognitive control mechanisms are finite (e.g., working memory and attentional control; Evans & Stanovich, 2013), if the word is too hard to recognize, more resources will be utilized (see Geller, Still, & Morris, 2015), leading to a smaller memory benefit. Thus, according to the conflict-monitoring framework, if something is too ambiguous or hard to recognize, especially under time constraints, memory for that particular stimulus might be hindered. Indeed, two recent studies (Seufert, Wagner, & Westphal, 2017; Weissgerber & Reinhard, 2017) showed that stimuli that are too disfluent (e.g., words presented in Haettenschweiler, 15% grayscale, 14-pt. font, or that are scrambled) produce poorer memory than minimally disfluent stimuli. In other words, perceptual disfluency seems most beneficial to memory when the perceptual disfluency manipulation does not require too many cognitive resources. Future research should more explicitly test the role of cognitive control in disfluency.

Conclusion

We believe that the current research makes important contributions to research on disfluency and desirable difficulties. The findings, which show the importance of type and level of disfluency, help to illuminate inconsistencies in the literature. Whether or when disfluency is desirable depends upon how disfluent the manipulation is. Thus, the disfluency effect is not as straightforward as placing something in a hard-to-read font. The conflict monitoring framework provides a useful explanation for the obtained disfluency effect. However, if extant theories were modified to take into account difficulty that arises during word recognition and the interaction with cognitive control mechanisms involved in memory formation, they could also account for the findings.

For the first time, cursive writing has been shown to act as a desirable difficulty, with better immediate and delayed recognition memory performance for cursive words than for type-printed words. However, there is an important caveat: cursive cannot be too disfluent. This aligns well with other findings in the learning and memory domain suggesting a “sweet spot” for efficacious learning. To achieve the most desirable results, learning can be neither too easy nor too hard. For example, when spacing out study, individuals benefit more from lags that are neither too short nor too long (see Cepeda, Vul, Rohrer, Wixted, & Pashler, 2008, for an optimal lag-to-final-test ratio). Further, during study, it is most beneficial for individuals to study moderately difficult items as opposed to well-learned or difficult items (region of proximal learning; Kornell & Metcalfe, 2006).

As a recommendation to researchers, we feel it is important to ensure that the perceptual manipulation used be sufficiently strong to make encoding difficult, but not so strong that it consumes too many attentional resources needed for postlexical processing. To control for this, researchers should employ objective behavioral measures of difficulty (e.g., RTs and accuracy). Alternatively, more fine-grained methods such as eye tracking or mathematical modeling (e.g., distributional analyses) could be employed to examine levels affected by a perceptual manipulation (Balota & Yap, 2011; Reingold & Rayner, 2006). Future research should examine other moderating factors of the disfluency effect, and whether the cursive disfluency effect generalizes under more educationally realistic conditions, such as various testing formats and different encoding instructions.